# Temporal Graph Analysis

## Prerequisites

First, we need to set up a Python environment with PyTorch, PyTorch Geometric and PathpyG installed. Depending on where you run this notebook, this may already be (partially) done. For example, Google Colab ships with PyTorch by default, so only the remaining dependencies need to be installed. The DevContainer that is part of our GitHub repository, on the other hand, already has all necessary dependencies installed.

In the following, we install the packages for use in Google Colab with Jupyter magic commands. For other environments, comment the commands in or out as necessary. For more details on how to install `pathpyG`, especially with GPU support, we refer to our [documentation](https://www.pathpy.net/dev/getting_started/). Note that `%%capture` discards the full output of the cell so that installation details do not clutter this tutorial. If you want to see the output, comment out `%%capture`.

In [None]:
%%capture
# !pip install torch
!pip install torch_geometric
!pip install git+https://github.com/pathpy/pathpyG.git

## Motivation and Learning Objectives

In this tutorial we introduce the representation of temporal graph data using the `TemporalGraph` class and show how such data can be used to calculate shortest time-respecting paths between nodes as well as temporal node centralities.

In [1]:
import torch
from torch_geometric.data import Data
import pathpyG as pp
import pandas as pd

pp.config['torch']['device'] = 'cpu'

We can create a temporal graph object from a list of time-stamped edges. Since `TemporalGraph` is a subclass of the `Graph` class, the internal structures are very similar:

In [2]:
tedges = [('a', 'b', 1),('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4), ('a', 'b', 4), ('c', 'b', 4),
              ('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]
t = pp.TemporalGraph.from_edge_list(tedges)
print(t.mapping)
print(t.n)
print(t.m)

a -> 0
b -> 1
c -> 2
d -> 3

4
10


By default, all temporal graphs are directed. We can create an undirected version of a temporal graph as follows:

In [3]:
x = t.to_undirected()
print(x.mapping)
print(x.n)
print(x.m)

a -> 0
b -> 1
c -> 2
d -> 3

4
20


We can also create a temporal graph directly from a pyG `Data` object that holds an `edge_index` and a `time` attribute:

In [4]:
td = Data(
    edge_index = torch.Tensor([[0,1,2,0],[1,2,3,1]]).long(),
    time = torch.Tensor([0,1,2,3])
)
print(td)
t2 = pp.TemporalGraph(td)
print(t2)

Data(edge_index=[2, 4], time=[4])
Temporal Graph with 4 nodes, 3 unique edges and 4 events in [0.0, 3.0]
{'Edge Attributes': {}, 'Graph Attributes': {}, 'Node Attributes': {}}




We can restrict a temporal graph to a time window, which returns a temporal graph that only contains time-stamped edges in the given time interval.

In [12]:
t1 = t.get_window(0,4)
print(t1)
print(t1.m)
print(t1.start_time)
print(t1.end_time)

Temporal Graph with 4 nodes, 5 unique edges and 7 events in [1.0, 4.0]
{'Edge Attributes': {}, 'Graph Attributes': {'num_nodes': "<class 'int'>"}, 'Node Attributes': {}}
7
1.0
4.0


We can also extract a TemporalGraph object for a batch of temporal edges, which is defined by the start and end index of the edges defining the batch.

In [13]:
t1 = t.get_batch(1,6)
print(t1)
print(t1.m)
print(t1.start_time)
print(t1.end_time)

Temporal Graph with 4 nodes, 4 unique edges and 5 events in [2.0, 4.0]
{'Edge Attributes': {}, 'Graph Attributes': {}, 'Node Attributes': {}}
5
2.0
4.0


We can easily convert a temporal graph into a weighted time-aggregated static graph, where edge weights count the number of occurrences of an edge across all timestamps.

In [14]:
g = t.to_static_graph(weighted=True)
print(g)

Directed graph with 4 nodes and 6 edges
{'Edge Attributes': {'edge_weight': "<class 'torch.Tensor'> -> torch.Size([6])"}, 'Graph Attributes': {'num_nodes': "<class 'int'>"}, 'Node Attributes': {}}
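To illustrate what the edge weights of the time-aggregated graph represent, the following minimal sketch counts edge occurrences in plain Python. It does not use `pathpyG`; the helper name `aggregate` is hypothetical and only serves this illustration.

```python
from collections import Counter

# Same time-stamped edges as in the tutorial: (source, target, time)
tedges = [('a', 'b', 1), ('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4),
          ('a', 'b', 4), ('c', 'b', 4), ('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]


def aggregate(events):
    """Count how often each directed edge (v, w) occurs, ignoring timestamps."""
    return Counter((v, w) for v, w, _ in events)


weights = aggregate(tedges)
for (v, w), count in sorted(weights.items()):
    print(v, w, count)
```

The ten time-stamped edges collapse into six unique directed edges, which agrees with the six edges reported for the static graph above.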


We can also aggregate a temporal graph within a certain time window:

In [15]:
g = t.to_static_graph(time_window=(1, 3), weighted=True)
print(g)

Directed graph with 2 nodes and 1 edges
{'Edge Attributes': {'edge_weight': "<class 'torch.Tensor'> -> torch.Size([1])"}, 'Graph Attributes': {'num_nodes': "<class 'int'>"}, 'Node Attributes': {}}


Finally, we can use the class `RollingTimeWindow` to perform a rolling window analysis. The class returns an iterable object, where each iteration yields a time-aggregated weighted graph object as well as the corresponding time window.

In [16]:
r = pp.algorithms.RollingTimeWindow(t, window_size=3, step_size=1, return_window=True)
for g, w in r:
    print('Time window ', w)
    print(g)
    print(g.data.edge_index)
    print('---')

Time window  (1.0, 4.0)
Directed graph with 3 nodes and 3 edges
{'Edge Attributes': {'edge_weight': "<class 'torch.Tensor'> -> torch.Size([3])"}, 'Graph Attributes': {'num_nodes': "<class 'int'>"}, 'Node Attributes': {}}
EdgeIndex([[0, 1, 1],
           [1, 0, 2]], sparse_size=(3, 3), nnz=3, sort_order=row)
---
Time window  (2.0, 5.0)
Directed graph with 4 nodes and 5 edges
{'Edge Attributes': {'edge_weight': "<class 'torch.Tensor'> -> torch.Size([5])"}, 'Graph Attributes': {'num_nodes': "<class 'int'>"}, 'Node Attributes': {}}
EdgeIndex([[0, 1, 1, 2, 3],
           [1, 0, 2, 1, 2]], sparse_size=(4, 4), nnz=5, sort_order=row)
---
Time window  (3.0, 6.0)
Directed graph with 4 nodes and 6 edges
{'Edge Attributes': {'edge_weight': "<class 'torch.Tensor'> -> torch.Size([6])"}, 'Graph Attributes': {'num_nodes': "<class 'int'>"}, 'Node Attributes': {}}
EdgeIndex([[0, 1, 1, 2, 2, 3],
           [1, 0, 2, 1, 3, 2]], sparse_size=(4, 4), nnz=6, sort_order=row)
---
Time window  (4.0, 7.0)
Directe
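The rolling window logic can be sketched without `pathpyG` as follows. This is an illustration under two assumptions: windows are treated as half-open intervals `[start, end)`, which is inferred from the edge counts in the output above, and the loop stops once the window start passes the last timestamp, which is a choice of this sketch and may differ from `RollingTimeWindow`. The function name `rolling_windows` is hypothetical.

```python
from collections import Counter

# Same time-stamped edges as in the tutorial: (source, target, time)
tedges = [('a', 'b', 1), ('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4),
          ('a', 'b', 4), ('c', 'b', 4), ('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]


def rolling_windows(events, window_size, step_size):
    """Yield ((start, end), edge_weights) for half-open windows [start, end)."""
    times = [t for _, _, t in events]
    start, last = min(times), max(times)
    while start <= last:
        end = start + window_size
        # Aggregate only the events whose timestamp falls into the window
        weights = Counter((v, w) for v, w, t in events if start <= t < end)
        yield (start, end), weights
        start += step_size


for window, weights in rolling_windows(tedges, window_size=3, step_size=1):
    print(window, len(weights), 'unique edges')
```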

We can visualize temporal graphs using the plot function just like static graphs:

In [17]:
pp.plot(t, node_label=t.nodes, edge_color='lightgray');

The source nodes, destination nodes and timestamps of time-stamped edges are stored in a pyG `Data` object, which we can access as follows:

In [18]:
t.data

Data(edge_index=[2, 10], time=[10], num_nodes=4)

In [19]:
print(t.data.edge_index)

EdgeIndex([[0, 0, 1, 1, 3, 0, 2, 2, 1, 2],
           [1, 1, 0, 2, 2, 1, 1, 3, 0, 1]], sparse_size=(4, 4), nnz=10)


In [20]:
print(t.data.time)

tensor([1., 2., 3., 3., 4., 4., 4., 5., 5., 6.])


With the generator functions `edges` and `temporal_edges` we can iterate through the time-ordered (temporal) multi-edges of a temporal graph.

In [21]:
for v, w in t.edges:
    print(v, w)

a b
a b
b a
b c
d c
a b
c b
c d
b a
c b


In [22]:
for v, w, time in t.temporal_edges:
    print(v, w, time)

a b 1.0
a b 2.0
b a 3.0
b c 3.0
d c 4.0
a b 4.0
c b 4.0
c d 5.0
b a 5.0
c b 6.0


## Extracting Time-Respecting Paths in Temporal Networks

We are often interested in time-respecting paths in a temporal graph. A time-respecting path consists of a sequence of nodes $v_0,\ldots,v_l$ where consecutive nodes are connected by time-stamped edges that occur (i) in the correct temporal order, and (ii) within a maximum time difference of $\delta\in \mathbb{N}$.

To calculate time-respecting paths in a temporal graph, we can construct a directed acyclic graph (DAG), where each time-stamped edge $(u,v;t)$ in the temporal graph is represented by a node, and two nodes representing time-stamped edges $(u,v;t_1)$ and $(v,w;t_2)$ are connected by an edge iff $0 < t_2-t_1 \leq \delta$. This implies that (i) each edge in the resulting DAG represents a time-respecting path of length two, and (ii) time-respecting paths of any length are represented by paths in this DAG.
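The linking rule can be written out directly. The following is a minimal pure-Python sketch of the construction, not the implementation used by `pathpyG`; the function name `lift_events` is hypothetical, and the quadratic double loop is chosen for readability rather than speed.

```python
# Same time-stamped edges as in the tutorial: (source, target, time)
tedges = [('a', 'b', 1), ('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4),
          ('a', 'b', 4), ('c', 'b', 4), ('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]


def lift_events(events, delta):
    """Return index pairs (i, j) such that event j can directly follow event i."""
    links = []
    for i, (_, v, t1) in enumerate(events):
        for j, (x, _, t2) in enumerate(events):
            # Event j must start where event i ends and occur within delta
            if v == x and 0 < t2 - t1 <= delta:
                links.append((i, j))
    return links


links = lift_events(tedges, delta=1)
for i, j in links:
    print(tedges[i], '->', tedges[j])
```

Each printed pair is one edge of the DAG, i.e. one time-respecting path of length two. Because $t_2 - t_1$ must be strictly positive, links always point forward in time, which is why the result is acyclic.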

We can construct such a DAG using the function `pp.algorithms.lift_order_temporal`, which returns an edge_index. We can pass this to the constructor of a `Graph` object, which we can use to visualize the resulting DAG.

In [23]:
e_i = pp.algorithms.lift_order_temporal(t, delta=1)
dag = pp.Graph.from_edge_index(e_i)
pp.plot(dag, node_label = [f'{v}-{w}-{time}' for v, w, time in t.temporal_edges]);

100%|██████████| 6/6 [00:00<00:00, 2649.31it/s]


For $\delta=1$, this DAG with three connected components tells us that the underlying temporal graph has the following time-respecting paths (of different lengths):

Length one:  
    a -> b  
    b -> a  
    b -> c  
    c -> b  
    c -> d  
    d -> c  

Length two:  
    a -> b -> a (twice, starting at time 2 and time 4)  
    b -> a -> b  
    a -> b -> c     
    b -> c -> b  
    c -> b -> a  
    d -> c -> d  

Length three:   
    a -> b -> a -> b  
    b -> a -> b -> a  
    a -> b -> c -> b  
    b -> c -> b -> a  
    
Length four:   
    a -> b -> a -> b -> a  
    a -> b -> c -> b -> a  
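The list above can be checked by enumerating all paths in the event DAG. The sketch below is a pure-Python illustration under the same linking rule ($0 < t_2-t_1 \leq \delta$); the names `successors` and `walks_from` are hypothetical and not part of `pathpyG`.

```python
from collections import defaultdict

# Same time-stamped edges as in the tutorial: (source, target, time)
tedges = [('a', 'b', 1), ('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4),
          ('a', 'b', 4), ('c', 'b', 4), ('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]
delta = 1

# successors[i] lists the events that may directly follow event i
successors = defaultdict(list)
for i, (_, v, t1) in enumerate(tedges):
    for j, (x, _, t2) in enumerate(tedges):
        if v == x and 0 < t2 - t1 <= delta:
            successors[i].append(j)


def walks_from(i):
    """Yield every DAG path (as a list of event indices) that starts at event i."""
    yield [i]
    for j in successors[i]:
        for rest in walks_from(j):
            yield [i] + rest


# Group the distinct node sequences by path length (number of edges)
paths_by_length = defaultdict(set)
for start in range(len(tedges)):
    for walk in walks_from(start):
        nodes = (tedges[walk[0]][0],) + tuple(tedges[k][1] for k in walk)
        paths_by_length[len(walk)].add(nodes)

for length in sorted(paths_by_length):
    print(length, sorted(' -> '.join(p) for p in paths_by_length[length]))
```

Since the node sequences are collected in sets, the path a -> b -> a appears only once here even though it occurs twice in time.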

We can use the function `pp.algorithms.temporal.temporal_shortest_paths` to calculate shortest time-respecting path distances between any pair of nodes. This also returns a predecessor matrix, which can be used to reconstruct all shortest time-respecting paths (analogous to Dijkstra's algorithm for static graphs):

In [24]:
dist, pred = pp.algorithms.temporal.temporal_shortest_paths(t, delta=1)
print(t.mapping)
print(dist)
print(pred)

100%|██████████| 6/6 [00:00<00:00, 2528.47it/s]

a -> 0
b -> 1
c -> 2
d -> 3

[[ 0.  1.  2. inf]
 [ 1.  0.  1. inf]
 [ 2.  1.  0.  1.]
 [inf inf  1.  0.]]
[[ 0  0  1 -1]
 [ 1  1  1 -1]
 [ 1  2  2  2]
 [-1 -1  3  3]]





In the example above, the four `inf` values indicate that there are no time-respecting paths between the four node pairs (a, d), (b, d), (d, a) and (d, b). This is not what we would expect from the (strongly connected) topology of the time-aggregated graph, which is shown below:

In [25]:
g = t.to_static_graph(weighted=True)
pp.plot(g, node_label=g.mapping.node_ids.tolist());
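The distance matrix for $\delta=1$ can be reproduced with a breadth-first search over time-stamped edges. This is a minimal sketch for illustration, not the algorithm implemented in `pathpyG`; the function name `temporal_distances` is hypothetical, and the distance from a node to itself is set to zero by convention, as in the output above.

```python
import math
from collections import deque

# Same time-stamped edges as in the tutorial: (source, target, time)
tedges = [('a', 'b', 1), ('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4),
          ('a', 'b', 4), ('c', 'b', 4), ('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]


def temporal_distances(events, delta):
    """Return (nodes, dist) with dist[v][w] = fewest edges on a time-respecting path."""
    nodes = sorted({x for v, w, _ in events for x in (v, w)})
    dist = {v: {w: (0 if v == w else math.inf) for w in nodes} for v in nodes}
    for start, (src, _, _) in enumerate(events):
        # Breadth-first search over events reachable from the start event
        queue = deque([(start, 1)])
        seen = {start}
        while queue:
            i, hops = queue.popleft()
            _, v, t1 = events[i]
            dist[src][v] = min(dist[src][v], hops)
            for j, (x, _, t2) in enumerate(events):
                if j not in seen and x == v and 0 < t2 - t1 <= delta:
                    seen.add(j)
                    queue.append((j, hops + 1))
    return nodes, dist


nodes, dist = temporal_distances(tedges, delta=1)
for v in nodes:
    print(v, [dist[v][w] for w in nodes])
```

Although the time-aggregated graph is strongly connected, the search never reaches d from a or b (nor a or b from d), because the only edge into d, (c, d; 5), can only follow (d, c; 4) under $\delta=1$.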

## Reading and Writing Temporal Graph Data

In [26]:
tedges = [('a', 'b', 1),('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4), ('a', 'b', 4), ('c', 'b', 4),
              ('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]
t = pp.TemporalGraph.from_edge_list(tedges)
df = pp.io.temporal_graph_to_df(t)
print(df)

   v  w    t
0  c  b  6.0
1  b  a  5.0
2  c  d  5.0
3  c  b  4.0
4  a  b  4.0
5  d  c  4.0
6  b  c  3.0
7  b  a  3.0
8  a  b  2.0
9  a  b  1.0


In [27]:
t = pp.io.df_to_temporal_graph(df)
print(t)

Temporal Graph with 4 nodes, 6 unique edges and 10 events in [1.0, 6.0]
{'Edge Attributes': {}, 'Graph Attributes': {'num_nodes': "<class 'int'>"}, 'Node Attributes': {}}


In [28]:
df = pd.DataFrame([['a', 'b', 1], ['b', 'c', 2], ['a', 'c', 3]])
print(df)
t = pp.io.df_to_temporal_graph(df)
print(t)

   0  1  2
0  a  b  1
1  b  c  2
2  a  c  3
Temporal Graph with 3 nodes, 3 unique edges and 3 events in [1.0, 3.0]
{'Edge Attributes': {}, 'Graph Attributes': {'num_nodes': "<class 'int'>"}, 'Node Attributes': {}}


In [29]:
pp.io.write_csv(t, '../data/test_temporal_graph.csv')

In [30]:
t = pp.io.read_csv_temporal_graph('../data/test_temporal_graph.csv')
print(t)

Temporal Graph with 3 nodes, 6 unique edges and 6 events in [1.0, 3.0]
{'Edge Attributes': {}, 'Graph Attributes': {}, 'Node Attributes': {}}


## Temporal Centralities in Empirical Temporal Networks

`pathpyG`'s ability to calculate (shortest) time-respecting paths enables us to calculate different notions of temporal centrality for nodes in empirical temporal networks. We can read an empirical temporal graph from CSV data, where each line contains the source, target, and timestamp of an edge as comma-separated values:

In [31]:
t_ants = pp.io.read_csv_temporal_graph('../data/ants_1_1.tedges', header=False)
print(t_ants)

Temporal Graph with 89 nodes, 1298 unique edges and 3822 events in [0.0, 1438.0]
{'Edge Attributes': {}, 'Graph Attributes': {}, 'Node Attributes': {}}


To calculate the temporal closeness centrality, which is defined based on the lengths of the shortest time-respecting paths from a node to all other nodes, we can write the following:

In [32]:
cl = pp.algorithms.centrality.temporal_closeness_centrality(t_ants, delta=60)
print(cl)
mx = max(cl.values())
mn = min(cl.values())
node_size = { v: 50*(x/(mx-mn)) for v, x in cl.items() }
pp.plot(t_ants, node_size=node_size, edge_color='red', edge_size=4);

100%|██████████| 883/883 [00:00<00:00, 3322.84it/s]


{'GBGR': 3024.615873015872, 'GBGW': 2999.502564102565, 'GBG_': 2928.756043956045, 'GGGG': 2859.2000000000007, 'GGGR': 2811.4253968253956, 'GGRR': 4383.866666666668, 'GGRY': 3304.400000000001, 'GGWW': 3418.311111111112, 'GGWY': 5092.2666666666655, 'GGW_': 4230.076190476191, 'GGYW': 2556.571916971917, 'GG_W': 3788.4000000000015, 'GRBR': 2977.3714285714286, 'GRGY': 3039.911111111111, 'GRWG': 4162.714285714285, 'GRYY': 2736.625396825396, 'GR_Y': 3113.7333333333336, 'GR_Y2': 4416.133333333335, 'GR__': 3305.8666666666663, 'GWRG': 3379.2000000000003, 'GYGG': 3321.2977777777783, 'GYYY': 2301.3777777777777, 'GY__': 2260.066666666665, 'G_GW': 3525.866666666667, 'G_R_': 4034.800000000001, 'G_W_': 3010.4380952380957, 'G___': 3100.533333333335, 'G___big': 2068.308913308913, 'G___small': 2351.7999999999993, 'Q': 4177.311111111112, 'RWGY': 3708.5714285714307, 'RWWG': 3030.488888888889, 'WBGG': 3781.0666666666675, 'WBGW': 3166.742857142857, 'WBYG': 2668.5999999999995, 'WGBB': 3440.171428571429, 'WGGB'
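The cell above computes node sizes as `50*(x/(mx-mn))`, which divides each score by the score range without subtracting the minimum, so the resulting sizes are not confined to a fixed interval. If sizes in a fixed range are preferred, a min-max scaling could be used instead. The helper `scale_sizes` below is a hypothetical sketch, and the size bounds are arbitrary defaults.

```python
def scale_sizes(scores, min_size=5.0, max_size=50.0):
    """Map centrality scores linearly onto the interval [min_size, max_size]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: give every node the same mid-range size
        return {v: (min_size + max_size) / 2 for v in scores}
    return {v: min_size + (max_size - min_size) * (x - lo) / (hi - lo)
            for v, x in scores.items()}


# Toy scores for illustration, not taken from the ants data set
example = scale_sizes({'u': 1.0, 'v': 3.0, 'w': 2.0}, min_size=10, max_size=50)
print(example)
```

The returned dictionary has the same shape as the `node_size` dictionary above, so `scale_sizes(cl)` could be passed to `pp.plot` in its place.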

The definition of time-respecting paths depends on our maximum time difference parameter $\delta$, which implies that different values of this parameter also yield different centralities. This means that we can calculate temporal node centralities for different "time scales" of a temporal graph.

In [33]:
cl = pp.algorithms.centrality.temporal_closeness_centrality(t_ants, delta=20)
print(cl)
mx = max(cl.values())
mn = min(cl.values())
node_size = { v: 50*(x/(mx-mn)) for v, x in cl.items() }
pp.plot(t_ants, node_size=node_size, edge_color='red', edge_size=4);

100%|██████████| 883/883 [00:00<00:00, 3573.09it/s]


{'GBGR': 2207.0092796092795, 'GBGW': 1574.1780861656403, 'GBG_': 2245.6238095238077, 'GGGG': 2065.6316957552262, 'GGGR': 2052.219608851188, 'GGRR': 3774.042140822139, 'GGRY': 2541.641269841269, 'GGWW': 2203.5236655224166, 'GGWY': 4598.104761904761, 'GGW_': 3827.514285714286, 'GGYW': 1998.2796889101228, 'GG_W': 3159.310780722547, 'GRBR': 2252.3125606823146, 'GRGY': 1961.2073260073257, 'GRWG': 3434.115731505299, 'GRYY': 1769.6720992921016, 'GR_Y': 2458.6663362781, 'GR_Y2': 3750.974603174604, 'GR__': 2287.758730158729, 'GWRG': 2460.425432737198, 'GYGG': 2512.0908424908425, 'GYYY': 1082.0915750915751, 'GY__': 1348.5477329496612, 'G_GW': 2936.1425267542913, 'G_R_': 3429.2739926739923, 'G_W_': 2107.469050754098, 'G___': 2127.2349206349195, 'G___big': 440.0, 'G___small': 1357.2765347758534, 'Q': 3506.6336134453786, 'RWGY': 2859.332112332112, 'RWWG': 2268.550438842203, 'WBGG': 2896.1205924510273, 'WBGW': 2433.333613445378, 'WBYG': 1977.0355670473316, 'WGBB': 2559.064886091109, 'WGGB': 2960.019

We can also calculate the temporal betweenness centrality, which is based on the number of shortest time-respecting paths between pairs of nodes that pass through a given node. Again, this centrality score is sensitive to the time scale parameter $\delta$.

In [34]:
bw = pp.algorithms.centrality.temporal_betweenness_centrality(t_ants, delta=60)
print(bw)
mx = max(bw.values())
mn = min(bw.values())
node_size = { v: 50*(x/(mx-mn)) for v, x in bw.items() }
pp.plot(t_ants, node_size=node_size, edge_color='red', edge_size=4);

100%|██████████| 883/883 [00:00<00:00, 3475.66it/s]
100%|██████████| 89/89 [00:03<00:00, 27.00it/s]


defaultdict(<function temporal_betweenness_centrality.<locals>.<lambda> at 0x7fdf20a03640>, {'GBGR': 20.192212842427075, 'WGBB': 194.79309833949083, 'WRBB': 365.05013895123216, 'G___': 57.80632230642795, '_WYW': 427.3675796732633, '____topright': 10.760025336141013, 'YYGGmid': 663.4450747248108, '__W_': 46.375038482274306, 'GG_W': 292.080282213412, '_W__': 428.33207844162627, 'WBGW': 71.89857589572318, 'GR__': 64.19763457586924, 'RWWG': 83.25286865190012, '____almost': 134.89554914241063, 'GGGG': 78.97281365765863, 'YYGW': 319.0275353843234, 'YWGW': 84.52883527311734, 'YYWR': 139.62781952597214, 'YYRG': 30.06998326775235, 'WRWR': 130.49639766193505, 'WYGG': 274.1787721126864, 'GGYW': 9.54474098670821, 'GYGG': 206.18561458924665, 'GGWY': 1080.0676471347801, 'G_W_': 30.258200815352502, '_W_Y': 88.46660448533098, '_R__': 724.7535183751095, 'WBGG': 184.56469995815524, 'Y___': 124.00746952439442, '____brood': 261.05135853722055, 'WRRY': 227.56794665132307, '_WWY': 258.17798469631225, 'Y_WY'

In [35]:
bw = pp.algorithms.centrality.temporal_betweenness_centrality(t_ants, delta=20)
print(bw)
mx = max(bw.values())
mn = min(bw.values())
node_size = { v: 50*(x/(mx-mn)) for v, x in bw.items() }
pp.plot(t_ants, node_size=node_size, edge_color='red', edge_size=4);

100%|██████████| 883/883 [00:00<00:00, 3813.91it/s]
100%|██████████| 89/89 [00:01<00:00, 69.84it/s]


defaultdict(<function temporal_betweenness_centrality.<locals>.<lambda> at 0x7fdf20ed9b40>, {'GBGR': 271.0549203258132, 'YY__': 260.787489965592, '_WYG': 394.1302318596431, '____right': 28.503246753246753, 'Q': 1457.263110413932, 'GGRR': 393.6459799032444, 'GGGR': 201.48208556149757, 'YYGGright': 534.66060053214, 'YY_R': 266.91765439796467, 'Y___': 172.67107899804276, '____corner': 551.646068862101, 'YGWW': 1201.1725191494913, 'WG_R': 512.4899073911774, '_WGG': 662.2060823813118, 'YWGW': 71.94722222222224, '____bm': 520.5037887625123, '__BB': 146.4024424420817, '_Y__': 116.65703669247497, 'WBYG': 337.50697274935305, 'YYWR': 346.0394146748694, '_W_Y': 126.7397860593513, 'WBGG': 100.16123039327377, 'Y_WY': 709.6990969778257, '_W__': 888.9702395101842, 'GGWW': 252.45108178976466, 'RWGY': 400.3435250525273, 'WR__': 324.7003673342413, 'G_R_': 74.0687563696638, '____bot': 18.876190476190477, '____pale': 1160.042127920021, '____almost': 586.5742800306991, '____topright': 6.242857142857144, 'Y