Temporal Graph Analysis¶
Prerequisites¶
First, we need to set up our Python environment that has PyTorch, PyTorch Geometric and PathpyG installed. Depending on where you are executing this notebook, this might already be (partially) done. E.g. Google Colab has PyTorch installed by default so we only need to install the remaining dependencies. The DevContainer that is part of our GitHub Repository on the other hand already has all of the necessary dependencies installed.
In the following, we install the packages for usage in Google Colab using Jupyter magic commands. For other environments comment in or out the commands as necessary. For more details on how to install pathpyG
especially if you want to install it with GPU-support, we refer to our documentation. Note that %%capture
discards the full output of the cell to not clutter this tutorial with unnecessary installation details. If you want to print the output, you can comment %%capture
out.
%%capture
# !pip install torch
!pip install torch_geometric
!pip install git+https://github.com/pathpy/pathpyG.git
Motivation and Learning Objectives¶
In this tutorial we will introduce the representation of temporal graph data in the Temporal Graph
class and how such data can be used to calculate time respecting paths.
import torch
from torch_geometric.data import TemporalData
import pathpyG as pp
pp.config['torch']['device'] = 'cpu'
We can create a temporal graph object from a list of time-stamped edges. Since TemporalGraph is a subclass of the Graph
class, the inernal structures are very similar:
tedges = [('a', 'b', 1),('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4), ('a', 'b', 4), ('c', 'b', 4),
('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]
t = pp.TemporalGraph.from_edge_list(tedges)
print(t.mapping)
print(t.N)
print(t.M)
a -> 0 b -> 1 c -> 2 d -> 3 4 10
By default, all temporal graphs are directed. We can create an undirected version a temporal graph as follows:
x = t.to_undirected()
print(x.mapping)
print(x.N)
print(x.M)
a -> 0 b -> 1 c -> 2 d -> 3 4 20
We can also create a temporal graph from an instance of TemporalData
td = TemporalData(
src = torch.Tensor([0,1,2,0]),
dst = torch.Tensor([1,2,3,1]),
t = torch.Tensor([0,1,2,3]))
print(td)
t2 = pp.TemporalGraph(td)
print(t2)
TemporalData(src=[4], dst=[4], t=[4]) Temporal Graph with 4 nodes, 3 unique edges and 4 events in [0.0, 3.0] Graph attributes src <class 'torch.Tensor'> -> torch.Size([4]) t <class 'torch.Tensor'> -> torch.Size([4]) dst <class 'torch.Tensor'> -> torch.Size([4])
/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 't', 'dst'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning warnings.warn(
We can restrict a temporal graph to a time window, which will return a temporal graph that only contains time-stamped edges in the given time interval.
t1 = t.get_window(0,4)
print(t1)
print(t1.start_time)
print(t1.end_time)
Temporal Graph with 3 nodes, 3 unique edges and 4 events in [1.0, 3.0] Graph attributes src <class 'torch.Tensor'> -> torch.Size([4]) t <class 'torch.Tensor'> -> torch.Size([4]) dst <class 'torch.Tensor'> -> torch.Size([4]) 1.0 3.0
/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 't', 'dst'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning warnings.warn(
We can convert a temporal graph into a weighted time-aggregated static graph:
g = t.to_static_graph(weighted=True)
print(g)
Undirected graph with 4 nodes and 6 (directed) edges Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([6]) Graph attributes num_nodes <class 'int'>
We can also aggregate the temporal graph in a given time window:
g = t.to_static_graph(time_window=(1, 3), weighted=True)
print(g)
Directed graph with 2 nodes and 1 edges Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([1]) Graph attributes num_nodes <class 'int'>
Finally, we can perform a rolling window analysis:
r = pp.algorithms.RollingTimeWindow(t, 3, 1, return_window=True)
for g, w in r:
print('Time window ', w)
print(g)
print(g.data.edge_index)
print('---')
Time window (1.0, 4.0) Directed graph with 3 nodes and 3 edges Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([3]) Graph attributes num_nodes <class 'int'> EdgeIndex([[0, 1, 1], [1, 2, 0]], device='cuda:0', sparse_size=(3, 3), nnz=3, sort_order=row) --- Time window (2.0, 5.0) Directed graph with 4 nodes and 5 edges Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([5]) Graph attributes num_nodes <class 'int'> EdgeIndex([[0, 1, 1, 2, 3], [1, 2, 0, 1, 2]], device='cuda:0', sparse_size=(4, 4), nnz=5, sort_order=row) --- Time window (3.0, 6.0) Undirected graph with 4 nodes and 6 (directed) edges Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([6]) Graph attributes num_nodes <class 'int'> EdgeIndex([[0, 1, 1, 2, 2, 3], [1, 2, 0, 3, 1, 2]], device='cuda:0', sparse_size=(4, 4), nnz=6, sort_order=row) --- Time window (4.0, 7.0) Directed graph with 4 nodes and 5 edges Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([5]) Graph attributes num_nodes <class 'int'> EdgeIndex([[0, 1, 2, 2, 3], [1, 0, 1, 3, 2]], device='cuda:0', sparse_size=(4, 4), nnz=5, sort_order=row) --- Time window (5.0, 8.0) Directed graph with 4 nodes and 3 edges Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([3]) Graph attributes num_nodes <class 'int'> EdgeIndex([[1, 2, 2], [0, 3, 1]], device='cuda:0', sparse_size=(4, 4), nnz=3, sort_order=row) --- Time window (6.0, 9.0) Directed graph with 3 nodes and 1 edges Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([1]) Graph attributes num_nodes <class 'int'> EdgeIndex([[2], [1]], device='cuda:0', sparse_size=(3, 3), nnz=1, sort_order=row) ---
We can visualize a temporal graph by using the pathpyG plot function.
pp.plot(t, edge_color='lightgray');
Consistent with pyG
the sources, destinations and timestamps are stored as a pyG TemporalData
object, which we can access in the following way.
t.data
TemporalData(src=[10], dst=[10], t=[10])
print(t.data.t)
tensor([1., 2., 3., 3., 4., 4., 4., 5., 5., 6.], device='cuda:0')
With the generator functions edges
and temporal_edges
we can iterate through the (temporal) edges of this graph.
for v, w in t.edges:
print(v, w)
a b a b b a b c d c a b c b c d b a c b
for v, w, time in t.temporal_edges:
print(v, w, time)
a b 1.0 a b 2.0 b a 3.0 b c 3.0 d c 4.0 a b 4.0 c b 4.0 c d 5.0 b a 5.0 c b 6.0
Extracting Time-Respecting Paths¶
We are often interested in time-respecting paths in a temporal graph. A time-respecting path is defined as a sequence of nodes $v_0,...,v_l$ where the corresponding edges occur in the right temporal ordering and with a maximum time difference of $\delta\in \N$.
To calculate time-respecting paths in a temporal graph, we can construct a directed acyclic graph (DAG), where each node is a time-stamped edge $(u,v;t)$ and two nodes representing time-stamped edges $(u,v;t_1)$ and $(v,w;t_2)$ are connected by a en edge iff $0 < t_2-t_1 \leq \delta$.
For the toy example above, we can construct such a DAG as follows:
e_i = pp.algorithms.lift_order_temporal(t, delta=1)
dag = pp.Graph.from_edge_index(e_i)
pp.plot(dag, node_label = [f'{v}-{w}-{time}' for v, w, time in t.temporal_edges]);
100%|██████████| 6/6 [00:00<00:00, 430.72it/s]
For $\delta=1$, this DAG with three connected components tells us that there are the following time-respecting paths:
Length one:
a -> b
b -> a
b -> c
c -> b
c -> d
d -> c
Length two:
a -> b -> a (twice)
b -> a -> b
a -> b -> c
b -> c -> b
c -> b -> a
d -> c -> d
Length three:
a -> b -> a -> b
b -> a -> b -> a
a -> b -> c -> b
b -> c -> b -> a
Length four:
a -> b -> a -> b -> a
a -> b -> c -> b -> a
We can use the function pp.algorithms.temporal.temporal_shortest_paths
to calculate (shortest) time-respecting path distances between any pair of nodes:
dist, pred = pp.algorithms.temporal.temporal_shortest_paths(t, delta=1)
print(t.mapping)
print(dist)
print(pred)
100%|██████████| 6/6 [00:00<00:00, 524.19it/s]
a -> 0 b -> 1 c -> 2 d -> 3 [[ 0. 1. 2. inf] [ 1. 0. 1. inf] [ 2. 1. 0. 1.] [inf inf 1. 0.]] [[ 8 0 3 -9999] [ 2 5 3 -9999] [ 8 6 -9999 7] [-9999 -9999 4 7]]
In the example above, we find that there is no time-respecting paths between the node pairs (a, d) and (b, d) and (d,a) and (d, b). The shortest temporal path from a to c requires two steps (a, b, c).
Higher-Order De Bruijn Graph Models for Time-Respecting Paths¶
m = pp.MultiOrderModel.from_temporal_graph(t, delta=1, max_order=4)
print(m.layers[1])
print(m.layers[2])
print(m.layers[3])
print(m.layers[4])
Undirected graph with 4 nodes and 6 (directed) edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([4, 1]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([6]) Graph attributes num_nodes <class 'int'> Directed graph with 6 nodes and 6 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([6, 2]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([6]) Graph attributes num_nodes <class 'int'> Directed graph with 6 nodes and 4 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([6, 3]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([4]) Graph attributes num_nodes <class 'int'> Directed graph with 4 nodes and 2 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([4, 4]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([2]) Graph attributes num_nodes <class 'int'>
pp.plot(m.layers[1], node_label=[v for v in m.layers[1].nodes])
<pathpyG.visualisations.network_plots.StaticNetworkPlot at 0x7fb838a93490>
pp.plot(m.layers[2], node_label=[v for v in m.layers[2].nodes])
<pathpyG.visualisations.network_plots.StaticNetworkPlot at 0x7fb838a92410>
pp.plot(m.layers[3], node_label=[v for v in m.layers[3].nodes])
<pathpyG.visualisations.network_plots.StaticNetworkPlot at 0x7fb83807d930>
pp.plot(m.layers[4], node_label=[v for v in m.layers[4].nodes]);
Analysis of empirical temporal graphs¶
We can read temporal graphs from CSV files that contain the source, target, and time-stamps of edges in each line:
t = pp.TemporalGraph.from_csv('../data/ants_1_1.tedges')
print(t)
Temporal Graph with 89 nodes, 947 unique edges and 1911 events in [0.0, 1438.0] Graph attributes src <class 'torch.Tensor'> -> torch.Size([1911]) t <class 'torch.Tensor'> -> torch.Size([1911]) dst <class 'torch.Tensor'> -> torch.Size([1911])
/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 't', 'dst'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning warnings.warn(
paths = pp.algorithms.temporal_shortest_paths(t, delta=30)
100%|██████████| 883/883 [00:01<00:00, 619.46it/s]
bw = pp.algorithms.centrality.temporal_closeness_centrality(t, delta=30)
print(bw)
100%|██████████| 883/883 [00:00<00:00, 6108.69it/s]
{'Y_WY': 2789.0803418803416, 'WRBB': 2785.9333333333334, 'WRR_': 1081.142857142857, 'GGRR': 2852.965481577247, 'WG_R': 3049.515262515263, 'YWGW': 1545.3777777777777, 'YYGG': 2394.542857142857, 'GBGR': 1414.4860805860799, 'GRWG': 2171.3650793650795, 'YY_W': 2014.3365297301218, 'G_W_': 1371.3907285811308, 'G___small': 557.3333333333333, 'YW__': 176.0, 'YYGGmid': 3332.9682539682535, '____brood': 1977.3765567765563, 'GGWY': 3453.2349206349204, 'G_R_': 1433.5575091575088, '____topleft': 2165.771184371184, 'G_GW': 2422.661330173096, 'GYYY': 718.2158730158731, 'WBGW': 1971.4003597425594, 'Y__W': 982.6666666666667, 'YYYY': 413.6, 'GWRG': 1933.1047619047617, 'YGWW': 3152.5114774114777, 'WRWR': 2211.0761904761907, 'G___big': 352.0, 'GY__': 848.6708180708182, 'WRRY': 2440.7111111111117, 'GR_Y': 293.3333333333333, 'YWW_': 1767.2394383394376, '_W__': 2656.585103785103, '__W_': 1944.5428571428565, 'WBYG': 1343.583516483516, 'YGWY': 1660.7412698412697, '_R__': 2997.5174603174605, '____pale': 3136.1523809523806, 'WGBB': 2144.4600732600734, 'GR__': 1512.5449625320214, 'YYWR': 1731.1409394527036, '____bot': 1001.3499155127638, 'GR_Y2': 3175.9269841269834, '_WGG': 1958.2253968253958, 'W___': 1929.4289736407375, 'YWWW': 398.93333333333334, 'YYRB': 1740.7172161172157, '_WWY': 2212.292063492063, 'WGGB': 2359.0984126984126, 'Q': 2621.7873015873024, 'GRGY': 1126.5599870717517, '____corner': 2253.777777777778, 'YYRG': 1095.7428571428572, 'GGW_': 2330.4800976800975, 'GYGG': 1530.629914529915, '_Y__': 1652.9277938435825, 'WGWB': 2534.8727716727713, '__BB': 1730.7680868838759, '____bm': 1064.8000000000002, 'YY__': 1689.1285211520506, 'GRBR': 1754.9898656898652, '_WWW': 1424.7478182636075, '_WYG': 1981.0838827838822, 'GG_W': 2184.193650793651, 'WR__': 934.9978021978022, 'YY_R': 2112.4549450549443, 'GGGR': 1742.8245689638877, 'GRYY': 1162.6810654086196, 'GGWW': 1534.1047619047617, '_WYW': 2915.5587301587298, 'YYGGright': 1983.2628815628814, '____right': 1480.7504026477104, 'GBGW': 1504.3851037851034, 'WYGG': 1790.5627772296498, 'Y___': 1197.9579525737417, 'YYY_': 704.0, 'WBGG': 1990.8162393162388, 'RWGY': 1701.3416720534362, '____almost': 1693.4599061756958, '_W_Y': 1587.0571284924222, 'GGYW': 1470.0100840336133, 'G___': 1729.8146416766558, '____topright': 1286.647380665226, 'GGRY': 1905.4073260073255, 'RWWG': 1497.9301456145445, '_RYG': 504.5015873015874, 'GGGG': 1307.263478947878, 'WWBG': 1092.6931017786453, 'YYGW': 2237.888888888889, 'GBG_': 1410.0712202829845}
bw = pp.algorithms.centrality.temporal_betweenness_centrality(t, delta=30)
print(bw)
100%|██████████| 883/883 [00:00<00:00, 5972.88it/s] 100%|██████████| 86/86 [00:00<00:00, 288.95it/s]
defaultdict(<function temporal_betweenness_centrality.<locals>.<lambda> at 0x7fb86ad1ac20>, {'Y_WY': 66.65000000000002, 'WRR_': 16.03571428571428, 'YGWW': 283.3939578478873, 'GGRR': 211.63928571428573, 'YWGW': 6.216666666666667, 'G_GW': 134.36212121212122, 'WG_R': 168.975250954719, 'YYRB': 169.60544909428017, 'GGWY': 1718.7165771554303, '____topleft': 55.833333333333336, 'YYGGmid': 150.59735449735444, 'WGGB': 493.10410618997815, '____corner': 73.68974206371455, 'YGWY': 80.06904761904762, 'GRWG': 297.3740605276305, '____bm': 32.72777777777777, 'G_R_': 72.95192620574925, 'GR_Y2': 789.1940156557653, 'WRBB': 305.85885874100165, '_W__': 81.93484848484847, '____pale': 1073.2734366837544, 'GR__': 56.99999999999999, 'GBG_': 281.5697318519009, 'GGW_': 566.394057706365, 'WBGG': 63.72222222222224, 'GGYW': 279.5543513957306, 'YYGW': 384.16915498294793, 'GRBR': 39.28333333333333, '_WYG': 154.75528281105835, 'WGWB': 709.6708463026312, 'GGRY': 436.8124589931511, 'W___': 25.635235103124238, '____right': 30.103663003663005, '__BB': 138.19613162299913, '_Y__': 147.16903444757997, '_WGG': 101.30372156909563, 'YYWR': 47.6098615332658, '____bot': 2.0, 'WBYG': 33.38144208037825, 'GG_W': 107.11576914023713, 'Q': 904.6767045461904, 'WRRY': 107.51936464672677, 'WYGG': 385.1114573090895, '____brood': 150.27813394335135, '_WYW': 835.2570820688013, '_R__': 609.6647506098733, '__W_': 66.65154738334509, '_WWY': 217.07608701685595, '_W_Y': 520.625, 'GBGW': 81.1450354609929, 'GBGR': 119.01949427664658, 'YY_R': 200.38014858937927, 'GGGG': 200.21429898610748, 'WBGW': 25.363559608240447, 'YWW_': 8.200000000000001, 'RWWG': 139.95452637739874, '____almost': 111.97335462899981, 'YYGG': 541.2239759619857, 'WRWR': 3.506755634419301, 'G_W_': 22.333333333333332, 'WGBB': 247.32307692307694, 'GWRG': 128.73138863630572, 'Y__W': 0.0, 'YY__': 166.20315481894167, '_WWW': 171.763197511, 'GYGG': 84.68378103378103, 'GRGY': 10.257082909008458, 'G___': 103.18272382540675, 'YYGGright': 377.3777533153582, 'RWGY': 164.463475221252, 'GGWW': 342.99788730570526, 'YY_W': 86.65538322985128, 'YYY_': 5.75, 'GRYY': 77.95139793084198, 'GR_Y': -9.900299794654632, 'G___small': 0.9999999999999989, 'G___big': -2.220446049250313e-16, 'GY__': 4.000000000000001, 'WWBG': 20.212121212121207, '____topright': -18.584859584859586, 'YYYY': 0.0, 'YWWW': -23.492217906522352, 'YYRG': 9.920634920634928, 'GGGR': 16.266666666666666, 'Y___': 19.65343915343914, 'WR__': -7.402116402116404})
m = pp.MultiOrderModel.from_temporal_graph(t, delta=30, max_order=4)
print(m.layers[1])
print(m.layers[2])
print(m.layers[3])
print(m.layers[4])
Directed graph with 89 nodes and 947 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([89, 1]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([947]) Graph attributes num_nodes <class 'int'> Directed graph with 947 nodes and 1780 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([947, 2]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([1780]) Graph attributes num_nodes <class 'int'> Directed graph with 1780 nodes and 2410 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([1780, 3]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([2410]) Graph attributes num_nodes <class 'int'> Directed graph with 2410 nodes and 3292 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([2410, 4]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([3292]) Graph attributes num_nodes <class 'int'>
t = pp.TemporalGraph.from_csv('../data/manufacturing_email.tedges')
print(t)
Temporal Graph with 167 nodes, 5784 unique edges and 82927 events in [1262454016.0, 1285884544.0] Graph attributes src <class 'torch.Tensor'> -> torch.Size([82927]) t <class 'torch.Tensor'> -> torch.Size([82927]) dst <class 'torch.Tensor'> -> torch.Size([82927])
/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 't', 'dst'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning warnings.warn(
dist, pred = pp.algorithms.temporal.temporal_shortest_paths(t, delta=240)
100%|██████████| 30598/30598 [00:33<00:00, 909.07it/s]
m = pp.MultiOrderModel.from_temporal_graph(t, delta=240, max_order=4)
print(m.layers[1])
print(m.layers[2])
print(m.layers[3])
print(m.layers[4])
Directed graph with 167 nodes and 5784 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([167, 1]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([5784]) Graph attributes num_nodes <class 'int'> Directed graph with 5784 nodes and 3542 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([5784, 2]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([3542]) Graph attributes num_nodes <class 'int'> Directed graph with 3542 nodes and 812 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([3542, 3]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([812]) Graph attributes num_nodes <class 'int'> Directed graph with 812 nodes and 156 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([812, 4]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([156]) Graph attributes num_nodes <class 'int'>