Higher-Order Models for Time-Respecting Paths in Temporal Graphs¶
Prerequisites¶
First, we need to set up our Python environment that has PyTorch, PyTorch Geometric and PathpyG installed. Depending on where you are executing this notebook, this might already be (partially) done. E.g. Google Colab has PyTorch installed by default so we only need to install the remaining dependencies. The DevContainer that is part of our GitHub Repository on the other hand already has all of the necessary dependencies installed.
In the following, we install the packages for usage in Google Colab using Jupyter magic commands. For other environments comment in or out the commands as necessary. For more details on how to install pathpyG
especially if you want to install it with GPU-support, we refer to our documentation. Note that %%capture
discards the full output of the cell to not clutter this tutorial with unnecessary installation details. If you want to print the output, you can comment %%capture
out.
%%capture
# !pip install torch
!pip install torch_geometric
!pip install git+https://github.com/pathpy/pathpyG.git
Motivation and Learning Objectives¶
In the previous tutorial, we have seen how we can use higher-order models to model paths in complex networks. In this example, paths were directly given in terms of sequences of nodes traversed by some process (like a random walk). We have further seen that higher-order De Bruijn graph models can be used to capture patterns that influence the causal topology of a complex network, i.e. which nodes can possibly influence each other via paths. The same is true for time-respecting paths in a temporal graph. Due to the fact that time-stamped edges need to occur in the correct temporal ordering (and within a given time interval based on the maximum time difference $\delta$), the causal topology given by time-respecting paths can be very different from what we would expect from the (static) topology of links.
In the following, we will show how we can easiy and efficiently construct higher-order models for time-respecting paths in a temporal graph. To illustrate this, we use the same toy example as before:
import pathpyG as pp
pp.config['torch']['device'] = 'cpu'
tedges = [('a', 'b', 1),('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4), ('a', 'b', 4), ('c', 'b', 4),
('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]
t = pp.TemporalGraph.from_edge_list(tedges)
print(t)
Temporal Graph with 4 nodes, 6 unique edges and 10 events in [1.0, 6.0] Graph attributes src <class 'torch.Tensor'> -> torch.Size([10]) dst <class 'torch.Tensor'> -> torch.Size([10]) t <class 'torch.Tensor'> -> torch.Size([10])
/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 'dst', 't'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning warnings.warn(
To better understand this temporal graph, we can again create a directed acyclic graph that represents the topology of time-respecting paths:
e_i = pp.algorithms.lift_order_temporal(t, delta=1)
dag = pp.Graph.from_edge_index(e_i)
pp.plot(dag, node_label = [f'{v}-{w}-{time}' for v, w, time in t.temporal_edges]);
100%|██████████| 6/6 [00:00<00:00, 930.41it/s]
For $\delta=1$, we again have the following time-respecting paths:
Length one:
a -> b
b -> a
b -> c
c -> b
c -> d
d -> c
Length two:
a -> b -> a (twice)
b -> a -> b
a -> b -> c
b -> c -> b
c -> b -> a
d -> c -> d
Length three:
a -> b -> a -> b
b -> a -> b -> a
a -> b -> c -> b
b -> c -> b -> a
Length four:
a -> b -> a -> b -> a
a -> b -> c -> b -> a
As you can see, these time-respecting paths are actually very similar to the paths data that we have previously represented using the PathData
object. In fact, we could - in theory - first extract all time-respecting paths of all lengths, add them to a PathData
object and then use the MultiOderModel
class to generate higher-order De Bruijn graph models of all orders. In the example above, since we have paths of length one to four, we could create higher-order models with orders from one to four.
However, this approach would not be efficient for large temporal graphs, as it is computationally expensive to calculate all possible time-respecting paths as well as subpaths of length $k$, especially for larger values of $\delta$. To avoid this bottleneck, pathpyG
uses a smarter, GPU-based algorithm to calculate time-respecting paths of length $k$ that are needed for a given order $k$.
For the example above, we can generate all higher-order models up to order four as follows:
m = pp.MultiOrderModel.from_temporal_graph(t, delta=1, max_order=4)
print(m.layers[3])
print(m.layers[4])
Directed graph with 6 nodes and 4 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([6, 3]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([4]) Graph attributes inverse_idx <class 'torch.Tensor'> -> torch.Size([7]) num_nodes <class 'int'> Directed graph with 4 nodes and 2 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([4, 4]) inverse_idx <class 'torch.Tensor'> -> torch.Size([4]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([2]) Graph attributes num_nodes <class 'int'>
Remember that in a $k$-th order model nodes capture paths of length $k-1$, while edges capture paths of length $k$.
This implies that the first-order model has four nodes and six edges, which simply corresponds to the time-aggregated weighted graph for our example temporal network.
print(m.layers[1])
pp.plot(m.layers[1], node_label=[v for v in m.layers[1].nodes]);
Undirected graph with 4 nodes and 6 (directed) edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([4, 1]) inverse_idx <class 'torch.Tensor'> -> torch.Size([4]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([6]) Graph attributes num_nodes <class 'int'>
For the second-order model, we have six nodes, which map to the six different edges (each edge trivially being a time-respecting path of length one) of the temporal graph. The six edges in the second-order model represent the six different time-respecting paths of length two (see above). Since the time-respecting path $a \rightarrow b \rightarrow a$ occurs twice at different times, we have one edge with weight two.
print(m.layers[2])
print(m.layers[2].data.edge_weight)
pp.plot(m.layers[2], node_label=m.layers[2].mapping.node_ids.tolist());
Directed graph with 6 nodes and 6 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([6, 2]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([6]) Graph attributes inverse_idx <class 'torch.Tensor'> -> torch.Size([10]) num_nodes <class 'int'> tensor([2., 1., 1., 1., 1., 1.])
For the third-oder mode, we have four edges representing the four diffeerent time-respecting paths of length three in the temporal graph above:
print(m.layers[3])
pp.plot(m.layers[3], node_label=m.layers[3].mapping.node_ids.tolist());
Directed graph with 6 nodes and 4 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([6, 3]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([4]) Graph attributes inverse_idx <class 'torch.Tensor'> -> torch.Size([7]) num_nodes <class 'int'>
And finally, for the model with order $k=4$ we only have two edges, representing the two time-respecting paths $a \rightarrow b \rightarrow a \rightarrow b \rightarrow a$ and $a \rightarrow b \rightarrow c \rightarrow b \rightarrow a$:
print(m.layers[4])
pp.plot(m.layers[4], node_label=m.layers[4].mapping.node_ids.tolist());
Directed graph with 4 nodes and 2 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([4, 4]) inverse_idx <class 'torch.Tensor'> -> torch.Size([4]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([2]) Graph attributes num_nodes <class 'int'>
Intuitively, since in our example there are no time-respecting paths longer than four, if we were to generate a multi-order model with De Bruijn graphs with orders larger than four, those graphs cannot contain any edges. We see this in the following example. The first-order graph is simply the time-aggregated weighted graph, i.e. the number of nodes is equal to the number of nodes in the temporal graph and the number of edges is equal to the number of different time-stamped edges. In each graph of order $k>1$, the number of nodes corresponds to the number of edges in the graph with order $k-1$, since each of those nodes corresponds to a time-respecting path of length $k-1$, which are represented by edges in a $k-1$-th order gaph. This implies that the graph with order five has two nodes, which are the two time-respecting paths of length four. Those nodes are not connected since there is no time-respecting path with length five.
m = pp.MultiOrderModel.from_temporal_graph(t, delta=1, max_order=5)
print(m.layers[4])
print(m.layers[5])
pp.plot(m.layers[5], node_label=m.layers[5].mapping.node_ids.tolist());
Directed graph with 4 nodes and 2 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([4, 4]) inverse_idx <class 'torch.Tensor'> -> torch.Size([4]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([2]) Graph attributes num_nodes <class 'int'> Undirected graph with 2 nodes and 0 (directed) edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([2, 5]) inverse_idx <class 'torch.Tensor'> -> torch.Size([2]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([0]) Graph attributes num_nodes <class 'int'>
Constructing Higher-Order De Bruijn Graph Models for Empirical Temporal Networks¶
Let us now use pathpyG
to construcct higher-order De Bruijn graph models for time-respecting paths in empirical temporal network. For this, we first read a number of temporal graphs using TemporalGraph.from_csv
. In the following, we use the following three publicly available data sets:
- Antenna interactions between ants in a colony (Blonder and Dornhaus, 2011)
- E-Mail exchanges in a manufacturing company (Nurek and Michalski, 2020)
- Face-to-face interactions in a highschool (Mastrandrea, Fournet, Barrat, 2015)
# pp.config['torch']['device'] = 'cuda'
t_ants = pp.io.read_csv_temporal_graph('../data/ants_1_1.tedges', header=False)
print(t_ants)
Temporal Graph with 89 nodes, 1298 unique edges and 3822 events in [0.0, 1438.0] Graph attributes src <class 'torch.Tensor'> -> torch.Size([3822]) dst <class 'torch.Tensor'> -> torch.Size([3822]) t <class 'torch.Tensor'> -> torch.Size([3822])
/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 'dst', 't'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning warnings.warn(
t_email = pp.io.read_csv_temporal_graph('../data/manufacturing_email.tedges', header=False)
print(t_email)
Temporal Graph with 167 nodes, 6501 unique edges and 165854 events in [1262454016.0, 1285884544.0] Graph attributes src <class 'torch.Tensor'> -> torch.Size([165854]) dst <class 'torch.Tensor'> -> torch.Size([165854]) t <class 'torch.Tensor'> -> torch.Size([165854])
/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 'dst', 't'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning warnings.warn(
t_sp = pp.io.read_csv_temporal_graph('../data/sociopatterns_highschool_2013.tedges', header=False)
print(t_sp)
--------------------------------------------------------------------------- FileNotFoundError Traceback (most recent call last) Cell In[12], line 1 ----> 1 t_sp = pp.io.read_csv_temporal_graph('../data/sociopatterns_highschool_2013.tedges', header=False) 2 print(t_sp) File /workspaces/pathpyG/src/pathpyG/io/pandas.py:399, in read_csv_temporal_graph(filename, sep, header, is_undirected, timestamp_format, time_rescale, **kwargs) 397 df = pd.read_csv(filename, header=0, sep=sep) 398 else: --> 399 df = pd.read_csv(filename, header=None, sep=sep) 400 return df_to_temporal_graph(df, is_undirected=is_undirected, timestamp_fromat=timestamp_format, time_rescale=time_rescale, **kwargs) File /opt/conda/lib/python3.10/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs) 209 else: 210 kwargs[new_arg_name] = new_arg_value --> 211 return func(*args, **kwargs) File /opt/conda/lib/python3.10/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs) 325 if len(args) > num_allow_args: 326 warnings.warn( 327 msg.format(arguments=_format_argument_list(allow_args)), 328 FutureWarning, 329 stacklevel=find_stack_level(), 330 ) --> 331 return func(*args, **kwargs) File /opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options) 935 kwds_defaults = _refine_defaults_read( 936 dialect, 937 delimiter, (...) 946 defaults={"delimiter": ","}, 947 ) 948 kwds.update(kwds_defaults) --> 950 return _read(filepath_or_buffer, kwds) File /opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py:605, in _read(filepath_or_buffer, kwds) 602 _validate_names(kwds.get("names", None)) 604 # Create the parser. --> 605 parser = TextFileReader(filepath_or_buffer, **kwds) 607 if chunksize or iterator: 608 return parser File /opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1442, in TextFileReader.__init__(self, f, engine, **kwds) 1439 self.options["has_index_names"] = kwds["has_index_names"] 1441 self.handles: IOHandles | None = None -> 1442 self._engine = self._make_engine(f, self.engine) File /opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1735, in TextFileReader._make_engine(self, f, engine) 1733 if "b" not in mode: 1734 mode += "b" -> 1735 self.handles = get_handle( 1736 f, 1737 mode, 1738 encoding=self.options.get("encoding", None), 1739 compression=self.options.get("compression", None), 1740 memory_map=self.options.get("memory_map", False), 1741 is_text=is_text, 1742 errors=self.options.get("encoding_errors", "strict"), 1743 storage_options=self.options.get("storage_options", None), 1744 ) 1745 assert self.handles is not None 1746 f = self.handles.handle File /opt/conda/lib/python3.10/site-packages/pandas/io/common.py:856, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options) 851 elif isinstance(handle, str): 852 # Check whether the filename is to be opened in binary mode. 853 # Binary mode does not support 'encoding' and 'newline'. 854 if ioargs.encoding and "b" not in ioargs.mode: 855 # Encoding --> 856 handle = open( 857 handle, 858 ioargs.mode, 859 encoding=ioargs.encoding, 860 errors=errors, 861 newline="", 862 ) 863 else: 864 # Binary mode 865 handle = open(handle, ioargs.mode) FileNotFoundError: [Errno 2] No such file or directory: '../data/sociopatterns_highschool_2013.tedges'
To generate a MultiOderModel
consisting of multiple layers of higher-order De Bruijn graph models, we can use the MultiOderModel.from_temporal_graph
method. We can further specify the maximum order of the highest-order layer, as well as the maximum time difference $\delta$ for time-respecting paths.
For the ants, we consider time-respecting paths with a maximum time difference of 30 seconds up to length four:
m = pp.MultiOrderModel.from_temporal_graph(t_ants, delta=30, max_order=4)
print(m.layers[1])
print(m.layers[2])
print(m.layers[3])
print(m.layers[4])
Directed graph with 89 nodes and 947 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([89, 1]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([947]) Graph attributes num_nodes <class 'int'> Directed graph with 947 nodes and 1780 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([947, 2]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([1780]) Graph attributes num_nodes <class 'int'> Directed graph with 1780 nodes and 2410 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([1780, 3]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([2410]) Graph attributes num_nodes <class 'int'> Directed graph with 2410 nodes and 3292 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([2410, 4]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([3292]) Graph attributes num_nodes <class 'int'>
For the E-Mail communication network, we use time-respecting paths up to length four with a maximum time difference of one hour.
m = pp.MultiOrderModel.from_temporal_graph(t_email, delta=3600, max_order=4)
print(m.layers[1])
print(m.layers[2])
print(m.layers[3])
print(m.layers[4])
Directed graph with 167 nodes and 5784 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([167, 1]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([5784]) Graph attributes num_nodes <class 'int'> Directed graph with 5784 nodes and 25596 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([5784, 2]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([25596]) Graph attributes num_nodes <class 'int'> Directed graph with 25596 nodes and 47326 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([25596, 3]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([47326]) Graph attributes num_nodes <class 'int'> Directed graph with 47326 nodes and 67801 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([47326, 4]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([67801]) Graph attributes num_nodes <class 'int'>
And finally, for the largest data set from the Sociopatterns collaboration, we use a maximum time difference of 15 minutes. As you can see below, we can efficiently generate a 5-th order model despite using a temporal graph with more than 188,000 time-stamped edges and considering all time-respecting paths up to length five with a large maximum time difference. Thanks to the use of GPU-accelerated operations, creating such a model takes less than 12 seconds on an (old) RTX 2090 GPU.
m = pp.MultiOrderModel.from_temporal_graph(t_sp, delta=900, max_order=5)
print(m.layers[1])
print(m.layers[5])
Directed graph with 327 nodes and 5818 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([327, 1]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([5818]) Graph attributes num_nodes <class 'int'> Directed graph with 16307 nodes and 8712 edges Node attributes node_sequence <class 'torch.Tensor'> -> torch.Size([16307, 5]) Edge attributes edge_weight <class 'torch.Tensor'> -> torch.Size([8712]) Graph attributes num_nodes <class 'int'>
How can we use such higher-order graph models for graph learning tasks? We will demonstrate this in the next unit of our tutorial.