Higher-Order Models for Time-Respecting Paths in Temporal Graphs¶

Prerequisites¶

First, we need to set up our Python environment that has PyTorch, PyTorch Geometric and PathpyG installed. Depending on where you are executing this notebook, this might already be (partially) done. E.g. Google Colab has PyTorch installed by default so we only need to install the remaining dependencies. The DevContainer that is part of our GitHub Repository on the other hand already has all of the necessary dependencies installed.

In the following, we install the packages for usage in Google Colab using Jupyter magic commands. For other environments comment in or out the commands as necessary. For more details on how to install pathpyG especially if you want to install it with GPU-support, we refer to our documentation. Note that %%capture discards the full output of the cell to not clutter this tutorial with unnecessary installation details. If you want to print the output, you can comment %%capture out.

In [1]:

Copied!





%%capture
# !pip install torch
!pip install torch_geometric
!pip install git+https://github.com/pathpy/pathpyG.git
%%capture
# !pip install torch
!pip install torch_geometric
!pip install git+https://github.com/pathpy/pathpyG.git

Motivation and Learning Objectives¶

In the previous tutorial, we have seen how we can use higher-order models to model paths in complex networks. In this example, paths were directly given in terms of sequences of nodes traversed by some process (like a random walk). We have further seen that higher-order De Bruijn graph models can be used to capture patterns that influence the causal topology of a complex network, i.e. which nodes can possibly influence each other via paths. The same is true for time-respecting paths in a temporal graph. Due to the fact that time-stamped edges need to occur in the correct temporal ordering (and within a given time interval based on the maximum time difference $\delta$), the causal topology given by time-respecting paths can be very different from what we would expect from the (static) topology of links.

In the following, we will show how we can easiy and efficiently construct higher-order models for time-respecting paths in a temporal graph. To illustrate this, we use the same toy example as before:

In [1]:

Copied!

import pathpyG as pp

pp.config['torch']['device'] = 'cpu'
import pathpyG as pp

pp.config['torch']['device'] = 'cpu'

In [2]:

Copied!





tedges = [('a', 'b', 1),('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4), ('a', 'b', 4), ('c', 'b', 4),
              ('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]
t = pp.TemporalGraph.from_edge_list(tedges)
print(t)
tedges = [('a', 'b', 1),('a', 'b', 2), ('b', 'a', 3), ('b', 'c', 3), ('d', 'c', 4), ('a', 'b', 4), ('c', 'b', 4),
              ('c', 'd', 5), ('b', 'a', 5), ('c', 'b', 6)]
t = pp.TemporalGraph.from_edge_list(tedges)
print(t)

Temporal Graph with 4 nodes, 6 unique edges and 10 events in [1.0, 6.0]

Graph attributes
	src		<class 'torch.Tensor'> -> torch.Size([10])
	dst		<class 'torch.Tensor'> -> torch.Size([10])
	t		<class 'torch.Tensor'> -> torch.Size([10])

/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 'dst', 't'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning
  warnings.warn(

To better understand this temporal graph, we can again create a directed acyclic graph that represents the topology of time-respecting paths:

In [3]:

Copied!

e_i = pp.algorithms.lift_order_temporal(t, delta=1)
dag = pp.Graph.from_edge_index(e_i)
pp.plot(dag, node_label = [f'{v}-{w}-{time}' for v, w, time in t.temporal_edges]);
e_i = pp.algorithms.lift_order_temporal(t, delta=1)
dag = pp.Graph.from_edge_index(e_i)
pp.plot(dag, node_label = [f'{v}-{w}-{time}' for v, w, time in t.temporal_edges]);

100%|██████████| 6/6 [00:00<00:00, 930.41it/s]

For $\delta=1$, we again have the following time-respecting paths:

Length one:
a -> b
b -> a
b -> c
c -> b
c -> d
d -> c
Length two:
a -> b -> a (twice)
b -> a -> b
a -> b -> c
b -> c -> b
c -> b -> a
d -> c -> d
Length three:
a -> b -> a -> b
b -> a -> b -> a
a -> b -> c -> b
b -> c -> b -> a
Length four:
a -> b -> a -> b -> a
a -> b -> c -> b -> a

As you can see, these time-respecting paths are actually very similar to the paths data that we have previously represented using the PathData object. In fact, we could - in theory - first extract all time-respecting paths of all lengths, add them to a PathData object and then use the MultiOderModel class to generate higher-order De Bruijn graph models of all orders. In the example above, since we have paths of length one to four, we could create higher-order models with orders from one to four.

However, this approach would not be efficient for large temporal graphs, as it is computationally expensive to calculate all possible time-respecting paths as well as subpaths of length $k$, especially for larger values of $\delta$. To avoid this bottleneck, pathpyG uses a smarter, GPU-based algorithm to calculate time-respecting paths of length $k$ that are needed for a given order $k$.

For the example above, we can generate all higher-order models up to order four as follows:

In [4]:

Copied!

m = pp.MultiOrderModel.from_temporal_graph(t, delta=1, max_order=4)

print(m.layers[3])
print(m.layers[4])
m = pp.MultiOrderModel.from_temporal_graph(t, delta=1, max_order=4)

print(m.layers[3])
print(m.layers[4])

Directed graph with 6 nodes and 4 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([6, 3])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([4])

Graph attributes
	inverse_idx		<class 'torch.Tensor'> -> torch.Size([7])
	num_nodes		<class 'int'>

Directed graph with 4 nodes and 2 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([4, 4])
	inverse_idx		<class 'torch.Tensor'> -> torch.Size([4])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([2])

Graph attributes
	num_nodes		<class 'int'>

Remember that in a $k$-th order model nodes capture paths of length $k-1$, while edges capture paths of length $k$.

This implies that the first-order model has four nodes and six edges, which simply corresponds to the time-aggregated weighted graph for our example temporal network.

In [5]:

Copied!

print(m.layers[1])
pp.plot(m.layers[1], node_label=[v for v in m.layers[1].nodes]);
print(m.layers[1])
pp.plot(m.layers[1], node_label=[v for v in m.layers[1].nodes]);

Undirected graph with 4 nodes and 6 (directed) edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([4, 1])
	inverse_idx		<class 'torch.Tensor'> -> torch.Size([4])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([6])

Graph attributes
	num_nodes		<class 'int'>

For the second-order model, we have six nodes, which map to the six different edges (each edge trivially being a time-respecting path of length one) of the temporal graph. The six edges in the second-order model represent the six different time-respecting paths of length two (see above). Since the time-respecting path $a \rightarrow b \rightarrow a$ occurs twice at different times, we have one edge with weight two.

In [6]:

Copied!

print(m.layers[2])
print(m.layers[2].data.edge_weight)
pp.plot(m.layers[2], node_label=m.layers[2].mapping.node_ids.tolist());
print(m.layers[2])
print(m.layers[2].data.edge_weight)
pp.plot(m.layers[2], node_label=m.layers[2].mapping.node_ids.tolist());

Directed graph with 6 nodes and 6 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([6, 2])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([6])

Graph attributes
	inverse_idx		<class 'torch.Tensor'> -> torch.Size([10])
	num_nodes		<class 'int'>

tensor([2., 1., 1., 1., 1., 1.])

For the third-oder mode, we have four edges representing the four diffeerent time-respecting paths of length three in the temporal graph above:

In [7]:

Copied!

print(m.layers[3])
pp.plot(m.layers[3], node_label=m.layers[3].mapping.node_ids.tolist());
print(m.layers[3])
pp.plot(m.layers[3], node_label=m.layers[3].mapping.node_ids.tolist());

Directed graph with 6 nodes and 4 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([6, 3])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([4])

Graph attributes
	inverse_idx		<class 'torch.Tensor'> -> torch.Size([7])
	num_nodes		<class 'int'>

And finally, for the model with order $k=4$ we only have two edges, representing the two time-respecting paths $a \rightarrow b \rightarrow a \rightarrow b \rightarrow a$ and $a \rightarrow b \rightarrow c \rightarrow b \rightarrow a$:

In [8]:

Copied!

print(m.layers[4])
pp.plot(m.layers[4], node_label=m.layers[4].mapping.node_ids.tolist());
print(m.layers[4])
pp.plot(m.layers[4], node_label=m.layers[4].mapping.node_ids.tolist());

Directed graph with 4 nodes and 2 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([4, 4])
	inverse_idx		<class 'torch.Tensor'> -> torch.Size([4])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([2])

Graph attributes
	num_nodes		<class 'int'>

Intuitively, since in our example there are no time-respecting paths longer than four, if we were to generate a multi-order model with De Bruijn graphs with orders larger than four, those graphs cannot contain any edges. We see this in the following example. The first-order graph is simply the time-aggregated weighted graph, i.e. the number of nodes is equal to the number of nodes in the temporal graph and the number of edges is equal to the number of different time-stamped edges. In each graph of order $k>1$, the number of nodes corresponds to the number of edges in the graph with order $k-1$, since each of those nodes corresponds to a time-respecting path of length $k-1$, which are represented by edges in a $k-1$-th order gaph. This implies that the graph with order five has two nodes, which are the two time-respecting paths of length four. Those nodes are not connected since there is no time-respecting path with length five.

In [9]:

Copied!

m = pp.MultiOrderModel.from_temporal_graph(t, delta=1, max_order=5)

print(m.layers[4])
print(m.layers[5])
pp.plot(m.layers[5], node_label=m.layers[5].mapping.node_ids.tolist());
m = pp.MultiOrderModel.from_temporal_graph(t, delta=1, max_order=5)

print(m.layers[4])
print(m.layers[5])
pp.plot(m.layers[5], node_label=m.layers[5].mapping.node_ids.tolist());

Directed graph with 4 nodes and 2 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([4, 4])
	inverse_idx		<class 'torch.Tensor'> -> torch.Size([4])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([2])

Graph attributes
	num_nodes		<class 'int'>

Undirected graph with 2 nodes and 0 (directed) edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([2, 5])
	inverse_idx		<class 'torch.Tensor'> -> torch.Size([2])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([0])

Graph attributes
	num_nodes		<class 'int'>

Constructing Higher-Order De Bruijn Graph Models for Empirical Temporal Networks¶

Let us now use pathpyG to construcct higher-order De Bruijn graph models for time-respecting paths in empirical temporal network. For this, we first read a number of temporal graphs using TemporalGraph.from_csv. In the following, we use the following three publicly available data sets:

Antenna interactions between ants in a colony (Blonder and Dornhaus, 2011)
E-Mail exchanges in a manufacturing company (Nurek and Michalski, 2020)
Face-to-face interactions in a highschool (Mastrandrea, Fournet, Barrat, 2015)

In [2]:

Copied!

# pp.config['torch']['device'] = 'cuda'
# pp.config['torch']['device'] = 'cuda'

In [10]:

Copied!

t_ants = pp.io.read_csv_temporal_graph('../data/ants_1_1.tedges', header=False)
print(t_ants)
t_ants = pp.io.read_csv_temporal_graph('../data/ants_1_1.tedges', header=False)
print(t_ants)

Temporal Graph with 89 nodes, 1298 unique edges and 3822 events in [0.0, 1438.0]

Graph attributes
	src		<class 'torch.Tensor'> -> torch.Size([3822])
	dst		<class 'torch.Tensor'> -> torch.Size([3822])
	t		<class 'torch.Tensor'> -> torch.Size([3822])

/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 'dst', 't'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning
  warnings.warn(

In [11]:

Copied!

t_email = pp.io.read_csv_temporal_graph('../data/manufacturing_email.tedges', header=False)
print(t_email)
t_email = pp.io.read_csv_temporal_graph('../data/manufacturing_email.tedges', header=False)
print(t_email)

Temporal Graph with 167 nodes, 6501 unique edges and 165854 events in [1262454016.0, 1285884544.0]

Graph attributes
	src		<class 'torch.Tensor'> -> torch.Size([165854])
	dst		<class 'torch.Tensor'> -> torch.Size([165854])
	t		<class 'torch.Tensor'> -> torch.Size([165854])

/opt/conda/lib/python3.10/site-packages/torch_geometric/data/storage.py:450: UserWarning: Unable to accurately infer 'num_nodes' from the attribute set '{'src', 'dst', 't'}'. Please explicitly set 'num_nodes' as an attribute of 'data' to suppress this warning
  warnings.warn(

In [12]:

Copied!

t_sp = pp.io.read_csv_temporal_graph('../data/sociopatterns_highschool_2013.tedges', header=False)
print(t_sp)
t_sp = pp.io.read_csv_temporal_graph('../data/sociopatterns_highschool_2013.tedges', header=False)
print(t_sp)

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[12], line 1
----> 1 t_sp = pp.io.read_csv_temporal_graph('../data/sociopatterns_highschool_2013.tedges', header=False)
      2 print(t_sp)

File /workspaces/pathpyG/src/pathpyG/io/pandas.py:399, in read_csv_temporal_graph(filename, sep, header, is_undirected, timestamp_format, time_rescale, **kwargs)
    397     df = pd.read_csv(filename, header=0, sep=sep)
    398 else:
--> 399     df = pd.read_csv(filename, header=None, sep=sep)
    400 return df_to_temporal_graph(df, is_undirected=is_undirected, timestamp_fromat=timestamp_format, time_rescale=time_rescale, **kwargs)

File /opt/conda/lib/python3.10/site-packages/pandas/util/_decorators.py:211, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    209     else:
    210         kwargs[new_arg_name] = new_arg_value
--> 211 return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/pandas/util/_decorators.py:331, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    325 if len(args) > num_allow_args:
    326     warnings.warn(
    327         msg.format(arguments=_format_argument_list(allow_args)),
    328         FutureWarning,
    329         stacklevel=find_stack_level(),
    330     )
--> 331 return func(*args, **kwargs)

File /opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py:950, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, error_bad_lines, warn_bad_lines, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
    935 kwds_defaults = _refine_defaults_read(
    936     dialect,
    937     delimiter,
   (...)
    946     defaults={"delimiter": ","},
    947 )
    948 kwds.update(kwds_defaults)
--> 950 return _read(filepath_or_buffer, kwds)

File /opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py:605, in _read(filepath_or_buffer, kwds)
    602 _validate_names(kwds.get("names", None))
    604 # Create the parser.
--> 605 parser = TextFileReader(filepath_or_buffer, **kwds)
    607 if chunksize or iterator:
    608     return parser

File /opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1442, in TextFileReader.__init__(self, f, engine, **kwds)
   1439     self.options["has_index_names"] = kwds["has_index_names"]
   1441 self.handles: IOHandles | None = None
-> 1442 self._engine = self._make_engine(f, self.engine)

File /opt/conda/lib/python3.10/site-packages/pandas/io/parsers/readers.py:1735, in TextFileReader._make_engine(self, f, engine)
   1733     if "b" not in mode:
   1734         mode += "b"
-> 1735 self.handles = get_handle(
   1736     f,
   1737     mode,
   1738     encoding=self.options.get("encoding", None),
   1739     compression=self.options.get("compression", None),
   1740     memory_map=self.options.get("memory_map", False),
   1741     is_text=is_text,
   1742     errors=self.options.get("encoding_errors", "strict"),
   1743     storage_options=self.options.get("storage_options", None),
   1744 )
   1745 assert self.handles is not None
   1746 f = self.handles.handle

File /opt/conda/lib/python3.10/site-packages/pandas/io/common.py:856, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
    851 elif isinstance(handle, str):
    852     # Check whether the filename is to be opened in binary mode.
    853     # Binary mode does not support 'encoding' and 'newline'.
    854     if ioargs.encoding and "b" not in ioargs.mode:
    855         # Encoding
--> 856         handle = open(
    857             handle,
    858             ioargs.mode,
    859             encoding=ioargs.encoding,
    860             errors=errors,
    861             newline="",
    862         )
    863     else:
    864         # Binary mode
    865         handle = open(handle, ioargs.mode)

FileNotFoundError: [Errno 2] No such file or directory: '../data/sociopatterns_highschool_2013.tedges'

To generate a MultiOderModel consisting of multiple layers of higher-order De Bruijn graph models, we can use the MultiOderModel.from_temporal_graph method. We can further specify the maximum order of the highest-order layer, as well as the maximum time difference $\delta$ for time-respecting paths.

For the ants, we consider time-respecting paths with a maximum time difference of 30 seconds up to length four:

In [6]:

Copied!





m = pp.MultiOrderModel.from_temporal_graph(t_ants, delta=30, max_order=4)
print(m.layers[1])
print(m.layers[2])
print(m.layers[3])
print(m.layers[4])
m = pp.MultiOrderModel.from_temporal_graph(t_ants, delta=30, max_order=4)
print(m.layers[1])
print(m.layers[2])
print(m.layers[3])
print(m.layers[4])

Directed graph with 89 nodes and 947 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([89, 1])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([947])

Graph attributes
	num_nodes		<class 'int'>

Directed graph with 947 nodes and 1780 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([947, 2])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([1780])

Graph attributes
	num_nodes		<class 'int'>

Directed graph with 1780 nodes and 2410 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([1780, 3])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([2410])

Graph attributes
	num_nodes		<class 'int'>

Directed graph with 2410 nodes and 3292 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([2410, 4])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([3292])

Graph attributes
	num_nodes		<class 'int'>

For the E-Mail communication network, we use time-respecting paths up to length four with a maximum time difference of one hour.

In [7]:

Copied!





m = pp.MultiOrderModel.from_temporal_graph(t_email, delta=3600, max_order=4)
print(m.layers[1])
print(m.layers[2])
print(m.layers[3])
print(m.layers[4])
m = pp.MultiOrderModel.from_temporal_graph(t_email, delta=3600, max_order=4)
print(m.layers[1])
print(m.layers[2])
print(m.layers[3])
print(m.layers[4])

Directed graph with 167 nodes and 5784 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([167, 1])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([5784])

Graph attributes
	num_nodes		<class 'int'>

Directed graph with 5784 nodes and 25596 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([5784, 2])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([25596])

Graph attributes
	num_nodes		<class 'int'>

Directed graph with 25596 nodes and 47326 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([25596, 3])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([47326])

Graph attributes
	num_nodes		<class 'int'>

Directed graph with 47326 nodes and 67801 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([47326, 4])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([67801])

Graph attributes
	num_nodes		<class 'int'>

And finally, for the largest data set from the Sociopatterns collaboration, we use a maximum time difference of 15 minutes. As you can see below, we can efficiently generate a 5-th order model despite using a temporal graph with more than 188,000 time-stamped edges and considering all time-respecting paths up to length five with a large maximum time difference. Thanks to the use of GPU-accelerated operations, creating such a model takes less than 12 seconds on an (old) RTX 2090 GPU.

In [7]:

Copied!

m = pp.MultiOrderModel.from_temporal_graph(t_sp, delta=900, max_order=5)
print(m.layers[1])
print(m.layers[5])
m = pp.MultiOrderModel.from_temporal_graph(t_sp, delta=900, max_order=5)
print(m.layers[1])
print(m.layers[5])

Directed graph with 327 nodes and 5818 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([327, 1])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([5818])

Graph attributes
	num_nodes		<class 'int'>

Directed graph with 16307 nodes and 8712 edges

Node attributes
	node_sequence		<class 'torch.Tensor'> -> torch.Size([16307, 5])

Edge attributes
	edge_weight		<class 'torch.Tensor'> -> torch.Size([8712])

Graph attributes
	num_nodes		<class 'int'>

How can we use such higher-order graph models for graph learning tasks? We will demonstrate this in the next unit of our tutorial.