Causality-Aware Graph Neural Networks¶

Prerequisites¶

First, we need to set up our Python environment that has PyTorch, PyTorch Geometric and PathpyG installed. Depending on where you are executing this notebook, this might already be (partially) done. E.g. Google Colab has PyTorch installed by default so we only need to install the remaining dependencies. The DevContainer that is part of our GitHub Repository on the other hand already has all of the necessary dependencies installed.

In the following, we install the packages for usage in Google Colab using Jupyter magic commands. For other environments comment in or out the commands as necessary. For more details on how to install pathpyG especially if you want to install it with GPU-support, we refer to our documentation. Note that %%capture discards the full output of the cell to not clutter this tutorial with unnecessary installation details. If you want to print the output, you can comment %%capture out.

In [28]:

Copied!





%%capture
# !pip install torch
# !pip install torch_geometric
# !pip install git+https://github.com/pathpy/pathpyG.git
%%capture
# !pip install torch
# !pip install torch_geometric
# !pip install git+https://github.com/pathpy/pathpyG.git

Motivation and Learning Objectives¶

In previous tutorials, we have introduced causal paths in temporal graphs, and how we can use them to generate higher-order De Bruijn graph models that capture temporal-topological patterns in time series data. In this tutorial, we will show how we can use De Bruijn Graph Neural Networks, a causality-aware deep learning architecture for temporal graph data. The details of this approach are introduced in this paper. The architecture is implemented in pathpyG and can be readily applied to temporal graph data.

Below we illustrate this method in a supervised node classification task, i.e. given a temporal graph we will use the temporal-topological patterns in the graph to classify nodes.

We start by importing a few modules:

In [29]:

Copied!





import os
import tempfile
from copy import deepcopy
from urllib import request

import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import torch
from sklearn.manifold import TSNE
from sklearn.metrics import balanced_accuracy_score
from torch_geometric.transforms import RandomNodeSplit

import pathpyG as pp
from pathpyG.nn.dbgnn import DBGNN

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
import os
import tempfile
from copy import deepcopy
from urllib import request

import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import torch
from sklearn.manifold import TSNE
from sklearn.metrics import balanced_accuracy_score
from torch_geometric.transforms import RandomNodeSplit

import pathpyG as pp
from pathpyG.nn.dbgnn import DBGNN

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Temporal-Topological Clusters in Temporal Graphs¶

Let us load a small synthetic toy example for a temporal graph with 60.000 time-stamped interactions between 30 nodes. We use the TemporalGraph class to load this example from a file containing edges with discrete time-stamps.

Dataset Availability

Depending on how you are executing this notebook, you may need to download the dataset first. We automatically check if the dataset is available in the relative path ../data/temporal_clusters.tedges, which is the default location if you cloned the pathpyG repository. If the file is not found, we download it from the GitHub repository.

In [30]:

Copied!





if os.path.exists('../data/temporal_clusters.tedges'):
    print("Loading dataset from local path...")
    t = pp.io.read_csv_temporal_graph('../data/temporal_clusters.tedges', header=False)
else:
    print("Loading dataset from remote URL...")
    with tempfile.TemporaryDirectory() as tmpdir:
        url = "https://raw.githubusercontent.com/pathpy/pathpyG/refs/heads/main/docs/data/temporal_clusters.tedges"
        file_path = os.path.join(tmpdir, 'temporal_clusters.tedges')
        request.urlretrieve(url, file_path)
        t = pp.io.read_csv_temporal_graph(file_path, header=False)

t = t.to(device)
if os.path.exists('../data/temporal_clusters.tedges'):
    print("Loading dataset from local path...")
    t = pp.io.read_csv_temporal_graph('../data/temporal_clusters.tedges', header=False)
else:
    print("Loading dataset from remote URL...")
    with tempfile.TemporaryDirectory() as tmpdir:
        url = "https://raw.githubusercontent.com/pathpy/pathpyG/refs/heads/main/docs/data/temporal_clusters.tedges"
        file_path = os.path.join(tmpdir, 'temporal_clusters.tedges')
        request.urlretrieve(url, file_path)
        t = pp.io.read_csv_temporal_graph(file_path, header=False)

t = t.to(device)

Loading dataset from local path...

This example has created in such a way that the nodes naturally form three clusters, which are highlighted in the interactive visualization below:

In [31]:

Copied!

style = {}
style["node_color"] = ["green"] * 10 + ["red"] * 10 + ["blue"] * 10
pp.plot(t, **style, show_labels=False);
style = {}
style["node_color"] = ["green"] * 10 + ["red"] * 10 + ["blue"] * 10
pp.plot(t, **style, show_labels=False);

Modelling Causal Structures with Higher-Order De Bruijn Graphs¶

But what is the origin for the cluster pattern? In the visualization above, you will notice that the time-stamped edges randomly interconnect nodes within and across clusters, actually there is no correlation whatsoever between the topology of links and the cluster membership of the nodes. Hence, the notion of clusters does not correspond to the common idea of cluster patterns in static graphs, which we can highlight further by plotting the static time-aggregated network:

In [32]:

Copied!

pp.plot(t.to_static_graph(), **style, show_labels=False);
pp.plot(t.to_static_graph(), **style, show_labels=False);

In fact, the topology of this graph corresponds to that of a random graph, i.e. there are not patterns whatsoever in the topology of links. Nevertheless, the temporal graph contains a cluster pattern in the topology of causal or time-respecting paths. In particular, the temporal ordering of time-stamped edges is such that nodes with the same cluster label are more frequently connected by time-respecting paths than nodes with different cluster labels. Hence, nodes within the same clusters can more strongly influence each other in a causal way, i.e. via multiple interactions that follow the arrow of time.

Traditional (temporal) graph neural networks will not be able to learn from this pattern, as it is due to the specific microscopic temporal ordering of edges. Using higher-order De Bruijn graph models implemented in pathpyG, we can learn from temporal graph data that contains such patterns. Let us explain this step by step.

Referring to the previous tutorial on causal paths in temporal graphs, we first create a node-time directed acyclic graph that captures the causal structure of the temporal graph. In this small example, we will only consider two time-stamped edges $(u,v;t)$ and $(v,w;t')$ to contribute to a causal path iff $0 < t'-t \leq 1$, i.e. we use a delta for the maximum time difference of one time step.

In [33]:

skip-execution

Copied!

%%capture
m = pp.MultiOrderModel.from_temporal_graph(t, max_order=2)
%%capture
m = pp.MultiOrderModel.from_temporal_graph(t, max_order=2)

In [34]:

skip-execution

Copied!

print(m)
print(m)

MultiOrderModel with max. order 2

We can get the first and second order networks from the Multi Order Network object. The first order network is the network of nodes and edges, while the second order network is the network of first order edges as second order nodes and second order edges. The second order network is a De Bruijn graph that captures the temporal-topological patterns in the data.

In [35]:

skip-execution

Copied!

g = m.layers[1]
g2 = m.layers[2]
g = m.layers[1]
g2 = m.layers[2]

In [36]:

skip-execution

Copied!

pp.plot(g, edge_size=1, show_labels=False);
pp.plot(g, edge_size=1, show_labels=False);

Since it does not consider patterns in the causal topology of the temporal graph, this is not a meaningful model. We can instead use a second-order De Bruijn graph model, which we can easily fit to the paths:

In [37]:

skip-execution

Copied!





layout_style = {}
layout_style["layout"] = "Fruchterman-Reingold"
layout_style["seed"] = 1
layout_style["k"] = 0.5
layout_style["iterations"] = 300
layout = pp.layout(g2, **layout_style)
pp.plot(g2, backend="matplotlib", layout=layout, edge_size=0.5, node_size=3, show_labels=False);
layout_style = {}
layout_style["layout"] = "Fruchterman-Reingold"
layout_style["seed"] = 1
layout_style["k"] = 0.5
layout_style["iterations"] = 300
layout = pp.layout(g2, **layout_style)
pp.plot(g2, backend="matplotlib", layout=layout, edge_size=0.5, node_size=3, show_labels=False);

No description has been provided for this image

In this graph, every node is a link and links correspond to causal paths of length two, i.e. temporally ordered sequences consisting of two edges that overlap in the center node. In this graph, we clearly see a cluster pattern that is due to the way in which temporal edges are ordered in time. In particular, we see three clusters, where the edges in three of the clusters correspond to causal paths of length two that connect nodes within each of the three clusters. The edges in the fourth cluster (in the center of the visualization) represent causal paths that connect nodes in different clusters.

Comparison to Temporal Graph with Shuffled Time Stamps¶

You may wonder whether this pattern is really due to the temporal ordering of time-stamped edges. It is easy to check this. We can simply randomly shuffle the time stamps of all edges, which will break any correlations in the temporal ordering that lead to patterns in the causal topology.

We repeat the path calculation for this shuffled temporal graph and construct the second-order De Bruijn Graph model again:

In [38]:

skip-execution

Copied!

t_shuffled = deepcopy(t)
t_shuffled.shuffle_time()
t_shuffled = deepcopy(t)
t_shuffled.shuffle_time()

In [39]:

skip-execution

Copied!

%%capture
g2_shuffled = pp.MultiOrderModel.from_temporal_graph(t_shuffled, max_order=2).layers[2]
%%capture
g2_shuffled = pp.MultiOrderModel.from_temporal_graph(t_shuffled, max_order=2).layers[2]

In [40]:

skip-execution

Copied!

print(g2_shuffled)
print(g2_shuffled)

Directed graph with 557 nodes and 1993 edges
{   'Edge Attributes': {'edge_weight': "<class 'torch.Tensor'> -> torch.Size([1993])"},
    'Graph Attributes': {'inverse_idx': "<class 'torch.Tensor'> -> torch.Size([60000])", 'num_nodes': "<class 'int'>"},
    'Node Attributes': {}}

In [41]:

skip-execution

Copied!

layout = pp.layout(g2_shuffled, **layout_style)
pp.plot(g2_shuffled, backend="matplotlib", layout=layout, edge_size=0.5, node_size=3, show_labels=False);
layout = pp.layout(g2_shuffled, **layout_style)
pp.plot(g2_shuffled, backend="matplotlib", layout=layout, edge_size=0.5, node_size=3, show_labels=False);

We now find that the cluster pattern in the second-order graph has vanished. In fact, there is no pattern whatsoever since the underlying (static) graph topology is random and the random shuffling of time stamps leads to random causal paths.

Spectral clustering with second-order graph Laplacian¶

To take a different perspective on cluster patterns, we can actually use pathpyG to apply a spectral analysis to the higher-order graph. We can simply calculate a generalization of the Laplacian matrix to the second-order graph both for the actual temporal graph and its shuffled counterpart:

In [42]:

skip-execution

Copied!

L = g2.laplacian(normalization='rw', edge_attr='edge_weight')
L_shuffled= g2_shuffled.laplacian(normalization='rw',edge_attr='edge_weight')
L = g2.laplacian(normalization='rw', edge_attr='edge_weight')
L_shuffled= g2_shuffled.laplacian(normalization='rw',edge_attr='edge_weight')

We then calculate the eigenvalues and eigenvectors of the Laplacians, and compute the Fiedler vector, i.e. the eigenvector that corresponds to the second-smallest eigenvalue of the Laplacian.

In [43]:

skip-execution

Copied!

w,v = sp.linalg.eig(L.todense(),left= False, right = True)
w_shuffled, v_shuffled = sp.linalg.eig(L_shuffled.todense())
w,v = sp.linalg.eig(L.todense(),left= False, right = True)
w_shuffled, v_shuffled = sp.linalg.eig(L_shuffled.todense())

In [44]:

skip-execution

Copied!

fiedler = v[:,np.argsort(w)[1]]
fiedler_shuffled = v_shuffled[:,np.argsort(w_shuffled)[1]]
fiedler = v[:,np.argsort(w)[1]]
fiedler_shuffled = v_shuffled[:,np.argsort(w_shuffled)[1]]

Below, we show that the clusters in the causal topology of the temporal graph correspond to clusters in the distribution of entries in the Fiedler vector, while there is no such pattern for the Fiedler vector of the second-order graph constructed from the shuffled temporal graph:

In [45]:

Copied!





def higher_order_class_assignment(ho_node_id):
    """Assign class labels based on the higher-order node ids.
    
    There are three classes that were assigned based on the original node ids:
        - Class 0: node ids 0-9
        - Class 1: node ids 10-19
        - Class 2: node ids 20-29

    The higher-order patterns were constructed such that the clusters are formed by second-order nodes whose first-order node ids belong to the same class.
    We therefore assign higher-order class labels based on the first-order node ids in the higher-order node id tuple.
    """
    if ho_node_id[0] < 10 and ho_node_id[1] < 10:
        return 0
    elif ho_node_id[0] < 20 and ho_node_id[0] >= 10 and ho_node_id[1] < 20 and ho_node_id[1] >= 10:
        return 1
    elif ho_node_id[0] < 30 and ho_node_id[0] >= 20 and ho_node_id[1] < 30 and ho_node_id[1] >= 20:
        return 2
    else:
        return 3
def higher_order_class_assignment(ho_node_id):
    """Assign class labels based on the higher-order node ids.
    
    There are three classes that were assigned based on the original node ids:
        - Class 0: node ids 0-9
        - Class 1: node ids 10-19
        - Class 2: node ids 20-29

    The higher-order patterns were constructed such that the clusters are formed by second-order nodes whose first-order node ids belong to the same class.
    We therefore assign higher-order class labels based on the first-order node ids in the higher-order node id tuple.
    """
    if ho_node_id[0] < 10 and ho_node_id[1] < 10:
        return 0
    elif ho_node_id[0] < 20 and ho_node_id[0] >= 10 and ho_node_id[1] < 20 and ho_node_id[1] >= 10:
        return 1
    elif ho_node_id[0] < 30 and ho_node_id[0] >= 20 and ho_node_id[1] < 30 and ho_node_id[1] >= 20:
        return 2
    else:
        return 3

In [46]:

Copied!

colors = {0: 'green', 1: 'red', 2: 'blue', 3: 'gray'}
opacities = {0: 0.6, 1: 0.6, 2: 0.6, 3: 0.1}
colors = {0: 'green', 1: 'red', 2: 'blue', 3: 'gray'}
opacities = {0: 0.6, 1: 0.6, 2: 0.6, 3: 0.1}

In the plots below, we have colored those entries of the Fiedler vectors that correspond to edges connecting nodes within one of the three clusters shown above. The Fiedler vector shows a clear pattern, which translates to the cluster pattern in the causal topology that we have planted into our synthetic temporal graph.

In [47]:

skip-execution

Copied!





ho_class_ids = list(map(higher_order_class_assignment, g2.nodes))
plt.scatter(
    range(g2.n), np.real(fiedler), c=[colors[i] for i in ho_class_ids], alpha=[opacities[i] for i in ho_class_ids]
)
plt.ylim(-0.25, 0.25)
ho_class_ids = list(map(higher_order_class_assignment, g2.nodes))
plt.scatter(
    range(g2.n), np.real(fiedler), c=[colors[i] for i in ho_class_ids], alpha=[opacities[i] for i in ho_class_ids]
)
plt.ylim(-0.25, 0.25)

Out[47]:

(-0.25, 0.25)

No such pattern exists in the Fiedler vector of the second-order graph corresponding to the shuffled TemporalGraph.

In [48]:

skip-execution

Copied!





shuffled_ho_class_ids = list(map(higher_order_class_assignment, g2_shuffled.nodes))
plt.scatter(
    range(g2_shuffled.n),
    np.real(fiedler_shuffled),
    c=[colors[i] for i in shuffled_ho_class_ids],
    alpha=[opacities[i] for i in shuffled_ho_class_ids],
)
plt.ylim(-0.25, 0.25)
shuffled_ho_class_ids = list(map(higher_order_class_assignment, g2_shuffled.nodes))
plt.scatter(
    range(g2_shuffled.n),
    np.real(fiedler_shuffled),
    c=[colors[i] for i in shuffled_ho_class_ids],
    alpha=[opacities[i] for i in shuffled_ho_class_ids],
)
plt.ylim(-0.25, 0.25)

Out[48]:

(-0.25, 0.25)

Node Classification with Causality-Aware Graph Neural Networks¶

Let us now explore how we can develop a causality-aware deep graph learning architecture that utilizes this pattern in the causal topology. We will follow the architecture introduced in this work. The architecture actually performs message passing in higher-order models with multiple orders at once. In a final message passing step, a bipartite graph is used to obtain vector-space representations of actual nodes in the temporal graph.

We now set up a pytorch_geometric.Data object that contains all of the information needed to train the DBGNN model. For this, we can use a convenience function of the MultiOrderModel class in pathpyG. Combining a first- and a second-order model, this uses the edge indices and the weight tensors for a message passing scheme. it further constructs an edge_index of a bipartite graph that uses the last node in a second-order node to map messages back to first-order nodes.

In [49]:

skip-execution

Copied!

data = m.to_dbgnn_data(max_order=2, mapping="last")
data.y = torch.tensor([int(i) // 10 for i in t.nodes], device=device)
data = m.to_dbgnn_data(max_order=2, mapping="last")
data.y = torch.tensor([int(i) // 10 for i in t.nodes], device=device)

Training the model¶

We are now ready to train and evaluate our causality-aware graph neural network. We will frist create a random split of the nodes, set the optimizer and the hyperparameters of our model.

In [50]:

skip-execution

Copied!





data = RandomNodeSplit(num_val=0, num_test=0.3)(data)

model = DBGNN(num_features=[g.n, g2.n], num_classes=len(data.y.unique()), hidden_dims=[16, 32, 8], p_dropout=0.4).to(
    device
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
loss_function = torch.nn.CrossEntropyLoss()
data = RandomNodeSplit(num_val=0, num_test=0.3)(data)

model = DBGNN(num_features=[g.n, g2.n], num_classes=len(data.y.unique()), hidden_dims=[16, 32, 8], p_dropout=0.4).to(
    device
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.005)
loss_function = torch.nn.CrossEntropyLoss()

The following function evaluates the prediction of our model based on the balanced accuracy score for categorical predictions.

In [51]:

skip-execution

Copied!

def test(model, data):
    """Evaluate the model on training and test data."""
    model.eval()

    _, pred = model(data).max(dim=1)

    metrics_train = balanced_accuracy_score(data.y[data.train_mask].cpu(), pred[data.train_mask].cpu().numpy())

    metrics_test = balanced_accuracy_score(data.y[data.test_mask].cpu(), pred[data.test_mask].cpu().numpy())

    return metrics_train, metrics_test
def test(model, data):
    """Evaluate the model on training and test data."""
    model.eval()

    _, pred = model(data).max(dim=1)

    metrics_train = balanced_accuracy_score(data.y[data.train_mask].cpu(), pred[data.train_mask].cpu().numpy())

    metrics_test = balanced_accuracy_score(data.y[data.test_mask].cpu(), pred[data.test_mask].cpu().numpy())

    return metrics_train, metrics_test

In [52]:

skip-execution

Copied!





losses = []
for epoch in range(50):
    output = model(data)
    loss = loss_function(output[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(loss)

    if epoch % 10 == 0:
        train_ba, test_ba = test(model, data)
        print(f"Epoch: {epoch}, Loss: {loss}, Train balanced accuracy: {train_ba}, Test balanced accuracy: {test_ba}")
losses = []
for epoch in range(50):
    output = model(data)
    loss = loss_function(output[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(loss)

    if epoch % 10 == 0:
        train_ba, test_ba = test(model, data)
        print(f"Epoch: {epoch}, Loss: {loss}, Train balanced accuracy: {train_ba}, Test balanced accuracy: {test_ba}")

Epoch: 0, Loss: 1.8728970289230347, Train balanced accuracy: 0.3333333333333333, Test balanced accuracy: 0.3333333333333333
Epoch: 10, Loss: 0.8794518709182739, Train balanced accuracy: 0.6666666666666666, Test balanced accuracy: 0.6666666666666666
Epoch: 20, Loss: 0.3931295871734619, Train balanced accuracy: 0.9583333333333334, Test balanced accuracy: 1.0
Epoch: 30, Loss: 0.16060538589954376, Train balanced accuracy: 1.0, Test balanced accuracy: 1.0
Epoch: 40, Loss: 0.08556197583675385, Train balanced accuracy: 1.0, Test balanced accuracy: 1.0

Causality-aware latent space representation of nodes¶

We can inspect the model by plotting a latent space representation of the edges generated by the second-order layer of our architecture.

In [53]:

skip-execution

Copied!





model.eval()
latent = model.higher_order_layers[0].forward(data.x_h, data.edge_index_higher_order).detach()
latent = model.higher_order_layers[1].forward(latent, data.edge_index_higher_order).detach()
node_embedding = TSNE(n_components=2, learning_rate="auto", init="random").fit_transform(latent.cpu())

embedding_layout = {v: node_embedding[g2.mapping.to_idx(v)] for v in g2.nodes}
pp.plot(g2, backend="matplotlib", layout=embedding_layout, show_labels=False, edge_size=0.3, node_size=3, edge_opacity=0.1, node_color=[colors[i] for i in ho_class_ids]);
model.eval()
latent = model.higher_order_layers[0].forward(data.x_h, data.edge_index_higher_order).detach()
latent = model.higher_order_layers[1].forward(latent, data.edge_index_higher_order).detach()
node_embedding = TSNE(n_components=2, learning_rate="auto", init="random").fit_transform(latent.cpu())

embedding_layout = {v: node_embedding[g2.mapping.to_idx(v)] for v in g2.nodes}
pp.plot(g2, backend="matplotlib", layout=embedding_layout, show_labels=False, edge_size=0.3, node_size=3, edge_opacity=0.1, node_color=[colors[i] for i in ho_class_ids]);

We can further generate latent space representations of the nodes generated by the last bipartite layer of our architecture:

In [54]:

skip-execution

Copied!





model.eval()
latent = model.forward(data).detach()
node_embedding = TSNE(n_components=2, learning_rate='auto', init='random', perplexity=10).fit_transform(latent.cpu())

embedding_layout = {v: node_embedding[g.mapping.to_idx(v)] for v in g.nodes}
pp.plot(g, backend="matplotlib", layout=embedding_layout, show_labels=False, edge_size=0.3, node_size=3, edge_opacity=0.1, **style);
model.eval()
latent = model.forward(data).detach()
node_embedding = TSNE(n_components=2, learning_rate='auto', init='random', perplexity=10).fit_transform(latent.cpu())

embedding_layout = {v: node_embedding[g.mapping.to_idx(v)] for v in g.nodes}
pp.plot(g, backend="matplotlib", layout=embedding_layout, show_labels=False, edge_size=0.3, node_size=3, edge_opacity=0.1, **style);

In [ ]: