Causality-Aware Graph Neural Networks
Prerequisites
First, we need a Python environment with PyTorch, PyTorch Geometric, and pathpyG installed. Depending on where you run this notebook, this may already be (partially) done. For example, Google Colab ships with PyTorch, so only the remaining dependencies need to be installed. The DevContainer in our GitHub repository, on the other hand, already includes all necessary dependencies.
In the following, we install the packages for use in Google Colab using Jupyter magic commands. For other environments, comment the commands in or out as necessary. For more details on installing pathpyG, especially with GPU support, see our documentation. Note that %%capture discards the entire output of the cell so that installation details do not clutter this tutorial. To see the output, comment out %%capture.
%%capture
# !pip install torch
# !pip install torch_geometric
# !pip install git+https://github.com/pathpy/pathpyG.git
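Before uncommenting any of the install commands, it can help to check which packages are already present. The following is a minimal sketch using only the standard library; the helper name `installed_version` is ours, and the distribution name `pathpyG` is an assumption about how the package registers itself when installed from the repository.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Names taken from the pip commands above; "pathpyG" is an assumed distribution name.
for name in ("torch", "torch_geometric", "pathpyG"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```

Any package reported as "not installed" is a candidate for uncommenting the corresponding `!pip install` line.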
Motivation and Learning Objectives
In previous tutorials, we have introduced causal paths in temporal graphs, and how we can use them to generate higher-order De Bruijn graph models that capture temporal-topological patterns in time series data. In this tutorial, we will show how we can use De Bruijn Graph Neural Networks, a causality-aware deep learning architecture for temporal graph data. The details of this approach are introduced in this paper. The architecture is implemented in pathpyG and can be readily applied to temporal graph data.
Below, we illustrate this method in a supervised node classification task: given a temporal graph, we use its temporal-topological patterns to classify nodes.
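As a brief reminder of the idea behind causal paths and second-order De Bruijn graphs, the following plain-Python sketch may help. It is an illustration only and not how pathpyG computes these structures: the function name, the toy edges, and the `delta` parameter (the maximum time difference between consecutive edges) are all assumptions made for this example.

```python
from collections import defaultdict


def causal_paths_of_length_two(temporal_edges, delta=1):
    """Find paths u -> v -> w whose second edge follows the first within delta time steps."""
    # Index edges by source node so that possible continuations can be looked up quickly.
    by_source = defaultdict(list)
    for v, w, t in temporal_edges:
        by_source[v].append((w, t))

    paths = []
    for u, v, t1 in temporal_edges:
        for w, t2 in by_source[v]:
            # The continuation must occur strictly later, but no more than delta later.
            if 0 < t2 - t1 <= delta:
                paths.append((u, v, w))
    return paths


# Hypothetical toy data: (source, target, timestamp)
toy_edges = [("a", "b", 1), ("b", "c", 2), ("b", "d", 5), ("c", "a", 3)]

paths = causal_paths_of_length_two(toy_edges, delta=1)
print(paths)  # [('a', 'b', 'c'), ('b', 'c', 'a')]

# Each causal path (u, v, w) becomes an edge (u, v) -> (v, w) between second-order nodes.
second_order_edges = [((u, v), (v, w)) for u, v, w in paths]
print(second_order_edges)
```

Note that the static graph contains the edges a -> b and b -> d, yet a -> b -> d is not a causal path for `delta=1` because the two interactions are four time steps apart. This difference between static and causal connectivity is the kind of pattern a causality-aware model can exploit.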
We start by importing a few modules:
import os
import tempfile
from copy import deepcopy
from urllib import request
import matplotlib.pyplot as plt
import numpy as np
import scipy as sp
import torch
from sklearn.manifold import TSNE
from sklearn.metrics import balanced_accuracy_score
from torch_geometric.transforms import RandomNodeSplit
import pathpyG as pp
from pathpyG.nn.dbgnn import DBGNN
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
Temporal-Topological Clusters in Temporal Graphs
Let us load a small synthetic toy example of a temporal graph with 60,000 time-stamped interactions between 30 nodes. We use the TemporalGraph class to load this example from a file containing edges with discrete time stamps.
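To make the notion of a file of time-stamped edges concrete, here is a small standard-library sketch that parses such data. The column order (source, target, time stamp), the comma separator, and the sample rows are assumptions for illustration; they are not taken from the actual dataset, and the helper `parse_temporal_edges` is hypothetical rather than part of pathpyG.

```python
import csv
import io


def parse_temporal_edges(text):
    """Parse headerless CSV rows of the assumed form source,target,time into tuples."""
    edges = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        source, target, time = row[0], row[1], int(row[2])
        edges.append((source, target, time))
    return edges


# Hypothetical sample content; the real file may differ.
sample = "0,4,0\n4,7,1\n7,0,2\n"

edges = parse_temporal_edges(sample)
print(edges)
print("distinct nodes:", len({node for v, w, _ in edges for node in (v, w)}))
```

Node identifiers are kept as strings here, while the time stamps are converted to integers to reflect the discrete time steps.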
Dataset Availability
Depending on how you are executing this notebook, you may need to download the dataset first. We automatically check if the dataset is available in the relative path ../data/temporal_clusters.tedges, which is the default location if you cloned the pathpyG repository. If the file is not found, we download it from the GitHub repository.
if os.path.exists('../data/temporal_clusters.tedges'):
    print("Loading dataset from local path...")
    t = pp.io.read_csv_temporal_graph('../data/temporal_clusters.tedges', header=False)
else:
    print("Loading dataset from remote URL...")
    with tempfile.TemporaryDirectory() as tmpdir:
        url = "https://raw.githubusercontent.com/pathpy/pathpyG/refs/heads/main/docs/data/temporal_clusters.tedges"
        file_path = os.path.join(tmpdir, 'temporal_clusters.tedges')
        request.urlretrieve(url, file_path)
        t = pp.io.read_csv_temporal_graph(file_path, header=False)
t = t.to(device)
Loading dataset from local path...
This temporal graph has been constructed so that the nodes naturally form three clusters, which are highlighted in the interactive visualization below:
style = {}
style["node_color"] = ["green"] * 10 + ["red"] * 10 + ["blue"] * 10
pp.plot(t, **style, show_labels=False);