statistics
Functions to compute various graph statistics.
The functions in this module allow to compute various statistics on graphs
Example
import pathpyG as pp
# Generate a toy example graph.
g = pp.Graph.from_edge_list([("b", "c"), ("a", "b"), ("c", "d"), ("d", "a"), ("b", "d")])
# Calculate degree distribution and raw moments
d_dist = pp.statistics.degree_distribution(g)
k_1 = pp.statistics.degree_raw_moment(g, k=1)
k_2 = pp.statistics.degree_raw_moment(g, k=2)
avg_clustering_coefficient
¶
Calculates the average clustering coefficient \(C\) of the graph \(G=(V, E)\).
Given the local clustering coefficients \(C_u\) for all nodes \(u \in V\), the average clustering coefficient is defined as their mean:
Warning
This measurement of global clustering should not be confused with the global clustering coefficient defined as the fraction of closed paths of length two over all paths of length two in the graph.
Reference
Proposed by Watts and Strogatz in their seminal paper on "Collective dynamics of 'small-world' networks"1. Further details can be found in in Chapter 7.3 in Networks2 by Mark Newman.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
g
|
pathpyG.core.graph.Graph
|
The graph in which to calculate the average clustering coefficient. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
float |
float
|
The average clustering coefficient of the graph. |
Source code in src/pathpyG/statistics/clustering.py
closed_triads
¶
Calculates the set of edges that represent a closed triad around a given node \(v\).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
g
|
pathpyG.core.graph.Graph
|
The graph in which to calculate the list of closed triads. |
required |
v
|
str
|
The node around which to calculate the closed triads. |
required |
Source code in src/pathpyG/statistics/clustering.py
degree_assortativity
¶
Calculate the degree assortativity coefficient of the graph.
The degree assortativity coefficient \(r\) of a graph \(G=(V, E)\) is defined as
with the adjacency matrix \(A\), the degree \(d_i\) of node \(i\), the number of edges \(m\) and the Kronecker delta \(\delta_{ij}\).
For computational reasons, we calculate the coefficient as follows:
where \(S_l = \sum_{i} d_i^l\) for \(l=1,2,3\) and \(S_e = \sum_{(i,j) \in E} d_i d_j\).
Reference
You can find the defintions above with further explanations in Equations (10.27) and (10.28) in Chapter 10.7 in Networks1 by Mark Newman.
-
Newman, M. E. J. Networks. (Oxford University Press, 2018). doi:10.1093/oso/9780198805090.001.0001. ↩
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
graph
|
pathpyG.core.graph.Graph
|
The graph for which to calculate the degree assortativity |
required |
mode
|
str
|
'in', 'out' or 'total' for directed graphs, ignored for undirected graphs |
'total'
|
Returns:
| Name | Type | Description |
|---|---|---|
float |
float
|
The degree assortativity coefficient of the graph |
Source code in src/pathpyG/statistics/degrees.py
degree_central_moment
¶
Calculates the k-th central moment of the degree distribution.
The k-th central moment \(\mu_k\) of the degree distribution \(P_{mode}(d)\) of a graph \(G=(V, E)\) is defined as
where \(\langle d_{mode} \rangle\) is the mean degree of the graph and \(P_{mode}(d)\) is the degree distribution.
Note
The 2nd central moment corresponds to the variance of the degree distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
graph
|
pathpyG.core.graph.Graph
|
The graph for which to calculate the k-th central moment |
required |
k
|
int
|
The order of the moment to calculate |
1
|
mode
|
str
|
'in', 'out' or 'total' for directed graphs, ignored for undirected graphs |
'total'
|
Returns:
| Name | Type | Description |
|---|---|---|
float |
float
|
The k-th central moment of the degree distribution |
Source code in src/pathpyG/statistics/degrees.py
degree_distribution
¶
Calculates the (unweighted) degree distribution of a graph.
The degree distribution \(P_{mode}(d)\) of a graph \(G=(V, E)\) is defined as
with \(N_d = |\{v_i \in V : d_{mode}(v_i) = d\}|\) being the number of nodes with degree \(d\) and \(n\) being the total number of nodes in the graph. The modes are defined as follows:
- 'in': In-degree \(d_{in}(v_i)\) of node \(v_i\), i.e. the number of incoming edges \(|\{(v_j, v_i) \in E\}|\) from any other node \(v_j\)
- 'out': Out-degree \(d_{out}(v_i)\) of node \(v_i\), i.e. the number of outgoing edges \(|\{(v_i, v_j) \in E\}|\) to any other node \(v_j\)
- 'total': Total degree \(d_{total}(v_i)\) of node \(v_i\), i.e. the sum of in-degree and out-degree \(d_{in}(v_i) + d_{out}(v_i)\)
The degree distribution is returned as a torch.Tensor of shape (d_max + 1,) where d_max is the maximum degree in the graph.
Reference
For further reading, see Chapter 10.3 in Networks1 by Mark Newman.
-
Newman, M. E. J. Networks. (Oxford University Press, 2018). doi:10.1093/oso/9780198805090.001.0001. ↩
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
graph
|
pathpyG.core.graph.Graph
|
The |
required |
mode
|
str
|
'in', 'out' or 'total' for directed graphs, ignored for undirected graphs |
'total'
|
Returns:
| Type | Description |
|---|---|
torch.Tensor
|
torch.Tensor: A tensor containing the degree distribution |
Source code in src/pathpyG/statistics/degrees.py
degree_generating_function
¶
Returns the generating function of the degree distribution of a graph.
Returns \(f(x)\) where \(f\) is the probability generating function for the degree distribution \(P_{mode}(d)\) for a graph \(G=(V, E)\) defined as
The function is defined in the interval \(\left[0,1\right]\). The following properties hold:
-
The values of the degree distribution \(P_{mode}(d)\) can be retrieved from the generating function via $$ P_{mode}(d) = \left[\frac{1}{d!} \frac{d^d f}{dx^d}\right]_{x=0} $$
with \(\frac{d^d}{dx^d} f\) being the d-th derivative of \(f\) by \(x\).
-
The \(k\)-th raw moment of the degree distribution can be retrieved from the generating function with
\[ \left[\left(x \frac{d}{dx}\right)^k f\right]_{x=1} = \langle d_{mode}^k \rangle. \]
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
graph
|
pathpyG.core.graph.Graph
|
The graph for which the generating function shall be computed |
required |
x
|
float | list[float] | numpy.ndarray | torch.Tensor
|
float, list, numpy.ndarray, or torch.Tensor The argument(s) for which value(s) \(f(x)\) shall be computed. |
required |
mode
|
str
|
'in', 'out' or 'total' for directed graphs, ignored for undirected graphs |
'total'
|
Returns:
| Type | Description |
|---|---|
float | torch.Tensor
|
float or torch.Tensor: The value(s) of the generating function at x |
Examples:
Generate simple network:
>>> import pathpyG as pp
>>>
>>> g = pp.Graph.from_edge_list(
... [('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd'), ('d', 'e'), ('d', 'f'), ('e', 'f')]
... ).to_undirected()
Return single function value:
Plot generating function of degree distribution
>>> x = list(range(10))
>>> y = pp.statistics.degree_generating_function(g, x)
>>> print(y)
tensor([ 0.0000, 1.0000, 5.3333, 15.0000, 32.0000, 58.3333, 96.0000,
147.0000, 213.3333, 297.0000])
Source code in src/pathpyG/statistics/degrees.py
250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 | |
degree_raw_moment
¶
Calculates the k-th raw moment of the degree distribution of a graph.
The k-th raw moment \(\langle d^k \rangle\) of the degree distribution \(P_{mode}(d)\) of a graph \(G=(V, E)\) is defined as
Reference
For further reading, see Equation 10.20 in Chapter 10.4 in Networks1 by Mark Newman.
-
Newman, M. E. J. Networks. (Oxford University Press, 2018). doi:10.1093/oso/9780198805090.001.0001. ↩
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
graph
|
pathpyG.core.graph.Graph
|
The graph in which to calculate the k-th raw moment |
required |
k
|
int
|
The order of the moment to calculate |
1
|
mode
|
str
|
'in', 'out' or 'total' for directed graphs, ignored for undirected graphs |
'total'
|
Returns:
| Name | Type | Description |
|---|---|---|
float |
float
|
The k-th raw moment of the degree distribution |
Source code in src/pathpyG/statistics/degrees.py
degree_sequence
¶
Calculates the (unweighted) degree sequence of a graph.
Returns the degree sequence of a graph \(G=(V, E)\) defined as
with \(d_{mode}(v_i)\) being the degree of node \(v_i\) in mode 'in', 'out' or 'total'. The modes are defined as follows:
- 'in': In-degree \(d_{in}(v_i)\) of node \(v_i\), i.e. the number of incoming edges \(|\{(v_j, v_i) \in E\}|\) from any other node \(v_j\)
- 'out': Out-degree \(d_{out}(v_i)\) of node \(v_i\), i.e. the number of outgoing edges \(|\{(v_i, v_j) \in E\}|\) to any other node \(v_j\)
- 'total': Total degree \(d_{total}(v_i)\) of node \(v_i\), i.e. the sum of in-degree and out-degree \(d_{in}(v_i) + d_{out}(v_i)\)
The degree sequence is returned as a torch.Tensor of shape (n,) where \(n\) is the number of nodes in the graph.
The order of the degree sequence corresponds to the indexing of the nodes in the graph and
the index of a node \(v_i\) given by its label can be accessed via graph.mapping.to_idx(v_i).
Reference
For further reading, see Chapter 10.3 in Networks1 by Mark Newman.
-
Newman, M. E. J. Networks. (Oxford University Press, 2018). doi:10.1093/oso/9780198805090.001.0001. ↩
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
graph
|
pathpyG.core.graph.Graph
|
The |
required |
mode
|
str
|
'in', 'out' or 'total' for directed graphs, ignored for undirected graphs |
'total'
|
Returns:
| Type | Description |
|---|---|
torch.Tensor
|
torch.Tensor: A tensor containing the degree sequence |
Source code in src/pathpyG/statistics/degrees.py
local_clustering_coefficient
¶
Calculates the local clustering coefficient \(C_u\) for a given node \(u\) from a graph \(G=(V, E)\).
The local clustering coefficient is defined as the fraction of closed triads around node \(u\) over the number of possible triads.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
g
|
pathpyG.core.graph.Graph
|
The graph in which to calculate the clustering coefficient. |
required |
u
|
str
|
The node for which to calculate the clustering coefficient. |
required |
Returns:
| Name | Type | Description |
|---|---|---|
float |
float
|
The local clustering coefficient of node u. |
Source code in src/pathpyG/statistics/clustering.py
mean_degree
¶
Calculates the mean degree of a graph.
The mean degree \(\langle d \rangle\) of a graph \(G=(V, E)\) is defined as
with \(d_{mode}(v_i)\) being the degree of node \(v_i\) in mode 'in', 'out' or 'total'. The modes are defined as follows:
- 'in': In-degree \(d_{in}(v_i)\) of node \(v_i\), i.e. the number of incoming edges \(|\{(v_j, v_i) \in E\}|\) from any other node \(v_j\)
- 'out': Out-degree \(d_{out}(v_i)\) of node \(v_i\), i.e. the number of outgoing edges \(|\{(v_i, v_j) \in E\}|\) to any other node \(v_j\)
- 'total': Total degree \(d_{total}(v_i)\) of node \(v_i\), i.e. the sum of in-degree and out-degree \(d_{in}(v_i) + d_{out}(v_i)\)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
graph
|
pathpyG.core.graph.Graph
|
The graph for which to calculate the mean degree |
required |
mode
|
str
|
'in', 'out' or 'total' for directed graphs, ignored for undirected graphs |
'total'
|
Returns:
| Name | Type | Description |
|---|---|---|
float |
float
|
The mean degree of the graph |
Source code in src/pathpyG/statistics/degrees.py
mean_neighbor_degree
¶
Calculates the mean neighbor degree of a graph.
The mean neighbor degree \(\langle d_{\mathcal{N}} \rangle\) of a graph \(G=(V, E)\) is defined as
with the number of edges \(m\) (1), the set of neighbors \(\mathcal{N}(v_i)\) of node \(v_i\) and \(d_{mode}(v_j)\) being the degree of neighbor node \(v_j \in \mathcal{N}(v_i)\) in mode 'in', 'out' or 'total'.
- This definition is for directed graphs. For undirected graphs, the denominator is \(2m\).
The modes are defined as follows:
- 'in': In-degree \(d_{in}(v_i)\) of node \(v_i\), i.e. the number of incoming edges \(|\{(v_j, v_i) \in E\}|\) from any other node \(v_j\)
- 'out': Out-degree \(d_{out}(v_i)\) of node \(v_i\), i.e. the number of outgoing edges \(|\{(v_i, v_j) \in E\}|\) to any other node \(v_j\)
- 'total': Total degree \(d_{total}(v_i)\) of node \(v_i\), i.e. the sum of in-degree and out-degree \(d_{in}(v_i) + d_{out}(v_i)\)
Reference
For further reading, see Chapter 12.2 in Networks1 by Mark Newman.
-
Newman, M. E. J. Networks. (Oxford University Press, 2018). doi:10.1093/oso/9780198805090.001.0001. ↩
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
graph
|
pathpyG.core.graph.Graph
|
The graph for which to calculate the mean neighbor degree |
required |
mode
|
str
|
'in', 'out' or 'total' for directed graphs, ignored for undirected graphs |
'total'
|
exclude_backlink
|
bool
|
Whether to exclude the backlink to the original node when calculating neighbor degrees |
False
|
Returns:
| Name | Type | Description |
|---|---|---|
float |
float
|
The mean neighbor degree of the graph |