library(ergm)
# Simulating a Bernoulli network
Y <- matrix(rbinom(100^2, 1, 0.1), 100, 100)
diag(Y) <- NA

cbind(
  logit = glm(as.vector(Y) ~ 1, family = binomial) |> coef(),
  ergm  = ergm(Y ~ edges) |> coef(),
  avg   = mean(Y, na.rm = TRUE) |> qlogis()
)
##                 logit      ergm       avg
## (Intercept) -2.210766 -2.210766 -2.210766
Introduction to ERGMs
Have you ever wondered if the friend of your friend is your friend? Or if the people you consider to be your friends feel the same about you? Or if age is related to whether you seek advice from others? All of these questions (and certainly more creative ones) can be answered using Exponential-Family Random Graph Models.
1 Introduction
Exponential-Family Random Graph Models, known as ERGMs, ERG Models, or p^* models (Holland and Leinhardt (1981), Frank and Strauss (1986), Wasserman and Pattison (1996), Snijders et al. (2006), Robins et al. (2007), and many others), are a large family of statistical distributions for random graphs. In ERGMs, the focus is on the processes that give rise to the network rather than on the individual ties.
The most common representation of ERGMs is the following:
P_{\mathcal{Y}, \bm{\theta}}(\bm{Y}=\bm{y}) = \exp\left(\bm{\theta}^{\mathbf{t}} s(\bm{y})\right)\kappa(\bm{\theta})^{-1}
where \bm{Y} is a random graph, \bm{y}\in \mathcal{Y} is a particular realization of \bm{Y}, \bm{\theta} is a vector of parameters, s(\bm{y}) is a vector of sufficient statistics, and \kappa(\bm{\theta}) is a normalizing constant. The normalizing constant is defined as:
\kappa(\bm{\theta}) = \sum_{\bm{y} \in \mathcal{Y}} \exp\left(\bm{\theta}^{\mathbf{t}} s(\bm{y})\right)
From the statistical point of view, the normalizing constant is what makes this model challenging: direct computation is only feasible when \mathcal{Y} is small enough to be enumerated (Vega Yon, Slaughter, and Haye 2021). For that reason, estimating ERGMs is a difficult task.
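To get a sense of why enumeration is hopeless, we can count the number of directed graphs (excluding self-loops) on n nodes, which is 2^{n(n-1)}:

# Number of directed graphs (no self-loops) on n nodes
n <- c(3, 5, 10)
2^(n * (n - 1))
## [1] 6.400000e+01 1.048576e+06 1.237940e+27

Even with only ten nodes, summing over \mathcal{Y} is already out of reach.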
In simpler terms, ERG models resemble Logistic regressions. The term \bm{\theta}^{\mathbf{t}} s(\bm{y}) is the sum of products between the parameters and the statistics (just as you would see in a Logistic regression), and its key component is the composition of the vector of sufficient statistics, s(\bm{y}). The latter is what defines the model.
The vector of sufficient statistics dictates the data-generating process of the network. Like the mean and variance characterize the normal distribution, sufficient statistics (and corresponding parameters) characterize the ERG distribution. Figure 1 shows some examples of sufficient statistics with their corresponding parametrizations.
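For instance, we can count a handful of candidate sufficient statistics on faux.mesa.high, a network bundled with the ergm package (the same network used in the group activity at the end):

# Counting a few sufficient statistics on a bundled network
data(faux.mesa.high, package = "ergm")
summary(faux.mesa.high ~ edges + triangle + nodematch("Grade"))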
The ergm package has many terms we can use to fit these models. You can explore the available terms in the ergm.terms manual entry of the ergm package. Take a moment to explore the manual and see the different terms available.
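If you are working interactively, the following two commands are handy entry points (search.ergmTerms lets you query the term documentation by keyword):

# Opening the terms manual
help("ergm-terms", package = "ergm")

# Searching the available terms by keyword
search.ergmTerms(keyword = "homophily")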
2 Dyadic independence (p1)
The simplest ERG model we can fit is a Bernoulli (Erdős-Rényi) model, in which the only statistic is the edgecount. More generally, whenever we can write the function computing the sufficient statistics as a sum over dyads, i.e., s(\bm{y}) = \sum_{ij}s'(y_{ij}), the model reduces to a Logistic regression.
When this happens, we say that dyads are independent of each other; such models are also called p1 models (Holland and Leinhardt 1981). As a second example, imagine that we are interested in assessing whether gender homophily (the tendency of individuals of the same gender to connect) is present in a network. ERG models are the right tool for this task. Moreover, if we assume that gender homophily is the only mechanism governing the network, the problem reduces to a Logistic regression:
P_{\mathcal{Y}, \bm{\theta}}(y_{ij} = 1) = \text{Logit}^{-1}\left(\theta_{edges} + \theta_{homophily}\mathbb{1}\left(X_i = X_j\right)\right)
where \mathbb{1}\left(\cdot\right) is the indicator function, and X_i equals one if the ith individual is female and zero otherwise. To see this, let's simulate some data using the simulate_formula function in the ergm package. All we need is a network, a model, and some arbitrary coefficients. We start with an indicator vector for gender (1: female, 0: male):
# Simulating data
set.seed(7731)
X <- rbinom(100, 1, 0.5)
head(X)
[1] 0 1 0 0 1 0
Next, we use functions from the ergm and network packages to simulate a network with homophily:
# Simulating the network with homophily
library(ergm)
# A network with 100 vertices
Y <- network(100)

# Adding the vertex attribute with the %v% operator
Y %v% "X" <- X

# Simulating the network
Y <- ergm::simulate_formula(
  # The model
  Y ~ edges + nodematch("X"),
  # The coefficients
  coef = c(-3, 2)
)
In this case, the network tends to be sparse (negative edges coefficient) and to present homophilic ties (positive nodematch coefficient). Using the sna package (also from the statnet suite), we can plot the network:
library(sna)
gplot(Y, vertex.col = c("green", "red")[Y %v% "X" + 1])
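We can also verify that the simulation behaved as expected by counting the observed sufficient statistics; given the positive nodematch coefficient, matched ties should make up a sizable share of the edgecount:

# Observed sufficient statistics of the simulated network
summary(Y ~ edges + nodematch("X"))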
Using the ergm package, we can fit the model with code very similar to what we used to simulate the network:
fit_homophily <- ergm(Y ~ edges + nodematch("X"))
We will check the results later and compare them against the Logit model. The following code chunk implements the logistic regression for this particular model; the example illustrates that ERGMs are a generalization of Logistic regression.
n <- 100
sstats <- summary_formula(Y ~ edges + nodematch("X"))
Y_mat <- as.matrix(Y)
diag(Y_mat) <- NA

# To speed up computations, we pre-compute this value
X_vec <- outer(X, X, "==") |> as.numeric()

# Function to compute the negative loglikelihood
obj <- \(theta) {

  # Compute the probability according to the value of Y
  p <- ifelse(
    Y_mat == 1,
    # If Y = 1
    plogis(theta[1] + X_vec * theta[2]),
    # If Y = 0
    1 - plogis(theta[1] + X_vec * theta[2])
  )

  # The -loglikelihood
  -sum(log(p[!is.na(p)]))
}
And, using the optim function, we can fit the model:
# Fitting the model
fit_homophily_logit <- optim(c(0, 0), obj, hessian = TRUE)
Now that we have the values, let's compare them:
# The coefficients
cbind(
theta_ergm = coef(fit_homophily),
theta_logit = fit_homophily_logit$par,
sd_ergm = vcov(fit_homophily) |> diag() |> sqrt(),
sd_logit = sqrt(diag(solve(fit_homophily_logit$hessian)))
)
theta_ergm theta_logit sd_ergm sd_logit
edges -2.973331 -2.973829 0.06659648 0.06661194
nodematch.X 1.926039 1.926380 0.07395580 0.07397025
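As an extra sanity check, the same estimates can be recovered with a plain call to glm() on the vectorized adjacency matrix (reusing Y_mat and X_vec from the chunk above; glm() drops the NA diagonal automatically):

# The same dyad-independent model as an off-the-shelf Logistic regression
fit_glm <- glm(as.vector(Y_mat) ~ X_vec, family = binomial)
coef(fit_glm)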
As you can see, both models yielded the same estimates because they are the same! Before continuing, let’s review a couple of important results in ERGMs.
3 The most important results
If I had to name the two most important results in ERGMs, they would be the following: (a) conditional on the rest of the graph, the probability of a tie is distributed Logistic (Bernoulli), and (b) the loglikelihood ratio between two parameter values can be approximated through simulation.
3.1 The logistic distribution
Let's start by stating the result: conditional on the rest of the graph (all entries other than y_{ij}), the probability of a tie Y_{ij} is distributed Logistic; formally:
P_{\mathcal{Y}, \bm{\theta}}(Y_{ij}=1 | \bm{y}_{-ij}) = \frac{1}{1 + \exp \left(-\bm{\theta}^{\mathbf{t}}\delta_{ij}(\bm{y})\right)},
where \delta_{ij}(\bm{y})\equiv s_{ij}^+(\bm{y}) - s_{ij}^-(\bm{y}) is the change statistic, and s_{ij}^+(\bm{y}) and s_{ij}^-(\bm{y}) are the statistics of the graph with and without the tie Y_{ij}, respectively.
The importance of this result is two-fold: (a) we can use this equation to interpret fitted models in the context of a single graph (like using odds,) and (b) we can use this equation to simulate from the model, without touching the normalizing constant.
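As a small illustration of the first point: in the homophily model we fitted above, the change statistic of a tie between two same-X nodes is \delta_{ij}(\bm{y}) = (1, 1), while for a mixed dyad it is (1, 0). The implied conditional odds of each kind of tie are then:

# Conditional odds implied by the fitted homophily model
theta <- coef(fit_homophily)
exp(sum(theta * c(1, 1))) # odds of a tie between same-X nodes
exp(sum(theta * c(1, 0))) # odds of a tie between different-X nodes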
3.2 The ratio of loglikelihoods
The second significant result is that the loglikelihood ratio can be approximated through simulation. It is based on the following observation by Geyer and Thompson (1992):
\frac{\kappa(\bm{\theta})}{\kappa(\bm{\theta}_0)} = \mathbb{E}_{\mathcal{Y}, \bm{\theta}_0}\left(\exp\left\{(\bm{\theta} - \bm{\theta}_0)^{\mathbf{t}}s(\bm{y})\right\}\right),
Using the latter, we can approximate the following loglikelihood ratio:
\begin{align*} l(\bm{\theta}) - l(\bm{\theta}_0) = & (\bm{\theta} - \bm{\theta}_0)^{\mathbf{t}}s(\bm{y}) - \log\left[\frac{\kappa(\bm{\theta})}{\kappa(\bm{\theta}_0)}\right]\\ \approx & (\bm{\theta} - \bm{\theta}_0)^{\mathbf{t}}s(\bm{y}) - \log\left[M^{-1}\sum_{m=1}^{M} \exp\left\{(\bm{\theta} - \bm{\theta}_0)^{\mathbf{t}}s(\bm{y}^{(m)})\right\}\right] \end{align*}
Where \bm{\theta}_0 is an arbitrary vector of parameters, and the \bm{y}^{(m)} are sampled from the distribution P_{\mathcal{Y}, \bm{\theta}_0}. In the words of Geyer and Thompson (1992), "[…] it is possible to approximate \bm{\theta} by using simulations from one distribution P_{\mathcal{Y}, \bm{\theta}_0} no matter which \bm{\theta}_0 in the parameter space is." This result has been key for the MC-MLE algorithm (see Hunter, Krivitsky, and Schweinberger (2012)), which has been front and center in the ergm R package.
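To make the idea concrete, here is a minimal sketch of the approximation applied to the homophily model of the previous section, with an arbitrarily chosen reference point theta0. This only illustrates the identity, not how the ergm package organizes its computations internally, and it assumes simulate_formula's output = "stats" option:

# Approximating l(theta) - l(theta0) with draws from P(Y; theta0)
theta0 <- c(-2.5, 1.5) # arbitrary reference point
theta  <- c(-3, 2)

# M networks simulated at theta0, returned as a matrix of statistics
stats_sim <- ergm::simulate_formula(
  Y ~ edges + nodematch("X"),
  coef = theta0, nsim = 1000, output = "stats"
)

# Observed statistics and the approximated loglikelihood ratio
s_obs <- summary(Y ~ edges + nodematch("X"))
sum((theta - theta0) * s_obs) -
  log(mean(exp(stats_sim %*% (theta - theta0))))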
4 Start to finish example
There is no silver bullet for fitting ERGMs. However, the following steps are a good starting point:

1. Inspect the data: Make sure the network you work with is processed correctly. Common mistakes include excluding isolates and misaligning (mismatching) vertex attributes.

2. Start with endogenous effects first: Before jumping into vertex/edge-covariates, try fitting models that only include structural terms such as edgecount, triangles (or their equivalents), stars, reciprocity, etc.

3. After structure is controlled for: Add vertex/edge-covariates. The most common are homophily, nodal activity, etc.

4. Evaluate your results: Once you have a model you are happy with, the last couple of steps are (a) assessing convergence (which is usually done automagically by the ergm package) and (b) assessing goodness-of-fit, which in this context means how well our model captures (not-controlled-for) properties of the network.
Although we could play with an existing dataset, we will instead simulate a directed graph with the following properties:
- 100 nodes.
- Homophily on Music taste.
- Gender heterophily.
- Reciprocity.
The following code chunk illustrates how to do this in R. Notice that, like the previous example, we need to create a network with the vertex attributes needed to simulate homophily.
# Simulating the covariates (vertex attribute)
set.seed(1235)
# Simulating the data
Y <- network(100, directed = TRUE)
Y %v% "fav_music" <- sample(c("rock", "jazz", "pop"), 100, replace = TRUE)
Y %v% "female" <- rbinom(100, 1, 0.5)
Now that we have a starting point for the simulation, we can use the simulate_formula function to get our network:
Y <- ergm::simulate_formula(
  Y ~
    edges +
    nodematch("fav_music") +
    nodematch("female") +
    mutual,
  coef = c(-4, 1, -1, 2)
)
# And visualize it!
gplot(Y, vertex.col = c("green", "red")[Y %v% "female" + 1])
This figure is precisely why we need ERGMs (and why many Sunbelt talks don’t include a network visualization!). We know the graph has structure (it’s not random), but visually, it is hard to see.
4.1 Inspect the data
For the sake of time, we will not investigate our network properly here. However, you should always do so. Compute descriptive statistics (density, centrality, modularity, etc.), check for missing values and isolates (disconnected nodes), and inspect your data both in its raw form ("notepad") and through visualizations before jumping into your ERG model.
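A few quick checks one would typically run at this stage (a sketch using functions from the network and sna packages, both already loaded):

# Quick descriptives of the simulated network
network.size(Y)           # number of vertices
network.density(Y)        # density
length(isolates(Y))       # number of isolates
table(Y %v% "fav_music")  # distribution of the music-taste attribute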
4.2 Start with endogenous effects first
The first step is to check whether we can fit an ERGM at all. We can do so with the Bernoulli graph:
model_1 <- ergm(Y ~ edges)
summary(model_1)
Call:
ergm(formula = Y ~ edges)
Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -3.78885 0.06833 0 -55.45 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 13724 on 9900 degrees of freedom
Residual Deviance: 2102 on 9899 degrees of freedom
AIC: 2104 BIC: 2112 (Smaller is better. MC Std. Err. = 0)
It is rare to see a model in which the edgecount is not significant. The next term we will add is reciprocity (mutual in the ergm package):
model_2 <- ergm(Y ~ edges + mutual)
summary(model_2)
## Call:
## ergm(formula = Y ~ edges + mutual)
##
## Monte Carlo Maximum Likelihood Results:
##
## Estimate Std. Error MCMC % z value Pr(>|z|)
## edges -3.93277 0.07833 0 -50.205 <1e-04 ***
## mutual 2.15152 0.31514 0 6.827 <1e-04 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Null Deviance: 13724 on 9900 degrees of freedom
## Residual Deviance: 2064 on 9898 degrees of freedom
##
## AIC: 2068 BIC: 2083 (Smaller is better. MC Std. Err. = 0.8083)
As expected, reciprocity is significant (we made it like this!). Nevertheless, there is a difference between this model and the previous one: this model was not fitted using MLE. Since the reciprocity term involves more than one tie, the model cannot be reduced to a Logistic regression, so it needs to be estimated using one of the other estimation methods available in the ergm package.
The model gains complexity as we add higher-order terms involving more ties. An infamous example is the number of triangles. Although highly important for the social sciences, including triangle terms in your ERGMs usually results in a degenerate model (one whose MCMC chain jumps between empty and fully connected graphs). One exception is when you deal with small networks. To address this, Snijders et al. (2006) and Hunter (2007) introduced various new terms that significantly reduce the risk of degeneracy. Here, we will illustrate the use of the geometrically weighted dyadic shared partner term (gwdsp), proposed by Prof. David Hunter. The gwdsp term is akin to triadic closure but reduces the chances of degeneracy.
# Fitting two more models (output message suppressed)
model_3 <- ergm(Y ~ edges + mutual + gwdsp(.5, fixed = TRUE))
# model_4 <- ergm(Y ~ edges + triangles) # bad idea
Right after fitting a model, we want to inspect the results. An excellent tool for this is the R package texreg (Leifeld 2013):
library(texreg)
knitreg(list(model_1, model_2, model_3))
|                     | Model 1  | Model 2  | Model 3  |
|---------------------|----------|----------|----------|
| edges               | -3.79*** | -3.93*** | -3.84*** |
|                     | (0.07)   | (0.08)   | (0.21)   |
| mutual              |          | 2.15***  | 2.14***  |
|                     |          | (0.32)   | (0.29)   |
| gwdsp.OTP.fixed.0.5 |          |          | -0.02    |
|                     |          |          | (0.05)   |
| AIC                 | 2104.43  | 2068.22  | 2071.94  |
| BIC                 | 2111.63  | 2082.62  | 2093.54  |
| Log Likelihood      | -1051.22 | -1032.11 | -1032.97 |

***p < 0.001; **p < 0.01; *p < 0.05
So far, model_2 is winning. We will continue with this model.
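Since ergm fits come with a logLik method, the standard information-criterion functions also work on them, which makes quick comparisons easy:

# Information criteria for the three candidates
sapply(list(model_1, model_2, model_3), AIC)
sapply(list(model_1, model_2, model_3), BIC)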
4.3 Let’s add a little bit of structure
Now that we have a model featuring only endogenous terms, we can add vertex/edge-covariates. Starting with "fav_music", there are a couple of different ways to use this node feature:

1. Directly through homophily (assortative mixing): Using the nodematch term, we can control for the propensity of individuals to connect based on shared music taste.

2. Homophily (v2): We could activate the option diff = TRUE in the same term. By doing so, the homophily term is operationalized differently, adding as many terms as there are categories in the vertex attribute.

3. Mixing: We can use the term nodemix to capture individuals' tendency to mix between musical tastes.
<- ergm(Y ~ edges + mutual + nodematch("fav_music"))
model_4 <- ergm(Y ~ edges + mutual + nodematch("fav_music", diff = TRUE))
model_5 <- ergm(Y ~ edges + mutual + nodemix("fav_music")) model_6
Now, let’s inspect what we have so far:
knitreg(list(`2` = model_2, `4` = model_4, `5` = model_5, `6` = model_6))
|                          | 2        | 4        | 5        | 6        |
|--------------------------|----------|----------|----------|----------|
| edges                    | -3.93*** | -4.29*** | -4.29*** | -3.56*** |
|                          | (0.08)   | (0.11)   | (0.11)   | (0.24)   |
| mutual                   | 2.15***  | 1.99***  | 2.00***  | 2.02***  |
|                          | (0.32)   | (0.30)   | (0.30)   | (0.30)   |
| nodematch.fav_music      |          | 0.85***  |          |          |
|                          |          | (0.14)   |          |          |
| nodematch.fav_music.jazz |          |          | 0.74**   |          |
|                          |          |          | (0.25)   |          |
| nodematch.fav_music.pop  |          |          | 0.82***  |          |
|                          |          |          | (0.18)   |          |
| nodematch.fav_music.rock |          |          | 0.87***  |          |
|                          |          |          | (0.17)   |          |
| mix.fav_music.pop.jazz   |          |          |          | -0.54    |
|                          |          |          |          | (0.34)   |
| mix.fav_music.rock.jazz  |          |          |          | -0.75*   |
|                          |          |          |          | (0.38)   |
| mix.fav_music.jazz.pop   |          |          |          | -0.60    |
|                          |          |          |          | (0.35)   |
| mix.fav_music.pop.pop    |          |          |          | 0.10     |
|                          |          |          |          | (0.27)   |
| mix.fav_music.rock.pop   |          |          |          | -0.50    |
|                          |          |          |          | (0.31)   |
| mix.fav_music.jazz.rock  |          |          |          | -1.40**  |
|                          |          |          |          | (0.44)   |
| mix.fav_music.pop.rock   |          |          |          | -0.86*   |
|                          |          |          |          | (0.33)   |
| mix.fav_music.rock.rock  |          |          |          | 0.15     |
|                          |          |          |          | (0.27)   |
| AIC                      | 2068.22  | 2030.85  | 2033.57  | 2036.37  |
| BIC                      | 2082.62  | 2052.45  | 2069.57  | 2108.38  |
| Log Likelihood           | -1032.11 | -1012.42 | -1011.79 | -1008.19 |

***p < 0.001; **p < 0.01; *p < 0.05
Although model 5 has a higher loglikelihood, both AIC and BIC suggest model 4 is the better candidate. For the sake of time, we will jump ahead and add nodematch("female") as the last term of our model. The next step is to assess (a) convergence and (b) goodness-of-fit.
<- ergm(Y ~ edges + mutual + nodematch("fav_music") + nodematch("female"))
model_final
# Printing the pretty table
knitreg(list(`2` = model_2, `4` = model_4, `Final` = model_final))
|                     | 2        | 4        | Final    |
|---------------------|----------|----------|----------|
| edges               | -3.93*** | -4.29*** | -3.95*** |
|                     | (0.08)   | (0.11)   | (0.12)   |
| mutual              | 2.15***  | 1.99***  | 1.86***  |
|                     | (0.32)   | (0.30)   | (0.33)   |
| nodematch.fav_music |          | 0.85***  | 0.81***  |
|                     |          | (0.14)   | (0.14)   |
| nodematch.female    |          |          | -0.74*** |
|                     |          |          | (0.15)   |
| AIC                 | 2068.22  | 2030.85  | 2002.18  |
| BIC                 | 2082.62  | 2052.45  | 2030.98  |
| Log Likelihood      | -1032.11 | -1012.42 | -997.09  |

***p < 0.001; **p < 0.01; *p < 0.05
4.4 Evaluate your results
4.4.1 Convergence
As our model was fitted using MCMC, we must ensure the chains converged. We can use the mcmc.diagnostics function from the ergm package to check model convergence. This function looks at the last set of simulations of the MCMC run and generates various diagnostics for the user.

Under the hood, for each newly proposed set of model parameters, the fitting algorithm generates a stream of networks based on those parameters. The last stream of networks is thus simulated using the final state of the model. The mcmc.diagnostics function takes that stream of networks and plots the sequence of each sufficient statistic included in the model. A converged model should show stationary statistic sequences, moving around a fixed value without (a) getting stuck at any point or (b) changing their tendency (trending). Our model satisfies both criteria:
mcmc.diagnostics(model_final, which = c("plots"))
Note: MCMC diagnostics shown here are from the last round of
simulation, prior to computation of final parameter estimates.
Because the final estimates are refinements of those used for this
simulation run, these diagnostics may understate model performance.
To directly assess the performance of the final model on in-model
statistics, please use the GOF command: gof(ergmFitObject,
GOF=~model).
Now that we know our model was good enough to represent the observed statistics (to sample them, actually), let's see how good it is at capturing other features of the network that were not included in the model.
4.4.2 Goodness-of-fit
This is the last step in the sequence of steps to fit an ERGM. As we mentioned before, the idea of goodness-of-fit [GOF] in ERG models is to see how well our model captures other properties of the graph that were not included in the model. By default, the gof function from the ergm package computes GOF for:
- The model statistics.
- The outdegree distribution.
- The indegree distribution.
- The distribution of edge-wise shared partners.
- The distribution of geodesic distances (shortest paths).
The process of evaluating GOF is relatively straightforward. Using networks simulated from the fitted model, we check whether the observed statistics listed above are covered by (fall within the CIs of) their simulated distributions. If they do, we say the model has a good fit. Otherwise, if we observe significant anomalies, we go back to the bench and try to improve the model.
As with all simulated data, our gof() call shows that our selected model was an excellent choice for the observed graph:
gof_final <- gof(model_final)
print(gof_final)
##
## Goodness-of-fit for in-degree
##
## obs min mean max MC p-value
## idegree0 11 4 11.00 25 1.00
## idegree1 22 16 23.86 33 0.74
## idegree2 23 19 26.43 41 0.56
## idegree3 30 11 20.47 33 0.08
## idegree4 10 5 10.81 19 0.88
## idegree5 3 1 4.66 12 0.70
## idegree6 1 0 1.91 7 0.86
## idegree7 0 0 0.60 4 1.00
## idegree8 0 0 0.18 3 1.00
## idegree9 0 0 0.06 1 1.00
## idegree10 0 0 0.01 1 1.00
## idegree12 0 0 0.01 1 1.00
##
## Goodness-of-fit for out-degree
##
## obs min mean max MC p-value
## odegree0 10 4 10.75 19 0.90
## odegree1 30 12 24.16 33 0.22
## odegree2 23 18 26.76 40 0.52
## odegree3 17 12 19.71 29 0.64
## odegree4 10 5 11.05 19 0.80
## odegree5 8 0 4.89 10 0.22
## odegree6 2 0 1.96 6 1.00
## odegree7 0 0 0.52 2 1.00
## odegree8 0 0 0.15 2 1.00
## odegree9 0 0 0.04 1 1.00
## odegree11 0 0 0.01 1 1.00
##
## Goodness-of-fit for edgewise shared partner
##
## obs min mean max MC p-value
## esp.OTP0 212 180 210.99 240 0.96
## esp.OTP1 7 3 11.27 24 0.40
## esp.OTP2 0 0 0.27 2 1.00
##
## Goodness-of-fit for minimum geodesic distance
##
## obs min mean max MC p-value
## 1 219 184 222.53 264 0.96
## 2 449 320 469.83 626 0.76
## 3 790 507 863.71 1284 0.68
## 4 1151 688 1277.86 2016 0.76
## 5 1245 767 1409.86 2122 0.56
## 6 1043 698 1163.97 1546 0.46
## 7 745 347 771.20 1119 0.86
## 8 434 87 442.32 810 1.00
## 9 198 19 231.86 631 0.88
## 10 80 2 115.88 435 0.92
## 11 10 0 58.27 328 0.50
## 12 1 0 29.23 204 0.54
## 13 0 0 14.39 136 0.80
## 14 0 0 6.85 104 1.00
## 15 0 0 3.26 76 1.00
## 16 0 0 1.51 48 1.00
## 17 0 0 0.72 29 1.00
## 18 0 0 0.34 18 1.00
## 19 0 0 0.16 11 1.00
## 20 0 0 0.08 8 1.00
## 21 0 0 0.05 5 1.00
## 22 0 0 0.02 2 1.00
## 23 0 0 0.01 1 1.00
## Inf 3535 1150 2816.09 5550 0.40
##
## Goodness-of-fit for model statistics
##
## obs min mean max MC p-value
## edges 219 184 222.53 264 0.96
## mutual 16 10 16.88 26 0.88
## nodematch.fav_music 123 90 124.93 165 0.90
## nodematch.female 72 54 71.71 97 1.00
It is easier to see the results using the plot function:
# Plotting the result (5 figures)
op <- par(mfrow = c(3, 2))
plot(gof_final)
par(op)
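Finally, as the note printed by mcmc.diagnostics suggested, we can restrict the GOF computation to the in-model statistics through the GOF formula argument:

# GOF restricted to the model's own statistics
gof(model_final, GOF = ~ model)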
5 Conclusion
We have shown here just a glimpse of what ERG models are. A large, active, collaborative community of social network scientists keeps working on new extensions and advances. If you want to learn more about ERGMs, the vignettes shipped with the ergm package (see the group activity below) are a good place to continue.
6 Group activity
The ergm package comes with a handful of vignettes (extended R examples) that you can use to learn more about the package. The vignette with the same name as the package shows an example of fitting a model to the network faux.mesa.high. Address the following questions:
1. What type of ERG model is this?

2. Looking at the plot in the vignette, what other term(s) could you consider worth adding to the model? Why?

3. The vignette did not include a goodness-of-fit analysis. Can you add one? What do you find?

4. Challenge: without using ergm to fit the model, estimate the following p1 model: ~ edges + nodefactor("Race"). Compare your results with what you get using ergm.