7  Convergence on ERGMs

Let’s continue what we left off in the previous session: Evaluating our ERGMs. Like most models, ERGMs are a mix between art and science. Here is again a list of steps/considerations to have when fitting ERGMs:

  1. Inspect the data.

  2. Start with endogenous effects first.

  3. After structure is controlled.

  4. Evaluate your results: Once you have a model you are happy with, the last couple of steps are (a) assess convergence (which is usually done automagically by the ergm package,) and (b) assess goodness-of-fit, which in this context means how good was our model to capture (not-controlled for) properties of the network.

We are in the last step. Let’s recall what was our final model with the simulated data (code folded intentionally):

Code
## Loading the packages
library(ergm)
library(sna)

## Simulating the covariates (vertex attribute)
set.seed(1235)

## Simulating the data
Y <- network(100, directed = TRUE)
Y %v% "fav_music" <- sample(c("rock", "jazz", "pop"), 100, replace = TRUE)
Y %v% "female"    <- rbinom(100, 1, 0.5) 

## Simulating the ERGM
Y <- ergm::simulate_formula(
  Y ~
    edges +
    nodematch("fav_music") + 
    nodematch("female") +
    mutual,
  coef = c(-4, 1, -1, 2)
  )
gplot(Y, vertex.col = c("green", "red")[Y %v% "female" + 1])

model_final <- ergm(Y ~ edges + mutual + nodematch("fav_music") + nodematch("female"))

summary(model_final)
Call:
ergm(formula = Y ~ edges + mutual + nodematch("fav_music") + 
    nodematch("female"))

Monte Carlo Maximum Likelihood Results:

                    Estimate Std. Error MCMC % z value Pr(>|z|)    
edges                -3.9573     0.1112      0 -35.588   <1e-04 ***
mutual                1.8950     0.2853      0   6.643   <1e-04 ***
nodematch.fav_music   0.8295     0.1291      0   6.426   <1e-04 ***
nodematch.female     -0.7436     0.1379      0  -5.394   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 13724  on 9900  degrees of freedom
 Residual Deviance:  1994  on 9896  degrees of freedom
 
AIC: 2002  BIC: 2031  (Smaller is better. MC Std. Err. = 0.8189)

7.1 Convergence

As our model was fitted using MCMC, we must ensure the chains converged. We can use the mcmc.diagnostics function from the ergm package to check model convergence. This function looks at the last set of simulations of the MCMC model and generates various diagnostics for the user.

Under the hood, the fitting algorithm generates a stream of networks based on those parameters for each new proposed set of model parameters. The last stream of networks is thus simulated using the final state of the model. The mcmc.diagnostics function takes that stream of networks and plots the sequence of the sufficient statistics included in the model. A converged model should show a stationary statistics sequence, moving around a fixed value without (a) becoming stuck at any point and (b) chaining the tendency. This model shows both:

mcmc.diagnostics(model_final, which   = c("plots"))


Note: MCMC diagnostics shown here are from the last round of
  simulation, prior to computation of final parameter estimates.
  Because the final estimates are refinements of those used for this
  simulation run, these diagnostics may understate model performance.
  To directly assess the performance of the final model on in-model
  statistics, please use the GOF command: gof(ergmFitObject,
  GOF=~model).

Now that we know our model was good enough to represent the observed statistics (sample them, actually,) let’s see how good it is at capturing other features of the network that were not included in the model.

7.2 Goodness-of-fit

This would be the last step in the sequence of steps to fit an ERGM. As we mentioned before, the idea of Goodness-of-fit [GOF] in ERG models is to see how well our model captures other properties of the graph that were not included in the model. By default, the gof function from the ergm package computes GOF for:

  1. The model statistics.

  2. The outdegree distribution.

  3. The indegree distribution.

  4. The distribution of edge-wise shared partners.

  5. The distribution of the geodesic distances (shortest path).

The process of evaluating GOF is relatively straightforward. Using samples from the posterior distribution, we check whether the observed statistics from above are covered (fall within the CI) of our model. If they do, we say that the model has a good fit. Otherwise, if we observe significant anomalies, we return to the bench and try to improve our model.

As with all simulated data, our gof() call shows that our selected model was an excellent choice for the observed graph:

gof_final <- gof(model_final)
print(gof_final)
## 
## Goodness-of-fit for in-degree 
## 
##           obs min  mean max MC p-value
## idegree0   11   3 10.93  20       1.00
## idegree1   22  16 24.80  34       0.58
## idegree2   23  17 26.54  36       0.52
## idegree3   30   9 19.41  31       0.02
## idegree4   10   4 10.94  20       0.94
## idegree5    3   1  4.97  11       0.46
## idegree6    1   0  1.68   5       0.96
## idegree7    0   0  0.58   3       1.00
## idegree8    0   0  0.14   1       1.00
## idegree10   0   0  0.01   1       1.00
## 
## Goodness-of-fit for out-degree 
## 
##           obs min  mean max MC p-value
## odegree0   10   3 10.71  18       1.00
## odegree1   30  13 24.67  35       0.28
## odegree2   23  18 27.50  40       0.38
## odegree3   17  12 19.01  28       0.74
## odegree4   10   3 10.72  18       1.00
## odegree5    8   1  4.84  10       0.12
## odegree6    2   0  1.79   6       1.00
## odegree7    0   0  0.57   3       1.00
## odegree8    0   0  0.15   2       1.00
## odegree9    0   0  0.03   1       1.00
## odegree10   0   0  0.01   1       1.00
## 
## Goodness-of-fit for edgewise shared partner 
## 
##          obs min   mean max MC p-value
## esp.OTP0 212 178 209.21 241       0.78
## esp.OTP1   7   3  10.47  22       0.44
## esp.OTP2   0   0   0.40   3       1.00
## 
## Goodness-of-fit for minimum geodesic distance 
## 
##      obs min    mean  max MC p-value
## 1    219 190  220.08  255       0.94
## 2    449 326  459.25  619       0.98
## 3    790 499  842.85 1183       0.90
## 4   1151 708 1252.11 1960       0.80
## 5   1245 775 1396.75 2266       0.66
## 6   1043 656 1181.38 1680       0.44
## 7    745 499  802.34 1211       0.72
## 8    434 194  466.85  791       0.86
## 9    198  57  244.78  569       0.72
## 10    80  11  121.93  408       0.76
## 11    10   1   60.50  285       0.42
## 12     1   0   30.83  251       0.46
## 13     0   0   15.96  186       0.74
## 14     0   0    8.42  133       1.00
## 15     0   0    5.02   96       1.00
## 16     0   0    3.03   65       1.00
## 17     0   0    1.83   60       1.00
## 18     0   0    1.05   47       1.00
## 19     0   0    0.54   23       1.00
## 20     0   0    0.18    7       1.00
## 21     0   0    0.07    3       1.00
## 22     0   0    0.01    1       1.00
## Inf 3535 776 2784.24 4680       0.38
## 
## Goodness-of-fit for model statistics 
## 
##                     obs min   mean max MC p-value
## edges               219 190 220.08 255       0.94
## mutual               16   9  16.36  25       1.00
## nodematch.fav_music 123  97 123.55 152       1.00
## nodematch.female     72  49  71.53  90       0.90

It is easier to see the results using the plot function:

## Plotting the result (5 figures)
op <- par(mfrow = c(3,2))
plot(gof_final)
par(op)