JSM 2023
Toronto, Canada
The University of Utah
2023-08-09
What I highlight in their paper:
Start to finish framework for multi-ERG models.
Dealing with heterogeneous samples.
Model building process.
Goodness-of-fit analyses.
Two important missing pieces (for the next paper): power analysis and how to deal with collinearity in small networks.
Two different questions: “How many nodes?” and “How many networks?”
Is the network bounded?
If it is bounded, can we collect all the nodes?
If we cannot collect all the nodes, can we do inference (Schweinberger, Krivitsky, and Butts 2017; Schweinberger et al. 2020)?
There is a growing number of studies featuring multiple networks (e.g., egocentric studies).
There’s no clear way to do power analysis in ERGMs.
In funding justification, power analysis is fundamental, so we need that.
We can leverage conditional ERG models for power analysis.
Conditioning on one sufficient statistic results in a distribution invariant to the associated parameter, formally:
\[\begin{align}
\notag
{\mbox{Pr}_{\mathcal{Y},\boldsymbol{\theta}}\left(\boldsymbol{Y}= \boldsymbol{y}\left|\;\boldsymbol{g}\left(\boldsymbol{y}\right)_l = s_l\right.\right)}
& = \frac{
{\mbox{Pr}_{\mathcal{Y},\boldsymbol{\theta}}\left(\boldsymbol{g}\left(\boldsymbol{Y}\right)_{-l} = \boldsymbol{g}\left(\boldsymbol{y}\right)_{-l}, \boldsymbol{g}\left(\boldsymbol{y}\right)_l = s_l\right) }
}{
\sum_{\boldsymbol{y}'\in\mathcal{Y}:\boldsymbol{g}\left(\boldsymbol{y}'\right)_l = s_l}{\mbox{Pr}_{\mathcal{Y},\boldsymbol{\theta}}\left(\boldsymbol{g}\left(\boldsymbol{Y}\right) = \boldsymbol{g}\left(\boldsymbol{y}'\right)\right) }
} \\
& =
\frac{
\mbox{exp}\left\{{\boldsymbol{\theta}_{-l}}^{\boldsymbol{t}}\boldsymbol{g}\left(\boldsymbol{y}\right)_{-l}\right\}
}{
\kappa_{\mathcal{Y}}\left(\boldsymbol{\theta}\right)_{-l}
}, \tag{1}
\end{align}\]
where \(\boldsymbol{g}\left(\boldsymbol{y}\right)_l\) and \(\boldsymbol{\theta}_l\) are the \(l\)-th elements of \(\boldsymbol{g}\left(\boldsymbol{y}\right)\) and \(\boldsymbol{\theta}\), respectively, \(\boldsymbol{g}\left(\boldsymbol{y}\right)_{-l}\) and \(\boldsymbol{\theta}_{-l}\) are their complements, and \(\kappa_{\mathcal{Y}}\left(\boldsymbol{\theta}\right)_{-l} = \sum_{\boldsymbol{y}' \in \mathcal{Y}: \boldsymbol{g}\left(\boldsymbol{y}'\right)_l = s_l}\mbox{exp}\left\{{\boldsymbol{\theta}_{-l}}^{\boldsymbol{t}}\boldsymbol{g}\left(\boldsymbol{y}'\right)_{-l}\right\}\) is the normalizing constant.
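For networks this small, the conditional distribution in Eq. (1) can be computed by brute-force enumeration. The sketch below is only an illustration: it conditions on the edge count (26 ties among 8 nodes) and keeps a single homophily term. The node attribute `GROUP` (two groups of four) and all function names are assumptions made for the example, not part of the original analysis.

```python
import math
from itertools import combinations

N_NODES = 8
N_EDGES = 26
# Hypothetical node attribute (not in the slides): two groups of four nodes.
GROUP = [0, 0, 0, 0, 1, 1, 1, 1]

DYADS = list(combinations(range(N_NODES), 2))  # 28 undirected dyads


def homophily(edges):
    """Number of ties joining two nodes of the same group."""
    return sum(GROUP[i] == GROUP[j] for i, j in edges)


def conditional_pmf(theta):
    """Pr(Y = y | edge count = N_EDGES) for every network in the support.

    The edge-count parameter cancels out, so only the homophily
    parameter `theta` enters the distribution.
    """
    support = list(combinations(DYADS, N_EDGES))  # C(28, 26) = 378 networks
    stats = [homophily(y) for y in support]
    shift = max(stats)  # subtract the max before exponentiating (stability)
    weights = [math.exp(theta * (s - shift)) for s in stats]
    kappa = sum(weights)  # normalizing constant, up to the common shift
    return support, stats, [w / kappa for w in weights]


support, stats, pmf = conditional_pmf(theta=2.0)
print(len(support), round(sum(pmf), 6))
```

Because the edge count is fixed, the support has only 378 networks, so no MCMC is needed to sample from or evaluate the distribution.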
We can use this to generate networks with a prescribed edge count (based on previous studies) and compute power through simulation.
Goal: detect an effect size of \(\boldsymbol{\theta}_{\mbox{homophily}} = 2\) using conditional ERGMs (Eq. 1):
For each \(n \in N \equiv \{10, 20, \dots\}\), do:
Simulate: \(1,000\) sets of \(n\) undirected networks of size 8 and 26 ties.
Fit ERGM: Estimate \(\widehat{\boldsymbol{\theta}}_{\mbox{homophily}}\), and generate the indicator variable \(p_{n, i}\) equal to one if the estimate is significant at the 95% level.
Compute empirical power: \(p_n \equiv \frac{1}{1,000}\sum_{i}p_{n, i}\).
Model \(n\) as a function of power: Using \(\{p_{10}, p_{20}, \dots\}\), we can fit the model \(n \sim f(p_n)\).
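The simulation steps above can be sketched as follows. This is a simplified stand-in, not the actual implementation: instead of fitting an ERGM to each simulated set, it uses a score test of \(\boldsymbol{\theta}_{\mbox{homophily}} = 0\), which needs no optimizer and stays well defined when the estimate would sit on the boundary. The node attribute `GROUP`, the grid `(10, 20, 30)`, the seed, and the use of \(\alpha = 0.10\) are assumptions for the example.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist

GROUP = [0, 0, 0, 0, 1, 1, 1, 1]  # hypothetical attribute: two groups of four
N_EDGES, THETA, ALPHA, N_SIMS = 26, 2.0, 0.10, 1000

dyads = list(combinations(range(len(GROUP)), 2))
# Homophily count of every 8-node undirected network with exactly 26 ties.
STATS = [sum(GROUP[i] == GROUP[j] for i, j in y)
         for y in combinations(dyads, N_EDGES)]


def pmf(theta):
    """Conditional ERGM probabilities over the enumerated support."""
    top = max(STATS)
    weights = [math.exp(theta * (s - top)) for s in STATS]
    total = sum(weights)
    return [w / total for w in weights]


# Mean and variance of the homophily count under the null (theta = 0).
MU0 = sum(STATS) / len(STATS)
VAR0 = sum((s - MU0) ** 2 for s in STATS) / len(STATS)
Z_CRIT = NormalDist().inv_cdf(1 - ALPHA / 2)  # two-sided critical value


def empirical_power(n, rng):
    """Share of N_SIMS simulated sets of n networks that reject theta = 0."""
    probs = pmf(THETA)
    rejections = 0
    for _ in range(N_SIMS):
        # Draw one set of n networks; only their statistic is needed.
        sample = rng.choices(STATS, weights=probs, k=n)
        z = (sum(sample) - n * MU0) / math.sqrt(n * VAR0)  # score statistic
        rejections += abs(z) > Z_CRIT
    return rejections / N_SIMS


rng = random.Random(2023)
power = {n: empirical_power(n, rng) for n in (10, 20, 30)}
print(power)
```

Swapping the score test for a fitted ERGM with a Wald or likelihood-ratio test changes only the body of the inner loop; the simulate, test, average structure stays the same.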
Using Krivitsky, Coletti, and Hens (KCH) as a reference for density, we can fix the edge count to \(0.93 \times 8 (8 - 1) / 2 \approx 26\).
Parameter | Value |
---|---|
Network size | 8 |
Edge count | 26 |
\(\boldsymbol{\theta}_{\mbox{homophily}}\) | 2 |
\(\alpha\) | 0.10 |
\(1 - \beta\) | 0.80 |
Finally, the required sample size can be computed with \(f(1-\beta) = f(0.80)\).
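One simple way to read a sample size off the simulated power curve is linear interpolation between grid points, used here in place of a fitted model \(f\). The function name `required_n` is hypothetical, and the power values in `curve` are made-up placeholders that only exercise the function; they are not simulation results.

```python
import math


def required_n(power_curve, target=0.80):
    """Smallest n reaching `target` power, by linear interpolation.

    `power_curve` maps number of networks n -> empirical power p_n.
    """
    points = sorted(power_curve.items())
    if points[0][1] >= target:
        return points[0][0]  # smallest grid value already suffices
    for (n_lo, p_lo), (n_hi, p_hi) in zip(points, points[1:]):
        if p_lo < target <= p_hi:
            frac = (target - p_lo) / (p_hi - p_lo)
            return math.ceil(n_lo + frac * (n_hi - n_lo))
    raise ValueError("target power is not reached on this grid")


# Placeholder values for illustration only (NOT results from the talk).
curve = {10: 0.35, 20: 0.62, 30: 0.81, 40: 0.92}
print(required_n(curve, target=0.80))
```

A smooth model for \(f\) (for example, a logistic curve) would reduce the simulation noise that interpolation passes straight through.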
The variance inflation factor (VIF) is a common measure of collinearity in conventional regression models.
Usually, VIF > 10 is considered problematic.
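As a reminder, in a standard regression setting the VIF of the \(j\)-th predictor is defined from \(R_j^2\), the coefficient of determination obtained by regressing that predictor on all the others:

\[
\mbox{VIF}_j = \frac{1}{1 - R_j^2}, \qquad \mbox{VIF}_j > 10 \iff R_j^2 > 0.9 .
\]

With only two predictors, \(R_j^2\) equals the squared correlation \(\rho^2\) between them.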
VIFs are not straightforward in ERGMs:
Traditional models can feature completely exogenous variables.
ERGMs are endogenous by construction.
It is expected that VIFs will be higher in ERGMs.
Duxbury’s (2021) large simulation study recommends a VIF threshold between 20 and 150 for flagging multicollinearity.
Because small networks are usually denser, collinearity (and thus VIFs) can be more severe.
A directed network with 5 nodes, two of them female and three male.
Two models: (a) Bernoulli (0.50 density) and (b) ERGM(edge count, transitivity) (0.92 density).
When \(\boldsymbol{\theta}_{\mbox{ttriad}} = 0.75\) and \(\boldsymbol{\theta}_{\mbox{edges}} = -2\) (second row), Cor(transitive triads, mutual ties) \(\to 1\), and VIF reaches 140 (mutual ties).
In the same network, many combinations of model parameters yield \(\rho\to 1\) and high VIFs.
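The correlation between sufficient statistics can be explored by simulation. The sketch below covers only the Bernoulli case (directed, 5 nodes, density 0.50) and uses the two-statistic formula \(\mbox{VIF} = 1/(1-\rho^2)\). This is a simplification and not necessarily the procedure used by Duxbury (2021) or in the example above; function names, the seed, and the number of draws are assumptions.

```python
import math
import random
from itertools import combinations, permutations

N = 5  # number of nodes in the directed network


def simulate(rng, density=0.5):
    """Adjacency matrix of a directed Bernoulli graph (no self-ties)."""
    return [[i != j and rng.random() < density for j in range(N)]
            for i in range(N)]


def mutual(y):
    """Number of dyads with ties in both directions."""
    return sum(y[i][j] and y[j][i] for i, j in combinations(range(N), 2))


def ttriads(y):
    """Number of ordered triples (i, j, k) with i->j, j->k, and i->k."""
    return sum(y[i][j] and y[j][k] and y[i][k]
               for i, j, k in permutations(range(N), 3))


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)


rng = random.Random(1)
nets = [simulate(rng) for _ in range(5000)]
rho = pearson([mutual(y) for y in nets], [ttriads(y) for y in nets])
vif = 1.0 / (1.0 - rho ** 2)  # two-statistic VIF
print(round(rho, 3), round(vif, 3))
```

Repeating the exercise with networks drawn from an ERGM with edge-count and transitivity terms would show how the correlation changes as density increases.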
KCH’s networks were very dense (0.93 and 0.73 for the household and egocentric samples, respectively) \(\rightarrow\) collinearity should be severe.
\(\boldsymbol{Y}\sim \mbox{ERGM}(\mbox{edgecount}, \mbox{mutual ties}, \mbox{transitivity})\)
Krivitsky, Coletti, and Hens’ work makes an important contribution to ERG models, most notably on model building, selection, and GOF for multi-network models.
Power (sample size requirements) and multicollinearity are two important issues that are yet to be addressed.
I presented a possible approach to deal with power analysis in ERGMs using conditional distributions.
Collinearity in small networks (like those in KCH) can be serious, more so than in larger networks, but this needs further exploration.
Vega Yon – ggv.cl/slides/jsm2023 – The University of Utah