ENA
ENA is an ensemble-based network aggregation approach that combines networks reconstructed by different methods.
The current ENA algorithm integrates NS, GLASSO, GLASSO-SF, PCACMI, SPACE and BayesianGLASSO.
Suppose $G^k (k = 1,...,M)$ is a set of networks constructed by $M$ different methods. The rank $r^k_{ij}$ of
the connection strength of the edge between gene $i$ and gene $j$ is calculated within each individual network $G^k$.
This is done for every edge in every $G^k$, giving the ranks $r^k_{ij}$ $(i, j \in N \ and \ i < j)$
for all $M$ methods. The predicted rank $\tilde{r}_{ij}$ of the edge between gene $i$ and gene $j$
in the aggregated network is then the harmonic mean of the ranks of that edge across all networks $G^k$,
that is, the reciprocal of the mean of the inverse ranks:
$$\tilde{r}_{ij} = \frac{M}{\sum_{k=1}^{M} 1 / r^k_{ij}}$$
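The aggregation step can be sketched in a few lines of Python. This is a minimal illustration, not the ENA implementation: the function names `average_ranks` and `aggregate_ranks` are hypothetical, and the sketch assumes that edges are ranked by absolute connection strength in ascending order (so a larger rank means a stronger edge) and that tied edges share their average rank. Flip the sort if the opposite ranking convention is used.

```python
def average_ranks(values):
    """Rank values in ascending order (smallest gets rank 1).

    Tied values share the mean of the ranks they span (an assumption).
    """
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the tie group while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1  # 1-based average rank of the group
        for t in range(pos, end + 1):
            ranks[order[t]] = mean_rank
        pos = end + 1
    return ranks


def aggregate_ranks(networks):
    """Combine M networks into one aggregated rank per edge.

    networks: list of M dicts mapping an edge (i, j), i < j, to a strength.
    Returns a dict mapping each edge to M / sum_k(1 / r^k_ij).
    """
    edges = sorted(networks[0])
    per_method = []
    for net in networks:
        ranks = average_ranks([abs(net[e]) for e in edges])
        per_method.append(dict(zip(edges, ranks)))
    m = len(networks)
    return {e: m / sum(1.0 / r[e] for r in per_method) for e in edges}


# Toy input: 3 genes and 2 hypothetical methods (made-up strengths).
net_a = {(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.5}
net_b = {(0, 1): 0.4, (0, 2): 0.2, (1, 2): 0.8}
agg = aggregate_ranks([net_a, net_b])
```

In this toy input, edge (0, 1) has rank 3 in the first network and rank 2 in the second, so its aggregated rank is 2 / (1/3 + 1/2) = 2.4.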
To derive the confidence that an edge is a true positive connection, the original dataset is permuted
to obtain a resampled dataset $MD^{p_i}$. The ENA algorithm is then applied to this dataset to get the estimated graph
$G^{p_i}$. This procedure is repeated $m$ times, and the null distribution $G^{null}$ is generated
by pooling the estimated edge strengths from all $m$ permutations. The confidence level $\tilde{p}_{ij}$ is then
derived as the quantile of $\tilde{r}_{ij}$ in $G^{null}$, with Benjamini-Hochberg adjustment to
correct for multiple comparisons.
$$\tilde{p}_{ij} = \mathrm{BH\ adjust}\left(\frac{\#\ \text{of permuted } r \text{ values in } G^{null} \text{ greater than } \tilde{r}_{ij}}{\text{total } \#\ \text{of permuted } r \text{ values in } G^{null}}\right)$$
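The permutation procedure and the adjustment can be sketched as follows. This is an illustration under stated assumptions rather than the ENA code: all function names are hypothetical, the dataset is assumed to be permuted by shuffling each gene (column) independently across observations, and `estimate` stands in for whatever routine returns the aggregated rank of every edge for a dataset. The Benjamini-Hochberg step follows the standard step-up procedure.

```python
import random


def permute_columns(data, rng):
    """Shuffle each gene (column) independently across observations (rows).

    This permutation scheme is an assumption of the sketch.
    """
    columns = [list(col) for col in zip(*data)]
    for col in columns:
        rng.shuffle(col)
    return [list(row) for row in zip(*columns)]


def null_distribution(data, estimate, m, seed=0):
    """Pool the edge values estimated on m permuted datasets.

    estimate: hypothetical callable, dataset -> dict of edge -> value.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(m):
        pooled.extend(estimate(permute_columns(data, rng)).values())
    return pooled


def empirical_p(observed, null_values):
    """Fraction of pooled null values greater than each observed value."""
    total = len(null_values)
    return {
        edge: sum(1 for r in null_values if value < r) / total
        for edge, value in observed.items()
    }


def bh_adjust(pvals):
    """Benjamini-Hochberg adjustment of a dict of edge -> p-value."""
    edges = sorted(pvals, key=pvals.get)  # ascending p-values
    n = len(edges)
    adjusted = {}
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        edge = edges[rank - 1]
        running_min = min(running_min, pvals[edge] * n / rank)
        adjusted[edge] = running_min
    return adjusted


# Toy values (made up) standing in for aggregated ranks and a pooled null.
observed = {(0, 1): 2.4, (0, 2): 1.0, (1, 2): 2.0}
null = [1.0, 1.5, 2.0, 2.5, 3.0, 1.2, 1.8, 2.2, 2.8, 1.1]
p = empirical_p(observed, null)
adj = bh_adjust(p)
```

With these toy values, three of the ten null values exceed 2.4, so edge (0, 1) gets a raw value of 0.3 before adjustment.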
Reference:
1. Rui Zhong, Jeffrey D. Allen, Guanghua Xiao, and Yang Xie. "Ensemble-based network aggregation improves the accuracy of gene network reconstruction." PLoS ONE 9.11 (2014): e106319.
Note:
1. Hub gene input is currently not supported.
2. BayesianGLASSO is time-consuming. The user can choose whether to include it (excluding it does not change the result much).
3. If BayesianGLASSO is selected, the input expression data must have no more than 50 genes (columns)
and no more than 100 observations (rows). Otherwise, the job cannot be submitted.