Neighborhood Selection
Neighborhood selection (NS) solves a separate lasso problem for each node, with tuning parameter $\lambda_i(\alpha)$, and identifies edges from the nonzero estimated regression coefficients.
The NS method is asymptotically consistent in identifying the neighborhood of each node when the neighborhood stability condition is satisfied.
Specifically, for each node $i \in V = \{1,2,...,p\}$, NS solves the following lasso problem
$$\hat{\beta}^{i,\lambda} = \underset{\beta \in \mathbb{R}^p: \beta_i = 0}{\operatorname{argmin}} \frac{1}{2}\left \|X_i - X\beta\right \|^2_2 + \lambda\left \|\beta\right \|_1,$$
where $X_i$ is the $i$th column of the data matrix $X$ (one column per node) and, for a vector $x = (x_1, \ldots, x_m)$, $\left \| x \right \|_{2}^{2} = \sum_{k=1}^{m}x_{k}^{2}$ and $\left \| x \right \|_{1} = \sum_{k=1}^{m}\left | x_{k} \right |$.
With the estimate $\hat{\beta}^{i,\lambda}$, NS identifies the neighborhood of node $i$ as
$N_i(\lambda) = \{ k \mid \hat{\beta}_{k}^{i,\lambda} \neq 0 \}$,
which defines the edge set
$E_{i}^{\lambda} = \left \{ \left ( i, j\right ) \mid j \in N_{i}\left ( \lambda\right )\right \}$.
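The node-wise step above can be sketched in a few lines of Python. This is only an illustrative coordinate-descent solver for the stated objective, not the CDLasso implementation; the function names (`soft_threshold`, `lasso_for_node`, `neighborhood`), the toy data matrix and the value `lam=1.0` are all assumptions made for the example.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_for_node(X, i, lam, max_sweeps=500, tol=1e-12):
    """Coordinate descent for 0.5 * ||X_i - X beta||_2^2 + lam * ||beta||_1
    subject to beta_i = 0. X is a list of rows (samples), one column per node."""
    n, p = len(X), len(X[0])
    cols = [[X[r][k] for r in range(n)] for k in range(p)]
    sq_norm = [sum(v * v for v in col) for col in cols]
    beta = [0.0] * p
    resid = list(cols[i])  # residual X_i - X beta, starting from beta = 0
    for _ in range(max_sweeps):
        max_change = 0.0
        for k in range(p):
            if k == i or sq_norm[k] == 0.0:
                continue  # beta_i is fixed at zero; skip all-zero columns
            # Inner product of column k with the residual that excludes k's own term.
            rho = sum(c * r for c, r in zip(cols[k], resid)) + sq_norm[k] * beta[k]
            new = soft_threshold(rho, lam) / sq_norm[k]
            delta = new - beta[k]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, cols[k])]
                beta[k] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def neighborhood(beta):
    """N_i(lambda): indices with nonzero estimated coefficient."""
    return {k for k, b in enumerate(beta) if b != 0.0}


# Toy data (hypothetical): column 1 nearly duplicates column 0, column 2 is unrelated.
X = [
    [1.0, 1.1, 1.0],
    [-1.0, -0.9, 1.0],
    [2.0, 2.1, -1.0],
    [-2.0, -2.1, -1.0],
    [3.0, 2.9, 0.0],
    [-3.0, -3.0, 0.0],
]
i = 0
beta = lasso_for_node(X, i, lam=1.0)
N_i = neighborhood(beta)
E_i = {(i, j) for j in N_i}
print(N_i, E_i)  # prints {1} {(0, 1)}
```

In this toy case only node 1 enters the neighborhood of node 0. Raising `lam` far enough (for example to 30.0) shrinks every coefficient to zero and leaves the neighborhood empty.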
The tuning parameter $\lambda_i(\alpha)$ for the $i$th node is chosen as
$$\lambda_i(\alpha) = \left \| X_{i} \right \|_{2}\tilde{\Phi}^{-1}\left(\frac{\alpha}{2p^{2}}\right),$$
where $\tilde{\Phi} = 1 - \Phi$ and $\Phi$ is the distribution function of the standard normal distribution.
With this choice of $\lambda_i(\alpha)$ for $i=1,2,...,p$, the probability of falsely identifying edges in the network is bounded by the level $\alpha$.
We implement NS with the R package CDLasso provided by the authors.
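As a rough illustration of the tuning-parameter formula, the following Python sketch evaluates $\lambda_i(\alpha)$ with the standard library only. It is not taken from CDLasso. The function name `ns_lambda`, the example vector and the restriction of $\alpha$ to $(0, 1)$ are assumptions of this sketch. The upper-tail quantile $\tilde{\Phi}^{-1}(t)$ is computed as $-\Phi^{-1}(t)$, which follows from the symmetry of the normal distribution.

```python
import math
from statistics import NormalDist


def ns_lambda(x_i, alpha, p):
    """lambda_i(alpha) = ||X_i||_2 * Phi_tilde^{-1}(alpha / (2 p^2)),
    where Phi_tilde = 1 - Phi is the standard normal upper-tail function."""
    # Assumption of this sketch: alpha is treated as a probability level in (0, 1).
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    tail = alpha / (2.0 * p * p)
    # Phi_tilde^{-1}(t) = Phi^{-1}(1 - t) = -Phi^{-1}(t) by symmetry.
    z = -NormalDist().inv_cdf(tail)
    norm = math.sqrt(sum(v * v for v in x_i))
    return norm * z


# Hypothetical example: a column with Euclidean norm 5 in a network of p = 10 nodes.
lam = ns_lambda([3.0, 4.0], alpha=0.05, p=10)
print(lam)
```

Using `-inv_cdf(tail)` rather than `inv_cdf(1 - tail)` avoids losing precision when `tail` is very small, which is typical for large $p$.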
References:
1. Yu, Donghyeon, Johan Lim, Xinlei Wang, Faming Liang, and Guanghua Xiao. "Enhanced construction of gene regulatory networks using hub gene information." BMC Bioinformatics 18, no. 1 (2017): 186.
2. Meinshausen, Nicolai, and Peter Bühlmann. "High-dimensional graphs and variable selection with the lasso." The Annals of Statistics 34, no. 3 (2006): 1436-1462.
Note:
Adjust the level $\alpha$ ($\alpha > 0$) to control the false positive rate.
A larger $\alpha$ lowers the tuning parameter $\lambda_i(\alpha)$ and therefore yields more estimated edges, but with lower confidence. If you are unsure which value to choose, use the default.
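To see how $\alpha$ acts on the penalty, the sketch below tabulates the multiplier $\lambda_i(\alpha) / \| X_i \|_2 = \tilde{\Phi}^{-1}(\alpha / (2p^2))$ for a few levels. The network size `p = 100` and the list of $\alpha$ values are hypothetical choices for illustration, and the helper name `upper_quantile` is not part of any package.

```python
from statistics import NormalDist


def upper_quantile(alpha, p):
    """Phi_tilde^{-1}(alpha / (2 p^2)), the standard normal upper-tail quantile."""
    return -NormalDist().inv_cdf(alpha / (2.0 * p * p))


p = 100  # hypothetical number of nodes
alphas = [0.001, 0.01, 0.05, 0.1, 0.5]
multipliers = [upper_quantile(a, p) for a in alphas]

for a, z in zip(alphas, multipliers):
    print(f"alpha = {a:<6} lambda_i / ||X_i||_2 = {z:.3f}")
```

The multiplier decreases as $\alpha$ grows, so a larger $\alpha$ penalizes each node-wise regression less and tends to leave more nonzero coefficients.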