PCACMI
Mutual information (MI) is a widely used measure of dependency between variables in
information theory.
PCACMI adapts the path consistency algorithm (PCA) to identify dependent pairs of variables
for reconstructing gene regulatory networks based on conditional mutual information (CMI).
To be specific, let $H(X)$ denote the entropy of a random variable $X$ and $H(X,Y)$ the joint
entropy of random variables $X$ and $Y$. These can be expressed as
$$H(X) = E(-\log f_{X}(X)), \quad H(X, Y) = E(-\log f_{XY}(X, Y)),$$
where $f_{X}(x)$ is the marginal probability density function (PDF) of $X$ and $f_{XY}(x, y)$ is the joint PDF of $X$
and $Y$. With these notations, MI is defined as
$$I(X, Y) = E\left(\log\frac{f_{XY}(X, Y)}{f_{X}(X)f_{Y}(Y)}\right) = H(X) + H(Y) - H(X, Y).$$
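As a concrete illustration (a standard Gaussian fact, added here for reference), if $X$ and $Y$ are jointly Gaussian with correlation $\rho$, then
$$I(X, Y) = -\frac{1}{2}\log(1-\rho^2),$$
which is zero when $X$ and $Y$ are uncorrelated and grows without bound as $|\rho| \to 1$.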
MI measures the overall dependency between two variables, which includes both direct dependency and
indirect dependency mediated by other variables. While MI cannot distinguish direct from indirect dependency,
CMI can measure the direct dependency between two variables by conditioning on other variables. The CMI of $X$ and $Y$
given $Z$ is defined as
$$I(X,Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z).$$
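For example (an illustrative case, not from the original references), consider a chain $X \to Z \to Y$ in which $Y$ depends on $X$ only through $Z$. Then
$$I(X, Y) > 0 \quad \text{but} \quad I(X, Y|Z) = 0,$$
so conditioning on $Z$ correctly reveals that there is no direct edge between $X$ and $Y$.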
To estimate the entropies, a Gaussian kernel density estimator is used. The resulting estimates of MI and CMI are
$$\hat{I}(X,Y) = \frac{1}{2}\log\frac{|C(X)||C(Y)|}{|C(X,Y)|},$$
$$\hat{I}(X,Y|Z) = \frac{1}{2}\log\frac{|C(X,Z)||C(Y,Z)|}{|C(Z)||C(X,Y,Z)|},$$
where $|A|$ is the determinant of a matrix $A$, $C(X)$, $C(Y)$ and $C(Z)$ are the covariance matrices (variances in the scalar case) of $X$, $Y$ and $Z$, respectively,
and $C(X,Z)$, $C(Y,Z)$ and $C(X,Y,Z)$ are the covariance matrices of $(X,Z)$, $(Y,Z)$ and $(X,Y,Z)$, respectively.
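Below is a minimal sketch of these covariance-based estimators in Python, assuming the data are stored as a NumPy array with observations in rows and variables in columns; the function names (`_cov_det`, `mi`, `cmi`) are illustrative and not part of any published PCACMI implementation:

```python
import numpy as np

def _cov_det(data, idx):
    """Determinant of the sample covariance matrix of the columns in idx."""
    c = np.cov(data[:, idx], rowvar=False)   # variables are in columns
    return np.linalg.det(np.atleast_2d(c))   # atleast_2d handles a single column

def mi(data, i, j):
    """Estimate I(X_i, X_j) = 0.5 * log(|C(X_i)||C(X_j)| / |C(X_i, X_j)|)."""
    return 0.5 * np.log(_cov_det(data, [i]) * _cov_det(data, [j])
                        / _cov_det(data, [i, j]))

def cmi(data, i, j, cond):
    """Estimate I(X_i, X_j | Z) for a nonempty conditioning set `cond`."""
    cond = list(cond)
    return 0.5 * np.log(_cov_det(data, [i] + cond) * _cov_det(data, [j] + cond)
                        / (_cov_det(data, cond) * _cov_det(data, [i, j] + cond)))

# Quick check on the X -> Z -> Y chain from above: the MI estimate is
# clearly positive, while the CMI estimate given Z is near zero.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
z = x + 0.5 * rng.normal(size=5000)
y = z + 0.5 * rng.normal(size=5000)
data = np.column_stack([x, y, z])            # columns: X, Y, Z
print(mi(data, 0, 1), cmi(data, 0, 1, [2]))
```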
The PCACMI method starts by setting $L = 0$ and computing the $L$-order CMI, which is equivalent to MI when $L = 0$.
PCACMI then increases the order and removes every pair of variables whose maximal CMI, taken over all sets of $L+1$
adjacent variables, is less than a given threshold $\alpha$. Here $\alpha$ determines whether two variables are declared
independent, and adjacent variables are the variables that remain connected to both target variables in PCACMI from
the previous step. PCACMI repeats these steps until no higher order is possible; a sketch of this pruning loop is given below.
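The following is a minimal sketch of the pruning loop, reusing the `mi` and `cmi` functions from the sketch above; the subset enumeration and stopping rule follow the description in the text and may differ in detail from the reference implementation:

```python
from itertools import combinations

import numpy as np

def pcacmi(data, alpha=0.03):
    """Prune a complete graph by increasing-order CMI tests (sketch)."""
    p = data.shape[1]
    adj = ~np.eye(p, dtype=bool)              # start from the complete graph
    L = 0
    while True:
        tested_any = False
        for i, j in combinations(range(p), 2):
            if not adj[i, j]:
                continue
            # Adjacent variables: still connected to both i and j.
            nbrs = [k for k in range(p)
                    if k not in (i, j) and adj[i, k] and adj[j, k]]
            if L == 0:
                score, tested_any = mi(data, i, j), True
            elif len(nbrs) >= L:
                # Maximal CMI over all size-L conditioning sets.
                score = max(cmi(data, i, j, S) for S in combinations(nbrs, L))
                tested_any = True
            else:
                continue                      # not enough adjacent variables
            if score < alpha:                 # declare i and j independent
                adj[i, j] = adj[j, i] = False
        if not tested_any:                    # no higher order is possible
            return adj
        L += 1
```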
References:
1. Yu, Donghyeon, Johan Lim, Xinlei Wang, Faming Liang, and Guanghua Xiao. "Enhanced construction of gene regulatory networks using hub gene information." BMC Bioinformatics 18.1 (2017): 186.
2. Zhang, Xiujun, Xing-Ming Zhao, Kun He, Le Lu, Yongwei Cao, Jingdong Liu, Jin-Kao Hao, Zhi-Ping Liu, and Luonan Chen. "Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information." Bioinformatics 28.1 (2011): 98-104.
Note:
Change the $\alpha$ value $(\alpha > 0)$ to control the sparsity of the network: the larger the $\alpha$, the
sparser the constructed network. If you are unsure how to choose a value, use the default.
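For instance, the effect of $\alpha$ can be seen with the sketch functions defined earlier (illustrative only; `pcacmi` is the sketch above, not the packaged implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 10))             # toy expression matrix, no true edges
for alpha in (0.01, 0.05, 0.10):
    n_edges = pcacmi(expr, alpha=alpha).sum() // 2
    print(alpha, n_edges)                     # larger alpha prunes more edges
```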