PCACMI
Mutual information (MI) is a widely used measure of dependency between variables in
information theory.
PCACMI adapts the path consistency algorithm (PCA) to identify dependent pairs of variables
for reconstructing gene regulatory networks based on conditional mutual information (CMI).
To be specific, let $H(X)$ denote the entropy of a random variable $X$ and $H(X,Y)$ the joint
entropy of random variables $X$ and $Y$. These can be expressed as
$$H(X) = E(-\log f_{X}(X)), \quad H(X, Y) = E(-\log f_{XY}(X, Y)),$$
where $f_{X}(x)$ is the marginal probability density function (PDF) of $X$ and $f_{XY}(x, y)$ is the joint PDF of $X$
and $Y$. With these notations, MI is defined as
$$I(X, Y) = E\left(\log\frac{f_{XY}(X, Y)}{f_{X}(X)f_{Y}(Y)}\right) = H(X) + H(Y) - H(X, Y).$$
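As a concrete illustration (a standard Gaussian fact, added here for reference), if $X$ and $Y$ are jointly Gaussian with correlation $\rho$, then
$$I(X, Y) = -\frac{1}{2}\log(1-\rho^2),$$
which is zero when $X$ and $Y$ are uncorrelated and grows without bound as $|\rho| \to 1$.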
MI measures the overall dependency between two variables, which includes both direct dependency and
indirect dependency mediated by other variables. While MI cannot distinguish direct from indirect dependency,
CMI can measure the direct dependency between two variables by conditioning on other variables. The CMI of $X$ and $Y$
given $Z$ is defined as
$$I(X,Y|Z) = H(X,Z) + H(Y,Z) - H(Z) - H(X,Y,Z).$$
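For example (an illustrative case, not from the original references), consider a chain $X \to Z \to Y$ in which $Y$ depends on $X$ only through $Z$. Then
$$I(X, Y) > 0 \quad \text{but} \quad I(X, Y|Z) = 0,$$
so conditioning on $Z$ correctly reveals that there is no direct edge between $X$ and $Y$.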
To estimate the entropies, a Gaussian kernel density estimator is used. The resulting estimates of MI and CMI are
$$\hat{I}(X,Y) = \frac{1}{2}\log\frac{|C(X)||C(Y)|}{|C(X,Y)|},$$
$$\hat{I}(X,Y|Z) = \frac{1}{2}\log\frac{|C(X,Z)||C(Y,Z)|}{|C(Z)||C(X,Y,Z)|},$$
where $|A|$ is the determinant of a matrix $A$, $C(X)$, $C(Y)$ and $C(Z)$ are the covariance matrices (variances in the scalar case) of $X$, $Y$ and $Z$, respectively,
and $C(X,Z)$, $C(Y,Z)$ and $C(X,Y,Z)$ are the covariance matrices of $(X,Z)$, $(Y,Z)$ and $(X,Y,Z)$, respectively.
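Below is a minimal sketch of these covariance-based estimators in Python, assuming the data are stored as a NumPy array with observations in rows and variables in columns; the function names (`_cov_det`, `mi`, `cmi`) are illustrative and not part of any published PCACMI implementation:

```python
import numpy as np

def _cov_det(data, idx):
    """Determinant of the sample covariance matrix of the columns in idx."""
    c = np.cov(data[:, idx], rowvar=False)   # variables are in columns
    return np.linalg.det(np.atleast_2d(c))   # atleast_2d handles a single column

def mi(data, i, j):
    """Estimate I(X_i, X_j) = 0.5 * log(|C(X_i)||C(X_j)| / |C(X_i, X_j)|)."""
    return 0.5 * np.log(_cov_det(data, [i]) * _cov_det(data, [j])
                        / _cov_det(data, [i, j]))

def cmi(data, i, j, cond):
    """Estimate I(X_i, X_j | Z) for a nonempty conditioning set `cond`."""
    cond = list(cond)
    return 0.5 * np.log(_cov_det(data, [i] + cond) * _cov_det(data, [j] + cond)
                        / (_cov_det(data, cond) * _cov_det(data, [i, j] + cond)))

# Quick check on the X -> Z -> Y chain from above: the MI estimate is
# clearly positive, while the CMI estimate given Z is near zero.
rng = np.random.default_rng(0)
x = rng.normal(size=5000)
z = x + 0.5 * rng.normal(size=5000)
y = z + 0.5 * rng.normal(size=5000)
data = np.column_stack([x, y, z])            # columns: X, Y, Z
print(mi(data, 0, 1), cmi(data, 0, 1, [2]))
```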
The PCACMI method starts by setting $L = 0$ and computing the $L$-order CMI, which is equivalent to MI when $L = 0$.
PCACMI then increases the order and removes every pair of variables whose maximal CMI, taken over all sets of $L+1$
adjacent variables, is less than a given threshold $\alpha$. Here $\alpha$ determines whether two variables are declared
independent, and adjacent variables are the variables that remain connected to both target variables in PCACMI from
the previous step. PCACMI repeats these steps until no higher order is possible; a sketch of this pruning loop is given below.
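The following is a minimal sketch of the pruning loop, reusing the `mi` and `cmi` functions from the sketch above; the subset enumeration and stopping rule follow the description in the text and may differ in detail from the reference implementation:

```python
from itertools import combinations

import numpy as np

def pcacmi(data, alpha=0.03):
    """Prune a complete graph by increasing-order CMI tests (sketch)."""
    p = data.shape[1]
    adj = ~np.eye(p, dtype=bool)              # start from the complete graph
    L = 0
    while True:
        tested_any = False
        for i, j in combinations(range(p), 2):
            if not adj[i, j]:
                continue
            # Adjacent variables: still connected to both i and j.
            nbrs = [k for k in range(p)
                    if k not in (i, j) and adj[i, k] and adj[j, k]]
            if L == 0:
                score, tested_any = mi(data, i, j), True
            elif len(nbrs) >= L:
                # Maximal CMI over all size-L conditioning sets.
                score = max(cmi(data, i, j, S) for S in combinations(nbrs, L))
                tested_any = True
            else:
                continue                      # not enough adjacent variables
            if score < alpha:                 # declare i and j independent
                adj[i, j] = adj[j, i] = False
        if not tested_any:                    # no higher order is possible
            return adj
        L += 1
```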
References:
1. Yu, Donghyeon, Johan Lim, Xinlei Wang, Faming Liang, and Guanghua Xiao. "Enhanced construction of gene regulatory networks using hub gene information." BMC Bioinformatics 18.1 (2017): 186.
2. Zhang, Xiujun, Xing-Ming Zhao, Kun He, Le Lu, Yongwei Cao, Jingdong Liu, Jin-Kao Hao, Zhi-Ping Liu, and Luonan Chen. "Inferring gene regulatory networks from gene expression data by path consistency algorithm based on conditional mutual information." Bioinformatics 28.1 (2011): 98-104.
Note:
Change the $\alpha$ value $(\alpha > 0)$ to control the sparsity of the network: the larger the $\alpha$, the
sparser the constructed network. If you are unsure how to choose a value, use the default.
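For instance, the effect of $\alpha$ can be seen with the sketch functions defined earlier (illustrative only; `pcacmi` is the sketch above, not the packaged implementation):

```python
import numpy as np

rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 10))             # toy expression matrix, no true edges
for alpha in (0.01, 0.05, 0.10):
    n_edges = pcacmi(expr, alpha=alpha).sum() // 2
    print(alpha, n_edges)                     # larger alpha prunes more edges
```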