I have submitted the following abstract to the CSHL Network Biology conference, which will take place in March. It represents the second half of my research, in which I no longer use Boltzmann machines but have switched to a nonparametric, model-free estimation method. I hope the abstract is selected so that I can present a poster or give a talk.

Model-independent estimation of higher-order interactions in single-cell expression data

Traditional techniques for inferring gene regulatory networks are based only on properties of pairs of genes. Common examples include (partial) correlation and mutual information. This simplification generally prevents both pairwise and higher-order interactions from being estimated correctly with respect to the ground truth. Furthermore, these pairwise quantities serve only as proxies for the more interesting but abstract notion of a causal interaction that corresponds to a biological process.

We consider a model-independent, nonparametric estimator of symmetric interactions that coincides with the definition of interaction in statistical physics. This estimator can be shown, both analytically and numerically, to recover the ground truth in simulated binary systems on a lattice. We apply this estimator to extract (higher-order) interactions from binary single-cell gene expression data.
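To make the idea concrete, here is a minimal sketch of what a second-order estimator of this kind can look like. The abstract does not spell out the formula, so the form used here is an assumption: the log odds ratio of two binary variables, computed over the samples in which a chosen set of other variables is 0. For a system with 0/1 variables and probabilities proportional to exp(Σ h_i s_i + Σ J_ij s_i s_j), this quantity equals the coupling J_ij when all remaining variables are held at 0. The function name `pairwise_interaction` and its `cond` argument are hypothetical.

```python
import numpy as np

def pairwise_interaction(data, i, j, cond=()):
    """Estimate the pairwise interaction between binary variables i and j.

    data: (n_samples, n_vars) array of 0/1 values.
    cond: indices of the variables to hold at 0 (hypothetical argument;
          with the default, no conditioning is applied).
    """
    data = np.asarray(data)
    idx = np.asarray(list(cond), dtype=int)
    # Keep only the samples in which every conditioning variable is 0.
    mask = np.all(data[:, idx] == 0, axis=1)
    sub = data[mask]
    # Count the four joint states of (i, j) among the retained samples.
    n = {(a, b): int(np.sum((sub[:, i] == a) & (sub[:, j] == b)))
         for a in (0, 1) for b in (0, 1)}
    if min(n.values()) == 0:
        # A state was never observed, so the log ratio is undefined.
        return float("nan")
    return float(np.log(n[1, 1] * n[0, 0] / (n[1, 0] * n[0, 1])))
```

Because the counts enter only through a ratio, no model has to be fitted. The cost is that every one of the four states must actually be observed among the retained samples, which becomes harder as more variables are held fixed.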

Estimating the interactions directly from data requires a deeply sampled state space, which is generally not available in expression data, where the state space grows exponentially with the number of genes. Exploiting the conditional independencies between the variables can make inference much less data-hungry. We randomly select 2000 cells from each of five cell types in the 10X Million Cell Dataset of an embryonic mouse brain, keeping only the 500 most highly variable genes. We then use two causal discovery algorithms, the constraint-based PC algorithm and a hybrid MCMC method, to discover conditional independencies amongst genes and obtain the causal graphs. Knowledge of these graphs makes the inference of interactions from expression data tractable, and we find significant genetic interactions at first, second, and third order, with interaction graph structure and density differing between cell types. Across cell types, we find between 75 and 537 significant triplet interactions, that is, triplets whose 99% confidence interval does not include 0.

By simulating systems on small causal graphs, we see that different underlying causal dynamics lead to different structures in the hypergraph of inferred interactions. This relationship between interactions and causal structure allows us to identify interacting gene triplets and to distinguish various types of causal interactions. Across cell types, we find between 1 and 30 triplets that can be identified as having a multiplicative triplet interaction.
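As a toy illustration of how causal dynamics can show up in a third-order interaction, the sketch below builds the exact joint distribution of a three-node graph A → C ← B and evaluates a triplet log-ratio on it. This is not the simulation setup used in the abstract. The log-ratio form, the function names `triplet_interaction` and `toy_joint`, and the reading of "multiplicative" as a product term `w * A * B` in the log-odds of C are all assumptions made for this example.

```python
import numpy as np

def triplet_interaction(table):
    """Third-order log-ratio from a 2x2x2 table of counts or probabilities."""
    t = np.asarray(table, dtype=float)
    if np.any(t <= 0):
        # An empty cell makes the log ratio undefined.
        return float("nan")
    num = t[1, 1, 1] * t[1, 0, 0] * t[0, 1, 0] * t[0, 0, 1]
    den = t[1, 1, 0] * t[1, 0, 1] * t[0, 1, 1] * t[0, 0, 0]
    return float(np.log(num / den))

def toy_joint(a, u, w):
    """Exact joint p(A, B, C) for a hypothetical graph A -> C <- B.

    A and B are independent fair coins. C is on with log-odds
    a + u * (A + B) + w * A * B, so w controls the product term.
    """
    p = np.zeros((2, 2, 2))
    for A in (0, 1):
        for B in (0, 1):
            log_odds = a + u * (A + B) + w * A * B
            p_on = 1.0 / (1.0 + np.exp(-log_odds))
            p[A, B, 1] = 0.25 * p_on
            p[A, B, 0] = 0.25 * (1.0 - p_on)
    return p

# Product term present: the triplet log-ratio equals w.
with_product = triplet_interaction(toy_joint(a=-1.0, u=0.5, w=2.0))
# Purely additive parents: the triplet log-ratio vanishes.
additive_only = triplet_interaction(toy_joint(a=-1.0, u=0.5, w=0.0))
```

In this toy case the third-order quantity isolates the product term: additive effects of A and B on C cancel in the ratio, so only a joint effect of both parents leaves a nonzero value.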