UPDATE Feb 2021: Success! I will be presenting a poster at the conference this March.

I have submitted the following abstract to the CSHL Network Biology conference which will take place in March. It represents the second half of my research where I no longer use Boltzmann machines, but have switched to a nonparametric/model-free estimation method. I hope the abstract will get selected and I get to present a poster or talk.

Complex gene regulation: higher-order interactions in single cell expression data

Traditional techniques to infer gene regulatory networks are based only on properties of pairs of genes. Common examples include (partial) correlation and mutual information. This simplification not only hinders the correct estimation of pairwise interactions, it also ignores the complexity of higher-order interactions and combinatorial gene regulation. Furthermore, these pairwise quantities are introduced to serve as proxies for the more interesting but abstract notion of a causal interaction that corresponds to a biological process.

We consider a model-independent nonparametric estimator of symmetric interactions that coincides with the definition of interaction in statistical physics. This estimator can be shown to both analytically and numerically recover the ground truth in simulated binary systems on a lattice, and it separates each order of interaction. We apply this estimator to extract (higher-order) interactions from binary single cell gene expression data.

Estimating the interactions directly from data requires a deeply sampled state space, which is generally not available in expression data where the state space grows exponentially with the number of genes. Exploiting the conditional independencies between the variables can make inference much less data-hungry. We randomly select 10k cells from each of four cell types in the 10X Million Cell Dataset of an embryonic mouse brain, keeping only the 500 most highly variable genes, and use two causal discovery algorithms, the constraint-based PC algorithm and a hybrid MCMC method, to discover conditional independencies amongst genes and obtain the causal graphs. Knowledge of these graphs makes the inference of interactions from expression data tractable and we find significant genetic interactions at first, second, and third order, with differences in interaction graph structure and density exhibited between cell types. Across cell types, we find hundreds of significant triplet interactions where the 95% confidence interval does not include zero.

By simulating systems on small causal graphs, we see that different underlying causal dynamics lead to different structures in the inferred interactions. This relationship between interactions and causal structure allows us to predict combinatorially interacting gene triplets, and distinguish various types of causal interactions.