Data resources for gene regulatory network modeling

This is not a standalone post – it’s just a rough list of datasets that could be useful for regulatory network models. Check out the intro to this series for more context.

Benchmarking

  • The DREAM5 competition data consist of microarray (RNA) data for yeast (Saccharomyces cerevisiae), bacteria (E. coli, S. aureus), and simulated data. This is a popular dataset for benchmarking methods, because it comes with gold-standard sets of known regulatory relationships. Note, however, that the data may have some batch effects.
  • The Connectivity Map at the Broad Institute offers transcriptomic profiles for several cell lines under thousands of different genetic and chemical perturbations. This seems like a gold mine for testing regulatory network models, and I can’t figure out why it isn’t more heavily used in stem cell biology. I am planning to find out the hard way soon. It is part of a consortium called LINCS, which I am just digging into.
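
To make the benchmarking idea concrete, here is a minimal sketch of scoring a ranked list of predicted regulator-target edges against a gold-standard edge set like the ones DREAM5 provides. The function name, the toy gene names, and the choice of precision/recall at a cutoff are my own assumptions for illustration; this is not the official challenge scoring.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (regulator, target); names below are hypothetical


def precision_recall_at_k(ranked_edges: List[Edge], gold: Set[Edge], k: int):
    """Score the top-k predicted edges against a gold-standard edge set."""
    top = ranked_edges[:k]
    hits = sum(1 for edge in top if edge in gold)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy gold standard and toy predictions, ranked from most to least confident.
gold = {("TF1", "geneA"), ("TF1", "geneB"), ("TF2", "geneC")}
ranked = [("TF1", "geneA"), ("TF2", "geneA"), ("TF2", "geneC"), ("TF1", "geneC")]

p, r = precision_recall_at_k(ranked, gold, k=3)
print(f"precision@3 = {p:.2f}, recall@3 = {r:.2f}")
```

Sweeping k over the whole ranking traces out a precision-recall curve, which is usually more informative than any single cutoff.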

Chromatin

  • The FANTOM5 consortium has gathered CAGE-seq data in a wide array of human tissues and cell lines (about 400 total). CAGE-seq captures the 5′ ends of capped transcripts, so it reports activity at both enhancers and gene promoters. It has been used for global network inference based on locations of active transcription factor binding motifs (http://regulatorycircuits.org/). It has also been used for prediction of reprogramming factors (http://www.mogrify.net/).
  • The ENCODE project has produced DNase-seq data on a wide range of human cell types. DNase-seq measures chromatin accessibility, which can be used to infer active transcription factor binding motifs. For more information, visit http://www.regulatorynetworks.org/ and look for the “about” page.
  • The sci-ATAC atlas covers eight organs in the adult mouse.
  • The ImmGen consortium has produced a dataset covering chromatin accessibility in 86 mouse immune cell types, discussed here.
  • ENCODE and ChEA have direct measurements of transcription factor binding. People often prefer the data above, though, because the direct measurements fall far short of covering every transcription factor in every cell type, whereas accessibility data can get closer to this ideal.
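
As a rough illustration of the motif-based inference mentioned in the bullets above, the sketch below links a transcription factor to a gene when one of its motif hits falls inside an accessible region near that gene's transcription start site. Everything here is an assumption for illustration: the function name, the input layout, the toy coordinates, and the 50 kb window. The real pipelines behind the linked sites are considerably more involved.

```python
from collections import defaultdict


def motif_based_edges(peaks, motif_hits, tss, window=50_000):
    """Return a set of (tf, gene) edges.

    peaks:      list of (chrom, start, end) accessible regions, end exclusive
    motif_hits: list of (tf, chrom, pos) motif matches
    tss:        dict mapping gene -> (chrom, pos) of its transcription start site
    window:     maximum motif-to-TSS distance in base pairs (arbitrary choice)
    """
    peaks_by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        peaks_by_chrom[chrom].append((start, end))

    edges = set()
    for tf, chrom, pos in motif_hits:
        # Keep only motif hits that fall inside an accessible region.
        in_peak = any(s <= pos < e for s, e in peaks_by_chrom.get(chrom, []))
        if not in_peak:
            continue
        # Link the TF to every gene whose TSS is close enough on the same chromosome.
        for gene, (g_chrom, g_pos) in tss.items():
            if g_chrom == chrom and abs(g_pos - pos) <= window:
                edges.add((tf, gene))
    return edges


# Toy inputs with made-up coordinates.
peaks = [("chr1", 1000, 1500), ("chr2", 200, 400)]
motif_hits = [("GATA1", "chr1", 1200), ("GATA1", "chr1", 5000), ("SOX2", "chr2", 300)]
tss = {"geneA": ("chr1", 2000), "geneB": ("chr2", 900_000)}

edges = motif_based_edges(peaks, motif_hits, tss)
print(edges)
```

The nested loops are fine for a toy example, but genome-scale data would call for an interval index or a dedicated genomics library.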

RNA

Written on March 10, 2020