Challenges to drive innovation in biomedical data analysis
The explosion in biomedical data requires ongoing development of analysis methods and community standards. A proven way of driving innovation is through Challenges: competitions that crowdsource methods to answer specific scientific questions. Challenges offer unique opportunities for academic and commercial communities to advance science by developing precise algorithms and democratizing access to reliable training data for future use. They can be set by commercial organizations (such as Netflix, whose Netflix Prize sought improved algorithms for movie recommendations) or by academic groups (including the DREAM Challenges, which develop methods in systems biology and translational medicine).
While nominally competitions, DREAM Challenges also foster collaborative, open research. Their datasets become standards for benchmarking and testing novel algorithms, for example those that identify biomarkers for Alzheimer’s disease or predict survival in patients with breast cancer.
We’re now thrilled to bring a DREAM Challenge, the snappily named ICGC-TCGA DREAM Somatic Mutation Calling – RNA Challenge (SMC-RNA), to our Cancer Genomics Cloud (CGC). We’ve hosted the challenge resources within a “Public Project”, a one-stop shop to access reference and training data. Participants can clone or copy this project to create a private workspace of their own, where they can collaborate with others and develop and test their methods. In addition, they can learn from our documentation, request support from our staff, and submit their entries for evaluation, all within the CGC ecosystem. Using the CGC for the SMC-RNA challenge streamlines the development and submission processes, letting more participants spend more time developing their algorithms. Participants also gain easy access to the other public cancer datasets on the CGC, all deeply annotated with metadata.
Unravelling RNA biology in cancer
Researchers have used RNA sequencing (RNA-seq) to better understand the role of RNA in cancer, including the role of gene fusions and alternative expression patterns in oncogenesis and metastasis.
The SMC-RNA challenge comprises two sub-challenges based on RNA-seq data:
- Quantify known isoforms: develop algorithms to estimate the levels of a given set of mRNA isoforms. Altered expression can serve as the basis for clinically important prognostic biomarkers.
- Detect gene fusions: develop algorithms to predict the presence of gene fusions. Fusion transcripts can give rise to chimeric protein products that serve as diagnostic markers or drug targets.
Challenge participants will be asked to develop methods to solve the above problems. To do so, they are given training datasets, which they can supplement with their own and other public data (including The Cancer Genome Atlas and the Cancer Cell Line Encyclopedia, available through the CGC). Submitted methods will be assessed on never-before-seen simulated data and on ‘gold standard’ long-read and hybrid capture data across a range of tumors. In both cases the ‘true’ outcomes are known, so performance can be accurately assessed.
A natural home on the Cancer Genomics Cloud
One of the guiding principles we followed while building the CGC is that the best science happens in teams. We’re excited to support challenges like DREAM SMC-RNA, which promote collaboration, data sharing, and the development of community standards. Many DREAM Challenges extend collaboration past their formal submission deadlines through meetings, articles and benchmarks. The SMC-RNA Challenge is also the first to use both Docker and the Common Workflow Language to ensure reproducibility and portability of the submissions.
We’ve organized the resources for the DREAM Challenge as a Public Project, a template workspace containing data, example applications and analyses, and more information to help you get started. After copying the template project, users and their chosen collaborators can privately develop workflows that build on the challenge data (training data, reference and truth files) and on the example workflows and tools. The workspace also provides collaboration facilities, development capabilities, and direct, easy access to cloud-based computation for analysis.
DREAM Challenge participants can access up to $1,600 in computational credits to test their algorithms. Users can easily submit their entries through a Common Workflow Language description of their methods, and the winning workflows can be shared among researchers with a single click.
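For readers who have not met the Common Workflow Language (CWL) before, the Python sketch below builds a very small CWL tool description that runs a command inside a Docker image, then prints it as JSON (CWL documents are YAML, and JSON is valid YAML). It is only an illustration: the image name, command, input name and output file name are all hypothetical, and the CWL version and layout the challenge actually expects may differ.

```python
import json


def build_cwl_tool(docker_image, base_command):
    """Build a minimal CWL CommandLineTool description as a plain dict."""
    return {
        "cwlVersion": "v1.0",  # assumed version, not taken from the challenge
        "class": "CommandLineTool",
        "baseCommand": base_command,
        # DockerRequirement pins the software environment to one image.
        "requirements": [
            {"class": "DockerRequirement", "dockerPull": docker_image}
        ],
        # One input file, passed as the first positional argument.
        "inputs": {
            "reads": {"type": "File", "inputBinding": {"position": 1}}
        },
        # One output file, collected by name once the command finishes.
        "outputs": {
            "fusions": {
                "type": "File",
                "outputBinding": {"glob": "fusions.bedpe"},  # hypothetical name
            }
        },
    }


# Hypothetical image and command, for illustration only.
tool = build_cwl_tool("example/fusion-caller:0.1", ["call_fusions"])
print(json.dumps(tool, indent=2))
```

Pairing a pinned Docker image with a declarative description of inputs and outputs is what lets a submitted method be re-run unchanged on another machine.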
With the CGC, DREAM Challenge participants have a platform to access data and a workbench to develop, test, evaluate, and submit workflows, all in one place. We are honored that the CGC is co-hosting this innovative effort to further our understanding of the role of RNA in cancer and human health.