Childhood Cancer Data Lab

Accelerating the Pace of Childhood Cancer Research with Big Data.

CCDL Background

About The Childhood Cancer Data Lab

The Childhood Cancer Data Lab was established by Alex’s Lemonade Stand Foundation (ALSF) in 2017. ALSF recognized that pediatric cancer researchers face hurdles that impede the pace of research. A massive amount of childhood cancer data is publicly available, but collecting, sharing, and using it can be a challenge. Far too often, data is not available in a ready-to-use format or stored in an easily accessible location, making it difficult for researchers to carry out analyses and answer their scientific questions. ALSF introduced the Data Lab to empower researchers and scientists across the globe by removing roadblocks, supporting opportunities for collaboration and sharing, and developing resources that accelerate the discovery of new treatments and cures.

Putting resources and knowledge in the hands of pediatric cancer experts

The Childhood Cancer Data Lab builds tools that make vast amounts of data widely available, easily mineable, and broadly reusable. The Data Lab also trains researchers and scientists to better understand their own data and to advance their work more quickly.

ScPCA Portal

The Single-cell Pediatric Cancer Atlas (ScPCA) is a growing database of uniformly processed single-cell data from pediatric cancer tumors and model systems. In 2019, ALSF funded 10 awards for researchers working on single-cell profiling of patient samples from a broad range of cancer types. The generated data was shared with the Data Lab to be uniformly processed and made freely available on the ScPCA Portal. The Data Lab recently launched the Open Single-cell Pediatric Cancer Atlas (OpenScPCA), a collaborative project to analyze and improve the utility of the ScPCA data. OpenScPCA uses an open contribution model designed to allow experts worldwide to contribute and rapidly share their results in real time. The Data Lab has already completed a project with a similar framework, the Open Pediatric Brain Tumor Atlas (OpenPBTA), through which over 60 collaborators from across the world openly analyzed and improved the data from more than 1,000 pediatric brain tumors.

Data Lab training workshops teach childhood cancer researchers the data science skills they need to examine their own data. The Data Lab has trained nearly 200 researchers to date. Participants are introduced to the R programming language and to cutting-edge technologies used in single-cell and bulk RNA-sequencing data analysis. These workshops empower researchers to perform basic analysis of their own research and to better collaborate with other members of the research community. All training materials are openly licensed and made freely available by the Data Lab.

refine.bio

refine.bio is a multi-organism collection of harmonized childhood cancer data that has been obtained from publicly available repositories. The vast amount of pediatric cancer data across the globe can provide unique insight into complex diseases. However, this data is often scattered across different locations, stored in various formats, and in need of reprocessing. refine.bio helps put this wealth of information to broad use by uniformly processing the data into one universal repository. Since the launch of refine.bio, the Data Lab has harmonized more than 1.3 million data samples for immediate use, data that initially cost $1.3 billion to generate. Researchers from across the globe have downloaded over 2,500 ready-to-use datasets, saving them precious time and accelerating the pace of their research.

The Data Lab also developed refine.bio examples, a resource that gives researchers access to a variety of example analyses for use with refine.bio data. The examples are designed to enhance usability and shorten the learning curve, allowing researchers to get the most out of their refine.bio datasets.