Asset Detail:

Pilot 1 Cancer Drug Response Prediction Dataset

Overview
ASSET LINK: https://modac.cancer.gov/assetDetails?dme_data_id=NCI-DME-MS01-8088592
PROGRAM NAME: NCI-DOE Collaboration
STUDY NAME: NCI-DOE Collaboration Cellular Level Pilot: Predictive Modeling for Pre-Clinical Screening
ASSET NAME: Pilot 1 Cancer Drug Response Prediction Dataset
ASSET PATH: /NCI_DOE_Archive/JDACS4C/JDACS4C_Pilot_1/cancer_drug_response_prediction_dataset
Asset Attributes
ATTRIBUTE: VALUE
ASSET NAME: Pilot 1 Cancer Drug Response Prediction Dataset
ASSET DESCRIPTION: This collection contains DataFrames and supporting metadata used by the Combo, P1B3, Uno, UNOMT, CLRNA, and benchmarking machine learning models in the Pilot 1 project to predict drug response in various cancer cell lines. It contains gene expression and drug response data for cancer cell lines from the NCI-60 Human Cancer Cell Line Screen (NCI-60), NCI ALMANAC, NCI Sarcoma (SCL), NCI Small Cell Lung Cancer (SCLC), Cancer Cell Line Encyclopedia (CCLE), Genomics of Drug Sensitivity in Cancer (GDSC), Genentech Cell Line Screening Initiative (gCSI), and Cancer Therapeutics Response Portal (CTRP) studies, along with molecular descriptors generated using the Dragon 7.0 and Mordred software packages. It also contains relevant metadata for the cancer cell lines and drug compounds, and a list of genes from the Library of Integrated Network-Based Cellular Signatures (LINCS) 1000 study. The LINCS1000 gene set was used as a reference to filter cancer cell line data. The TopN DataFrames for Pilot 1 combine drug response data, gene expression data, and drug molecular descriptors into a single DataFrame to support building binary classification or regression machine learning models that predict drug response. These DataFrames include the top N cancer types, that is, the cancer types with the most cell lines for which both RNA-Seq and drug response data are available. The models can be further evaluated and improved using learning curves, an empirical method. For more information, refer to the GitHub Repository links and Source links.
ASSET IDENTIFIER: cancer_drug_response_prediction_dataset
ASSET TYPE: Dataset
PLATFORM VERSION: None
IS REFERENCE DATASET: No
COLLECTION SIZE: 19.9 GB
GITHUB REPOSITORY CLRNA: https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Semi-Supervised-Feature-Learning-with-Center-Loss
GITHUB REPOSITORY COMBO: https://github.com/CBIIT/NCI-DOE-Colab-Pilot1-Combo-combination-drug-response-predictor
GITHUB REPOSITORY LEARNING CURVE: https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Learning-Curve
GITHUB REPOSITORY P1B3: https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Single-Drug-Response-Predictor
GITHUB REPOSITORY UNO: https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Unified-Drug-Response-Predictor
INSTITUTE: Argonne National Laboratory
SOURCE ASPURU-GUZIK VAE: https://github.com/aspuru-guzik-group/chemical_vae
SOURCE CCLE: https://portals.broadinstitute.org/ccle/data
SOURCE CTRP: https://portals.broadinstitute.org/ctrp/
SOURCE DOSE RESPONSE AUC: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753377/
SOURCE GDC: https://portal.gdc.cancer.gov/
SOURCE GDSC: https://www.cancerrxgene.org/downloads/bulk_download
SOURCE LINCS1000: http://lincsportal.ccs.miami.edu/dcic-portal/
SOURCE NCI ALMANAC: https://dtp.cancer.gov/ncialmanac/initializePage.do
SOURCE NCI PDMR: https://pdmdb.cancer.gov/web/apex/f?p=101:41
SOURCE NCI SARCOMA: https://sarcoma.cancer.gov/sarcoma/downloads.xhtml
SOURCE NCI SMALL CELL LUNG CANCER: https://sclccelllines.cancer.gov/sclc/
SOURCE NCI-60 - CELLMINER: https://discover.nci.nih.gov/cellminer/loadDownload.do
SOURCE NCI-60 - DTP: https://dtp.cancer.gov/databases_tools/bulk_data.htm
SOURCE GCSI: https://pharmacodb.pmgenomics.ca/datasets/4
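
The TopN construction described under ASSET DESCRIPTION (joining drug response, gene expression, and drug descriptors, then keeping the N cancer types with the most usable cell lines) can be illustrated with a minimal sketch. This is not the Pilot 1 pipeline: the function name, the field names (cell_id, drug_id, auc, gene_A, desc_1), and the toy values are all hypothetical, and plain Python records stand in for the DataFrames so the example has no dependencies.

```python
from collections import Counter


def build_top_n_rows(response, expression, descriptors, cancer_type, n):
    """Join response rows with features, keeping the n most-represented cancer types.

    All argument layouts are assumptions made for this sketch:
      response     list of dicts with "cell_id", "drug_id", "auc"
      expression   dict: cell_id -> dict of gene expression features
      descriptors  dict: drug_id -> dict of molecular descriptor features
      cancer_type  dict: cell_id -> cancer type label
    """
    # A cell line is usable only if it has both expression and response data.
    tested = {row["cell_id"] for row in response}
    usable = [c for c in cancer_type if c in expression and c in tested]

    # Rank cancer types by their number of usable cell lines.
    counts = Counter(cancer_type[c] for c in usable)
    top_types = {label for label, _ in counts.most_common(n)}

    rows = []
    for row in response:
        cell, drug = row["cell_id"], row["drug_id"]
        if cell not in expression or drug not in descriptors:
            continue  # no features to join on
        if cancer_type.get(cell) not in top_types:
            continue  # cancer type is outside the top n
        rows.append({**row, "cancer_type": cancer_type[cell],
                     **expression[cell], **descriptors[drug]})
    return rows


# Toy inputs (hypothetical values, not taken from the collection).
cancer_type = {"c1": "lung", "c2": "lung", "c3": "lung",
               "c4": "breast", "c5": "breast", "c6": "skin"}
expression = {"c1": {"gene_A": 2.1}, "c2": {"gene_A": 1.4},
              "c4": {"gene_A": 0.3}, "c5": {"gene_A": 3.0},
              "c6": {"gene_A": 1.1}}  # c3 has no expression data
descriptors = {"d1": {"desc_1": 1.2}, "d2": {"desc_1": 0.7}}
response = [
    {"cell_id": "c1", "drug_id": "d1", "auc": 0.8},
    {"cell_id": "c2", "drug_id": "d1", "auc": 0.4},
    {"cell_id": "c3", "drug_id": "d1", "auc": 0.5},
    {"cell_id": "c4", "drug_id": "d2", "auc": 0.6},
    {"cell_id": "c5", "drug_id": "d1", "auc": 0.3},
    {"cell_id": "c6", "drug_id": "d2", "auc": 0.9},
]

rows = build_top_n_rows(response, expression, descriptors, cancer_type, n=2)
```

Each resulting row carries the response value together with the cell-line and drug features, so the response value could serve directly as a regression target or be thresholded into a binary label; any such threshold would be a modeling choice that this page does not specify.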
