Asset Detail: Pilot 1 Cancer Drug Response Prediction Dataset
Overview
ATTRIBUTE | VALUE |
---|---|
ASSET LINK | https://modac.cancer.gov/assetDetails?dme_data_id=NCI-DME-MS01-8088592 |
PROGRAM NAME | NCI-DOE Collaboration |
STUDY NAME | NCI-DOE Collaboration Cellular Level Pilot: Predictive Modeling for Pre-Clinical Screening |
ASSET NAME | Pilot 1 Cancer Drug Response Prediction Dataset |
ASSET PATH | /NCI_DOE_Archive/JDACS4C/JDACS4C_Pilot_1/cancer_drug_response_prediction_dataset |
Asset Attributes
ATTRIBUTE | VALUE |
---|---|
ASSET NAME | Pilot 1 Cancer Drug Response Prediction Dataset |
ASSET DESCRIPTION | This collection contains the DataFrames and supporting metadata used by the Combo, P1B3, Uno, UNOMT, CLRNA, and benchmarking machine learning models in the Pilot 1 project to predict drug response in various cancer cell lines. It contains gene expression and drug response data for cancer cell lines from the NCI-60 Human Cancer Cell Line Screen (NCI-60), NCI ALMANAC, NCI Sarcoma (SCL), NCI Small Cell Lung Cancer (SCLC), Cancer Cell Line Encyclopedia (CCLE), Genomics of Drug Sensitivity in Cancer (GDSC), Genentech Cell Line Screening Initiative (gCSI), and Cancer Therapeutics Response Portal (CTRP) studies, together with molecular descriptors generated using the Dragon 7.0 and Mordred software packages. It also contains relevant metadata for the cancer cell lines and drug compounds, and a list of genes from the Library of Integrated Network-Based Cellular Signatures (LINCS) 1000 study. The LINCS1000 gene set was used as a reference to filter the cancer cell line data. The TopN DataFrames for Pilot 1 combine drug response data, gene expression data, and drug molecular descriptors into a single DataFrame to support building binary classification or regression machine learning models that predict drug response. These DataFrames cover the top N cancer types, that is, the cancer types with the most cell lines for which both RNA-Seq and drug response data are available. The models can be further evaluated and improved empirically using learning curves. For more information, refer to the GitHub repository links and source links listed with this asset. |
ASSET IDENTIFIER | cancer_drug_response_prediction_dataset |
ASSET TYPE | Dataset |
PLATFORM VERSION | None |
IS REFERENCE DATASET | No |
COLLECTION SIZE | 19.9 GB |
GITHUB REPOSITORY CLRNA | https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Semi-Supervised-Feature-Learning-with-Center-Loss |
GITHUB REPOSITORY COMBO | https://github.com/CBIIT/NCI-DOE-Colab-Pilot1-Combo-combination-drug-response-predictor |
GITHUB REPOSITORY LEARNING CURVE | https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Learning-Curve |
GITHUB REPOSITORY P1B3 | https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Single-Drug-Response-Predictor |
GITHUB REPOSITORY UNO | https://github.com/CBIIT/NCI-DOE-Collab-Pilot1-Unified-Drug-Response-Predictor |
INSTITUTE | Argonne National Laboratory |
SOURCE ASPURU-GUZIK VAE | https://github.com/aspuru-guzik-group/chemical_vae |
SOURCE CCLE | https://portals.broadinstitute.org/ccle/data |
SOURCE CTRP | https://portals.broadinstitute.org/ctrp/ |
SOURCE DOSE RESPONSE AUC | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753377/ |
SOURCE GDC | https://portal.gdc.cancer.gov/ |
SOURCE GDSC | https://www.cancerrxgene.org/downloads/bulk_download |
SOURCE LINCS1000 | http://lincsportal.ccs.miami.edu/dcic-portal/ |
SOURCE NCI ALMANAC | https://dtp.cancer.gov/ncialmanac/initializePage.do |
SOURCE NCI PDMR | https://pdmdb.cancer.gov/web/apex/f?p=101:41 |
SOURCE NCI SARCOMA | https://sarcoma.cancer.gov/sarcoma/downloads.xhtml |
SOURCE NCI SMALL CELL LUNG CANCER | https://sclccelllines.cancer.gov/sclc/ |
SOURCE NCI-60 - CELLMINER | https://discover.nci.nih.gov/cellminer/loadDownload.do |
SOURCE NCI-60 - DTP | https://dtp.cancer.gov/databases_tools/bulk_data.htm |
SOURCE GCSI | https://pharmacodb.pmgenomics.ca/datasets/4 |
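The asset description says that the TopN DataFrames combine drug response data, gene expression data, and drug molecular descriptors into a single table for binary classification or regression. The sketch below illustrates that kind of join on toy records. It is not the Pilot 1 code: every field name (`cell_line`, `drug`, `auc`, `ge_*`, `dd_*`), every value, and the 0.5 threshold used to derive a binary label are hypothetical, and plain dictionaries stand in for DataFrames so the example has no dependencies.

```python
# Toy records; all names and values are hypothetical, not taken from the dataset.
responses = [
    {"cell_line": "CL_A", "drug": "D1", "auc": 0.42},
    {"cell_line": "CL_A", "drug": "D2", "auc": 0.81},
    {"cell_line": "CL_B", "drug": "D1", "auc": 0.65},
    {"cell_line": "CL_C", "drug": "D1", "auc": 0.30},  # no expression data, so dropped
]
# Gene expression features keyed by cell line.
expression = {
    "CL_A": {"ge_G1": 1.2, "ge_G2": -0.3},
    "CL_B": {"ge_G1": 0.1, "ge_G2": 0.9},
}
# Molecular descriptor features keyed by drug.
descriptors = {
    "D1": {"dd_mw": 310.4},
    "D2": {"dd_mw": 452.1},
}


def build_combined_rows(responses, expression, descriptors):
    """Inner-join each response row with its cell-line and drug features."""
    rows = []
    for response in responses:
        ge = expression.get(response["cell_line"])
        dd = descriptors.get(response["drug"])
        if ge is None or dd is None:
            continue  # keep only rows that have both feature sets
        rows.append({**response, **ge, **dd})
    return rows


def add_binary_label(rows, threshold=0.5):
    """Add a 0/1 target; the threshold and its direction are assumptions."""
    return [{**row, "responder": int(row["auc"] < threshold)} for row in rows]


combined = add_binary_label(build_combined_rows(responses, expression, descriptors))
```

Each resulting row carries the continuous `auc` value, usable as a regression target, and the derived `responder` flag, usable as a classification target. With pandas, the same inner join could be expressed as two `merge` calls on the cell-line and drug keys.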