Asset Detail:
ML Ready Pathology Reports
Overview
ASSET LINK: | https://modac.cancer.gov/assetDetails?dme_data_id=NCI-DME-MS01-7423964 |
PROGRAM NAME: | NCI-DOE Collaboration |
STUDY NAME: | NCI DOE Collaboration MOSSAIC project: Population Information Integration, Analysis, and Modeling for Precision Surveillance |
ASSET NAME: | ML Ready Pathology Reports |
ASSET PATH: | /NCI_DOE_Archive/JDACS4C/JDACS4C_Pilot_3/ml_ready_pathology_reports |
Asset Attributes
ATTRIBUTE | VALUE |
---|---|
ASSET NAME | ML Ready Pathology Reports |
ASSET DESCRIPTION | This asset contains 7187 pathology reports, with their associated site and histology labels, downloaded from the Genomic Data Commons (GDC) platform at the National Cancer Institute. The reports in ml_ready_raw_text_pathology_reports.tar.gz were converted from PDF to text with the Tesseract optical character recognition (OCR) engine (refer to the Tesseract link). An example report is available on the GDC legacy archive portal (refer to the GDC link). The file ml_ready_raw_text_histo_metadata.csv contains annotations (such as site and histology) extracted from those reports. This dataset serves as input to the MT-CNN and HiSAN models (refer to the GitHub Repository links and Model links). |
ASSET IDENTIFIER | ml_ready_pathology_reports |
ASSET TYPE | Dataset |
PLATFORM VERSION | None |
IS REFERENCE DATASET | No |
COLLECTION SIZE | 11.9 MB |
GDC | https://portal.gdc.cancer.gov/legacy-archive/files/a9a42650-4613-448d-895e-4f904285f508 |
GITHUB REPOSITORY HISAN | https://github.com/CBIIT/NCI-DOE-Collab-Pilot3-Pathology-Reports-Hierarchical-Self-Attention-Network |
GITHUB REPOSITORY MT-CNN | https://github.com/CBIIT/NCI-DOE-Collab-Pilot3-Multitask-Convolutional_Neural_Network |
MODEL HISAN | https://modac.cancer.gov/searchTab?dme_data_id=NCI-DME-MS01-7565752 |
MODEL MT-CNN | https://modac.cancer.gov/searchTab?dme_data_id=NCI-DME-MS01-7330732 |
TESSERACT | https://github.com/tesseract-ocr/ |
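The asset description names two files: a gzipped tarball of OCR'd report text and a CSV of site and histology annotations. The sketch below reads both with the Python standard library. It is not taken from the MT-CNN or HiSAN repositories. The file names come from the description above, but the internal layout of the tarball, the CSV column names, and the UTF-8 encoding are assumptions. For that reason, the code keys report text by archive member name and returns CSV rows keyed by whatever header row the file actually has.

```python
import csv
import tarfile


def load_report_texts(archive_path):
    """Return {member name: decoded text} for every regular file in a .tar.gz archive."""
    texts = {}
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories, links, and other non-file members.
            if not member.isfile():
                continue
            handle = tar.extractfile(member)
            if handle is None:
                continue
            # Assumption: text is UTF-8; OCR output can contain stray bytes,
            # so undecodable sequences are replaced rather than raising.
            texts[member.name] = handle.read().decode("utf-8", errors="replace")
    return texts


def load_metadata(csv_path):
    """Return the annotation CSV as a list of dicts keyed by its header row."""
    with open(csv_path, newline="", encoding="utf-8", errors="replace") as f:
        return list(csv.DictReader(f))
```

For example, `load_report_texts("ml_ready_raw_text_pathology_reports.tar.gz")` and `load_metadata("ml_ready_raw_text_histo_metadata.csv")` would load the two files after download. Inspecting `list(rows[0].keys())` on the metadata shows the real column names before you join annotations to report text. That join key is not documented here.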