Machine Learning Algorithm Analysis using 55,780 Cases from a Commercial 592-gene NGS Panel to Accurately Predict Tumor Type for Carcinoma of Unknown Primary (CUP)

Authors:

Jim Abraham, Amy B. Heimberger, John Marshall, Joanne Xiu, Anthony Helmstetter, Daniel Magee, Adam Morgan, Curtis Johnston, Zoran Gatalica, Wolfgang Michael Korn, David Spetzler

Background:

The diagnosis of a malignancy is typically informed by clinical presentation and tumor tissue features, including cell morphology, immunohistochemistry, cytogenetics, and molecular markers. However, in approximately 5-10% of cancers [1,2], ambiguity is high enough that no tissue of origin can be determined, and the specimen is labeled as a Cancer of Occult/Unknown Primary (CUP). The lack of a reliable tumor classification poses a significant treatment dilemma for the oncologist and can lead to inappropriate and/or delayed treatment. Gene expression profiling has been used to try to identify the tumor type of CUP patients, but it suffers from several inherent limitations: tumor percentage, variation in expression, and the dynamic nature of RNA all contribute to suboptimal performance. For example, one commercial RNA-based assay showed a sensitivity of 83% in a test set of 187 tumors and confirmed results for only 78% of a separate 300-sample validation set [3].

Methods:

Data from 55,780 tumor patients profiled with a commercial 592-gene NGS panel were used to construct a multiparameter, tumor-type-specific classification system using an advanced machine learning approach.
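The abstract does not name the learning algorithm or the features used. As a purely illustrative sketch, a "tumor-type-specific classification system" can be framed as one-vs-rest classification: one model per tumor type scores "this type vs. all others", and the type with the highest score is predicted. The code below assumes a hypothetical cohort with binary gene-alteration features (the genes `GENE_A` to `GENE_D` and the three tumor types are invented). It uses plain logistic regression as a stand-in model and reports accuracy on a separately generated held-out set, mirroring the idea of an independent test set. It is not the authors' method.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical gene list for illustration only (not the 592-gene panel).
GENES = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
Vector = List[float]


def make_toy_cohort(n_per_type: int, seed: int) -> Tuple[List[Vector], List[str]]:
    """Hypothetical cohort: each tumor type carries one signature alteration."""
    rng = random.Random(seed)
    signature = {"lung": 0, "colon": 1, "breast": 2}
    X: List[Vector] = []
    labels: List[str] = []
    for tumor_type, gene_idx in signature.items():
        for _ in range(n_per_type):
            x = [0.0] * len(GENES)
            x[gene_idx] = 1.0
            x[3] = float(rng.random() < 0.5)  # uninformative passenger alteration
            X.append(x)
            labels.append(tumor_type)
    return X, labels


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: Vector, x: Vector) -> float:
    """Probability from one model; w[0] is the bias, w[1:] the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def train_binary(X: Sequence[Vector], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Vector:
    """Fit one logistic-regression model by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = score(w, xi) - yi  # gradient of log-loss w.r.t. the logit
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad)]
    return w


def train_one_vs_rest(X: Sequence[Vector], labels: Sequence[str]) -> Dict[str, Vector]:
    """One tumor-type-specific model per class: this type (1) vs. all others (0)."""
    return {t: train_binary(X, [1 if lab == t else 0 for lab in labels])
            for t in sorted(set(labels))}


def predict(models: Dict[str, Vector], x: Vector) -> str:
    """Predict the tumor type whose model assigns the highest probability."""
    return max(models, key=lambda t: score(models[t], x))


def accuracy(models: Dict[str, Vector], X: Sequence[Vector], labels: Sequence[str]) -> float:
    """Fraction of samples whose predicted tumor type matches the label."""
    hits = sum(predict(models, x) == lab for x, lab in zip(X, labels))
    return hits / len(labels)


# Train on one synthetic cohort, then evaluate on a separately generated one.
X_train, y_train = make_toy_cohort(n_per_type=20, seed=0)
X_test, y_test = make_toy_cohort(n_per_type=10, seed=1)
models = train_one_vs_rest(X_train, y_train)
print(f"held-out accuracy: {accuracy(models, X_test, y_test):.2f}")
```

A real classifier would use far richer features and a validated model; the sketch only shows the one-model-per-tumor-type structure and the separation of training data from held-out test data.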

Conclusions:

  • Final performance of DNA-based tumor type identification on an independent test set of more than 15,000 patient samples is superior to current gene expression-based standards
  • Machine learning techniques trained in an unbiased manner on more than 45,000 cases enabled detection of tumor type independent of sampling location or tumor percentage
  • Tumor type predictors can assign a histologic diagnosis to CUP cases, which can inform treatment and potentially improve outcomes
  • Cancer of unknown primary remains a substantial problem for both clinicians and patients; its diagnosis can be aided by the algorithms presented here
  • Returning both diagnostic and therapeutic information from a single test, to optimize the patient's treatment strategy, is a substantial improvement over the current standard of multiple tests that require more tissue
