Mini Review
Machine learning applications in cancer prognosis and prediction

https://doi.org/10.1016/j.csbj.2014.11.005
Under a Creative Commons license
open access

Abstract

Cancer has been characterized as a heterogeneous disease consisting of many different subtypes. Early diagnosis and prognosis of a cancer type have become a necessity in cancer research, as they can facilitate the subsequent clinical management of patients. The importance of classifying cancer patients into high- or low-risk groups has led many research teams from the biomedical and bioinformatics fields to study the application of machine learning (ML) methods. These techniques have therefore been used to model the progression and treatment of cancerous conditions. In addition, ML tools can detect key features in complex datasets, which underlines their importance. A variety of these techniques, including Artificial Neural Networks (ANNs), Bayesian Networks (BNs), Support Vector Machines (SVMs) and Decision Trees (DTs), have been widely applied in cancer research to develop predictive models, resulting in effective and accurate decision making. Although the use of ML methods can evidently improve our understanding of cancer progression, an appropriate level of validation is needed before these methods can be considered for everyday clinical practice. In this work, we review recent ML approaches employed in the modeling of cancer progression. The predictive models discussed here are based on various supervised ML techniques as well as on different input features and data samples. Given the growing trend in the application of ML methods in cancer research, we present the most recent publications that employ these techniques to model cancer risk or patient outcomes.

Abbreviations

ML: Machine Learning
ANN: Artificial Neural Network
SVM: Support Vector Machine
DT: Decision Tree
BN: Bayesian Network
SSL: Semi-supervised Learning
TCGA: The Cancer Genome Atlas Research Network
HTT: High-throughput Technologies
OSCC: Oral Squamous Cell Carcinoma
CFS: Correlation based Feature Selection
AUC: Area Under Curve
ROC: Receiver Operating Characteristic
BCRSVM: Breast Cancer Support Vector Machine
PPI: Protein–Protein Interaction
GEO: Gene Expression Omnibus
LCS: Learning Classifying Systems
ES: Early Stopping algorithm
SEER: Surveillance, Epidemiology and End Results Database
NSCLC: Non-small Cell Lung Cancer
NCI caArray: National Cancer Institute Array Data Management System

Keywords

Machine learning
Cancer susceptibility
Predictive models
Cancer recurrence
Cancer survival
