The mstate package for estimation and prediction in non- and semi-parametric multi-state and competing risks models
Introduction
In recent years, multi-state models have been studied widely as a means to extend classical survival analysis (see Section 2.1 and e.g. [1], [2], [3]). Broadly speaking, multi-state models serve two purposes. The first is to obtain more biological insight into the disease/recovery process of a patient; in particular, it is of interest to see how certain prognostic factors (covariates) influence different phases of this process. The second is prediction: multi-state models enable clinicians to obtain more accurate predictions of survival duration, e.g. for cancer patients, than standard models allow, and to adjust these predictions over time by incorporating intermediate events.
Despite the clear advantages of multi-state models, biomedical and other researchers have so far applied them only infrequently. This limited use can partly be explained by the difficulty of communicating these more detailed models to colleagues. Another reason might be equally important for the disappointing dissemination of multi-state models in the biomedical literature: the lack of flexible and user-friendly software. Such software needs to be capable of setting up and restructuring data, estimating covariate effects and hazards, predicting transition probabilities and calculating the associated standard errors. Although a number of packages written in R or other languages are available for the analysis of multi-state models, they all have some limitations (Section 3.1).
We have created a software package that can be used for each of these steps of the analysis of multi-state models. This package, written in R, is called mstate (Sections 3.2 and 3.3). It can be applied to non- and semi-parametric models. The package contains functions to facilitate data preparation and flexible estimation of covariate effects in the context of Cox regression models, and functions to estimate patient-specific transition intensities and dynamic prediction probabilities together with their associated standard errors. These calculations involve the implementation of some rather complicated formulas derived by means of martingale techniques. The main formulas will be presented briefly in Sections 2.2 (Notation, basic functions) and 2.3 (Estimation and data format). Competing risks models can also be analyzed by means of mstate, as they are a special type of multi-state model.
The mstate package is designed in such a way that each of its functions can be used independently: i.e., their input does not necessarily come from other functions in the package. Since a multi-state analysis rarely follows a standard path, we believe that the flexibility that this philosophy offers is an important asset of mstate. An overview of the philosophy and features of mstate is given in Section 3.2.
We shall illustrate the software by means of an analysis of data on liver cirrhosis patients (Section 4).
Section snippets
Multi-state models
A multi-state model is a model for time-to-event data in which all individuals start in one or possibly more starting states (e.g. post diagnosis, transplant or surgery) and eventually may end up in one (or more) absorbing or final state(s) (e.g. death or relapse). In between, intermediate states can be visited, possibly more than once. Some individuals are censored before they reach an absorbing state.
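The structure described above can be made concrete with a matrix of allowed transitions. The following is a minimal sketch in Python, not the mstate R API: the three-state illness-death layout, the state names, and the helpers `classify_states` and `number_transitions` are hypothetical and chosen only for illustration. The classification uses the transition structure alone, which is a simplification; in a model with reversible transitions a starting state can also be re-entered.

```python
# Hypothetical three-state illness-death model: rows are "from" states,
# columns are "to" states; True marks an allowed direct transition.
STATES = ["healthy", "ill", "dead"]
ALLOWED = [
    [False, True,  True],   # healthy -> ill, healthy -> dead
    [False, False, True],   # ill -> dead
    [False, False, False],  # dead: no transition leaves this state
]


def classify_states(states, allowed):
    """Label each state as 'starting', 'intermediate' or 'absorbing'.

    Purely structural rule (an assumption of this sketch): a state is
    absorbing if no transition leaves it, starting if no transition
    enters it, and intermediate otherwise.
    """
    labels = {}
    n = len(states)
    for i, name in enumerate(states):
        has_exit = any(allowed[i][j] for j in range(n))
        has_entry = any(allowed[j][i] for j in range(n))
        if not has_exit:
            labels[name] = "absorbing"
        elif not has_entry:
            labels[name] = "starting"
        else:
            labels[name] = "intermediate"
    return labels


def number_transitions(states, allowed):
    """Give each allowed transition a sequential number, row by row."""
    numbered = {}
    for i, src in enumerate(states):
        for j, dst in enumerate(states):
            if allowed[i][j]:
                numbered[(src, dst)] = len(numbered) + 1
    return numbered


labels = classify_states(STATES, ALLOWED)
transitions = number_transitions(STATES, ALLOWED)
print(labels)
print(transitions)
```

Numbering the transitions is useful because covariate effects and hazards are typically estimated per transition rather than per state.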
Competing risks models are a sub-category of multi-state models: they have one starting
A short overview of the existing software
Some software for the analysis of multi-state models, written in different languages, is already available. When compared to our package, all of it has some limitations. We first discuss the software in languages other than R, and then the R packages. The programming language R has two major advantages over other languages: it is freely available, and an increasing number of statistically relevant packages have been developed in it (for an overview of survival packages see CRAN Task View:
A multi-state model for liver cirrhosis data
In this paper, we will focus on what we consider to be the most important functions in mstate. A more extensive analysis of these data can be found online in De Wreede, Fiocco and Putter, A multi-state model for liver cirrhosis data, on http://www.msbi.nl/multistate.
Discussion
Multi-state models are a useful extension of classical survival analysis for several reasons. Firstly, they help to give more biological insight into the disease/recovery process of a patient. Secondly, they enable clinicians to obtain more accurate predictions of survival probabilities and to calculate dynamic predictions.
The mstate package, written in R, is meant to help scientists to actually use multi-state and competing risks models. It is primarily meant as a tool in survival analysis,
Conflict of interest statement
None declared.
Acknowledgements
Research leading to this paper was supported by the Netherlands Organization for Scientific Research Grant ZONMW-912-07-018 “Prognostic modeling and dynamic prediction for competing risks and multi-state models”. We are grateful to Per Kragh Andersen for making available the liver cirrhosis data.
References (20)
- et al., SAS macros for estimation of the cumulative incidence functions based on a Cox regression model for competing risks survival data, Computer Methods and Programs in Biomedicine (2004)
- et al., MKVPCI: a computer program for Markov models with piecewise constant intensities and covariates, Computer Methods and Programs in Biomedicine (2001)
- et al., MARKOV: A computer program for multi-state Markov models with covariables, Computer Methods and Programs in Biomedicine (1995)
- et al., tdc.msm: An R library for the analysis of multi-state survival data, Computer Methods and Programs in Biomedicine (2007)
- et al., Tutorial in biostatistics: competing risks and multi-state models, Statistics in Medicine (2007)
- et al., Inference for outcome probabilities in multi-state models, Lifetime Data Analysis (2008)
- L. Meira-Machado, J. de Uña-Álvarez, C. Cadarso-Suárez, P.K. Andersen, Multi-state models for the analysis of...
- P.K. Andersen, Ø. Borgan, R.D. Gill, N. Keiding, Statistical Models Based on Counting Processes, 2nd ed., Springer...
- et al., The Statistical Analysis of Failure Time Data (1980)
- et al., Reduced rank proportional hazards regression and simulation-based prediction for multi-state models, Statistics in Medicine (2008)