The mstate package for estimation and prediction in non- and semi-parametric multi-state and competing risks models

https://doi.org/10.1016/j.cmpb.2010.01.001

Abstract

In recent years, multi-state models have been studied widely in survival analysis. Despite their clear advantages, their use in biomedical and other applications has been rather limited so far. An important reason for this is the lack of flexible and user-friendly software for multi-state models.

This paper introduces an R package, called ‘mstate’, that supports each step of the analysis of multi-state models. It can be applied to non- and semi-parametric models. The package contains functions to facilitate data preparation and flexible estimation of different types of covariate effects in the context of Cox regression models, as well as functions to estimate patient-specific transition intensities, dynamic prediction probabilities and their associated standard errors (both Greenwood and Aalen-type). Competing risks models can also be analyzed by means of mstate, as they are a special type of multi-state model. The package is available from CRAN (http://cran.r-project.org).

We give a self-contained account of the underlying mathematical theory, including a new asymptotic result for the cumulative hazard function and new recursive formulas for calculating the estimated standard errors of the estimated transition probabilities. We illustrate the use of the key functions of the mstate package by analyzing a reversible multi-state model describing survival of liver cirrhosis patients.

Introduction

In recent years, multi-state models have been studied widely as a means to extend classical survival analysis (see Section 2.1 and e.g. [1], [2], [3]). Broadly speaking, multi-state models are used for two purposes. The first is to obtain more biological insight into the disease/recovery process of a patient. In particular, it is of interest to see how certain prognostic factors (covariates) influence different phases of this process. The second is prediction: multi-state models enable clinicians to obtain more accurate predictions of survival duration (for cancer patients, for example) than standard models, and to adjust these predictions over time by incorporating intermediate events.

Despite the clear advantages of multi-state models, biomedical and other researchers have not applied them frequently so far. This limited use can partly be explained by the difficulty of communicating these more detailed models to colleagues. But another reason might be equally important for the disappointing dissemination of multi-state models in the biomedical literature: the lack of flexible and user-friendly software. Such software needs to be capable of setting up and restructuring data, estimating covariate effects and hazards, predicting transition probabilities and calculating associated standard errors. Although a number of packages written in R or other languages are available for the analysis of multi-state models, they all have some limitations (Section 3.1).

We have created a software package that can be used for each of these steps of the analysis of multi-state models. This package, written in R, is called mstate (Sections 3.2 and 3.3). It can be applied to non- and semi-parametric models. The package contains functions to facilitate data preparation and flexible estimation of covariate effects in the context of Cox regression models, as well as functions to estimate patient-specific transition intensities, dynamic prediction probabilities and their associated standard errors. These calculations involve the implementation of some rather complicated formulas derived by means of martingale techniques. The main formulas are presented briefly in Sections 2.2 and 2.3. Competing risks models can also be analyzed by means of mstate, as they are a special type of multi-state model.
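To make the kind of quantity involved more concrete, the following is a minimal Python sketch of a non-parametric cumulative hazard estimate for a single transition, together with Aalen-type and Greenwood-type variance estimates. It is not the mstate implementation or its API: the function name `nelson_aalen` and the toy data are hypothetical, and the sketch assumes plain right-censored data in which every individual is at risk from time 0. In a multi-state setting, individuals enter the risk set only once they reach the starting state of the transition.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard with two variance estimates.

    times  -- observed exit times, one per individual
    events -- 1 if the transition was observed at that time, 0 if censored

    Returns a list of (time, cumhaz, var_aalen, var_greenwood) tuples,
    one per distinct event time, in increasing time order.
    Assumption: everyone is at risk from time 0 (no delayed entry).
    """
    # Number of observed transitions at each distinct event time.
    n_events = Counter(t for t, e in zip(times, events) if e == 1)

    rows = []
    cumhaz = var_aalen = var_greenwood = 0.0
    for t in sorted(n_events):
        # Individuals still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        d = n_events[t]
        cumhaz += d / at_risk
        # Aalen-type variance increment: d / Y^2.
        var_aalen += d / at_risk ** 2
        # Greenwood-type variance increment: d * (Y - d) / Y^3.
        var_greenwood += d * (at_risk - d) / at_risk ** 3
        rows.append((t, cumhaz, var_aalen, var_greenwood))
    return rows


# Hypothetical toy data: five individuals, two of them censored.
ROWS = nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

The two variance estimates differ only in the increment added at each event time, so they agree closely when the risk set is large relative to the number of events at that time.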

The mstate package is designed in such a way that each of its functions can be used independently: i.e., their input does not necessarily come from other functions in the package. Since a multi-state analysis rarely follows a standard path, we believe that the flexibility that this philosophy offers is an important asset of mstate. An overview of the philosophy and features of mstate is given in Section 3.2.

We shall illustrate the software by means of an analysis of data on liver cirrhosis patients (Section 4).

Multi-state models

A multi-state model is a model for time-to-event data in which all individuals start in one or possibly more starting states (e.g. post diagnosis, transplant or surgery) and eventually may end up in one (or more) absorbing or final state(s) (e.g. death or relapse). In between, intermediate states can be visited, possibly more than once. Some individuals are censored before they reach an absorbing state.
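The structure described above can be written down as a set of states and allowed direct transitions. The Python sketch below is illustrative only: the three state names, the helper functions and the numbering convention are assumptions made for this example and are not taken from the paper or from the mstate package. It encodes a small reversible model in which the intermediate state can be left and re-entered, and it identifies the absorbing states as those with no outgoing transitions.

```python
from typing import Dict, List, Optional

# Hypothetical states of a small reversible illness-death model.
STATES: List[str] = ["healthy", "ill", "dead"]

# Allowed direct transitions: from-state -> list of to-states.
TRANSITIONS: Dict[str, List[str]] = {
    "healthy": ["ill", "dead"],
    "ill": ["healthy", "dead"],  # recovery makes the model reversible
    "dead": [],                  # no way out: absorbing state
}


def transition_matrix(
    states: List[str], transitions: Dict[str, List[str]]
) -> List[List[Optional[int]]]:
    """Square matrix with entry [i][j] equal to the transition number
    (counting from 1, row by row) if i -> j is allowed, else None."""
    index = {state: k for k, state in enumerate(states)}
    matrix: List[List[Optional[int]]] = [[None] * len(states) for _ in states]
    number = 0
    for src in states:
        for dst in transitions[src]:
            number += 1
            matrix[index[src]][index[dst]] = number
    return matrix


def absorbing_states(
    states: List[str], transitions: Dict[str, List[str]]
) -> List[str]:
    """States that cannot be left once entered."""
    return [state for state in states if not transitions[state]]


TMAT = transition_matrix(STATES, TRANSITIONS)
ABSORBING = absorbing_states(STATES, TRANSITIONS)
```

Numbering the transitions in a matrix like this gives each possible move its own identifier, which is convenient when hazards or covariate effects are estimated separately per transition.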

Competing risks models are a sub-category of multi-state models: they have one starting

A short overview of the existing software

Some software for the analysis of multi-state models, written in different languages, is already available. Compared with our package, all of it has some limitations. We first discuss the software in languages other than R, and then the R packages. The programming language R has two major advantages over other languages: it is freely available, and an increasing number of statistically relevant packages have been developed in it (for an overview of survival packages see CRAN Task View:

A multi-state model for liver cirrhosis data

In this paper, we focus on what we consider to be the most important functions in mstate. A more extensive analysis of these data can be found online in De Wreede, Fiocco and Putter, A multi-state model for liver cirrhosis data, at http://www.msbi.nl/multistate.

Discussion

Multi-state models are a useful extension of classical survival analysis for several reasons. Firstly, they help to give more biological insight into the disease/recovery process of a patient. Secondly, they enable clinicians to obtain more accurate predictions of survival probabilities and to calculate dynamic predictions.

The mstate package, written in R, is meant to help scientists to actually use multi-state and competing risks models. It is primarily meant as a tool in survival analysis,

Conflict of interest statement

None declared.

Acknowledgements

Research leading to this paper was supported by the Netherlands Organization for Scientific Research Grant ZONMW-912-07-018 “Prognostic modeling and dynamic prediction for competing risks and multi-state models”. We are grateful to Per Kragh Andersen for making available the liver cirrhosis data.
