Hypertabastic survival model
© Tabatabai et al; licensee BioMed Central Ltd. 2007
Received: 05 March 2007
Accepted: 26 October 2007
Published: 26 October 2007
Skip to main content
© Tabatabai et al; licensee BioMed Central Ltd. 2007
Received: 05 March 2007
Accepted: 26 October 2007
Published: 26 October 2007
A new two-parameter probability distribution called hypertabastic is introduced to model the survival or time-to-event data. A simulation study was carried out to evaluate the performance of the hypertabastic distribution in comparison with popular distributions. We then demonstrate the application of the hypertabastic survival model by applying it to data from two motivating studies. The first one demonstrates the proportional hazards version of the model by applying it to a data set from multiple myeloma study. The second one demonstrates an accelerated failure time version of the model by applying it to data from a randomized study of glioma patients who underwent radiotherapy treatment with and without radiosensitizer misonidazole. Based on the results from the simulation study and two applications, the proposed model shows to be a flexible and promising alternative to practitioners in this field.
Time to event models, commonly known as survival or reliability models, have been studied and applied in a variety of scientific disciplines such as medicine, engineering and business. The Hosmer and Lemeshow , Lee and Wang , Kleinbaum and Klein , and Collet  books give a detailed overview of survival data modeling techniques. Non-parametric and semi-parametric survival models such as the Cox regression analysis have been the most widely used models in the analysis of time to event survival data . On the other hand, if the assumption for parametric probability distribution is met for the data set under consideration, it will result in more efficient and easier to interpret estimates than non-parametric or semi parametric models. A comprehensive review was given by Efron  and Lee and Go .
Parametric hazard functions can enable clinicians and researchers to model various disease scenarios, assess disease prognosis and progression, give valuable insights on the pattern of failure, and understand the pathogenesis of a chronic disease and how they are affected by different treatment effects [8–10]. Estimation of hazard function is also useful in the analysis of change-point hazard rate models. It helps policy makers with cost effective health care policy decisions . Lundin et al.  estimated the survival probabilities in breast cancer patients and concluded that parametric survival estimates may be more precise than Kaplan-Meier estimates when there are few patients in a particular stratum. Royston and Parmar  modeled the baseline distribution function by restricted cubic regression spline. Kay et al.  discussed the use of hazard functions in breast cancer studies. They believe that the hazard function is an important tool in investigating disease curability and can help the clinician to express his ideas regarding disease progression and the biology of treatment effect. Foulkes et al.  used parametric modeling to assess the prognostic factors in the recurrence of ischemic strokes. Sama et al.  used five parametric models to analyze the survival time data of infections and they found that the best fit could be obtained using parametric models. They also indicated that parametric models can be used to model the duration of malaria infections. Kannan et al.  used log-logistic probability distribution to model altitude decompression sickness (DCS) risk and symptom onset time. They concluded that the log-logistic model could provide good estimates of the probability of DCS over time. Nardi and Schemper [18, 19] emphasized the role of residuals for the selection of survival models. They argued that when empirical data is sufficient, parametric models provide some insight into the shape of the baseline hazard function.
In the Cox model, the baseline hazard function is regarded as a nuisance parameter, while in parametric models, the hazard function reflects the time course of the process under study. In this paper, we introduce a new two-parameter continuous probability distribution called hypertabastic probability distribution. The hypertabastic hazard function can assume a different variety of shapes. It can be used to analyze biomedical data such as cancer recurrence time. Based on the hypertabastic distribution, we introduce the hypertabastic survival model which includes the hypertabastic proportional hazards model with parametric baseline hazard function, the hypertabastic accelerated failure model and the hypertabastic proportional odds model. The hypertabastic distribution can be used to analyze the accelerated hazards regression model of Chen . It can also be used to monitor the disease progression and regression and provide the clinicians with the time interval(s) on which the disease progresses or regresses and the time interval(s) on which the disease progression or regression speeds up or slows down. This vital information will make it easier for the physicians to take appropriate action regarding their patients.
In this section we introduce a new probability distribution which can be used in many scientific disciplines. Such disciplines may include, but are not limited to, biomedical, engineering, and business fields.
where W(t) = α(1 - t β Coth(t β ))/β.
Definition 3.1 (The Hypertabastic Survival Function) Let T be a continuous random variable representing the waiting time until the occurrence of an event. Then the hypertabastic survival function is defined asS(t) = P(T > t) = Sech [α(1 - t β Coth(t β ))/β]
where P (T > t) is the probability that waiting time exceeds t.
Definition 3.2 (The Hypertabastic Hazard Function) The hypertabastic hazard function h(t) which represents the instantaneous failure rate at time t given survival up to time t is defined ash(t) = α(t2β-1Csch(t β )2 - tβ-1Coth(t β ))Tanh [W(t)]
The cumulative hazard function H(t) is defined asH(t) = - ln(Sech [W(t)]).
The hazard function is a conditional failure rate which gives the instantaneous potential for failing at time t per unit time for an individual surviving to time t.
The most commonly used baseline hazard functions are Weibull, log-normal and log-logistic. The Weibull baseline hazard function has monotone increasing, monotone decreasing and constant forms. The log-normal baseline hazard function is non-monotonic. It follows a behavior that increases to a maximum, and then decreases (unimodal shaped). The log-logistic baseline hazard function can be monotone decreasing or have unimodal shape. These different hazard shapes may reveal different biological mechanisms of disease progression and regression and can provide helpful medical information. The hypertabastic baseline hazard function shapes are given as follows along with illustrative examples from published literature that could fit a pattern of that particular behavior where applicable:
1. The hypertabastic baseline hazard function is monotone decreasing if 0 <β ≤ 0.25 (Figure 1a). Clark et al.  analyzed data for 825 patients diagnosed with primary epithelial ovarian carcinoma and observed that the hazard rate was initially high after the diagnosis and gradually decreased afterwards.
2. The hypertabastic baseline hazard function first increases with time until it reaches its maximum and then decreases (unimodal-shaped) if 0.25 <β < 1 (Figure 1b). Demicheli at al.  concluded that the hazard rate for node-positive post-menopausal women was unimodal-shaped. Schulman et al.  studied the influence of donor and recipient HLA locus mismatching on the development of obliterative bronchiolitis (OB) after lung transplantation and estimated the risk of OB after the transplant. Their estimated hazard function of developing OB indicated a unimodal-shaped curve as well.
3. The hypertabastic baseline hazard initially increases with time, then it reaches its horizontal asymptote α provided that β = 1 (Figure 1c). Weitz and Fraser  concluded that hazard rate plateaus are explained as a generic consequence of considering death in terms of first passage time for processes undergoing a random walk with drift. They analyzed the hazard rate plateau in populations of fruit flies, yeast and other organisms.
4. The hypertabastic baseline hazard function is increasing with upward concavity until it reaches its inflection point and then it continues to increase with downward concavity thereafter if 1 <β < 2 (Figure 1d). Such a hazard can be used in many applied sciences when failure rate increases with respect to increase in time. For instance, it can be used to model leukemia survival times for patients not responding to treatment .
5. The hypertabastic baseline hazard function is increasing with upward concavity for a while and then it becomes a linear function with slope α if β = 2. (Figure 1e).
6. The hypertabastic baseline hazard function is increasing with upward concavity if β > 2. (Figure 1f).
Definition 3.3 (Disease progression and regression) Let the hazard function be defined on interval I and h'(t) be its derivative.
1. Disease progresses on the time interval I if h'(t) > 0 for all t in I.
2. Disease regresses on the time interval I if h'(t) < 0 for all t in I.
3. Disease neither progresses nor regresses on the interval I if h'(t) = 0 for all t in I.
Definition 3.4 (Speed of progression and regression) Let h(t) be a hazard function defined on time interval I and let h'(t) and h"(t) be first and second derivative of h(t) respectively.
1. If h'(t) > 0 and h"(t) > 0 for all t in I, then the disease progression speeds up on the time interval I.
2. If h'(t) > 0 and h"(t) < 0 for all t in I, then the disease progression slows down on the time interval I.
3. If h'(t) < 0 and h"(t) > 0 for all t in I, then the disease regression slows down on the time interval I.
4. If h'(t) < 0 and h"(t) < 0 for all t in I, then the disease regression speeds up on the time interval I.
Definitions 3.3 and 3.4 can assist clinicians, researches and pharmacologists to monitor disease status over time. If for instance the goal of the drug treatment study is to slow down the disease progression and to understand the time course and management of disease, the above mentioned definitions may be very useful. See for example, the paper by Chan and Holford  who studied the rate of deterioration of degenerative diseases over time.
When the effect of risk factors is to change the baseline hazard function by a proportionate amount at all times t, we call the model a proportional hazards model. Suppose vector X is a p-dimensional vector of covariates and assume g(X|θ) is a non negative function of X satisfying the condition that g(0|θ) = 1. Let h0(t) which has been defined in section 3 as h(t), be called baseline hazard function. This is the hazard function when there are no covariates in the model. The hypertabastic proportional hazard model assumes a hazard function h(t|X, θ) of the formh(t|X, θ) = h0(t)g(X|θ)
where θ is a vector of unknown parameters. The hypertabastic survival function S(t|X, θ) for the proportional hazards model is defined asS(t|X, θ) = [S0(t)]g(X|θ)
where the baseline survival function S0(t) is defined as S(t) in section 3. The hypertabastic probability density function for the proportional hazard model is denoted by f(t|X, θ) and is equal tof(t|X, θ) = f0(t) [S0(t)]g(X|θ)-1g(X|θ)
where the baseline probability density function f0(t) is defined as f(t) in section 2. Depending on the type of censoring, the maximum likelihood function technique along with an appropriate log-likelihood function may be used to estimate the unknown parameters in this model. Most data used in survival analysis have only right censoring, therefore we will focus on right censoring. Consider a sample of right censored survival time's data of n individuals t1, t2, ..., t n with associated p-dimensional covariate vectors X1, X2, ..., X n and an unknown parameter vector θ = (θ1, θ2, ..., θ p ).
The maximum likelihood estimate of (p+2) dimensional parameter vector λ = (α, β, θ1, θ2, ..., θ p ) is the vector . Asymptotically, is normally distributed with mean vector λ and variance-covariance matrix V. An estimate of V can be obtained by calculating the inverse of the observed information matrix.
To assess the statistical significance of model parameters, one can use the well known statistical tests such as likelihood ratio test, Wald test, or the score test.
The hazard ratio HR(X i , X j ) for individuals i and j with covariate vectors X i and X j is given byHR(X i , X j ) = g(X i |θ)/g(X j |θ)
The most common form of g(X|θ) is exp [-θ T X] . If we use , then the hazard ratio HR(X i |X j ) = exp [-θ T (X i - X j )] where HR(X i , X j ) is independent of time t. If X i = X and X j = 0, then the hazard ratio becomes HR(X i , 0) = g(X|θ) and if we use , then , where -θ j x j is the elasticity of hazard rate with respect to covariate x j . Symbolically,∂h(t|X, θ)/∂x j ]·[x j /h(t|X, θ)] = -θ j x j .
The elasticity of hazard function with respect to covariate x j is a measure of the responsiveness of the hazard function to change in covariate x j . Intuitively, elasticity of the hazard function with respect to change in covariate x j is percent change in failure rate divided by percent change in covariate x j .
When the covariates act multiplicatively on the time-scale, the model is called accelerated failure time model [3, 4]. The hypertabastic accelerated failure time model assumes a hazard function h(t|X, θ) of the formh(t|X, θ) = h0(tg|X, θ))g(X, θ)
where h0(•) is the baseline hypertabastic hazard function and the hypertabastic survival function for the accelerated failure time model isS(t|X, θ) = S0(tg(X|θ))
where S0(•) is the baseline hypertabastic survival function. Finally, the hypertabastic probability density function for the accelerated failure time model isf(t|X, θ) = f0(tg(X|θ))g(X|θ)
where f0(•) is the baseline hypertabastic probability density function. The maximum likelihood technique can be used to estimate the parameters of this model and statistical tests similar to section 4 can be used to assess the significance of model covariates. For the right censored data, the hypertabastic accelerated failure time model would have a log-likelihood function of the form
The maximum likelihood estimate of (p+2) dimensional parameter vector (α, β, θ1, θ2, ..., θ p ) and testing of hypothesis regarding model parameters are similar to methods of section 4.
If the effect of risk factor is to change the odds of survival beyond time t by a proportionate amount, then the model is called proportional odds model. The odds of surviving beyond time t are expressed asS(t|X, θ)/(1 - S(t|X, θ)) = g(X|θ)S0(t)/(1 - S0(t))
where S0(t) is the baseline hypertabastic survival function. The hypertabastic survival function for the proportional odds model is given byS(t|X, θ) = 1/[1+((1 - S0(t))/S0(t))g(X|θ)]
and the hypertabastic baseline odds function of survival beyond time t is given byS0(t)/(1 - S0(t)) = Sech [W(t)]/(1 - Sech [W(t)])
The ratio of the odds of survival of individual i relative to individual j is equal to exp [-θ T (X i - X j )]. The hypertabastic proportional odds model can be fitted by maximizing an appropriate likelihood function.
Average -log likelihoods and their standard errors for Weibull (W), hypertabastic (HT), log-logistic (LL) and log-normal (LN) models based on 1000 simulations from eleven parameter combinations of two parameter Weibull, log-normal and gamma distributions
In simulations where we sample from a two-parameter Weibull distribution, obviously, the Weibull model fits the data with highest precision in all instances. Hypertabastic model is a close second, outperforming log-normal and log-logistic models for all combinations of parameters, with log-normal being the worst. Similarly when sampling is from a two-parameter log-normal distribution, the log-normal model outperforms all other models. The hypertabastic model and the log-logistic show similar results, with the log-logistic being slightly better in eight combinations of parameters and the Weibull model performing the worst. Finally, when sampling from a two parameter gamma distribution, the Weibull model fits with the most precision in seven out of eleven combinations. That is because the Weibull distribution and the gamma distribution have hazard functions which are similar in shapes. In another four instances, the hypertabastic model slightly outperforms the Weibull model. The log-logistic model comes in third, however it is close to the hypertabastic and the Weibull for several combinations, while the log-normal does the worst in all instances.
All models presented in the next two examples were fitted using Mathematica 6.0 (Wolfram Research, Inc.) and SASv9 (SAS Institute Inc., Cary, NC). They were fitted using both time and log(time). However, only log(time) results are presented for the multiple myeloma proportional hazards and brain cancer accelerated failure time models since log(time) is commonly used as a default approach in many packages and procedures. Besides the hypertabastic survival model we fit Weibull, log-normal, log-logistic and Cox models. We present all parameter estimates along with their standard errors and compare -2log-likelihood estimates, as well as AIC, to assess the goodness-of-fit of the models. SAS programs used to fit the hypertabastic proportional hazards models with time and log(time) for multiple myeloma data are provided [see Additional file 1].
Statistical results for hypertabastic proportional hazards and four comparison models for multiple myeloma data
Statistical results for the hypertabastic accelerated failure time model and four comparison models for glioma brain cancer data
In this paper we have introduced a new survival model called hypertabastic survival model. The overall results of our simulation indicate that the hypertabastic model performs best when the data is generated from the Weibull distribution. When the data comes from the gamma distribution, both the Weibull and hypertabastic distributions perform well. In the case when we generate survival data from the log-normal distribution, the log-logistic and hypertabastic distributions perform well but the gamma and Weibull perform poorly. In the last case, the Weibull distribution performance is the worst. We believe that the hypertabastic survival model is to some extent robust with respect to variations in the distribution. We believe that the hypertabastic hazard function can play a role in modeling failure rates in medical, biological and engineering fields. The hypertabastic model can also assist physicians and clinicians with their treatment planning through its ability to predict patients' outcome as well as the risk of disease recurrence. Gilbert et al.  argued that the pattern of instantaneous risk over time is more interesting than the cumulative risk. For instance, in a cancer study where recurrence time of malignant tumor is of interest, a bimodal hazard curve may represent the elevated incidences of early and late recurrences and the magnitude of the hazard rates at the peaks may reveal the intensity of the failure rates. Therefore we recommend that clinicians, practitioners and analysts consider using this model along with other models, and compare it to the models they ordinarily use before making any decision as to which model would provide the best fit and prediction.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.