A parametric method for cumulative incidence modeling with a new four-parameter log-logistic distribution
Theoretical Biology and Medical Modelling volume 8, Article number: 43 (2011)
Abstract
Background
Competing risks are frequently encountered in medical studies, and data of this kind must be analyzed with appropriate methods. One key quantity in competing risks is the cumulative incidence function, which is modeled in most studies using non- or semi-parametric methods. However, parametric models are required in some cases to ensure maximum efficiency, and to fit various shapes of hazard function.
Methods
We have used the stable distributions family of Hougaard to propose a new four-parameter distribution by extending a two-parameter log-logistic distribution, and carried out a simulation study to compare the cumulative incidence estimated with this distribution with the estimates obtained using a non-parametric method. To test our approach in a practical application, the model was applied to a set of real data on fertility history.
Conclusions
The results of simulation studies showed that the estimated cumulative incidence function was more accurate than non-parametric estimates in some settings. Analyses of real data indicated that the proposed distribution showed a much better fit to the data than the other distributions tested. Therefore, the new distribution is recommended for practical applications to parameterize the cumulative incidence function in competing risk settings.
Background
In medical research with time-to-event data, there may be more than one final outcome of interest, and this circumstance can complicate the statistical analysis. In such cases, events other than the desired one(s) are considered as competing risks when their occurrence prevents the event of interest [1, 2]. An important quantity in competing risk settings is the cumulative incidence function (CIF), which gives the probability that a particular event has occurred by a given time. In contrast, the cause-specific hazard function (CSHF) gives the instantaneous rate of the event. For example, in fertility studies in women, researchers are interested in calculating the cumulative live birth rate over time in the presence of competing risks. The cumulative probabilities of competing events, such as stillbirth or abortion, can be calculated in the same way.
In most competing risk analyses, the CIF is estimated non- or semi-parametrically [3, 4]. However, the parametric model is another available approach for modeling CIF. The advantage of parametric methods compared to non- and semi-parametric ones is that if a parametric model is selected correctly, it can predict the probability of the occurrence of events in the long term and provide additional insights about the time to failure and hazard functions [5]. Also, when the survival pattern follows a particular parametric model, the estimates from the correctly specified model are usually more accurate than the non-parametric estimates.
The best known distributions for modeling CIF are the Weibull and Gompertz distributions. However, these are suitable only for hazard functions that increase or decrease monotonically; they are inadequate when the hazard function shape is unimodal. In such cases, simple distributions such as the two-parameter log-logistic or log-normal distributions are likely to be better choices. One approach to the construction of flexible parametric models is to add a shape parameter to provide a wide range of hazard shapes and improve the fit to survival data. In 1996, Mudholkar et al. proposed a generalized Weibull family with a range of hazard shapes [6] and Foucher et al. in 2005 applied this distribution in semi-Markov models [7]. In 2006, Sparling et al. presented a three-parameter family of survival distributions that included the Weibull, negative binomial, and log-logistic distributions as special cases [8]. These distributions can fit U-shaped or unimodal hazard functions, and therefore can be appropriate for survival data.
In light of the issues summarized above, a more efficient parametric distribution with various shapes of hazard patterns would appear to be useful for estimating CIF in competing risk situations. In recent years, various parametric distributions that offer more flexibility have been developed specifically for analyzing competing risk data. For example, in 2006 Jeong introduced a new parametric distribution for modeling CIF [5]. In 2009, Wahed et al. extended the Weibull distribution, resulting in a four-parameter beta-Weibull distribution for use in competing risks [9]. Here, we propose a new four-parameter log-logistic distribution, obtained by extending a two-parameter log-logistic distribution, that accommodates different kinds of hazard shapes in survival data and increases the efficiency of the CIF estimate over the non-parametric approaches. Also, this is an improper distribution, which gives it more flexibility for modeling CIF. Therefore, it would be suitable for competing risk models. We have performed a simulation study to compare CIF estimates obtained with the four-parameter distribution and a non-parametric method. After using simulated data to assess the method, we analyzed a real data set to examine the efficiency of our proposed distribution.
Methods
Introduction of the new distribution
The survival function according to a two-parameter log-logistic distribution is as follows:
$$S(t) = \frac{1}{1 + \lambda t^{\tau}} \tag{1}$$
where λ > 0 and τ > 0 are the scale and shape parameters, respectively. If τ ≤ 1, the hazard function decreases monotonically, whereas if τ > 1, the hazard function is unimodal [10].
Survival function of the four-parameter log-logistic distribution
The two-parameter log-logistic distribution is expanded on the basis of the family of Hougaard stable distributions, whose survival function is as follows:
$$S(t) = \exp\left\{-\frac{\upsilon\left[\left(\theta + H(t)\right)^{\alpha} - \theta^{\alpha}\right]}{\alpha}\right\} \tag{2}$$
where H is the cumulative hazard function [11]. If the two-parameter log-logistic cumulative hazard function, H(t) = log(1 + λt^{τ}), is used for H, we obtain a new distribution that can be improper. In addition, to reduce the number of parameters, the substitution υ = θ^{2−α} is used [12]. The survival function of the new distribution is constructed as:
$$S(t) = \exp\left\{-\frac{\theta^{2}\left[\left(1 + \frac{\log(1 + \lambda t^{\tau})}{\theta}\right)^{\alpha} - 1\right]}{\alpha}\right\} \tag{3}$$
where the parameter space is θ > 0, λ > 0, τ > 0, −∞ < α < ∞. The survival function lies between zero and one, as shown in the Appendix. If α < 0, the survival function is improper: as t increases it tends to exp(θ^{2}/α) > 0 rather than to zero. This is an important characteristic for CIF modeling that distinguishes the new distribution from the two-parameter log-logistic distribution and other distributions.
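The behaviour described above can be checked numerically. The following is a minimal sketch, assuming the survival function has the form S(t) = exp{−θ²[(1 + log(1 + λt^τ)/θ)^α − 1]/α} implied by the Appendix; the function names are illustrative and not taken from the paper, and α = 0 is simply excluded.

```python
import math


def loglogistic2_survival(t, lam, tau):
    """Two-parameter log-logistic survival: S(t) = 1 / (1 + lam * t**tau)."""
    return 1.0 / (1.0 + lam * t ** tau)


def loglogistic4_survival(t, lam, tau, theta, alpha):
    """Assumed four-parameter form:
    S(t) = exp(-theta**2 * ((1 + H(t)/theta)**alpha - 1) / alpha),
    where H(t) = log(1 + lam * t**tau)."""
    if alpha == 0:
        raise ValueError("alpha = 0 is not handled in this sketch")
    cum_hazard = math.log1p(lam * t ** tau)
    return math.exp(-theta ** 2 * ((1.0 + cum_hazard / theta) ** alpha - 1.0) / alpha)


def survival_plateau(theta, alpha):
    """Limit of S(t) as t grows: exp(theta**2 / alpha) if alpha < 0, else 0."""
    return math.exp(theta ** 2 / alpha) if alpha < 0 else 0.0


if __name__ == "__main__":
    lam, tau = 0.3, 2.97  # scale and shape used in the simulation study
    for t in (0.5, 1.0, 2.0, 5.0):
        proper = loglogistic4_survival(t, lam, tau, theta=1.0, alpha=1.0)
        improper = loglogistic4_survival(t, lam, tau, theta=1.0, alpha=-2.0)
        print(t, round(proper, 4), round(improper, 4))
    print("plateau for alpha = -2:", round(survival_plateau(1.0, -2.0), 4))
```

Under this assumed form, setting θ = 1 and α = 1 recovers the two-parameter log-logistic survival function, and for α < 0 the curve levels off above zero, which is the improper behaviour that makes the distribution usable for a CIF.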
Hazard function
The hazard function can be directly obtained from equation (3), as:
$$h(t) = \frac{\theta\,\lambda\tau t^{\tau-1}}{1 + \lambda t^{\tau}}\left(1 + \frac{\log(1 + \lambda t^{\tau})}{\theta}\right)^{\alpha-1} \tag{4}$$
Because of the complexity of this formula, there is no simple closed-form condition on the parameters that characterizes each type of hazard shape. The flexibility of the hazard function is shown in Figure 1. Compared to the two-parameter model, the four-parameter log-logistic distribution has a flexible hazard function that can be monotonically decreasing or increasing, unimodal, or U-shaped.
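As a sanity check on the hazard, the sketch below differentiates the assumed cumulative hazard θ²[(1 + log(1 + λt^τ)/θ)^α − 1]/α analytically and compares the result with a central finite difference. The parameter values are illustrative only and are not fitted estimates from the paper; the helper names are hypothetical.

```python
import math


def cumulative_hazard4(t, lam, tau, theta, alpha):
    """Assumed cumulative hazard: theta**2 * ((1 + H/theta)**alpha - 1) / alpha."""
    base = math.log1p(lam * t ** tau)  # two-parameter log-logistic cumulative hazard
    return theta ** 2 * ((1.0 + base / theta) ** alpha - 1.0) / alpha


def hazard4(t, lam, tau, theta, alpha):
    """Derivative of cumulative_hazard4 with respect to t (requires t > 0)."""
    base = math.log1p(lam * t ** tau)
    base_hazard = lam * tau * t ** (tau - 1.0) / (1.0 + lam * t ** tau)
    return theta * (1.0 + base / theta) ** (alpha - 1.0) * base_hazard


def finite_difference_hazard(t, params, step=1e-5):
    """Central-difference approximation, used only as a numerical check."""
    upper = cumulative_hazard4(t + step, *params)
    lower = cumulative_hazard4(t - step, *params)
    return (upper - lower) / (2.0 * step)


if __name__ == "__main__":
    params = (0.3, 2.97, 2.0, -0.5)  # illustrative values, not fitted estimates
    for t in (0.5, 1.0, 2.0, 4.0):
        print(t, hazard4(t, *params), finite_difference_hazard(t, params))
```

Evaluating `hazard4` on a grid of times for different (θ, α) pairs is one simple way to explore which hazard shapes a given parameter combination produces.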
Cumulative incidence function
Competing risks data are represented as a pair (T, δ), where T is the time to the first event or censoring and δ is the indicator variable, defined as δ = 0 if the observation is censored and as δ = 1,2,...,K otherwise, where K is the number of competing events. The two major quantities in the analysis of competing risks data are CSHF and CIF. The CSHF for event k is the instantaneous rate of event k at time t among subjects who have experienced no event of any type up to t. The CIF for event k, F_{k}(t) = P(T ≤ t, δ = k), is the cumulative probability of observing event k by time t. The CIF for event k is defined as follows:
$$F_{k}(t) = \int_{0}^{t} S(u)\,h_{k}(u)\,du \tag{5}$$
where S(u) = P(T > u) and h_{k}(u) is the hazard function for the k th cause-specific event. In the literature, parametric methods are proposed to estimate CIF with the CSHF method [5, 9, 13]. Here we have also used the CSHF method to model CIF.
To estimate the CIF non-parametrically, the overall survival function should be replaced with the Kaplan-Meier estimate and the cause-specific cumulative hazard function with the Nelson-Aalen estimate [3].
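The non-parametric estimate described here can be sketched in a few lines: at each distinct event time, the Kaplan-Meier estimate of overall survival just before that time is multiplied by the Nelson-Aalen increment for the cause of interest. This is an illustrative standard-library implementation, not the "cuminc" routine used in the paper; the function name and the coding of causes (0 = censored) are assumptions.

```python
def nonparametric_cif(times, causes, cause):
    """Return [(event_time, CIF estimate)] for the requested cause.

    causes uses 0 for censored observations and 1, 2, ... for event types.
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0  # Kaplan-Meier estimate of overall survival just before t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        j = i
        d_all = d_k = 0
        while j < len(data) and data[j][0] == t:  # handle ties at time t
            if data[j][1] != 0:
                d_all += 1
            if data[j][1] == cause:
                d_k += 1
            j += 1
        if d_all > 0:
            cif += surv * d_k / n_at_risk        # S(t-) times Nelson-Aalen increment
            surv *= 1.0 - d_all / n_at_risk      # Kaplan-Meier update
            curve.append((t, cif))
        n_at_risk -= j - i
        i = j
    return curve


if __name__ == "__main__":
    times = [1, 2, 3, 4]
    causes = [1, 2, 1, 0]
    print(nonparametric_cif(times, causes, cause=1))
    print(nonparametric_cif(times, causes, cause=2))
```

When there is no censoring, this estimate reduces to the observed proportion of subjects who have experienced the event of interest by time t.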
Estimation method
For convenience, we have assumed throughout this paper that there were two events: the desired event k = 1 and a competing event k = 2; and that n is the sample size. Because the two events are mutually exclusive, the overall survival function factors into a product of two cause-specific survival functions, i.e. S(t, ψ) = S_{1}(t, ψ_{1}) S_{2}(t, ψ_{2}). Therefore, the likelihood function for parametric inference is constructed as:
$$L(\psi_{1}, \psi_{2}) = \prod_{i=1}^{n}\prod_{k=1}^{2} f_{k}(t_{i}, \psi_{k})^{\delta_{ki}}\, S_{k}(t_{i}, \psi_{k})^{1-\delta_{ki}}$$
where ψ_{k} = (λ_{k}, τ_{k}, θ_{k}, α_{k}) is the parameter vector for event k, S_{k}(t, ψ_{k}) is the survival function for event k, and f_{k}(t, ψ_{k}) is the density function of event k based on a four-parameter log-logistic distribution.
If event k occurs, δ_{ki} = 1; otherwise δ_{ki} = 0 (k = 1,2, i = 1,2,...,n). The covariance matrix of the estimates is estimated by the inverse of the Fisher information matrix, $I^{-1}(\widehat{\psi}_{1}, \widehat{\psi}_{2})$ [14]. According to the invariance property of the maximum likelihood estimate (MLE), the CIF is estimated by substituting $\widehat{\psi}$ in expression (5), which yields $\widehat{F}_{k}(t) = \int_{0}^{t} \widehat{S}(u)\,\widehat{h}_{k}(u)\,du$.
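The estimation steps can be illustrated with a short sketch. It assumes the four-parameter cumulative hazard θ²[(1 + log(1 + λt^τ)/θ)^α − 1]/α for each cause, writes the log-likelihood in the equivalent hazard form (each subject contributes log h_k(t) for the observed cause k, minus the two cumulative hazards), and approximates the CIF integral with a midpoint rule. The function names and the parameter values in the example are hypothetical; the paper itself used PROC NLMIXED for the maximization, which is not reproduced here.

```python
import math


def cumulative_hazard(t, lam, tau, theta, alpha):
    """Assumed four-parameter cumulative hazard, equal to -log S_k(t)."""
    base = math.log1p(lam * t ** tau)
    return theta ** 2 * ((1.0 + base / theta) ** alpha - 1.0) / alpha


def hazard(t, lam, tau, theta, alpha):
    """Derivative of cumulative_hazard with respect to t (t > 0)."""
    base = math.log1p(lam * t ** tau)
    base_hazard = lam * tau * t ** (tau - 1.0) / (1.0 + lam * t ** tau)
    return theta * (1.0 + base / theta) ** (alpha - 1.0) * base_hazard


def negative_log_likelihood(psi1, psi2, times, causes):
    """causes: 0 = censored, 1 = event of interest, 2 = competing event.
    Each psi is (lam, tau, theta, alpha)."""
    total = 0.0
    for t, cause in zip(times, causes):
        # every subject contributes -log S_1(t) - log S_2(t)
        total += cumulative_hazard(t, *psi1) + cumulative_hazard(t, *psi2)
        if cause == 1:
            total -= math.log(hazard(t, *psi1))
        elif cause == 2:
            total -= math.log(hazard(t, *psi2))
    return total


def parametric_cif(t, cause, psi1, psi2, steps=2000):
    """Midpoint-rule approximation of F_k(t) = int_0^t S(u) h_k(u) du."""
    psi_k = psi1 if cause == 1 else psi2
    width = t / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * width
        overall = math.exp(-cumulative_hazard(u, *psi1) - cumulative_hazard(u, *psi2))
        total += overall * hazard(u, *psi_k) * width
    return total


if __name__ == "__main__":
    psi1 = (0.3, 2.97, 1.5, -0.8)  # hypothetical parameter values
    psi2 = (0.03, 1.1, 1.0, 0.5)
    print(parametric_cif(3.0, 1, psi1, psi2), parametric_cif(3.0, 2, psi1, psi2))
```

A useful check on any such implementation is that the two cumulative incidences and the overall survival add up to one at every time point, since each subject is either event-free or has experienced exactly one of the two events.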
Simulation study
A simulation study was used to compare the cumulative incidence estimate of the proposed distribution with a three-parameter distribution proposed by Sparling [8] and the non-parametric method at different times. As described by Beyersmann et al. in 2009, we first simulated survival times T with all-cause hazards h_{1}(t) + h_{2}(t) on the basis of a two-parameter log-logistic distribution, with λ_{1} = 0.3, τ_{1} = 2.97 for the event of interest and λ_{2} = 0.03, τ_{2} = 1.1 for the competing event (based on fertility data). The event type was then determined by a binomial experiment with probability h_{1}(t)/(h_{1}(t) + h_{2}(t)) on event type 1 [15, 16]. Additionally, we generated censoring times with a binomial experiment. The data sets were simulated with size n = 1000 and a 7% censoring level. We then fitted the four-parameter log-logistic distribution, the Sparling distribution and the non-parametric method to these data. In total, 1000 samples were generated and the bias and empirical mean square error (MSE) of the CIF at time t were calculated as follows:
$$\mathrm{Bias}(t) = \frac{1}{1000}\sum_{j=1}^{1000}\left(\widehat{F}_{1}^{(j)}(t) - F_{1}(t)\right), \qquad \mathrm{MSE}(t) = \frac{1}{1000}\sum_{j=1}^{1000}\left(\widehat{F}_{1}^{(j)}(t) - F_{1}(t)\right)^{2}$$
where $\widehat{F}_{1}^{(j)}(t)$ is the CIF estimate from the j th sample and F_{1}(t) is the true value of CIF at time t [17].
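The data-generating scheme can be sketched as follows: draw T by inverting the overall survival function implied by the all-cause hazard, then pick the event type with probability h_{1}(T)/(h_{1}(T) + h_{2}(T)). This is a simplified illustration under stated assumptions, not the authors' code. Censoring is left out because the text does not spell out the mechanism, the estimator evaluated is the plain empirical proportion (which is what the non-parametric CIF reduces to without censoring), and the default sample size and number of replicates are smaller than the 1000 used in the paper so that the example runs quickly.

```python
import math
import random


def hazard2(t, lam, tau):
    """Two-parameter log-logistic hazard."""
    return lam * tau * t ** (tau - 1.0) / (1.0 + lam * t ** tau)


def overall_survival(t, p1, p2):
    """P(T > t) when the all-cause hazard is h_1 + h_2."""
    return 1.0 / ((1.0 + p1[0] * t ** p1[1]) * (1.0 + p2[0] * t ** p2[1]))


def draw_subject(rng, p1, p2):
    """One (time, cause) pair: invert the overall survival, then pick the cause."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    lo, hi = 0.0, 1.0
    while overall_survival(hi, p1, p2) > u:
        hi *= 2.0
    for _ in range(60):  # bisection on the decreasing function S(t) - u
        mid = 0.5 * (lo + hi)
        if overall_survival(mid, p1, p2) > u:
            lo = mid
        else:
            hi = mid
    t = 0.5 * (lo + hi)
    h1, h2 = hazard2(t, *p1), hazard2(t, *p2)
    cause = 1 if rng.random() < h1 / (h1 + h2) else 2
    return t, cause


def true_cif1(t, p1, p2, steps=2000):
    """Midpoint-rule value of int_0^t S(u) h_1(u) du."""
    width = t / steps
    total = 0.0
    for j in range(steps):
        u = (j + 0.5) * width
        total += overall_survival(u, p1, p2) * hazard2(u, *p1)
    return total * width


def bias_and_mse(t, p1, p2, n=200, replicates=100, seed=1):
    """Bias and empirical MSE of the empirical CIF for event 1 at time t."""
    rng = random.Random(seed)
    target = true_cif1(t, p1, p2)
    errors = []
    for _ in range(replicates):
        sample = [draw_subject(rng, p1, p2) for _ in range(n)]
        estimate = sum(1 for time, cause in sample if time <= t and cause == 1) / n
        errors.append(estimate - target)
    bias = sum(errors) / replicates
    mse = sum(e * e for e in errors) / replicates
    return bias, mse


if __name__ == "__main__":
    p1, p2 = (0.3, 2.97), (0.03, 1.1)  # (lambda, tau) for each event, from the text
    print(bias_and_mse(1.5, p1, p2))
```

Replacing the empirical proportion with a parametric CIF estimate obtained by maximum likelihood would give the corresponding bias and MSE for a parametric model.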
To test the efficiency of the parametric distribution proposed here, we used another simulation study. Failure times were generated on the basis of a two-parameter Weibull distribution with k_{1} = 1.4, p_{1} = 0.45 for the event of interest and k_{2} = 1.04, p_{2} = 0.03 for the competing event. We used the same method to fit the new distribution to these data.
The maximum likelihood estimates of the parameter vectors were calculated by PROC NLMIXED in SAS v. 9.1, and the nonparametric estimate of CIF was obtained with the "cuminc" R function from the "cmprsk" library. Because the determination of a suitable initial value to fit the models is an important problem in numerical studies, many initial values were examined to find a suitable convergence.
Results
Table 1 summarizes the results of the first simulation, in which the four-parameter log-logistic distribution, the Sparling distribution and the non-parametric method were fitted at different times with n = 1000. The results showed that the bias and MSE of the CIF estimates obtained with the four-parameter method for the event of interest at t = 1.25 to t = 2 were smaller than with the Sparling distribution and the non-parametric method. For the competing event, the bias and MSE of the CIF estimates were lower than with the non-parametric method.
The results of the second simulation are summarized in Table 2. Up to t = 1.5, the bias and the MSE of the CIF estimates obtained with the non-parametric method for the event of interest were lower than with the four-parameter method, but after t = 2, the bias and MSE of the CIF estimates for the competing event with the new distribution were equivalent or slightly lower than with the non-parametric method. For the competing event, the bias and MSE of the CIF estimates were lower than with the non-parametric method at all times.
In summary, these two simulations indicate that the four-parameter modeling of CIF was as efficient as the non-parametric method and the Sparling distribution and sometimes led to better estimates of CIF. Moreover, the four-parameter log-logistic model performed well under a Weibull distribution.
Example: women's fertility history
We tested the proposed distribution on a set of real data. In a cross-sectional study, the fertility history of 858 women aged 15-49 years in rural areas of the Shiraz district (southwestern Iran) was reviewed (unpublished data). The women were selected by multistage random sampling from a list of villages in 2008. Only the first pregnancy of each woman was included in this study. A self-administered questionnaire regarding fertility history was used. After women with an undesired first pregnancy were excluded, the final sample consisted of 652 women. Live birth as a result of the first delivery was our desired event, and a stillborn fetus or abortion was the competing event. The event time was defined as the interval between marriage and a live birth, a competing event or censoring. Women who had not given birth on the date of interview (7% in this data set) were censored.
The estimated cumulative incidence of live births and abortions or stillborn fetuses based on the two- and four-parameter log-logistic, Weibull, Gompertz and Sparling distributions and the non-parametric estimates are shown in Figure 2. Up to time t = 3, the cumulative incidence of live births increased rapidly; thereafter, cumulative incidence tended to plateau. This means that the probability of live births during the first four years after marriage increased rapidly, and remained approximately constant thereafter. The curves also show that the four-parameter log-logistic distribution was closer to the non-parametric estimate than the other distributions at all times. For shorter intervals since marriage, the two-parameter log-logistic and Sparling distributions were closer to the non-parametric estimates than the Weibull and Gompertz distributions were. After t = 5, all distributions were close to the observed data.
Table 3 shows the Akaike information criterion (AIC), Bayesian information criterion (BIC) and estimated cumulative incidence for the two events at different times. Based on the AIC and BIC criteria, the four-parameter log-logistic model, which had the lowest AIC and BIC, showed a better fit to the data than the two-parameter log-logistic, Sparling, Weibull or Gompertz distributions. Because the two-parameter log-logistic distribution is nested within the Sparling and the four-parameter log-logistic distributions, we can compute likelihood-ratio chi-square statistics to test the fit of the nested models. The likelihood-ratio chi-square statistics and their corresponding p-values are:
χ^{2} = 69.2, df = 1, p < 0.001 for two-parameter log-logistic versus Sparling and χ^{2} = 217.1, df = 2, p < 0.001 for two-parameter log-logistic versus four-parameter log-logistic. The likelihood-ratio test, AIC and BIC show that the four-parameter log-logistic distribution fits the data better than the two-parameter log-logistic and Sparling distributions. These results confirm the findings in Figure 2, and again indicate that the proposed distribution shows a closer fit to the observed data than the other distributions to which it is compared.
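The reported p-values can be reproduced from the chi-square statistics alone. The sketch below uses the closed-form upper tail of the chi-square distribution for 1 and 2 degrees of freedom, so no statistics library is needed; the AIC and BIC helpers are the standard definitions and are included only for illustration, since the log-likelihoods themselves are not given in the text. All function names are hypothetical.

```python
import math


def chi_square_upper_tail(statistic, df):
    """P(X > statistic) for a chi-square variable; closed forms for df = 1 and 2."""
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise NotImplementedError("only df = 1 and df = 2 are covered in this sketch")


def aic(log_likelihood, n_parameters):
    """Akaike information criterion: 2p - 2 log L."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


def bic(log_likelihood, n_parameters, n_observations):
    """Bayesian information criterion: p log(n) - 2 log L."""
    return n_parameters * math.log(n_observations) - 2.0 * log_likelihood


if __name__ == "__main__":
    comparisons = (
        ("two-parameter vs Sparling", 69.2, 1),
        ("two-parameter vs four-parameter", 217.1, 2),
    )
    for label, statistic, df in comparisons:
        print(label, chi_square_upper_tail(statistic, df))
```

Both tail probabilities are far below 0.001, in line with the reported p-values.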
Discussion
Although non-parametric methods such as the Kaplan-Meier approach are widely used in survival analysis and may show a very close fit to the data, they do not provide additional information about the nature of the data. Therefore, in this study our ultimate aim was to develop a new parametric distribution by extension of the two-parameter log-logistic distribution. The addition of third and fourth parameters allows the model to capture U-shaped hazards.
Our simulation study showed that the parametric estimate of CIF with the new distribution was slightly less biased and had a smaller MSE than the estimate obtained using non-parametric methods. Simulations with the two-parameter log-logistic and Weibull distributions showed that our proposed four-parameter distribution had appropriate efficiency. Also, analyses of real data indicated that the proposed distribution showed a much better fit to the data than the other distributions tested. Our results are consistent with other studies in finding that an appropriate parametric model yields more precise estimates of cumulative incidence than non-parametric methods, and is thus a potentially suitable way to describe quantities of competing risks [9, 18]. In contrast, if a parametric model is misspecified, the quantities will be estimated incorrectly, which will clearly bias the inferences [12]. However, our proposed distribution captures various hazard shapes well, which extends its applicability to a variety of survival data.
In addition to this advantage, the proposed distribution is improper for α < 0. This property makes our proposed distribution superior to other distributions such as the Weibull, two-parameter log-logistic, three-parameter Sparling and generalized Weibull models [6, 8]. This characteristic of our distribution also makes it possible to evaluate the direct effect of covariates on CIF, which is not possible in the CSHF model [19, 20]. The potential applications of direct modeling of CIF and parametric regression models with the four-parameter log-logistic distribution will be examined in forthcoming papers.
Conclusions
Despite the complexity of this distribution for modeling CIF (which is one of its limitations), the results of our simulation study and real-data application show that the new distribution achieves a much better fit to the data than other distributions that use fewer parameters. Whereas the two-parameter log-logistic is a proper distribution, the four-parameter log-logistic is an improper distribution in a subset of the parameter space (α < 0). Therefore, this distribution is suitable for parameterizing CIF directly in competing risk models. Moreover, it can be added to a family of distributions and is also potentially useful for parameterizing survival data in general.
Appendix
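The survival function of the new distribution is as follows:
$$S(t) = \exp\left\{-\frac{\theta^{2}\left[\left(\frac{\log(1 + \lambda t^{\tau})}{\theta} + 1\right)^{\alpha} - 1\right]}{\alpha}\right\}$$
The parameter space is θ > 0, τ > 0, λ > 0, −∞ < α < ∞. The survival function must be between zero and one for all values in the parameter space. This condition holds if θ^{2}{[log(1+λt^{τ})/θ + 1]^{α} − 1}/α ≥ 0. Since λ > 0, τ > 0 and θ > 0, log(1+λt^{τ})/θ is non-negative for all t ≥ 0, so log(1+λt^{τ})/θ + 1 ≥ 1. First, if α > 0, then [log(1+λt^{τ})/θ + 1]^{α} ≥ 1, so the numerator is non-negative and the condition holds. If α < 0, then [log(1+λt^{τ})/θ + 1]^{α} ≤ 1, so the numerator is non-positive while α is negative, and the ratio is again non-negative. Thus, the condition holds in both cases.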
Authors' information
Corresponding author: SMT Ayatollahi, Ph.D., FSS, C.Stat. Professor of Biostatistics, The Medical School, Shiraz University of Medical Sciences, Shiraz, Islamic Republic of Iran. P.O. Box 71345-1874.
Abbreviations
CIF: cumulative incidence function
CSHF: cause-specific hazard function
MSE: mean square error
MLE: maximum likelihood estimate
AIC: Akaike information criterion
BIC: Bayesian information criterion.
References
1. Pintilie M: Competing Risks: A Practical Perspective. 2006, Chichester: John Wiley & Sons
2. Putter H, Fiocco M, Geskus RB: Tutorial in biostatistics: Competing risks and multi-state models. Statistics in Medicine. 2007, 26: 2389-2430. 10.1002/sim.2712.
3. Gray RJ: A class of K-sample tests for comparing the cumulative incidence of a competing risk. Annals of Statistics. 1988, 16: 1141-1154. 10.1214/aos/1176350951.
4. Fine JP, Gray RJ: A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 1999, 94: 496-509. 10.2307/2670170.
5. Jeong JH: A new parametric family for modelling cumulative incidence functions: application to breast cancer data. Journal of the Royal Statistical Society, Series A. 2006, 169 (2): 289-303. 10.1111/j.1467-985X.2006.00409.x.
6. Mudholkar GS, Srivastava DK, Kollia GD: A Generalization of the Weibull Distribution with Application to the Analysis of Survival Data. Journal of the American Statistical Association. 1996, 91 (436): 1575-1583. 10.2307/2291583.
7. Foucher Y, Mathieu E, Saint-Pierre P, Durand JF, Daures JP: A Semi-Markov Model Based on Generalized Weibull Distribution with an Illustration for HIV Disease. Biometrical Journal. 2005, 47 (6): 1-9.
8. Sparling YH, Younes N, Lachin JM: Parametric survival models for interval-censored data with time-dependent covariates. Biostatistics. 2006, 7 (4): 599-614. 10.1093/biostatistics/kxj028.
9. Wahed AS, Loung M, Jeong JH: A new generalization of Weibull distribution with application to a breast cancer data set. Statistics in Medicine. 2009, 28: 2077-2094. 10.1002/sim.3598.
10. Klein JP, Moeschberger ML: Survival Analysis: Techniques for Censored and Truncated Data. 2003, New York: Springer
11. Hougaard P: Survival models for heterogeneous populations derived from stable distributions. Biometrika. 1986, 73: 387-396. 10.1093/biomet/73.2.387.
12. Haile SR: Inference on competing risks in breast cancer data. PhD Thesis, University of Pittsburgh, Biostatistics Department. 2008
13. Benichou J, Gail MH: Estimates of absolute cause-specific risk in cohort studies. Biometrics. 1990, 46: 813-826. 10.2307/2532098.
14. Jeong JH, Fine JP: Direct parametric inference for the cumulative incidence function. Applied Statistics. 2006, 55: 187-200. 10.1111/j.1467-9876.2006.00532.x.
15. Beyersmann J, Latouche A, Buchholz A, Schumacher M: Simulating competing risks data in survival analysis. Statistics in Medicine. 2009, 28: 956-971. 10.1002/sim.3516.
16. Bender R, Augustin T, Blettner M: Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine. 2005, 24: 1713-1723. 10.1002/sim.2059.
17. Burton A, Altman DG, Royston P, Holder RL: The design of simulation studies in medical statistics. Statistics in Medicine. 2006, 25: 4279-4292. 10.1002/sim.2673.
18. Cheng Y: Modeling cumulative incidences of dementia and dementia-free death using a novel three-parameter logistic function. International Journal of Biostatistics. 2009, 5 (1): Art (29)
19. Fine JP: Regression modeling of competing crude failure probabilities. Biostatistics. 2001, 2 (1): 85-97. 10.1093/biostatistics/2.1.85.
20. Jeong JH, Fine JP: Parametric regression on cumulative incidence function. Biostatistics. 2007, 8: 184-196.
Acknowledgements
This work was supported by grant number 905604 from Shiraz University of Medical Sciences, Shiraz, Islamic Republic of Iran. The authors would like to thank K. Shashok (Author AID in the Eastern Mediterranean), N. Shokrpour at Emam Reza Polyclinic and the Center for Development of Clinical Research of Nemazee Hospital and Dr J. Millward-Sadler for their editing services.
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
ZS and NZ were responsible for the design, simulation, analysis and interpretation. SMTA supervised the study and interpreted the results. All authors read and approved the final manuscript.
Zahra Shayan and Najaf Zare contributed equally to this work.
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Shayan, Z., Ayatollahi, S. & Zare, N. A parametric method for cumulative incidence modeling with a new four-parameter log-logistic distribution. Theor Biol Med Model 8, 43 (2011). https://doi.org/10.1186/1742-4682-8-43
Keywords
 Bayesian Information Criterion
 Hazard Function
 Survival Function
 Compete Risk Model
 Cumulative Incidence Function