A parametric method for cumulative incidence modeling with a new four-parameter log-logistic distribution
 Zahra Shayan†^{1},
 Seyyed Mohammad Taghi Ayatollahi^{1} and
 Najaf Zare†^{1}
DOI: 10.1186/1742-4682-8-43
© Shayan et al; licensee BioMed Central Ltd. 2011
Received: 12 July 2011
Accepted: 11 November 2011
Published: 11 November 2011
Abstract
Background
Competing risks, which are encountered particularly often in medical studies, are an important topic of concern, and appropriate analyses must be used for these data. One feature of competing risks is the cumulative incidence function, which is modeled in most studies using non- or semi-parametric methods. However, parametric models are required in some cases to ensure maximum efficiency and to fit various shapes of hazard function.
Methods
We used Hougaard's family of stable distributions to propose a new four-parameter distribution by extending the two-parameter log-logistic distribution, and carried out a simulation study to compare the cumulative incidence estimated with this distribution with the estimates obtained using a nonparametric method. To test our approach in a practical application, the model was applied to a set of real data on fertility history.
Conclusions
The results of simulation studies showed that the estimated cumulative incidence function was more accurate than nonparametric estimates in some settings. Analyses of real data indicated that the proposed distribution showed a much better fit to the data than the other distributions tested. Therefore, the new distribution is recommended for practical applications to parameterize the cumulative incidence function in competing risk settings.
Background
In medical research with time-to-event data, there may be more than one final outcome of interest, and this circumstance can complicate the statistical analysis. In such cases, events other than the desired one(s) are considered as competing risks when their occurrence prevents the event of interest [1, 2]. An important quantity in competing risk settings is the cumulative incidence function (CIF), which gives the probability that a particular event has occurred by a given time. In contrast, the cause-specific hazard function (CSHF) gives the instantaneous rate of the event. For example, in fertility studies in women, researchers are interested in calculating the cumulative live birth rate over time in the presence of competing risks. The probability of competing events, such as a stillborn fetus or abortion, can also be calculated.
In most competing risk analyses, the CIF is estimated non- or semi-parametrically [3, 4]. However, the parametric model is another available approach for modeling CIF. The advantage of parametric methods over non- and semi-parametric ones is that, if a parametric model is selected correctly, it can predict the probability of the occurrence of events in the long term and provide additional insights about the time to failure and hazard functions [5]. Also, when the survival pattern follows a particular parametric model, the estimates from the correctly specified model are usually more accurate than the nonparametric estimates.
The best known distributions for modeling CIF are the Weibull and Gompertz distributions. However, these are suitable only for hazard functions that increase or decrease monotonically; they are inadequate when the hazard function shape is unimodal. In such cases, simple distributions such as the two-parameter log-logistic or log-normal distributions are likely to be better choices. One approach to the construction of flexible parametric models is to add a shape parameter to provide a wide range of hazard shapes and improve the models in survival data. In 1996, Mudholkar et al. proposed a generalized Weibull family with a range of hazard shapes [6] and Foucher et al. in 2005 applied this distribution in semi-Markov models [7]. In 2006, Sparling et al. presented a three-parameter family of survival distributions that included the Weibull, negative binomial, and log-logistic distributions as special cases [8]. These distributions can fit U-shapes or unimodal shapes for the hazard function, and therefore can be appropriate for survival data.
In light of the issues summarized above, a more efficient parametric distribution with various shapes of hazard patterns would appear to be useful for estimating CIF in competing risk situations. In recent years, various parametric distributions that offer more flexibility have been developed specifically for analyzing competing risk data. For example, in 2006 Jeong introduced a new parametric distribution for modeling CIF [5]. In 2009, Wahed et al. extended the Weibull distribution, resulting in a four-parameter beta-Weibull distribution for use in competing risks [9]. Here, we propose a new four-parameter log-logistic distribution by extension of the two-parameter log-logistic distribution. It accommodates different kinds of hazard shapes in survival data and increases the efficiency of the CIF estimate over the nonparametric approaches. Also, it is an improper distribution, which gives it more flexibility for modeling CIF and makes it suitable for competing risk models. We have performed a simulation study to compare CIF estimates obtained with the four-parameter distribution and a nonparametric method. After using simulated data to assess the method, we analyzed a real data set to examine the efficiency of our proposed distribution.
Methods
Introduction of the new distribution
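The survival function of the two-parameter log-logistic distribution is $S(t) = 1/(1 + \lambda t^{\tau})$, where λ > 0 and τ > 0 are the scale and shape parameters, respectively. If τ ≤ 1, the hazard function decreases monotonically, whereas if τ > 1, the hazard function is unimodal [10].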
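The two hazard shapes described above can be checked numerically. The following is a minimal sketch that assumes the common parameterization S(t) = 1/(1 + λt^τ), for which h(t) = λτt^(τ−1)/(1 + λt^τ); the function names are illustrative and are not taken from the paper.

```python
def loglogistic_survival(t, lam, tau):
    """Survival function S(t) = 1 / (1 + lam * t**tau) (assumed parameterization)."""
    return 1.0 / (1.0 + lam * t ** tau)


def loglogistic_hazard(t, lam, tau):
    """Hazard h(t) = lam * tau * t**(tau - 1) / (1 + lam * t**tau)."""
    return lam * tau * t ** (tau - 1.0) / (1.0 + lam * t ** tau)


def hazard_mode(lam, tau):
    """Time at which the hazard peaks when tau > 1; None when it is monotone.

    Setting d/dt log h(t) = 0 gives lam * t**tau = tau - 1.
    """
    if tau <= 1.0:
        return None
    return ((tau - 1.0) / lam) ** (1.0 / tau)


if __name__ == "__main__":
    grid = [0.25 * i for i in range(1, 21)]
    for tau in (0.8, 2.0):
        values = [loglogistic_hazard(t, 1.0, tau) for t in grid]
        print("tau =", tau, "mode =", hazard_mode(1.0, tau))
        print([round(v, 3) for v in values[:8]])
```

With λ = 1, the hazard for τ = 0.8 falls steadily over the grid, while for τ = 2 it rises to a peak at t = 1 and then declines.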
Survival function of the four-parameter log-logistic distribution
where the parameter space is θ > 0, λ > 0, τ > 0, −∞ < α < ∞. The survival function must be between zero and one, as shown in the Appendix. If α < 0, the survival function is improper. This is an important characteristic for CIF modeling, and it distinguishes the new distribution from the two-parameter log-logistic distribution and other distributions.
Hazard function
Cumulative incidence function
The CIF for event k is $F_{k}(t) = \int_{0}^{t} S(u)\,h_{k}(u)\,du$ (5), where S(u) = P(T > u) and h_{ k }(u) is the hazard function for the kth cause-specific event. In the literature, parametric methods have been proposed to estimate CIF with the CSHF method [5, 9, 13]. Here we have also used the CSHF method to model CIF.
To estimate the CIF nonparametrically, the overall survival function should be replaced with the Kaplan-Meier estimate and the cause-specific cumulative hazard function with the Nelson-Aalen estimate [3].
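The nonparametric estimator described above can be sketched in a few lines. This is an illustrative implementation rather than the code used in the study: at each distinct event time it adds the Kaplan-Meier survival just before that time multiplied by the cause-specific Nelson-Aalen increment d_k/n. The coding of `causes` (0 for censored, 1 or 2 for the event type) and the toy data in the usage note are assumptions.

```python
def nonparametric_cif(times, causes, n_causes=2):
    """Return (event_times, cif), where cif[k][j] estimates F_{k+1} at event_times[j].

    times  : observed times (event or censoring)
    causes : 0 for censored, 1..n_causes for the type of event
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    surv = 1.0                      # Kaplan-Meier estimate just before the current time
    running = [0.0] * n_causes      # current CIF value for each cause
    event_times = []
    cif = [[] for _ in range(n_causes)]
    i = 0
    while i < len(data):
        t = data[i][0]
        counts = [0] * (n_causes + 1)
        j = i
        while j < len(data) and data[j][0] == t:   # collect ties at time t
            counts[data[j][1]] += 1
            j += 1
        n_events = sum(counts[1:])
        if n_events > 0:
            for k in range(n_causes):
                # S(t-) times the Nelson-Aalen increment for cause k+1
                running[k] += surv * counts[k + 1] / n_at_risk
                cif[k].append(running[k])
            event_times.append(t)
            surv *= 1.0 - n_events / n_at_risk     # all-cause Kaplan-Meier update
        n_at_risk -= j - i
        i = j
    return event_times, cif
```

For example, `nonparametric_cif([1, 2, 3, 4], [1, 2, 0, 1])` has event times 1, 2 and 4, and the two estimated CIFs at the last event time add up to one minus the Kaplan-Meier estimate.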
Estimation method
where ψ_{ k } = (λ_{ k } , τ_{ k } , θ_{ k } , α_{ k } ) is the parameter vector for event k, S_{ k } (t, ψ_{ k } ) is the survival function for event k, and f_{ k } (t, ψ_{ k } ) is the density function of event k based on a four-parameter log-logistic distribution.
If event k occurs, δ_{ ki } = 1; otherwise δ_{ ki } = 0 (k = 1, 2; i = 1, 2, ..., n). The covariance matrix, $I^{-1}(\widehat{\psi}_{1}, \widehat{\psi}_{2})$, is estimated by the inverse of the Fisher information matrix [14]. According to the invariance property of the maximum likelihood estimate (MLE), the CIF is estimated by substituting $\widehat{\psi}$ in expression (5), which yields $\widehat{F}_{k}(t) = \int_{0}^{t} \widehat{S}(u)\,\widehat{h}_{k}(u)\,du$.
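To make the roles of δ_{ ki }, f_{ k } and S_{ k } concrete, the sketch below evaluates a log-likelihood in which each subject contributes the density of event k when δ_{ ki } = 1 and the survival function of event k otherwise. Both this form of the likelihood and the use of the two-parameter log-logistic distribution (with S(t) = 1/(1 + λt^τ)) as a stand-in for the four-parameter distribution are assumptions made for illustration.

```python
import math


def loglogistic_log_density(t, lam, tau):
    # log f(t) with f(t) = lam * tau * t**(tau - 1) / (1 + lam * t**tau)**2
    return (math.log(lam) + math.log(tau) + (tau - 1.0) * math.log(t)
            - 2.0 * math.log1p(lam * t ** tau))


def loglogistic_log_survival(t, lam, tau):
    # log S(t) with S(t) = 1 / (1 + lam * t**tau)
    return -math.log1p(lam * t ** tau)


def log_likelihood(times, causes, params):
    """Log-likelihood for cause-specific models.

    params[k-1] = (lam_k, tau_k) for event k; causes[i] = 0 means censored.
    """
    total = 0.0
    for t, cause in zip(times, causes):
        for k, (lam, tau) in enumerate(params, start=1):
            if cause == k:      # delta_ki = 1: density contribution
                total += loglogistic_log_density(t, lam, tau)
            else:               # delta_ki = 0: survival contribution
                total += loglogistic_log_survival(t, lam, tau)
    return total
```

In practice the negative of such a function would be passed to a numerical optimizer, and the fitted parameters would then be substituted into the CIF integral.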
Simulation study
where F_{ 1 }(t) is the true value of CIF at time t [17].
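The performance measures can be computed from the replicate estimates at a fixed time point. The sketch below assumes the usual definitions (bias as the mean estimate minus the true value, MSE as the mean squared deviation from the true value); the replicate values in the usage note are invented purely for illustration.

```python
def bias_and_mse(estimates, true_value):
    """Bias and mean square error of replicate CIF estimates at one time point."""
    n = len(estimates)
    bias = sum(estimates) / n - true_value
    mse = sum((e - true_value) ** 2 for e in estimates) / n
    return bias, mse
```

For instance, `bias_and_mse([0.10, 0.12, 0.08], 0.11)` returns a bias of about -0.01, since the mean of the three hypothetical estimates is 0.10.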
To test the efficiency of the parametric distribution proposed here, we carried out a second simulation study. Failure times were generated on the basis of a two-parameter Weibull distribution with k_{ 1 } = 1.4, p_{ 1 } = 0.45 for the event of interest and k_{ 2 } = 1.04, p_{ 2 } = 0.03 for the competing event. We used the same method to fit the new distribution to these data.
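The Weibull setting above can be explored with a short sketch. It assumes the parameterization H_k(t) = (p_k t)^(k_k) for the cause-specific cumulative hazards and generates data as the minimum of two independent Weibull times without censoring; neither choice is stated in the text, and the sample size is arbitrary. Under these assumptions, numerical integration gives true CIF values for the event of interest of roughly 0.19 at t = 0.75 and 0.27 at t = 1.00.

```python
import math
import random

SHAPE = {1: 1.4, 2: 1.04}    # k_1, k_2 from the text
SCALE = {1: 0.45, 2: 0.03}   # p_1, p_2 from the text


def cum_hazard(t, cause):
    # Assumed parameterization: H_k(t) = (p_k * t) ** k_k
    return (SCALE[cause] * t) ** SHAPE[cause]


def true_cif(t, cause, steps=20000):
    """Midpoint-rule approximation of F_k(t) = int_0^t S(u) h_k(u) du."""
    width = t / steps
    total = 0.0
    for j in range(steps):
        left, right = j * width, (j + 1) * width
        mid = 0.5 * (left + right)
        surv = math.exp(-cum_hazard(mid, 1) - cum_hazard(mid, 2))
        # h_k(u) du is approximated by the increment of the cumulative hazard
        total += surv * (cum_hazard(right, cause) - cum_hazard(left, cause))
    return total


def simulate(n, rng):
    """Draw (time, cause) pairs as the minimum of two independent Weibull times."""
    sample = []
    for _ in range(n):
        latent = {k: (-math.log(1.0 - rng.random())) ** (1.0 / SHAPE[k]) / SCALE[k]
                  for k in (1, 2)}
        cause = min(latent, key=latent.get)
        sample.append((latent[cause], cause))
    return sample


if __name__ == "__main__":
    rng = random.Random(2011)
    sample = simulate(20000, rng)
    for t in (0.75, 1.00, 2.00, 5.00):
        empirical = sum(1 for (u, c) in sample if c == 1 and u <= t) / len(sample)
        print(t, round(true_cif(t, 1), 4), round(empirical, 4))
```

Because there is no censoring in this sketch, the empirical proportion of type 1 events by time t is itself a nonparametric CIF estimate and should agree with the integral up to sampling error.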
The maximum likelihood estimates of the parameter vectors were calculated by PROC NLMIXED in SAS v. 9.1, and the nonparametric estimate of CIF was obtained with the "cuminc" R function from the "cmprsk" library. Because the choice of suitable initial values is an important problem when fitting models numerically, many initial values were examined until the estimation converged satisfactorily.
Results
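The results of parametric and nonparametric estimates of CIF based on a four-parameter log-logistic and Sparling simulation for different times.

| Event | Distribution | Quantity | t = 0.75 | t = 1.00 | t = 1.25 | t = 1.50 | t = 2.00 | t = 3.00 | t = 5.00 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | True value | CIF | 0.11 | 0.23 | 0.36 | 0.49 | 0.68 | 0.85 | 0.92 |
| 1 | Four-parameter log-logistic | CIF | 0.06 | 0.18 | 0.32 | 0.45 | 0.64 | 0.82 | 0.91 |
| 1 | Four-parameter log-logistic | Bias | 0.05 | 0.05 | 0.04 | 0.04 | 0.04 | 0.03 | 0.01 |
| 1 | Four-parameter log-logistic | MSE × 10^{2} | 0.30 | 0.30 | 0.20 | 0.20 | 0.20 | 0.10 | 0.01 |
| 1 | Sparling | CIF | 0.07 | 0.17 | 0.30 | 0.44 | 0.65 | 0.83 | 0.91 |
| 1 | Sparling | Bias | 0.04 | 0.06 | 0.06 | 0.05 | 0.03 | 0.02 | 0.01 |
| 1 | Sparling | MSE × 10^{2} | 0.17 | 0.40 | 0.39 | 0.32 | 0.12 | 0.05 | 0.02 |
| 1 | Nonparametric | CIF | 0.07 | 0.18 | 0.31 | 0.44 | 0.64 | 0.82 | 0.91 |
| 1 | Nonparametric | Bias | 0.04 | 0.05 | 0.05 | 0.05 | 0.04 | 0.03 | 0.01 |
| 1 | Nonparametric | MSE × 10^{2} | 0.20 | 0.27 | 0.26 | 0.29 | 0.22 | 0.10 | 0.02 |
| 2 | True value | CIF | 0.020 | 0.030 | 0.033 | 0.037 | 0.043 | 0.050 | 0.052 |
| 2 | Four-parameter log-logistic | CIF | 0.052 | 0.054 | 0.055 | 0.055 | 0.056 | 0.057 | 0.057 |
| 2 | Four-parameter log-logistic | Bias | 0.032 | 0.024 | 0.022 | 0.018 | 0.013 | 0.007 | 0.005 |
| 2 | Four-parameter log-logistic | MSE × 10^{2} | 0.100 | 0.100 | 0.010 | 0.040 | 0.020 | 0.010 | 0.010 |
| 2 | Sparling | CIF | 0.048 | 0.053 | 0.056 | 0.058 | 0.060 | 0.061 | 0.062 |
| 2 | Sparling | Bias | 0.028 | 0.023 | 0.023 | 0.021 | 0.017 | 0.011 | 0.010 |
| 2 | Sparling | MSE × 10^{2} | 0.100 | 0.100 | 0.100 | 0.100 | 0.040 | 0.020 | 0.020 |
| 2 | Nonparametric | CIF | 0.059 | 0.059 | 0.059 | 0.059 | 0.059 | 0.059 | 0.059 |
| 2 | Nonparametric | Bias | 0.039 | 0.029 | 0.026 | 0.023 | 0.016 | 0.009 | 0.007 |
| 2 | Nonparametric | MSE × 10^{2} | 0.150 | 0.100 | 0.070 | 0.050 | 0.030 | 0.010 | 0.010 |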
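The results of parametric and nonparametric estimates of CIF based on a four-parameter log-logistic simulation for different times.

| Event | Distribution | Quantity | t = 0.75 | t = 1.00 | t = 1.25 | t = 1.50 | t = 2.00 | t = 3.00 | t = 5.00 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | True value | CIF | 0.19 | 0.27 | 0.35 | 0.43 | 0.56 | 0.75 | 0.91 |
| 1 | Four-parameter log-logistic | CIF | 0.13 | 0.21 | 0.29 | 0.37 | 0.52 | 0.73 | 0.89 |
| 1 | Four-parameter log-logistic | Bias | 0.06 | 0.06 | 0.06 | 0.06 | 0.04 | 0.02 | 0.02 |
| 1 | Four-parameter log-logistic | MSE × 10^{2} | 0.42 | 0.49 | 0.47 | 0.45 | 0.22 | 0.06 | 0.04 |
| 1 | Nonparametric | CIF | 0.14 | 0.22 | 0.30 | 0.38 | 0.52 | 0.72 | 0.89 |
| 1 | Nonparametric | Bias | 0.05 | 0.05 | 0.05 | 0.05 | 0.04 | 0.03 | 0.02 |
| 1 | Nonparametric | MSE × 10^{2} | 0.26 | 0.25 | 0.26 | 0.29 | 0.23 | 0.14 | 0.05 |
| 2 | True value | CIF | 0.017 | 0.023 | 0.027 | 0.031 | 0.037 | 0.046 | 0.051 |
| 2 | Four-parameter log-logistic | CIF | 0.021 | 0.027 | 0.032 | 0.036 | 0.043 | 0.052 | 0.058 |
| 2 | Four-parameter log-logistic | Bias | 0.004 | 0.004 | 0.005 | 0.005 | 0.006 | 0.006 | 0.007 |
| 2 | Four-parameter log-logistic | MSE × 10^{2} | 0.003 | 0.003 | 0.010 | 0.010 | 0.010 | 0.010 | 0.010 |
| 2 | Nonparametric | CIF | 0.014 | 0.014 | 0.036 | 0.036 | 0.049 | 0.055 | 0.058 |
| 2 | Nonparametric | Bias | 0.003 | 0.009 | 0.009 | 0.005 | 0.012 | 0.009 | 0.007 |
| 2 | Nonparametric | MSE × 10^{2} | 0.002 | 0.010 | 0.010 | 0.010 | 0.020 | 0.010 | 0.010 |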
In summary, these two simulations indicate that the four-parameter modeling of CIF was as efficient as the nonparametric method and the Sparling distribution and sometimes led to better estimates of CIF. Moreover, the four-parameter log-logistic model performed well under a Weibull distribution.
Example: women's fertility history
We tested the proposed distribution on a set of real data. In a cross-sectional study, the fertility history of 858 women aged 15-49 years in rural areas of the Shiraz district (southwestern Iran) was reviewed (unpublished data). The women were selected by multistage random sampling from a list of villages in 2008. Only the first pregnancy of each woman was included in this study. A self-administered questionnaire regarding fertility history was used. After women with an undesired first pregnancy were excluded, the final sample consisted of 652 women. Live birth as a result of the first delivery was our desired event, and a stillborn fetus or abortion was the competing event. The event time was defined as the interval between marriage and a live birth, a competing event or censoring. Women who had not given birth by the date of interview (7% in this data set) were censored.
The Akaike information criterion (AIC), Bayesian information criterion (BIC) and the estimates of the cumulative incidence function under competing risks based on different distributions with the nonparametric method. Time is in years.

| Distribution / event | 0.75 | 1 | 1.5 | 2 | 3 | 5 | 10 | AIC | BIC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Two-parameter log-logistic | | | | | | | | 1894.0 | 1912.0 |
| Live birth | 0.1145 | 0.2317 | 0.4946 | 0.6857 | 0.8556 | 0.9307 | 0.9497 | | |
| Stillborn fetus or abortion | 0.0189 | 0.0246 | 0.0333 | 0.0375 | 0.0457 | 0.0514 | 0.0477 | | |
| Four-parameter log-logistic | | | | | | | | 1685.3 | 1721.1 |
| Live birth | 0.0257 | 0.2373 | 0.5552 | 0.6949 | 0.8133 | 0.8876 | 0.9274 | | |
| Stillborn fetus or abortion | 0.0200 | 0.0278 | 0.0370 | 0.0419 | 0.0467 | 0.0503 | 0.0525 | | |
| Two-parameter Weibull | | | | | | | | 2195.0 | 2212.0 |
| Live birth | 0.1942 | 0.2749 | 0.4292 | 0.5626 | 0.7532 | 0.9098 | 0.9472 | | |
| Stillborn fetus or abortion | 0.0173 | 0.0225 | 0.0310 | 0.0372 | 0.0457 | 0.0507 | 0.0526 | | |
| Two-parameter Gompertz | | | | | | | | 2299.9 | 2317.9 |
| Live birth | 0.2862 | 0.3617 | 0.4890 | 0.5897 | 0.7317 | 0.8718 | 0.9425 | | |
| Stillborn fetus or abortion | 0.0185 | 0.0231 | 0.0307 | 0.0365 | 0.0441 | 0.0507 | 0.0533 | | |
| Three-parameter Sparling | | | | | | | | 1817.2 | 1856.0 |
| Live birth | 0.0856 | 0.2198 | 0.5416 | 0.7290 | 0.8539 | 0.9047 | 0.9242 | | |
| Stillborn fetus or abortion | 0.0188 | 0.253 | 0.0345 | 0.0394 | 0.0439 | 0.0473 | 0.0499 | | |
| Nonparametric | | | | | | | | | |
| Live birth | 0.0062 | 0.2601 | 0.5542 | 0.6723 | 0.8194 | 0.8934 | 0.9287 | | |
| Stillborn fetus or abortion | 0.0170 | 0.0279 | 0.0405 | 0.0437 | 0.0455 | 0.0490 | 0.0535 | | |
χ^{2} = 69.2, df = 1, p < 0.001 for two-parameter log-logistic versus Sparling and χ^{2} = 217.1, df = 2, p < 0.001 for two-parameter log-logistic versus four-parameter log-logistic. The likelihood-ratio test, AIC and BIC show that the four-parameter log-logistic distribution fits the data better than the two-parameter log-logistic and Sparling distributions. These results confirm the findings in Figure 2, and again indicate that the proposed distribution shows a closer fit to the observed data than the other distributions to which it is compared.
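The model-comparison quantities quoted above follow from standard formulas, sketched here with the standard library only. The chi-square tail probability is implemented in closed form for 1 and 2 degrees of freedom, which covers the two tests reported; the function names are illustrative.

```python
import math


def aic(log_lik, n_params):
    """Akaike information criterion: -2 log L + 2 p."""
    return -2.0 * log_lik + 2.0 * n_params


def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + p log n."""
    return -2.0 * log_lik + n_params * math.log(n_obs)


def chi2_sf(x, df):
    """Upper-tail chi-square probability for df = 1 or 2 (closed forms only)."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 are implemented in this sketch")


if __name__ == "__main__":
    print(chi2_sf(69.2, 1) < 0.001)    # two-parameter log-logistic vs Sparling
    print(chi2_sf(217.1, 2) < 0.001)   # two-parameter vs four-parameter log-logistic
```

As a consistency check, if the two-parameter log-logistic model is assumed to have four parameters in total (two per event type) and n = 652, a log-likelihood of -943.0, back-calculated from the reported AIC of 1894.0, gives a BIC of about 1911.9, close to the reported 1912.0.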
Discussion
Although nonparametric methods such as the Kaplan-Meier approach are widely used in survival analysis and may show a very close fit to the data, they do not provide additional information about the nature of the data. Therefore, in this study our ultimate aim was to develop a new parametric distribution by extension of the two-parameter log-logistic distribution. The addition of third and fourth parameters allows the model to capture U-shaped hazards.
Our simulation study showed that, in some settings, the parametric estimate of CIF with the new distribution was slightly less biased and had a smaller MSE than the estimate obtained using nonparametric methods. Simulations with the two-parameter log-logistic and Weibull distributions showed that our proposed four-parameter distribution had appropriate efficiency. Also, analyses of real data indicated that the proposed distribution showed a much better fit to the data than the other distributions tested. Our results are consistent with other studies in finding that an appropriate parametric model yields more precise estimates of cumulative incidence than nonparametric methods, and is thus a potentially suitable way to describe quantities of competing risks [9, 18]. In contrast, if a parametric model is misspecified, the quantities will be estimated incorrectly, which will clearly bias the inferences [12]. However, our proposed distribution captures various hazard shapes well, which extends its applicability to a variety of survival data.
In addition to this advantage, the proposed distribution is improper for α < 0. This property makes our proposed distribution superior to other distributions such as the Weibull, two-parameter log-logistic, three-parameter Sparling and generalized Weibull models [6, 8]. This characteristic of our distribution also makes it possible to evaluate the direct effect of covariates on CIF, which is not possible in the CSHF model [19, 20]. The potential applications of direct modeling of CIF and parametric regression models with the four-parameter log-logistic distribution will be examined in forthcoming papers.
Conclusions
Despite the complexity of this distribution for modeling CIF (which is one of its limitations), the results of our simulation study and real-data application show that the new distribution achieves a much better fit to the data than other distributions that use fewer parameters. Whereas the two-parameter log-logistic is a proper distribution, the four-parameter log-logistic is an improper distribution in a subset of the parameter space (α < 0). Therefore, this distribution is suitable for parameterizing CIF directly in competing risk models. Moreover, it can be added to the family of available distributions and is also potentially useful for parameterizing survival data in general.
Appendix
The parameter space is θ > 0, τ > 0, λ > 0, −∞ < α < ∞. The survival function must be between zero and one for all values in the parameter space. If (θ^{2}[(log(1+λt^{ τ } )/θ+1] ^{ α } /α1) > 0, then the condition holds. First, if α > 0, log(1+λt^{ τ } )/θ + 1 must be positive. Since λ > 0, τ > 0 and θ > 0, log(1+λt^{ τ } )/θ is always positive, so the condition holds for α > 0. The same result follows for the case α < 0.
Authors' information
Corresponding author: SMT Ayatollahi, Ph.D., FSS, C.Stat. Professor of Biostatistics, The Medical School, Shiraz University of Medical Sciences, Shiraz, Islamic Republic of Iran. P.O.Box 713451874.
Notes
List of abbreviations
CIF: cumulative incidence function
CSHF: cause-specific hazard function
MSE: mean square error
MLE: maximum likelihood estimate
AIC: Akaike information criterion
BIC: Bayesian information criterion.
Declarations
Acknowledgements
This work was supported by grant number 905604 from Shiraz University of Medical Sciences, Shiraz, Islamic Republic of Iran. The authors would like to thank K. Shashok (Author AID in the Eastern Mediterranean), N. Shokrpour at Emam Reza Polyclinic and the Center for Development of Clinical Research of Nemazee Hospital and Dr J. MillwardSadler for their editing services.
Authors’ Affiliations
References
1. Pintilie M: Competing Risks: A Practical Perspective. 2006, Chichester: John Wiley & Sons
2. Putter H, Fiocco M, Geskus RB: Tutorial in biostatistics: competing risks and multi-state models. Statistics in Medicine. 2007, 26: 2389-2430. 10.1002/sim.2712.
3. Gray RJ: A class of K-sample tests for comparing the cumulative incidence of a competing risk. Annals of Statistics. 1988, 16: 1141-1154. 10.1214/aos/1176350951.
4. Fine JP, Gray RJ: A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association. 1999, 94: 496-509. 10.2307/2670170.
5. Jeong JH: A new parametric family for modelling cumulative incidence functions: application to breast cancer data. Journal of the Royal Statistical Society, Series A. 2006, 169 (2): 289-303. 10.1111/j.1467-985X.2006.00409.x.
6. Mudholkar GS, Srivastava DK, Kollia GD: A Generalization of the Weibull Distribution with Application to the Analysis of Survival Data. Journal of the American Statistical Association. 1996, 91 (436): 1575-1583. 10.2307/2291583.
7. Foucher Y, Mathieu E, Saint-Pierre P, Durand JF, Daures JP: A Semi-Markov Model Based on Generalized Weibull Distribution with an Illustration for HIV Disease. Biometrical Journal. 2005, 47 (6): 1-9.
8. Sparling YH, Younes N, Lachin JM: Parametric survival models for interval-censored data with time-dependent covariates. Biostatistics. 2006, 7 (4): 599-614. 10.1093/biostatistics/kxj028.
9. Wahed AS, Loung M, Jeong JH: A new generalization of Weibull distribution with application to a breast cancer data set. Statistics in Medicine. 2009, 28: 2077-2094. 10.1002/sim.3598.
10. Klein JP, Moeschberger ML: Survival Analysis: Techniques for Censored and Truncated Data. 2003, New York: Springer
11. Hougaard P: Survival models for heterogeneous populations derived from stable distributions. Biometrika. 1986, 73: 387-396. 10.1093/biomet/73.2.387.
12. Haile SR: Inference on competing risks in breast cancer data. PhD Thesis, University of Pittsburgh, Biostatistics Department. 2008
13. Benichou J, Gail MH: Estimates of absolute cause-specific risk in cohort studies. Biometrics. 1990, 46: 813-826. 10.2307/2532098.
14. Jeong JH, Fine JP: Direct parametric inference for the cumulative incidence function. Applied Statistics. 2006, 55: 187-200. 10.1111/j.1467-9876.2006.00532.x.
15. Beyersmann J, Latouche A, Buchholz A, Schumacher M: Simulating competing risks data in survival analysis. Statistics in Medicine. 2009, 28: 956-971. 10.1002/sim.3516.
16. Bender R, Augustin T, Blettner M: Generating survival times to simulate Cox proportional hazards models. Statistics in Medicine. 2005, 24: 1713-1723. 10.1002/sim.2059.
17. Burton A, Altman DG, Royston P, Holder RL: The design of simulation studies in medical statistics. Statistics in Medicine. 2006, 25: 4279-4292. 10.1002/sim.2673.
18. Cheng Y: Modeling cumulative incidences of dementia and dementia-free death using a novel three-parameter logistic function. International Journal of Biostatistics. 2009, 5 (1): Article 29
19. Fine JP: Regression modeling of competing crude failure probabilities. Biostatistics. 2001, 2 (1): 85-97. 10.1093/biostatistics/2.1.85.
20. Jeong JH, Fine JP: Parametric regression on cumulative incidence function. Biostatistics. 2007, 8: 184-196.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.