Skip to main content

Time-homogeneous Markov process for HIV/AIDS progression under a combination treatment therapy: cohort study, South Africa



As HIV enters the human body, its main target is the CD4 cell which it turns into a factory that produces millions of other HIV particles. These HIV particles target new CD4 cells resulting in the progression of HIV infection to AIDS. A continuous depletion of CD4 cells results in opportunistic infections, for example tuberculosis (TB). The purpose of this study is to model and describe the progression of HIV/AIDS disease in an individual on antiretroviral therapy (ART) follow up using a continuous time homogeneous Markov process. A cohort of 319 HIV infected patients on ART follow up at a Wellness Clinic in Bela Bela, South Africa is used in this study. Though Markov models based on CD4 cell counts is a common approach in HIV/AIDS modelling, this paper is unique clinically in that tuberculosis (TB) co-infection is included as a covariate.


The method partitions the HIV infection period into five CD4-cell count intervals followed by the end points; death, and withdrawal from study. The effectiveness of treatment is analysed by comparing the forward transitions with the backward transitions. The effects of reaction to treatment, TB co-infection, gender and age on the transition rates are also examined. The developed models give very good fit to the data.


The results show that the strongest predictor of transition from a state of CD4 cell count greater than 750 to a state of CD4 between 500 and 750 is a negative reaction to drug therapy. Development of TB during the course of treatment is the greatest predictor of transitions to states of lower CD4 cell count. Transitions from good states to bad states are higher on male patients than their female counterparts. Patients in the cohort spend a greater proportion of their total follow-up time in higher CD4 states.


From some of these findings we conclude that there is need to monitor adverse reaction to drugs more frequently, screen HIV/AIDS patients for any signs and symptoms of TB and check for factors that may explain gender differences further.


The life cycle of HIV starts as it enters the human body, its major target being a white blood cell called T-helper cells or CD4 cells [1]. Once these cells are infected, HIV takes over and turns them into factories that produce thousands of copies of the virus. The HIV makes use of the enzyme Reverse Transcriptase to change copies of its Ribonucleic Acid (RNA) into Deoxyribose nucleic Acid (DNA). The viral DNA then enters the nucleus of the host cell and combines with cell DNA and starts making copies of viral RNA. The enzyme protease helps in assembling the viral particles into thousands of new viruses, which will bud and destroy the host cell. These new viruses will then be ready to attack other CD4 + T cells. Hence, the importance of CD4+ T cell count in understanding the progression of HIV. Depreciation of the CD4+ T cells in the human body leads to the deterioration of the human immune system, which is why the virus is called the Human Immunodeficiency Virus (HIV).

As the immune system is compromised, the individual is now prone to opportunistic infections like tuberculosis (TB). TB may occur at any stage of HIV disease and is frequently the first recognised presentation of underlying HIV [2]. HIV and TB coinfection is characterised by challenges that include poor adherence and overlapping toxicities resulting in an impaired CD4 T-cell recovery with antiretroviral therapy (ART) due to the effect of drug-drug interaction [3].

The need to address the challenges associated with HIV/AIDS progression in the presence of TB coinfection has prompted this study and also to analyse HIV disease history based CD4 multi-states and death/loss to follow-up in a single model. However, most studies use Kaplan-Meier analysis and Cox proportional hazards regression models [3,4,5]. Kaplan-Meier survival methods and Cox proportional hazards regression are commonly employed tools to model mortality and time to viral suppression and/or subsequent rebound and occasionally used to model time to CD4 recovery. However, survival models are not appropriate for all studies, particularly in the presence of competing risks and when multiple or recurrent outcomes are of interest. In particular, when modelling HIV/AIDS progression, Markov models are relatively straight forward to analyse both CD4 stage and death or loss to follow-up within a single model which survival models fail to do. Markov models can accommodate censored data, competing risks (informative censoring), multiple outcomes, recurrent outcomes, frailty and non-constant survival probabilities [6]. Examination of the conditions of the stochastic processes at various points in time, categorisation of the conditions, and examination of the external influences on the stochastic processes can be done using Markov models [7]. Markov models are favourable to the modelling of diseases in particular cases where the disease is grouped into a set of exhaustive and mutually exclusive health states, thereby forming a multi-state model [8]. History is naturally generated as the multi-states evolve over time; it contains information on previous visits, time of entry into various states, and the length of stay in states.

Continuous-time homogeneous Markov models have been used since early in the epidemic to model disease progression of HIV/AIDS patients, and there has been some recent renewed interest in the use of these models. In 1989 Longini et al. used a 5-state Markov model based on the clinical indicators of the HIV disease progression [9]. In 1998 Alioum et al. estimated the effects of gender, age, mode of transmission and ART on HIV progression using a 3-state Markov model [10]. In 2011 Reddy [11] carried out a research almost similar to Alioum et al. in South Africa. However, Reddy used a 5-state Markov model with 4 CD4 based transient states followed by the absorbing state, ARV initiation. Reddy’s model is characterised by high rates of immune deterioration since the study was carried out on ARV naïve patients. In 2009, Binquet et al. used a multi-state Markov model to analyse the impact of gender, intravenous drug use, weight loss, low haemoglobin, CD8 cell count and HIV viral load on HIV evolution in the era of highly active antiretroviral therapy (HAART) [12]. Recently, in 2013 Grover et al. assessed the impact of ART using a 5-staged multistate Markov model and went further to examine the effects of explanatory variables; age, sex and mode of transmission on the transition rates [13].

In this study, we use 7-staged continuous-time Markov model to assess the disease progression of HIV/AIDS patients receiving ART from a clinic in Bela-Bela, South Africa. The first 5 stages are based on CD4 cell counts and the end points are either death or withdrawal from study. In addition to the gender and age differences in HIV/AIDS progression, we further assess the effects of having TB as the initial marker of HIV/AIDS, developing TB during the course of treatment, developing some adverse effects to treatment (Reaction), CD4 baseline and viral load baseline. Though Markov models based on CD4 cell counts is a common approach in HIV/AIDS modelling, this paper is unique clinically in that tuberculosis (TB) co-infection is included as a covariate.

In medical research, the state of the patient at observation time is the only thing known with certainty. The researcher may know the time interval in which a transition has occurred, but not the exact time. Thus, homogeneous Markov models which are interval censored can handle such data [14]. The transition intensities, probabilities and the distribution functions associated with the times are the basic building blocks of the Markov processes [15]. For a continuous-time Markov model, transitions can occur at any (real-valued) time instant. For a time-homogeneous Markov jump process, the holding time in state i are modelled using exponential distributions. The exponential distributions may be adequate for many real-life situations, for example, time until death, and waiting time before moving to another state. However, the exponential distributions are memoryless continuous distributions, hence a limitation in the application of Markov processes. The ‘memoryless’ property could be seen as a problematic assumption in this setting. It is likely the case that patients starting on ART who respond well to treatment will continue to respond well to treatment - contradicting the Markov assumption and memoryless property.

Transition probabilities for continuous-time homogeneous models only depend on the difference between the two observation times. That is, for all t ≥ 0 the probability of moving from state i to state j is given by:

$$ {p}_{ij}\left(s,t\right)=P\left[{X}_t=j|{F}_t\right]=P\left({X}_t=j|{X}_s=i\right)=P\left({X}_{t-s}=j|{X}_0=i\right),\forall t\ge 0,t>s. $$

This is the Markov property, where F t is the natural Filtration of the stochastic process. P[X t  = j| F t ], therefore, represents the probability that the stochastic process X t is in state j at time t given the history of the process up to time t. The Markov property implies that all the history of the process is contained in the state currently occupied, X s  = i. The transition probabilities of a continuous time homogeneous Markov process X t , t ≥ 0 is given by:

$$ {p}_{ij}(t)=P\left({X}_t=j|{X}_0=i\right) $$

The equations obey the Chapman-Kolmogorov equations:

$$ {p}_{ij}\left(t+s\right)={\sum}_{k\in X}{p}_{ik}(s){p}_{kj}(t)\kern1.25em \forall s,t>0 $$

In this paper we describe, using the theory of continuous time Markov processes, and using real data on an evolving disease such as AIDS. Also, the effects of covariates, including TB, on baseline transition rates is considered. Models with and without covariates are fitted and compared using the likelihood ratio test.

The next section explores the methods of Markov modelling and an illustrative case study on HIV progression is given. In this section, data used in the analysis is described and we explain formulation of the model based on the data. This is followed by a section on the results and discussions. The final section concludes on the findings.


A continuous-time homogeneous Markov model

Formulation of the continuous-time homogeneous model is done by considering transition probabilities over narrow interval of time ∆t. In this study \( \Delta t=\frac{1}{2} \) year making it appropriate to assume that transition rates over these intervals are constant. These transition rates, also known as transition intensities or forces of transition, are the fundamental concept in continuous time Markov jump processes. They can take values greater than 1, unlike probabilities. In order to differentiate the transition probabilities and avoid technical problems with mathematics, the assumption is that the functions p ij (t) are continuously differentiable and are subject to the initial condition:

$$ {p}_{ij}(0)={\delta}_{ij}=\left\{\begin{array}{c}0\ if\ i\ne j\\ {}1\ if\ i=j\end{array}\right. $$

δ ij is a Kronecker delta, p ii (0) = 1 means that at t = 0 the system maintains its original state and p ij (0) = 0 means that there is no change of state when no time elapses. The force of transition from state i to j is defined as:

$$ {\left.{\alpha}_{ij}=\frac{d}{dt}{p}_{ij}(t)\right|}_{t=0}=\underset{\Delta t\to 0}{\lim}\frac{p_{ij}\left(\Delta t\right)-{\delta}_{ij}}{\Delta t} $$

α ij , for i = 1, …, 5 and j = 1, …, 7, does not vary over time and satisfies the following conditions; \( {\sum}_{j\in X}{\alpha}_{ij}=0 \) and \( {\alpha}_{ii}=-{\sum}_{i\ne j}{\alpha}_{ij} \).

Once the transition intensities are known, the transition probabilities can be obtained by solving a system of differential equations known as the Kolmogorov’s forward equation subject to the initial conditions stated in eq. (2). The Kolmogorov’s forward equation is as follows:

$$ \frac{d}{dt}{p}_{ij}(t)={\sum \limits}_{\forall k}{p}_{ik}(t){\alpha}_{kj}\kern1.25em \forall i,j $$

where k is a state that the system can pass through as it makes a transition from state i to state j. The time homogeneous models are fitted for this data to assess effectiveness of the treatment by comparing the forward transition and the reverse transitions. This then lead to building of models that allow transitions in both directions.

Data description

The model is initially applied on 319 HIV patients on anti-retroviral therapy (ART) from a Wellness clinic in Bela Bela, South Africa, from year 2005 to year 2009. Two hundred and twenty-seven of these patients were females and 92 were males at treatment commencement (t = 0). After 3 years of treatment uptake, 173 females and 71 males were remaining in the study. Thirty-eight females had died and 16 withdrew and their status was not known after 3 years of treatment up take. Nineteen of the males died during the first three years and two had withdrawn and it was not known whether they were alive or dead. A 2-year old (subject 81) together with subject 82 were detected by the residuals plot as an outlier and it was removed from the analysis meaning that the remaining 317 patients were used for analysis. About 50 and 65% of the female and male deaths respectively occurred during the first 6 months of treatment uptake. The interquartile range of patient ages is (33; 47.5) years with a mean and median age of 39.53 and 40 years respectively. The ages were negatively skewed (skew = −0.24) which means that there were more younger patients than older patients in this cohort.

At time t = 0 there were 242 individuals with CD4 baseline (CD4BL) cell count below 200, 59 individuals with CD4 cell count between 200 and 350, 11 individuals with CD4 cell count between 350 and 500, 6 individuals with CD4 cell count between 500 and 750 and 1 individual with CD4 cell count above 750. At t = 0 the CD4 cell count had mean of 156 cells/mm3, a median of 116 cell/mm3 and the maximum CD4 cell count was 1202 cells/mm3. The mean viral load baseline (VLBL) for these patients was 105,573.35 copies/mm3 and it ranged from 56 to 818,600 copies/mm3. The median viral load was 58,523.00 copies/mm3. From these individuals 155 had a WHO stage baseline (WSBL) of 4 which is related to severe HIV symptoms. WSBL is the categorisation of HIV/AIDS at baseline basing on the clinical markers as defined by World Health Organisation (WHO).

Although some individuals developed TB (DTB) during the course of treatment, 109 patients had TB as an initial marker of HIV. From the individuals who had TB before (TBB4) commencement of antiretroviral therapy (ART), 66 had a CD4 baseline below 200cells/mm3, 20 had a CD4 baseline between 200 and 350cells/mm3, 2 had CD4 baseline between 350 and 500cells/mm3, 2 between 500 and 750cells/mm3 and 19 had unknown CD4 baseline. These patients completed their TB treatment before commencement of ART. Fifty-two patients developed TB during the treatment period and 12 of these patients had TB before commencement of treatment. During the first 6 months of treatment uptake, 35 patients died and from these deaths, only five were attributed to having TB before commencement of ART.

A combination therapy was administered to all HIV-infected individual in the cohort. The therapeutic intervention inhibits the actions of reverse transcriptase enzyme and/or protease of new infectious free HIV by the HIV-infected cell. The drug regimens at t = 0 were mainly a combination of d4T-3TC-EFV (administered to 207 patients) and d4T-3TC-NVP (administered to 83 patients). The second line regimens were mainly a combination of AZT-3TC-EFV/NVP and were given to patients who developed some adverse reaction. These second line regimens were frequently used from 2 to 4 years post-treatment commencement. The therapeutic intervention lowers the number of infectious free virus particles in the circulation, and in some cases to beyond detection. This results in a reduction on the density of infected cells, causing a rise on the CD4 cell count of infected individuals. So generally the CD4 cell count of an individual receiving therapeutic intervention is expected to rise to well above 500 cell/mm3, assuming a proper adherence to treatment. Hence the use of increase in CD4 cell count as the marker of efficacy of treatment.

During the course of treatment, some individuals developed some adverse reaction (React) to treatment. For these individuals the adverse reactions were treated and drugs administered to them were changed. Change of treatment was also based on the viral load monitoring.

For the purpose of analysis, the variables are coded into the model as follows:

  WSBL Gender Age CD4BL VLBL DTB TBB4 React
1 4 Male ≤ 40years ≤ 350 ≥10 000 Yes Yes Yes
0 other Female > 40years > 350 < 10 000 No No No

Model formulation

At any time t + ∆t, the state of an HIV-infected individual is defined basing on the CD4 cell count level or whether the individual is dead or has withdrawn as follows:

State 1-CD4 ≥ 750 State 2–500 ≤ CD4 < 750
State 3–350 ≤ CD4 < 500 State 4–200 ≤ CD4 < 350
State 5-CD4 < 200 State 6-Death
State 7-Withdrawal.  

Basing on these seven states, progression of HIV positive individuals on treatment is defined by the state diagram on Fig. 1 below. The arrows in the diagram show possible transitions between the seven states defined above.

Fig. 1
figure 1

The State Diagram for HIV Progression of Individuals on ART

The information in Fig. 1 shows that state 6 and 7 are absorbing states hence no transitions from these states. As HIV progresses in an individual’s body, there is a possibility of an individual being in the same state in consecutive visit times.

Basing on the classification above, Table 1 summarises transition counts that took place for the whole period of study 2005 to 2009.

Table 1 Transition Counts from 2005 to 2009

Table 1 shows that, transition counts from state i to i ± j are higher for all the values in which j = 1 than for j > 1 where i, j { 1, …, 5}. As a result a bidirectional model is proposed which defines transitions from state i to i ± 1 or from i to j = 6; 7.

The model is formulated basing on the assumptions that between times (t, t + ∆ t), where ∆t is a very small value, there is a transition from any one of the states i = 1, 2, …, 5 (transient states) to state j = 1, 2, …, 7 defined as follows:

  • CD4 cell count of an individual is expected to rise due to efficacy of treatment at a rate of α ij ,  where j = i − 1;

  • Some individuals fail to adhere to treatment therapy. These individuals can move to a state of lower CD4 cell count at a rate of α ij ,  where j = i + 1;

  • From any state i = 1, 2, …, 5 an infected individual can die (state 6) at a rate of αi6;

  • An individual in state i = 1, 2, …, 5 can decide to withdraw (state 7) at a rate of αi7;

  • An individual can remain in the same state at a rate of α ii  =  − λ i  =  − (αi, i − 1 + αi, i + 1 + αi6 + αi7). This is based on the fact that the sum of transition rates from any state is equal to zero.

These assumptions can be represented by the following transition rate matrix Q(t):

$$ Q(t)=\left(\begin{array}{ccccccc}-\left({\alpha}_{12}+{\alpha}_{16}+{\alpha}_{17}\right)& {\alpha}_{12}& 0& 0& 0& {\alpha}_{16}& {\alpha}_{17}\\ {}{\alpha}_{21}& -\left({\alpha}_{21}+{\alpha}_{23}+{\alpha}_{26}+{\alpha}_{27}\right)& {\alpha}_{23}& 0& 0& {\alpha}_{26}& {\alpha}_{27}\\ {}0& {\alpha}_{32}& -\left({\alpha}_{32}+{\alpha}_{34}+{\alpha}_{36}+{\alpha}_{37}\right)& {\alpha}_{34}& 0& {\alpha}_{36}& {\alpha}_{37}\\ {}0& 0& {\alpha}_{43}& -\left({\alpha}_{43}+{\alpha}_{45}+{\alpha}_{56}+{\alpha}_{57}\right)& {\alpha}_{45}& {\alpha}_{56}& {\alpha}_{57}\\ {}0& 0& 0& {\alpha}_{54}& -\left({\alpha}_{54}+{\alpha}_{56}+{\alpha}_{57}\right)& {\alpha}_{56}& {\alpha}_{57}\\ {}0& 0& 0& 0& 0& 0& 0\\ {}0& 0& 0& 0& 0& 0& 0\end{array}\right) $$

Once the transition rate matrix has been obtained, the matrix of transition probabilities can be obtained using Kolmogorov’s forward differential equations defined in (3). This yields the following differential equations for the Markov jump processes:

$$ \frac{dp_{i1}(t)}{dt}=-\left({\alpha}_{12}+{\alpha}_{16}+{\alpha}_{17}\right){p}_{i1}(t)+{\alpha}_{21}{p}_{i2}(t)\kern1em \mathrm{for}\kern0.5em i=1,2; $$
$$ \frac{dp_{i2}(t)}{dt}={\alpha}_{12}{p}_{i1}(t)-\left({\alpha}_{21}+{\alpha}_{23}+{\alpha}_{26}+{\alpha}_{27}\right){p}_{i2}(t)+{\alpha}_{32}{p}_{i3}(t)\kern0.5em \mathrm{for}\kern0.5em i=1,2,3; $$
$$ \frac{dp_{i3}(t)}{dt}={\alpha}_{23}{p}_{i2}(t)-\left({\alpha}_{32}+{\alpha}_{34}+{\alpha}_{36}+{\alpha}_{37}\right){p}_{i3}(t)+{\alpha}_{43}{p}_{i4}(t)\kern0.5em \mathrm{for}\kern0.5em i=2,3,4; $$
$$ \frac{dp_{i4}(t)}{dt}={\alpha}_{34}{p}_{i3}(t)-\left({\alpha}_{43}+{\alpha}_{45}+{\alpha}_{46}+{\alpha}_{47}\right){p}_{i4}(t)+{\alpha}_{54}{p}_{i5}(t)\kern0.5em for\kern0.5em i=3,4,5; $$
$$ \frac{dp_{i5}(t)}{dt}={\alpha}_{45}{p}_{i4}(t)-\left({\alpha}_{54}+{\alpha}_{56}+{\alpha}_{57}\right){p}_{i5}(t)\kern0.5em \mathrm{for}\kern0.5em i=4,5; $$
$$ \frac{dp_{i6}(t)}{dt}={\sum}_{k=1}^5{p}_{ik}(t){\alpha}_{k6}\kern0.5em \mathrm{for}\kern0.5em i=1,\dots, 5; $$
$$ \frac{dp_{i7}(t)}{dt}={\sum}_{k=1}^5{p}_{ik}(t){\alpha}_{k7}\kern0.5em \mathrm{for}\kern0.5em i=1,\dots, 5. $$

Equations (4) to (10) represent all the possible transition probabilities from state i, for i = 1, 2, ...5, to state j = 1, …, 7. p ij (t) represents the probability that a patient in state i makes a transition to state j and its coefficients represent the transition rates. For example, in equation (4), −(α12 + α16 + α17) = α11. These states denoted by i are defined based on the CD4 cell count grouping. So there is a possibility of a backward or forward movement transition between transient states due to failure or efficacy of treatment respectively. There is no possible transition from state i = 6 and state i = 7 because these states are absorbing states where i = 6 represents death of an infected individual and state i = 7 represents withdrawal from treatment by an infected individual. All the analysis is done using the package ‘msm’ for multistate modelling in R software. The package was developed by Jackson in 2011 [16].

Results and discussions

Estimation of the transition rate matrix

Estimation of the transition intensities is done using the method of maximum likelihood to estimate the transition intensities. The likelihood, L, is given by:

$$ L={e}^{\alpha_{11.}{t}_1+{\alpha}_{22.}{t}_2+\dots +{\alpha}_{77.}{t}_7}\times {\alpha}_{11}^{n_{11}}{\alpha}_{12}^{n_{12}}\dots {\alpha}_{77}^{n_{77}}, $$

where ti : i = 1, 2, …, 7, is the total number of observed waiting/holding time in state i, \( {\alpha}_{ii}=-{\sum}_{i\ne j}{\alpha}_{ij} \) and n ij is the number of transitions observed from state i to state j. The estimates are obtained by taking the logarithm of the likelihood and differentiating this with respect to each of the transition intensities α ij 's. This leads to the maximum likelihood estimates of the transition intensities as \( {\alpha}_{ij}=\frac{n_{ij}}{t_i} \), where n ij is the number of transitions from state i to state j, and t i the total observed waiting/holding time in state i.

The plot of residuals for each of the individuals in the study was drawn to identify the outliers (subjects with higher influence) in the data. Once the outliers are identified they can simply be deleted and the model is re-fit. According to Titman in 2007 [17] residuals for multi-state models can be determined as follows;

If n subjects and a parameter vector θϵΘ, with maximum likelihood estimator based upon the whole data \( \widehat{\theta} \). Let \( {\widehat{\theta}}_{(j)} \) represent the estimate with subject j deleted. Thus the quantity \( {\widehat{\theta}}_{(j)}-\widehat{\theta} \) for j = 1, …, n is of interest. The influence of each point on each parameter can be compared separately and to get a measure of the overall influence of a particular subject we take the scalar quantity;

$$ {\left({\widehat{\theta}}_{(j)}-\widehat{\theta}\right)}^{\hbox{'}}I\left(\widehat{\theta}\right)\left({\widehat{\theta}}_{(j)}-\widehat{\theta}\right) $$

where I(θ) is the observed Fisher information matrix at the maximum likelihood estimates for the full data. Consider the contribution to the score function of each subject evaluated to the maximum likelihood estimate for the full model. Highly influential subjects will have scores of high magnitude. For a single subject, the score residual is given by an analogous scalar measure:

$$ {U}_j{\left(\widehat{\theta}\right)}^{\hbox{'}}I{\left(\widehat{\theta}\right)}^{-1}{U}_j\left(\widehat{\theta}\right) $$

where \( {U}_j\left(\widehat{\theta}\right) \) is the vector of first derivatives of the log-likelihood for that subject at maximum likelihood estimates θ. That is, \( U\left(\theta \right)=\frac{\partial l}{\partial \theta}\left(\theta \right), \) is determined using the derivative of the transition probability matrix P(t) with respect to θ. These derivatives were given by Kalbfleisch and Lawless [18]. The residuals plot displays the residuals for each subject in the order labelled by subject identifiers. Subjects with a higher influence on the maximum likelihood estimates will have higher score residuals [16]. The plot helps to identify any outliers in the data. Figure 2 below shows the plot of residuals.

Fig. 2
figure 2

The score residuals plot for detecting outliers

Results from Fig. 2 show that patients with ID numbers 81 and 82 are outliers as indicated by their positions from the rest of the patients in the cohort. The corresponding residuals for these values are 1.58315799 and 1.58315999, respectively compared to the rest of the subjects whose residuals below 1. Patient number 81 is a 2 year old enrolled whilst in state 1 and maintained the state throughout the study period. Patient number 82 was enrolled whilst in state 5 and, during the third visit, was already in state 1 and maintained it throughout the study period. These patients are excluded from the analysis leaving us with 317 subjects. Table 2 shows transition intensities α ij for i = 1, 2, …, 5 and j = 1, 2, …, 6, 7 for the two models, one with outliers and the other one without outliers. The corresponding confidence interval is also given for each transition intensity. The state space is X t  = {1, 2, 3, 4, 5, 6, 7}.

Table 2 Transition intensities and their corresponding confidence intervals for the model with and the model without outliers

The results from Table 2 show narrow confidence intervals which is an indication that the suggested continuous time Markov model gives a precise estimate of the data. Results from Table 2 show that transitions to better CD4 cell count states are higher than transitions to worse CD4 cell count states which is an indication of efficacy of ART. The model with outliers has got a higher log-likelihood than the model without outliers as expected since the model with outliers has got a greater dimension. A further analysis on the transition intensities was also done for each of the CD4 baseline (CD4BL) and WHO stage baseline (WHOSBL) levels coded as follows:

$$ \mathrm{CD}4\mathrm{BL}=\left\{\begin{array}{llllll}\mathbf{1};& & & CD4& >& 750\\ {}\mathbf{2};& 500& <& CD4& \le & 750\\ {}\mathbf{3};& 350& <& CD4& \le & 500\\ {}\mathbf{4};& 200& <& CD4& \le & 350\\ {}5;& & & CD4& \le & 200\end{array}\kern1em \mathrm{and},\kern0.6em \mathrm{WHOSBL}=\left\{\begin{array}{ll}1;& Asymptomatic\\ {}2;& Mild symptoms\\ {}3;& Advanced symptoms\\ {}4;& Severe symptoms\end{array}\right.\right. $$

The results are shown in Appendix 2 and 3 for CD4BL and WHOSBL respectively. The results from Appendix 2 show that transition rates to CD4 recovery (2 to 1, 3 to 2, 4 to 3 and 5 to 4) were high for patients who initiated therapy when their CD4 baseline level was well above 350 per mm3. These rates of CD4 recovery decrease with as the CD4 cell count at treatment initiation decrease with a baseline CD4 cell count below 200 per mm3 recording the lowest rates of CD4 recovery. The results from Appendix 3 show that regardless of the WHO stage baseline, transition rates to CD4 recovery are higher than transition rates to CD4 deterioration. The rates of CD4 recovery are the highest for transitions from state 5 to state 4. Transition rates to state 6 (death) are the highest for those individuals who had severe HIV symptoms (WHOBL = 4) and these intensities decrease as the symptoms decrease from severe to asymptomatic levels.

Expected holding times

The expected holding time in each state also known as the mean sojourn time describes the average time an individual spends in each state in a single stay before he/she makes a transition to another state. The mean sojourn time in each state i for i = 1, 2, …, 5, is estimated as \( \frac{1}{\lambda_i} \), where \( {\lambda}_i={\sum}_{i\ne j}{\alpha}_{ij} \) is the total force of transition out of state i. For example, the expected holding time in state 1 is 1/(0.887 + 0.1016 + 0.1418) ≈ 0.8844 as shown in Table 3 below:

Table 3 Expected holding times in each state

Results from Table 3 show estimates of the holding time, the standard error (SE), the lower bound (L) and the upper bound (U) for each of the transient state i. From the results, if an individual is in state 5 (corresponding to a CD4 count below 200cell/mm3) he spends more time in that state before making a transition to other states. This could be due the time taken by an individual to respond to treatment since state 5 is the worst state in HIV/AIDS progression.

The jump chain

This is when a Markov process is observed at the times it makes transitions to a new state. In other words a jump chain is a stochastic matrix R of probabilities where each row sums to one, on the state space X t , which gives the conditional probability of the next state an individual goes to after leaving state i. If α ii  > 0 then given that there is a jump to a different state, it means we never stay in state i, we make a jump out resulting in having R ii  = 0 and if α ii  = 0 then we never leave state i meaning that R ii  = 1 (States 6 and 7). The computed matrix of probabilities of each state being next (also known as the jump chain), together with the mean sojourn times in each state, fully define a continuous-time Markov model. This is a more intuitively meaningful description of a model than the transition intensity matrix. The matrix for the probabilities that the next state after state i is state j is approximated as \( {p}_{ij}=\frac{\alpha_{ij}}{\lambda_i} \) , for each i and j such that i ≠ j. α ij is the force of transition from state i to state j and α ii is the total force of transition out of state i. For example, \( {p}_{12}=\frac{\alpha_{12}}{\lambda_1}=\frac{0.8872}{0.8872+0.1016+0.1418}=0.7847 \), as shown in the matrix below. The results are shown Table 4 below:

Table 4 Probability of each State being next (R ij )

The results from Table 4 show that Ri, 1 − 1 > Ri, i + 1, which shows that the probability of jumping to a better state is higher than the probability of jumping to a worse state. This is more pronounced for individuals in state 5 where the probability of jumping to state 4 (recovery) is 0.8123 which is very high compared to probability of making a jump to state 6. This is an indication of the effectiveness of treatment. Probability of the death state being next is the highest for those patients with CD4 counts less than 500. These probabilities increase with the decreasing number of CD4 counts.

Forecast of the total length of stay in each state

We need to forecast the total time spent in the good states and the bad states by individuals who are on HIV treatment before death or withdrawal from the study. Estimates of the forecasted total lengths of time spent in each state j between two future time points t1 and t2 are estimated using the formula:

$$ {L}_j=\underset{t_1}{\overset{t_2}{\int }}{P}_{ij}(t) dt $$

where i is the state at the start of the process, which defaults to 1. The results are shown below:

State1 State2 State3 State4 State5 State6 State7
8.988960 8.806075 7.767124 3.520485 1.153648 Inf Inf

The results show that each individual is forecasted to spend approximately 8.99 half years in state 1, 8.8 half years in state 2, 7.77 half years in state 3, 3.52 half years in state 4 and finally 1.153 half years in state 5. These results show that HIV positive individuals on treatment are expected to spend more time in good states compared to the time spent in bad states.

Percentage prevalence for the model without covariates.

Using the fitted time-homogeneous Markov model, the percentage prevalence were plotted to compare the expected values with the observed values. The results are shown in Fig. 3 below:

Fig. 3
figure 3

Comparison of observed and expected prevalence from the time-homogeneous model without covariates

The results from Fig. 3 show that for the state i = 1, …, 6 the expected prevalence fit the observed data perfectly well except for the withdrawal state where the expected prevalence overestimate the observed. The plots further show a sharp decrease on state 5 percentage prevalence with the fitted model, underestimating the model for observed data up to time = 7 half years. The percentage prevalence for the death state is increasing at a slow rate and from time = 2 half years to time = 8 half years the percentage prevalence is stable.

Effects of covariates on transition intensities

A continuous-time Markov model for the effects of covariates; Age, CD4BL, VLBL, WSBL, Reaction, DTB, TBB4 and Gender is fitted. Identification of covariates that have a significant contributory effect is done by entering each covariate one after the other and performing the likelihood ratio test in comparison to the model without covariates. All the other variables proved to be significant to the progression except for the variable gender which could not be eliminated because of its demographic importance. The baseline transition intensities \( \left({\alpha}_{ij}^{(0)}\right) \) relate to the transitions from state i to state j. Baseline transition intensities and linear effect of each of the covariates is estimated and the results are shown in two separate Table 5 and Appendix 1 respectively:

Table 5 Baseline intensities and their corresponding confidence intervals for the covariate effects

The fitted time homogeneous model with covariates has -2xLL = 3699.259, which represents an improvement of 242.712 compared to the model without covariates. A Likelihood ratio test is performed to compare the two nested models that were fitted, the one without covariates and the other with covariates. The value of the \( LRT=-2 lo{g}_e\left(\frac{L_s\left(\widehat{\theta}\right)}{L_g\left(\widehat{\theta}\right)}\right) \) where \( {L}_s\left(\widehat{\theta}\right) \) is the simple model (no covariates) and \( {L}_g\left(\widehat{\theta}\right) \) is the general model (with covariates). A likelihood ratio test statistic of 1770.618 is compared to a χ2 distribution with 144 degrees of freedom. The test was performed and the results are shown below:

  −2logLR Df p-value
with.covariates 1770.186 144 10−4

The results show that the model with covariates fits significantly better than the model without covariates.

Hazard ratios of covariates on transition intensities

In this section the hazard ratios for each of the covariates; VLBL-viral load baseline, DTB-develop TB during treatment period, TBB4-develop TB before treatment, Gender, React-reaction to treatment, CD4BL-CD4 baseline, WSBL-WHO stage baseline and Age are estimated. The relationship between these covariates and the transition intensities is defined by the following equation:

$$ {\alpha}_{ij}\left(\boldsymbol{Z}\right)={\alpha}_{ij}^{(0)}\exp \left({\beta}_{ij}^{\hbox{'}}\boldsymbol{Z}\right),\kern0.75em i\ne j, $$

where Z = [VLBL, DTB, TBB4, Gender, React, CD4BL, WSBL, Age ] is a k = 8-dimensional vector of covariates and β ij is a vector of k regression parameters relating the instantaneous rate of transitions from state i to state j to the covariates Z and baseline intensities \( {\alpha}_{ij}^{(0)} \) relating to the transition from state i to state j as shown in Table 5 above. Estimates of β ij ’s, regression coefficients, were calculated and the results are shown in Appendix 1. The regression coefficients can be interpreted similarly to those in the proportional hazards regression model [19]. The results are shown in Table 6.

Table 6 Hazard ratios for the covariates on intensities

The -2xLL for the model fitted in Table 6 is 3699.259. The results show that the strongest predictor of transition from state 1 to 2 is a negative reaction to treatment, which has a hazard ratio of 4.715. This means that patients who developed some form of reaction were over 4 times more likely to transit from a level of CD4 ≥ 750 to a level of 500 ≤ CD4 < 750 than patients who did not react to treatment. However, from all the other states, hazard ratios for the patients who reacted to treatment are higher for immune recovery than for immune deterioration.

The strongest predictor of immune deterioration from a CD4 level between 350 and 500 to a CD4 level between 200 and 350 (3 to 4) is developing TB during treatment, with a hazard ratio of over 2. Developing TB is also the strongest predictor of immune deterioration from 4 to 5, with a hazard ratio also greater than 2. This means that TB is the major cause of further immune deterioration when the immune system is too weak. Hence the recommendation that HIV patients should continuously have their TB status checked. Those individuals who had TB before enrolment had the strongest predictor for the transition from state 3 to state 6. These patients had a hazard ratio of over 2 times more likely to die from state 3 than those who were enrolled without having TB. However, for these individuals, transitions to better states were generally higher than transitions to worse states for almost all states.

A hazard ratio of 6.46 for the predictor variable male shows that males were over 6 times more likely to transit from state 2 to 3 than their female counterparts. The hazard ratios of males from a bad state to a better state are less than 1, which is an indication that males are less likely to respond to treatment compared to females.

The hazard ratios for the transitions to a better state for patients who were enrolled with CD4 counts below 350 are less than one, but hazards to worse states are greater than one, an indication that starting treatment when the CD4 levels are below 350 retards immune recovery. The transitions to the death state for individuals who started treatment when they were on the WHO stage of 4 are all more than one, meaning that starting treatment with a WHO stage of 4 is a leading cause of being absorbed in the death state.

Percentage prevalence for the model with covariates

The prevalence for the model with covariates were plotted to examine areas of poor fit of the time-homogeneous model with covariates. The plots are shown in Fig. 4.

Fig. 4
figure 4

Comparison of observed and expected prevalence from the time-homogeneous model with covariates

Figure 4 confirms that the inclusion of covariates on the model improves the fitness of the model since the expected prevalence is now perfectly closer to the observed prevalence for all states than the model without covariates.


In this paper a continuous-time homogeneous Markov model is fitted to explore predictors of HIV/AIDS progression for patients on antiretroviral therapy. A continuous-time homogeneous model is fitted with and without covariates and comparison of these two models is done using the likelihood ratio test. Parameters that define progression of HIV/AIDS were estimated and these include transition intensities, mean sojourn times and probability of each state being next or jump chains. The fitted model is used to analyse the effects of the covariates on the transition intensities. These covariates were reaction to treatment, development of TB during treatment and gender among others.

Results from the likelihood ratio test show that the model with covariates provides a better fit than the model with no covariates with a p-value =10{−4}. The results show that transition rates to immune recovery are generally higher than the transition rates to immune deterioration. However, the results show that the strongest predictor of immune deterioration from state 1 (CD4 cell count greater than 750) to state 2 (CD4 cell count between 500 and 750) is reaction to treatment. These patients are 4 times more likely to transit from state 1 to state 2 than those who did not react to treatment.

Patients who developed TB during the course of treatment have higher chances of immune deterioration than immune recovery compared to those who did not develop any TB co-infection. These incidences are quite high for transition from state 4 (CD4 cell count between 200 and 350) to state 5 (CD4 cell count below 200). For these states the immune system is still weak. As a result patients on antiretroviral drugs should consistently be screened for TB co-infection. Patients who had initially been diagnosed with TB before commencement of ART recover better from HIV/AIDS disease except that transitions to death for patients with CD4 cell count between 350 and 500 cells/mm3 are two times higher than that of patients who were not initially diagnosed with TB.

From this cohort, transitions to bad states are higher for males than for their female counterparts. This is quite pronounced on transitions from state 2 (CD4 cell count between 500 and 750) to state 3 (CD4 cell count between 350 and 500) where the hazards for males are 6 times that of females. This result is consistent with the findings from Maskew and others, they discovered that men gain fewer CD4 cell counts than did women [20]. An assessment of published studies by Castillo and others [21] from both resource-limited and resource-rich countries suggest an improved survival outcomes for females than males. However, the studies they assessed do not show a clear sex disparity in the disease progression or in treatment effects of viral suppression and immunologic recovery.

The results from the fitted model show that the rates of immune recovery were much higher than the rates of immune deterioration which is an indication of effectiveness of treatment. Patients who started treatment when their CD4 baseline was at least 350 had higher rates of immune recovery than those who had a lower CD4 baseline. This result is commensurate with the findings from Moore and Keruly who also discovered that patients with baseline CD4 cell count above 350 cells/mm3 returned to nearly normal CD4 cell count after 6 years [22]. The probability of dying increases with decreasing CD4 count of the individual at enrolment. This is supported by the findings of [23,24,25], who also concluded that being in the AIDS defining stage leads to the highest probability of reaching the death state.

The mean sojourn times revealed that patients take longer time in the AIDS defining states (CD4 cell count below 200) before they move to the other states. Research has also shown that CD4 cell count rises gradually despite the suppressed viral load particularly in older patients. Hence, there is need to use both CD4 cell count and viral load in monitoring the efficacy of treatment. The younger people below the age of 40 have higher chances of immune recovery than the older ones. This finding is supported by some previous studies who concluded lower mean CD4 increases for older patients than younger patients [20, 26]. Alioum and others further argued that this could be caused by the fact that older subjects may have a reduced capacity to generate CD4 cells in response to the viral killing [10].

Although continuous time Markov models can handle multiple or recurrent outcomes compared to the Kaplan Meier analysis and Cox proportional hazards models, the assumption of constant hazard function that is frequently unrealistic [27] and puts limitations on the disease history behaviour [28], especially on HIV/AIDS progression for patients on ART. Some studies have shown that if a patient responds well to treatment and manages to achieve viral load suppression within the first 6 months, that patient is likely to continue responding well to treatment [29]. This goes against the Markov and memoryless properties of the models. Thus a limitation in the application of time homogeneous Markov processes.


  1. Perelson AS, Nelson PW. Mathematical analysis of hiv-1 dynamics in vivo. SIAM Rev. 1999;41(1):3–44.

    Article  Google Scholar 

  2. Sonnenberg P, Glynn JR, Fielding K, Murray J, Godfrey-Faussett P, Shearer S. How soon after infection with HIV does the risk of tuberculosis start to increase? A retrospective cohort study in south African gold miners. J Infect Dis. 2007;191:150–8.

    Article  Google Scholar 

  3. Cingolani A, Lepri AC, Castagna A, Goletti D, De Luca A, Scarpellini P, Fanti I, Antinori A, d’Armino Monfarte A, Girardi E. Impaired CD4 T-cell count response to combined antiretroviral therapy in antiretroviral-naive HIV-infected patients presenting with tuberculosis as AIDS-defining condition. CID. 2012;54(6):853–61.

    CAS  Article  Google Scholar 

  4. Ndumbi P, Falutz J, Pai NP, Tsoukas CM, et al. PLoS One. 2014;9(4):e94018.

  5. Phillips AN, Staszewski S, Weber R, Kirk O, Francidi P, Miller V, Vernazza P, Lundgren JD, Ledergerber B. HIV viral response to antiretroviral therapy according to the baseline CD4 cell count and viral load. JAMA. 2001;286:2560–7.

    CAS  Article  PubMed  Google Scholar 

  6. Abner EL, Nelson PT, Schmitt FA, Browning SR, Fardo DW, Wan L, Jicha GA, Cooper GE, Smith CD, Caban-Holt AM, Van Eldik LJ, Kryscio RJ. Self-reported head injury and risk of late-life impairment and AD pathology in an AD center cohort. Dement Geriatr Cogn Disord. 2014;37:294–306.

    Article  PubMed  Google Scholar 

  7. Mullins LJ. Management and Organisational behaviour. 4th ed. London: Pitman Publishing; 1996.

    Google Scholar 

  8. Foucher Y, Mathieu E, Saint-Pierre P, Durand JF, Daures JP. A semi-Markov based on generalised Weibull distribution with an illustration for HIV disease. Biom J. 2005;6:1–9.

    Google Scholar 

  9. Longini IM, Clark WS, Byers RH, Ward J, Darrow WW. Statistical analysis of the stages of HIV infection using a Markov model. Stat Med. 1989;8:831–43.

    Article  PubMed  Google Scholar 

  10. Alioum A, Leroy V, Commenges D, et al. Effects of gender, age, transmission category, and antiretroviral therapy on the progression of human immuno virus infection using multistate Markov models. Group d'Epidemiologie clinique du SIDA en Aquitaine, Epidemiology. 1998;9(6):605–12.

    CAS  Google Scholar 

  11. Reddy T. HIV disease progression in South Africa using multistate Markov models. SACEMA. 2011;

  12. Binquet C, Le Teuff G, Abrahamovicz M, Mahboubi A, Yazdanpanah Y, Rey D, Rabaud C, Chirouze C, Berger JL, Faller JP, Chavanet P, Quantin C, Piroth L, Groupe InterCOrevih du Nord-Est (ICONE). Markov modelling of HIV infection evolution in the HAART era. Epidemiol Infect. 2009;137(9):1272–82. Epub 2009 Jan 12

  13. Grover G, Gadpayle KA, Swain PK, Deka B. A multistate Markov model based on CD4 cell count for HIV/AIDS patients on antiretroviral therapy (ART). Int J Stat Med Res. 2013;2:144–51.

    Google Scholar 

  14. Gibson EL. Continuous time multi-state models for survival analysis. Winston-Salem: Wake Forest University, Department of Mathematics; 2008.

    Google Scholar 

  15. Halim D. Maximum likelihood estimation for generalized semi-Markov processes. Discrete event dynamics systems: theory and applications. Dep Ind Eng. 1996;6(2):73–104.

    Google Scholar 

  16. Jackson CH. Multi-state models for Panel Data: The msm package for R. J Stat Softw. 2013;38(8).

  17. Titman, A.C., Model diagnostics in multi-state models of biological systems, Department of Pure Mathematics and Mathematical Statistics, University of Cambridge, 2007.

    Google Scholar 

  18. Kalbfleisch JD, Lawless JF. The analysis of panel data under a Markov assumption. J Am Stat Assoc. 1985;80:863–71.

    Article  Google Scholar 

  19. Cox DR. The statistical analysis of dependencies in point processes. In Stochastic Point Processes. Wiley; 1972.

  20. Maskew M, Brennary AT, Westreich D, McNamara L, MacPhail P, and Fox MP. Gender Differences in Mortality and CD4 Count Response Among HIV-Positive Patients Virally Suppressed Within 6 Months of Antiretroviral Therapy Initiation. J Womens Health. 2012; 22(00).

  21. Castilho JL, Melhekin VV, Sterling TR. Sex Differences in HIV Outcomes in the Highly Active Antiretroviral Therapy Era: A Systematic Review. AIDS Res Hum Retroviruses. 2014;30(5):446–56.

  22. Moore RD and Keruly JC. CD4+Cell Count 6 Years after Commencement of Highly Active Antiretroviral Therapy in Persons with Sustained Virologic Suppression CID. 2007; 44: 441–6.

  23. Biadgilign S, Reda AA, Digaffe T. Predictors of mortality among HIV infected patients taking antiretroviral treatment in Ethiopia: A retrospective cohort study. AIDS Res Ther. 2012;9:15

  24. Goshu TA and Dessie ZG. Modeling progression of HIV/AIDS disease stages using semi-Markov processes. 2013.

    Google Scholar 

  25. Lifson AR, Krantz EM. Grambsch PL. Macalino GE. Crum-Cianflone NF. Ganesan A, Okulicz JF, Eaton A, Powers JH, Eberly LE, Agan BK. Clinical, demographic and laboratory parameters at HAART initiation associated with decreased post-HAART survival in a U.S. military cohort. AIDS Res Ther. 2012; 9(4).

  26. Hasibi M, Hajiabdolbaghi M, Hamzelou S, Sardashti S, Faroughi M, Jozani ZB, Alinaghi SAS. Impact of age on CD4 response to combination antiretroviral therapy: a study in Tahran, Iran. Orld J AIDS. 2014;4:156–62.

    Article  Google Scholar 

  27. Lange JM, Hubbard RA, Inoue LYT, Minin VN. A joint model for multistate disease processes and random informative observation times, with applications to electronic medical records data. Biometrics. 2015;71(1):90–101.

    Article  PubMed  Google Scholar 

  28. Saint-Pierre P, Combescure C, Daures JP and Godard P. The analysis of Asthma control under a Markov assumption with use of covariates. Statistics in Medicine. 2003;22(24):3755–3770.

  29. Maartens G, Cotton M, Meintjes G, Mendelson M, Rabie H, Maharaj S. Clinical guidelines 9th Edition. Copyright Aids for AIDS management (Pty) Ltd. 2012.

Download references


This study would not have been a success without the assistance of the Microbiology Department at the University of Venda in providing the secondary data through Professor Pascal O. Bessong.


Not applicable

Availability of data and materials

The data are available by contacting the corresponding author and can be submitted upon request.

Author information

Authors and Affiliations



CS and DC drafted the manuscript. CS and DC contributed to the analysis and interpretation of the data. Both authors participated in critical revision of the manuscript drafts and approved the final version.

Corresponding author

Correspondence to Claris Shoko.

Ethics declarations

Authors’ information

CS is a PhD student and DC is a senior lecturer at the University of the Free State.

Ethics approval and consent to participate

Not applicable

Consent for publication

The authors give their consent to publish this work.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


Appendix 1

Table 7 Linear effects of covariates on transition intensities

Appendix 2

Table 8 Transition intensities for each CD4 baseline

Appendix 3

Table 9 Transition for the WHO Stage Baseline Line

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Shoko, C., Chikobvu, D. Time-homogeneous Markov process for HIV/AIDS progression under a combination treatment therapy: cohort study, South Africa. Theor Biol Med Model 15, 3 (2018).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI:


  • HIV/AIDS progression
  • Homogeneous Markov models
  • Reaction to treatment