Markov modelling of viral load adjusting for CD4 orthogonal variable and multivariate conditional autoregressive mapping of the HIV immunological outcomes among ART patients in Zimbabwe

Background This study aimed to jointly model HIV disease progression patterns based on viral load (VL) among adult ART patients adjusting for the time-varying “incremental transients states” variable, and the CD4 cell counts orthogonal variable in a single 5-stage time-homogenous multistate Markov model. We further jointly mapped the relative risks of HIV disease progression outcomes (detectable VL (VL ≥ 50copies/uL) and immune deterioration (CD4 < 350cells/uL) at the last observed visit) conditional not to have died or become loss to follow-up (LTFU). Methods Secondary data analysis of individual-level patients on ART was performed. Adjusted transition intensities, hazard ratios (HR) and regression coefficients were estimated from the joint multistate model of VL and CD4 cell counts. The mortality and LTFU transition rates defined the extent of patients’ retention in care. Joint mapping of HIV disease progression outcomes after ART initiation was done using the Bayesian intrinsic Multivariate Conditional Autoregressive prior model. Results The viral rebound from the undetectable state was 1.78times more likely compared to viral suppression among patients with VL ranging from 50-1000copies/uL. Patients with CD4 cell counts lower than expected had a higher risk of viral increase above 1000copies/uL and death if their VL was above 1000copies/uL (state 2 to 3 (λ23): HR = 1.83 and (λ34): HR = 1.42 respectively). Regarding the time-varying effects of CD4 cell counts on the VL transition rates, as the VL increased, (λ12 and λ23) the transition rates increased with a decrease in the CD4 cell counts over time. Regardless of the individual’s VL, the transition rates to become LTFU decreased with a decrease in CD4 cell counts. We observed a strong shared geographical pattern of 66% spatial correlation between the relative risks of detectable VL and immune deterioration after ART initiation, mainly in Matabeleland North. Conclusion With high rates of viral rebound, interventions which encourage ART adherence and continual educational support on the barriers to ART uptake are crucial to achieve and sustain viral suppression to undetectable levels. Area-specific interventions which focus on early ART screening through self-testing, behavioural change campaigns and social support strategies should be strengthened in heavily burdened regions to sustain the undetectable VL. Sustaining undetectable VL lowers HIV transmission in the general population and this is a step towards achieving zero HIV incidences by 2030. Supplementary Information The online version contains supplementary material available at 10.1186/s12976-021-00145-y.

Conclusion: With high rates of viral rebound, interventions which encourage ART adherence and continual educational support on the barriers to ART uptake are crucial to achieve and sustain viral suppression to undetectable levels. Area-specific interventions which focus on early ART screening through self-testing, behavioural change campaigns and social support strategies should be strengthened in heavily burdened regions to sustain the undetectable VL. Sustaining undetectable VL lowers HIV transmission in the general population and this is a step towards achieving zero HIV incidences by 2030.
Keywords: CD4 counts, Viral load, Joint multistate models, Joint spatial maps, Multivariate conditional autoregressive prior, HIV disease progression Background Antiretroviral therapy (ART) has since been the backbone of HIV prevention and control as it reduces viral load (VL) replication in the human host by blocking the virus life cycle [1]. Once the virus replication has been inhibited, the CD4 cell counts increase, and the individual life expectancy is expanded [2]. HIV patients' management involves monitoring VL and CD4 cell counts prognostic markers through laboratory repeated measurements. The immunological markers can be utilized in understanding the HIV disease progression patterns among ART patients [2].
Multistate Markov models are mathematical models which have been used to evaluate HIV disease processes; however, the number of states, state cuts-off points and the number of transitions vary across studies [3,4]. This explains the flexibility of the multistate models' implementation, but the models become complex as the number of states and transitions increase. These models have been used extensively in HIV disease progression using CD4 cell counts states [5,6] or VL states [7,8] to define the model states (categories) separately. Of these two prognostic markers, VL is the preferred marker in HIV monitoring due to its high sensitivity. However, there had been a delay in the implementation and rolling out of VL testing in most developing countries due to costrelated challenges [9].
The low-middle-income countries (LMIC) have traditionally relied on the use of CD4 cell counts in HIV disease progression monitoring as this has been the readily available laboratory marker; however, if both VL and CD4 cell counts are available, it is imperative to model these prognostic markers jointly to understand better the HIV disease progression patterns. Joint modelling of these two prognostic markers helps to explain those effects which one marker cannot explain in the absence of the other marker.
In HIV disease monitoring programmes, Bayesian spatial modelling is an emerging tool to analyze spatially related multidimensional data with an underlying spatial process to guide policy [8,10]; however, joint spatial modelling using the Bayesian intrinsic Multivariate Conditional Autoregressive (MCAR) prior has not been fully utilized in this field. The advantage of joint mapping is that it gives an understanding of the HIV dynamics and the spatial overlap between the joint mapped outcomes.
The main objective of our study was to jointly model HIV disease progression using two 5-stage timehomogenous multistate Markov models based on CD4 cell counts and VL. We firstly fitted two timehomogeneous multistate Markov models with states defined by CD4 cell counts and VL. In each multistate model, we jointly model these prognostic markers with one marker defining the finite multistate model states and the other marker forming the covariate matrix components of the regression model. The first covariate was an orthogonal variable generated using the principal component analysis (PCA) [11]. The second covariate was the time-varying "incremental transient states" variable to estimate the changes in transition rates over time with respect to how the marker changes [7]. Uniquely to this study is the inclusion of these two covariates in a single multistate model covariate matrix which has been a gap in previous studies that incorporated either the orthogonal variable or the time-varying covariate only [11]. We further performed a joint spatial mapping of HIV disease progression outcomes (detectable VL (VL ≥ 50copies/uL) and immune deterioration (CD4 < 350cells/ uL) at the last observed visit conditional not to have died or become loss to follow-up (LTFU) to describe the spatial overlap of the two outcomes.

Data source, description and study design
This study was a secondary analysis of data from patients' records compiled for monitoring and guiding programme planning. The data used in this study came from the Zimbabwe National ART programme collected through the electronic patients' management system (ePMS) database described elsewhere [12,13]. The electronic system was implemented nationally to improve and increase efficiency in HIV patients' management and monitor their response to ART. A stratified sampling of health facilities offering HIV services was done; hence, a representative sample of primary, secondary, tertiary and quaternary health facilities which provide HIV services was achieved during the ePMS roll out. However, a continual up-scale is currently ongoing, which will ensure that the system covers all health facilities.
We considered individuals aged 15 years and above, with at least two repeated measurements of both CD4 cell counts and VL who initiated ART between 2004 and 2017. However, since this was programme data, the follow-up visits of the patients were intermittent, and there may have been clinical reasons for requesting CD4 count and VL measurements from patients. Each individual had an average follow-up period of 2 years, and the majority of the patients (79.2%) in the final sample considered for analysis were enrolled on ART between 2009 and 2015.
where LTFU was defined as a failure of a patient to report for drug refill for at least 90 days from the last appointment date or if the patient missed the next scheduled visit date and never showed up again. The schematic presentation of the 5-stage multistate model is shown in Fig. 1. Based on these possible transitions shown in Fig. 1, the corresponding transition rates are defined by a 5 × 5 transition matrix Q(t) with λ ij elements defining the movement between state i and state j with properties P 5 j¼1 λ ij ¼ 0 and λ ii = − ∑ i ≠ j λ ij . In reality, the individuals who become LTFU can return to the clinic for continual monitoring. However, in this study, we could not ascertain any returns of these participants after they became LTFU. This means the λ 5j transition rates from the LTFU state were not estimated as the data could not support this.
The effects of the covariates on each model were modelled using the semi-parametric proportionality hazards. We aimed to describe how the uncorrelated factor of VL values can explain the component of mortality, LTFU or HIV disease progression transition rates that cannot be explained by the CD4 cell counts alone ignoring VL measurements. To achieve this, we generated an orthogonal variable through the PCA technique, which is a data reduction approach used to combine highly correlated variables into uncorrelated components to improve model efficiency as described elsewhere [11]. The orthogonal variable was then included as a covariate in the proportionality hazard model. The second covariate was the time-varying VL measurements variable which was categorized into "incremental transient states". The cutoff points for the time-varying VL covariate were similar to those defined in equation [2] above, excluding the mortality and the LTFU states. We simultaneously modelled the effects of these two covariates in a single model. Therefore, we fitted two proportional hazard multistate models of the generic form: where λ ij, k /Z(t) is the transition rate between state i and state j at the time t given a covariate matrix Z and λ ij, (0) is the baseline hazard rate of the model. The two multistate Markov models fitted were: where equation [4] defines the CD4 cell counts multistate Markov model with λ ij(CD4, 0) as the baseline transition intensity for an individual k with positive residual values for the orthogonal effect of VL ðP Ã VLðkÞ ¼ 0Þ and the time-varying VL transient levels (VL states, k = 1(VL < 50 copies/uL)). Similarly, equation [5] defines the VL multistate Markov model with λ ij(VL, 0) representing the baseline transition intensities for an individual k with positive residual values for the orthogonal effect CD4 cell counts ðP Ã CD4ðkÞ ¼ 0Þ and the time-varying CD4 cell counts levels (CD4 states, k = 1(CD4 ≥ 500 cells/uL)). The Since the analyses were purely based on existing modelling approached within the "msm" R library, additional covariates like baseline CD4 cell counts, baseline VL values, age, sex and WHO staging could be adjusted for in the model; however, in this study, we encountered a convergence warning as more covariates were added. When the multistate Markov model fails to converge, it means the optimization criterion could not converge to the maximum likelihood; hence, the standard errors of the estimated parameters were not calculated, and the confidence interval estimates were missing on the printout of results. This modelling challenge is normally faced if there is data sparsity in some cells within the multistate model. Usually, the scaling factor is increased to a higher value adjusted to normalize the likelihood and prevent overflow within the optimization process; however, in our case, this was not helpful.
The final model selection was based on the Akaike Information Criterion (AIC) values which are defined as AIC = − 2(Log − likelihood) + 2q, where 2q represents the variance component, q is the number of parameters to be estimated in the fitted model and the bias is defined by −2(Log − likelihood). The better fitting model is the model with the lowest AIC value. The nested models were assessed using the likelihood ratio test (LRT) defined as: where L 1 ðφÞ is the likelihood for the simple (unsaturated) model with few covariates and L 2 ðφÞ is the likelihood for the full (saturated) model with additional covariate(s). A significant p-value < 0.05 for the LRT leads to the rejection of the null hypothesis (simple model is better) in favour of the alternative hypothesis (full model is better). To assess how well the final selected models predict HIV disease, mortality and LTFU, percentage prevalence in each state were plotted to compare the observed and the expected frequencies.

Joint mapping of HIV progression based on two prognostic markers
Furthermore, an intrinsic MCAR prior model by Besag et al. (1991) [14] was fitted to jointly map the relative risks of the two HIV disease progression immunological outcomes (immune deterioration (CD4 < 350cells/uL) and detectable VL (VL≥50copies/ uL) at the last observed visit) and estimate their shared geographical pattern. The choice of the VL immunological outcome cut-off point was based on the global goal to achieve an undetectable VL to minimize viral transmission in the general population [15] while the choice of the CD4 cell counts immunological outcome cut-off point was based on earlier studies that have shown that individuals with CD4 cell counts of 200-350cells/uL immune deterioration patterns are not significantly different from those with CD4 cell counts less than 200cells/uL [5]. The MCAR model is a special type of Gaussian Markov random field prior models (GMRF) [14,16]. Let the number of observed individuals with detectable VL (VL≥50copies/uL) beY VL, r , and the number of observed individuals with immune deterioration (CD4 < 350cells/ uL) be Y CD4, r at the last observed visit as reported for the region S r , where the set of regions {S r }, r = 1, 2, 3, …, R represents a finite number of the areas partitioned from the entire study area. In this study, we considered partitioning the Zimbabwe country into 10 provinces, i.e. R = 10. The geospatial variations are observed through an aerial map partitioned based on administrative spatial units. For each region, the expected number of individuals for each prognostic marker outcome (E VL, r and E CD4, r )was calculated using the following generic formula: E r;q ¼ n r;q r q ≡ n r;q P r y r;q P r n r;q for r where r q is the overall detectable VL (VL ≥ 50copies/uL) or immune deterioration (CD4 < 350 cell/uL) rate in the whole study region, n r, q is the at-risk population in the region r and y r, q is the total counts of individuals observed in the region. This approach is commonly referred to as "internal standardization". The observed frequencies can be considered as the random variables while the expected frequencies are thought of as fixed and known functions of the at-risk population n r, q in the region r. In this study, we assumed that the observed count data follow a Poisson distribution for the two prognostic marker outcomes, i.e.
where α q is an intercept term representing the baseline (log) relative risk of disease progression outcomes q across the study region. We further assumed that the log relative risks are spatially correlated across the regions, and the log relative risks for the two prognostic markers are also correlated within each region r due to shared region-level unmeasured risk factors. These assumptions were supported through the intrinsic bivariate CAR prior of a 2 × R dimensional matrix of S q, r values. The spatial prior is expressed as: where S 1(−q) , S 2(−q) denotes the elements of the 2 × R matrix, S r ¼ ðS r;1 ; S r;2 Þ and S r;p ¼ P f in σ r S fp =n r where σ r and n r denote the set of labels of the "neighbours" of the region r and the number of neighbours, respectively assuming a p = 2. The matrix V is a 2 × 2 covariance matrix with diagonal elements v 11 and v 22 denoting the conditional variances of S 1 and S 2 respectively, and an off-diagonal element v 12 denoting the conditional within-area covariance between S 1 and S 2 .
The MCAR model was performed in the OpenBUGS, which is open-source statistical software and the code used is provided in the Additional file 1 section. We ran 10,000 Markov chain Monte Carlo (MCMC) simulations, burn-in of 2000 and thinning of 10. A prior sensitivity analysis was done through varying prior distributions and parameter values. We ran simultaneously two chains and the results reported are based on the better chain of initial values. Model diagnostics were assessed through trace plots which should traverse rapidly in the same region, density plots which should show a smooth curve and autocorrelation plots which should show a quick sharp drop in early lags.

Results
A total of 3896 participants contributed 8655 follow-up observations. There were 2551(65.5%) females and 1345(34.52%) were males. The average age was 38.23 ± 11.37 years and 1388(35.6%) participants were aged 35-44 years. At baseline, the median CD4 cell count was 211cells/uL with an interquartile range (IQR) of 114-320cells/uL and 1846 (47.4%) participants had CD4 cell counts of below 350cells/uL. The median VL was 57copies/uL (IQR: 48-66copies/uL) and 2557 (65.6%) participants had VL between 50 and 1000 copies/uL at baseline. All participants were on three-drug combination therapy and none were receiving protease inhibitors (PI) or integrase inhibitors INSTI) regimen. Most of the participants (95%) were on a first-line three-drug combination therapy which was a combination therapy of two nucleoside reverse transcriptase inhibitors (NRTI) namely Tenofovir (TDF) and Lamivudine (3TC); and one non-NRTI namely Efavirenz (EFV), that is, TDF + 3TC + EFV, while 5% were on second-line drugs. Table 1 shows the simple regression model results used to generate the orthogonal covariates for both the CD4 cell counts and the VL model. We observed a significant correlation between CD4 cell counts and VL for both models and both slopes were negative as expected. Table 2 presents a detailed summary of the loglikelihood values, LRT statistics, LRT p-values and the AIC values. Using the LRT results for the nested models, we assessed a better fit model between the no covariate model ½λ ij ðtÞ ¼ λ ijð0Þ expðβ 0 ij Þ; i≠j and the orthogonal adjusted model [λ ij (t) = λ ij(0) exp(β ij × P * )] for both the CD4 cell counts and VL multistate models.
Both the CD4 cell count and the VL multistate models showed that the model with the orthogonal variable (model 2) was a better fit compared to the no variate model (model 1). Similarly, the multistate models which adjusted for the time-varying effects of the other prognostic marker, [λ ij, were better fit compared to the no covariate models (model 1). Simultaneously adjusting for the orthogonal variable and the time-varying effects on both the CD4 cell count model and VL model (model 4) further improve the model, P-value< 0.05; hence, the interpretation of results was based on these models (model 4).

Modelling of CD4 cell counts adjusting for the viral load orthogonal and time-varying effects
The orthogonal and the time-varying VL effects were regressed in a single model. In this model, the movement between state i to state j for i > j defines an increase in CD4 cell counts which indicates immune recovery process while i < j defines a decrease in CD4 cell counts, which means immune deterioration process.
In Table 3, results show that the rates of decrease in CD4 cell counts (immune deterioration) among patients in state 2 (CD4 cell count range of 350-500 cells/uL; λ 23 = 0.0404) were 11.22 times higher than the rates of increase in CD4 cell counts (immune recovery) among patients in state 3 (CD4 < 350cells/uL; λ 32 = 0.0036). Patients in state 2 (CD4 cell count range of 350-500 cells/ uL) were more likely to become LTFU (λ 25 = 0.0551). The positive VL log-linear effects showed an increased risk of immune deterioration for an individual in state 2 (CD4 cell count range of 350-500 cells/uL) to state 3 (CD4 < 350cells/uL) ðβ 23 P Ã VLðkÞ ¼ 0:4762Þ . Similarly, the log-linear effects of the time-varying VL variable showed an increased risk of immune deterioration for the λ 23 transition (β 23 VL states, k = 1.0791).
The time-varying VL values and the negative VL (residuals covariate) had an increasing effect on the risk of death in this cohort for individuals with CD4 < 350cell/uL (hazard ratio (HR) =1.75 and HR = 1.67, respectively). Regarding the time-varying effects of VL on the CD4 cell counts transition rates, as the CD4 cell counts increased for an individual with a CD4 cell count range of 350-500 cells/ uL (λ 23 ), the transition rates increased with an increase in VL levels over time. Similarly, individuals with CD4 ≥ 500cells/uL (λ 15 ) and those with CD4 < 350cell/uL (λ 35 ) showed an increased risk of becoming LTFU as their VL increased over time. A comparable trend was observed for individuals with CD4 < 350cell/uL (λ 34 ) who exhibited an increased risk of death with an increase in VL over time.

Modelling of viral load adjusting for the CD4 cell counts orthogonal and time-varying effects
We fitted a time-homogeneous multistate Markov model to evaluate HIV disease progression based on VL defined states. The effects of the orthogonal and the timevarying CD4 cell counts effects were accounted for in a single model. In this model, the movement between state i to state j for i > j defines a VL suppression while i < j defines VL rebound.
Similarly, the negative CD4 cell count residual was associated with an increased risk of viral increase from a VL range of 50-1000copies/uL to VL ≥ 1000copies/uL ð β 23 P Ã CD4ðkÞ ¼ 0:6052; HR ¼ 1:83Þ ; and an increased risk of death among individuals with VL ≥ 1000copies/uL ð β 34 P Ã CD4ðkÞ ¼ 0:3475; HR ¼ 1:42Þ . Regarding the timevarying effects of CD4 cell counts on the VL transition rates, as the VL increased (λ 12 and λ 23 ) the transition rates increased with a decrease in the CD4 cell counts over time while as the VL decreased (λ 21 , λ 31 and λ 32 ) the transition rates decreased with a decrease in the CD4 cell counts over time. The mortality rates of individuals with VL < 50copies/uL and those with VL range between 50-1000copies/uL increased over time as CD4 cell count decreases (λ 14 and λ 24 ). Regardless of the individual's VL state, the transition rates to become LTFU decreased with a decrease in CD4 cell counts. Individuals with high CD4 cell counts were more likely to become LTFU in this cohort (λ 15 , λ 25 and λ 35 ); however, patients with CD4 ≥ 500cells/uL and a VL range of 50-1000copies/uL had an increased risk of becoming LTFU.

Multistate Markov models assessment
We performed a post-estimation test to assess which of the two multistate models fit better on this data in describing the HIV disease progression, mortality and becoming LTFU. We used the prevalence plot to compare the observed and the expected percentage prevalence for CD4 cell counts and VL multistate models. The VL multistate model showed a perfect fit for state 3(VL ≥ 1000copies/uL) and fair (moderate) fit for state 1 (VL < 50copies/uL) and state 4 (died). State 3 (VL ≥ 1000copies/uL) and state 5 (LTFU) of this model were not perfectly fitted by this model (Fig. 2). Regarding the CD4 cell counts multistate model, state 1 (CD4 ≥ 500cell/uL) was perfectly fitted while state 2 (350 ≤ CD4 < 500) and state 4 (died) were fairly fitted. State 3 (CD4 < 350cell/uL) was perfectly fitted between time 0 and time 5 years only while state 5 (LTFU) was not perfectly fitted (Fig. 3). Since the percentage prevalence plots could not give conclusive results on the better model and the LRT was not appropriate as the models were not nested, we used the AIC values. The VL multistate model had an AIC = 7002.71 while the CD4 cell counts multistate model had an AIC = 7337.46; hence, the VL multistate model was a better fit model.
Sub-analysis of the spatial covariate (province of ART enrolment) on the multistate models and joint mapping of the two immunological HIV conditions using the multivariate intrinsic CAR model We further assessed the effects of the region variable (province) on the VL and CD4 cell count multistate  models to identify any association between the observed transition processes and the region. We estimated the HR summarised in Table 5. In general, we observed that there was an association between the transition intensities and the province administrative level. We found that 57.6% of the transition rates for CD4 cell count and VL models adjusting for province covariate showed the same effect direction (risk or protective effect) of the hazard ratios; however, there were variations in the magnitude of the risk of transitions among provinces. Immune deterioration (decrease in CD4 cell counts) was evident among patients belonging to Mashonaland East (P4) (state 1 to 3 and state 2 to 3) and Matabeleland North (P7) (state 1 to 2), while the immune recovery was most evident among patients from Masvingo (P6).
Becoming LTFU was observed to be highest among patients from Mashonaland East (P4) regardless of their CD4 cell count. We found that the risk of viral rebound to above 50copiels/uL was high among patients from Matabeleland North (P7) (state 1 to 2), Mashonaland East (P4) (state 1 to 3) regions. In contrast, viral suppression to undetectable levels (VL < 50copies/uL) was evident among patients from Masvingo (P6) (state 2 to 1) and Matabeleland North (P7) (state 3 to 1). The risk of becoming LTFU was high among patients from Mashonaland East (P4) if their VL < 50copiels/uL, Masvingo (P6) if their VL range was between 50-1000copies/uL and Mashonaland West (P5) if their VL ≥ 1000copies/uL.
To get a pictorial view of the spatial patterns and correlation between the CD4 cell counts marker and the VL measurements, we fitted the multivariate intrinsic CAR prior model with the province as the spatial unit. We jointly modelled those patients who had a VL ≥ 50copies /uL (VL state 2 and 3 combined) at the end of the follow-up to define that group that might not have attained undetectable VL or have a VL rebound to detectable levels, and those patients who had a CD4 < 350cells/uL (CD4 state 3) to define that group that is still in the immune-deterioration phase at the end of the follow-up period. Table 6 shows the posterior estimates after the joint mapping of the two immunological outcomes for HIV disease progression among ART patients based on CD4 cell counts and VL. The posterior correlation between the spatially structured risk components of having a detectable VL (VL ≥ 50copies/uL) and immune deterioration (CD4 < 350cells/uL) was 61.3% (95% credible interval (CI): 47-97%). This strong correlation suggests strong shared geographical patterns of the risk of immune deterioration defined by these two prognostic markers. The baseline (log) relative risk of the HIV disease progression based on detectable VL (VL ≥ 50copies/ uL) was estimated at − 0.472 while that of immune deterioration (CD4 < 350cells/uL) was − 0.043.
The joint mapping of the posterior relative risk of the HIV disease progression defined by the VL (RR1) and CD4 cell counts (RR2) under the MCAR model is shown in Fig. 4. The dark blue or deep grey colours indicate areas with high relative risk while the light blue or light grey colours show regions with the lowest relative risks of the defined conditions. These maps show a geographical overlap of detectable VL (VL ≥ 50copies/uL) and immune deterioration (CD4 < 350cells/uL) in seven provinces. Patients in Matabeleland North province bordering Botswana and Zambia; and Mashonaland East province toward the Mozambique border had higher relative risks of detectable VL (VL ≥ 50copies/uL), and immune deterioration (CD4 < 350cells/uL) outcomes in addition to Bulawayo and Harare metropolitan provinces.

Discussion
The main objective of our study was to jointly model HIV disease progression using two 5-stage timehomogenous multistate Markov models based on CD4 cell counts and VL. The multistate models accounted for the orthogonal and the time-varying covariates  simultaneously which has been a gap in previous studies. The spatial overlap of the HIV disease progression outcomes (detectable VL (VL ≥ 50copies/uL) and immune deterioration (CD4 < 350cells/uL) at the last observed visit conditional not to have died or become loss to follow-up (LTFU) was also described. We fitted different time-homogeneous multistate Markov models and observed that multistate models with both the orthogonal and the time-varying variables were the best fitting models to describe the HIV disease progression. The VL multistate model was superior in predicting the HIV disease progression patterns in this cohort compared to the CD4 cell count multistate model. These findings support what earlier studies have reported on the superiority of VL in monitoring HIV disease progression patterns compared to CD4 cell counts [7,18]. Our findings further support the global guidelines on the use of VL as the primary laboratory marker to routinely monitor HIV disease progression amongst ART patients. The VL multistate model with CD4 cell counts in the covariate matrix showed that VL rebound and VL increase transition rates (state 1 to 2; state 1 to 3 and state 2 to 3) were higher than the VL suppression transition rates (state 2 to 1; state 3 to 2 and state 3 to 1). This means viral rebound was more likely compared to viral suppression in this cohort. We also found that as the VL increased (λ 12 and λ 23 ) the transition rates increased with a decrease in the CD4 cell counts over time while as the VL decreased (λ 21 , λ 31 and λ 32 ) the transition rates decreased with a decrease in the CD4 cell counts decreased over time. This finding aligns with the negative correlation which exists between these two prognostic markers that as CD4 cell count levels decreases, VL increases. These finding could be explained by poor ART adherence which remains a challenge among ART patients [19]. Moreover, treatment failure or treatment side-effects may be possible underlying factors to explain these results [18]. With effective and potent ART; and without non-adherence challenges, the VL is expected to Fig. 4 The joint mapping of the posterior relative risk of the HIV disease progression defined by the viral load (RR1) and CD4 cell counts (RR2) under the multivariate conditional autoregressive model. The maps were generated from OPENBUGS version 3.2.3 [17] https://www.mrc-bsu.cam.ac.uk/software/bugs/openbugs/ decrease as the CD4 cell counts increase. Therefore, interventions that support ART adherence like support groups, quick identification of treatment failure and continual education of the effectiveness of ART should be strengthened.
Becoming LTFU and mortality were more likely among patients with a VL ≥ 1000copies/uL (state 3 to 4 and state 3 to 5). Patients with CD4 cell counts lower than expected (negative orthogonal variable) were associated with an increased VL and mortality if VL > 50copies/uL. This means as patients with high VL immune deteriorate, the risk of death becomes higher. These findings support other earlier studies that once the immune system deteriorates, the chances of immune recovery become slim, and thus when most deaths occur [20][21][22]. The risk of becoming LTFU was high among patients with VL ≥ 50copies/uL and CD4 < 350cells/uL. This can be explained by the fact that more ill patients tend to drop out from the ART programme and are classified as LTFU [5]. However, some of these patients if tracked, would have died, which may result in a misclassification error of the LTFU outcome. In contrast, patients with higher CD4 cell counts were associated with becoming LTFU. This could be explained by the fact that much sicker patients are more likely to be bedridden; hence, they are hospital-bound while healthier patients may become LTFU as a result of "silent-transfers" to nearby health [23,24].
Adjusting for the spatial covariate in the VL multistate model, as expected, the regions with a high risk of viral rebound were also more likely to have a high risk of low CD4 cell counts. The intrinsic MCAR prior model further confirmed a strong overlapping geographical correlation between individuals with a detectable VL (VL ≥ 50copies/ uL) and immune deterioration (CD4 < 350cells/uL) of 66%; hence, a shared geographical pattern of relative risks between these two outcomes exists. Patients staying in provinces that border with nearby countries had high relative risks of immune deterioration and detectable VL, particularly, those from Matabeleland North province in the northern part of Zimbabwe. Matabeleland North province has a busy truck route to neighbouring countries and high mobility of local and tourists. Earlier studies looking at spatial heterogeneity of viral suppression in this province reported similar results [8,25]. People in this province are likely to present late for health care [26], delay ART initiation [27] or engage in sexual activities with multiple partners which subsequently compromise viral suppression among HIV patients [28,29]. Therefore, interventions such as self-testing [30], pre-exposure prophylaxis (PrEP) pills for high-risk groups [31] should be intensified in such regions.
The reported results should be inferred in light of some limitations. Firstly, the dataset used in this study is very small and might not be an accurate representation of the Zimbabwean HIV population. We included only health facility data liked to the ePMS with both VL and CD4 cell counts repeated measurements. In this regard, the MCAR assumption might not have been fully satisfied. The spatial effects were observed at a higher level which might not precisely pinpoint the marginalized areas to guide policy. This was as a result of our small sample size in this study; however, future studies should consider lower administrative level spatial units. Moreover, the MCAR model only describes the spatial interaction across the error terms to explain spatial autocorrelation; however, this model falls short when a more direct presentation of spatial interaction is desired. The VL and CD4 cell counts measurements were not randomly done rather differential monitoring was implemented due to resource constraints. The joint model could not account for the biological order of the association between VL and CD4 cell count, that is, VL change may precede CD4 change. We could not adjust for ART adherence, comorbidities (tuberculosis, diabetes and hypertension); and demographic characteristics (age and gender) due to model convergence issues. Multistate Markov models can estimate multiple transition rates and outcomes simultaneously compared to the Cox proportional hazard models. However, the assumption of constant hazard function does not reflect reality and also the Markov process has a memory loss property which may be a limitation in HIV studies [6]. Despite these limitations, this study managed to provide useful information in HIV monitoring through jointly modelling two HIV prognostic markers and identifying regions with poor immunological outcomes.

Conclusion
In conclusion, the findings from this study provide a foundation on the HIV disease progression patterns in Zimbabwe to guide policy going forward and motivate future statistical modelling to consider such modelling approaches in infectious disease to guide policy and programme management. We found that VL models were much more superior compared to CD4 cell count models in HIV monitoring. We also observed that after joint modelling of CD4 cell counts and VL in a single model, the rates of VL rebound risks are still higher than viral suppression. We observed that having a high VL (VL ≥ 1000copies/uL) increased the risk of becoming LTFU and death. The spatial overlap between VL and CD4 cell counts indicates the inter-linkages between these two markers in HIV monitoring and the shared geographical correlation of HIV disease progression immunological outcomes was high. With the global efforts to achieve zero HIV incidence, interventions which encourage ART adherence like support groups and continual educational support on the barriers for ART uptake is crucial. Region-specific interventions which focus on