Assessing parameter identifiability in compartmental dynamic models using a computational approach: application to infectious disease transmission models

Background Mathematical modeling is now frequently used in outbreak investigations to understand underlying mechanisms of infectious disease dynamics, assess patterns in epidemiological data, and forecast the trajectory of epidemics. However, the successful application of mathematical models to guide public health interventions lies in the ability to reliably estimate model parameters and their corresponding uncertainty. Here, we present and illustrate a simple computational method for assessing parameter identifiability in compartmental epidemic models. Methods We describe a parametric bootstrap approach to generate simulated data from dynamical systems to quantify parameter uncertainty and identifiability. We calculate confidence intervals and mean squared error of estimated parameter distributions to assess parameter identifiability. To demonstrate this approach, we begin with a low-complexity SEIR model and work through examples of increasingly more complex compartmental models that correspond with applications to pandemic influenza, Ebola, and Zika. Results Overall, parameter identifiability issues are more likely to arise with more complex models (based on number of equations/states and parameters). As the number of parameters being jointly estimated increases, the uncertainty surrounding estimated parameters tends to increase, on average, as well. We found that, in most cases, R0 is often robust to parameter identifiability issues affecting individual parameters in the model. Despite large confidence intervals and higher mean squared error of other individual model parameters, R0 can still be estimated with precision and accuracy. Conclusions Because public health policies can be influenced by results of mathematical modeling studies, it is important to conduct parameter identifiability analyses prior to fitting the models to available data and to report parameter estimates with quantified uncertainty. The method described is helpful in these regards and enhances the essential toolkit for conducting model-based inferences using compartmental dynamic models. Electronic supplementary material The online version of this article (10.1186/s12976-018-0097-6) contains supplementary material, which is available to authorized users.


Background
Mathematical modeling is commonly applied in outbreak investigations for analyzing mechanisms behind infectious disease transmission and explaining patterns in epidemiological data [1,2]. Models also provide a quantitative framework for assessing intervention and control strategies and generating epidemic forecasts in real time. However, the successful application of mathematical modeling to investigate epidemics depends upon our ability to reliably estimate key transmission and severity parameters, which are critical for guiding public health interventions. In particular, parameter estimates for a given system are subject to two major sources of uncertainty: noise in the data and assumptions built in the model [3]. Ignoring this uncertainty can result in misleading inferences and potentially incorrect public health policy decisions.
Appropriate and flexible approaches for estimating parameters from data, evaluating parameter and model uncertainty, and assessing goodness of fit are gaining increasing attention [4][5][6][7][8]. For instance, model parameters can be estimated by connecting models with observed data through various methods, including least-squares fitting [9], maximum likelihood estimation [10,11], and approximate Bayesian computation [12,13]. An important, yet often overlooked step in estimating parameters is examining parameter identifiabilitywhether a set of parameters can be uniquely estimated from a given model and data set [14]. Lack of identifiability, or nonidentifiability, occurs when multiple sets of parameter values yield a very similar model fit to the data. Nonidentifiability may be attributed to the model structure (structural identifiability) or due to the lack of information in a given data set (practical identifiability), which could be associated with the number of observations, spatial-temporal resolution (e.g., daily versus weekly data), and observation error. A parameter set is considered structurally identifiable if any set of parameter values can be uniquely mapped to a model output [15]. As such, structural identifiability is the first step in understanding which model parameters can be estimated from data of certain state(s) of the system at a specific spatial-temporal resolution. Structurally identifiable parameters may still be non-identifiable in practice due to a lack of information in available data. The so-called "practical identifiability" considers real-world data issues: amount of noise in the data and sampling frequency (e.g., data collection process) [14].
Several methods have been proposed to examine structural identifiability of a model without the need of experimental data; these include Taylor series methods [15,16], differential algebra-based methods [17,18], and other mathematical approaches [15,19]. These methods tend to work better in the context of simple rather than complex models. Model complexity, in general, is a function of the number of parameters necessary to characterize the states of the system and the spectrum of dynamics that can be recovered from the model. Model complexity affects the ability to reliably parameterize the model given the available data [3], so there is a need for flexible, mathematically-sound approaches to address parameter identifiability in models of varying complexity. Here, we present a general computational method for quantifying parameter uncertainty and assessing parameter identifiability through a parametric bootstrap approach. We demonstrate this approach through examples of compartmental epidemic models with variable complexity, which have been previously employed to study the transmission dynamics and control of various infectious diseases including pandemic influenza, Ebola, and Zika.

Compartmental models
Compartmental models are widely used in epidemiological literature as a population-level modeling approach that subdivides the population into classes according to their epidemiological status [1,20]. Compartmental dynamic models are specified by a set of ordinary differential equations and parameters that track the temporal progression of the number of individuals in each of the states of the system [3,21]. Dynamic models follow the general form: Where _ x i is the rate of change of the system states (where i = 1, 2, …, h) and Θ = (θ 1 , θ 2 , …, θ m ) is the set of model parameters.
The basic reproductive number (denoted R 0 ) is often a parameter of interest in epidemiological studies, as it is a measure of potential for a given infectious disease to spread within a population. Mathematically, it is defined as the average number of secondary infections produced by a single index case in a completely susceptible population [22]. R 0 represents an epidemic threshold for which values of R 0 < 1 indicate a lack of disease spread, and values of R 0 > 1 are consistent with epidemic spread. In the midst of an epidemic, R 0 estimates provide insight to the intensity of interventions required to achieve control [23]. R 0 is a composite parameter value, as it depends on multiple model parameters (e.g., transmission rate, infectious period), and while R 0 is not directly estimated from the model, it can be calculated by relying on the uncertainty of individual parameters.
A simple and commonly utilized compartmental model is the SEIR (susceptible-exposed-infectious-removed) model [1]. We apply our methodology to this low-complexity model and work through increasingly more complex models as we demonstrate the approach for assessing parameter identifiability.

Model 1: Simple SEIR (pandemic influenza)
We analyze a simple compartmental transmission model that consists of 4 parameters and 4 states (Fig. 1). We apply this model to the context of the 1918 influenza pandemic in San Francisco, California [23]. Individuals in the model are classified as susceptible (S), exposed (E), infectious (I), or recovered (R) [1]. We assume constant population size, so S + E + I + R = N, where N is the total population size. Susceptible individuals progress to the exposed class at rate βI(t)/N, where β is the transmission rate, and I(t)/ N is the probability of random contact with an infectious individual. Exposed, or latent, individuals move to the infectious class at rate k, where 1/k is the average latent period. Infectious individuals recover (move to recovered class) at rate γ, where 1/γ corresponds to the average infectious period.
The transmission process can be modeled using the following system of ordinary differential equations (where the dot denotes time derivative): The auxiliary variable C(t) tracks the cumulative number of infectious individuals from the start of the outbreak. It is not a state of the system of equations, but simply a class to track the cumulative incidence cases; meaning, individuals from the population are not moving to class C. The number of new infections, or the incidence curve, is given by _ CðtÞ. For this model, there is only one class contributing to new infections (I), so R 0 , or the basic reproductive number, is simply the product of the transmission rate and the average infectious period: R 0 = β γ .
Model 2: SEIR with asymptomatic and hospitalized/ diagnosed and reported We use a simplified version of a complex SEIR model that consists of 8 parameters and 6 system states (Fig. 2). This model was originally developed for studying the transmission dynamics of the 1918 influenza pandemic in Geneva, Switzerland [24]. In the model, individuals are classified as susceptible (S), exposed (E), clinically ill and infectious (I), asymptomatic and partially infectious (A), hospitalized/ diagnosed and reported (J), or recovered (R). Hospitalized individuals are assumed to be as infectious as individuals in the I class. Again, constant population size is assumed, so S + E + I + A + J + R = N. Susceptible individuals progress to the exposed class at rate β[I(t) + J(t) + qA(t)]/N, where β is the transmission rate, and q is a reduction factor of transmissibility in the asymptomatic class (0 < q < 1). A proportion, ρ, of exposed/latent individuals (0 < ρ < 1) become clinically infectious at rate k, while the rest (1-ρ) become partially infectious and asymptomatic at the same rate k. Asymptomatic cases progress to the recovered class at rate γ 1 . Clinically ill and infectious individuals are diagnosed at a rate α or recover without being diagnosed at rate γ 1 . Diagnosed individuals recover at rate γ 2 . The transmission process can be modeled using the following system of ordinary differential equations: Simple SEIR -Population is divided into 4 classes: susceptible (S), exposed (E), infectious (I), and recovered/removed (R). Class C represents the auxiliary variable C (t) and tracks the cumulative number of infectious individuals from the start of the outbreak. This is presented as a dashed line, as it is not a state of the system of equations, but simply a class to track the cumulative incidence cases; meaning, individuals from the population are not moving to class C. Parameter(s) above arrows denote the rate individuals move between classes. Parameter descriptions and values are found in Table 1 In the above system, C(t) represents the cumulative number of diagnosed/reported cases from the start of the outbreak, and _ CðtÞ is the incidence curve of diagnosed cases.
For this model, there are three classes contributing to new infections (A, I, J), so the reproductive number is the sum of the contributions from each of these classes: R 0 = R 0 A + R 0 I + R 0 J , where: R 0 A = (fraction of asymptomatic cases) x (transmission rate) x (relative transmissibility from asymptomatic cases) x (mean time in asymptomatic class) R 0 I = (fraction of symptomatic cases) x (transmission rate) x (mean time in clinically infectious class) R 0 J = (fraction of symptomatic cases that are hospitalized) x (transmission rate) x (mean time in hospital) [24] Here,

Model 3: The Legrand et al. model (Ebola)
We analyze an Ebola transmission model [25] comprised of 15 parameters and 6 states (Fig. 3). This model subdivides the infectious population into three stages to account for transmission in three settings: community, hospital, and unsafe burial ceremonies. Individuals are classified as susceptible (S), exposed (E), infectious in the community (I), infectious in the hospital (H), infectious after death at funeral (F), or recovered/removed (R). Constant population size is assumed, so S + E + I + H + F + R = N. Susceptible individuals progress to the exposed class at rate (β I I(t) + β H H(t) + β F F(t))/N where β I , β H , and β F represent the transmission rates in the community, hospital, and at funerals, respectively. Exposed individuals become infectious at rate α. A proportion, 0 < θ < 1, of infectious individuals are hospitalized at rate γ h . Of the proportion of infectious individuals that are not hospitalized (1-θ), a proportion, 0 < δ 1 < 1, move to the funeral class at rate γ d , and the rest (1-δ 1 ) move to the recovered/removed class at rate γ i . A proportion, 0 < δ 2 < 1, of hospitalized individuals progress to funeral class at rate γ dh ¼ 1 The remaining proportion (1-δ 2 ) are recovered/removed at rate γ ih ¼ 1 . δ 1 and δ 2 are calculated such that δ represents the case fatality ratio (Table 3). Individuals in the funeral class are removed at rate γ f . The transmission process is modeled by the following set of ordinary differential equations: Here, C(t) represents the cumulative number of all infectious individuals, and _ CðtÞ is the incidence curve for infectious cases.  Table 2 R 0 F = (fraction of cases that have traditional burial ceremonies) x (transmission rate at funerals) x (mean time in funeral class)

Model 4: Zika model with human and mosquito populations
The last example is a compartmental model of Zika transmission dynamics that includes 16 parameters and 9 states and incorporates transmission between two populationshumans and vectors (Fig. 4). This model was designed to investigate the impact of both mosquitoborne and sexually transmitted (human-to-human) routes of infection for cases of Zika virus [26]. In the human population, individuals are classified as susceptible (S h ), asymptomatically infected (A h ), exposed (E h ), symptomatically infectious (I h1 ), convalescent (I h2 ), or recovered (R h ). The mosquito, or vector, population is broken into susceptible (S v ), exposed (E v ), and infectious (I v ) classes. Note that the subscript 'h' is used for humans and 'v' is used for vectors. Constant population size is assumed in both populations, so A proportion 0 < θ < 1 of susceptible humans move to the exposed class at rate ab( where a is the mosquito biting , asymptomatically infected (A h ), exposed (E h ), symptomatically infectious (I h1 ), convalescent (I h2 ), or recovered (R h ). Class C represents the auxiliary variable C(t) and tracks the cumulative number of newly infectious individuals. The mosquito, or vector, population (subscript v; outlined in dark blue) is divided into 3 classes: susceptible (S v ), exposed (E v ), and infectious (I v ) classes. Parameter(s) above arrows denote the rate individuals/ vectors move between classes. Parameter descriptions and values are found in Table 4 Fig. 3 Model 3: The Legrand et al. Model -Population is divided into 6 classes: susceptible (S), exposed (E), infectious in the community (I), infectious in the hospital (H), infectious after death at funeral (F), or recovered/removed (R). Class C represents the auxiliary variable C(t) and tracks the cumulative number of newly infectious individuals. Parameter(s) above arrows denote the rate that individuals move between classes. Parameter descriptions and values are found in Table 3 rate, b is the transmission probability from an infectious mosquito to a susceptible human, β is the transmission rate between humans, α is the relative (human-to-human) transmissibility from exposed humans to susceptible, and τ is the relative transmissibility from convalescent humans compared to susceptible. Exposed individuals progress to symptomatically infectious at rate κ h and then progress to the convalescent stage at rate γ h1 . Convalescent individuals recover at rate γ h2 . The remaining proportion of susceptible individuals (1 -θ) become asymptomatically infected at the same rate, ab( Asymptomatic humans recover at rate γ h and do not contribute to new infections in this model. Susceptible mosquitos move to the exposed class at rate ac[(ρE h (t) + I h1 (t))/N h ], where c is the transmission probability from a symptomatically infectious human to a susceptible mosquito, and ρ is the relative human-to-mosquito transmission probability from exposed humans to symptomatically infected. Exposed mosquitos become infectious at rate κ v . Mosquitos also leave the population at rate μ v , where 1/μ v is the mosquito lifespan.
The transmission process, including both populations, is represented by the set of differential equations below: C(t) represents the cumulative number of symptomatically infectious human cases, and _ CðtÞ contains the incidence curve for symptomatic human cases.
For this example, we have two transmission processes to consider when calculating R 0 : sexual transmission (R hh ) and mosquito-borne (R hv ). The human population has three classes contributing to new infections: exposed, symptomatically infectious, and convalescent, so: The mosquito population only has one infectious class (I v ); the reproductive number is given by: The overall basic reproductive number, considering both transmission routes, is given by the following eq. [26]: Simulated data For each model we simulate 200 epidemic datasets (directly from the corresponding set of ordinary differential equations) with Poisson error structure using the daily time series data of case incidence, or total number of new cases daily. Parameters for each model are set at values based on their corresponding application: the 1918 influenza pandemic in San Francisco (Model 1) [23], 1918 pandemic influenza in Geneva (Model 2) [24], 1995 Ebola in Congo (Model 3) [25], and 2016 Zika in the Americas (Model 4) [26]. As explained below, the simulated data are generated using a bootstrap approach, and we then use these data to study parameter identifiability within a realistic parameter space for each model. Parameter descriptions and their corresponding values for each model are given in Tables 1, 2, 3  and 4.

Parameter estimation
To estimate parameter values, we fit the model to each simulated dataset using nonlinear least squares estimation. The lsqcurvefit function in Matlab (Mathworks, Inc.) is used to find the least squares best fit to the data. This process searches for the set of parametersΘ= (θ 1 ,θ 2 ,…,θ m ) that minimizes the sum of squared differences between the simulated data and the model solution [3]. The model solution f ðt i ;ΘÞ represents the best fit to the time series data. For this method, the initial parameter predictions affect the solution for the model as local minima occur. While we know the true parameter values (used to generate the data), this is unrealistic for a real-world modeling scenario. We vary the initial guesses of the parameter values to vary according to a uniform distribution in the range of +/− 0.1 around the true value. Another approach would consist of repeating the least squares fitting procedure several times with different initial parameter guesses and selecting the best model fit.
For each model, the sets of parameters are denoted by Θ i , where i represents the number of parameters being jointly estimated. We begin with estimating one model parameter, while fixing the rest, and then increase the number of parameters jointly estimated by one until all parameters of interest are included. Population size, N, is always fixed to the true value. Also, while R 0 is not being directly estimated from the model, it is a composite parameter that can be calculated using individual parameter estimates.
For each model described above, we explore parameter identifiability for the following sets of parameters. Here, the symbol^is used to indicate an estimated parameter, while the absence of this symbol indicates that the parameter is set to its true value from the simulated data.

Bootstrapping method
We use the parametric bootstrap approach [3,27,28] for simulating the error structure around the deterministic model solution in order to evaluate parameter identifiability. This computational approach involves repeatedly sampling observations from the best-fit model solution. Here we use a Poisson error structure, which is the most popular distribution for modeling count data [3]. The step-by-step approach to quantify parameter uncertainty is as follows: 1. Obtain the deterministic model solution (total daily incidence series) using nonlinear least-squares estimation (Section 2.3).

Parameter identifiability
When a model parameter is identifiable from available data, its confidence interval lies in a finite range of values [29,30]. Using the bootstrapping method outlined in Section 2.4, we obtain 95% confidence intervals from the distributions of each estimated parameter. A small confidence interval with a finite range of values indicates that the parameter can be precisely identified, while a wider range could be indicative of lack of identifiability. To assess the level of bias of the estimates, we calculate the mean squared error (MSE) for each parameter. MSE is calculated as: where θ represents the true parameter value (in the simulated data), and b θ i represents the estimated value of the parameter for the i th bootstrap realization.
When a parameter can be estimated with low MSE and narrow confidence, this suggests that the parameter is identifiable from the model. On the other hand, larger confidence intervals or larger MSE values may be suggestive of non-identifiability.

Model 1: Simple SEIR
Additional files 1, 2 and 3: illustrate the empirical distributions of the estimated parameters, where Additional file 1: represents the results forΘ 1 (β only), Additional file 2: for Θ 2 (β and γ), and Additional file 3: forΘ 3 (β, γ, and κ). The figures also show the original simulated data and the 200 simulated datasets for each estimated parameter set.
Estimating only β (Θ 1 ), results in precise (small confidence interval range) and unbiased (small MSE) estimates of β. Similarly, estimating β and γ (Θ 2 ) provides precise and unbiased estimates for both parameters. The precision of the estimates can be seen in Fig. 5: the confidence intervals for the estimates (represented by red vertical lines) remain close to the true parameter value (blue horizontal dotted line). The MSE plot (Fig. 6) shows an MSE value of < 10 − 7 for β in Θ 1 and values of < 10 − 4 for both β and γ in Θ 2 .
Simultaneously estimating all 3 parameters, β, κ, and γ (Θ 3 ), results in wider confidence intervals and larger MSE than the two previous subsets. The confidence intervals for β (0.516, 0.636) and γ (0.223, 0.277) have a narrow range and enclose the true values of the parameters. The MSE for these two are larger compared to the previous subsets, though all MSE values are < 10 − 2 . The confidence interval for κ has a slightly larger range (0.440, 0.613), though this correlates with a small latent period difference of less than a day. Also, the MSE for κ is comparable to the other parameters. This indicates that all three parameters can be identified from daily incidence data of the epidemic curve with Poisson error structure.
Moreover, R 0 can be estimated precisely with unbiased results. Despite the larger confidence intervals for the other parameters estimated in Θ 3 (compared to Θ 1 , Θ 2 ), the range around R 0 is still very precise: (2.286, 2.317). Similarly, MSE for R 0 is < 10 − 4 for all runs. This indicates that the estimates of R 0 are robust to variation or bias in the other parameter estimateswe will continue to explore this theme in the proceeding models.
Results for Θ 4 and Θ 5 indicate that none of the parameters can be well-identified from case incidence data while simultaneously estimating > 3 parameters. For each, multiple parameters have MSE values > 10 − 2 (Fig. 8), and the confidence intervals are comparatively wide. Additionally, the confidence intervals for ρ (Θ 4 : (0.602, 0.858); Θ 5 : (0.608, 0.763)) do not include the true value of 0.60.
Looking at confidence intervals and MSE (Figs. 7 & 8) for R 0 , we find again that R 0 is identifiable across each Θ i . The confidence intervals for R 0 all have a range < 0.2, and the MSE values for each Θ i are < 10 − 2 . These R 0results are consistent with those in Model 1, despite the identifiability issues of other parameters seen here in Model 2. This is an important result, indicating that even when identifiability issues exist in other model parameters, we can still provide reliable estimates of R 0 without having to know the true values of the other parameters. It also shows that while noise in the data may affect parameter estimation for some parameters, composite parameters, like R 0 , can still be accurately calculated from the same data.
For Θ i where i > 4, none of the parameters can be identified from the model/data. Each parameter (for runs Θ 5 -Θ 7 ) has either a large confidence range and/or comparatively large MSE. Some parameters have MSE values < 10 − 2 (Fig. 10), but the wide range of uncertainty around these parameters is still indicative of non-identifiability (Fig. 9).
Remarkably, R 0 can be precisely estimated with unbiased results for parameter sets Θ 1 -Θ 4 (Figs. 9 & 10). When  simultaneously estimating five or more parameters, however, the associated uncertainty of all the parameters results in non-identifiability of R 0 . For Θ 5 , for example, R 0 estimates vary widely in the range (0.683, 2.821) with an MSE of 0.467. As previously mentioned, R 0 is a threshold parameter (epidemic threshold at R 0 = 1), so given the confidence interval including the critical value 1, we would not have the ability to distinguish between the potential for epidemic spread versus no outbreak.

Model 4: Zika model with human and mosquito populations
For this complex model, we find again that when estimating only 1 or 2 parameters (Θ 1 , Θ 2 ), the parameters can be recovered precisely with unbiased results (Figs. 11 & 12). When jointly estimating more than two parameters (Θ i : i > 2), non-identifiability issues arise. It can be seen that the confidence intervals and MSE for β and γ h1 are very small, and thus they are identifiable. However, all of the confidence intervals and MSE values for each of the other parameters (Θ i : i > 2) are representative of non-identifiability. The parameter estimates have a large amount of uncertainty, represented by the large confidence intervals, and are also biased estimates of the true value: MSE > 10 − 2 for all.
In terms of R 0 , we can see that this composite parameter of interest is identifiable for all Θ i (Figs. 11 & 12). Despite the large confidence intervals associated with some parameters (ex: Θ 6 -γ h2 : (0.047, 0.573)), when estimating more than two parameters, R 0 can still be estimated with low

Discussion
In this paper we have introduced a simple computational approach for assessing parameter identifiability in compartmental models comprised of systems of ordinary differential equations. We have demonstrated this approach through various examples of compartmental models of infectious disease transmission and control. Using simulated time series of the number of new infectious individuals, we analyzed the identifiability of model characterizing transmission and the natural history of the disease. This type of analysis based on simulated data provides a crucial step in infectious disease modeling, as inferences based on estimates of non-identifiable parameters can lead to incorrect or ineffective public health decisions. Parameter identifiability and uncertainty analyses are essential for assessing the stability of the parameter estimates. Hence, it is important for researchers to be mindful that a good fit to the data does not imply that parameter estimates can be reliably used to evaluate hypotheses regarding transmission mechanisms. Moreover, quantifying the uncertainty surrounding parameter estimates is key when making inferences that guide public health policies or interventions.
Our bootstrap-based approach is sufficiently general to assess identifiability for compartmental modeling  applications. We have shown that this method works well for models of varying levels of complexity, ranging from a simple SEIR model with only a few parameters (Model 1) to a complex, dual-population compartmental model with a total of 16 parameters (Model 4). Other methods exist to conduct parameter identifiability analyses. Some methods, such as Taylor series methods [15,16] and differential algebra-based methods [17,18], require more mathematical analyses, which becomes increasingly complicated as model complexity increases. Other methods rely on constructing the profile likelihood for each of the estimated parameters to assess local structural identifiability [11,14,31,32]. In this method, one of the parameters (θ i ) is fixed across a range of realistic values, and the other parameters are refit to the data using the likelihood function of θ i . Thus, identifiability of the parameters is determined by the shape of the resulting likelihood profile. Depending on the assumptions of the error structure in the data and as models become increasingly more complex, derivation of the likelihood profile and confidence intervals becomes increasingly more difficult.
Overall, our analyses indicate that parameter identifiability issues are more likely to arise with more complex models (based on number of equations/states and parameters). For example, a set of 3 parameters (Θ 3 ) can be estimated with low uncertainty and bias from a simple model, like Model 1; however, for more complex models (Model 3, Model 4), estimating only 3 parameters from a single curve of case incidence resulted in lack of identifiability for at least one of the parameters in the set (Θ 3 ). Also, for Θ i (recall: i represents number of parameters being jointly estimated), as i increases, the uncertainty surrounding estimated parameters tended to increase, on average, as well (Fig. 7). One strategy to resolve parameter identifiability issues consists of restricting the number of parameters being jointly estimated while fixing other parameter values and conducting sensitivity analyses.
Importantly, we found that R 0 is a robust composite parameter, even in the presence of identifiability issues affecting individual parameters in the model. In Model 4, despite large confidence intervals and larger MSE for the estimated parameters, R 0 estimates were contained in a finite confidence interval with little bias (Figs. 11 & 12). For example, for parameter set Θ 6 , only two of the estimated parameters could be reliably identified from the data, yet R 0 could be identified with little uncertainty or bias. These findings are in line with the identifiability results of R 0 for a vector-borne disease model (similar to Model 4), even when other model parameters could not be properly estimated [14]. R 0 is often a parameter of interest, as R 0 values have been related to the size or impact of an epidemic [1]. Moreover, R 0 estimates can be used to characterize initial transmission potential, assess the risk of an outbreak, and evaluate the impact of potential interventions, so it is beneficial to know we can reliably obtain R 0 estimates, despite lack of identifiability in other parameters.
It is important to emphasize that our methodology is helpful to uncover identifiability issues which could arise from 1) the lack of information in the data or 2) the structure of the model. We also note that our examples assess identifiability of parameters by relying on the entire curve of incidence data of a single epidemic. Future work could include identifiability analyses in the context of limited data using different sections of the trajectory of the outbreak. We also assume that only one model variable (state) is observed, so future analyses could incorporate more than one observed variable to potentially improve the identifiability of parameters without changing

Conclusions
For modeling studies, we recommend conducting comprehensive parameter identifiability analyses based on simulated data prior to attempting to fit the model to data. It is important to emphasize that lack of identifiability could be due to lack of information in the data or the structure of the model. The analyses also help guide the set of parameters in the model that can be jointly estimatedidentifiability issues may not arise until any given number of parameters are being simultaneously estimated. If the analysis indicates non-identifiability of certain parameters, may have to be assessed in sensitivity analyses (rather than estimated) to address the identifiability issue.
In summary, the ability to make sound public health decisions regarding an infectious disease outbreak is crucial for the general health and safety of a population. Knowledge of whether a parameter is identifiable from a given model and data is invaluable, as estimates of nonidentifiable parameters should not be used to inform public health decisions. Further, parameter estimates should be presented with quantified uncertainty. The methodology presented in this paper adds to the essential toolkit for conducting model-based inferences. The solid red line corresponds to the best-fit of the model to the data, and the dashed red lines correspond to the 95% confidence bands around the best fit. (TIF 5423 kb)