The present study consists of two major analytic steps, i.e. (i) a systematic review of literature and (ii) mathematical modelling of an outbreak data. As for the former, this systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [12].

### Modelling method

As a second part of the study, a mathematical model was employed to analyse an outbreak data. We focus on a school-driven outbreak record in Madagascar [

13] in which the individual data of the date of illness onset as well as the time periods of mass chemoprophylaxis and school closure have been reported. Because the daily incidence was given, we describe the epidemic dynamics by employing a discrete-time renewal equation model. Let the expected incidence (i.e. the number of new cases) on calendar day

*t* be

*c* _{t}, the linear temporal dynamics of the outbreak is described by

${c}_{t}={\displaystyle {\sum}_{s=1}^{\infty}{A}_{s}{c}_{t-s}}\text{,}$

(1)

where

*A* _{s} describes the rate of secondary transmission per single primary case at infection-age

*s* (i.e., the time since infection) [

14,

15]. The linear model is employed, because the outbreak occurred in a confined setting and it is unclear if the depletion of susceptible individuals played a role. Thus, as a default assumption, we assume that

*A* _{s} is decomposed as

${A}_{s}=R{g}_{s}\text{,}$

(2)

where

*R* is the reproduction number, representing the average number of secondary cases generated by a single primary case, and

*g* _{s} is the probability mass function of the generation time, i.e. the time from infection in a primary case to infection in the secondary case caused by the primary case (see [

16] for the details of discretisation). Based on a published statistical study [

16], the generation time is assumed to be known and is a discrete function that is derived from the continuous gamma distribution with the mean 2.70 days and the variance 1.21 days

^{2}. Thus the renewal equation (

1) is rewritten as

${c}_{t}=R{\displaystyle {\sum}_{s=1}^{\infty}{g}_{s}{c}_{t-s}}\text{,}$

(3)

Subsequently, we consider three realistic features of the data-generating process. First, the outbreak investigation study classified the cases into confirmed and non-confirmed cases [

13], denoted by

*c* _{1t} and

*c* _{0t}, respectively. While all confirmed cases received antiviral treatment upon diagnosis, non-confirmed cases were not subject to treatment. Supposing that the relative risk of secondary transmission among those with antiviral treatment compared to those without is expressed by a factor (1-

*ε* _{T}) (i.e.

*ε* _{T} is the effectiveness of antiviral treatment in reducing secondary transmissions), the renewal equation should be rewritten as

${c}_{1t}+{c}_{0t}=\left(1-{\epsilon}_{T}\right)R{\displaystyle {\sum}_{s=1}^{\infty}{g}_{s}{c}_{1t-s}}+R{\displaystyle {\sum}_{s=1}^{\infty}{g}_{s}{c}_{0t-s}}\text{,}$

(4)

Second, exposed individuals undertook mass chemoprophylaxis with oseltamivir, 75 mg once a day, for 10 days starting from 12 October 2009. Third, the entire school was closed from 16 October until 1 November 2009 to prevent further transmissions at the school. Let

*δ* _{P} and

*δ* _{S} represent the relative risks of secondary transmission under mass chemoprophylaxis and school closure, respectively. These are dealt with as delta functions, i.e.,

${\delta}_{P}=\{\begin{array}{l}1,\phantom{\rule{2.75em}{0ex}}\mathrm{\text{for}}\phantom{\rule{0.25em}{0ex}}t<{\tau}_{0}\phantom{\rule{0.25em}{0ex}}\mathrm{\text{and}}\phantom{\rule{0.25em}{0ex}}t>{\tau}_{1}\\ 1-{\epsilon}_{P},\phantom{\rule{1em}{0ex}}\mathrm{\text{for}}\phantom{\rule{0.25em}{0ex}}{\tau}_{0}\le t\le {\tau}_{1}\phantom{\rule{0.25em}{0ex}}\end{array}$

(5)

and

${\delta}_{S}=\{\begin{array}{l}1,\phantom{\rule{2.75em}{0ex}}\mathrm{\text{for}}\phantom{\rule{0.25em}{0ex}}t<{\upsilon}_{0}\phantom{\rule{0.25em}{0ex}}\mathrm{\text{and}}\phantom{\rule{0.25em}{0ex}}t>{\upsilon}_{1}\\ 1-{\epsilon}_{S},\phantom{\rule{1em}{0ex}}\mathrm{\text{for}}\phantom{\rule{0.25em}{0ex}}{\upsilon}_{0}\le t\le {\upsilon}_{1}\end{array}$

(6)

where

*τ* _{0} and

*τ* _{1} represent the first and last days of antiviral prophylaxis, respectively. Similarly,

*υ* _{0} and

*υ* _{1} represent the first and last days of school closure, respectively. Parameters

*ε* _{P} and

*ε* _{S} are the effectiveness of chemoprophylaxis and school closure, respectively. The renewal equation should be updated as

$\mathrm{E}\left({c}_{1t}+{c}_{0t};{H}_{t-1}\right)={\delta}_{P}{\delta}_{S}R\left[\left(1-{\epsilon}_{T}\right){\displaystyle {\sum}_{s=1}^{\infty}{g}_{s}{c}_{1t-s}}+{\displaystyle {\sum}_{s=1}^{\infty}{g}_{s}{c}_{0t-s}}\right]\text{,}$

(7)

in which we denote the history of the series of cases up to day

*t* by

*H* _{t}, and we use the conditional expectation on the left-hand side for the sake of later statistical inference. It should be noted that

*R* in (7) is interpreted as the reproduction number in the absence of countermeasures including the chemoprophylaxis, school closure and antiviral treatment. Since the equation (

7) describes only the linear dynamics, we also considered an alternative model that explicitly accounts for the depletion of susceptible individuals. The conditional expectation for the alternative model reads

$\mathrm{E}\left({c}_{1t}+{c}_{0t};{H}_{t-1}\right)={\delta}_{P}{\delta}_{S}R{s}_{t}\left[\left(1-{\epsilon}_{T}\right){\displaystyle {\sum}_{s=1}^{\infty}{g}_{s}{c}_{1t-s}}+{\displaystyle {\sum}_{s=1}^{\infty}{g}_{s}{c}_{0t-s}}\right]\text{,}$

(8)

where

*s* _{t} represents the fraction of susceptible individuals in the beginning of day

*t*, which was calculated using the observed data, i.e.,

${s}_{t}=1-\frac{{\displaystyle {\sum}_{s=0}^{t-1}{c}_{1s}+{c}_{0s}}}{{M}_{t-1}}\text{,}$

(9)

where *M* _{t} scales the total number of boarders in the beginning of day *t* (i.e. used for scaling the cumulative incidence).

Using the Madagascar data, we estimate four parameters, i.e.,

*R*,

*ε* _{T},

*ε* _{P} and

*ε* _{S} through a likelihood-based approach. We employ three conditional likelihood functions to estimate the parameters under two different scenarios (i.e. one scenario with the depletion of susceptibles and the other without). As the first likelihood, the infection process is assumed as sufficiently characterized by Poisson process ignoring individual heterogeneity [

16]. Given observed data from time 0 up to

*t* _{f} with the total daily number of new cases (i.e., the sum of confirmed and non-confirmed cases) denoted by

*N* _{t} up to day

*t*, the likelihood function is written as

$L\left(R,{\epsilon}_{T},{\epsilon}_{P},{\epsilon}_{S};{H}_{t-1}\right)={\displaystyle {\prod}_{t=1}^{{\mathrm{t}}_{\mathrm{f}}}\frac{\mathrm{\text{exp}}\left[-\mathrm{E}\left({c}_{1t}+{c}_{0t};{H}_{t-1}\right)\right]\mathrm{E}{\left({c}_{1t}+{c}_{0t};{H}_{t-1}\right)}^{{N}_{t}}}{{N}_{t}!}}\text{.}$

(10)

Second, an alternative likelihood function is to assume that the incidence is geometrically distributed, which is known to be the case for an exponentially distributed generation time [

17].

$L\left(R,{\epsilon}_{T},{\epsilon}_{P},{\epsilon}_{S};{H}_{t-1}\right)={\displaystyle {\prod}_{t=1}^{{\mathrm{t}}_{\mathrm{f}}}\frac{1}{\mathrm{E}\left({c}_{1t}+{c}_{0t};{H}_{t-1}\right)+1}{\left(1-\frac{1}{\mathrm{E}\left({c}_{1t}+{c}_{0t};{H}_{t-1}\right)+1}\right)}^{{N}_{t}}}\text{.}$

(11)

Third, if we incorporate a gamma distributed individual heterogeneity for describing the infection process [

16], the incidence follows a negative binomial distribution, i.e.,

$\begin{array}{l}L(R,{\epsilon}_{T},{\epsilon}_{P},{\epsilon}_{S},k;{H}_{t-1})\\ ={\displaystyle \prod _{t=1}^{{\mathrm{t}}_{\mathrm{f}}}\frac{\Gamma \left({N}_{t}+k\right)}{\Gamma \left(k\right){N}_{t}!}{\left(\frac{k}{k+\mathrm{E}\left({c}_{1t}+{c}_{0t};{H}_{t-1}\right)}\right)}^{k}}{\left(\frac{\mathrm{E}\left({c}_{1t}+{c}_{0t};{H}_{t-1}\right)}{k+\mathrm{E}\left({c}_{1t}+{c}_{0t};{H}_{t-1}\right)}\right)}^{{N}_{t}}\text{,}\end{array}$

(12)

where *k* is the dispersion parameter that was estimated jointly with other parameters. Both Poisson and geometric distributions are the special cases of this negative binomial distribution with *k* → ∞ and *k* → 0, respectively. Maximum likelihood estimates of the parameters were obtained by minimizing the negative logarithm of the likelihood function (10), (11) or (12). The 95% confidence interval (CI) was computed by using the profile likelihood. To compare model fits, we employed the Akaike’s Information Criterion.