### Analysis 1: The effect of contact repetition depending on *τ*, *n* and *β*

As described in the methods section, *τ*, *n* and *β*·*n*·*τ* were varied systematically to investigate the difference between the mean outbreak sizes of the two model types (random mixing and contact repetition) under different parameter constellations. Figures 2a–c show three contour plots in which this difference is given for various values of *τ*, *n* and *β*. Figure 2a gives the difference as a function of 4 ≤ *n* ≤ 20 and 2 ≤ *τ* ≤ 14 with a fixed *β*·*n*·*τ* = 1.6. The total outbreak size depends strongly on the number of contacts per day *n* but only slightly on the infectious period *τ*. For infectious periods between two and four days, the difference changes considerably with Δ*τ*; for 4 < *τ* ≤ 8, slight changes are observable; for infectious periods over eight days, the difference between both models depends mainly on *n*. Figure 2b gives the difference as a function of 4 ≤ *n* ≤ 20 and 1.2 ≤ *β*·*n*·*τ* ≤ 4.0 with a fixed *τ* = 14. It shows that the difference between both models depends strongly on both parameters, the number of daily contacts *n* and the transmission probability *β*. Differences are large for a small *n* or a small *β* but negligible when *n* and *β* are both large. Figure 2c, which shows the difference for 1.2 ≤ *β*·*n*·*τ* ≤ 4.0, 2 ≤ *τ* ≤ 14 and *n* = 4, is consistent with the observations made for the other two figures.

#### Effect of contact number

The difference between the outbreak sizes of the two models grows as *n* decreases, which can be explained by two lines of reasoning.

First, in the case of contact repetition, there is always at least one out of the *n* contacts per day that is already infected (and thus not available for new infection): As contacts are stable over time, the infector of a susceptible individual remains in the contact list of that individual after said individual has changed to the infectious state. Thus, at the least, the contact that originally transmitted the infection is not susceptible. In contrast, contacts change in every time step under the random mixing assumption: Hence, the infector is not more likely to appear in the contact set than any other individual. This difference between the two models is more pronounced for small *n* because one non-susceptible individual out of a small set of contacts means a relatively larger decrease in local resources than does one out of a large set of contacts.

Secondly, any new infection means that the infector will have one susceptible contact less for all subsequent time steps. This local depletion of resources is more pronounced for small *n* for the same reason as in the first argument. Further, stochasticity acts more strongly in small local environments than in large ones [25].
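
The two mechanisms described above can be made tangible with a small toy simulation. The following is only a minimal sketch under stated assumptions, not the simulation code used for the figures: the function names, the population size of 1000, the five initial cases and the stub-pairing construction of the fixed contact lists are all choices made for this illustration. It runs a discrete-time SIR process once with fixed, symmetric contact lists and once with freshly drawn contacts every day, for the same *n*, *β* and *τ*.

```python
import random


def contact_lists(size, n, rng):
    """Fixed, symmetric contact lists built by random stub pairing.

    Self-pairs and duplicate pairs are dropped, so a few individuals end up
    with slightly fewer than n contacts (an approximation of this sketch).
    """
    stubs = [v for v in range(size) for _ in range(n)]
    rng.shuffle(stubs)
    contacts = [set() for _ in range(size)]
    for a, b in zip(stubs[0::2], stubs[1::2]):
        if a != b:
            contacts[a].add(b)
            contacts[b].add(a)
    return [sorted(c) for c in contacts]


def outbreak_size(size, n, beta, tau, repeat, rng, seeds=5):
    """Total number of individuals ever infected in one SIR run."""
    fixed = contact_lists(size, n, rng) if repeat else None
    susceptible = [True] * size
    days_left = {}  # infectious individual -> remaining infectious days
    for v in rng.sample(range(size), seeds):
        susceptible[v] = False
        days_left[v] = tau
    total = seeds
    while days_left:
        newly = []
        for v in days_left:
            if repeat:
                todays = fixed[v]  # the same contacts every day
            else:
                # n distinct random contacts, never v itself
                todays = [u if u < v else u + 1
                          for u in rng.sample(range(size - 1), n)]
            for u in todays:
                if susceptible[u] and rng.random() < beta:
                    susceptible[u] = False  # infected today, infectious from tomorrow
                    newly.append(u)
        for v in list(days_left):
            days_left[v] -= 1
            if days_left[v] == 0:
                del days_left[v]  # recovered and immune
        for u in newly:
            days_left[u] = tau
        total += len(newly)
    return total


def mean_outbreak(repeat, runs=20, size=1000, n=4, r0_ran=1.6, tau=14, seed=1):
    """Mean outbreak size over several runs; beta follows from R0,ran = beta*n*tau."""
    rng = random.Random(seed)
    beta = r0_ran / (n * tau)
    sizes = [outbreak_size(size, n, beta, tau, repeat, rng) for _ in range(runs)]
    return sum(sizes) / runs


if __name__ == "__main__":
    print("random mixing:     ", mean_outbreak(repeat=False))
    print("contact repetition:", mean_outbreak(repeat=True))
```

With few contacts per day (*n* = 4) the repetitive variant should produce clearly smaller outbreaks in this sketch; raising `n` while keeping `r0_ran` fixed should narrow the gap, in line with the reasoning above.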

Both effects can also be seen in equation 1, which gives *R*_{0,rep} as a function of *R*_{0,ran}, *n* and *τ* (see also figure 3a; details for equation 1 are given in additional file 4):

*R*_{0,rep} = (*n* - 1)·[1 - (1 - *β*)^{*τ*}] = (*n* - 1)·[1 - (1 - *R*_{0,ran}/(*n*·*τ*))^{*τ*}]  (1)

In this equation, the number of susceptible individuals in the local environment is reduced by 1 compared to the random mixing case, as we assume that every contact except the one that originally transmitted the infection is susceptible. This number of susceptible individuals (*n* - 1) is multiplied by the probability that such an individual becomes infected during the infectious period *τ*. As (*n* - 1) is smaller than *n* and [1 - (1 - *β*)^{*τ*}] is smaller than (or, for *τ* = 1, equal to) *β*·*τ*, the expected number of secondary cases caused by an infectious individual in a population with many susceptible and few infected individuals is always smaller in the repetitive case.
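
The relation just described, (*n* - 1) susceptible contacts multiplied by the infection probability 1 - (1 - *β*)^{*τ*} with *β* = *R*_{0,ran}/(*n*·*τ*), is easy to evaluate numerically. The snippet below is a small sketch for that purpose; the function name `r0_rep` and the example parameter values are chosen for illustration only.

```python
def r0_rep(r0_ran, n, tau):
    """Expected secondary cases under full contact repetition.

    r0_ran: basic reproduction number under random mixing (beta * n * tau)
    n:      number of contacts per day
    tau:    infectious period in days
    """
    beta = r0_ran / (n * tau)  # per-contact, per-day transmission probability
    if not 0.0 <= beta <= 1.0:
        raise ValueError("r0_ran / (n * tau) must be a probability")
    # (n - 1) susceptible contacts, each infected at least once within tau days
    return (n - 1) * (1.0 - (1.0 - beta) ** tau)


if __name__ == "__main__":
    for n in (4, 8, 12, 20):
        print(n, round(r0_rep(1.6, n, 14), 3))
```

For *R*_{0,ran} = 1.6, *τ* = 14 and *n* = 4, the result lies very close to the epidemic threshold of 1, which may help to explain why the difference between the two models is largest for few contacts per day.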

#### Effect of the per-contact transmission probability

The difference between the outbreak sizes of the two models decreases rapidly with increasing *β*. The reason is that practically every individual will be reached and infected in case of large transmission probabilities, regardless of the underlying contact structure. Differences between both models may appear in the shape of the outbreak curve (cf. additional files 2 and 3), but in terms of *I*_{tot} both models are equivalent. In case of small transmission probabilities, differences in the effective number of secondary cases generated by an infectious individual can become visible, as only a fraction of the whole population will be infected under both assumptions.

#### Effect of the infectious period

As expected, the difference between the two models increases with increasing *τ*. However, the change in the difference per Δ*τ* is largest at low *τ* values and almost irrelevant at high values of *τ*. This observation is explained by the *τ*-dependence of *R*_{0,rep} (equation 1, see also figure 3b): The longer the infectious period, the smaller the chances for a specific contact to remain uninfected. However, this increase in individual infection probability is partly compensated by a lower per-day transmission probability, which is needed to keep *R*_{0,ran} constant. The interaction of these antagonistic effects results in a stabilization of *R*_{0,rep}/*R*_{0,ran} for a large *τ*.
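
The stabilization can be checked numerically. The sketch below assumes the relation *R*_{0,rep} = (*n* - 1)·[1 - (1 - *β*)^{*τ*}] with *β* = *R*_{0,ran}/(*n*·*τ*); the limit for large *τ* uses the standard result that (1 - *x*/*τ*)^{*τ*} approaches e^{-*x*}. The function names and parameter values are illustrative.

```python
import math


def ratio(r0_ran, n, tau):
    """R0,rep / R0,ran for a fixed infectious period of tau days."""
    beta = r0_ran / (n * tau)
    return (n - 1) * (1.0 - (1.0 - beta) ** tau) / r0_ran


def ratio_limit(r0_ran, n):
    """Limit of the ratio for tau -> infinity at constant R0,ran."""
    return (n - 1) * (1.0 - math.exp(-r0_ran / n)) / r0_ran


if __name__ == "__main__":
    for tau in (1, 2, 4, 8, 14, 100):
        print(tau, round(ratio(1.6, 4, tau), 4))
    print("limit", round(ratio_limit(1.6, 4), 4))
```

In this sketch the ratio starts at (*n* - 1)/*n* for *τ* = 1 and falls towards the limit, with most of the change occurring over the first few days of infectious period.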

### Analysis 2: The effect of contact repetition combined with clustering depending on *n* and *β*

The results presented previously show that the difference between the two models depends mainly on *n* and *β*. In a second step, we investigate how the difference between model types 1 and 2 changes if clustering is introduced in the latter. Figures 4a–d show the difference between both model types for clustering coefficients *CC* between 0.0 and 0.6 when *τ* is fixed to 14 days and when *n* and *β*·*n*·*τ* vary in the ranges mentioned above. As expected, clustering results in an increased difference between both model assumptions. This increase is most pronounced for small numbers of contacts per day. The peak of the difference is consistently at *n* = 4 but shifts to the right on the *β*·*n*·*τ* axis as *CC* increases.

The further dampening of disease spread by clustering can be explained by increased locality of resources: While repetition limits the number of available susceptible individuals by keeping previously infected ones in the set of contacts, clustering reduces the number of susceptible contacts because there is a higher likelihood that contacts of an infector have already become infected by others during the infectious period, as infections spread rapidly within cliques. The reason why this effect is more pronounced for small *n* than for large *n* is the same as in the case of unclustered, pure contact repetition: Any reduction of susceptible individuals in the set of contacts weighs relatively more heavily in the case of few contacts than in the case of many. The right shift of the peak of the difference can be explained by the increased transmission probability *β* needed to pass the epidemic threshold under increased clustering compared to the constantly low levels of *β* necessary under the random mixing assumption [26].

### Analysis 3: Varying proportions of contact repetition, clustering and *β*

We simulated the difference between both model assumptions for all possible combinations of *n* = 8, 12, 16 and 20, *β*·*n*·*τ* = 1.2, 1.8, 2.4 and 3.0, *τ* = 14 and *CC* = 0.0, 0.2, 0.4 and 0.6. The simulation results are shown in figures 5a–p. The relation between the proportion of repetitive contacts per day and the average difference between this mixed model and a model assuming purely random mixing is approximately linear in the absence of clustering (for all tested cases, linear regressions between the proportion of repetitive contacts per day and the deviation of the outbreak size from the purely random mixing model achieve *R*^{2} > 0.98). However, the deviation from the random mixing model increases disproportionately with the fraction of repetitive contacts when clustering is introduced (cf. figures 5b–d, f–h, j–l and n–p).

One mechanism driving this non-linear relation when clustering is present is the local depletion of resources. Repetitive contacts of an infector have a much higher chance of becoming infected than do non-repetitive contacts. Moreover, if these repetitive contacts are also highly clustered, it is likely that the disease will become trapped in those cohesive social subgroups. However, if only a few non-repetitive, non-clustered contacts are added per day, the chances of spreading the disease between otherwise unrelated regions of the social network greatly increase.

### Limitations

This paper systematically investigates a variety of epidemiologically relevant parameters needed to describe real-world transmission systems of diseases spread by droplet particles or direct physical contact. However, real-world social and biological processes involved in the transmission of infectious diseases are far more complex than captured by the archetypical model structures presented. Conceptual decisions and simplifications which could have potentially influenced the results are critically discussed in the following:

#### Model structure

We designed our two model types as SIR models, assuming that every individual is either susceptible, infectious or immune with respect to a certain disease. Transitions are only allowed from susceptible to infectious or from infectious to immune. The SIR structure is a fairly good representation for many diseases which lead to full immunity after recovery (e.g., measles). However, many diseases require other representations, as relevant intermediate states need to be covered, for example a long latency period in SEIR (Susceptible-Exposed-Infectious-Recovered) models. Another common deviation from the SIR structure arises when recovery confers only partial or no immunity. In such cases, SIS (Susceptible-Infectious-Susceptible) representations are often chosen. In SIR or SEIR models, a total outbreak size can be defined (because the disease fades out at the end of an epidemic), whereas SIS models typically reach an equilibrium *I*(*t*) in the long run, without the disease dying out. Despite all the differences in model behaviour, we expect the rough picture to be the same for SIR, SEIR and SIS models, as the mechanisms behind the observed differences for SIR models that we discussed also apply to SIS and SEIR models. Thus, the general conclusions derived in this paper should also hold true for these model types.

#### Degree distribution

The number of daily contacts *n* is fixed and equal for the entire population in both modelling approaches presented. This is a reasonable simplification for the purpose of this paper, as it keeps the investigated number of interactions manageable. However, in real-world systems, the number of daily contacts appears to follow a negative binomial distribution [12, 14], with some people having a relatively high number of contacts and others being almost isolated. It is known that the variance of the degree distribution impacts the spread of infectious disease, for instance, by decreasing the transmission probability needed to cause an epidemic [27]. Particularly relevant for the difference between random mixing models and models accounting for contact repetition and clustering are the correlations between the number of contacts per day and contact repetition and clustering, respectively. It is plausible to assume that individuals with many contacts tend to also have many unrepeated contacts, whereas individuals with few contacts tend to have disproportionately high levels of repetitive contacts. If the proportion of repetitive contacts and clustering is correlated with the number of contacts, individuals with few contacts are likely to be dead ends for infectious diseases. In contrast, highly connected individuals could be structurally more important than expected, as they bridge distinct cliques.

#### Occasional contact repetition

In our simulations, contacts repeat either daily or never. Intermediate states between both extremes of complete random mixing and complete contact repetition have been investigated by combining both models in defined proportions. However, in reality, specific persons can be met at any frequency between never and daily. It is plausible to assume that intermediate frequencies reduce the effect of repetitiveness depending on the duration of the infectious period *τ*: For short infectious periods, contacts met at low frequency might behave like unrepeated contacts, whereas for long infectious periods they develop their full dampening potential.

#### Contact intensity and duration

In our models all contacts between an infector and a susceptible individual are equally likely to result in the transmission of the infectious disease. This simplification is not a good representation of the real world: The transmission probability depends on the amount of infectious material ingested by a susceptible person [28, 29]. The uptake correlates with contact duration and intensity. Contact duration is long for highly repetitive contacts, while unrepeated contacts tend to have short duration (unpublished data). Accordingly, it can be expected that the interaction of clustering, contact repetitiveness and contact duration leads to a rapid infection of all closely tied clusters (primarily families, then workgroups and cliques at school and childcare institutions), leaving behind the people connected via mainly short, unclustered, occasional contacts.

#### Distribution of infectious period

The infectious period *τ* is fixed in our model, which contrasts with the design of classical mean-field models assuming exponentially distributed infectious periods [3, 22]. Keeling and Grenfell argue that *R*_{0} is smaller for exponential period models than for fixed period models under otherwise identical conditions, because individuals with a long *τ* rapidly exhaust the susceptibles in their local neighbourhood and, therefore, cannot compensate for the large majority of individuals with extremely short infectious periods [25, 30]. However, the often assumed exponential distribution is highly unrealistic, as observed infectious periods tend to be closely centred around a mean period and are thus less dispersed [31]. Thus, assuming a fixed infectious period is a reasonable simplification of reality that is not likely to have a major influence on the outbreak size, as in most cases only very few individuals will use up their local susceptible resources during the infectious period. Moreover, if the infection probability is high enough to exploit almost the entire local environment (such that deviations of *τ* could affect the individual reproduction ratio), the outbreak size will reach the order of magnitude of the population size in either the fixed or the exponential case.
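
The direction of the effect argued by Keeling and Grenfell can be illustrated with a short calculation. The sketch below rests on assumptions that are not part of the models in this paper: it uses a geometric distribution (the discrete-time analogue of the exponential distribution) with mean *τ* for the infectious period, and it counts secondary cases among (*n* - 1) repeated contacts as (*n* - 1)·(1 - E[(1 - *β*)^{*T*}]). The function names are hypothetical.

```python
def secondary_cases_fixed(n, beta, tau):
    """Expected secondary cases among n - 1 repeated contacts, fixed period tau."""
    return (n - 1) * (1.0 - (1.0 - beta) ** tau)


def secondary_cases_geometric(n, beta, mean_tau):
    """Same quantity when the infectious period T is geometric with mean mean_tau.

    T takes values 1, 2, ... with daily recovery probability p = 1 / mean_tau.
    E[q**T] = p*q / (1 - (1 - p)*q) is the probability generating function
    of that distribution evaluated at q = 1 - beta.
    """
    p = 1.0 / mean_tau
    q = 1.0 - beta
    expected_escape = p * q / (1.0 - (1.0 - p) * q)
    return (n - 1) * (1.0 - expected_escape)


if __name__ == "__main__":
    n, tau = 4, 14
    beta = 1.6 / (n * tau)
    print("fixed period:    ", round(secondary_cases_fixed(n, beta, tau), 4))
    print("geometric period:", round(secondary_cases_geometric(n, beta, tau), 4))
```

Because the escape probability (1 - *β*)^{*T*} is convex in *T*, the dispersed period yields fewer expected secondary cases than the fixed period with the same mean, which is the qualitative direction described above.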

### Implications for some exemplar diseases

Information on the per-contact transmission probability *β* and the number of potentially contagious contacts *n* is often not easily accessible or available and has to be measured (or fitted) if included in models of disease spread. However, rough estimates of both variables can be obtained when *R*_{0} estimates are available and when the possible pathways of transmission are known, because *β* and *n* are linked to the basic reproduction number by *R*_{0,ran} = *β*·*n*·*τ* and the possible pathways reveal information on the possible number and structure of contacts at risk: At one extreme there is transmission via close physical contacts, which correlate mostly with intense social relations and are typically rare, repetitive and highly clustered. The other extreme is airborne transmission via tiny droplet nuclei that remain suspended indoors for a long time. In this case, vast numbers of persons can potentially be exposed, and such casual contacts are neither highly repetitive nor strongly clustered.
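
As a rough illustration of how the relation *R*_{0,ran} = *β*·*n*·*τ* can be used, the sketch below solves it for *β*. The function name and all numerical values are hypothetical and serve only to contrast a transmission route with few contacts at risk against one with many; they are not estimates for any particular disease.

```python
def beta_from_r0(r0_ran, n, tau):
    """Per-contact, per-day transmission probability implied by R0,ran = beta*n*tau."""
    beta = r0_ran / (n * tau)
    if beta > 1.0:
        raise ValueError("implied beta exceeds 1: n or tau too small for this R0")
    return beta


if __name__ == "__main__":
    r0, tau = 2.0, 5  # hypothetical values
    for n in (2, 10, 50):  # few close contacts vs. many casual contacts
        print(n, beta_from_r0(r0, n, tau))
```

For a given *R*_{0} and *τ*, assuming fewer contacts at risk implies a proportionally higher per-contact transmission probability, so the assumed transmission pathway constrains *β* and *n* jointly rather than separately.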

Table 1 provides information about the infectious period *τ*, *R*_{0} estimates and the possible pathways of transmission for a variety of infectious diseases. The implications of clustering and contact repetition for models of the diseases listed in this table are discussed below.

Typical childhood diseases like mumps, measles, pertussis (whooping cough) or chickenpox have comparatively high *R*_{0} estimates [3, 32–35], which means that one infector generates many secondary cases if a sufficient number of susceptible contact partners are available. These diseases are highly communicable (in fact, measles is one of the most highly communicable diseases in the world [36]), and thus very short and non-intense contacts have the potential to confer infection. Accordingly, both the number of contacts per day *n* and the per-contact transmission probability *β* are very high. We further assume that a high proportion of the contacts are casual contacts, because the threshold for a contact to be potentially contagious is very low with respect to duration and intensity. Consequently, the levels of repetitiveness and clustering are low, which means that the contact patterns for such childhood diseases are structurally similar to random mixing. Considering that high numbers of daily contacts *n* make both types of models that we discussed behave similarly and that under high transmission probabilities *β* almost every individual will be reached, random mixing models achieve almost the same results as more elaborate models including a certain amount of contact repetition and clustering. In the case of norovirus, too, the difference is probably small, as the infectious period of this agent is very short [37] and, at the same time, the basic reproduction number is comparatively high [37] (because the disease is easily communicable [38, 39]).

At the other end of the spectrum, there are diseases with comparatively low *R*_{0} estimates and typically low numbers of contacts that still qualify for potential transmission. Methicillin-resistant *Staphylococcus aureus* (MRSA), for instance, is an infectious agent mostly transmitted in health care and nursing institutions. It needs close physical contact for transmission [40], and *R*_{0} estimates given in the literature are close to the epidemic threshold [41]. Accordingly, both *β* and *n* are low. At the same time, health care settings tend to be highly structured regarding who cares for whom and who shares a room with whom. Hence, high levels of contact repetitiveness and clustering can be assumed [24]. Modelling MRSA under the random mixing assumption is likely to overestimate the total number of cases for given *n*, *β* and *τ*. If, in contrast, a random mixing model is fitted to measured data from an outbreak, either the infectivity or the number of potentially infectious contacts will be underestimated to meet the measured outbreak size. A similar argument applies to Ebola, which is transmitted via direct contact with infected blood, secretions, organs or semen (thus, *n* is rather low) and seems to be only moderately infectious [42–45]. As a consequence, random mixing models of Ebola [46] are of limited validity.

Finally, there are some diseases not easily attributable to one or the other class. Severe Acute Respiratory Syndrome (SARS) and influenza, for instance, have a range of *R*_{0} estimates between 1.43 and 3.7 [43, 47–50] and between 1.3 and 3.77 [17, 51–56], respectively. No definite consensus has been reached on whether influenza is transmitted predominantly by large droplets and close contact or by very small droplets that disseminate quickly and stay suspended in indoor air for a long time [57]. In the latter case, a large number of people would be at risk of infection, so random mixing would be a reasonable approximation of the real contact patterns. In the case of transmission by close contact and large droplets (that fall out quickly), the mean number of potentially contagious contacts per day lies between 8 and 18, depending on the national and cultural context [12]. Considering that not all contacts are equally likely to transmit influenza, but that long and intense contacts (such as household contacts [58]) are more prone to do so and that such contacts also tend to be more repetitive and clustered, it is likely that random mixing models also overestimate the outbreak size for given *n*, *β* and *τ*. However, problems will definitely arise when the impact of social distancing measures (decrease of *n*) or of antiviral treatment (decrease of *β*) is estimated under the random mixing assumption: Both interventions will be much more effective in a more elaborate model than in a random mixing model when *n*, *β* and *τ* are the same for both model types. This argument is consistent with recent findings on the impact of other network properties on influenza spread: Heterogeneity in degree distribution does not influence the outbreak size in case of highly contagious influenza strains, but does so for moderately contagious strains; however, it does influence the total outbreak size when interventions are simulated, even in case of highly contagious strains [4].