The self-organization model reveals systematic characteristics of aging

Background Aging is a fundamental biological process, where key bio-markers interact with each other and synergistically regulate the aging process. Thus aging dysfunction will induce many disorders. Finding aging markers and re-constructing networks based on multi-omics data (i.e. methylation, transcriptional and so on) are informative to study the aging process. However, optimizing the model to predict aging have not been performed systemically, although it is critical to identify potential molecular mechanism of aging related diseases. Methods This paper aims to model the aging self-organization system using a series of supervised learning methods, and study complex molecular mechanisms of aging at system level: i.e. optimizing the aging network; summarizing interactions between aging markers; accumulating patterns of aging markers within module; finding order-parameters in the aging self-organization system. Results In this work, the normal aging process is modeled based on multi-omics profiles across different tissues. In addition, the computational pipeline aims to model aging self-organizing systems and study the relationship between aging and related diseases (i.e. cancers), thus provide useful indicators of aging related diseases and could help to improve prediction abilities of diagnostics. Conclusions The aging process could be studied thoroughly by modelling the self-organization system, where key functions and the crosstalk between aging and cancers were identified.


Introduction
Aging is a complex process regulated by key biomarkers, reflecting disorders / declined abilities of tissues [1]. Dysfunction of aging has been shown to be related to many diseases, such as diabetes, Parkinson disease [2], Alzheimer's disease [3] and cancers [4]. As a result, finding aging markers is critical to study aging related diseases and identify healthy genomic diagnostics (i.e. by predicting the chronological age (group) based on molecular profiles). For example, multi-tissue predictors of age have been calculated by DNA methylation [5] or mRNA expression profiles [6]; and there are more age predictors based on single tissue (e.g. brain [7], breast [8], ans so on), which also provide insights on aging related diseases [6] (i.e. Alzheimer's disease and cancers).
Further, the aging markers interact with each other [9], and synergistically coordinate the aging process, herein generating the self-organization system [10] of aging, where particular bio-markers regulate the aging process in different age groups, respectively. Although tissues become disordered reflected by a functional decline during aging in general (often evaluated by increased entropies [11]), a series of aging markers perform particular / ordered functions in special aging stages / age groups. In addition, these markers / genes interact with each other, coordinating regulation of the aging process; therefore, the interactions / modules between such markers also provide critical patterns of aging. In summary, finding bio-markers to predict the chronological ages, summarizing interactions between aging markers, and optimizing the aging self-organization system based on molecular profiles (i.e. methylation, expression and so on) of normal tissues from healthy persons, could help predict future health risks at system level. However, these works have not been solved entirely for the aging process.
In this work, we modeled the aging self-organization system using a series of computational methods: filtering inter-connection networks between different age groups by the maximum mutual information and minimum redundancy criterion in the information theory; summarizing interactions between bio-markers by the convolution technology; calculating patterns by accumulating weighted genes within the same module; selecting module scales by the hierarchical clustering method and cross validation; identifying order parameters in the aging self-organization system by network sparsification.
The prediction results show high classification accuracy between different age groups; moreover, the enrichment analysis and network analysis also found key functions of the order parameters. Thus critical complex characteristics (i.e. hierarchies, emergencies and bifurcations) were identified in different aging stages. Aging acceleration patterns were also identified across cancers. In short, the aging process can be thorough studied by modelling the self-organization system.

Results and discussion
A brief description of the aging self-organization system In the aging self-organization system, genes interacted with each other, and synergistically coordinated the aging process. Therefore, the aging process should also be evaluated by interactions between aging markers. The aging markers clustered nearby would drive similar function [12], and could be summarized within the same module. Each module takes a particular part during aging, and regulates the aging process altogether. Different levels / hierarchies of aging markers or modules reflected special complexities of the self-organization system, herein reflecting particular patterns in different aging stages. As a result, the aging self-organization system would identifies and emphasizes important characteristics that no single isolated marker or module would be able to achieve. In summary, modules based on aging markers in coordination determined the bifurcations and displayed critical differential patterns between age groups, where key aging markers within modules could be identified as the order parameters in the aging selforganization system. Further, the system from pathological samples show deviation from the normal aging self-organization system (from healthy persons): for example, the aging system with disease (i.e. cancers) should display significant acceleration compared tothe normal aging process. Accordingly, the following parts of this paper presents results of modelling the aging self-organization system as well as the computational pipeline was shown in Fig. 1. Fig. 1 Overview of the aging self-organization system. a the computational pipeline of modelling the aging self-organization system; b an example of the convolution of interactions between methylation cg27583030 (SLC25A4) and other genes in the model of age group 50-70 vs. 70-survival Classification results of the aging self-organization system The expression and methylation profiles were used to test the classification abilities of the aging self-organization system between different age groups, respectively (Figs. 2 and 3). Tables 1, 2 and 3 shows that classification results based on the self-organization system have lower error rates than traditional feature selection methods (i.e. the relieff-mRMR pipeline [13]), using both methylation and expression data between age groups. As a result, the selforganization system reduced the feature dimensions effectively and extracted critical modules by finding orderparameters, and identifying key differences based on aging markers / interactions in the aging process at system level. The results of the order-parameters were shown in Table S1-S2. The methylation order parameter with the maximum relieff weight was the convolution interactions of cg27583030 (SLC25A4, weight = 0.1974, shown in Fig. 1b) in the model of age group 50-70 vs. 70-survival. Common SLC25A4 related pathway were apoptosis and survival regulation of apoptosis by mitochondrial protein, reflecting the relationship between aging and cellular apoptosis [14]. The expression order parameters with the largest relieff weight was original profile of NRBF2 (weight = 0.2493) between age group 50-70 vs. 70sruvival. NRBF2 played a role in cellular survival and neural progenitor cell survival during differentiation [15], and dysfunction of NRBF2 also affected the aging process. Further, enrichment analyses of the order parameters in each module were performed on Biological Process (BP) terms of Gene Ontology (GO) and KEGG pathways using the hypergeometric test (Tables 4 and 5 and S1-S2). The most significant BP term was the negative regulation of viral process (GO:0048525, fdr = 0.0005) in the 245th module of the model between age group 20-50 vs. 50-70 based on the methylation profiles, reflecting the relationship between the immunity system and aging [16]; and the most significant KEGG pathway was Phenylalanine metabolism (fdr = 0.0009) based on the methylation profiles in the model between age group 0-50 vs. 50-survival, indicating the critical metabolism during aging. In addition, the annotation of orderparameters also reveal key functions across different aging stages, i.e. BP terms were enriched in aging related diseases in the early stage of aging, and enriched in tissue dysfunction in the later stage. It is perhaps indicative of functional decline of the immunity system induced aging / tissue dysfunction.      (Tables 6 and 7). For example, BP terms of organ morphogenesis were enriched during both young (0-20 vs. 20-50) and old (50-70 vs. 70-survival) age groups, reflecting the basal role of tissues affected by the aging process. Moreover, cancer and related signaling KEGG pathways were enriched across different aging stages based on both methylation and expression profiles (Fig. 4). These results reflected the cross-talk between aging and cancer, where dysfunction of aging might indicate diseases / cancerization of tissues.

Complexity characteristics of the aging self-organization system
In the self-organization system, molecules interacted with each other, and synergistically regulate particular process (i.e. aging). For example, summarized interactions of methylation profiles of cg27583030 (SLC25A4) was with the maximum relieff value (Fig. 1b). Moreover, a series of order parameters were with summarized interactions other than the ordinary profiles, indicating functions of cross-talks between key markers during aging (Table 1).
In addition, any single isolated marker / module could not reflect the whole bifurcation between age groups effectively, but combinations of modules could. It should be that cross-talks across modules generated the aging self-organization system with key differential patterns / bifurcations between age groups. Thus the hierarchies across modules promoted emergencies of the aging selforganization system, which were not displayed by any single module or order parameter. Therefore, the interaction across modules indicated the hierarchies / emergence of the aging self-organization system.
The corresponding networks across modules (using the maximum mutual information minimum redundancy criterion) were investigated to find core modules in the hierarchies of the self-organization system ( Figure  S1). Based on the methylation profiles, the 492th module were with the maximum interaction score (mean value = 0.0189), connecting other 2682 modules (in the model of age group: 0-20 vs. 20-50). This module acted as a "hub" and connected 45 key BP functions (i.e. negative regulation of cellular senescence, regulation of attachment of spindle microtubules to kinetochore and negative regulation of potassium ion transmembrane transporter activity) and 60 KEGG pathways (i.e. Maturity onset diabetes of the young and Pentose and glucuronate interconversions), reflecting the crosstalk between immunity and key metabolic pathway during aging.  Based on the expression profiles, the 1799th module were with the maximum interaction score (mean value = 0.0762), connecting other 2387 modules (in the model of age group: 20-50 vs. 50-70). The module connected 46 key BP functions (i.e. fatty acid beta oxidation using acyl coa oxidase, cell aggregation and cytokine production) and 51 KEGG pathways (i.e. Retinol metabolism, alpha linolenic acid metabolism and Pyruvate metabolism), revealed basal metabolism pathways during aging. In short, the corresponding networks reflected correlations of important parts / functions in the aging selforganization system, such as the immunity system, cancer related pathways, and so on. It might be the crosstalk between the immunity system and cancers that cause of emergence of critical themes in the aging process.
Therefore, the bifurcations (differential patterns) between age groups were investigated, where order parameters with convoluted interactions were summarized by each relieff weight, indicating order-disorder patterns of interactions between aging order-parameters from low (near the ordered / similar pattern with low entropic values of the aging system) to high (near disordered / different patterns with high entropic values) values, or vise verse. As a result, significant differential patterns were found between age groups using both methylation and expression profiles (Fig. 5). In the early aging stage (0-20 vs. 20-50), the aging self-organization systems were with significantly ordered patterns based on both methylation and expression profiles. As aging was regulated by special markers / order-parameters, the self-organization system showed ordered patterns in the early aging stage.  However, the self-organization systems were with disordered patterns in the middle (20-50 vs. 50-70) or later (50-70 vs. 70-survival) stage based on expression or methylation profiles, respectively. These results indicated tissues were with declined function / disordered patterns during aging. In short, the aging process was driven by both special markers / order parameters (with ordered patterns) and tissue declined function (with disordered patterns), perhaps determined by the former in the early aging stage, and by the latter in the later aging stage.

Aging acceleration of cancers
To study the crosstalk between aging and cancer, the aging associated accelerations were investigated in cancer samples (from the TCGA platform). The order parameters were extracted using the cancer profiles based on each self-organization model, and the module patterns in each model were summarized. Then the aging score were calculated by the module patterns (adjacent normal samples were as the training data, cancer samples were as the test data, and the 0-1 SVM regression was used as the predictor). Strikingly, the results showed that the scores in cancer samples were significantly higher compared to adjacent normal samples, based on both methylation and expression profiles (Fig. 6). These results were consistent with previous results, which might reflect the protection of the cancer tissues [5,17]. The correlation between somatic mutation and aging acceleration was also investigated, but none of SNPs with significant (p-value< 0.05 and fdr < 0.2, using the non-parameter Kruskal-Wallis Test [18]) aging acceleration were identified. The results were perhaps because of inadequate samples of paired profiles, aging acceleration tissues in cancer with fewer somatic mutations, or the complexity of aging [5]. Moreover, it has been found that there was negative correlation between age acceleration and number of somatic mutations [5]. Therefore, our work also found the negative correlations in most types of cancers ( Figure S2 and S3). However, only a few cancers were with significant correlation (e.g. THCA, shown in Figure S2l and S3l).
Further, the 11 types of cancer profiles were clustered based on the mean value of aging acceleration patterns using mean acceleration ratios based on simplified methylation and expression models. As a result (Fig. 6c and S4), 7 cancers were identified as one aging acceleration pattern, including BLCA, COAD, ESCA, HNSC, KIRC, KIRP and PRAD; and other 4 cancers were identified as another aging acceleration pattern (BRCA, LIHC, LUAD and THCA). In addition, 49 significant modules were identified based on both differential methylation and expression profiles, where the top differential module (the 926th expression module) connected 50 key BP terms (i.e. cellular localization, skeletal muscle adaptation and GABAergic neuron differentiation) and 51 KEGG pathways (Long-term potentiation and Retinol metabolism), indicating key functions of the aging process of the neuron system. The aging acceleration patterns also revealed basal characteristics across cancers.

Discussion
The aging process is regulated by a series of key markers. The aging markers interact with each other, and performed their functions in the specific aging networks. As a result, identifying modules clustered by these markers during aging was more informative to research the aging process compared to finding isolated markers.
In other words, differential interactions (evaluated by the mutual information) in proper networks were also powerful to study key gene regulations in biological processes (i.e. aging and related diseases) [19]. Both the network markers and gene markers were integrated based on the background networks to identify critical markers in the aging process thereby. Further, these aging markers were extracted as crucial features of aging, where the separation margin of SVM was optimized / converged to discriminate different age groups. As a result, the SVM hyper plane based on extracted network markers (both methylation and expression profiles) had enough efficiency to classify age groups and explorer critical mechanisms of the aging process, compared with other classfiers (Table S3 and S4).
In this paper, we presented a computational pipeline to model the aging self-organization system and select the order parameters using a series of supervised learning technologies. The discrimination results showed that our prediction ability had more accuracy than traditional gene selection / classification methods.
Tissues suffer declined functions during aging. However, the aging markers interact with each other, and synergistically regulate the aging process. Therefore, the aging process is affected by both ordered and disordered factors. In this work, the complexity characteristics across modules were also found to show critical patterns in different age groups.
In the immunity theories of aging [20], tissues exhibit progressively functional declines during aging. The functional analysis found that the order parameters were enriched in particular BP terms / KEGG pathways in different age groups, where immunity dysfunction and cancer related pathways indicated the common theme of aging. Therefore, our results showed the aging process could be predictive by modelling the self-organization system, and indicated crosstalk among aging, immunity and cancers.
Further, the cancer profiles also identified the aging acceleration of cancer samples with aging scores based on the self-organization system being statistically significant. Both methylation and expression profiles found the cancer samples showed aging acceleration compared to normal samples. The results indicated the protective roles of aging in cancers [5]. Moreover, different aging acceleration pattern could also discriminate cancer types.

Conclusion
In summary, we presented the self-organization model of the aging process based on both methylation and expression profiles in this work, where both ordered and disordered critical patterns were identified in different aging stages. Biological features of the order parameters indicated dysfunction of the immunity system and other common properties during aging (i.e. cancers). Thus the aging acceleration also revealed the relationship between aging and cancers. In conclusion, the aging selforganization system described here is informative to both aging and aging related diseases.

Data and pre-processing
We obtained methylation and expression profiles from MuTHER study [21] and GEO database (https://www. ncbi.nlm.nih.gov/geo/) with the chronological age (Table  S5, S6, S7 and S8), respectively. Only profiles in normal tissues of healthy persons were considered for modelling the aging self-organization systems (samples from normal tissues from persons with cancer and disease status sample / blood, e.g. traumatic blood from healthy persons were discarded). As a result, there were 2226 samples of Gene expression data from 37 datasets, and 4428 samples methylation data from 35 datasets were selected to model the aging self-organization systems, respectively.
For each methylation / expression dataset, the data was treated by a Singular Value Decomposition (SVD) method [16,22] (regress the first 3 principle components) to assess the sources of inter-sample variation separately in each tissue, and then were normalized to have zero mean and unit variance. Finally the profiles were discretized using two thresholds mean+/−std. If the data came from different platform (e.g. GPL96 / GPL97) even in the same GEO Series, or came from different region of brain (e.g. hippocampus, Posterior cingulate region and so on), the data were treated as independent dataset.
The age groups were partitioned as: (0, 20], (2050), [50, 70) and [70, survival). The choice of age groups was guided by the following criterion: first, the partition of methylation and expression data should be accordance for further integration; second, the human methylation "age acceleration" is significant before age of 20 [5]; third, the sample imbalances between age groups need to be small. Data from different age nearby were used to model the aging self-organization system (0-20 vs. 20-50; 20-50 vs. 50-70; 50-70 vs. 70-sruvival) based on methylation and expression profiles, respectively. Further, simplified classification models were also constructed to discriminate "young" (0-50) and "old" (50survival) age group [6] based on expression and methylation profiles, respectively.
We also downloaded paired methylation, expression, somatic mutation profiles and clinical data (both cancer and adjacent normal tissue) from the TCGA platform (through the xena website: https://xenabrowser.net/hub/ ) to further analyze aging related genomic alterations (totally 333 paired samples were obtained).
The computational pipeline of modelling aging selforganization systems Step 1, filtering the aging background network by maximum mutual information minimum redundancy criterion Aging is a gradual process with biological functional decline / disorder. The degree of disorder is often evaluated by entropy. The aging process is usually accompanied by increasing entropy; however, in the biological non-closed molecular system, particular bio-markers perform special function with ordered patterns in the aging process. Therefore, key order-disorder transitions (or vice versa) accounts towards important changes between age group in the aging process.
In this work, the mutual information between different age groups was used to evaluate relevance between genes (i.e. methylation or expression profiles), and the mutual information between genes (from all training samples / age groups) was used to evaluate gene redundancy. In addition, the background interaction system / network of the aging process satisfied maximum mutual information and minimum redundancy criterion: where I (i,j) group indicates the mutual information between genes within the same age group, evaluating the changed dispersion of gene interaction between age groups by the absolute difference, and redundancy (i,j) indicates the redundancy between genes. As a result, the mutual information between age groups was filtered by the redundancy in the background network as the preliminary gene interaction system of the aging process.
Step 2, summarizing gene interactions by the convolution technology Each "edge" in the aging background network indicating the changed dispersion of gene interaction. As a result, the entire interaction of a gene could be calculated by convoluting all the edges in the background network (i.e. Fig. 1b), where the absolute differences of mutual information between age groups were used as the convolution kernel.
where gene i and gene j was the profile of i-th and j-th gene, respectively; and interaction i; j ð Þ ¼ I j sign gene i ð Þ− sign gene j À Á j; age group À Á where I(x,y) was the mutual information between x and y.
Step 3, calculating the entire pattern by accumulating genes within module Highly interconnected genes in the network are usually involved in the same biological functions. In this work, genes in the same module were accumulated, where the weights were calculated by the relieff algorithm. Either gene original profiles or convoluted interactions were accumulated was determined by their relieff weights (subtracting the minimum value of the relieff weights). As a result, each module could be a feature in classification of different age groups.
Step 4, determining module size by clustering method and cross-validation The size of module (how many genes within the module) was determined using a hierarchical clustering method, where correlation of genes was evaluated by mutual information in the background network between different age groups in the aging process. The clustering degree / times was determined by (5-fold) cross validation.
Step 5, identifying order parameters by network sparsification In this work, only a small ratio of interactions were convoluted in step 2, sorted by the mutual information; and only genes with top relieff values were accumulated in the module in step 2. sqrt(n) interactions / genes were selected as the order parameters of the aging process using the network sparsification method, where n was the total number interacted with each gene / within the module, respectively. In this work, the 0-1 SVM classifier (with the linear kernel) was used to discriminate the age groups. The value of Box Constraint was 1, and the hyperplane was optimized by imposing a penalty on the length of the margin for every observation that is on the wrong side of its class boundary.

Enrichment analysis
Enrichment analyses were carried out to gain significantly biological functions. GO Biological Processes (BP) terms of Gene Ontology (GO) and KEGG pathways were downloaded from Gene Set Enrichment Analysis (GSEA) platform (version 6.1) [23].
The hypergeometric test [24] was performed to estimate the enrichment of these selected genes compared to known GO terms or pathways. Finally, the selected significant enrichment p-values were controlled by False Discovery Rate [25]. The thresholds were set as p-value< 0.05 and FDR < 0.2.
To evaluate annotated functions across, enriched BP terms / KEGG pathways were calculated by summarizing values of 1-fdr, where fdr < 0.2 was set as the threshold.