Review and application of group theory to molecular systems biology
Theoretical Biology and Medical Modelling volume 8, Article number: 21 (2011)
Abstract
In this paper we provide a review of selected mathematical ideas that can help us better understand the boundary between living and non-living systems. We focus on group theory and abstract algebra applied to molecular systems biology. Throughout this paper we briefly describe possible open problems. In connection with the genetic code we propose that it may be possible to use perturbation theory to explore the adjacent possibilities in the 64-dimensional space-time manifold of the evolving genome.
With regard to algebraic graph theory, there are several minor open problems we discuss. In relation to network dynamics and groupoid formalism we suggest that the network graph might not be the main focus for understanding the phenotype but rather the phase space of the network dynamics. We show a simple case of a C_{6} network and its phase space network. We envision that the molecular network of a cell is actually a complex network of hypercycles and feedback circuits that could be better represented in a higher-dimensional space. We conjecture that targeting nodes in the molecular network that have key roles in the phase space, as revealed by analysis of the automorphism decomposition, might be a better route to drug discovery and treatment of cancer.
1. Introduction
In 1944 Erwin Schrödinger published a series of lectures in What is Life? [1]. This small book was a major inspiration for a generation of physicists to enter microbiology and biochemistry, with the goal of attempting to define life by means of physics and chemistry. Though an enormous amount of work has been done, our understanding of "Life Itself" [2] is still incomplete. For example, the standard way in which biology textbooks list the necessary characteristics of life (in order to delineate it from non-living matter) includes metabolism, self-maintenance, duplication involving genetic material and evolution by natural selection. This largely descriptive approach does not address the real complexity of organisms, the dynamical character of ecological systems, or the question of how the phenotype emerges from the genotype (e.g., for disease processes [3]).
The universe can be viewed as a large Riemannian resonator in which evolution takes place through energy dispersal processes and entropy reduction. Life can be thought of as some of the machinery the universe uses to diminish energy gradients [4]. This evolution consists of a step-by-step symmetry-breaking process, in which the energy density difference relative to the surroundings is diminished. When the universe was formed via the Big Bang 13.7 billion years ago, a series of spontaneous symmetry-breaking events took place, which evolved the uniform quantum vacuum into the heterogeneous structure we observe today. In fact the quantum fluctuations of the early universe were blown up to cosmological scales through a process known as cosmic inflation, and the remnants of these quantum fluctuations can be observed directly in the variation of the cosmic microwave background radiation in different directions. At each stage along the evolution of the universe (from quantum gravity, to fundamental particles, atoms, the first stars, galaxies, planets) there was a further breaking of symmetry. These cosmological, stellar, and atomic particle abstractions can be powerfully expressed in terms of group theory [5].
It also turns out that the very foundation of all of modern physics is based on group theory. There are four fundamental interactions (or forces) in Nature: strong (responsible for the stability of nuclei despite the repulsion of the positively charged protons), weak (manifested in beta-decay), electromagnetic and gravitational. The first three are described by quantum theories: an SU(3) gauge group for the quarks, and an SU(2) × U(1) theory for the unified electroweak interactions [6–8]. From these theories one can derive, for example, Maxwell's theory of electromagnetism, which is the basis of contemporary electrical engineering and photonics, including laser action. Group theory provides a framework for constructing analogies or models from abstractions, and for the manipulation of those abstractions to design new systems, make new predictions and propose new hypotheses.
The motivation of this paper is to examine an alternative set of mathematical abstractions applied to biology, and in particular systems biology. Symmetry and symmetry breaking play a prominent role in developmental biology, from bilaterians to radially symmetric organisms. Brooks [9], Woese [10] and Cohen [11] have all called for deeper analyses of life by applying new mathematical abstractions to biology. The aim of this paper is not so much to address the hard question raised by Schrödinger, but rather to enlarge the set of mathematical techniques potentially applicable to integrating the massive amounts of data available in the post-genomic era, and indirectly contribute to addressing the hard question. Here we will focus on questions of molecular systems biology using mathematical techniques in the domain of abstract algebra which heretofore have been largely overlooked by researchers. The paper will encompass a review of the literature and also offer some new work. We begin with an introduction to group theory, then review applications to the genetic code and the cell cycle. The last section explores ideas expanding group theory into contemporary molecular systems biology.
2. Introduction to Group Theory
Group theory is a branch of abstract algebra developed to study and manipulate abstract concepts involving symmetry [12]. Before defining group theory in more specific terms, it will help to start with an example of one such abstract concept, a rotation group.
Given a flat square card in real 3-dimensional space (ℜ^{3}-space), we can rotate it π radians, i.e., 180 degrees, around the X, Y and Z axes; let us represent these rotations by (r_{1}, r_{2}, r_{3}) (see Figure 1). We will also assume a do-nothing operation represented by e. If we rotate our card by r_{1} followed by an r_{2} rotation, then we get the equivalent of doing only an r_{3} rotation. We can thus fill out a Cayley table (also called a "multiplication" table, though the operation is not ordinary multiplication). Table 1 shows the full Cayley table for our card rotations in ℜ^{3}-space.
The symmetry about the diagonal in the Cayley table tells us that the group is abelian: when the rotations are performed in pairs, they are commutative, so that r_{m} r_{n} = r_{n} r_{m}.
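These four group operations can be written in matrix form as well, as diagonal 3 × 3 matrices acting on the coordinates (x, y, z): e = diag(1, 1, 1), r_{1} = diag(1, −1, −1), r_{2} = diag(−1, 1, −1), r_{3} = diag(−1, −1, 1).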
Now we are in position to state the formal definition of a group G: it is a nonempty set with a binary operation (denoted here by *) which satisfies the following three conditions:

1. Associativity: for all a, b, c ∈ G, (a * b) * c = a * (b * c).
2. Identity: there is an identity element e ∈ G such that a * e = e * a = a for all a ∈ G.
3. Inverse: for any a ∈ G there is an element b ∈ G such that a * b = b * a = e.
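The three axioms can be checked mechanically for the card rotations described above. The following is a minimal sketch, assuming the rotations by π about the X, Y and Z axes are represented by diagonal sign matrices; the names `ELEMENTS`, `compose` and `cayley` are illustrative and not taken from the original paper.

```python
from itertools import product

# Rotations by pi about the X, Y and Z axes are diagonal sign matrices;
# each one is stored here as the tuple of its diagonal entries (assumed representation).
ELEMENTS = {
    "e":  (1, 1, 1),
    "r1": (1, -1, -1),   # rotation by pi about X
    "r2": (-1, 1, -1),   # rotation by pi about Y
    "r3": (-1, -1, 1),   # rotation by pi about Z
}
NAME_OF = {diag: name for name, diag in ELEMENTS.items()}

def compose(a, b):
    """Product of two diagonal matrices, returned as an element name."""
    da, db = ELEMENTS[a], ELEMENTS[b]
    return NAME_OF[tuple(x * y for x, y in zip(da, db))]

names = list(ELEMENTS)
cayley = {(a, b): compose(a, b) for a, b in product(names, repeat=2)}

# Check the three group axioms, plus commutativity.
associative = all(
    compose(compose(a, b), c) == compose(a, compose(b, c))
    for a, b, c in product(names, repeat=3)
)
has_identity = all(cayley[("e", a)] == a == cayley[(a, "e")] for a in names)
has_inverses = all(any(cayley[(a, b)] == "e" for b in names) for a in names)
abelian = all(cayley[(a, b)] == cayley[(b, a)] for a, b in product(names, repeat=2))

# Print the Cayley table row by row.
for a in names:
    print(a, [cayley[(a, b)] for b in names])
```

Because the lookup in `compose` would raise a `KeyError` for a product outside the set, building `cayley` without error also confirms closure.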
Depending on the number of elements in the set G, we talk about finite groups and infinite groups. Finite simple groups have been classified, this classification being one of the greatest achievements of 20^{th}-century mathematics. Finite groups also have widespread applications in science, ranging from crystal structures to molecular orbitals, and, as detailed below, in systems biology. Among the finite groups the most notable ones are S_{n} and Z_{n}, where n is a positive integer. The symmetric group S_{n} as a set is the collection of permutations of a set of n elements, and has order (i.e., number of elements) n!. It turns out that any finite group is isomorphic to a subgroup of a symmetric group S_{n} for some n. The cyclic group Z_{n} is a subgroup of S_{n} consisting of cyclic permutations. Z_{n} has two other presentations:

1. Rotations by multiples of 2π/n.
2. The group of integers modulo n.
These will be discussed later.
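As an illustration of how these presentations agree, the sketch below compares Z_{n} as integers modulo n with Z_{n} as cyclic shifts inside S_{n}, and counts the elements of a small symmetric group. The choice n = 6 and the helper names are assumptions made for the example.

```python
from itertools import permutations

def add_mod(a, b, n):
    """Group operation of Z_n viewed as integers modulo n."""
    return (a + b) % n

def cyclic_shift(k, n):
    """Permutation of range(n) that moves every point k positions forward."""
    return tuple((i + k) % n for i in range(n))

def compose_perm(p, q):
    """Composite permutation: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

n = 6
shifts = {k: cyclic_shift(k, n) for k in range(n)}

# k -> shift by k respects the operation and is one-to-one, so it is an isomorphism
# between the integers modulo n and the cyclic permutations.
homomorphism = all(
    compose_perm(shifts[a], shifts[b]) == shifts[add_mod(a, b, n)]
    for a in range(n) for b in range(n)
)
injective = len(set(shifts.values())) == n

# The symmetric group S_4 has 4! = 24 elements.
order_s4 = len(list(permutations(range(4))))
```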
Infinite groups are harder to study, but those that have additional structure (like the structure of a topological space or of a manifold), where this additional structure is compatible with the group structure, have also been classified. Of particular interest are the Lie groups, which are simultaneously groups and topological spaces, and for which the group multiplication and inverse operation are both continuous functions. Lie groups are completely classified, many of them arising as matrix groups. The matrix representation allows us to use conventional matrix algebra to manipulate the group objects, but does not play any special role. In fact every finite group, and many infinite groups, are isomorphic to subgroups of matrix groups. This is the realm of group representation theory.
The orthogonal groups O(n) (where n is an integer) are made from real orthogonal n by n matrices, i.e., those n × n matrices O for which O^{T}O = OO^{T} = I, where I is the n × n identity matrix.
The special orthogonal group SO(n) consists of those orthogonal matrices whose determinant is +1, and they form a subgroup of the orthogonal group: SO(n) ⊂ O(n). Geometrically, the special orthogonal group SO(n) is the group of rotations in n-dimensional Euclidean space, while the orthogonal group O(n) additionally contains the reflections.
Similarly, the unitary matrices U(n), i.e., those complex n × n matrices U for which U^{H}U = UU^{H} = I, form a group (where H means complex conjugation of each matrix element together with transposition). Special unitary matrices, SU(n), satisfy the additional constraint det(U) = +1, and also form groups.
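The distinction between O(n) and SO(n) can be made concrete in two dimensions. This is a small sketch, assuming a rotation by π/3 and a mirror across the x-axis as test matrices; the helper functions are written out by hand so that no external library is needed.

```python
import math

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

def is_orthogonal(m, tol=1e-12):
    """True when the transpose of m times m equals the identity, within tol."""
    n = len(m)
    p = matmul(transpose(m), m)
    return all(abs(p[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

theta = math.pi / 3  # assumed example angle
rotation = [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]
reflection = [[1.0, 0.0],
              [0.0, -1.0]]  # mirror across the x-axis

# Both matrices are orthogonal; only the rotation has determinant +1.
in_so2 = is_orthogonal(rotation) and abs(det2(rotation) - 1.0) < 1e-12
in_o2_only = is_orthogonal(reflection) and det2(reflection) == -1.0
```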
Finally, we mention the "symplectic" or Sp(2n) groups, but given the fact that these are harder to define, we will not give a formal definition here. As will be shown later, these matrix groups are used in describing the "condensation" of the genetic code.
Another important definition which we will encounter later involves groupoids. A groupoid is more general than a group, and consists of a pair (G, μ), where G is a set of elements, for example, the set of integers Z, and μ is a binary operation (again usually referred to as "multiplication," but not to be confused with arithmetic multiplication); however, the binary operation μ is not defined for every pair of elements in G. We will see that groupoids are useful in describing networks, and thus transcriptome and interactome networks.
3. The Genetic Code
In this section we review some work describing the genetic code in groupoid and group theory terms. One could easily imagine genetic codes based only on RNA or protein, or combinations thereof [13]. When the genetic code "condensed" from the "universe of possibilities" there were many potential symmetry-breaking events.
A codon could be represented as an element in the direct product of three identical sets, S_{1} = S_{2} = S_{3} = {U, C, A, G}: S_{1} × S_{2} × S_{3} = {(x_{1}, x_{2}, x_{3}) : x_{i} ∈ S_{i}}.
The triple cross product has 4^{3} = 64 possible triplets. As is known, the full three-way product table contains redundancies in the code. This was all worked out in the 1960s, without group theory, using empirical knowledge of the molecular structure of the bases [14].
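The size of the triple product and the redundancy of the code can be illustrated directly. The sketch below assumes the standard genetic code written as a 64-character string with bases ordered U, C, A, G ("*" marks a stop codon); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, codons enumerated with the
# bases in the order U, C, A, G at each position; "*" is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Direct product S1 x S2 x S3 of three copies of {U, C, A, G}.
codons = ["".join(t) for t in product(BASES, repeat=3)]
code = dict(zip(codons, AMINO))

# Number of codons mapped to each amino acid (or to "stop").
degeneracy = Counter(code.values())

print(len(codons), "codons map to", len(degeneracy), "symbols")
```

The 64 triplets collapse onto 21 symbols (20 amino acids plus stop), which is the redundancy referred to in the text.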
A simple approach to describe the genetic code involves symmetries of the code doublets. Danckwerts and Neubert [15] used the Klein group, an abelian group with 4 elements, isomorphic to the symmetries of a non-square rectangle in 2-space. The objective is to describe the symmetries of the code doublets using the Klein group. We can partition the set of dinucleotides into two subsets: M_{1} = {AC, CC, CU, CG, UC, GC, GU, GG} and M_{2} = {CA, AA, AU, AG, UA, GA, UU, UG}.
For the doublets in M_{1}, the third base of the triplet has no influence on the coded amino acid. The doublets in M_{1} are associated with the degenerate triplets. Those in M_{2} do not code for amino acids without knowledge of the third base in the triplet. Introducing the doublet exchange operators (e, α, β, γ) we can perform the following base exchanges: α: A ↔ C, G ↔ U; β: A ↔ U, G ↔ C; γ: A ↔ G, C ↔ U,
where the exchange logic is given as follows: α exchanges purine bases with non-complementary pyrimidine bases, β exchanges complementary bases which can undergo hydrogen bond changes, and γ exchanges purine with another purine and pyrimidine with another pyrimidine, and is a composition of α with β. The operator e is our identity operator. The Cayley table for the Klein group is shown in Table 2. The table has exactly the same form as the rotation table in Table 1, and so the two groups are said to be isomorphic with each other.
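The exchange logic just described determines the operators completely, so the Klein group structure can be verified by composing base substitutions. This is a sketch under the assumption that A pairs with U and G pairs with C; the dictionary `EXCHANGE` and the spelled-out operator names are illustrative.

```python
from itertools import product

# Each operator is a substitution on the four bases (purines: A, G; pyrimidines: C, U).
EXCHANGE = {
    "e":     {"A": "A", "C": "C", "G": "G", "U": "U"},
    "alpha": {"A": "C", "C": "A", "G": "U", "U": "G"},  # purine <-> non-complementary pyrimidine
    "beta":  {"A": "U", "U": "A", "G": "C", "C": "G"},  # complementary bases
    "gamma": {"A": "G", "G": "A", "C": "U", "U": "C"},  # purine <-> purine, pyrimidine <-> pyrimidine
}

def compose(f, g):
    """Name of the operator obtained by applying g first, then f."""
    mapping = {b: EXCHANGE[f][EXCHANGE[g][b]] for b in "ACGU"}
    return next(name for name, m in EXCHANGE.items() if m == mapping)

ops = list(EXCHANGE)
cayley = {(f, g): compose(f, g) for f, g in product(ops, repeat=2)}

# Print the Cayley table row by row.
for f in ops:
    print(f.ljust(6), [cayley[(f, g)] for g in ops])
```

Renaming (e, α, β, γ) to (e, r_{1}, r_{2}, r_{3}) turns this table into the card rotation table, which is the isomorphism mentioned above.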
Bertman and Jungck [16] extended this Klein representation to a Cartesian group product (K_{4} × K_{4}), which resulted in a four-dimensional hypercube, known as a tesseract. The corners of the hypercube are pairs of operators from the Klein group and genetic code for doublets, shown in Figure 2.
The corners of this hypercube form two octets of dinucleotides, the two sets M_{1} and M_{2}. The vertices of each octet lie at the planes of a continuously connected region. One such region, M_{1}, is shown in the shading of Figure 2. The octets are neither subgroups nor cosets of a subgroup. They are both unchanged under the operations (e, e) and (β, e). These two octets can also be interchanged by acting on one of them with (α, α) and/or (γ, α).
In general, not much can be stated about the product of two groups. If A and B are subgroups of K, then the product AB may or may not be a subgroup of K. Nonetheless, the product of two sets may be very important and leads to the concept of cosets. Let K be the Klein group K = {e, α, β, γ} and take the subgroup H = {e, β}; then the set αH = {αe, αβ} = {α, γ} is known as a left coset. Since K is abelian, the right coset is Hα = {eα, βα} = {α, γ} and we find αH = Hα. The following are the four cosets of the (K_{4} × K_{4}) genetic exchange operators:
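The behaviour of the two octets under operator pairs can be checked against the standard genetic code. The following sketch assumes that an operator pair (f, g) acts with f on the first base and g on the second base of a doublet, which is a convention chosen here for illustration and may differ from the one used by Bertman and Jungck; the octet M_{1} is derived from the codon table rather than typed in.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code with bases ordered U, C, A, G; "*" is a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
code = {"".join(t): aa for t, aa in zip(product(BASES, repeat=3), AMINO)}

doublets = ["".join(d) for d in product(BASES, repeat=2)]
# M1: doublets whose four codons all code for the same amino acid.
M1 = {d for d in doublets if len({code[d + b] for b in BASES}) == 1}
M2 = set(doublets) - M1

# Base exchange operators as translation tables.
SWAP = {
    "e":     str.maketrans("ACGU", "ACGU"),
    "alpha": str.maketrans("ACGU", "CAUG"),
    "beta":  str.maketrans("ACGU", "UGCA"),
    "gamma": str.maketrans("ACGU", "GUAC"),
}

def act(pair, doublet):
    """Apply the pair (f, g): f to the first base, g to the second (assumed convention)."""
    f, g = pair
    return doublet[0].translate(SWAP[f]) + doublet[1].translate(SWAP[g])

def image(pair, octet):
    """Set of doublets obtained by acting with the pair on every member of the octet."""
    return {act(pair, d) for d in octet}

print(sorted(M1))
print(image(("beta", "e"), M1) == M1, image(("alpha", "alpha"), M1) == M2)
```

Under this convention the computation agrees with the statements above: (β, e) leaves each octet unchanged, while (α, α) and (γ, α) exchange them.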
Here, we have written the corresponding dinucleotide next to the operator in the format (e,e):AA, etc.; the bar over some dinucleotides indicates membership in a different octet of completely degenerate codons, while the other dinucleotides are ambiguous codons.
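The coset construction for the Klein group example K = {e, α, β, γ} with H = {e, β} can be sketched in a few lines. The function `mul` encodes the Klein group product by rule rather than by table; it and the spelled-out element names are illustrative choices, and the example covers only the single Klein group K, not the product K_{4} × K_{4}.

```python
KLEIN = ["e", "alpha", "beta", "gamma"]

def mul(a, b):
    """Klein four-group product: every element is its own inverse, and the
    product of two distinct non-identity elements is the third one."""
    if a == "e":
        return b
    if b == "e":
        return a
    if a == b:
        return "e"
    return next(x for x in KLEIN[1:] if x not in (a, b))

H = {"e", "beta"}

# Left cosets aH and right cosets Ha for every a in K, collected as sets of sets.
left_cosets = {frozenset(mul(a, h) for h in H) for a in KLEIN}
right_cosets = {frozenset(mul(h, a) for h in H) for a in KLEIN}

print(sorted(sorted(c) for c in left_cosets))
```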
The (K_{4} × K_{4}) 4-dimensional hypercube representation in Figure 2 suggests that the 64 elements in the genetic code, the triplets, could be represented by a 64-dimensional hypercube and the symmetry operations in that space would be the codons. Naturally we can form the triple product D = K_{4} × K_{4} × K_{4}
to arrive at a 64-dimensional hypercube as the general genetic code. But of course multiple vertices of this hypercube code for the same amino acid. The map from triplets to amino acids is therefore surjective but not injective, because more than one nucleotide triplet codes for the same amino acid. In 1982 Findley et al. [17] described further symmetry breakdown of the group D, showed various isomorphic subgroups including the Klein group, and described alternative coding schemes in this hyperspace.
Above we described the genetic code with respect to inherent symmetries. In 1985 Findley et al. [18] suggested that the 64-dimensional hyperspace, D, may be considered as an information space; if one includes time (evolution), then we have a 65-dimensional information-space-time manifold. The existing genetic code evolved on this differentiable manifold, M [X]. Evolutionary trajectories in this space are postulated to be geodesics in the information-space-time. It should be possible to use statistical methods to compute distances between species (polynucleotide trajectories) by using a metric, say the Euclidean metric d(x, y) = [Σ_{i} (x_{i} − y_{i})^{2}]^{1/2},
and from a phylogenetic tree to recreate trajectories in this space. It should be possible to thus see regions of the information-space-time that have not been explored by evolution. One may speculate on the code trajectory by bringing in Stuart Kauffman's theory on the adjacent possible [19–21] by a perturbation theory. Further, the curves on this manifold should map, in a complex way, to the symmetry breaking described below, or bifurcation, and thus give a second route to the differential geometry of Findley et al. [18].
Another approach to understanding the evolution of the genetic code is based on analogies with particle physics and symmetry breaking from higher-dimensional space. Hornos and Hornos [22] and Forger et al. [23] use group theory to describe the evolution of the genetic code from a higher-dimensional space. Technically, they propose a dynamical system algebra or Lie algebra [24]; the Lie algebra is a structure carried by the tangent space at the identity element of a Lie group. Starting with the sp(6) Lie algebra, shown in Figure 3, the following chain of symmetry breaking will result in the existing genetic code with its redundancies:
The initial sp(6) symmetry breaks into 6 subspaces sp(4) and su(2). Sp(4) then splits into su(2) ⊗ su(2) while the second su(2) factors into u(1). Details are given in Hornos and Hornos [22] and Forger et al. [23] on how this maps to the existing genetic code.
4. Cell Cycle and Multi-Nucleated Cells
The cell cycle is a natural example for the application of group theory because of the cyclic symmetry governing the process. The steps in the cell cycle are G1 → S → G2 → M, and back to G1. In some cases G0 is essentially so brief as to be nonexistent, so we will ignore that state.
To cast the cell cycle into group theory terms recall the definition of a group we gave earlier [25]. The only reasonable approach for casting the cell cycle into group theory is to use the rotational symmetries of a square. Table 3 shows the group table for the cell cycle. It is Abelian and isomorphic to the cyclic group Z_{4}. Writing the rotation operations for the cell cycle as permutations we get:
where for example R_{90} can be expressed as the mapping:
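The permutation form of these rotations can be sketched as follows. The assignment of R_{90} to a one-step advance G1 → S → G2 → M → G1 is an assumption made for this illustration, as are the names `PHASES`, `rotation` and `table`.

```python
PHASES = ("G1", "S", "G2", "M")

def rotation(k):
    """Permutation advancing every phase by k steps around the cycle."""
    return {p: PHASES[(i + k) % 4] for i, p in enumerate(PHASES)}

# Rotations of the square by 0, 90, 180 and 270 degrees (assumed labelling).
R = {0: rotation(0), 90: rotation(1), 180: rotation(2), 270: rotation(3)}

def compose(f, g):
    """Apply g first, then f."""
    return {p: f[g[p]] for p in PHASES}

def label(perm):
    """Angle of the rotation that equals the given permutation."""
    return next(angle for angle, r in R.items() if r == perm)

# Group table: composing rotations adds their angles modulo 360, as in Z_4.
table = {(a, b): label(compose(R[a], R[b])) for a in R for b in R}

print(R[90])
```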
The cell cycle group table suggests exploring the group operations of some actual physical manipulation of cells. Rao and Johnson [26] and Johnson and Rao [27] conducted experiments on transferring nuclei from one cell into another to produce cells with multiple nuclei. An interesting question they addressed was what effect a G2 nucleus would have when transplanted into a cell whose nucleus was in the S phase. Figure 4 shows an example of a multinucleated cell from one of their cell fusion experiments. These experiments were designed to address larger questions about chromosome condensation and the regulation of DNA synthesis.
Some of the nuclei were prelabeled with ^{3}H-thymidine to enhance visibility. Details of the experiments and the results can be found in the original papers. Here we examine, by means of a group table, the converged state for these binucleated cells. Naturally it takes some time for the "reactions" (or not) to take place and for the cell to settle to some stable attractor. In some cases more than one nucleus was added to a cell in another state. For example two G1 nuclei were added to a cell in the S phase. Rao and Johnson [26] and Johnson and Rao [27] recorded the speed to convergence. The group table in Table 4 shows the converged cell state. For example, if a G2 nucleus was added to a cell in G1, there was essentially no change. These are just rough observations; given enough time, all cells will converge to state M, the strongest attractor in the dynamics of the cell cycle. To show that this follows actual group definitions we need to show associativity and find an identity and inverse element, or, alternatively, to show an isomorphism with a known group.
The table shows that the operation is Abelian, i.e., that commutativity always holds: a ◦ b = b ◦ a for all a, b ∈ G, where G is the set of cell states. We can also show associativity, a ◦ (b ◦ c) = (a ◦ b) ◦ c; for example:
and
On the other hand, it is clear from the multiplication table that we cannot have a group structure on the set {G1, G2, S, M}. Namely, in a group G any row or column of the multiplication table will contain the elements of G precisely once, hence will be a permutation of the elements. This property fails for the rows of S and M. Furthermore, the product of G1 and G2 is undefined. Nevertheless, the set {G1, G2, S, M} carries the structure of a groupoid, which is discussed below.
Similar considerations apply if we fuse cells of different type, or differentiation state. These types of experiments were carried out for different stem cells, as reviewed in Hanna [28]. Another fusiontype experiment involves nuclear transfer from one type of somatic cell to another, and determining the identity of the outcome. A variant of this is to transfer RNA populations between cells, and observe the change in the cell's phenotype [29].
5. Algebraic Graph Theory: Graph Morphisms
Network graph theory is increasingly being used as the primary analysis tool for systems biology [30, 31], and graphs, like the yeast protein-protein interaction (PPI) network shown in Figure 5, are becoming increasingly important. Two excellent references on network theory and network statistics are Newman et al. [32] and Albert and Barabasi [33]. Godsil and Royle [34] and Chung [35] are good references that go beyond the statistical analysis of network graphs and explore mappings from graph to graph, or morphisms and homomorphisms.
With modern datasets it is possible to begin exploring molecular systems dynamics on a network level by using morphism concepts and algebraic graph theory. For example, using these datasets we may be able to impute missing connections in PPI networks, or build vector-matrix-based models representing the dynamics of changing PPI networks. In other cases we may be able to test algebraic graph theory concepts using the PPI data. Our focus here will be to continue exploring the cell cycle by including transcription data and protein-protein interaction data from high-throughput screenings. We will first review a few algebraic graph theorems. Godsil and Royle [34] will be our primary reference for algebraic graph theory.
Mathematically a network is a graph G = G(V, E) of a set of n vertices {V} (also called nodes), and a set of e edges {E}, or links. Graphs can be represented using the adjacency matrix A. The adjacency matrix of a finite graph on n vertices is the n × n matrix where the non-diagonal ij-th entry A_{ij} is the number of edges from vertex i to vertex j, while the diagonal entry A_{ii}, depending on the convention, is either once or twice the number of edges (loops) from vertex i to itself.
The eigenvalues of this matrix, λ_{i}, can be computed to produce the spectrum, which is an ordered list of the eigenvalues λ_{1}, λ_{2}, ..., λ_{n}. This spectrum has many mathematical properties representative of the network graph, though two different graphs may have identical spectra. The adjacency matrix however has other useful properties; for a simple graph (no loops or multiple edges) these include the following:
tr(A) = 0, tr(A^{2}) = 2e, tr(A^{3}) = 6t,
where tr(A) represents the trace of the matrix, e is the number of edges, and t represents the number of triangles in the graph. An excellent review of spectral graph theory is given by Chung [35].
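For a simple undirected graph, the standard trace identities (the trace of A is zero, the trace of A^{2} is twice the number of edges, and the trace of A^{3} is six times the number of triangles) can be checked on a small example. The graph below, a triangle with one pendant vertex, is an assumed toy example and not data from the paper.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Toy graph: triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4

# Symmetric 0/1 adjacency matrix of the simple graph.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

A2 = matmul(A, A)
A3 = matmul(A2, A)

num_edges = trace(A2) // 2      # each edge contributes two closed walks of length 2
num_triangles = trace(A3) // 6  # each triangle contributes six closed walks of length 3

print(trace(A), num_edges, num_triangles)
```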
Another important matrix is the incidence matrix, which has some very useful properties. The incidence matrix B(G) of a graph G is a matrix having one row for each vertex and a column for each edge, with non-zero elements for those node-edge pairs for which the node is an end-node of the edge. This matrix is therefore in general not square. An interesting property is that if we let G be a graph with n vertices, c_{0} the number of its bipartite connected components, and B the incidence matrix of G, then its rank is given by rk(B) = n − c_{0}.
Another observation concerning the incidence matrix involves the line graph of G, L(G). The edges of G are the nodes of L(G), and we connect two vertices of L(G) with an edge if and only if the corresponding edges of G share an endpoint. An example is shown in Figure 6. A theorem proved by Godsil and Royle [34] shows a relation between the adjacency matrix of L(G) and the incidence matrix of G: B^{T}B = 2I + A(L(G)). These simple matrix manipulations allow one to compute potentially new metrics on some complex molecular networks, such as the PPI network in Figure 5.
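Both incidence-matrix statements can be tried out on small graphs. This sketch assumes the unoriented 0/1 incidence matrix of a simple graph and uses exact rational elimination for the rank; the two example graphs (a triangle with a pendant vertex, which has no bipartite component, and a path on three vertices, which is bipartite) are assumptions made for illustration.

```python
from fractions import Fraction

def incidence(n, edges):
    """Unoriented vertex-by-edge incidence matrix of a simple graph."""
    return [[1 if v in e else 0 for e in edges] for v in range(n)]

def line_graph_adjacency(edges):
    """Two edges are adjacent in L(G) when they share an endpoint."""
    m = len(edges)
    return [[1 if i != j and set(edges[i]) & set(edges[j]) else 0
             for j in range(m)] for i in range(m)]

def gram(B):
    """Transpose of B times B, computed column against column."""
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

def rank(m):
    """Rank over the rationals by Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in m]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]   # triangle with a pendant vertex
B = incidence(4, edges)
L = line_graph_adjacency(edges)
m = len(edges)
identity_holds = gram(B) == [[2 * (i == j) + L[i][j] for j in range(m)] for i in range(m)]

path = [(0, 1), (1, 2)]                    # bipartite, one connected component
print(identity_holds, rank(B), rank(incidence(3, path)))
```

The first graph has no bipartite component, so its rank equals n = 4; the path has one bipartite component, so its rank is 3 − 1 = 2.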
The concept of automorphism of a graph is an important one, and as we will see it has applicability to subgraphs within more complex graphs. Automorphisms of a graph are permutations of the vertices that preserve the adjacency of the graph, i.e., if (u, v) is an edge, and P is the graph automorphism, then (P_{u}, P_{v}) is also an edge. As a result, an automorphism maps a vertex of valence m to a vertex of valence m. Whole graph automorphisms applied to asymmetric graphs, similar to the yeast PPI network shown in Figure 5, detect core symmetric regions.
The automorphisms of a graph form a group, Aut(G). The main question to ask is, what is the size of this automorphism group, represented as |Aut(G)|? This provides a measure of the overall network symmetry. Typically, as described by MacArthur and Anderson [36] and Xiao et al. [37], this is normalized for comparing networks of different sizes (N is the number of nodes):
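For very small graphs the automorphisms can be listed by brute force, which makes the definition concrete. The sketch below tries every vertex permutation, so it is only practical for a handful of vertices and is not a substitute for dedicated tools; the 6-cycle and the 4-vertex path are assumed examples.

```python
from itertools import permutations

def automorphisms(n, edges):
    """All vertex permutations of a simple undirected graph that map edges to edges."""
    edge_set = {frozenset(e) for e in edges}
    autos = []
    for perm in permutations(range(n)):
        # A permutation is injective on edges, so mapping every edge into the
        # (finite) edge set already forces non-edges to map to non-edges.
        if all(frozenset((perm[u], perm[v])) in edge_set for u, v in edges):
            autos.append(perm)
    return autos

cycle6 = [(i, (i + 1) % 6) for i in range(6)]   # the cycle graph C_6
path4 = [(0, 1), (1, 2), (2, 3)]                # a path on four vertices

aut_c6 = automorphisms(6, cycle6)
aut_p4 = automorphisms(4, path4)

print(len(aut_c6), len(aut_p4))
```

The 6-cycle has 12 automorphisms (six rotations and six reflections), while the path has only the identity and the end-to-end flip.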
MacArthur et al. [38] suggest, and show, that it is possible to decompose, or factor, a large network graph. The NAUTY algorithm [39] they use produces a set known as the automorphism group. The Human Bcell genetic interaction network, for example, can be factored into the terms [40]. The order of this group is computed as
This results from the fact that the order of the cyclic group C_{n} is n (since there are 36 of them we take the 36^{th} power), while the order of the symmetric group S_{n} is n!. Given that the network contained 5930 vertices (and 64,645 edges), we have
As a second example MacArthur et al. [38] use data from BioGRID for the S. cerevisiae interactome (with 5295 nodes) and obtain the following automorphism group and its order:
β_{ G } in this case is 1.02693.
As we will see later, this may be applicable to molecular interactomes. A full molecular interactome (not just PPI) is a directed graph, and describes an underlying dynamical system in terms of ordinary differential equations: dx_{i}/dt = f_{i}(A, x_{j}), where x_{i} is the state of molecular species i, and A is the full interactome adjacency matrix, an asymmetric matrix. Golubitsky and Stewart [41] point out that the symmetry groups determine the dynamics of the network. When the symmetry changes in one or more factors of the automorphism group, because of a protein mutation or misfolding, for example, this will affect the overall symmetry and thus the dynamics. A catalog of the automorphism groups for interactomes is thus a list of the dynamic behaviors allowed. It might be possible to map these automorphism group elements to disease states. Incidentally, a neural network technique to perform automorphism partitioning is described in Jain and Wysotzki [42].
Another approach to study the dynamics of interactomes exploits a concept known as the Laplacian of the graph [34]. Interactomes are composed of tree graphs and spanning trees. (The high number of small symmetry subgroups, e.g., , in the automorphism group also indicates this tree topology.) Let σ represent an arbitrary orientation of a graph G, and let B be the incidence matrix of G^{σ}; then the Laplacian of G is Q(G) = BB^{T}. The Laplacian matrix plays a central role in Kirchhoff's matrix tree theorem, which tells us that the number of spanning trees in G can also be calculated from the eigenvalues of Q: if G has n vertices, and (λ_{1} = 0, λ_{2}, ..., λ_{n}) are the eigenvalues of the Laplacian of G, then the number of spanning trees is given by:
τ(G) = (1/n) λ_{2} λ_{3} ··· λ_{n}.
A proof for this theorem is given for example in Godsil and Royle [34].
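The matrix tree theorem can be illustrated on small graphs. To stay in exact arithmetic, the sketch below uses the equivalent cofactor form of the theorem (the determinant of the Laplacian with one row and the matching column deleted) rather than the eigenvalue product; the Laplacian is built as BB^{T} from an oriented incidence matrix, with the orientation taken from the order in which each edge is written. The example graphs K_{4} and C_{6} are assumptions.

```python
from fractions import Fraction

def det(m):
    """Determinant by exact Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in m]
    n = len(m)
    result = Fraction(1)
    for c in range(n):
        pivot = next((i for i in range(c, n) if m[i][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for i in range(c + 1, n):
            factor = m[i][c] / m[c][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[c])]
    return result

def laplacian(n, edges):
    """Q = B B^T for an oriented incidence matrix B; equals degrees minus adjacency."""
    # Orientation: +1 at the first listed endpoint, -1 at the second.
    B = [[0] * len(edges) for _ in range(n)]
    for j, (u, v) in enumerate(edges):
        B[u][j], B[v][j] = 1, -1
    return [[sum(B[i][k] * B[j][k] for k in range(len(edges))) for j in range(n)]
            for i in range(n)]

def spanning_trees(n, edges):
    """Number of spanning trees as a cofactor of the Laplacian."""
    Q = laplacian(n, edges)
    minor = [row[1:] for row in Q[1:]]  # delete the first row and column
    return int(det(minor))

k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]  # complete graph K_4
c6 = [(i, (i + 1) % 6) for i in range(6)]                 # cycle graph C_6

print(spanning_trees(4, k4), spanning_trees(6, c6))
```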
We can use this theorem to examine the effects of removing a vertex. If we let e = uv be an edge of G, then the graph G\e is obtained by deleting the edge e from G. The existing PPI network is an extreme case in which a set of unknown edges E and unknown vertices V have been removed from the actual interactome to give us the observed graph P = G \(E,V).
It would be interesting to see how far these deletion theorems can be extended as one approaches graphs with current density. One should be able to test these new theorems empirically with real world data from a manufacturing plant, say an integrated circuit fab. One could start with the full manufacturome and begin deleting edges or vertices and evaluating the theorems observing the effects on the automorphism groups. We know the full interactome should be a directed graph. With the manufacturome, which is of course a directed graph, it should be possible to evaluate and extend other algebraic graph theorems to directed and undirected graphs.
The last set of theorems we will introduce on algebraic graph theory involves the embedding space or representation of a graph. These theorems are discussed in Godsil and Royle [34]. A representation ρ of a graph G in ℜ^{m} is a map ρ : V(G) → ℜ^{m}. As an example, a graph with |V(G)| = 8 and in which each vertex has a valence of 3 can be represented as a cube in 3-space. The center of gravity of the m-space object is considered to be the origin for vectors pointing to the vertices. In the case of this example graph, we get:
We say that the mapping is balanced if
Σ_{u ∈ V(G)} ρ(u) = 0,
where ρ(u) represents the mapping vectors. We can create a matrix, R, whose rows are these vectors. The mapping is optimally balanced if and only if 1^{T}R = 0. Usually this will not be the case, especially for complex interactomes and manufacturomes. If the column vectors of R are not linearly independent, the image of G is contained in a proper subspace of ℜ^{m}. In this case the mapping ρ is just some lower-dimensional representation embedded in ℜ^{m}. The energy of this embedding is found from a Euclidean length:
E(ρ) = Σ_{uv ∈ E(G)} ||ρ(u) − ρ(v)||^{2}.
This suggests it may be possible to asymptotically approach an optimal embedding for interactomes by a gradient descent algorithm to minimize the energy of the embedding.
A number of questions then arise such as: What is the biological, or evolutionary, significance of the embedding space? How does it relate to the automorphism group and the actual molecular network dynamics? Are patterns noticeable for disease trajectories in this higher-dimensional space, or even simple cell cycle trajectories in this space? Are there routes from differentiated cells to pluripotent states? Are there noticeable automorphism group differences between normal cells and polyploid cells? Is there an isomorphism between the automorphism group and the motifs of Alon [43], and an isomorphism between the order of the automorphism group |Aut(G)| and the average degree distribution ⟨k⟩ or other network statistics? These are all open research questions and some methods described below may be applicable to efforts aimed at answering these questions.
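The cube example can be worked through numerically. This is a sketch under two assumptions: the eight vertices are placed at (±1, ±1, ±1), so that the center of gravity is the origin, and the energy is taken to be the sum over edges of the squared Euclidean edge length, following the definition in Godsil and Royle.

```python
from itertools import product

# Representation of the cube graph in 3-space: one vertex per sign pattern.
vertices = list(product((-1, 1), repeat=3))

# Two vertices are adjacent when they differ in exactly one coordinate.
edges = [(u, v) for i, u in enumerate(vertices) for v in vertices[i + 1:]
         if sum(a != b for a, b in zip(u, v)) == 1]

# Balanced: the images of the vertices sum to the zero vector.
vector_sum = tuple(sum(v[k] for v in vertices) for k in range(3))
balanced = vector_sum == (0, 0, 0)

# Energy: sum over edges of the squared Euclidean length of the edge.
energy = sum(sum((a - b) ** 2 for a, b in zip(u, v)) for u, v in edges)

# Every vertex of the cube has valence 3.
degree = {v: sum(v in e for e in edges) for v in vertices}

print(len(vertices), len(edges), balanced, energy)
```

Each of the 12 edges has length 2, so the energy of this particular embedding is 12 × 4 = 48; rescaling the cube rescales the energy accordingly.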
6. Network Dynamics and the Groupoid Formalism
In the above section we described group theory formalism applied to graphs. Here we step up in symmetry, and describe another algebraic object, groupoids; this will allow us to bring more dynamics into the study [41, 44, 45]. Obviously this has importance for understanding the dynamics of molecular interactome networks.
Recall that a directed graph encodes the dynamics given by dx_{i}/dt = f_{i}(A, x_{j}), where x_{i} is the state of molecular species i, and A_{ij} is the full interactome adjacency matrix. More precisely, the automorphism group of the network implicitly encodes the dynamics. Further, we know that interactome-like network graphs are composed of multiple copies of a few basic components, e.g. . Groupoids are algebraic objects that resemble groups, but their operation is defined only for some pairs of elements. In other words, we recognize symmetry but automorphisms are non-trivial. This formalism will allow us to apply group-theory methods to network graphs. Most of this will be focused on small subnets within the larger interactome, where we observe permutation-type automorphisms.
The notion of groupoid is most transparent if we approach it from a categorical angle [46]. The standard definition of a category C involves a collection of objects, A, B,..., and, for each pair of objects A and B, a set of morphisms Hom(A, B), which may be empty. The composition of morphisms is defined and is associative, and there is an identity element in each Hom(A, A); therefore Hom(A, A) is never empty.
But a category C can be viewed as an algebraic structure in itself, endowed with a binary operation, making it similar to a group or semigroup. We call this associated algebraic structure G(C). The "elements" (since the collection of objects does not necessarily form a set) of G(C) are the morphisms of C, and the "product" is the composition, which is an associative partial binary operation with identity elements. If C has only one object, then any two morphisms can be composed, and we have only one identity element. In that case the axioms of a category guarantee that G(C) is a semigroup with identity (a monoid). Furthermore, if we insist on the invertibility of each morphism in C, then G(C) is a group.
Now it is natural to extend the notion of a group by requiring that the objects of C form a set, i.e., C is a small category, and also ask that each morphism of C is invertible. This is the categorical definition of a groupoid. It is easy to translate this definition into the algebraic language, and get a notion similar to the definition of a group [47]. But perhaps it is the categorical definition that illuminates the power of groupoids. Namely, while groups are ideally suited to describe the symmetries of an object, groupoids can similarly capture the symmetries of collections of objects. This is perfectly illustrated in modern algebraic geometry, when one tries to form classifying spaces, known as moduli spaces, but the algebraic varieties one wants to classify (say elliptic curves) have different symmetries. This problem is solved using the language of stacks and groupoids [48]. The necessity for the same powerful generalization arises in string theory, where symmetries of the physical theory cannot be mathematically realized in terms of topological spaces and groups, only in terms of stacks and groupoids [49].
In the groupoid approach we will examine not the symmetry of the small subnetworks and motifs, but rather the dynamics of these small networks, when they are directed graphs, and in particular when these small nets are wired together to make larger networks (circuits). The symmetries we will observe are not the network symmetry but the symmetries in the phase space or the space of the dynamics.
The interactome, and indeed the full chemical reaction network comprising a cell, is a complicated network with numerous feedback loops and feedforward circuits. Its dynamics is no doubt complicated and the details of the full network are only now being elucidated; but we can begin to speculate on some of the possible dynamics by exploiting work from a slightly more mature field: neuronal networks.
We know that biological neuronal nets comprise two- and three-dimensional arrays of frequency-controlled oscillators, voltage-controlled oscillators, and logic gates. Engineers have constructed random and nonrandom networks of these components and discovered not only that the network is capable of memory storage in the form of dynamic patterns and limit cycles (for example memorizing a Bach minuet) but also that initially random pulse patterns coursing through the network will, after a time delay for component integration, entrain other components and produce continual limit cycles. In large arrays of these networks the limit cycles interact with each other to produce emergent dynamics. In the following we draw on work of Hasslacher and Tilden [50], Rietman et al. [51] and Rietman and Hillis [52]. We argue that by analogy similar dynamics would occur in the molecular interaction network of the cell.
Figure 7 shows a schematic of the cell cycle. As described above, the cyclic group Z_{4} is a simple description of the cell cycle, but we can improve the description to incorporate the observation that G1 and G2 are metastable in the same cell. This multinucleated state, analogously, could correspond to cancer cells and/or polyploid cells in which we fused the two nuclei. These are also stable, or at least metastable, cell states, and as will be shown below the number of stable states is not huge.
We can let one node in this 4-cycle be represented by the following transfer function in which we include a bias term and its associated Gaussian noise, θ + ε_{ θ }
where x is the input signal, and β is a gain and can be negative or positive and include noise. The noise is centered about the signal mean and the noise magnitude is set to about one standard deviation of the signal mean. Soft sigmoids have the property of acting like analog signals, not digital. Further, with more than one input feeding into the same node, we sum the product of the incoming signals and their strengths. The transfer function equation now becomes:
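The node response described above can be illustrated with a small sketch. The exact transfer function is not reproduced in this text, so the logistic form below is an assumption, as are the function name `soft_sigmoid_node`, its argument names, and the way the Gaussian noise is sampled; it is meant only to show a weighted sum of inputs passed through a soft sigmoid with a gain β and a noisy bias θ + ε_θ.

```python
import math
import random


def soft_sigmoid_node(inputs, weights, beta=1.0, theta=0.0, noise_sd=0.0, rng=None):
    """Logistic ("soft sigmoid") response of one ring node (assumed form).

    inputs   -- incoming signal values x_i
    weights  -- connection strengths w_i, one per input
    beta     -- gain; may be negative or positive
    theta    -- bias term
    noise_sd -- standard deviation of the Gaussian noise added to the bias
    """
    rng = rng or random.Random()
    # Sum the products of the incoming signals and their strengths.
    drive = sum(w * x for w, x in zip(weights, inputs))
    # Bias plus its associated Gaussian noise (theta + eps_theta).
    eps = rng.gauss(0.0, noise_sd) if noise_sd > 0 else 0.0
    return 1.0 / (1.0 + math.exp(-(beta * drive + theta + eps)))
```

With zero drive, zero bias, and no noise the node sits at its midpoint of 0.5; a positive gain pushes the output toward 1 for positive drive, and a negative gain pushes it toward 0, which is the analog (non-digital) behavior the text attributes to soft sigmoids.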
Using these dynamics a four-node ring, for example, can exhibit the following three states: (0000), (0101), (0001). Here we employ a permutation-like notation, where, for example (0001) → (0010) → (0100) → (1000) are equivalent to (0001). (Known as the 1 equivalence class, where the underscore is to remind us that this is not a number but a groupoid.) Interestingly, the three states shown here are isomorphic to the nuclei-fusion group: (0000) → dead cell; (0001) → normal healthy cell; (0101) → G1/G2 (equivalence class 5).
One can debate whether or not this is a good model of the cell cycle, but feedforward nets of similar central pattern generators (CPGs) are able to rapidly adapt to changing external stimuli to maintain some entrainment or global stability [53], and from a molecular perspective this is exactly what is required of biological cells. The molecular network in living cells consists of a highly complex interconnected feedback and feedforward chemical reaction system. Walhout and colleagues [54, 55] and others [56, 57] have been discerning some of these details. They have found that feedback and ring circuits, often with inhibitory connections, are common in transcription regulatory networks (protein-DNA interaction networks). One could envision that the basic cell cycle is the primary limit cycle in the dynamics of the cell and the transcription regulator dynamics are used to control and simultaneously be controlled by the cell cycle.
In addition, these ring circuits are able to operate in more than one stable state, exactly as we would need for complex molecular networks of living cells. A 6-node ring circuit can exhibit 5 states; 8 nodes can exhibit 7 states; 10 nodes, 16 states; 12 nodes, 32 states; 14 nodes, 64 states; and 16 nodes, 128 states. The increase in states follows a 2-ary necklace function.
where d_{ i } are the divisors of n with d_{1} = 1, d_{v(n)}= n ; v(n) is the number of divisors of n; ϕ(n) is the totient function, and F(.) is the Fibonacci sequence (where F_{n} = F_{n-1} + F_{n-2} [58]). The totient function, also called the Euler totient function, is the number of positive integers not exceeding n that are relatively prime to n [51].
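For the two smallest rings discussed here, the state classes can be listed by brute force. The sketch below assumes, as a working hypothesis not stated in the text, that a sustainable pattern is one in which no two neighbouring nodes are active at once, and it treats patterns that differ only by rotation as one class. Under that assumption it reproduces the three classes quoted for the four-node ring and the five quoted for the six-node ring; it is offered only for those two cases. The function names are illustrative.

```python
def rotations(state, n):
    """All cyclic rotations of an n-bit state stored as an integer."""
    mask = (1 << n) - 1
    return [((state << k) | (state >> (n - k))) & mask for k in range(n)]


def no_adjacent_active(state, n):
    """True if no two neighbouring nodes on the ring are both 1 (assumed rule)."""
    shifted = rotations(state, n)[1 % n]  # the state moved one position around the ring
    return state & shifted == 0


def ring_state_classes(n):
    """Lowest decimal representative of each rotation class of allowed states."""
    reps = set()
    for state in range(1 << n):
        if no_adjacent_active(state, n):
            reps.add(min(rotations(state, n)))
    return sorted(reps)
```

For n = 4 the representatives are 0, 1 and 5, i.e. (0000), (0001) and (0101); for n = 6 they are 0, 1, 5, 9 and 21.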
Consequently, even small rings of only a dozen nodes can maintain a large number of stable states. Coupling these motifs into networks can produce overall global stability. As Golubitsky and Stewart [41, 44] point out, and as is apparent in the large network of Figure 5, the overall network has very low global symmetry.
To give more details consider the 6-node ring with only one bit active (000001) as a hexagon with one circle filled, as shown in Figure 8. If the active bit is traveling in the counterclockwise direction we can represent the transitioning bit string as follows:
After six rotations, r, the ring dynamics is in the same configuration as when we started. (This is said to be a six-cycle in the terminology of dynamical systems.) Symbolically we can represent this as:
where the numbers are a decimal representation of the bit string; they are underlined to remind us that these are group symbols, and are not to be manipulated as numbers. This string of elements interspersed with a rotation operation represents the elements of the group and the main operation. We represent this group by g^{6}_{1}, where the superscript reminds us that the group is for six-node rings and the subscript is the lowest decimal equivalent of the bit string in this group.
The group g^{6}_{1} describes only one of the possible cyclic groups within the 6-node ring circuit. Since there are four stable oscillatory states in the 6-node circuit, there are four groups in total. The full set of all the groups is given as:
The above set of mappings shows cyclic permutations from rotation operations on the individual states s represented as decimal equivalents. The g^{6}_{1} and g^{6}_{5} groups are said to be of order 6. The g^{6}_{9} group is of third order and the g^{6}_{21} group is of second order. The similarities between group theory and conventional dynamics are now obvious. The two sixth-order groups are 6-cycles. The one third-order group is a three-cycle and the second-order group is a two-cycle.
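The rotation cycles described above can be generated directly. In this sketch a ring state is stored as an integer, one rotation is a one-position cyclic left shift (the direction is an assumption made to match the (0001) → (0010) → (0100) → (1000) example), and the orbit of a starting state is followed until it returns. The function names are illustrative.

```python
def rotate_left(state, n):
    """One rotation r: shift the n-bit pattern one position around the ring."""
    mask = (1 << n) - 1
    return ((state << 1) | (state >> (n - 1))) & mask


def orbit(state, n):
    """States visited under repeated rotation, starting from `state`."""
    visited = [state]
    nxt = rotate_left(state, n)
    while nxt != state:
        visited.append(nxt)
        nxt = rotate_left(nxt, n)
    return visited


if __name__ == "__main__":
    # Starting states 1, 5, 9 and 21 are the four oscillatory patterns of the 6-node ring.
    for start in (1, 5, 9, 21):
        cycle = orbit(start, 6)
        print(start, "order", len(cycle), [format(s, "06b") for s in cycle])
```

Starting from 1 or 5 the orbit has six elements, from 9 it has three, and from 21 it has two, matching the orders of the four groups.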
The rotation operator (applied once) for each group is different
As the number of rotations needed to return to the starting state decreases for a given group, the oscillation frequency increases, e.g., a two-cycle is faster than a 6-cycle. Similarly, as the number of rotations needed to return to the starting state decreases, the order of the group decreases and the symmetry increases. As we point out later, a symmetry phase transition occurs during signal input and ring coupling.
We can compare this group with conventional cyclic groups. The cyclic group C_{6} consists of the decimal numbers {0, 1, 2, 3, 4, 5} and the operation
ρ(a, b) = (a + b) mod 6,
where ρ is the operator that adds two elements a, b of the group and then applies the modulus operation. The identity element of C_{6} is 0, and the inverse of each element a is b = (6 - a) mod 6.
The C_{6} group table is shown in Table 5. The first row in the table lists the elements of the group. The first column lists the elements of the group, written in the same order as the elements in the first row. The actual arrangement of the elements in the first row and column is not important. The first row gives the element a and the first column gives the element b for the operator ρ. The entries in the table are generated by the operator, just as in a multiplication table.
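A group table of this kind can be generated in a few lines. The sketch below builds the table for addition modulo 6 and the inverse of each element; the function names `rho`, `cayley_table`, and `inverse` are illustrative choices, not notation from the paper beyond the operator ρ itself.

```python
def rho(a, b, n=6):
    """Group operation of C_n: add two elements, then reduce modulo n."""
    return (a + b) % n


def cayley_table(n=6):
    """Rows are indexed by b, columns by a; entry is rho(a, b)."""
    return [[rho(a, b, n) for a in range(n)] for b in range(n)]


def inverse(a, n=6):
    """Element b such that rho(a, b) is the identity 0."""
    return (n - a) % n


if __name__ == "__main__":
    for row in cayley_table():
        print(" ".join(str(entry) for entry in row))
```

Every row and every column of the printed table is a permutation of 0 to 5, as required of a group table.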
The index p of a cyclic group C_{ p } is given by
where k | p means k divides p; φ(k) is the totient function (as discussed above) and Z is the set of integers.
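The displayed relation is not reproduced in this text. One classical identity that uses exactly these symbols is Gauss's result that the totients of the divisors of p sum to p; whether this is the relation the authors intended is an assumption. The sketch below simply checks that identity numerically, with illustrative function names.

```python
from math import gcd


def totient(n):
    """Euler totient: count of integers k in 1..n with gcd(k, n) == 1."""
    return sum(1 for k in range(1, n + 1) if gcd(k, n) == 1)


def divisors(n):
    """All positive divisors of n in increasing order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def totient_divisor_sum(p):
    """Sum of totient(k) over all k that divide p."""
    return sum(totient(k) for k in divisors(p))
```

For p = 6 the divisors are 1, 2, 3 and 6, with totients 1, 1, 2 and 2, which sum to 6.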
The group table for the g^{6}_{1} group is given in Table 6. As in the C_{6} group table, the elements are written across the first row and first column. Recall the underline is to remind us that these are symbols, not numbers. We define the group operation ⊗ according to the following mapping:
This maps the CPG group to the first nonnegative integers in the cyclic group C_{6} .
By the defined mapping we have established an isomorphism between these two groups
The other isomorphisms that exist for the g^{6} set of groups are
There are four subgroups in g^{6}_{1}, and there are four subgroups in C_{6}: {(0), (0,3), (0,2,4), (0,1,2,3,4,5)}.
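The subgroup count for C_{6} can be confirmed by exhaustive search. The sketch below tests every subset of {0, ..., 5} for closure under addition modulo 6; for a finite group, a subset that contains the identity and is closed under the operation is a subgroup. The function names are illustrative.

```python
from itertools import combinations


def is_subgroup(subset, n=6):
    """Closure test: contains 0 and is closed under addition modulo n."""
    elements = set(subset)
    return 0 in elements and all((a + b) % n in elements
                                 for a in elements for b in elements)


def subgroups(n=6):
    """All subgroups of C_n, found by checking every subset of 0..n-1."""
    found = []
    for size in range(1, n + 1):
        for subset in combinations(range(n), size):
            if is_subgroup(subset, n):
                found.append(subset)
    return found
```

For n = 6 the search returns the four subsets listed above, one for each divisor of 6.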
In order to use these ideas with concepts such as signal input (known as sensor fusion in the control community) and network (ring) coupling to make larger networks, we need to define operators, Φ _{r} , that transform one group into another group. Let the subscript on the operator represent the number of rotations when the signal is injected. Then we can write all of the allowed operations on the groups and their results.
To consider sensor input and/or coupling to two or more of these dynamic rings we consider the example . This equation says that when the CPG circuit has a single 1 cycling through the ring and a pulse of duration equal to the time constant of the nodes is injected at rotation 2 (subscript to operator), this will be the equivalent of initializing the ring circuit with (000101) or decimal 5. Hence, the circuit is transformed to the g^{6}_{5} group. Explicitly this would be written as (000100) + (000001) → (000101).
As another example consider . This relationship says that when a pulse of two time constants is injected at rotation positions 2 and 3 into a 6-node circuit with a signal already at position 0 (always the assumed initial state), the circuit pulse pattern will transform to . Explicitly this would be written as (000001) + (000110) → (0001001). The other equations are:
These equations are interpreted as follows. From Figure 8 we see a ring in state (000001). If we inject a pulse of short duration (i.e., less than the response time of the logic gates with the associated components) into that ring at position 0 while in that configuration, it will have no effect, . If injected into position 1 while the ring is in this (000001) state it will force the system to transition to state (000101), following the operation . If we inject a short pulse into the network at position 2 it will also transition to (000101) . On the other hand, a short pulse injected at position 3 will cause the ring circuit to exhibit the stable state (001001), according to the operation . In this case, the subscript on the operator indicates the node distance from node 0 in state 1 (000001), while the superscript and subscript on the symbol remind us that the ring is a 6-node ring and it is in state (001001). These transition rules apply for either injected pulses, such as from external sensors, or for internal pulses, such as from rings coupled to make larger networks. The number of states the ring can sustain is still dictated by the ring size as given by the above 2-ary necklace function.
The significance of this approach is that it describes a global dynamics and entrainment, i.e., a large-scale molecular network dynamics and environmental response, via the dynamics of local internal networks in the interactome. Our concern here is not the symmetry of the interactome, but rather the symmetry of the local and global dynamics. As an example, Figure 9 shows the attractor diagram for the circuit shown in Figure 8. This is a schematic of the dynamics exhibited by the "interactome," the simple feedback circuit of Figure 8. From a group automorphism perspective we can factor the graph in Table 5 to C_{2} × S_{2}, far different from the C_{6} network that gives rise to the dynamics shown in Figure 9. This provides an entirely different description of the interactome in terms of the dynamics, rather than in terms of the molecular connectivity. Exploration of this approach to systems biology is an open research issue.
7. Cellular Dynamics Models via Graph Morphisms
Our interpretation of the protein-protein interaction (PPI) network, shown in Figure 5, needs to be considered carefully. First, it represents biophysical interactions between pairs of proteins as observed in yeast two-hybrid experiments [59, 60]. These biophysical interactions do not necessarily occur in the actual organism. Second, the PPI networks for most organisms represent only about 10% of the actual possible protein-protein interactions. Third, it is a static network, or time invariant, which is an almost meaningless concept for life forms. We also know that to include the catalytic set for self-replication, the full interactome should include small molecules, large biopolymers, DNA, RNA, oligobiopolymers, etc.
Given these caveats, we will now proceed to parse the PPI in time. We can do this by conducting a relational join between transcription data and PPI data. We start with the expression data as a function of time. Several expression data sets exist; here we mention only the more recent ones by Pramila et al. [61] and Granovskaia et al. [62]. Both of these teams conducted experiments collecting transcription microarray data at five-minute intervals for the yeast S. cerevisiae. The Pramila data (accession number GSE4987) was from cDNA-spotted arrays, and therefore consists of data in the range (-2, 2), where zero represents not expressed, below zero represents down-regulated, and above zero represents up-regulated. The Granovskaia data consist of Affymetrix RNA data (PN 520055) and the numerical data are in the range (-3, 2), where data above zero are considered expressed and those below zero not expressed, while the discrimination between up-regulated and down-regulated is not provided.
The Granovskaia data set is described in their technical paper [62]. They distribute, via links, both the full set of expression data for 6378 gene IDs and a parsed set consisting of 588 genes associated with the cell cycle which clearly show oscillations. Here we ask: What are the large-scale protein-protein interactome changes as a function of time during the cell cycle?
Before we address this question, we note that both the Pramila et al. and the Granovskaia et al. papers show heat maps for the several hundred major genes expressed during the cell cycle. These heat maps show periodic structure and represent periodicity in the transcriptome. Lastly, a paper by de Lichtenberg et al. [63] examines the yeast cell cycle with particular emphasis on parsing the proteome into molecular machines during the cell cycle. Our method differs, as our emphasis is on graph morphisms.
The state of the cell at any given point in time is given by the function x(t). As pointed out above, the transcriptome and the protein-protein interactome (see Figure 5) can be combined to give us a view of the proteins and their connectivity as a function of time, based on the fact that the transcriptome codes for the proteome. Figure 10 shows this mapping relation.
This figure shows for the first time some of the interactome details as a function of time. Each graph represents the changes in the interactome, as represented by the transcriptome, during the indicated time period. Each time period is a 5-minute segment. The red nodes represent those proteins whose expression has disappeared in the time period, and the blue nodes represent those proteins whose expression has appeared in that time period. In a later publication we will be analyzing these graphs in more detail, along with the graph statistical metrics and the automorphism group.
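The join between expression data and the PPI network can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a protein is taken to be present when its transcript level is above a threshold (zero by default, following the convention quoted for the Granovskaia data), an interaction is kept only when both partners are present, and the protein names and values in the usage note are invented toy data.

```python
def active_subgraph(ppi_edges, expression, threshold=0.0):
    """Keep the PPI edges whose two endpoint proteins are both expressed."""
    expressed = {gene for gene, level in expression.items() if level > threshold}
    return {(a, b) for a, b in ppi_edges if a in expressed and b in expressed}


def node_changes(edges_before, edges_after):
    """Proteins that appeared in, or disappeared from, the time-sliced network."""
    before = {node for edge in edges_before for node in edge}
    after = {node for edge in edges_after for node in edge}
    return after - before, before - after


if __name__ == "__main__":
    # Hypothetical toy data: four proteins, three interactions, two time points.
    ppi = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
    t0 = {"P1": 0.8, "P2": 0.5, "P3": -0.4, "P4": 0.9}
    t1 = {"P1": -0.2, "P2": 0.6, "P3": 0.7, "P4": 0.3}
    g0, g1 = active_subgraph(ppi, t0), active_subgraph(ppi, t1)
    appeared, disappeared = node_changes(g0, g1)
    print(sorted(g0), sorted(g1), sorted(appeared), sorted(disappeared))
```

In the toy example P3 and P4 enter the network between the two time points and P1 leaves it, which corresponds to the blue and red node colouring described for Figure 10.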
We close this section with a derivation of the matrix A_{ ij } mapping from time point to time point, essentially our graph morphism. This matrix can be found by inducing it from the transcriptome data. Recall the transcriptome data x(t) represents the state change from time point to time point. We can use a neural network to induce the matrix A_{ ij } (which is actually two matrices, A_{1} and A_{2}) as follows. The mapping is given by
where • represents the product of a matrix with a vector. Here we are using the hyperbolic tangent function, a well-behaved sigmoidal function often used in neural network mappings [64, 65]. While these two matrices can be found by the so-called delta rule [65], essentially a gradient descent algorithm [64], we will instead use an extended algorithm cited by Vapnik [66] among others [67]. The cost function for the error minimization is:
where ||A|| represents the norm of the sum of the two matrices and γ is a Lagrange multiplier called the regularization coefficient. The first term on the RHS is the least mean square of the difference between the target, T, and the learning machine response, R. This regularization technique effectively forces the values for the adjustable parameters in the nonlinear fit, the weight matrices, to very small numerical values, often near zero. Their magnitude is proportional to the regularization coefficient.
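The structure of such a cost function can be sketched in code. The displayed equations are not reproduced in this text, so everything specific below is an assumption: the mapping is taken to be R = A_2 • tanh(A_1 • x), and the penalty is taken to be γ times the sum of the squared entries of the two matrices. The function names are illustrative.

```python
import math


def matvec(matrix, vector):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def predict(a1, a2, x):
    """Assumed mapping: R = A2 . tanh(A1 . x)."""
    hidden = [math.tanh(h) for h in matvec(a1, x)]
    return matvec(a2, hidden)


def squared_norm(matrix):
    """Sum of squared entries of a matrix."""
    return sum(value * value for row in matrix for value in row)


def regularized_cost(a1, a2, inputs, targets, gamma):
    """Mean squared error between targets T and responses R, plus a weight penalty."""
    squared_error, n_terms = 0.0, 0
    for x, target in zip(inputs, targets):
        response = predict(a1, a2, x)
        squared_error += sum((t - r) ** 2 for t, r in zip(target, response))
        n_terms += len(target)
    penalty = gamma * (squared_norm(a1) + squared_norm(a2))
    return squared_error / n_terms + penalty
```

In this form a larger γ makes large weights more costly, so a minimizer trades some fitting error for smaller matrix entries.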
An intuitive argument for this regularization may be found in the analogy of fitting 40 data points to a 6000th-order polynomial in 2-space. With 6000 adjustable parameters and using a conventional polynomial-fitting algorithm, the plot of the function with the fitted data points would show wild oscillations in the function, with every data point perfectly intercepted by the function. If we fit the same 40 data points to a third-order polynomial, we would find that many of the points were not intercepted by the curve, and there would be an error associated with the fitting. But comparing interpolation on this third-order polynomial and the 6000th-order polynomial, we find that our interpolated errors are much lower and the interpolation is more reliable. Now if we again fit our 40 data points to a 6000th-order polynomial, but we also force the magnitude of any of the coefficients to be very small, the net effect will resemble a low-order polynomial. There will be an error associated with the fitting, and a much lower error associated with the interpolation. The regularization algorithm does much the same thing; it forces the magnitude of the weight terms to be small, even very small [66, 67].
Using the Granovskaia et al. [62] dataset and the 587 genes they identified as relevant to cell cycle, we first made the naïve assumption that the state of the cell, as represented by the transcription data at time t, would be the same as at time t + 1; with this assumption the mean square error was 0.26. We next carried out the neural network analysis with yeast cell cycle data. We used leave-one-out cross validation to produce the final results. The average mean square error (MSE) from all outputs (all genes) across all time points (41 time points at 5-minute intervals) was 0.0459 (± 0.0835). Figure 11 shows the MSE per gene and the MSE per time interval for prediction from the learning machine. Table 7 lists all the cell cycle genes with an error greater than 2 times the standard deviation, 0.1670.
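The naïve baseline, in which the state at the next time point is predicted to equal the current state, is straightforward to compute. The sketch below shows the calculation on a list of per-time-point expression vectors; the function name and the toy numbers in the usage note are illustrative and are not the yeast data.

```python
def persistence_mse(series):
    """MSE of the naive prediction x(t + 1) = x(t).

    series -- list of expression vectors, one per time point, all of equal length
    """
    squared_error, count = 0.0, 0
    for current, following in zip(series, series[1:]):
        squared_error += sum((b - a) ** 2 for a, b in zip(current, following))
        count += len(current)
    return squared_error / count
```

For a toy series of three time points and two genes, [[0, 0], [1, 1], [1, 3]], the squared differences total 6 over 4 predictions, giving a baseline MSE of 1.5. Any learned mapping is useful only to the extent that it beats this kind of baseline on held-out time points.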
Figure 12 shows a heat map plot of expression values for the cell cycle genes as a function of time. The large errors shown in Figure 11 for some of the genes can be explained as expression noise. As shown in the heat map, as time increases the phase in the expression begins to disperse. This is likely due to the phase divergence in the growth of the population and transcription noise.
It should be possible to build a more accurate learning machine for the cell cycle by using a multi-output support vector regression machine [66] or a kernel adatron [68]. In either case the sensitivity analysis is directly computable from the weight matrices for the learning machine. For example, for a multi-output neural network the partial derivative of an output with respect to an input is given by:
With knowledge of the sensitivity analysis we can plot a Pareto chart showing the importance of each of the individual inputs with respect to the output. One could imagine also conducting multi-way digital knockout experiments with this system and comparing it with known experimental results.
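A sensitivity calculation of this kind can be sketched with the chain rule. The displayed derivative is not reproduced in this text, so the network form is an assumption: outputs are taken to be y = A_2 • tanh(A_1 • x), for which the partial derivative of output k with respect to input i is the sum over hidden units j of A_2[k][j] (1 - tanh^2(h_j)) A_1[j][i], where h = A_1 • x. The function names are illustrative.

```python
import math


def matvec(matrix, vector):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def predict(a1, a2, x):
    """Assumed network: y = A2 . tanh(A1 . x)."""
    return matvec(a2, [math.tanh(h) for h in matvec(a1, x)])


def sensitivity(a1, a2, x):
    """Jacobian J[k][i] = d y_k / d x_i of the assumed network at input x."""
    pre_activation = matvec(a1, x)
    slope = [1.0 - math.tanh(h) ** 2 for h in pre_activation]
    hidden = range(len(pre_activation))
    return [[sum(a2[k][j] * slope[j] * a1[j][i] for j in hidden)
             for i in range(len(x))]
            for k in range(len(a2))]


def rank_inputs(jacobian, output_index):
    """Input indices ordered by decreasing absolute sensitivity for one output."""
    row = jacobian[output_index]
    return sorted(range(len(row)), key=lambda i: abs(row[i]), reverse=True)
```

The ordering returned by `rank_inputs` is the information a Pareto chart would display for a single output; a digital knockout could be approximated in the same setting by clamping one input and comparing the predictions.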
Conclusions
In this review we have touched on a few mathematical ideas that may expand our understanding of the boundary between living and nonliving systems. We recognize that there are other important works, including category theory [2, 69], genetic networks [70], complexity theory and self-organization [20, 69–71], autopoiesis [72], Turing machines and information theory [73], and many others. It would take a full-length book to review the many subjects that already come into play in discussing the boundaries between living and nonliving. Here we focused mostly on group theory and abstract algebra applied to molecular systems biology. Throughout this paper we have briefly described possible open problems. Here we collect them with respect to the subsections of the paper.
In the section on the genetic code we proposed that it may be possible to use perturbation theory to explore the adjacent possibilities in the 64-dimensional space-time manifold of the evolving genome. One could start by using phylogenetic mappings as historical data on this manifold and compute distances in this space. The statistics of these distances may then be fed back via the perturbation theory to study the trajectory. Of course, we recognize that the existing state-of-the-art bioinformatics makes this proposal mostly unfeasible at this time. But crude outlines of the technique could be developed.
With regards to algebraic graph theory, there are several minor open problems we discussed. Here we reiterate only a few. First, it may be possible to map the automorphism group to disease state through an isomorphism with the phase space. Second, it may be possible to use some of the graph deletion concepts G \(E,V) to evaluate existing protein-protein interactomes. One would start with a known complete graph, say a manufacturing plant, remove edges (vertices) and compute conventional network graph statistics. Third, we pose the following questions. What is the minimal embedding space for something like a protein-protein network? Are there patterns in that space? What biological significance is there to these observed patterns?
In the section on network dynamics and groupoid formalism we suggested that the network graph might not be the main focus for understanding the phenotype but rather the phase space of the network dynamics. We showed a simple case of a C_{6} network and its phase space network. We envision that the molecular network of a cell is actually a complex network of hypercycles and feedback circuits that could be better represented in a higher-dimensional space. Targeting one protein may not have much effect on the overall phenotype or the overall phase space dynamics. For example, a planar array of 6-cycles would give rise to frustration points, not unlike a spin glass [74]. The overall dynamics would then give rise to not only emergent phase space dynamics but also emergent patterns in the phase space that would not be computable from the molecular reaction network graph [75–79]. Targeting one or two proteins in this network, based on molecular interaction maps, may prove to be futile in many cases. We conjecture that targeting nodes in the molecular network that play key roles in the phase space, as revealed by analysis of the automorphism decomposition, might be a better way to carry out drug discovery and treatment of cancer [80].
References
Schrödinger E: What Is Life?: The Physical Aspect of the Living Cell ; with, Mind and Matter ; & Autobiographical Sketches. 1992, Cambridge: Cambridge University Press
Rosen R: Life Itself: A Comprehensive Inquiry into the Nature, Origin, and Fabrication of Life. 1991, New York: Columbia University Press
Goh KI, Cusick ME, Valle D, Childs B, Vidal M, Barabási AL: The human disease network. Proc Natl Acad Sci USA. 2007, 104: 8685-8690. 10.1073/pnas.0701361104.
Annila A: Space, time and machines. 2009, arXiv:0910.2629v1 [physics.gen-ph]
Penrose R: The Road to Reality: A Complete Guide to the Laws of the Universe. 2004, London: Jonathan Cape
Weinberg S: The Quantum Theory of Fields. 1995, Cambridge: Cambridge University Press, 1:
Kaku M: Quantum Field Theory: A Modern Introduction. 1993, New York: Oxford University Press
Rosen J: Symmetry Rules: How Science and Nature Are Founded on Symmetry. 2008, New York: Springer, 1
Brooks R: The relationship between matter and life. Nature. 2001, 409: 409-411. 10.1038/35053196.
Woese CR: A New Biology for a New Century. Microbiol Mol Biol Rev. 2004, 68: 173-186. 10.1128/MMBR.68.2.173-186.2004.
Cohen JE: Mathematics Is Biology's Next Microscope, Only Better; Biology Is Mathematics' Next Physics, Only Better. PLoS Biol. 2004, 2: e439. 10.1371/journal.pbio.0020439.
Goodman FM: Algebra: Abstract and Concrete. 2006, Iowa City, IA: SemiSimple Press, 2.5
Woolfson A: Life Without Genes. 2000, London: HarperCollins
Crick FHC: Codon-anticodon pairing: The wobble hypothesis. J Mol Biol. 1966, 19: 548-555. 10.1016/S0022-2836(66)80022-0.
Danckwerts HJ, Neubert D: Symmetries of genetic code-doublets. J Mol Evol. 1975, 5: 327-332. 10.1007/BF01732219.
Bertman MO, Jungck JR: Group graph of the genetic code. J Hered. 1979, 70: 379-384.
Findley AM, Findley GL, McGlynn SP: Genetic coding: approaches to theory construction. J Theor Biol. 1982, 97: 299-318. 10.1016/0022-5193(82)90106-0.
Findley AM, McGlynn SP, Findley GL: Applications of differential geometry to molecular genetics. J Biol Phys. 1985, 13: 87-94. 10.1007/BF01878385.
Kauffman SA: At Home in the Universe: The Search for Laws of Self-Organization and Complexity. 1995, New York: Oxford University Press
Kauffman SA: Investigations. 2000, Oxford: Oxford University Press
Kauffman S: Molecular autonomous agents. Philosophical Transactions of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences. 2003, 361: 1089-1099. 10.1098/rsta.2003.1186.
Hornos JEM, Hornos YMM: Algebraic model for the evolution of the genetic code. Phys Rev Lett. 1993, 71: 4401. 10.1103/PhysRevLett.71.4401.
Forger M, Hornos YMM, Hornos JEM: Global aspects in the algebraic approach to the genetic code. Phys Rev E. 1997, 56: 7078. 10.1103/PhysRevE.56.7078.
Stone M, Goldbart PM: Mathematics for Physics: A Guided Tour for Graduate Students. 2009, Cambridge, UK: Cambridge University Press
Gallian JA: Contemporary Abstract Algebra. 2006, Boston, MA: Houghton Mifflin, 6
Rao PN, Johnson RT: Mammalian Cell Fusion: Studies on the Regulation of DNA Synthesis and Mitosis. Nature. 1970, 225: 159-164. 10.1038/225159a0.
Johnson RT, Rao PN: Mammalian Cell Fusion: Induction of Premature Chromosome Condensation in Interphase Nuclei. Nature. 1970, 226: 717-722. 10.1038/226717a0.
Hanna JH, Saha K, Jaenisch R: Pluripotency and Cellular Reprogramming: Facts, Hypotheses, Unresolved Issues. Cell. 2010, 143: 508-525. 10.1016/j.cell.2010.10.008.
Kim J, Eberwine J: RNA: state memory and mediator of cellular phenotype. Trends Cell Biol. 2010, 20: 311-318. 10.1016/j.tcb.2010.03.003.
Junker BH, Schreiber F: Analysis of Biological Networks. 2008, Hoboken, N.J: Wiley-Interscience
Mason O, Verwoerd M: Graph theory and networks in Biology. IET Syst Biol. 2007, 1: 89-119. 10.1049/iet-syb:20060038.
Newman MEJ, Barabási AL, Watts DJ: The Structure and Dynamics of Networks. 2006, Princeton: Princeton University Press
Albert R, Barabási AL: Statistical mechanics of complex networks. Rev Mod Phys. 2002, 74: 47-97. 10.1103/RevModPhys.74.47.
Godsil CD, Royle G: Algebraic Graph Theory. 2001, New York: Springer
Chung FRK: Spectral Graph Theory. 1997, Providence, R.I: Published for the Conference Board of the mathematical sciences by the American Mathematical Society
MacArthur BD, Anderson JW: Symmetry and Self-Organization in Complex Systems. 2006, arXiv:cond-mat/0609274v1 [cond-mat.dis-nn]
Xiao Y, MacArthur BD, Wang H, Xiong M, Wang W: Network quotients: Structural skeletons of complex systems. Phys Rev E. 2008, 78: 046102
MacArthur BD, Sánchez-García RJ, Anderson JW: Symmetry in complex networks. Discrete Appl Math. 2008, 156: 3525-3531. 10.1016/j.dam.2008.04.008.
McKay BD: Practical Graph Isomorphism. Congressus Numerantium. 1981, 30: 45-87.
Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A: Reverse engineering of regulatory networks in human B cells. Nat Genet. 2005, 37: 382-390. 10.1038/ng1532.
Golubitsky M, Stewart I: Nonlinear dynamics of networks: the groupoid formalism. Bulletin of the American Mathematical Society. 2006, 43: 305-364. 10.1090/S0273-0979-06-01108-6.
Jain BJ, Wysotzki F: Automorphism Partitioning with Neural Networks. Neural Processing Letters. 2003, 17: 205-215. 10.1023/A:1023657727387.
Alon U: An Introduction to Systems Biology: Design Principles of Biological Circuits. 2007, Boca Raton, FL: Chapman & Hall/CRC
Golubitsky M, Stewart I: The Symmetry Perspective: From Equilibrium to Chaos in Phase Space and Physical Space. 2002, Boston, MA: Birkhäuser
Brown R: From Groups to Groupoids: a Brief Survey. Bulletin of the London Mathematical Society. 1987, 19: 113-134. 10.1112/blms/19.2.113.
Higgins PJ: Notes on Categories and Groupoids. 1971, London: Van Nostrand Reinhold Co
Dicks W, Ventura E: The group fixed by a family of injective endomorphisms of a free group. 1996, Providence, R.I.: American Mathematical Society
Laumon G, Moret-Bailly L: Champs algebriques. 1999, Berlin: Springer
Karp RL: Quantum Symmetries and Exceptional Collections. Communications in Mathematical Physics. 2010, 301: 1-21.
Hasslacher B, Tilden MW: Living machines. Robotics and Autonomous Systems. 1995, 15: 143-169. 10.1016/0921-8890(95)00019-C.
Rietman EA, Tilden MW, Askenazi M: Analog computation with rings of quasiperiodic oscillators: the microdynamics of cognition in living machines. Robotics and Autonomous Systems. 2003, 45: 249-263. 10.1016/j.robot.2003.08.002.
Rietman EA, Hillis RW: Neural Computation with Rings of Quasiperiodic Oscillators. 2006, arXiv:cs/0611136v1 [cs.RO]
Pikovsky A, Rosenblum M, Kurths J: Synchronization: A Universal Concept in Nonlinear Sciences. 2001, Cambridge: Cambridge University Press
Vermeirssen V, Barrasa MI, Hidalgo CA, Babon JAB, Sequerra R, Doucette-Stamm L, Barabási AL, Walhout AJM: Transcription factor modularity in a gene-centered C. elegans core neuronal protein-DNA interaction network. Genome Res. 2007, 17: 1061-1071. 10.1101/gr.6148107.
Arda HE, Taubert S, MacNeil LT, Conine CC, Tsuda B, Van Gilst M, Sequerra R, Doucette-Stamm L, Yamamoto KR, Walhout AJM: Functional modularity of nuclear hormone receptors in a Caenorhabditis elegans metabolic gene regulatory network. Mol Syst Biol. 2010, 6: 367
Lee TI, Rinaldi NJ, Robert F, Odom DT, Bar-Joseph Z, Gerber GK, Hannett NM, Harbison CT, Thompson CM, Simon I, Zeitlinger J, Jennings EG, Murray HL, Gordon DB, Ren B, Wyrick JJ, Tagne JB, Volkert TL, Fraenkel E, Gifford DK, Young RA: Transcriptional Regulatory Networks in Saccharomyces cerevisiae. Science. 2002, 298: 799-804. 10.1126/science.1075090.
Xu Z, Wei W, Gagneur J, Perocchi F, Clauder-Munster S, Camblong J, Guffanti E, Stutz F, Huber W, Steinmetz LM: Bidirectional promoters generate pervasive transcription in yeast. Nature. 2009, 457: 1033-1037. 10.1038/nature07728.
Wolfram MathWorld: The Web's Most Extensive Mathematics Resource. http://mathworld.wolfram.com/
Bartel PL, Fields S: The Yeast Two-Hybrid System. 1997, New York: Oxford University Press
Dreze M, Monachello D, Lurin C, Cusick ME, Hill DE, Vidal M, Braun P: High-Quality Binary Interactome Mapping. Methods in Enzymology. 2010, Academic Press, 470: 281-315.
Pramila T, Wu W, Miles S, Noble WS, Breeden LL: The Forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. Genes Dev. 2006, 20: 2266-2278. 10.1101/gad.1450606.
Granovskaia MV, Jensen LJ, Ritchie ME, Toedling J, Ning Y, Bork P, Huber W, Steinmetz LM: High-resolution transcription atlas of the mitotic cell cycle in budding yeast. Genome Biol. 2010, 11: R24. 10.1186/gb-2010-11-3-r24.
de Lichtenberg U, Jensen LJ, Brunak S, Bork P: Dynamic Complex Formation During the Yeast Cell Cycle. Science. 2005, 307: 724-727. 10.1126/science.1105103.
Gershenfeld NA: The Nature of Mathematical Modeling. 1999, Cambridge: Cambridge University Press
Rumelhart DE, McClelland JL, University of California, San Diego: Parallel Distributed Processing: Explorations in the Microstructure of Cognition. 1986, Cambridge, Mass: MIT Press, 1:
Vapnik VN: Statistical Learning Theory. 1998, New York: Wiley
Vovk V, Gammerman A, Shafer G: Algorithmic Learning in a Random World. 2005, New York: Springer
Shawe-Taylor J, Cristianini N: Kernel Methods for Pattern Analysis. 2004, Cambridge, UK: Cambridge University Press
Kampis G: Self-Modifying Systems in Biology and Cognitive Science: A New Framework for Dynamics, Information, and Complexity. 1991, Oxford: Pergamon Press, 1
Kauffman SA: The Origins of Order: Self-Organization and Selection in Evolution. 1993, New York: Oxford University Press
Feistel R, Ebeling W: Evolution of Complex Systems: Self-Organization, Entropy, and Development. 1989, Dordrecht, Holland: Kluwer Academic Publishers
Maturana HR, Varela FJ: Autopoiesis and Cognition: The Realization of the Living. 1980, Boston, MA: Kluwer Academic Publishers
Chaitin GJ: Metabiology. 2010, http://www.umcs.maine.edu/~chaitin/
Fischer KH, Hertz J: Spin Glasses. 1991, Cambridge: Cambridge University Press
Anderson PW: More Is Different. Science. 1972, 177: 393-396. 10.1126/science.177.4047.393.
Wolfram S: Undecidability and intractability in theoretical physics. Phys Rev Lett. 1985, 54: 735-738. 10.1103/PhysRevLett.54.735.
Moore C: Unpredictability and undecidability in dynamical systems. Phys Rev Lett. 1990, 64: 2354-2357. 10.1103/PhysRevLett.64.2354.
Laughlin RB, Pines D: The Theory of Everything. Proc Natl Acad Sci USA. 2000, 97: 28-31. 10.1073/pnas.97.1.28.
Gu M, Weedbrook C, Perales Á, Nielsen MA: More really is different. Physica D. 2009, 238: 835-839. 10.1016/j.physd.2008.12.016.
Huang S, Eichler G, Bar-Yam Y, Ingber DE: Cell Fates as High-Dimensional Attractor States of a Complex Gene Regulatory Network. Phys Rev Lett. 2005, 94: 128701
Yu H, Braun P, Yıldırım MA, Lemmens I, Venkatesan K, Sahalie J, Hirozane-Kishikawa T, Gebreab F, Li N, Simonis N, Hao T, Rual JF, Dricot A, Vazquez A, Murray RR, Simon C, Tardivo L, Tam S, Svrzikapa N, Fan C, de Smet AS, Motyl A, Hudson ME, Park J, Xin X, Cusick ME, Moore T, Boone C, Snyder M, Roth FP, Barabási AL, Tavernier J, Hill DE, Vidal M: High-Quality Binary Protein Interaction Map of the Yeast Interactome Network. Science. 2008, 322: 104-110. 10.1126/science.1158684.
Acknowledgements
We thank Arthur B. Pardee for discussions involving the multinucleated cell group table. We thank David Hill and Michael Cusick for helpful discussions. We thank Patrick Rourke for improving the readability at an early stage in the manuscript. EAR thanks Marc Vidal for the free time to pursue this research. We thank Philip Winter for his assistance.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors' contributions
EAR did the research, conceived of the idea of a review paper on the uses of group theory in systems biology, provided most of the material presented in this paper and wrote the first draft. RLK corrected much of the group theory material and made extensive edits of the manuscript. JAT assisted in presenting and integrating the material into the manuscript and coordinated the project. All authors read and approved the final manuscript.
Rights and permissions
Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Rietman, E.A., Karp, R.L. & Tuszynski, J.A. Review and application of group theory to molecular systems biology. Theor Biol Med Model 8, 21 (2011). https://doi.org/10.1186/1742-4682-8-21
DOI: https://doi.org/10.1186/1742-4682-8-21
Keywords
 Automorphism Group
 Genetic Code
 Incidence Matrix
 Network Graph
 Klein Group