A symmetry model for genetic coding via a wallpaper group composed of the traditional four bases and an imaginary base E: Towards category theory-like systematization of molecular/genetic biology

Sawamura, Jitsuki; Morishita, Shigeru; Ishigooka, Jun

doi:10.1186/1742-4682-11-18

Research
Open access
Published: 07 May 2014

Jitsuki Sawamura¹,
Shigeru Morishita² &
Jun Ishigooka¹

Theoretical Biology and Medical Modelling volume 11, Article number: 18 (2014)

Abstract

Background

Previously, we suggested prototypal models that describe some clinical states based on group postulates. Here, we demonstrate a group/category theory-like model for molecular/genetic biology as an alternative application of our previous model. Specifically, we focus on deoxyribonucleic acid (DNA) base sequences.

Results

We construct a wallpaper pattern based on a five-letter cruciform motif with letters C, A, T, G, and E. Whereas the first four letters represent the standard DNA bases, the fifth is introduced for ease in formulating group operations that reproduce insertions and deletions of DNA base sequences. A basic group Z₅ = {r, u, d, l, n} of operations is defined for the wallpaper pattern, with which a sequence of points can be generated corresponding to changes of a base in a DNA sequence by following the orbit of a point of the pattern under operations in group Z₅. Other manipulations of DNA sequence can be treated using a vector-like notation ‘D_j’ corresponding to a DNA sequence but based on the five-letter base set; also, ‘D_j’s are expressed graphically. Insertions and deletions of a series of letters ‘E’ are admitted to assist in describing DNA recombination. Likewise, a vector-like notation R_j can be constructed for sequences of ribonucleic acid (RNA). The wallpaper group B = {Z₅^×∞, ●} (an ∞-fold Cartesian product of Z₅) acts on D_j (or R_j) yielding changes to D_j (or R_j) denoted by ‘D_j◦B_(j→k) = D_k’ (or ‘R_j◦B_(j→k) = R_k’). Based on the operations of this group, two types of groups—a modulo 5 linear group and a rotational group over the Gaussian plane, acting on the five bases—are linked as parts of the wallpaper group for broader applications. As a result, changes, insertions/deletions and DNA (RNA) recombination (partial/total conversion) are described. As an exploratory study, a notation for the canonical “central dogma” via a category theory-like way is presented for future developments.

Conclusions

Despite the large incompleteness of our methodology, there is fertile ground to consider a symmetry model for genetic coding based on our specific wallpaper group. A more integrated formulation containing “central dogma” for future molecular/genetic biology remains to be explored.

Background

Group theory is the cornerstone in classifying and studying abstract concepts involving symmetry [1, 2]. In general, when group theory is used in various fields of natural sciences, it plays an important role in describing geometrical or dynamical symmetries of phenomena under consideration; examples include mathematics [3, 4], physics [5–8], chemistry [9], molecular/genetic biology [10–22], and anthropology [23]. Moreover, much fertile ground still exists where group theory can display its versatility from a multitude of viewpoints. To our knowledge, one such candidate is molecular/genetic biology where group theory has already provided great contributions [10–22].

Deoxyribonucleic acid (DNA) is a nucleic acid containing genetic instructions coded in ordered sequences of four bases located in genes that determine specific genetic characteristics of an organism. In the canonical Watson-Crick DNA base pairing, adenine (A) forms a base pair with thymine (T) and guanine (G) forms a base pair with cytosine (C) [24–26]. Similarly, ribonucleic acid (RNA), which has various biological roles, is a molecule that has a much shorter chain of nucleotides. The sequence of DNA consisting of bases ‘A, C, T and G’ is transcribed into RNA, composed of bases ‘A, C, U and G’; the sets differ in that ‘U (uracil)’ replaces ‘T (thymine)’.

Over the latter half of the 20th century, the nature of the genetic code became fairly well established. As for the coding sequences of DNA into nucleotide units, one needs to build up more general, sophisticated, rationally functionalized systematics concerning DNA base sequences that will enable genes to be understood at the molecular biology level in more optimized form. Indeed, many approaches have been undertaken to describe gene characteristics from various viewpoints within the participating disciplines [24–42]. In particular, the concept of ‘symmetry’ for DNA sequences plays an important role in understanding their characteristics.

However, each of these approaches has its advantages and disadvantages in terms of utility and convenience in applications. To our knowledge, if we intend to incorporate a sequence of bases into another sequence and/or exclude certain bases from it, existing formalisms are of limited help, because base substitution and insertion/deletion are normally handled by separate kinds of operations. That means that multiple types of operations are necessary if features of DNA containing exceptional sequences are to be treated.

Previously, we suggested prototypal models that describe some clinical states based on group postulates [43]. In this article, we demonstrate a group/category theory-like model for molecular/genetic biology as an alternative application of our previous model. Specifically, focusing on DNA base sequences, we present a simple model where not only changes in sequences of DNA bases but also insertion, deletion, and recombination (partial/total conversion) of DNA bases are treatable within some simple rules via the combination of a set and a group defined over some specific wallpaper pattern. Moreover, a category theory-like formalism, where a description of the DNA bases and their transcription to RNA bases can be made, is attempted from which a category theory-like framework is constructed requiring as few and as simple rules as possible. As an example, by assimilating the canonical “central dogma” [26], we hope to provoke more interactivity among those interested branches of natural science, if possible. The methodology consists of eight parts, the content of which is built-up step-by-step as scope is enlarged to encompass the more advanced themes.

§1 A preliminary setting describing a wallpaper pattern used as a symmetry model for DNA sequences

First, we consider a certain wallpaper pattern that helps us to visualize the operations of the present model (see Figure 1) [2, 44–47]. There, the pattern comprises repetitions of a cruciform motif with each motif consisting of five letters E, C, A, T, and G with the latter four letters equally spaced at the points of a cross about a central E. The motif generates the pattern through a translation specified as a knight’s move in chess—two steps out and one step right. In this way, the grid-points in this regular wallpaper pattern can be obtained uniquely and be extended indefinitely. Note also that each horizontal line is generated by repetitions of the sequence E-C-A-T-G. Moreover, the line above is a displaced copy of the one below with letter A placed directly above letter E. This preserves the condition that any cruciform is composed of one each of the five letters.

The wallpaper pattern as an array of cruciforms can be constructed by stacking a unit cell (the 5 × 5 square enclosed in the dotted line in Figure 1) through horizontal and vertical translations [2, 44–47]. The positions of the bases of the cruciform motif are chosen to make it easier to determine the complementary base of each base; the practical applications are clarified later. We introduce the letter ‘E’ to indicate an ‘empty’ base which is treated in the same way as the other bases at least for display purposes. This five-base scheme is adopted to aid the notion of group composition in our model.

In addition, we focus on a point ‘P’ on the wallpaper pattern (i.e., the grid-point array in Figure 1), to compose a certain DNA base sequence. In accordance with this, we shall always adjoin a series of letters that are determined as a trajectory of the point ‘P’ (also called the ‘orbit of P’) over the wallpaper pattern. For instance, when we identify or recognize some changes of DNA bases with ‘P’ moving from ‘A → C → E’ over the wallpaper pattern, then this represents either a sequence of bases ‘ACE…’ or a series of changes to one base located at a specific position of a DNA sequence, in the manner ‘…A…’ → ‘…C…’ → ‘…E…’. The ‘orbit of P’ can thus describe a sequence of DNA bases, or a series of changes of one letter in the same place; in this article, we focus mainly on the latter case unless stated otherwise.

With these postulates, we consider the set C₅ = {C, A, T, G, E}. If the point ‘P’ moves onto an ‘E’, ‘E’ must be included and identified in the series of letters, as in ‘ACGET’, for example. This is interpreted as the series of DNA bases ‘ACGT’. Thus, ‘E’ depends on context; that is, ‘E’ can be inserted or removed from any series where we would like to include or eliminate ‘E’s so long as these are recognized/tracked in the entire process. When read from left to right, the place number of each letter in the series is subscripted, as in ‘A₁C₂G₃T₄’. After insertions/deletions, the place number is augmented/diminished depending on initial and final positions; hence following three insertions ‘A₁C₂G₃T₄’ → ‘A₁E₂C₃G₄E₅E₆T₇’; this means the point ‘P’ takes the place ‘E’ once between A₁ and C₃, and twice between G₄ and T₇ over the wallpaper pattern in Figure 1. More details are to be given later.

As a further refinement, the orbit of ‘P’ can be stated as a sequence of shift operations as follows. Let ‘r’ denote a move one step to the right, corresponding to, say, A → T, T → G or G → E. Similarly, we denote by ‘l’ a move one step to the left, as for C → E and E → G; by ‘u’ a move one step up; and by ‘d’ a move one step down. We include ‘n’ to designate ‘no move’ (remain at the same point). A sequence of ‘r’, ‘u’, ‘d’, ‘l’, and ‘n’ then provides a position-independent means to describe the orbit of ‘P’; any of these five operations can be applied to any of the five letters. We denote their operations on ‘P’ in the following way. If point ‘P’ moves from ‘E’ to ‘C’ (step to the right), we write ‘E◦r = C’, where ‘◦’ signifies applying ‘r’ to ‘E’ (see Figure 1). In a similar way, ‘E◦l = G’, ‘E◦u = A’, ‘E◦d = T’ and ‘E◦n = E’. Note though that each operator means a change of one base to another base within these five bases; the meaning of ‘=’ is not the degree of translation but equivalence to the resultant base from the wallpaper pattern.

To shorten multiple applications of the operations, we introduce ‘●’ to denote the composition of two operations, for example, ‘((E◦r)◦u) = E◦r●u’. From Figure 1, we find ‘E◦d = T’ yields the same change as ‘E◦r●u = T’. As other examples, ‘r●r●d = n’ results in ‘r●r = u’, and ‘d●d●l = n’ results in ‘d●d = r’, because from Figure 1, ‘E◦r●r = E◦u = A’, and ‘E◦d●d = E◦r = C’. All possible one-step changes between letters ‘C, A, T, G, E’ and operators ‘r, u, d, l, n’, and all possible compositions of operators for the wallpaper pattern of Figure 1 are presented in Figures 2 and 3, and Appendix A.
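The orbit of ‘P’ described above can be traced with a small sketch. It assumes, as the text states, that every horizontal line repeats E-C-A-T-G and that ‘A’ lies directly above ‘E’; the names `ROW`, `STEPS`, `letter_at` and `orbit` are hypothetical and are not part of the original model.

```python
# Illustrative model of the wallpaper pattern (names are hypothetical).
# Assumption: every row repeats E-C-A-T-G, and each row is shifted so that
# 'A' lies directly above 'E' (equivalently, 'T' lies directly below 'E').
ROW = "ECATG"

# Grid displacement (dx, dy) of each of the five operations; y grows upward.
STEPS = {"r": (1, 0), "l": (-1, 0), "u": (0, 1), "d": (0, -1), "n": (0, 0)}


def letter_at(x, y):
    """Letter printed at grid point (x, y) of the pattern."""
    return ROW[(x + 2 * y) % 5]


def orbit(ops, start=(0, 0)):
    """Letters visited by the point P, beginning at `start`, under the moves in `ops`."""
    x, y = start
    visited = [letter_at(x, y)]
    for op in ops:
        dx, dy = STEPS[op]
        x, y = x + dx, y + dy
        visited.append(letter_at(x, y))
    return "".join(visited)


if __name__ == "__main__":
    print(orbit("r"))    # E then C, i.e. E∘r = C
    print(orbit("ru"))   # ends on T, the same letter as E∘d
    print(orbit("rrd"))  # returns to E, i.e. r●r●d acts like n
```

Because the letter at a grid point depends only on (x + 2y) mod 5 in this sketch, any two paths with the same net value of that quantity end on the same letter, which is why compositions such as ‘r●u’ and ‘d’ agree.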

The binary compositions among the five operations ‘r, u, d, l and n’ can be shown to satisfy the Abelian group postulates (wallpaper group/plane symmetry group/plane crystallographic group [2–4, 44–47]). Indeed, let Z₅ = {r, u, d, l, n}, then {Z₅, ●} is the Abelian group of order five. That is, for all elements ∈ Z₅, we have:

1) Associativity: x●(y●z) = (x●y)●z, (x, y or z being arbitrary elements belonging to Z₅);
2) Identity: ‘n’ is an identity element such that x●n = n●x = x;
3) Inverse: a unique element x⁻¹ exists such that x●x⁻¹ = x⁻¹●x = n (x⁻¹ is called the inverse element of x);
4) Commutativity: x●y = y●x;
5) Closure: any combination of operations between x●y belongs to Z₅.

Therefore, Z₅ is an Abelian group [2–4, 44–47]. The inverses for each of the elements are:

r^{- 1} = l, l^{- 1} = r, u^{- 1} = d, d^{- 1} = u, n^{- 1} = n,

(1)

which can be used to complete the composition table—also known as the Cayley table of the group.
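As a rough check of the postulates and of the inverses in (1), the composition table can be generated by brute force. The sketch below assumes the encoding n = 0, r = 1, u = 2, d = 3, l = 4 (the number of one-step right shifts that gives the same letter from ‘E’), which agrees with the correspondence (12) introduced in §3; the function names are hypothetical.

```python
from itertools import product

# Assumed encoding (not stated at this point of the text): each operation is
# identified with the number of one-step right shifts that produces the same
# letter from 'E' on the row E-C-A-T-G, i.e. n=0, r=1, u=2, d=3, l=4.
CODE = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}
NAME = {value: key for key, value in CODE.items()}
OPS = "rudln"


def compose(x, y):
    """x●y under the assumed mod-5 encoding."""
    return NAME[(CODE[x] + CODE[y]) % 5]


def inverse(x):
    """The unique y with x●y = n."""
    return next(y for y in OPS if compose(x, y) == "n")


def is_abelian_group():
    """Brute-force check of the five postulates listed above."""
    assoc = all(compose(x, compose(y, z)) == compose(compose(x, y), z)
                for x, y, z in product(OPS, repeat=3))
    ident = all(compose(x, "n") == x == compose("n", x) for x in OPS)
    inv = all(compose(x, inverse(x)) == "n" for x in OPS)
    comm = all(compose(x, y) == compose(y, x) for x, y in product(OPS, repeat=2))
    closed = all(compose(x, y) in OPS for x, y in product(OPS, repeat=2))
    return assoc and ident and inv and comm and closed


if __name__ == "__main__":
    print("  " + " ".join(OPS))
    for x in OPS:
        print(x, " ".join(compose(x, y) for y in OPS))  # one row of the Cayley table
    print(is_abelian_group())
```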

We further stipulate that when we perform these operations, we always assume/identify the coding of the sequence of DNA bases in accordance with these operations, and vice versa. This is because the action of ‘r’ on ‘E’ yields base ‘C’, that of ‘u’ on ‘E’ yields base ‘A’, that of ‘d’ on ‘E’ yields base ‘T’, that of ‘l’ on ‘E’ yields base ‘G’, and naturally that of ‘n’ on ‘E’ results in the same ‘E’. For a more complex example, we might insert a certain series of ‘A’s in ‘ACCGT’ between the 3rd and 4th base. To begin, we write this manipulation as follows: ‘ACC( )GT’ is transformed into ‘ACC(E)GT’ by inserting ‘E’. Next, the operation ‘u’ applied to the new 4th component ‘E’ yields ‘E → A’ (‘ACCGT’ → ‘ACCAGT’); conversely, ‘d’ operating on the 4th component ‘A’ produces ‘A → E’ (‘ACCAGT’ → ‘ACCGT’). In this way, appropriate use of ‘E’s through an adequate combination of operators of Z₅ enables us to express inclusion and/or exclusion of any base between bases in a DNA sequence. To indicate this, we adopt a vector-like description with an infinite number of ‘E’s being assumed to be present at the end of any given base sequence. This means the point ‘P’ takes ‘E’s an infinite number of times over the wallpaper pattern (Figure 1); i.e.,

\begin{array}{l} D_{j} = [C |T| G |A| T |A| A |C| E |E| E |E| E |E| \dots] \\ = [C_{1} | T_{2} () G_{3} |A_{4}| T_{5} () A_{6} |A_{7}| C_{8} |E_{9}| E_{10} |E_{11}| E_{12} |E_{13}| E_{14} | \dots] \end{array}

(2a)

= [C_{1} | T_{2} (E_{3}) G_{4} |A_{5}| T_{6} (E_{7} | E_{8}) A_{9} |A_{10}| C_{11} |E_{12}| E_{13} |E_{14}| E_{15} |E_{16}| \dots] .

(2b)

(j: the number of the sequence, N: the number of single-stranded DNA bases of ‘D_j’s except for the infinite tail of ‘E’s; in the above case, N = 8)

In the last expression (2b), ‘E₃’ is inserted before the 3rd component ‘G₃’, and ‘E₇’ and ‘E₈’ are inserted before the 6th component ‘A₆’, at the positions marked by ‘( )’ in formula (2a). The place numbers of the original 3rd to 5th components are incremented by ‘1’, and those of the original 6th component onward by ‘1 + 2 = 3’. Likewise, we assume that the deletion of any ‘E’s that are already displayed in D_j is always permissible according to need with the place numbers being decreased by the necessary size.

Essentially, we regard the subscripted place number of a component of D_j, e.g., ‘3’ of ‘A₃’, as a convenient place mark to help in recognizing and counting the order of sequences. Place numbers remain fixed when performing operations within a series of operations during code recognition of bases. However, for an operation, another place number is always permissible in principle, from where indexing of a specific DNA base sequence starts.

Alternatively, we use the following notation to describe various cases:

1)
we denote by ‘{D_j}’ a sequence ‘D_j’ where specified ‘E’s other than the trailing series of ‘E’s are left implicit but the place number indexing is retained; i.e.,
$\{D_{j}\} = [C_{1} |T_{2}| G_{4} |A_{5}| T_{6} |A_{9}| A_{10} |C_{11}| E_{12} |E_{13}| E_{14} |E_{15}| E_{16} | \dots] .$
(3a)

Here, the explicitly indicated place numbers are the same as in (2b) and missing subscripted place numbers indicate omitted ‘E’s. Hence, (3a) without trailing ‘E’s and subscripts represents an ordinal/conventional DNA sequence.

2)
we denote by ‘<D_j>’ a sequence ‘D_j’ where specified explicit ‘E’s other than the trailing ‘E’s are deleted (changed into implicit ‘E’s) and the base sequence is re-indexed with sequential place numbers, i.e.,
$< D_{j} > = [C_{1} |T_{2}| G_{3} |A_{4}| T_{5} |A_{6}| A_{7} |C_{8}| E_{9} |E_{10}| E_{11} |E_{12}| E_{13} | \dots] .$
(3b)

Note that ‘E’s other than the trailing ‘E’s are not recognized as explicit components and hence are not indexed. Additional insertions/deletions of ‘E’s are permitted after deletions of ‘E’s; therefore, apart from the trailing ‘E’s, (3b) signifies an ordinal/conventional DNA sequence.

Although equivalent to ‘CTGATAAC’ as an actual DNA sequence expressions, related expressions {D_j} and <D_j> differ from each other; the former retains all information regarding inserted ‘E’s and place numbers whereas the latter does not.
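The difference between (2b), (3a) and (3b) can be sketched on the finite part of the sequence. The helper names `insert_E`, `braces` and `angles` are hypothetical, and the infinite tail of ‘E’s is simply omitted here as a simplifying assumption.

```python
def insert_E(bases, before):
    """Insert explicit 'E's into `bases`.

    `before` maps a 1-based place number of the original sequence to the
    number of 'E's placed in front of that component.
    """
    out = []
    for place, base in enumerate(bases, start=1):
        out.extend("E" * before.get(place, 0))
        out.append(base)
    return "".join(out)


def braces(seq):
    """{D_j}: leave the 'E's implicit but keep the place numbers of (2b)."""
    return [(base, place) for place, base in enumerate(seq, start=1) if base != "E"]


def angles(seq):
    """<D_j>: delete the explicit 'E's and re-index sequentially."""
    kept = [base for base in seq if base != "E"]
    return [(base, place) for place, base in enumerate(kept, start=1)]


if __name__ == "__main__":
    d_2b = insert_E("CTGATAAC", {3: 1, 6: 2})  # one 'E' before G3, two before A6
    print(d_2b)
    print(braces(d_2b))  # gaps in the place numbers mark the omitted 'E's
    print(angles(d_2b))  # place numbers run 1..8 again
```

Both results spell ‘CTGATAAC’ when the subscripts are dropped; only `braces` keeps enough information to restore the positions of the inserted ‘E’s.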

In an extension of the notation, a multiple sequence of deletions of ‘E’s (say t-times) can be written as a t-tuple of ‘< >’s denoted ‘<<<<D_j>>>> (t-tuple) = <D_j>_t’. The final expression is without explicit ‘E’s other than those trailing at the end, and thus formulates a genuine DNA sequence after the appearance of indels. (Short for insertion/deletion markers, the indels are strings of mutated base pairs.) Similarly for the operation { }, we have ‘{{{D_j}}} (t-tuple) = {D_j}_t’. The operations ‘{ }’ and ‘< >’ can be performed freely when necessary; if further indels occur at say ‘G₃’ and ‘A₇’ in

<D_j > = [C₁|T₂|G₃|A₄|T₅|A₆|A₇|C₈|E₉|E₁₀|E₁₁|E₁₂|E₁₃|…], then < D_j > changes into

<D_j1 > = [C₁|T₂(E₃)A₄|T₅|A₆(E₇)C₈|E₉|E₁₀|E₁₁|E₁₂|E₁₃|…], and subsequently into

<<D_j1>> = [C₁|T₂|A₃|T₄|A₅|C₆|E₇|E₈|E₉|E₁₀|E₁₁|…]. The sequence < D_j1 > contains implicit ‘E’s aside from the trailing ‘E’s, and can be written as

{<D_j1>} = [C₁|T₂|A₄|T₅|A₆|C₈|E₉|E₁₀|E₁₁|E₁₂|E₁₃|…]. Naturally, {<D_j1>} and < D_j1 > are equivalent, but < <D_j1> > and < D_j1 > differ. Moreover, as long as place numbers are recognized/traced precisely, combinations of manipulations ‘{ }’ and ‘< >’ are allowed; e.g., {<{{<D_j1>}}>}. Hence, with appropriate use, we could treat (read, interpret, describe, record) conventional sequences of DNA via ‘{D_j}’ or ‘<D_j>’. However, below we shall focus on simple sequences ‘D_j’.

Looking at the beginning of a base sequence as in the following:

D_{j} = [C_{1} |G_{2}| A_{3} |C_{4}| \dots |T_{i}| \dots |A_{N - 1}| T_{N} |E_{N + 1}| E_{N + 2} |E_{N + 3}| \dots],

(i: the i-th component of D_j, N: the number of bases of D_j)

a directionality for any D_j can be imposed;

D_{j} (5 \to 3) = [C_{1} |G_{2}| A_{3} |C_{4}| \dots |T_{i}| \dots |A_{N - 1}| T_{N} |E_{N + 1}| E_{N + 2} |E_{N + 3}| \dots],

and

D_{j} (3 \to 5) = [T_{1} |A_{2}| \dots |T_{N + 1 - i}| \dots |C_{N - 3}| A_{N - 2} |G_{N - 1}| C_{N} |E_{N + 1}| E_{N + 2} |E_{N + 3}| \dots] .

The notation, ‘(5 → 3)’ and ‘(3 → 5)’, is simply an additional label representing the two possible types of endings of single-stranded DNA. Nonetheless, when the number of bases is finite, two sequences can be equivalent, as for example

D_{j} (5 \to 3) = [C_{1} |G_{2}| A_{3} |C_{4}| T_{5} |A_{6}| T_{7}]

and

D_{j} (3 \to 5) = [T_{1} |A_{2}| T_{3} |C_{4}| A_{5} |G_{6}| C_{7}],

unless the prime endings <5’(five prime) → 3’(three prime) > or <3’ → 5’ > accompanies the sequence designation.

In accordance with these postulates, we now can define the set D = {D_j (j = 1,2,3,…)| D_j ∈ C₅ × C₅ × C₅ × … (N times, N ≤ ∞)} as the set of all possible sequences of recognized N-tuple single-stranded DNA bases. We can regard N to be a positive integer or infinity.

An analogous definition is clearly possible for the set R of RNA sequences; with ‘T’ substituted by ‘U’, operations of group Z₅, are similarly definable because all results obtained for DNA pertain to RNA under the base substitution. Thus, set R = {R_j (j = 1,2,3,…)| R_j ∈ C₅ × C₅ × C₅ × …(N times, N ≤ ∞)} is the set of all possible sequences of recognized N-tuple single-stranded RNA bases with C₅ = {C, A, U, G, E}.

§2 Group composition that yields changes in DNA bases via a Cartesian vector

Next, we can consider B = {B_m (m = 1,2,3,…) | B_m ∈ Z₅× Z₅× Z₅ × …(an N-fold product, N = ∞)} = {Z₅^×N, ●}, where elements of B act on any D_j. This means that D_j covers all possible sequences of the DNA bases, and this situation is the same for R_j of sequences of RNA bases.

Because B is a Cartesian product of the same Abelian group, it is also Abelian, where composition of any two elements of B is denoted by ‘●’ [4]. Details are shown in Appendix B and Figure 3. Accordingly, its formulation as a group B = {Z₅^×N, ●} is confirmed.

In a more general context, a Cartesian vector that is composed of the respective operators ‘b_(j→k)’ that effects the change D_j into D_k is definable in the following way:

\begin{array}{l} B_{(j \to k)} = [b_{(j \to k) 1} |b_{(j \to k) 2}| b_{(j \to k) 3} |\dots| b_{(j \to k) i} |\dots| b_{(j \to k) (N - 1)} |b_{(j \to k) N}| n_{N + 1} |n_{N + 2}| n_{N + 3} | \dots], \\ (N : the number of components) . \end{array}

Hence,

‘ D_{j} \circ B_{(j \to k)} = D_{k} ’ .

(4)

Clearly, for arbitrary ‘j’ and ‘k’, there exists a unique ‘m’ such that ‘B_(j→k) = B_m (m = 1,2,3,…)’; despite the difference in notation, the two are identical in practice.

Here, we present a simple example that consists of a multiple product of ‘B_(j→k)’s. Consider the scenario that a certain sequence of a single strand (or one side of a double-strand) of DNA transitions from D₁ to D₄, in stepwise fashion,

\begin{array}{l} D_{1} = [A_{1} |C_{2}| C_{3} |G_{4}| T_{5} |E_{6}| E_{7} | \dots] = [A_{1} |C_{2}| C_{3} () G_{4} |T_{5}| E_{6} |E_{7}| \dots], \\ D_{2} = [A_{1} |C_{2}| C_{3} (E_{4} | E_{5}) G_{6} |T_{7}| E_{8} |E_{9}| \dots], \\ D_{3} = [A_{1} |C_{2}| C_{3} (A_{4} | T_{5}) G_{6} |T_{7}| E_{8} |E_{9}| \dots], \\ D_{4} = [C_{1} |T_{2}| G_{3} (T_{4} | C_{5}) G_{6} |A_{7}| E_{8} |E_{9}| \dots] . \end{array}

We next consider the change ‘D₁ → D₂’. There exists an operator ‘B_(1→2) = [n₁|n₂|n₃(r₄|u₅)l₆|d₇|n₈|n₉|…]’ that is able to produce this change; specifically, it reproduces the insertion of two ‘E’s between ‘C₃’ and ‘G₄’ that yields ‘D₁ → D₂’. However, this sort of manipulation can be troublesome. Hence, in our model, insertion/deletion of ‘E’s is instead ascribed to the way the vector D_j is interpreted. This is preferable as it allows easier manipulations. Next, we construct the operator ‘B_(2→3)’ that maps ‘D₂ → D₃’ (the details are shown in Appendix C). With reference to Figures 1, 2 and 3, we find

B_{(2 \to 3)} = [n_{1} |n_{2}| n_{3} (u_{4} | d_{5}) n_{6} |n_{7}| n_{8} |n_{9}| \dots] .

In a similar manner,

B_{(3 \to 4)} = [l_{1} |u_{2}| d_{3} (r_{4} | d_{5}) n_{6} |l_{7}| n_{8} |n_{9}| \dots] .

Naturally, the final D₄ is obtained from D₁ recursively,

‘ D_{1} \to D_{2} ’,

(5)

‘ D_{2} \circ B_{(2 \to 3)} ● B_{(3 \to 4)} = D_{4} ’ .

(6)
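The chain (5) and (6) can be replayed componentwise. The sketch below assumes that each base is indexed by its position on the row E-C-A-T-G and each operation by its equivalent number of right shifts (n = 0, r = 1, u = 2, d = 3, l = 4); only the first seven components are kept, and the names `act` and `compose` are hypothetical.

```python
BASES = "ECATG"   # assumed indexing: position = number of right shifts from 'E'
OPS = "nrudl"     # assumed indexing: n=0, r=1, u=2, d=3, l=4


def act(seq, ops):
    """D∘B: apply the i-th operation to the i-th base."""
    return "".join(BASES[(BASES.index(b) + OPS.index(o)) % 5]
                   for b, o in zip(seq, ops))


def compose(ops1, ops2):
    """B●B': componentwise composition of two operator vectors."""
    return "".join(OPS[(OPS.index(a) + OPS.index(b)) % 5]
                   for a, b in zip(ops1, ops2))


D1 = "ACCGTEE"      # D1 with two of its trailing 'E's written out
D2 = "ACCEEGT"
B12 = "nnnruld"     # operator quoted in the text for D1 -> D2
B23 = "nnnudnn"
B34 = "ludrdnl"

if __name__ == "__main__":
    print(act(D1, B12))                 # reproduces D2
    D3 = act(D2, B23)
    print(D3)
    print(act(D3, B34))                 # D4 reached in two steps
    print(act(D2, compose(B23, B34)))   # D4 reached with the composed operator
```

Applying the two operators in sequence and applying their composition once give the same D₄, which is the content of (6).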

From the decomposition

‘ B_{(j \to k)} = B_{(j \to 0)} ● B_{(0 \to k)} = {B_{j}}^{- 1} ● B_{k} ’,

we obtain

‘ D_{j} \circ B_{(j \to k)} = D_{j} \circ B_{(j \to 0)} ● B_{(0 \to k)} = D_{0} \circ B_{(0 \to k)} = D_{k} ’,

(7)

where

D_{0} = [E_{1} |E_{2}| E_{3} |\dots| E_{i} |\dots| E_{N - 1} |E_{N}| E_{N + 1} |E_{N + 2}| E_{N + 3} | \dots]

(8)

denotes the identity element of D.
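The decomposition (7) can be illustrated by routing every change through the all-‘E’ sequence D₀. In this sketch an operator is stored as a list of exponents of ‘r’ modulo 5 (an assumed encoding, with 0 = n, 1 = r, 2 = u, 3 = d, 4 = l), and `to_operator`, `inverse`, `compose` and `act` are hypothetical helper names.

```python
BASES = "ECATG"   # assumed indexing: BASES[m] is the base reached from 'E' by m right shifts


def to_operator(seq):
    """B_j = B_(0->j): the exponents of r that turn D_0 into `seq`."""
    return [BASES.index(b) for b in seq]


def inverse(op):
    """Componentwise inverse, so that op ● inverse(op) is all 'n'."""
    return [(-m) % 5 for m in op]


def compose(op1, op2):
    """Componentwise composition ● of two operator vectors."""
    return [(a + b) % 5 for a, b in zip(op1, op2)]


def act(seq, op):
    """D∘B on the finite, explicitly written part of the sequence."""
    return "".join(BASES[(BASES.index(b) + m) % 5] for b, m in zip(seq, op))


if __name__ == "__main__":
    d_j, d_k = "ACCATGT", "CTGTCGA"          # D3 and D4 of the running example
    b_j0 = inverse(to_operator(d_j))         # B_(j->0) = B_j^{-1}
    print(act(d_j, b_j0))                    # all 'E's, i.e. D_0
    b_jk = compose(b_j0, to_operator(d_k))   # B_(j->k) = B_j^{-1} ● B_k
    print(b_jk)
    print(act(d_j, b_jk))                    # D_k
```

Under the assumed encoding the resulting exponents 4, 2, 3, 1, 3, 0, 4 read as l, u, d, r, d, n, l.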

Note that the group operations can act on D_j irrespective of whether the ‘E’s are explicit or implicit as defined in §1. Moreover, any sequence ‘D_j’ can be presented as a polygonal line; as an example, the evolution of changes ‘D₁ → D₃’ is displayed in Figure 4.

§3 Integration of a linear group and a rotational group as a wallpaper group

Looking at the definitions of groups Z₅, D, and B, another approach is possible. The five bases can be represented by five equispaced phasors with a ‘2π/5’ angular phase separation located on the unit circle on the Gaussian plane, as depicted in Figure 5.

Herein, in the Gaussian plane, if ‘ω’ is defined to be the counterclockwise rotational angle ‘ω = 2π/5 (rad)’ and composition of ‘ω’ is denoted ‘●’, then assuming ‘ω’ obeys the ‘right translation rule’, we have

\begin{array}{l} ω = ω_{1}, \\ ω ● ω = 2 ω = ω_{2}, \\ ω ● ω ● ω = 3 ω = ω_{3}, \\ ω ● ω ● ω ● ω = 4 ω = ω_{4}, \\ ω ● ω ● ω ● ω ● ω = 5 ω = ω_{5} = ω_{0} = 0 (= no rotation) . \end{array}

(9)

The general form of an arbitrary base is expressed as ‘X_m ↔ Exp(m · ω · i)’ (here, ‘i’ is the ‘imaginary unit’, ω = 2π/5 (rad), m = {0, 1, 2, 3, 4, 5}). With {*} meaning one of the bases among ‘C, A, T, G and E’, we construct the following map. Denoting composition by ‘◦’, ω_m acts on the identity trivially and hence yields the correspondences

\begin{array}{l} Exp (0 \cdot i) \leftrightarrow \{Exp (0 \cdot ω \cdot i)\} = \{1\} = \{1\} \circ ω_{0} = E = X_{0}, \\ Exp (2 πi / 5) \leftrightarrow \{Exp (1 \cdot ω \cdot i)\} = \{1\} \circ ω_{1} = C = X_{1}, \\ Exp (4 πi / 5) \leftrightarrow \{Exp (2 \cdot ω \cdot i)\} = \{1\} \circ ω_{2} = A = X_{2}, \\ Exp (6 πi / 5) \leftrightarrow \{Exp (3 \cdot ω \cdot i)\} = \{1\} \circ ω_{3} = T = X_{3}, \\ Exp (8 πi / 5) \leftrightarrow \{Exp (4 \cdot ω \cdot i)\} = \{1\} \circ ω_{4} = G = X_{4}, \\ X_{5} = X_{0} = E . \end{array}

(10)

Expanding the operations for ‘ω₁, ω₂, ω₃, …’ on bases ‘C, A, T, G and E’, we establish for instance:

E \circ ω_{1} = C, C \circ ω_{1} = A, A \circ ω_{2} = G, T \circ ω_{3} = C, G \circ ω_{1} = E .

In continuance, the set P_ω = {ω₁, ω₂, ω₃, ω₄, ω₀ (= ω₅)} is readily confirmed to form group {P_ω, ●} where the identity element is ‘ω₀’ and the inverse of ‘ω_m’ is ‘ω_m⁻¹’:

\begin{array}{l} {ω_{0}}^{- 1} = ω_{0}, \\ {ω_{1}}^{- 1} = ω_{4}, \\ {ω_{2}}^{- 1} = ω_{3}, \\ {ω_{3}}^{- 1} = ω_{2}, \\ {ω_{4}}^{- 1} = ω_{1}, \\ {ω_{5}}^{- 1} = ω_{0} = 0 . \end{array}

(11)

Closure and associativity follow from (9) and (10).

Here, if we turn our attention to the wallpaper pattern, a further bijection obeying the postulates of the wallpaper group can be confirmed. Corresponding to Figures 3 and 6, a bijection between the Cayley Tables for translational and rotational operations can be established:

\begin{array}{l} r \leftrightarrow ω = ω_{1}, \\ u \leftrightarrow 2 ω = ω_{2}, \\ d \leftrightarrow 3 ω = ω_{3}, \\ l \leftrightarrow 4 ω = ω_{4}, \\ n \leftrightarrow 5 ω = ω_{5} = ω_{0} = 0 . \end{array}

(12)

Naturally, inverses (e.g., ω₂⁻¹ = ω₃) are preserved in accordance with the inverses for ‘r, u, d, l, and n’. Any right translation of the horizontal line in Figures 1 and 6 (translational group) is also expressible as a rotation over the fivefold phasor diagram in Figure 5 (rotational group). Thus, these are able to be regarded as a synthesized form of the wallpaper style (wallpaper group) from which expressions such as ‘A◦r = A◦ω₁ = T = E◦d’ and ‘A◦l = A◦ω₄ = C = E◦r’ can be confirmed. All possible one-step changes between ‘A, C, T, G and E’ and ‘ω₁, ω₂, ω₃, ω₄ and ω₀,’ are shown in Figure 2.
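The correspondence between translations and rotations can be checked numerically with unit phasors. The sketch below uses Python's `cmath` module; `phasor`, `rotate` and the dictionary `TRANSLATION` are hypothetical names, and the rounding step is only a guard against floating-point error.

```python
import cmath
import math

OMEGA = 2 * math.pi / 5            # counterclockwise rotation angle ω
BASES = "ECATG"                    # X_0 .. X_4 as in (10)
TRANSLATION = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}   # correspondence (12)


def phasor(m):
    """Exp(m·ω·i), the phasor assigned to X_m."""
    return cmath.exp(1j * m * OMEGA)


def rotate(base, m):
    """Base reached from `base` by the rotation ω_m."""
    z = phasor(BASES.index(base)) * phasor(m)
    angle = cmath.phase(z) % (2 * math.pi)     # bring the angle into [0, 2π]
    return BASES[round(angle / OMEGA) % 5]


def translate(base, op):
    """Base reached by a translation, evaluated through its rotation."""
    return rotate(base, TRANSLATION[op])


if __name__ == "__main__":
    print(rotate("A", 1), translate("A", "r"))   # both give T
    print(rotate("A", 4), translate("A", "l"))   # both give C
    print(rotate("G", 1))                        # E
```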

Therefore, this rule for ‘E’ does not break the postulates for set D, group Z₅, and group B.

§4 Methods to obtain complementary sequences from primary DNA

Suppose, from among ‘C, A, T, G and E’, a base ‘X_m’ is given; its complementary base ‘X_m^†’ is defined as follows: for ‘X_m = {Exp(m · ω · i)}’, m = {0, 1, 2, 3, 4, 5}, ‘X_m^†’ is obtained by ‘X_m^† = {Exp((5 – m) · ω · i)}’, where ‘{Exp(5 · ω · i)} = {1} = E’. In this regard,

‘ {X_{5}}^{†} = X_{0} = {X_{0}}^{†} = X_{5} ’

(13)

The procedure yields specifically ‘A^† = T’ and ‘C^† = G’.

Clearly, the complement of ‘E’ is ‘E’ itself; ‘E^† = E’.

Another notation for the ‘X_m’ expressed as a base can be given. We introduce the one-value function ‘^ωX(m)’ that provides the same results,

‘ X_{m} =^{ω} X (m) = E \circ mω = E \circ (ω ● ω ● \dots ● ω (m times)) ’ (m = 0, 1, 2, \dots, 5) .

(14)

As for ‘m’ in (14), both positive and negative integers are permissible. Thus,

‘X_m^†’ is expressible as

{X_{m}}^{†} =^{ω} X (5 - m) = E \circ (5 - m) ω = E \circ (ω ● ω ● \dots ● ω (‘ 5 - m ’ times)) (m = 0, 1, 2, 3, 4, 5) .

(15)

A simple example is illustrated below.

Suppose ‘D_j’ = [A₁|T₂|C₃|E₄|G₅|T₆|…] = [^ωX(2)|^ωX(3)|^ωX(1)|^ωX(0)|^ωX(4)|^ωX(3)|…], then,

\begin{array}{l} ‘ {D_{j}}^{†} ’ = [^{ω} X (5 - 2) |^{ω} X (5 - 3) |^{ω} X (5 - 1) |^{ω} X (5 - 0) |^{ω} X (5 - 4) |^{ω} X (5 - 3) | \dots], \\ = [^{ω} X (3) |^{ω} X (2) |^{ω} X (4) |^{ω} X (0) |^{ω} X (1) |^{ω} X (2) | \dots], \\ = [T_{1} |A_{2}| G_{3} |E_{4}| C_{5} |A_{6}| \dots] . \end{array}

(16)
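The rule ‘m → 5 − m’ behind (15) and (16) is short enough to sketch directly. The indexing X₀ … X₄ = E, C, A, T, G follows (10); the function names are hypothetical.

```python
BASES = "ECATG"   # X_0 .. X_4 as in (10)


def complement(base):
    """X_m† = X_(5-m), with the index taken modulo 5 so that E† = E."""
    return BASES[(5 - BASES.index(base)) % 5]


def complement_sequence(seq):
    """Componentwise complement D_j† of a sequence D_j."""
    return "".join(complement(b) for b in seq)


if __name__ == "__main__":
    print(complement("A"), complement("C"), complement("E"))  # T G E
    print(complement_sequence("ATCEGT"))                      # TAGECA, as in (16)
```

The explicit ‘E’ at place 4 is mapped to itself, so inserted ‘E’s stay aligned between a strand and its complement.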

In accordance with the wallpaper group in Figure 1, the translations in one direction (e.g., right) over a horizontal line form a cyclic group P_r that contains only {r, r², r³, r⁴, r_e (= r⁰ = r⁵ = n)}. This group is isomorphic with group P_ω = {ω₁, ω₂, ω₃, ω₄, ω₀ (= ω₅)}, as is the group similarly generated over a vertical line.

Similar to ‘^ωX(m)’, ‘X_m’ can be expressed using another one-value function ‘^rX(m) = E◦r^m’:

X_{m} =^{r} X (m) = E \circ r^{m} = E \circ (r ● r ● \dots ● r (m times)) (m = 0, 1, 2, 3, 4, 5) .

(17)

Hence, ‘X_m^†’ (the complementary base of ‘X_m’) is written as

{X_{m}}^{†} =^{r} X (5 - m) = E \circ r^{5 - m} = E \circ (r ● r ● \dots ● r (‘ 5 - m ’ times)) .

(18)

Extension to vertical translations is straightforward;

X_{m} =^{u} X (m) = E \circ u^{m} = E \circ (u ● u ● \dots ● u (m times)),

(19)

and its complementary base ‘X_m^†’ can be identified similarly, although the order of the letters is somewhat different.

Consider the following simple example in identifying ‘D_j^†’ using ‘^rX(m)’s;

for ‘ D_{j} ’ = [A_{1} |T_{2}| C_{3} |E_{4}| G_{5} |T_{6}| \dots] = [^{r} X (2) |^{r} X (3) |^{r} X (1) |^{r} X (0) |^{r} X (4) |^{r} X (3) | \dots],

by replacing ‘^ωX(m)’ by ‘^rX(m)’ in formula (15), the same result is obtained.

According to these rules, ‘X₁◦b_{[X1→ X4]} = X₁◦r³ = E◦r⁴ = G’. In general, the i-th component ‘b_{[Xm1→ Xm2]i}’ of ‘B_(m1→m2)’ changes X_m1 (= E◦r^m1 = E◦(m₁ω)) to X_m2 (= E◦r^m2 = E◦(m₂ω)).

Hence, the highlighted form of the operator vector is expressed as

B_{(m 1 \to m 2)} = [\dots |b_{[Xm 1 \to Xm 2] i}| \dots] = [\dots |{r^{m 2 - m 1}}_{i}| \dots] = [\dots |(m_{2} - m_{1}) ω_{i}| \dots] .

(20)

For a further example, given the operator ‘B_(j→k)’ that changes D_j to D_k,

\begin{array}{l} D_{j} = [C_{1} |A_{2}| E_{3} |\dots| C_{i} |\dots| T_{N - 1} |A_{N}| E_{N + 1} |E_{N + 2}| E_{N + 3} | \dots], \\ = [E \circ {r^{1}}_{1} |E \circ {r^{2}}_{2}| E \circ {r^{0}}_{3} |\dots| E \circ {r^{1}}_{i} |\dots| E \circ {r^{3}}_{N - 1} |E \circ {r^{2}}_{N}| E \circ {r^{0}}_{N + 1} |E \circ {r^{0}}_{N + 2}| E \circ {r^{0}}_{N + 3} | \dots], \\ D_{k} = [G_{1} |T_{2}| C_{3} |\dots| G_{i} |\dots| C_{N - 1} |T_{N}| E_{N + 1} |E_{N + 2}| E_{N + 3} | \dots], \\ = [E \circ {r^{4}}_{1} |E \circ {r^{3}}_{2}| E \circ {r^{1}}_{3} |\dots| E \circ {r^{4}}_{i} |\dots| E \circ {r^{1}}_{N - 1} |E \circ {r^{3}}_{N}| E \circ {r^{0}}_{N + 1} |E \circ {r^{0}}_{N + 2}| E \circ {r^{0}}_{N + 3} | \dots] . \end{array}

With details shown in Appendix D, ‘B_(j→k)’ takes the form

B_{(j \to k)} = [{r^{3}}_{1} |{r^{1}}_{2}| {r^{1}}_{3} |\dots| {r^{3}}_{i} |\dots| {r^{3}}_{N - 1} |{r^{1}}_{N}| {r^{0}}_{N + 1} |{r^{0}}_{N + 2}| {r^{0}}_{N + 3} | \dots] .

Naturally, the state D_k is obtained through recursively applying the operations, D_j ◦B_(j→k) = D_k. (Details are presented in Appendix D).
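The exponents in ‘B_(j→k)’ follow from (20) as differences m₂ − m₁ modulo 5. The sketch below uses a hypothetical six-letter instance built only from the components written out explicitly above (places 1, 2, 3, i, N − 1 and N); the elided components are not reconstructed, and the function names are illustrative.

```python
BASES = "ECATG"   # assumed indexing: X_m = E∘r^m is BASES[m]


def r_exponents(seq):
    """Exponent m of r for each component, so that the base equals E∘r^m."""
    return [BASES.index(b) for b in seq]


def operator(d_j, d_k):
    """Exponents (m2 - m1) mod 5 of the operator taking d_j to d_k, as in (20)."""
    return [(m2 - m1) % 5 for m1, m2 in zip(r_exponents(d_j), r_exponents(d_k))]


def act(seq, exponents):
    """Apply r^e to each component."""
    return "".join(BASES[(BASES.index(b) + e) % 5] for b, e in zip(seq, exponents))


if __name__ == "__main__":
    d_j = "CAECTA"   # components 1, 2, 3, i, N-1, N of D_j
    d_k = "GTCGCT"   # the same components of D_k
    b_jk = operator(d_j, d_k)
    print(b_jk)              # [3, 1, 1, 3, 3, 1]
    print(act(d_j, b_jk))    # GTCGCT
```

A negative difference such as 1 − 3 at place N − 1 is reduced modulo 5 to the exponent 3.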

Whereas ‘D_j^†’s might have components in reverse order in terms of sense (5’ or 3’), there exists however certain ‘D_k’ such that ‘D_k = D_j^†, (j, k = 0, 1, 2, 3, 4,…)’. With this, ‘D_j^†’ is one of the ordinal elements belonging to the same set D. Thus, the symbol ‘†’ need only be present when elements are distinct.

§5 Further unifying notation to describe the wallpaper group operation

Consider Figure 6; we assume that the number of right translations ‘r’ ∈ group P_r (or ω ∈ group P_ω) is ‘a’ and the number of up translations ‘u’ ∈ group P (or 2ω (= ω₂) ∈ group P_ω) is ‘b’, with a, b = …, −2, −1, 0, 1, 2, …. Similarly, with ‘d ↔ 3ω’ and ‘l ↔ 4ω’, the total change can be summarized as ‘x[a, b]’. We can confirm that there exists at least one pair ‘a, b’ that satisfies

‘ X = E \circ x [a, b] ’,

(21)

because any base in Figure 6 can be obtained by a finite number of transitions from ‘E’. For instance, A can be expressed as ‘A = E◦x[0, 1] = E◦x[2, 0] = E◦x[1, 3] = E◦x[3, 2] = E◦x[4, 4]’. However, we remark that ‘x[a, b]’ means changes of bases from one to another prescribed by the wallpaper pattern. In practice, ‘x[a, b] = r^a●u^b’ constitutes a multiple composition of elements of group Z₅. In addition, ‘x[−a, −b] = r^-a●u^-b = l^a●d^b’ or ↔ ‘(−a)ω●(−b)(2ω) = (−a−2b)ω’. E.g., ‘x[−3, −2]’ means ‘r⁻³●u⁻² = l³●d²’ or ↔ ‘(−3)ω●(−2 · 2)(ω) = (−3 −4)ω = (−7)ω = (−2)ω = 3ω = ω₃’.

For the wallpaper group, the ‘a’ and ‘b’ should be interpreted in modulo 5 addition. The Cayley table for the wallpaper group is presented in Appendix A.
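The modulo 5 reading of ‘x[a, b]’ can be illustrated with a short sketch. It assumes, following ‘u ↔ 2ω’, that ‘x[a, b]’ acts on ‘E’ as the net power a + 2b of ‘r’; the string `BASES` and the function names are hypothetical helpers introduced only for this illustration.

```python
BASES = "ECATG"  # BASES[k] is the base E ∘ r^k (E, C, A, T, G for k = 0..4)


def x_index(a, b):
    """Net power of r represented by x[a, b] = r^a ● u^b with u = r^2, modulo 5."""
    return (a + 2 * b) % 5


def base_of(a, b):
    """Base reached from E by x[a, b]."""
    return BASES[x_index(a, b)]


# All pairs (a, b) inside one 5 x 5 unit cell that lead from E to A.
pairs_for_A = [(a, b) for b in range(5) for a in range(5) if base_of(a, b) == "A"]
print(pairs_for_A)

# Negative entries are reduced in the same way, e.g. x[-3, -2] <-> (-7)ω = 3ω.
print(x_index(-3, -2))
```

Python's `%` operator returns a non-negative remainder for a positive modulus, so negative values of ‘a’ and ‘b’ need no special handling here.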

Within ‘the square unit cell’ in Figure 1 or 6, there are five pairs of ‘a, b’ for each base, as for ‘A’. Under modulo 5 addition, ‘x[a + 5, b + 5] = x[a, b]’ holds. Moreover, if ‘X^†’ is obtained from ‘X’ using ‘X = E◦x[a, b]’, ‘X^†’ can be determined as

‘ X^{†} = E \circ x^{- 1} [a, b] = E \circ x [- a, - b] ’,

(22)

or ‘ X^{†} = E \circ x [5 - a, 5 - b] ’,

(23)

because ‘X’ and ‘X^†’ are symmetrically disposed with respect to ‘E’ over the wallpaper pattern that would be selected as a standard for the definition of ‘a, b’. In practice, for an arbitrary ‘X’, ‘X^†’ can be obtained via (22) or (23) by making use of an arbitrary ‘E’ as the reference point for the symmetry.

For example, if ‘G = E◦x[2, 1]’, then according to (22) ‘G^† = E◦x⁻¹[2, 1] = E◦x[−2, −1] = C’, or according to (23) ‘G^† = E◦x[5 − 2, 5 − 1] = E◦x[3, 4] = C’. There are an infinite number of identifications for the complementary base for an arbitrary base ‘X’.
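A brief check that (22) and (23) always name the same complementary base can be sketched as follows. The encoding of bases by the net power a + 2b of ‘r’ (with E, C, A, T, G at powers 0 to 4) is the same assumption as in the rest of this section, and all function names are illustrative.

```python
BASES = "ECATG"  # BASES[k] is the base E ∘ r^k


def base_of(a, b):
    """Base reached from E by x[a, b], using the net power a + 2b (mod 5)."""
    return BASES[(a + 2 * b) % 5]


def complement_22(a, b):
    """Complement via (22): X† = E ∘ x[-a, -b]."""
    return base_of(-a, -b)


def complement_23(a, b):
    """Complement via (23): X† = E ∘ x[5 - a, 5 - b]."""
    return base_of(5 - a, 5 - b)


# Collect the base -> complement map over one unit cell and compare both rules.
pairing = {}
for a in range(5):
    for b in range(5):
        assert complement_22(a, b) == complement_23(a, b)
        pairing[base_of(a, b)] = complement_22(a, b)
print(pairing)
```

Under these assumptions the resulting map pairs C with G and A with T and leaves E fixed, whichever of the five unit-cell representations of a base is used.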

Moreover, if we define the composition for the ‘x[a, b]’s as

‘ x [a_{1}, b_{1}] ● x [a_{2}, b_{2}] = x [a_{1}, b_{1}] + x [a_{2}, b_{2}] = x [a_{1} + a_{2}, b_{1} + b_{2}] ’,

(24)

we can confirm descriptions (20), (22) and (23). As for the operators ‘B_m’, with ‘D_j’ expressed as ‘D_j = […|E◦x[a_i , b_i] _i|…]’, one of the candidates of the appropriate ‘B_(j→j†)’s that produces ‘D_j◦B_(j→j†) = D_j^†’ is identified:

B_{(j \to j †)} = [\dots | x [- 2 a_{i}, - 2 b_{i}]_{i} | \dots] = [\dots |r^{- 2 ai} ● {u^{- 2 bi}}_{i}| \dots] = [\dots |l^{2 ai} ● {d^{2 bi}}_{i}| \dots],

(25a)

or = [\dots |r^{- 2 ai} ● {(r^{2})}^{- 2 bi}_{i}| \dots] = [\dots |{r^{- 2 ai - 4 bi}}_{i}| \dots] = [\dots |{l^{2 ai + 4 bi}}_{i}| \dots] (using ‘ u = r^{2} ’, ‘ d = l^{2} ’),

(25b)

or \leftrightarrow [\dots | (- 2 a_{i}) ω ● (- 2 b_{i}) \cdot {(2 ω)}_{i} |\dots]=[\dots| (- 2 a_{i} - 4 b_{i}) ω_{i} | \dots] (using ‘ r^{m} \leftrightarrow mω ’, ‘ u^{m} \leftrightarrow 2 mω ’) .

(25c)

The exponents ‘-2a_i -4b_i’ in (25b) are permitted to take positive or negative integer values.

In these expressions, the rules for the wallpaper group (25a) can also be expressed as either for the linear group or for the rotational group (25b or 25c).
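The candidate operator in (25a) can be tried on a short sequence. The sketch below is only an illustration under stated assumptions: each base is given one fixed representation (C = x[1, 0], A = x[0, 1], T = x[0, −1], G = x[−1, 0], E = x[0, 0]), the six-letter test sequence is hypothetical, and sense reversal (5’/3’) is ignored.

```python
BASES = "ECATG"  # BASES[k] is the base E ∘ r^k
COORD = {"E": (0, 0), "C": (1, 0), "A": (0, 1), "T": (0, -1), "G": (-1, 0)}


def act(base, a, b):
    """Apply x[a, b] to a base, using the net power a + 2b of r (mod 5)."""
    return BASES[(BASES.index(base) + a + 2 * b) % 5]


def complement_operator(seq):
    """Componentwise operator x[-2a_i, -2b_i] of (25a) for the sequence seq."""
    return [(-2 * a, -2 * b) for a, b in (COORD[x] for x in seq)]


def apply_operator(seq, op):
    return "".join(act(x, a, b) for x, (a, b) in zip(seq, op))


D_j = "ATECGA"                      # hypothetical finite sequence
B = complement_operator(D_j)
D_j_dagger = apply_operator(D_j, B)
print(D_j_dagger)

# Exponents of r as in (25b): -2a_i - 4b_i, reduced modulo 5.
print([(-2 * a - 4 * b) % 5 for a, b in (COORD[x] for x in D_j)])
```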

More generally,

‘D_j = […|E◦x[a_(j)i , b_(j)i] _i|…]’ is changed into ‘D_k = […|E◦x[a_(k)i , b_(k)i] _i|…]’, and

the ‘B_(j→k)’ that provides ‘D_j◦B_(j→k) = D_k’ is identified as

\begin{array}{l} B_{(j \to k)} = [\dots | x[a_{(k) i} - a_{(j) i}, b_{(k) i} - b_{(j) i}]_{i} | \dots] = [\dots |r^{a (k) i - a (j) i} ● {u^{b (k) i - b (j) i}}_{i}| \dots] \\ (= [\dots |l^{- a (k) i + a (j) i} ● {d^{- b (k) i + b (j) i}}_{i}| \dots]) . \end{array}

(26a)

Also, = [\dots |r^{a (k) i - a (j) i} ● {(r^{2})}^{b (k) i - b (j) i}_{i}| \dots] = [\dots |r^{a (k) i - a (j) i} {^{+ 2 b (k) i - 2 b (j) i}}_{i}| \dots],

(26b)

or else \leftrightarrow [\dots |(a_{(k) i} - a_{(j) i}) ω ● 2 (b_{(k) i} - b_{(j) i}) ω_{i}| \dots] = [\dots |(a_{(k) i} - a_{(j) i} + 2 b_{(k) i} - 2 b_{(j) i}) ω_{i}| \dots] .

(26c)

As mentioned in §1, if a certain sequence ‘X’ has sense <5’ → 3’>, its complementary sequence ‘X^†’ has the reversed sense <3’ → 5’>.

To aid understanding, we present the following examples: Given

\begin{array}{l} D_{j} = [A_{1} |T_{2}| E_{3} |\dots| C_{i} |\dots| G_{N - 1} |A_{N}| E_{N + 1} |E_{N + 2}| E_{N + 3} | \dots] \\ = [E \circ x {[0, 1]}_{1} |E \circ x {[0, - 1]}_{2}| E \circ x {[0, 0]}_{3} | \dots |E \circ x {[1, 0]}_{i}| \dots \\ \dots |E \circ x {[- 1, 0]}_{N - 1}| E \circ x {[0, 1]}_{N} |E \circ x {[0, 0]}_{N + 1}| E \circ x {[0, 0]}_{N + 2} |E \circ x {[0, 0]}_{N + 3}| \dots] , \end{array}

then, according to (22), ‘D_j^†’ is simply

\begin{array}{l} {D_{j}}^{†} = [E \circ x^{- 1} {[0, 1]}_{1} | E \circ x^{- 1} [0, - 1]_{2} | E \circ x^{- 1} [0, 0]_{3} | \dots \\ \dots |E \circ x^{- 1} {[1, 0]}_{i}| \dots | E \circ x^{- 1} {[- 1, 0]}_{N - 1} | E \circ x^{- 1} [0, 1]_{N} |E \circ x^{- 1} {[0, 0]}_{N + 1}| E \circ x^{- 1} {[0, 0]}_{N + 2} |E \circ x^{- 1} {[0, 0]}_{N + 3}| \dots], \\ = [E \circ x[0, - 1]_{1} | E \circ x {[0, 1]}_{2} | E \circ x [0, 0]_{3} | \dots \\ \dots | E \circ x {[- 1, 0]}_{i} |\dots| E \circ x [1, 0]_{N - 1} | E \circ x [0, - 1]_{N} |E \circ x {[0, 0]}_{N + 1}| E \circ x {[0, 0]}_{N + 2} |E \circ x {[0, 0]}_{N + 3}| \dots], \\ = [T_{1} |A_{2}| E_{3} |\dots| G_{i} |\dots| C_{N - 1} |T_{N}| E_{N + 1} |E_{N + 2}| E_{N + 3} | \dots] . \end{array}

If we use the optional formula (25a–c), the relation ‘D_j◦B_(j→j†) = D_j^†’ is derived. Details are given in Appendix E.

Apart from these examples, additional identities for the wallpaper group can be verified using Figure 1 or 6; e.g.,

\begin{array}{l} ‘ x [1, 0] ● x [0, 1] = x [1, 0] + x [0, 1] (= r ● u) = x [1 + 0, 0 + 1] = x [1, 1] = x [0, - 1] (= d) ’, \\ ‘ x [2, 0] (= r ● r) = x [0, 1] = u ’ . ‘ x [3, 1] ● x [- 1, 10] = x [3 - 1, 1 + 10] = x [2, 11] = x [2, 1] \\ = r^{2} ● u^{1} = (r ● r) ● u = u ● u = l ’ . \end{array}

We develop various general formulas:

\begin{array}{l} ‘ x [a + 5, b] = x [a, b] ’, ‘ x [a, b + 5] = x [a, b] ’, \\ ‘ x [2 a, - a] = x [0, 0] ’, ‘ x [a, 2 a] = x [0, 0] ’, \\ ‘ x [- 2 a, a] = x [0, 0] ’, ‘ x [- a, - 2 a] = x [0, 0] ’ . \end{array}

(27)
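The identities in (27) and the composition examples preceding them can be checked by brute force over a range of integers. This is a sketch rather than a proof; it assumes that ‘x[a, b]’ is fully characterized by the net power a + 2b of ‘r’ modulo 5, and the helper names are hypothetical.

```python
def x(a, b):
    """Net power of r for x[a, b] = r^a ● u^b with u = r^2, reduced modulo 5."""
    return (a + 2 * b) % 5


def compose(p, q):
    """Composition rule (24): entries of the two pairs are added."""
    return (p[0] + q[0], p[1] + q[1])


identities_hold = all(
    x(a + 5, b) == x(a, b)
    and x(a, b + 5) == x(a, b)
    and x(2 * a, -a) == 0
    and x(a, 2 * a) == 0
    and x(-2 * a, a) == 0
    and x(-a, -2 * a) == 0
    for a in range(-10, 11)
    for b in range(-10, 11)
)
print(identities_hold)

# x[3, 1] ● x[-1, 10] has net power 4, i.e. the same as l = r^4.
print(x(*compose((3, 1), (-1, 10))))
```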

Other unknown rules might underlie the wallpaper pattern.

Concerning style in treating the wallpaper group, the examples ‘X_m = ^rX(m)’ in (16, 17, 19) and ‘^ωX(m)’ in (14, 15, 20) can be regarded as a specific combination that is displayed as

‘ X_{m} =^{r} X (m) =^{ω} X (m) = E \circ x [a, 0] (a = \dots, - 2, - 1, 0, 1, 2, \dots, 5, 6, \dots; integer) ’ .

(28)

§6 Treatment of changes of sequences and the insertion/deletion of DNA bases via an optionally generalized operation

Below, we demonstrate, using several examples containing ‘E’s, changes and inclusion/exclusion of DNA bases using a more generalized scheme.

For definiteness, let ‘D_j’ be the sequence ‘CGT AT…C…TA’. We consider the change of its ‘1–3’ components ‘C₁G₂T₃’ into ‘G₁T₂A₃’ and, moreover, the insertion of two bases ‘GC’ between ‘T₃’ and ‘A₄’ at the position denoted ‘( )’:

D_{j} = [C_{(j) 1} |G_{(j) 2}| T_{(j) 3} () A_{(j) 4} |\dots| C_{(j) i} |\dots| T_{(j) N - 1} |A_{(j) N}| E_{(j) N + 1} |E_{(j) N + 2}| E_{(j) N + 3} | \dots] .

We denote the result of this transformation as D_k,

\begin{array}{l} D_{k} = [G_{(k) 1} |T_{(k) 2}| A_{(k) 3} (G_{(k) 4} | C_{(k) 5}) A_{(k) 6} |\dots| C_{(k) i + 2} |\dots| T_{(k) N + 1} |A_{(k) N + 2}| E_{(k) N + 3} |E_{(k) N + 4}| \\ E_{(k) N + 5} \dots | \dots] . \end{array}

(29)

The procedure from D_j to D_k is described recursively to find operator ‘B_(j→h)’.

First, two ‘E’s are inserted after the 3rd component (this change is denoted ‘D_j → D_h’) in preparation for insertion of ‘GC’;

\begin{array}{l} D_{j} \to D_{h} = [C_{(h) 1} |G_{(h) 2}| T_{(h) 3} (E_{(h) 4} | E_{(h) 5}) A_{(h) 4 + 2} |\dots| C_{(h) i + 2} | \dots \\ \dots |T_{(h) N - 1 + 2}| A_{(h) N + 2} |E_{(h) N + 1 + 2}| E_{(h) N + 2 + 2} |E_{(h) N + 3 + 2}| \dots] . \end{array}

(30)

Thus, ‘B_(j→h)

= [b_{[C \to C] 1} |b_{[G \to G] 2}| b_{[T \to T] 3} (b_{[G \to E] 4} | b_{[C \to E] 5}) b_{[A \to A] 6} |\dots| b_{[C \to C] i} |\dots| b_{[T \to T] N - 1} | b_{[A \to A] N}] ’ .

This change is in accordance with those rules for vector-like ‘D_j’s dependent upon ‘E’s.

Hence, the operator B_(h→k) that produces the change from D_h to D_k is described as:

\begin{array}{l} B_{(h \to k)} = [b_{[C \to G] 1} |b_{[G \to T] 2}| b_{[T \to A] 3} (b_{[E \to G] 4} | b_{[E \to C] 5}) b_{[A \to A] 6} | \dots \\ \dots |b_{[C \to C] i + 2}| \dots |b_{[T \to T] N + 1}| b_{[A \to A] N + 2} |b_{[E \to E] N + 3}| b_{[E \to E] N + 4} |b_{[E \to E] N + 5}| \dots] . \end{array}

(31)

Thereby,

\begin{array}{l} D_{h} \circ B_{(h \to k)} \\ = [C_{1} \circ b_{[C \to G] 1} | G_{2} \circ b_{[G \to T] 2} | T_{3} \circ b_{[T \to A] 3} (E_{4} \circ b_{[E \to G] 4} | E_{5} \circ b_{[E \to C] 5}) A_{6} \circ b_{[A \to A] 6} | \dots | C_{i + 2} \circ b_{[C \to C] i + 2} | \dots \\ | T_{N + 1} \circ b_{[T \to T] N + 1} | A_{N + 2} \circ b_{[A \to A] N + 2} | E_{N + 3} \circ b_{[E \to E] N + 3} | E_{N + 4} \circ b_{[E \to E] N + 4} | E_{N + 5} \circ b_{[E \to E] N + 5} | \dots] . \end{array}

(32)

With reference to Figure 1, 2, 6 or Appendix A,

\begin{array}{l} = [C_{1} \circ d_{1} |G_{2} \circ l_{2}| T_{3} \circ l_{3} (E_{4} \circ l_{4} | E_{5} \circ r_{5}) A_{6} \circ n_{6} |\dots| C_{i + 2} \circ n_{i + 2} | \dots \\ \dots |T_{N + 1} \circ n_{N + 1}| A_{N + 2} \circ n_{N + 2} |E_{N + 3} \circ n_{N + 3}| E_{N + 4} \circ n_{N + 4} |E_{N + 5} \circ n_{N + 5}| \dots], \end{array}

(33)

\begin{array}{l} = [G_{1} |T_{2}| A_{3} (G_{4} | C_{5}) A_{6} |\dots| C_{i + 2} |\dots| T_{N + 1} |A_{N + 2}| E_{N + 3} |E_{N + 4}| E_{N + 5} | \dots], \\ = D_{k} . (29) . \end{array}

(34)

This indicates a code change of the ‘1–3’ components and a ‘GC’ insertion after the 3rd as described via the two steps: 1) D_j → D_h (inserting two ‘E’s after the ‘3rd’ component), and 2) D_h ◦B_(h→k) = D_k. Note that the exclusion of the ‘4–5’ components ‘GC’ from D_k and the transformation of the ‘1–3’ components from ‘GTA’ to ‘CGT’ constitute the recursive procedure for the inverse operator

‘ B_{(k \to h)} = {B_{(h \to k)}}^{- 1} ’

(35)

Alternatively, ‘D_h → D_j’ is obtained by deleting the two ‘E’s from the ‘4–5’ components of D_h to yield the initial state ‘D_j’ in accordance with the characteristics of the vector-like ‘D_j’s.

In summary, essentially, all transitions (changes and inclusion/exclusion) of a certain sequence within the same single-stranded DNA, whether it has finite or infinite length, can be described in principle within a single operation using only the unique operator B_{(… →… )} ∈ group B.
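The two-step procedure of this section (insertion of ‘E’s, followed by a single componentwise operator) can be sketched on a finite sequence. The nine-letter strings, the dictionaries, and the function names below are illustrative assumptions; the letters n, r, u, d, l are read as the powers 0 to 4 of ‘r’, consistent with ‘u = r²’ and ‘d = l²’.

```python
POWER = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}  # power of r for each element of Z5
LETTER = {k: s for s, k in POWER.items()}
BASES = "ECATG"                                    # BASES[k] is the base E ∘ r^k


def insert_E(seq, pos, count):
    """Step 1: insert `count` letters E after the first `pos` components."""
    return seq[:pos] + "E" * count + seq[pos:]


def operator_between(src, dst):
    """Step 2: componentwise elements of Z5 that carry src to dst."""
    return [LETTER[(BASES.index(y) - BASES.index(x)) % 5] for x, y in zip(src, dst)]


def apply_operator(seq, op):
    return "".join(BASES[(BASES.index(x) + POWER[o]) % 5] for x, o in zip(seq, op))


def inverse(op):
    """Componentwise inverse operator."""
    return [LETTER[(-POWER[o]) % 5] for o in op]


D_j = "CGTACTA"                     # hypothetical finite stand-in for D_j
D_h = insert_E(D_j, 3, 2)           # two E's after the 3rd component
D_k = "GTAGCACTA"                   # target: 'CGT' -> 'GTA' and 'GC' inserted
B_hk = operator_between(D_h, D_k)
print(D_h, "".join(B_hk), apply_operator(D_h, B_hk))
```

Applying `inverse(B_hk)` to `D_k` and then removing the two ‘E’s at positions 4 and 5 recovers the starting sequence, which mirrors the inverse procedure described above.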

§7 Synthesis of changes, insertion/deletion, and recombination of DNA bases

As a further development, to demonstrate recombination, take two finite sequences ‘GETAGT (= D_c1)’ and ‘ATAGCTA (= D_d1)’. These have vector expressions

\begin{array}{l} D_{c 1} = [G_{1} E_{2} \underline{T_{3} A_{4} G_{5} T_{6}} | E_{7} E_{8} E_{9} \dots], \\ = [G_{(c 1) 1} |E_{(c 1) 2}| T_{(c 1) 3} |A_{(c 1) 4}| G_{(c 1) 5} |T_{(c 1) 6}| E_{(c 1) 7} |E_{(c 1) 8}| E_{(c 1) 9} | \dots], \end{array}

(36)

\begin{array}{l} D_{d 1} = [A_{1} T_{2} \underline{A_{3} G_{4} C_{5} T_{6} A_{7}} | E_{8} E_{9} E_{10} \dots], \\ = [A_{(d 1) 1} |T_{(d 1) 2}| A_{(d 1) 3} |G_{(d 1) 4}| C_{(d 1) 5} |T_{(d 1) 6}| A_{(d 1) 7} |E_{(d 1) 8}| E_{(d 1) 9} |E_{(d 1) 10}| \dots] . \end{array}

(37)

To illustrate for the pair D_c1 and D_d1, we consider recombination to take place between the sequence ‘T₃A₄G₅T₆’ of the (3–6)-th component of ‘D_c1’ and the ‘AGCTA’ of the (3–7)-th component of ‘D_d1’ at the same instant.

First, in the pair of sequences, a series of ‘E’s of complementary size is inserted in ‘D_c1’ just before the sequence to be converted, and in ‘D_d1’ just after the sequence to be converted. For example, for ‘D_c1’, five ‘E’s, ‘EEEEE’, of size equivalent to that of ‘A₃G₄C₅T₆A₇’ of ‘D_d1’, are inserted just before ‘T₃’ in ‘D_c1’ where ‘A₃G₄C₅T₆A₇’ is to be located, that is, the interval between ‘the 2nd ‘E₂’ and 3rd ‘T₃’ within ‘D_c1’. Under this procedure, D_c1 changes into D_c2:

\begin{array}{l} D_{c 2} = [G_{1} E_{2} (EEEEE) \underline{T_{3 + 5} A_{4 + 5} G_{5 + 5} T_{6 + 5}} | E_{7 + 5} E_{8 + 5} E_{9 + 5} \dots], \\ = [G_{1} E_{2} (EEEEE) \underline{T_{8} A_{9} G_{10} T_{11}} | E_{12} E_{13} E_{14} \dots] . \end{array}

(38)

Here, we assume that ‘EEEEE’ is changed into ‘A₃G₄C₅T₆A₇’ (originally, the (3–7)-th component of ‘D_d1’). In addition, ‘T₈A₉G₁₀T₁₁’ is transformed into the same number of ‘E’s, ‘EEEE’, at the same time. By this process, ‘D_c2’ changes into ‘D_c3’:

D_{c 3} = [G_{1} E_{2} (A_{3} G_{4} C_{5} T_{6} A_{7}) \underline{E_{8} E_{9} E_{10} E_{11}} | E_{12} E_{13} E_{14} \dots] .

(39)

Note that bold type and underline are here merely pedagogical aids to help identify sequence changes. Meanwhile, four ‘E’s ‘EEEE’ equivalent in size to ‘T₃A₄G₅T₆’ of ‘D_c1’ would be inserted after ‘A₇’ of ‘D_d1’ where ‘T₃A₄G₅T₆’ of ‘D_c1’ is to be located within ‘D_d1’. That is, ‘T₃A₄G₅T₆’ is inserted into the interval between the 7th ‘A₇’ and 8th ‘E₈’ within ‘D_d1’. In this procedure, D_d1 changes into D_d2:

\begin{array}{l} D_{d 2} = [A_{1} T_{2} \underline{A_{3} G_{4} C_{5} T_{6} A_{7}} (EEEE) E_{8 + 4} E_{9 + 4} E_{10 + 4} \dots], \\ = [A_{1} T_{2} \underline{A_{3} G_{4} C_{5} T_{6} A_{7}} (EEEE) E_{12} E_{13} E_{14} \dots] . \end{array}

(40)

Furthermore, we change ‘EEEE’ into the equivalent-sized ‘T₈A₉G₁₀T₁₁’ (originally, the (3–6)-th components of ‘D_c1’) while ‘A₃G₄C₅T₆A₇’ is transformed into the equivalent-sized ‘EEEEE’. Through this procedure, ‘D_d2’ changes into ‘D_d3’:

D_{d 3} = [A_{1} T_{2} \underline{E_{3} E_{4} E_{5} E_{6} E_{7}} (T_{8} A_{9} G_{10} T_{11}) E_{12} E_{13} E_{14} \dots] .

(41)

As a result, if we omit the infinite series of ‘E’s from the right end, we have the recombination (partial conversion between this pair of sequences from ‘D_c1, D_d1’), with ‘D_c1’ = ‘G₁E₂T₃A₄G₅T₆’ being transformed into ‘D_c3’ = ‘G₁E₂A₃G₄C₅T₆A₇’ and ‘D_d1’ = ‘A₁T₂A₃G₄C₅T₆A₇’ being transformed into ‘D_d3’ = ‘A₁T₂T₃A₄G₅T₆’. We define the manipulation of the recombination (partial/total conversion) between ‘D_c1, D_d1’ in this way.

In the initial stage in the previous illustration, we inserted different sizes of ‘E’ sequences in each line; however, processes ‘D_c1 → D_c2’ and ‘D_d1 → D_d2’ are preferred to be regarded as ‘E’ insertions/deletions (see comments prior to equation (5)) and this rule depends upon the characteristics of these vectors (e.g., ‘D_j’s).

As previously explained, the operations can be performed in any of the three equivalent groups: the linear group, the rotational group, and the wallpaper group. Choosing the wallpaper group,

\begin{array}{l} D_{c 2} \circ B_{(c 2 \to c 3)} \\ = [G_{1} \circ n_{1} | E_{2} \circ n_{2} (E_{3} \circ u_{3} |E_{4} \circ l_{4}| E_{5} \circ r_{5} |E_{6} \circ d_{6}| E_{7} \circ u_{7}) \underline{T_{8} \circ u_{8}} | \underline{A_{9} \circ d_{9}} | \underline{G_{10} \circ r_{10}} | \underline{T_{11} \circ u_{11}} |E_{12} \circ n_{12}| E_{13} \circ n_{13} |E_{14} \circ n_{14}| \dots], \\ = [G_{1} | E_{2} (A_{3} |G_{4}| C_{5} |T_{6}| A_{7}) \underline{E_{8}} | \underline{E_{9}} | \underline{E_{10}} | \underline{E_{11}} |E_{12}| E_{13} |E_{14}| \dots], \end{array}

(42)

\begin{array}{l} = D_{c 3}, \\ where B_{(c 2 \to c 3)} = [n_{1} | n_{2} (u_{3} |l_{4}| r_{5} |d_{6}| u_{7}) \underline{u_{8}} | \underline{d_{9}} | \underline{r_{10}} | \underline{u_{11}} |n_{12}| n_{13} |n_{14}| \dots] . \end{array}

(43)

Also,

\begin{array}{l} D_{d 2} \circ B_{(d 2 \to d 3)} \\ = [A_{1} \circ n_{1} |T_{2} \circ n_{2}| \underline{A_{3} \circ d_{3}} | \underline{G_{4} \circ r_{4}} | \underline{C_{5} \circ l_{5}} | \underline{T_{6} \circ u_{6}} | \underline{A_{7} \circ d_{7}} \\ (E_{8} \circ d_{8} |E_{9} \circ u_{9}| E_{10} \circ l_{10} | E_{11} \circ d_{11}) E_{12} \circ n_{12} |E_{13} \circ n_{13}| E_{14} \circ n_{14} | \dots], \\ = [A_{1} |T_{2}| \underline{E_{3}} | \underline{E_{4}} | \underline{E_{5}} | \underline{E_{6}} | \underline{E_{7}} (T_{8} |A_{9}| G_{10} | T_{11}) E_{12} |E_{13}| E_{14} | \dots], \end{array}

(44)

\begin{array}{l} = D_{d 3}, \\ where B_{(d 2 \to d 3)} = [n_{1} |n_{2}| \underline{d_{3}} | \underline{r_{4}} | \underline{l_{5}} | \underline{u_{6}} | \underline{d_{7}} (d_{8} |u_{9}| l_{10} | d_{11}) n_{12} |n_{13}| n_{14} | \dots] . \end{array}

(45)

With respect to (42) and (44), the inverse identities are confirmed:

{B_{(c 2 \to c 3)}}^{- 1} = B_{(d 2 \to d 3)} .

(46)

Generally, B_{(_→_)} giving transition ‘D_c2 → D_c3’ automatically produces an inverse change for ‘D_d2 → D_d3’, as stated in (46) and reduces troublesome manipulations, even if only partially.
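The recombination steps and the inverse identity (46) can be reproduced on the two finite sequences used above. The sketch below is an illustration only: the function `recombine`, its half-open index pairs, and the omission of the trailing ‘E’s are assumptions made for the example.

```python
POWER = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}  # power of r for each element of Z5
LETTER = {k: s for s, k in POWER.items()}
BASES = "ECATG"                                    # BASES[k] is the base E ∘ r^k


def recombine(c, d, c_span, d_span):
    """Return (c2, c3, d2, d3) for exchanging c[c_span] with d[d_span] via E padding."""
    (cs, ce), (ds, de) = c_span, d_span            # half-open Python slices
    seg_c, seg_d = c[cs:ce], d[ds:de]
    c2 = c[:cs] + "E" * len(seg_d) + c[cs:]        # E's inserted before the segment
    c3 = c[:cs] + seg_d + "E" * len(seg_c) + c[ce:]
    d2 = d[:de] + "E" * len(seg_c) + d[de:]        # E's inserted after the segment
    d3 = d[:ds] + "E" * len(seg_d) + seg_c + d[de:]
    return c2, c3, d2, d3


def operator_between(src, dst):
    return [LETTER[(BASES.index(y) - BASES.index(x)) % 5] for x, y in zip(src, dst)]


def inverse(op):
    return [LETTER[(-POWER[o]) % 5] for o in op]


c2, c3, d2, d3 = recombine("GETAGT", "ATAGCTA", (2, 6), (2, 7))
B_c = operator_between(c2, c3)
B_d = operator_between(d2, d3)
print("".join(B_c))
print("".join(B_d))
print(inverse(B_c) == B_d)
```

The two printed operator strings can be compared with (43) and (45).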

§8 Further applications of the composition category-like prototypal model using additional ribonucleic acid (RNA)

We next comment on other possible applications of the model. The category theory-like construction for treating DNA transcription to RNA might be conceivable, and the combination of the set and the group can comprise a category when these satisfy category theory postulates [48, 49]. That is because we believe that in future developments the discussion should embrace category theory as one of the important options.

To begin, according to our description for handling ‘E’s, it seems difficult to define inverse elements in a group theoretical way when there are deletions of ‘E’s from any place in a sequence because we cannot find sufficient numbers of ‘E’s in the target component of D_j.

Thus, we consider the ‘morphism f’ that transforms the sequence of DNA bases within set D as follows [48, 49].

morphism f : X → X, dom(f) = D_j, cod(f) = D_k. Object ‘X’ is the set of ‘D_j’s. There exists a morphism ‘1_X’ such that ‘1_X●f = f = f●1_X’ for every ‘morphism f’, when ‘1_X’ = [n₁|n₂|n₃|…|n_i|…|n_N-1|n_N|…] (∈ group B). If supplemented, the ‘morphisms f’ comprise ‘group B’ (see reference list in Figure 7). The group composition for ‘f₁’ and ‘f₂’ is denoted ‘f₁●f₂’.

As mentioned earlier, sequences of DNA consisting of bases ‘C, A, T, and G’ are transcribed into RNA consisting of ‘C, A, U, and G’. This process can be regarded as the combination of two manipulations: Ι) transcription from the original DNA sequences (D_j) to those of its complement (D_j^†), and ΙΙ) alteration from ‘C, A, T, and G’ to ‘C, A, U, and G’ (D_j^† → R_j^†) (both are illustrated in Figure 8).

Step Ι

The transformation from the original DNA bases ‘D_j’ into the complementary sequence ‘D_j^†’ (e.g., ‘TCATEAGCTGA…’ → ‘AGTAETCGACT…’) (for transcription to pre-messenger RNA (pre-mRNA) before splicing) can be performed via the manipulation (13–15, 17, 18, 20, 22, 23, 25a–c) in §3 and §4. ‘D_j^†’ can be obtained via the linear group (17, 18, 20), the rotational group (14, 15, 20) and also the wallpaper group (21–23, 25a–c, 26a–c). Thereby, morphism ρ : X → Y, dom(ρ) = D_j, cod(ρ) = D_j^†. Object ‘Y’ is the set of ‘D_j^†’s (essentially equivalent to the set of ‘D_j’s).

There exist morphisms ‘1_X’ and ‘1_Y’ such that ‘1_Y●ρ = ρ = ρ●1_X’ for ‘morphism ρ’, where

‘ 1_{X} = 1_{Y} = [n_{1} |n_{2}| n_{3} |\dots| n_{i} |\dots| n_{N - 1} |n_{N}| \dots] ’ .

(47)

However, in practice, morphism ρ is one of the ‘B_m’s ∈ group B (see Figures 7 and 8).

Step ΙΙ

Next, we define manipulations that change the above ‘D_j^†’ into ‘R_j^†’ where all ‘T’s are converted into ‘U’s; e.g., (D_j^†=) [A|G|T|A|E|T|C|G|A|C|T|…] → (R_j^†=) [A|G|U|A|E|U|C|G|A|C|U|…]’.

This process can also be expressed in a similar way as transcription.

morphism τ : Y → Z, dom(τ) = D_j^†, cod(τ) = R_j^†. Object ‘Z’ is the set of ‘R_j^†’s. There exist morphisms ‘1_Y’ and ‘1_Z’ such that ‘1_Z●τ = τ = τ●1_Y’ for every ‘morphism τ’, where

‘ 1_{Y} = 1_{Z} = [n_{1} |n_{2}| n_{3} |\dots| n_{i} |\dots| n_{N - 1} |n_{N}| \dots] ’ .

(48)

(refer to Figures 7 and 8).

Evidently, morphism τ does not satisfy the group postulates because the source object ‘Y’ and target object ‘Z’ are different and a single set of operations cannot be defined at this stage.

Additionally, as for Steps І and ΙΙ, the resultant process for morphisms ρ and τ can be expressed as:

morphism g = τ ● ρ : X \to Z,

(49)

dom(g) = D_j, cod(g) = R_j^† (see Figures 7 and 8).

There exist morphisms ‘ 1_{X} ’ and ‘ 1_{Z} ’ such that ‘ 1_{Z} ● g = g = g ● 1_{X} ’ .

(50)
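Steps Ι and ΙΙ and their composite can be sketched as ordinary functions on strings. This is only an illustration: the functions `rho`, `tau`, and `compose` are hypothetical stand-ins for the morphisms, the composite applies `rho` first and `tau` second, and sense reversal (5’/3’) is not modelled.

```python
from functools import reduce

COMPLEMENT = {"C": "G", "G": "C", "A": "T", "T": "A", "E": "E"}


def rho(d):
    """Step I: D_j -> D_j† (complementary bases, E left unchanged)."""
    return "".join(COMPLEMENT[x] for x in d)


def tau(d):
    """Step II: D_j† -> R_j† (every T replaced by U)."""
    return d.replace("T", "U")


def identity(s):
    """Identity morphism: every component is acted on by n."""
    return s


def compose(*fs):
    """Compose functions so that the rightmost one is applied first."""
    return lambda s: reduce(lambda acc, f: f(acc), reversed(fs), s)


g = compose(tau, rho)               # rho first, then tau

D_j = "TCATEAGCTGA"
print(rho(D_j))
print(g(D_j))
```

Composing `g` with `identity` on either side leaves the result unchanged, which corresponds to the identity law stated in (50).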

The only difference between D_j^† and R_j^† is the appearance of ‘T’ or ‘U’ in the sequences.

Naturally, for RNA base sequences, similar treatments are possible in the single group B:

morphism h : Z → Z, dom(h) = R_j^†, cod(h) = R_k^† (Figures 7 and 8).

There exists morphism ‘ 1_{Z} ’ such that ‘ 1_{Z} ● h = h = h ● 1_{Z} ’ .

(51)

Ordinarily, in prokaryotic cells, the DNA sequences are transcribed along their entire length. For eukaryotic cells, a splicing process acting on nascent pre-messenger RNA (pre-mRNA) is needed, in which introns are removed and exons are joined, resulting in the mature messenger RNA (mRNA) from which a correct protein is produced through translation. Thus, the previous procedure concerned the prokaryotic cell or the pre-mRNA of the eukaryotic cell before splicing. Therefore, to treat the products after this RNA splicing procedure in the eukaryotic cell, the following approach might be possible. The removal of introns can be regarded as changes from a certain series of bases to ‘E’s as follows.

If ‘GUA’ is removed from ‘A(GUA)EUCGACU…’ to become ‘ A( )EUCGACU…’, this procedure can be described as; ‘R_j^† → Rs_j^†’,

‘ {R_{j}}^{†} = [A_{1} (G_{2} |U_{3}| A_{4}) E_{5} |U_{6}| C_{7} | G_{8} |A_{9}| C_{10} |U_{11}| E_{12} |E_{13}| E_{14} | \dots] ’,

(52)

‘ R {s_{j}}^{†} = [A_{1} (E_{2} |E_{3}| E_{4}) E_{5} |U_{6}| C_{7} | G_{8} |A_{9}| C_{10} |U_{11}| E_{12} |E_{13}| E_{14} | \dots] ’ .

(53)

The ‘Rs_j^†’ form a set Rs = {Rs_j^† (j = 1,2,3,…)} that is a part of set R (see Figures 7 and 8).

Hereafter, we admit ‘E’s in the sequences of RNAs (as elements of set Rs) during the operations before morphism ‘f’ and after morphism ‘j’ to maintain theoretical consistency. Thus, if the result of a series of these maps is ‘Rs_j^† = A₁E₂E₃E₄E₅U₆C₇G₈A₉C₁₀U₁₁E₁₂E₁₃E₁₄…’, then the actual RNA sequence should be interpreted as ‘AUC…’. Specifically, an equivalent-sized substitution of some bases in pre-mRNA with ‘E’s can be written as morphism j: Z → Zs, dom(j) = R_j^†, cod(j) = Rs_j^†. There exist morphisms ‘1_Z’ and ‘1_Zs’ such that ‘1_Zs●j = j = j●1_Z’.

‘j’ changes some series of bases from ‘C, A, U, G, E’ to an equivalent-sized series of ‘E’s within the partial operations of the group B. However, morphism ‘j’ fails the group axioms, as an inverse might not be definable.

Finally, as in §1, we apply the simultaneous deletions of all explicit ‘E’s of mRNA other than the trailing ‘E’s, the state after these deletions being denoted with ‘< >’; for

\begin{array}{l} ‘ R {s_{j}}^{†} = [A_{1} E_{2} E_{3} E_{4} E_{5} U_{6} C_{7} G_{8} A_{9} C_{10} U_{11} E_{12} E_{13} E_{14} \dots] \\ = [A_{1} (E_{2} E_{3} E_{4}) E_{5} U_{6} C_{7} G_{8} A_{9} C_{10} U_{11} E_{12} E_{13} E_{14} \dots] ’, \end{array}

the description ‘<Rs_j^†> = [A₁U₂C₃G₄A₅C₆U₇…]’ is specified without explicit non-trailing ‘E’s. In this regard, as in §1, if some indels (insertions/deletions) occur at certain bases of ‘<Rs_j^†>’, as for ‘<Rs_j1^†> = [A₁U₂(E₃)G₄(E₅)C₆U₇…]’ (with the deletion of ‘C₃’ and ‘A₅’), we state the result as ‘<<Rs_j1^†>> = [A₁U₂G₃C₄U₅…]’. The ‘R_j^†’s include the ‘<Rs_j^†>’s and ‘<<Rs_j^†>>’s from the set R, and both still satisfy the postulates of group B. This rule is a relative postulate and explicit ‘E’s are not absolutely forbidden in ‘<Rs_j^†>’s or ‘<<Rs_j^†>>’s; hence further indels of ‘E’s into ‘<Rs_j^†>’s or ‘<<Rs_j^†>>’s are also permitted.

Also, omissions of explicit ‘E’s are considered as in ‘{<Rs_j1^†>} = [A₁U₂G₄C₆U₇…]’, where place numbers ‘3’ and ‘5’ are absent, indicating implicitly their presence in the vector. (Note that all products belong to group B.) Similarly, t-tuples of ‘< >’s are denoted ‘<<<<Rs_j^†>>>> (t-tuple) = <Rs_j^†> _t’, representing multiple deletions of ‘E’s (t-times). Combinations of symbols ‘{ }’ and ‘< >’ are also allowed when necessary, as for example {<{{<Rs_j1>}}>}, as long as the subscripted place numbers are adequately recognized/traced.

Nevertheless, the multiple use of ‘< >’ to remove all ‘E’s in the vector ‘Rs_j^†’ should have a unique meaning with regard to protein synthesis. As a result, the subsequent reading/translation in line with ‘codon’-like ‘[AUC|GAC|U…]’, ‘[A|UCG|ACU|…]’ or ‘[AU|CGA|CU…]’ leads in an ordinal way to a description of protein synthesis. Through the use of ‘{ }’ and/or ‘< >’, the concept ‘E’ may have benefits, although this may need to be intensely explored in future studies.
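The splicing step, the ‘< >’ deletion of explicit ‘E’s, and the three codon-like readings can be sketched on the finite example used above. The function names and the use of plain strings (without the trailing ‘E’s or place numbers) are assumptions made for this illustration.

```python
def splice_out(seq, start, end):
    """Morphism-j-like step: replace seq[start:end] by the same number of E's."""
    return seq[:start] + "E" * (end - start) + seq[end:]


def angle(seq):
    """The '< >' manipulation: delete every explicit E and close the gaps."""
    return seq.replace("E", "")


def read_frame(seq, offset):
    """Split seq into triplets after an initial fragment of length `offset`."""
    head = [seq[:offset]] if offset else []
    return head + [seq[i:i + 3] for i in range(offset, len(seq), 3)]


R = "AGUAEUCGACU"                   # explicit components of R_j†
Rs = splice_out(R, 1, 4)            # 'GUA' at places 2-4 replaced by 'EEE'
mature = angle(Rs)
print(Rs, mature)

for offset in range(3):
    print("|".join(read_frame(mature, offset)))

# A second round: places 3 and 5 of <Rs_j†> replaced by E, then deleted.
print(angle("AUEGECU"))
```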

The procedure reversing transcription, found for example in retrovirus, is also describable if additional options are added to the scheme. However, these options are omitted at this stage to keep the model simple.

In summary, suppose we have a ‘category C’ with objects ‘X’, ‘Y’, ‘Z’, ‘Zs’ and morphisms ‘f’, ‘ρ’, ‘τ’, ‘g’, ‘h’ and ‘j’. We affirm that these definitions satisfy the postulates of a category. A list is given in Figure 7. Indeed, morphisms other than ‘τ’ and ‘g’ are simple group-theoretical products. One of the reasons we have introduced the concept ‘category’ is that the translation from single-strand DNAs to RNAs is difficult or impossible to systematize as a group structure. Therefore, if we identify the differences, we can treat all manipulations, except for ‘τ’ and ‘g’, based simply on group B.

The expression ‘hom(X, X)’ denotes all morphisms f: ‘from X to X’. Likewise, ‘hom(X, Y)’ denotes all morphisms ρ: ‘from X to Y’. In addition, ‘hom(Y, Z)’ denotes all morphisms τ: ‘from Y to Z’, and ‘hom(X, Z)’ denotes all morphisms g: ‘from X to Z’. Then, ‘hom(Z, Z)’ denotes all morphisms h: ‘from Z to Z’. Finally, ‘hom(Z, Zs)’ denotes all morphisms j: ‘from Z to Zs’. (Details are displayed in Appendix F.)

As is explained in §3 and §4, the rotational group can be regarded as a specific bijection of the wallpaper group [2, 44–47], so, we can describe this relationship naturally in a category theory-like way where two categories C₁ and C₂ are linked.

First, we consider two categories C₁ and C₂ with a ‘functor F’ from C₁ to C₂ written ‘F: C₁ → C₂’. For example, the pre-category C is denoted C₁ and the product of functor F on category C₁ is denoted C₂ [48, 49]. Note that the only difference between C₁ and C₂ is assumed to be the nature of its expression; morphism f₁ = B₁ (∈ category C₁) is based on the wallpaper group in Figure 1 or 6; e.g.,

‘ B_{1} = [r_{1} |l_{2}| u_{3} |\dots| n_{i} |\dots| r_{N - 1} |d_{N}| \dots] = [\dots |x {[a_{i}, b_{i}]}_{i} \dots|] = [\dots |r^{ai} ● {u^{bi}}_{i}| \dots] (\in group B_{1}) ’ .

(54)

Additionally, morphism f₂ = B₂ (∈ category C₂) is based on the rotational group over the Gaussian plane in Figure 5; e.g.,

‘ B_{2} = [ω_{1 1} |ω_{4 2}| ω_{2 3} |\dots| ω_{0 i} |\dots| ω_{1 N - 1} |ω_{3 N}| \dots] = [\dots |(a_{i} + 2 b_{i}) ω_{i}| \dots] (\in group B_{2}) ’ .

(55)

With regard to the identity morphisms, we have

\begin{array}{l} ‘ 1_{X 1} = 1_{Y 1} = 1_{Z 1} = 1_{Zs 1} = [n_{1} |n_{2}| n_{3} |\dots| n_{i} |\dots| n_{N - 1} |n_{N}| \dots], 1_{X 2} = 1_{Y 2} = 1_{Z 2} \\ = 1_{Zs 2} = [ω_{0 1} |ω_{0 2}| ω_{0 3} |\dots| ω_{0 i} |\dots| ω_{0 N - 1} |ω_{0 N}| \dots] . \end{array}

(56)

Herein, we view ‘functor F: C₁ → F(C₁) (= C₂)’ in the following way [48, 49].

(Details are shown in Appendix G)
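Without reproducing Appendix G, one possible reading of ‘functor F’ on morphisms can be sketched as the componentwise map x[a_i, b_i] ↦ (a_i + 2b_i)ω, as suggested by (54) and (55). The code below is a hedged illustration of that reading only; the names `F`, `compose_wallpaper`, and `compose_rotational`, and the six-component operators, are assumptions and not the authors' definition.

```python
# One representative pair (a, b) for each element of Z5, as in x[a, b].
COORD = {"n": (0, 0), "r": (1, 0), "l": (-1, 0), "u": (0, 1), "d": (0, -1)}


def compose_wallpaper(p, q):
    """Componentwise composition (24) of operators given as lists of (a, b) pairs."""
    return [(a1 + a2, b1 + b2) for (a1, b1), (a2, b2) in zip(p, q)]


def compose_rotational(p, q):
    """Componentwise addition of multiples of ω, modulo 5."""
    return [(m1 + m2) % 5 for m1, m2 in zip(p, q)]


def F(op):
    """Send each x[a_i, b_i] to the multiple (a_i + 2 b_i) of ω, modulo 5."""
    return [(a + 2 * b) % 5 for a, b in op]


B1 = [COORD[s] for s in "rlunrd"]          # explicit components shown in (54)
B1_other = [COORD[s] for s in "udrlnu"]    # a second, hypothetical operator
identity = [COORD["n"]] * len(B1)

print(F(B1))
print(F(compose_wallpaper(B1, B1_other)) == compose_rotational(F(B1), F(B1_other)))
print(F(identity))
```

Under this reading, composition and identities are preserved, which are the two conditions a functor must satisfy on morphisms.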

Note that a similar definition like the composition of C₁ based on the linear group and C₂ based on rotational group is possible, being linked with ‘functor F’.

This is satisfied provided an adequate definition of ‘Functor F’ is given, and we presume that the morphisms described previously formulate a model that renders one of the forms of the canonical “central dogma” proposed by Crick in 1958 [26].

In the transcription of RNA bases, the ‘RNA splicing’ process is well-known, whereby ‘intron sequences’ are excised, and ‘exon sequences’ are combined to condense effectual information for further interpretation in protein synthesis. However, as for further processing of the triplets of bases e.g., ‘ACG’ and ‘AUG’, a considerable number of models have been reported e.g., [12, 18–21, 33, 34, 50]. We refrain from pursuing this issue at present.

Results

We added an imaginary base ‘E’ to the set of actual DNA bases, and composed group Z₅ of basic translational operations on grid-points of a cruciform wallpaper pattern constructed of the five base letters. Moreover, using the same five letters, we integrated the wallpaper group as the combination of linear group over the horizontal line and the rotational group based on symmetries of a fivefold phasor diagram on the unit circle in the Gaussian plane. Additionally, changes in the sequences of the DNA bases are treated using set D, the set of all possible sequences of DNA bases that also contain ‘E’. Also, ‘D_j’s are drawn as polygonal lines graphically. Moreover, by combining group Z₅, the operators that rearrange bases of DNA sequences constitute the group B. Using these results, simple changes of sequences, insertions/deletions, and recombination of DNA bases are also treatable via a synthesis of group-theoretical operations between sets D and group B. Together with this, all results obtained for DNA pertain to RNA by replacing T with U. Using these tools, category theory-like language is introduced to describe the canonical “central dogma” that is expected to integrate DNA-based processes, although the overall profile and range of applicability is unclear at this stage. Alternatively, by introducing the manipulations ‘{ }’ and ‘< >’, operations on states of ‘E’s in ‘D_j’s/‘R_j’s, whether explicit or not, can be performed in parallel with the conventional description for DNA/RNA sequences.

Discussion

The issue treated in this article is, roughly speaking, the combination of two ideas: one is the wallpaper pattern in the context of DNA sequencing ‘§1 – §7’ , and the other is the tentative development towards systematization of molecular/genetic biology in the style of some category-theory-like description ‘§8’. Essentially, the two are different topics although strongly connected. The former is an independent study on symmetry modeling of DNA sequences, whereas the latter can be re-expressed using different material as long as the basic elements can be treated within a category-theory-like model satisfying group theory-like postulates.

In this article, we considered a group/category theory-like treatment devising an expedient ‘E’, grid-point array, and group operations to move over the array. We discussed whether and how a more synthesized description can be constructed, using simplistic postulates of group. Next, we take the basics of category theory to describe processes, although this is only at a preliminary stage.

For an application of our ideas, we have chosen DNA sequencing from the perspective of not only coding sequences of DNA bases but also describing insertions/deletions of DNA sequences using a single operation that is an element of a group. The expedient ‘E’ permits inserting and/or deleting sequences depending on the purpose. Specific notation was introduced so that vector-like DNA sequences and operations can be composed as a set and a group.

Ordinarily, a method to describe DNA sequences is limited in scope by focusing on only one aspect, such as recognizing each base sequentially (e.g., A, G, T, A, C… from ‘AGTAC…’) [13, 33–35], where an operation like ‘rotation’, ‘transition’ or ‘conversion’ based on a certain solid is often used. Another focus of attention is the rules for interpretation of codons in synthesizing proteins from DNA sequences. The rules are defined to capture the specific activity from the viewpoint of group-like operations [18–22]. The present model also enables us to treat three manipulations (changes, insertions/deletions, and recombination) as one type of operation in the group, with easily-imaginable graphic displays such as Figure 4, although it is only an accessorial tool at this stage. Increasing the degree of freedom by one and integrating changes of coding and sequence recombination might yield some polysemous utility.

When inserting/deleting sequences of bases into the main DNA sequence, even if the endpoints of the base series are identified precisely, it appears that manipulations via ‘E’s are not always necessary. Nonetheless, to determine the final order of the bases in these cases, we must track base changes from one to the next (including ‘E’, even when lost or deleted). If we use the ‘E’-assisted manipulations for coding, we need only to examine the inclusions; the rest remains unchanged in order. Additionally, we assume that when any operation is performed, the position and number of ‘E’s should be fixed so that the order of any component of ‘D_j’ or ‘B_j’ is not changed, at least, during operations (e.g., (A.7)). The exception is specifically the insertion/deletion of ‘E’s such as in (29–34) and (36–45).

We briefly point out the notational benefits of the imaginary ‘E’s. There are three: 1) adjusting the number of base types in DNAs and RNAs from four to five, thus enabling group-theoretical composition over the (two-dimensional) plane; 2) linking the notational sequences of DNAs and mRNAs in a single format that can be used in a more compact database to record and analyze genetic information; and 3) expressing sequences of DNAs/RNAs as a vector in three different ways: a) with explicit ‘E’s in the vector, b) with implicit ‘E’s in the vector, and c) with all ‘E’s omitted in the vector except the trailing ‘E’s. The last offers flexibility in storing world-wide genetic data in a single set. We suggest that exhaustiveness is one of the potentialities of the model, adding versatility in addressing the possible behaviors of DNA/RNA sequences. While that might be far from practical application at this moment, a more rigorous methodology in the near future may provide the means.

Regarding the style of the grid-point/cruciform/wallpaper pattern (Figure 1) in defining the group postulates, one of its advantages is that each base is surrounded by the four others. This symmetrical simplicity is absent in the linear group and the rotational group (Figures 1, 5 and 6), where the relative position of the five bases is fixed and thereby restrictive. Also, the number ‘5’ might be crucial: it enables composition of the sort provided by the wallpaper group using the cruciform, and it supplies the identity element necessary to satisfy the group postulates. Being a prime, ‘5’ will be convenient in further developments of the model exploiting algebraic structures such as rings or fields.

A similar synthesis might be possible between a modulo 7 additive rotational group based on a sevenfold phasor diagram and a space group defined by the six directions ‘forward/backward’, ‘up/down’ and ‘left/right’. In practice, a space group is formed that consists of three orthogonal cruciforms comprising the six directions (±x, ±y, ±z) with seven elemental operations {m_u (up), m_d (down), m_r (right), m_l (left), m_f (forward), m_b (backward) and m_n (no movement)}. These determine the operations of the group, which permute seven bases (a prime number) or seven letter-like constituents. Analogous to the fivefold phasor diagram, we draw equispaced elements on the unit circle over the Gaussian plane; suppose ‘φ = 2π/7 (rad)’, then the set {φ₁(= φ), φ₂(= 2φ), φ₃(= 3φ), φ₄(= 4φ), φ₅(= 5φ), φ₆(= 6φ), φ₀(= φ₇ = 0)} parameterizes the rotational group [51], and both are, at least, in partial correspondence. We presume that, by extension, it might be possible to bring together an n-dimensional space group (using the 2n + 1 elements associated with the 2n directions along the ± axes and E) and a rotational group based on the (2n + 1)-fold phasor diagram on the unit circle (with 2n + 1 elements as the vertices of a polygon). For this article, we have focused only on ‘n = 2’ in §1–§8.
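
The counting argument in this paragraph can be illustrated with a minimal Python sketch. It only checks the rotational side: that 2n + 1 equispaced phasors on the unit circle compose like addition modulo 2n + 1. The function names are hypothetical, and no assignment of the six spatial directions to particular residues is assumed, since the article does not specify one.

```python
import cmath

def phasor_group(n_axes):
    """Return the 2n + 1 equispaced unit phasors for an n-axis cruciform."""
    order = 2 * n_axes + 1  # 2n directions plus the 'no movement' element
    return [cmath.exp(2j * cmath.pi * k / order) for k in range(order)]

def closed_under_rotation(phasors):
    """Check that phasor j times phasor k equals phasor (j + k) mod order."""
    order = len(phasors)
    return all(
        cmath.isclose(phasors[j] * phasors[k], phasors[(j + k) % order], abs_tol=1e-12)
        for j in range(order)
        for k in range(order)
    )

# n = 2 is the five-letter case of this article; n = 3 is the sevenfold case.
for n_axes in (2, 3):
    phasors = phasor_group(n_axes)
    print(n_axes, len(phasors), closed_under_rotation(phasors))
```

For n = 2 the list has five phasors and for n = 3 it has seven, matching the element counts stated above.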

Apart from the above, the model based on the wallpaper pattern might have a close relationship with ‘cellular automata’ [52]. Appropriate definitions of the wallpaper pattern for the five bases might find an expression between groups and cellular automata [53].

One consideration concerns whether a more integrated/synthesized style to describe biomolecular processes is possible using only simple, primitive defining rules, in particular, when describing genetic processes such as DNA transcription and RNA synthesis of proteins. Whereas the group postulates might be too restrictive to define molecular behavior, category postulates might enable such schemes to proceed because they are weaker than those defining a group. If the interpretation of DNA by messenger RNA is definable within category theory, and protein synthesis is expressible within the same theory, there might be advantages in having the molecular system classified and treated in a reduced size in the database. At least, we conjecture that these ideas might be useful in identifying phenomena that are impossible for changes of DNA sequences, thereby reducing unnecessary effort or roundabout paths that might encroach on researchers’ limited time for investigation. That issue might be avoided if the impossibility of certain themes was known beforehand. From this standpoint, we believe that a mathematical systematization (in a general and unexceptional manner) is crucially important for future molecular/genetic biology.

The limitations of the present model should be noted. First, the wallpaper pattern drawn in Figure 1 is one example of various patterns. In general, the wallpaper groups have been classified into seventeen categories [2, 44–47]. There could be other types of patterns like Figure 1 and groups upon which to compose this sort of model. For instance, if we exchange all ‘A’s for all ‘C’s, and all ‘G’s for all ‘T’s in the model presented in this article, an almost equivalent model ‘§1–§8’ is constructed. Other arrangements might provide still unknown advantages that enable models like ours to be treated in a more rational manner. It remains unclear how to construct an optimal method to determine models yielding the wallpaper pattern of Figure 1 and the bijection given in Figure 3, and to develop the categories presented in Figures 7 and 8. The best positions of the five bases should be examined under a rigorous methodology.

Second, a Cartesian vector is defined as a combination of components on which operations are conducted independently. Indeed, we can perform operations on the i-th component of D_j of set D using the (i + 1)-th component of B_j by adding an ‘E’ any place before the (i − 1)-th component of D_j. This is because the components after the i-th of B_j shift to the right within the vector B_j. Therefore, in appearance, the components at different positions are essentially spectators (e.g., a base ‘C₃’ cannot change into either ‘A₅’ or ‘G₅’ by any B_j except via ‘E’-assisted manipulations). In this case, after the insertion of two ‘E’s between the ‘2’ and ‘3’ components, ‘C₃₊₂ (= C₅)’ can become either ‘A₅’ or ‘G₅’ when the appropriate component of B_j acts on C₅. However, that might raise some confusion. For ‘E’-assisted operations (such as (29–34), (38), (40)), the results might change according to the position and number of inserted/deleted ‘E’s, which produce mismatches between the i-th component of ‘D_j’ and that of ‘B_j’. We believe further studies are warranted to find a descriptive format for the model.
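
The alignment issue described in this limitation can be made concrete with a short Python sketch. It assumes the residue assignment implied by (A.2)–(A.6), namely E, C, A, T, G ↔ 0, 1, 2, 3, 4 and n, r, u, d, l ↔ 0, 1, 2, 3, 4 (mod 5); the finite strings and the helper names `act` and `insert_e` are illustrative only, with the trailing ‘E’s truncated.

```python
BASE_RESIDUE = {"E": 0, "C": 1, "A": 2, "T": 3, "G": 4}
OP_RESIDUE = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}
RESIDUE_BASE = {v: k for k, v in BASE_RESIDUE.items()}

def act(sequence, operator):
    """Apply operator component i to sequence component i (D ∘ B)."""
    return "".join(
        RESIDUE_BASE[(BASE_RESIDUE[b] + OP_RESIDUE[o]) % 5]
        for b, o in zip(sequence, operator)
    )

def insert_e(sequence, after_position, count):
    """Insert `count` letters 'E' after the 1-based position `after_position`."""
    return sequence[:after_position] + "E" * count + sequence[after_position:]

operator = "nnnnrnn"               # only component 5 is non-trivial (r)
padded = "ACCGT" + "EE"            # C sits at position 3, T at position 5
shifted = insert_e("ACCGT", 2, 2)  # two E's after position 2: C moves to position 5

print(act(padded, operator))       # component 5 acts on T
print(act(shifted, operator))      # component 5 now acts on the shifted C
```

The same operator therefore changes a different real base depending on where the ‘E’s were inserted, which is the mismatch between the i-th components of ‘D_j’ and ‘B_j’ noted above.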

Third, as for the graphical displays of ‘D’s in Figure 4, although the sequences of ‘D_m’s (m = 1, 2, 3) are in reality the same, the respective expressions are not always unique because the presence of the imaginary ‘E’s changes the shape of each sequence; e.g., ‘D₁ = [A₁C₂C₃( )G₄T₅E₆E₇E₈…]’ and ‘D₂ = [A₁C₂C₃(E₄E₅)G₆T₇E₈E₉E₁₀…]’ are different over the wallpaper pattern despite being equivalent as real sequences. Although, with electronic tools, these graphics might be versatile for the detection or identification of DNA sequences, they might produce other confusions in their present form. We hope that more appropriate devices will be developed in future studies.
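
The distinction between equality as ‘E’-padded vectors and equivalence as real sequences can be sketched in Python as follows. The strings are finite truncations of the ‘D₁’ and ‘D₂’ quoted above, and `real_sequence` is a hypothetical helper, not part of the article's notation.

```python
def real_sequence(vector):
    """Drop every imaginary 'E' and keep only the real bases."""
    return vector.replace("E", "")

D1 = "ACCGTEEE"    # [A1 C2 C3 ( ) G4 T5 E6 E7 E8 ...], truncated
D2 = "ACCEEGTEEE"  # [A1 C2 C3 (E4 E5) G6 T7 E8 E9 E10 ...], truncated

same_vector = D1 == D2                              # compared position by position
same_real = real_sequence(D1) == real_sequence(D2)  # compared after removing E's
print(same_vector, same_real)
```

A comparison of this kind is one simple way to group the different graphical expressions of a single real sequence.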

Fourth, DNA transcription to RNA and/or mRNA and translation of RNA and/or mRNA into proteins at the ribosomes are performed using a grammar rule based on a three-base set called a ‘codon’. Codons carry the information needed to specify the twenty standard amino acids from which proteins are synthesized; for example, ‘CAG’ codes for ‘glutamine’. As mentioned before, a number of approaches have been proposed that exploit group-theoretic methods. These cover the rules for composition of a triplet of bases ‘XXX’, the ways of reading codons, and models to compose geometric solids such as the tetrahedron and hexahedron [12, 18–21, 33, 34, 50]. The rules for treating this aspect (transcription and translation of the information in DNA bases) are not established in the present article. In addition, there are specific types of codon, such as ‘TAA’, ‘TGA’, and ‘TAG’, which are presently classified as ‘stop’ or ‘halt’ commands. Aside from this, there are various rules related to biogenetic activities such as DNA repair, alternative splicing, transposition, and translocation. These specific characterizations are lacking in our model, so further improvements on this issue are desirable.

Fifth, the traditional symmetry model of DNA bases is often based on the chemical types ‘purine/pyrimidine’, ‘amino/keto’, and ‘strong/weak hydrogen bonding’ using biomolecular characteristics, which often have advantages for their treatment where three-dimensional graphics aid the imagination and ‘matricized’ expressions are possible [29, 35, 36]. In our model, we merely use a rule for complementary pairing in §4 and §5. No restriction on couplings between ‘C, A, T, G and E’ is postulated in the present article. A number of combinations might therefore occur in which non-realistic pairings of bases (e.g., ‘A-G’, ‘C-C’, and ‘T-E’) make applications futile and wasteful. We hope that future studies can solve this problem.

Sixth, the article may contain too many speculative conjectures about hypothetical situations, which remain to be established as scientific facts using verified methods. Thus, a more rigorous examination for a rational style with a more effective methodology is necessary.

Our model is far from a complete systematization. However, we believe that a breakthrough in principle must be pursued if a descriptive model is to be systemized, and that appropriately devised definitions might help to systemize molecular/genetic biology in a more optimized and sophisticated manner, making a significant contribution to the field.

Conclusions

Within the considerable limitations of our methodology, we consider that there is fertile ground for constructing variants of the symmetry model for genetic coding based upon a specific wallpaper group. By integrating the linear group and rotational group over a specific wallpaper pattern, a more integrated formulation based on a group/category theory-like description is open to exploration in applications to a number of topics from molecular/genetic biology.

Appendix A

According to Figures 1, 3 and 6, the following relationships are confirmed straightforwardly between any bases and independently of the type of bases:

\begin{array}{l} d ● d = l ● u = u ● l = r ● n = n ● r = r, \\ r ● r = d ● l = l ● d = u ● n = n ● u = u, \\ l ● l = r ● u = u ● r = d ● n = n ● d = d, \\ u ● u = r ● d = d ● r = l ● n = n ● l = l, \\ n ● n = r ● l = l ● r = u ● d = d ● u = n \end{array}

(A.1)

Here the symbol ‘↔’ signifies ‘bijection’ and the meaning of ‘x[−1, 0]’ is explained in §4. Hence, operators that are regarded to effect changes from one base to another can be re-expressed as illustrated in the following examples for various types of component operations:

b_{[E \to C]} = b_{[C \to A]} = b_{[A \to T]} = b_{[T \to G]} = b_{[G \to E]} = r (\leftrightarrow ω_{1}) = x [1, 0],

(A.2)

b_{[E \to A]} = b_{[A \to G]} = b_{[G \to C]} = b_{[C \to T]} = b_{[T \to E]} = u (\leftrightarrow ω_{2}) = x [0, 1],

(A.3)

b_{[E \to T]} = b_{[T \to C]} = b_{[C \to G]} = b_{[G \to A]} = b_{[A \to E]} = d (\leftrightarrow ω_{3}) = x [0, - 1],

(A.4)

b_{[E \to G]} = b_{[G \to T]} = b_{[T \to A]} = b_{[A \to C]} = b_{[C \to E]} = l (\leftrightarrow ω_{4}) = x [- 1, 0],

(A.5)

b_{[E \to E]} = b_{[C \to C]} = b_{[A \to A]} = b_{[T \to T]} = b_{[G \to G]} = n (\leftrightarrow ω_{0} (= no rotation) = ω_{5}) = x [0, 0]

(A.6)
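
The relations (A.1)–(A.6) can be checked mechanically. The following Python sketch assumes the residue assignment read off from (A.2)–(A.6), that is n, r, u, d, l ↔ 0, 1, 2, 3, 4 and E, C, A, T, G ↔ 0, 1, 2, 3, 4 (mod 5); the names `compose` and `act` are hypothetical helpers standing for ‘●’ and ‘∘’.

```python
OP_RESIDUE = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}
BASE_RESIDUE = {"E": 0, "C": 1, "A": 2, "T": 3, "G": 4}
RESIDUE_OP = {v: k for k, v in OP_RESIDUE.items()}
RESIDUE_BASE = {v: k for k, v in BASE_RESIDUE.items()}

def compose(a, b):
    """Return the single operation equal to a ● b."""
    return RESIDUE_OP[(OP_RESIDUE[a] + OP_RESIDUE[b]) % 5]

def act(base, op):
    """Return the letter reached from `base` under `op` (base ∘ op)."""
    return RESIDUE_BASE[(BASE_RESIDUE[base] + OP_RESIDUE[op]) % 5]

# Rows of (A.1): each two-letter product must equal the key of its row.
A1_ROWS = {
    "r": ["dd", "lu", "ul", "rn", "nr"],
    "u": ["rr", "dl", "ld", "un", "nu"],
    "d": ["ll", "ru", "ur", "dn", "nd"],
    "l": ["uu", "rd", "dr", "ln", "nl"],
    "n": ["nn", "rl", "lr", "ud", "du"],
}
a1_holds = all(
    compose(p[0], p[1]) == result
    for result, products in A1_ROWS.items()
    for p in products
)

# (A.2)-(A.6): each pair "XY" means b[X -> Y] equals the key of its row.
TRANSITIONS = {
    "r": ["EC", "CA", "AT", "TG", "GE"],
    "u": ["EA", "AG", "GC", "CT", "TE"],
    "d": ["ET", "TC", "CG", "GA", "AE"],
    "l": ["EG", "GT", "TA", "AC", "CE"],
    "n": ["EE", "CC", "AA", "TT", "GG"],
}
transitions_hold = all(
    act(t[0], op) == t[1]
    for op, pairs in TRANSITIONS.items()
    for t in pairs
)
print(a1_holds, transitions_hold)
```

Under this assignment every relation listed in (A.1)–(A.6) reduces to addition modulo 5, which is why they hold independently of the type of base.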

Appendix B

As for B,

1) Associativity: ‘(B_j●B_k)●B_l = B_j●(B_k●B_l)’ holds for all positive integers j, k and l.
2) Identity: ‘B₀ = [n₁|n₂|n₃| … |n_i| … |n_{N−1}|n_N|n_{N+1}|n_{N+2}|n_{N+3}| …]’ is an identity element that satisfies ‘B₀●B_m = B_m●B₀ = B_m’ (i = 1, 2, 3, …; ‘n_i (= n)’ is an element of Z₅, i.e., no movement of the point P).
3) Inverses: there exists a unique ‘B_m⁻¹’ that satisfies ‘B_m⁻¹●B_m = B_m●B_m⁻¹ = B₀’. Actually, the components of the inverse are the inverses of each individual component.
4) Commutativity: ‘B_j●B_k = B_k●B_j’.
5) Closure law: any ‘B_j●B_k’ belongs to the set B.
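
The five postulates can be verified exhaustively on a finite truncation of B. The Python sketch below assumes operators of length 2 (all later components being ‘n’) and the residue assignment n, r, u, d, l ↔ 0, 1, 2, 3, 4; the truncation length and helper names are illustrative choices, not part of the article.

```python
from itertools import product

OP_RESIDUE = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}
RESIDUE_OP = {v: k for k, v in OP_RESIDUE.items()}

def compose(bj, bk):
    """Componentwise product B_j ● B_k."""
    return tuple(RESIDUE_OP[(OP_RESIDUE[a] + OP_RESIDUE[b]) % 5] for a, b in zip(bj, bk))

def inverse(b):
    """Componentwise inverse: each component is replaced by its own inverse."""
    return tuple(RESIDUE_OP[(-OP_RESIDUE[a]) % 5] for a in b)

LENGTH = 2  # truncation length; every later component is taken to be n
ELEMENTS = set(product("nrudl", repeat=LENGTH))
IDENTITY = ("n",) * LENGTH

associative = all(
    compose(compose(a, b), c) == compose(a, compose(b, c))
    for a in ELEMENTS for b in ELEMENTS for c in ELEMENTS
)
has_identity = all(compose(IDENTITY, a) == a == compose(a, IDENTITY) for a in ELEMENTS)
has_inverses = all(compose(inverse(a), a) == IDENTITY == compose(a, inverse(a)) for a in ELEMENTS)
commutative = all(compose(a, b) == compose(b, a) for a in ELEMENTS for b in ELEMENTS)
closed = all(compose(a, b) in ELEMENTS for a in ELEMENTS for b in ELEMENTS)
print(associative, has_identity, has_inverses, commutative, closed)
```

Because the product acts on each component separately, the same checks pass for any truncation length.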

Appendix C

\begin{array}{l} D_{2} \circ B_{(2 \to 3)} ● B_{(3 \to 4)} \\ = [A_{1} |C_{2}| C_{3} (E_{4} | E_{5}) G_{6} |T_{7}| E_{8} |E_{9}| \dots] \\ \circ [n_{1} |n_{2}| n_{3} (u_{4} | d_{5}) n_{6} |n_{7}| n_{8} |n_{9}| \dots]●[l_{1} |u_{2}| d_{3} (r_{4} | d_{5}) n_{6} |l_{7}| n_{8} |n_{9}| \dots], \\ = [A_{1} \circ n_{1} ● l_{1} |C_{2} \circ n_{2} ● u_{2}| C_{3} \circ n_{3} ● d_{3} (E_{4} \circ u_{4} ● r_{4} | E_{5} \circ d_{5} ● d_{5}) G_{6} \circ n_{6} ● n_{6} |T_{7} \circ n_{7} ● l_{7}| E_{8} \circ n_{8} ● n_{8} |E_{9} \circ n_{9} ● n_{9}| \dots] . \end{array}

(A.7)

Again, with reference to Figure 1 or Appendix B,

\begin{array}{l} = [A_{1} \circ l_{1} |C_{2} \circ u_{2}| C_{3} \circ d_{3} (E_{4} \circ d_{4} | E_{5} \circ r_{5}) G_{6} \circ n_{6} |T_{7} \circ l_{7}| E_{8} \circ n_{8} |E_{9} \circ n_{9}| \dots], \\ = [C_{1} |T_{2}| G_{3} (T_{4} | C_{5}) G_{6} |A_{7}| E_{8} |E_{9}| \dots] = D_{4} . \end{array}

(A.8)
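
The worked example (A.7)–(A.8) can be reproduced with a few lines of Python. The sketch assumes the residue assignment E, C, A, T, G ↔ 0, 1, 2, 3, 4 and n, r, u, d, l ↔ 0, 1, 2, 3, 4 (mod 5), truncates every vector to its first nine components, and uses hypothetical helper names.

```python
BASE_RESIDUE = {"E": 0, "C": 1, "A": 2, "T": 3, "G": 4}
OP_RESIDUE = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}
RESIDUE_BASE = {v: k for k, v in BASE_RESIDUE.items()}
RESIDUE_OP = {v: k for k, v in OP_RESIDUE.items()}

def compose(bj, bk):
    """Componentwise product of two operator vectors (B_j ● B_k)."""
    return "".join(RESIDUE_OP[(OP_RESIDUE[a] + OP_RESIDUE[b]) % 5] for a, b in zip(bj, bk))

def act(sequence, operator):
    """Componentwise action of an operator vector on a sequence (D ∘ B)."""
    return "".join(
        RESIDUE_BASE[(BASE_RESIDUE[s] + OP_RESIDUE[o]) % 5]
        for s, o in zip(sequence, operator)
    )

D2 = "ACCEEGTEE"        # [A1|C2|C3(E4|E5)G6|T7|E8|E9|...]
B_2_TO_3 = "nnnudnnnn"  # [n1|n2|n3(u4|d5)n6|n7|n8|n9|...]
B_3_TO_4 = "ludrdnlnn"  # [l1|u2|d3(r4|d5)n6|l7|n8|n9|...]

combined = compose(B_2_TO_3, B_3_TO_4)  # first line of (A.8)
D4 = act(D2, combined)                  # second line of (A.8)
print(combined, D4)
```

Composing the two operators first and then acting once gives the same ‘D₄’ as acting with them one after the other.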

Appendix D

Naturally, the series D_k is generated through the following sequence of operations:

\begin{array}{l} B_{(j \to k)} = [{r^{4 - 1}}_{1} |{r^{3 - 2}}_{2}| {r^{1 - 0}}_{3} |\dots| {r^{4 - 1}}_{i} |\dots| {r^{1 - 3}}_{N - 1} |{r^{3 - 2}}_{N}| {r^{0 - 0}}_{N + 1} |{r^{0 - 0}}_{N + 2}| {r^{0 - 0}}_{N + 3} | \dots], \\ = [{r^{3}}_{1} |{r^{1}}_{2}| {r^{1}}_{3} |\dots| {r^{3}}_{i} |\dots| {r^{- 2}}_{N - 1} |{r^{1}}_{N}| {r^{0}}_{N + 1} |{r^{0}}_{N + 2}| {r^{0}}_{N + 3} | \dots], \\ = [{r^{3}}_{1} |{r^{1}}_{2}| {r^{1}}_{3} |\dots| {r^{3}}_{i} |\dots| {r^{- 2 + 5}}_{N - 1} |{r^{1}}_{N}| {r^{0}}_{N + 1} |{r^{0}}_{N + 2}| {r^{0}}_{N + 3} | \dots], \\ = [{r^{3}}_{1} |{r^{1}}_{2}| {r^{1}}_{3} |\dots| {r^{3}}_{i} |\dots| {r^{3}}_{N - 1} |{r^{1}}_{N}| {r^{0}}_{N + 1} |{r^{0}}_{N + 2}| {r^{0}}_{N + 3} | \dots] . \end{array}

Then,

\begin{array}{l} D_{j} \circ B_{(j \to k)} \\ = [C_{1} |A_{2}| E_{3} |\dots| C_{i} |\dots| T_{N - 1} |A_{N}| E_{N + 1} |E_{N + 2}| E_{N + 3} | \dots] \circ [{r^{3}}_{1} |{r^{1}}_{2}| {r^{1}}_{3} |\dots| {r^{3}}_{i} | \dots \\ \dots |{r^{3}}_{N - 1}| {r^{1}}_{N} |{r^{0}}_{N + 1}| {r^{0}}_{N + 2} |{r^{0}}_{N + 3}| \dots], \\ = [E \circ {r^{1}}_{1} |E \circ {r^{2}}_{2}| E \circ {r^{0}}_{3} |\dots| E \circ {r^{1}}_{i} |\dots| E \circ {r^{3}}_{N - 1} |E \circ {r^{2}}_{N}| E \circ {r^{0}}_{N + 1} |E \circ {r^{0}}_{N + 2}| E \circ {r^{0}}_{N + 3} | \dots] \\ \circ [{r^{3}}_{1} |{r^{1}}_{2}| {r^{1}}_{3} |\dots| {r^{3}}_{i} |\dots \dots| {r^{3}}_{N - 1} |{r^{1}}_{N}| {r^{0}}_{N + 1} |{r^{0}}_{N + 2}| {r^{0}}_{N + 3} | \dots] \\ = [E \circ {r^{1 + 3}}_{1} |E \circ {r^{2 + 1}}_{2}| E \circ {r^{0 + 1}}_{3} |\dots| E \circ {r^{1 + 3}}_{i} | \dots \\ \dots |E \circ {r^{3 + 3}}_{N - 1}| E \circ {r^{2 + 1}}_{N} |E \circ {r^{0}}_{N + 1}| E \circ {r^{0}}_{N + 2} |E \circ {r^{0}}_{N + 3}| \dots], \\ = [E \circ {r^{1 + 3}}_{1} |E \circ {r^{2 + 1}}_{2}| E \circ {r^{0 + 1}}_{3} |\dots| E \circ {r^{1 + 3}}_{i} | \dots \\ \dots |E \circ {r^{3 + 3}}_{N - 1}| E \circ {r^{3}}_{N} |E \circ {r^{0}}_{N + 1}| E \circ {r^{0}}_{N + 2} |E \circ {r^{0}}_{N + 3}| \dots], \\ = [E \circ {r^{4}}_{1} |E \circ {r^{3}}_{2}| E \circ {r^{1}}_{3} |\dots| E \circ {r^{4}}_{i} |\dots| E \circ {r^{6}}_{N - 1} |E \circ {r^{3}}_{N}| E \circ {r^{0}}_{N + 1} |E \circ {r^{0}}_{N + 2}| E \circ {r^{0}}_{N + 3} | \dots], \\ = [E \circ {r^{4}}_{1} |E \circ {r^{3}}_{2}| E \circ {r^{1}}_{3} |\dots| E \circ {r^{4}}_{i} |\dots| E \circ {r^{6 - 5}}_{N - 1} |E \circ {r^{3}}_{N}| E \circ {r^{0}}_{N + 1} |E \circ {r^{0}}_{N + 2}| E \circ {r^{0}}_{N + 3} | \dots], \\ = [E \circ {r^{4}}_{1} |E \circ {r^{3}}_{2}| E \circ {r^{1}}_{3} |\dots| E \circ {r^{4}}_{i} |\dots| E \circ {r^{1}}_{N - 1} |E \circ {r^{3}}_{N}| E \circ {r^{0}}_{N + 1} |E \circ {r^{0}}_{N + 2}| E \circ {r^{0}}_{N + 3} | \dots], \\ = [G_{1} |T_{2}| C_{3} |\dots| 
G_{i} |\dots| C_{N - 1} |T_{N}| E_{N + 1} |E_{N + 2}| E_{N + 3} | \dots] = D_{k} . \end{array}

(A.9)
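
The construction in (A.9) amounts to taking, at each position, the exponent of r equal to the difference of the target and source residues modulo 5. A minimal Python sketch follows, assuming E, C, A, T, G ↔ r⁰, r¹, r², r³, r⁴ as used above; the finite strings keep only the components written out explicitly in (A.9) (the elided ‘…’ parts are dropped), and the function names are hypothetical.

```python
BASE_RESIDUE = {"E": 0, "C": 1, "A": 2, "T": 3, "G": 4}
RESIDUE_BASE = {v: k for k, v in BASE_RESIDUE.items()}

def exponents(dj, dk):
    """Exponent of r at each position of B_(j->k): (target - source) mod 5."""
    return [(BASE_RESIDUE[b] - BASE_RESIDUE[a]) % 5 for a, b in zip(dj, dk)]

def apply_exponents(dj, exps):
    """Act on each component of D_j with the corresponding power of r."""
    return "".join(RESIDUE_BASE[(BASE_RESIDUE[a] + e) % 5] for a, e in zip(dj, exps))

D_J = "CAECTAEEE"  # explicit components of D_j, trailing E's truncated
D_K = "GTCGCTEEE"  # explicit components of D_k, trailing E's truncated

B_J_TO_K = exponents(D_J, D_K)
print(B_J_TO_K)
print(apply_exponents(D_J, B_J_TO_K))
```

The reduction of r⁻² to r³ in (A.9) corresponds to the `% 5` in `exponents`.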

Appendix E

\begin{array}{l} D_{j} \circ B_{(j \to j †)} \\ = [E \circ x [0, 1]_{1} | E \circ x [0, - 1]_{2} | E \circ x [0, 0]_{3} | \dots | E \circ x [1, 0]_{i} | \dots \\ \dots | E \circ x [- 1, 0]_{N - 1} | E \circ x [0, 1]_{N} | E \circ x [0, 0]_{N + 1} | E \circ x [0, 0]_{N + 2} | E \circ x [0, 0]_{N + 3} | \dots] \\ \circ [x [0, - 2]_{1} | x [0, 2]_{2} | x [0, 0]_{3} | \dots | x [- 2, 0]_{i} | \dots | x [2, 0]_{N - 1} | x [0, - 2]_{N} \\ | x [0, 0]_{N + 1} | x [0, 0]_{N + 2} | x [0, 0]_{N + 3} | \dots], \\ = [E \circ r^{0} ● {u^{1}}_{1} | E \circ r^{0} ● {u^{- 1}}_{2} | E \circ r^{0} ● {u^{0}}_{3} | \dots | E \circ r^{1} ● {u^{0}}_{i} | \dots \\ \dots | E \circ r^{- 1} ● {u^{0}}_{N - 1} | E \circ r^{0} ● {u^{1}}_{N} | E \circ r^{0} ● {u^{0}}_{N + 1} | E \circ r^{0} ● {u^{0}}_{N + 2} | E \circ r^{0} ● {u^{0}}_{N + 3} | \dots] \\ \circ [r^{0} ● {u^{- 2}}_{1} | r^{0} ● {u^{2}}_{2} | r^{0} ● {u^{0}}_{3} | \dots | r^{- 2} ● {u^{0}}_{i} | \dots | r^{2} ● {u^{0}}_{N - 1} | r^{0} ● {u^{- 2}}_{N} | r^{0} ● {u^{0}}_{N + 1} | r^{0} ● {u^{0}}_{N + 2} | r^{0} ● {u^{0}}_{N + 3} | \dots], \\ = [E \circ r^{0} ● u^{1} ● r^{0} ● {u^{- 2}}_{1} | E \circ r^{0} ● u^{- 1} ● r^{0} ● {u^{2}}_{2} | E \circ r^{0} ● u^{0} ● r^{0} ● {u^{0}}_{3} | \dots | E \circ r^{1} ● u^{0} ● r^{- 2} ● {u^{0}}_{i} | \dots \\ \dots | E \circ r^{- 1} ● u^{0} ● r^{2} ● {u^{0}}_{N - 1} | E \circ r^{0} ● u^{1} ● r^{0} ● {u^{- 2}}_{N} | E \circ r^{0} ● u^{0} ● r^{0} ● {u^{0}}_{N + 1} | E \circ r^{0} ● u^{0} ● r^{0} ● {u^{0}}_{N + 2} | E \circ r^{0} ● u^{0} ● r^{0} ● {u^{0}}_{N + 3} | \dots], \\ = [E \circ r^{0} ● {u^{- 1}}_{1} | E \circ r^{0} ● {u^{1}}_{2} | E \circ r^{0} ● {u^{0}}_{3} | \dots | E \circ r^{- 1} ● {u^{0}}_{i} | \dots \\ \dots | E \circ r^{1} ● {u^{0}}_{N - 1} | E \circ r^{0} ● {u^{- 1}}_{N} | E \circ r^{0} ● {u^{0}}_{N + 1} | E \circ r^{0} ● {u^{0}}_{N + 2} | E \circ r^{0} ● {u^{0}}_{N + 3} | \dots], \\ = [E \circ r^{0} ● {d^{1}}_{1} | E \circ r^{0} ● {u^{1}}_{2} | E \circ r^{0} ● {u^{0}}_{3} | \dots | E \circ l^{1} ● {u^{0}}_{i} | \dots \\ \dots | E \circ r^{1} ● {u^{0}}_{N - 1} | E \circ r^{0} ● {d^{1}}_{N} | E \circ r^{0} ● {u^{0}}_{N + 1} | E \circ r^{0} ● {u^{0}}_{N + 2} | E \circ r^{0} ● {u^{0}}_{N + 3} | \dots], \\ = [T_{1} | A_{2} | E_{3} | \dots | G_{i} | \dots | C_{N - 1} | T_{N} | E_{N + 1} | E_{N + 2} | E_{N + 3} | \dots] \\ = {D_{j}}^{†} . \end{array}

(A.10)
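
The derivation (A.10) can be sketched in Python using the coordinate notation x[a, b] = r^a ● u^b. The sketch assumes the coordinates E = x[0, 0], C = x[1, 0], A = x[0, 1], T = x[0, −1], G = x[−1, 0] read off from (A.2)–(A.6), and it resolves a general x[a, b] to a letter through the residue a + 2b (mod 5), which follows from r ↔ ω₁ and u ↔ ω₂. The finite string keeps only the components written out explicitly in (A.10); the helper names are hypothetical.

```python
COORD = {"E": (0, 0), "C": (1, 0), "A": (0, 1), "T": (0, -1), "G": (-1, 0)}
RESIDUE_BASE = {0: "E", 1: "C", 2: "A", 3: "T", 4: "G"}

def letter_at(a, b):
    """Letter reached from E by x[a, b] = r^a ● u^b (r <-> 1, u <-> 2 mod 5)."""
    return RESIDUE_BASE[(a + 2 * b) % 5]

def complement_operator(sequence):
    """Operator B_(j -> j†): each base x[a, b] is paired with x[-2a, -2b]."""
    return [(-2 * COORD[s][0], -2 * COORD[s][1]) for s in sequence]

def act(sequence, operator):
    """Add the operator's coordinates to each base's coordinates, componentwise."""
    return "".join(
        letter_at(COORD[s][0] + a, COORD[s][1] + b)
        for s, (a, b) in zip(sequence, operator)
    )

D_J = "ATECGAEEE"  # explicit components of D_j in (A.10), truncated
B_J_TO_DAGGER = complement_operator(D_J)
D_J_DAGGER = act(D_J, B_J_TO_DAGGER)
print(B_J_TO_DAGGER[:2], D_J_DAGGER)
```

Each real base is sent to the point opposite it across ‘E’ on the cruciform, so A and T are exchanged, C and G are exchanged, and ‘E’ is fixed.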

Appendix F

The axioms are:

I) A binary operation and closure law: the combination of two morphisms satisfies hom(X, X) × hom(X, Y) → hom(X, Y). Moreover, hom(X, Y) × hom(Y, Z) → mor(X, Z) and hom(Y, Z) × hom(Z, Zs) → mor(Y, Zs) both hold.
II) Associativity: if f: X → X, ρ: X → Y, τ: Y → Z, g: X → Z, h: Z → Z, and j: Z → Zs, then ‘f●(ρ●τ) = (f●ρ)●τ’, ‘ρ●(τ●h) = (ρ●τ)●h’, ‘f●(g●h) = (f●g)●h’, and ‘τ●(h●j) = (τ●h)●j’ hold.
III) Identity: there exist morphisms ‘1_X, 1_Y, 1_Z, 1_Zs’ such that ‘1_X●f = f = f●1_X’, ‘1_Y●ρ = ρ = ρ●1_X’, ‘1_Z●τ = τ = τ●1_Y’, ‘1_Z●g = g = g●1_X’, ‘1_Z●h = h = h●1_Z’, and ‘1_Zs●j = j = j●1_Z’. In practice, the following satisfies these conditions:

1_{X} = 1_{Y} = 1_{Z} = 1_{Zs} = [n_{1} | n_{2} | n_{3} | \dots | n_{i} | \dots | n_{N - 1} | n_{N} | \dots]

(A.11)

Appendix G

For Category C₁,

\begin{array}{l} morphism f_{1} (= B_{1} (\in group B_{1})) : X_{1} \to X_{1}, \\ morphism ρ_{1} : X_{1} \to Y_{1}, \\ morphism τ_{1} : Y_{1} \to Z_{1}, \\ morphism g_{1} (= ρ_{1} ● τ_{1}) : X_{1} \to Z_{1}, \\ morphism h_{1} (= B_{1} (\in group B_{1})) : Z_{1} \to Z_{1}, \\ morphism j_{1} : Z_{1} \to Z s_{1} . \end{array}

(A.12)

Similarly for category C₂, for each object F(X₁) = X₂, F(Y₁) = Y₂, F(Z₁) = Z₂, F(Zs₁) = Zs₂ (∈C₂), the following relationships also hold:

\begin{array}{l} morphism F (f_{1}) (= f_{2} = B_{2} (\in group B_{2})) : F (X_{1}) \to F (X_{1}), \\ morphism F (ρ_{1}) (= ρ_{2}) : F (X_{1}) \to F (Y_{1}), \\ morphism F (τ_{1}) (= τ_{2}) : F (Y_{1}) \to F (Z_{1}), \\ morphism F (g_{1}) (= g_{2} = ρ_{2} ● τ_{2}) : F (X_{1}) \to F (Z_{1}), \\ morphism F (h_{1}) (= h_{2} = B_{2} (\in group B_{2})) : F (Z_{1}) \to F (Z_{1}), \\ morphism F (j_{1}) (= j_{2}) : F (Z_{1}) \to F (Z s_{1}) . \end{array}

(A.13)

Other than these, if relationships F(f₁●ρ₁) = F(f₁)●F(ρ₁), F(ρ₁●τ₁) = F(ρ₁)●F(τ₁), F(τ₁●h₁) = F(τ₁)●F(h₁), F(f₁●g₁) = F(f₁)●F(g₁), F(g₁●h₁) = F(g₁)●F(h₁), and F(h₁●j₁) = F(h₁)●F(j₁) are satisfied, the composition of C₁ and C₂ linked with ‘functor F’ is possible although the proof is omitted here.

Furthermore, the following postulates hold: ‘F(1_X) = 1_F(X) (∈C₂)’ for object X (∈C₁), ‘F(1_Y) = 1_F(Y) (∈C₂)’ for object Y (∈C₁), and ‘F(1_Z) = 1_F(Z) (∈C₂)’ for object Z (∈C₁), under the condition:

‘ F (1_{X}) = 1_{F (X)} = F (1_{Y}) = 1_{F (Y)} = F (1_{Z}) = 1_{F (Z)} = [ω_{0 1} |ω_{0 2}| ω_{0 3} |\dots| ω_{0 i} |\dots| ω_{0 N - 1} |ω_{0 N}| \dots] ’ .

(A.14)
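
For the group-valued morphisms (those such as f₁ and h₁ that are elements of the group), the functor conditions can be illustrated numerically. The Python sketch below assumes that F acts on a single component by the bijection of Appendix A, sending the operation with residue k to the rotation ω_k = exp(2πik/5); this componentwise reading and the names `F` and `compose` are assumptions made for illustration, and the morphisms ρ, τ and j are not modeled.

```python
import cmath

OP_RESIDUE = {"n": 0, "r": 1, "u": 2, "d": 3, "l": 4}
RESIDUE_OP = {v: k for k, v in OP_RESIDUE.items()}

def compose(a, b):
    """Product a ● b in the linear (modulo 5) picture."""
    return RESIDUE_OP[(OP_RESIDUE[a] + OP_RESIDUE[b]) % 5]

def F(op):
    """Assumed image of an operation in the rotational picture: omega_k."""
    return cmath.exp(2j * cmath.pi * OP_RESIDUE[op] / 5)

# F(a ● b) = F(a) ● F(b), where ● on the rotational side is multiplication.
preserves_composition = all(
    cmath.isclose(F(compose(a, b)), F(a) * F(b), abs_tol=1e-12)
    for a in OP_RESIDUE
    for b in OP_RESIDUE
)
# F(1) = 1: the identity n is sent to omega_0 (no rotation).
preserves_identity = cmath.isclose(F("n"), 1 + 0j, abs_tol=1e-12)
print(preserves_composition, preserves_identity)
```

This covers only one component of the vectors in (A.14); the full statement applies the same map to every component.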

References

Rosen J: Symmetry rules: How science and nature are founded on symmetry. 2008, New York: Springer-Verlag, 1
Armstrong MA: Groups and symmetry, undergraduate texts in mathematics. 1988, New York: Springer-Verlag
Judson TW: Abstract Algebra: Theory and Applications. 1997, Virginia: PWS Publishing Company
Hungerford TW: Abstract Algebra, An Introduction. 1997, Philadelphia: Saunders College Publishing, 2
Tung WK: Group theory in physics. 1985, Singapore: World Scientific Pub. Co. Inc.
Ungar AA: The abstract Lorentz transformation group. Am J Phys. 1992, 60: 815-828. 10.1119/1.17063.
Sexl RU, Urbantke HK: Relativity, Groups, Particles: Special Relativity and Relativistic Symmetry in Field and Particle Physics. 2001, Wien: Springer
Hamermesh M: Group theory and its application to physical problems. 2012, New York: Dover Publications, Inc.
Ladd M: Symmetry and group theory in chemistry. 1998, Cambridge: Woodhead Publishing Limited
Derome J-R: Biological Similarity and Group Theory. J Theor Biol. 1977, 65: 369-378. 10.1016/0022-5193(77)90331-9.
Wang J: A complete symmetrical group DNA sequences and symmetry of poly-codon sequences (Ι). J Biomathematics. 2001, 16: 129-136.
Wang J: A complete symmetrical group DNA sequences and symmetry of poly-codon sequences (ΙΙ). J Biomathematics. 2001, 16: 257-265.
Chirikjian GS: Group theory and biomolecular conformation: I Mathematical and computational models. J Phys Condens Matter. 2010, 22: 323103-10.1088/0953-8984/22/32/323103.
Chirikjian GS: Mathematical aspects of molecular replacement. I. Algebraic properties of motion spaces. Acta Crystallogr A. 2011, 67: 435-436. 10.1107/S0108767311021003.
Fischer M, Klaere S, Nguyen MAT, von Haeseler A: On the group theoretical background of assigning stepwise mutations onto phylogenies. Algorithms Mol Bio. 2012, 7: 36-10.1186/1748-7188-7-36.
Bashford JD, Tsohantjis I, Jarvis PD: Codon and nucleotide assignments in a supersymmetric model of the genetic code. Phys Lett A. 1997, 233: 288-481.
Bashford JD, Tsohantjis I, Jarvis PD: A supersymmetric model for the evolution of genetic code. Proc Natl Acad Sci U S A. 1998, 95: 987-992. 10.1073/pnas.95.3.987.
Sánchez R, Morgado E, Grau R: Gene algebra from a genetic code algebraic structure. J Math Biol. 2005, 51: 431-457. 10.1007/s00285-005-0332-8.
Sánchez R, Grau R: Vector space of the extended base-triplets over the Galois field of five DNA bases alphabet. World Acad Sci Eng Technol, Int J Comp, Inf Sci Eng. 2007, 1: 5-
Sánchez R, Grau R: An algebraic hypothesis about the primeval genetic code architecture. Math Biosci. 2009, 221: 60-76. 10.1016/j.mbs.2009.07.001.
Sánchez R, Grau R: A novel Lie algebra of the genetic code over the Galois field of four DNA bases. Math Biosci. 2006, 202: 156-174. 10.1016/j.mbs.2006.03.017.
Rietman EA, Karp RL, Tuszynski JA: Review and application of group theory to molecular systems biology. Theor Biol and Med Modell. 2011, 8: 21-10.1186/1742-4682-8-21.
Korn F: Elementary Structures reconsidered: Lévi-Strauss on kinship. 2004, London: Routledge
Crick FHC: Codon-anticodon pairing: The wobble hypothesis. J Mol Biol. 1966, 19: 548-555. 10.1016/S0022-2836(66)80022-0.
Crick FHC: The origin of the genetic code. J Mol Biol. 1968, 38: 367-379. 10.1016/0022-2836(68)90392-6.
Crick FHC: On protein synthesis. In Symp Soc Exp Biol. 1958, 12: 138-163.
Bashford JD, Jarvis PD: The genetic code as a periodic table. Biosystems. 2000, 57: 147-161. 10.1016/S0303-2647(00)00097-6.
Bíró T, Czirók A, Vicsek T, Major B: An application of vector space techniques to DNA. Fractals. 1998, 6: 205-210. 10.1142/S0218348X98000250.
Beland P, Allen TF: The origin and evolution of the genetic code. J Theor Biol. 1994, 170: 359-365. 10.1006/jtbi.1994.1198.
Epstein CJ: Role of the amino-acid “code” and of selection for conformation in the evolution of proteins. Nature. 1966, 210: 25-28. 10.1038/210025a0.
Jukes TH, Osawa S: Evolutionary changes in the genetic code. Comp Biochem Physiol. 1993, B 106: 489-494.
Jukes TH: Evolution of the amino acid code: Inferences from mitochondrial codes. J Mol Evol. 1983, 19: 219-225. 10.1007/BF02099969.
Pickover CA: DNA and protein tetragrams. J Mol Graphics. 1992, 10: 2-6. 10.1016/0263-7855(92)80001-T.
Trainor LEH, Rowe GW, Szabo VL: A tetrahedral representation of poly-codon sequences and a possible origin of codon degeneracy. J Theor Biol. 1984, 108: 459-468. 10.1016/S0022-5193(84)80046-6.
Zhang R, Zhang CT: Z curves, an intuitive tool for visualizing and analyzing the DNA sequences. J Biomol Str Dyn. 1994, 11: 767-782. 10.1080/07391102.1994.10508031.
Zhang CT: A symmetrical theory of DNA sequences and its applications. J Theor Biol. 1997, 187: 297-306. 10.1006/jtbi.1997.0401.
Duplij D, Duplij S: DNA sequence representation by trianders and determinative degree of nucleotides. J Zhejiang Univ (Sci). 2005, 6B: 743-755. 10.1631/jzus.2005.B0743.
Rushdi A, Tuqan J, Strohmer T: Map-invariant spectral analysis for the identification of DNA periodicities. EURASIP J Bioinform Syst Biol. 2012, 2012: 16-10.1186/1687-4153-2012-16.
Zupan J, Randić M: Algorithm for coding DNA sequences into “spectrum-like” and “Zigzag” representations. J Chem Inf Model. 2005, 45: 309-313. 10.1021/ci040104j.
Jafarzadeh N, Iranmanesh A: A novel graphical and numerical representation for analyzing DNA sequences based on codons. MATCH Commun Math Comput Chem. 2012, 68: 611-620.
He P, Wang J: Numerical characterization of DNA primary sequence. Internet Elec J Mol Des. 2002, 1: 668-674.
Nandy A, Harle M, Basak SC: Mathematical descriptors of DNA sequences: Development and applications. ARKIVOC. 2006, ix: 211-238.
Sawamura J, Morishita S, Ishigooka J: A group-theoretical notation for disease states: an example using the psychiatric rating scale. Theor Biol Med Model. 2012, 9: 28-10.1186/1742-4682-9-28.
Martin GE: Transformation Geometry. 1983, New York: Springer-Verlag
Lockwood EH, Macmillan RH: Geometric symmetry. 1978, Cambridge (England): Cambridge University press
Liu Y, Collins RT: Skewed symmetry groups. Proc. IEEE Conf. Computer Vision and Pattern Recognition. 2001, 872-879.
Liu Y: A computational model for periodic pattern perception based on frieze and wallpaper groups. IEEE Trans PAMI. 2004, 26 (3): 354-371. 10.1109/TPAMI.2004.1262332.
Awodey S: Category Theory (Oxford Logic Guides). 2010, New York: Oxford University Press, Inc., 2
Mac Lane S: Categories for the working mathematician. 1998, New York: Springer-Verlag, 2
D’Onofrio D, Abel D, Johnson DE: Dichotomy in the definition of prescriptive information suggests both prescribed data and prescribed algorithms: biosemiotics applications in genomic systems. Theoret Biol Med Modell. 2012, 9: 8-10.1186/1742-4682-9-8.
Chung KW, Chan HSY, Wang BN: Tessellations with symmetries of the wallpaper groups and the modular group in the hyperbolic 3-space form dynamics. Comput Graph. 2001, 25: 333-341. 10.1016/S0097-8493(00)00135-7.
Butler JT: A note on cellular automata simulations. Inf Control. 1974, 26: 286-295. 10.1016/S0019-9958(74)91409-0.
Ceccherini-Silberstein TG, Machi A, Scarabotti F: Amenable groups and cellular automata. Ann Inst Fourier (Grenoble). 1999, 49 (issue 2): 673-685.


Acknowledgments

The authors wish to acknowledge Katsuji Nishimura, Ken Inada, and Kaoru Sakamoto for providing us with very useful advice during this study.

Author information

Authors and Affiliations

Department of Psychiatry, Tokyo Women’s Medical University, Tokyo, Japan
Jitsuki Sawamura & Jun Ishigooka
Depression Prevention Medical Center, Inariyama Takeda Hospital, Kyoto, Japan
Shigeru Morishita

Corresponding author

Correspondence to Jitsuki Sawamura.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JS conceived of the main idea of this article and wrote the manuscript. SM revised the manuscript. JI gave advice on potential versatilities of the model to the biological science. In addition, all authors read and approved the final version of the manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

Sawamura, J., Morishita, S. & Ishigooka, J. A symmetry model for genetic coding via a wallpaper group composed of the traditional four bases and an imaginary base E: Towards category theory-like systematization of molecular/genetic biology. Theor Biol Med Model 11, 18 (2014). https://doi.org/10.1186/1742-4682-11-18


Received: 25 September 2013
Accepted: 09 February 2014
Published: 07 May 2014
DOI: https://doi.org/10.1186/1742-4682-11-18
