Dr. Ikehara describes the genetic code as follows.
“There are two sides for exploring the origin of the genetic code. The first one is to understand from what primitive genetic code the modern one originated and how the primitive genetic code has evolved to the modern genetic code. The second one is to clarify how the correspondences between a codon and an amino acid were determined. On the other hand, the genetic code is actually realized by tRNA, which determined the corresponding relation between a codon and an amino acid. Therefore, the genetic code is a table summarizing the correspondences, which were determined by tRNAs.”
Furthermore, Dr. Ikehara emphasizes the following.
“Whenever the origin of something, such as the genetic code, is investigated, there is a serious problem, as that it is almost impossible to make clear completely, what events took place in the past, especially on the primitive Earth about 3.8~4.0 billion years ago. However, I would like to emphasize that the problems of the origins of gene, the genetic code and protein are too important to avoid pursuing, because these origins are always relevant to the fundamental properties of extant genes, the genetic code and proteins. I have proposed the GNC-SNS primitive genetic code hypothesis on the origin and evolution of the genetic code (Ikehara et al.2002).”
Let us explain the GNC-SNS Primitive Genetic Code Hypothesis! It will be a journey that retraces the steps of Dr. Ikehara’s research!
SNS Primitive Genetic Code Hypothesis
- First, in order to solve the origin of contemporary genes, we investigated in the lab how an entirely new gene could be created, using the six conditions for the formation of a water-soluble globular protein (hydrophobicity/hydropathy, α-helix, β-sheet, turn/coil formation, acidic amino acid composition, and basic amino acid composition). Such genes should still be created in extant organisms today. From the results, the GC-NSF(a) hypothesis on the origin of contemporary genes was obtained, suggesting that entirely new genes originated from nonstop frames on the antisense sequences of GC-rich genes (GC-NSF(a)s) (Ikehara and Okazawa 1993; Ikehara et al. 1996).
- After proposing the GC-NSF(a) hypothesis, I noticed that the base compositions at the three codon positions of both highly GC-rich genes (65-70% GC content) and their GC-NSF(a) sequences roughly follow the pattern SNS, that is, (G/C)N(C/G) (Ikehara and Yoshida 1996; Ikehara 2002; Ikehara et al. 2002).
- Then, I considered the possibility that the SNS code, which encodes only ten amino acids (the [GADV]-amino acids plus Glu [E], Leu [L], Pro [P], His [H], Gln [Q], and Arg [R]), was a primitive genetic code. To confirm whether the SNS code has the coding ability to form water-soluble globular proteins, random numbers were generated by computer for the three codon positions within the SNS frame. A dot was plotted whenever an imaginary protein encoded by the SNS code satisfied the six structural conditions for the formation of a water-soluble globular protein (Ikehara 2002; Ikehara et al. 2002).
- From the results, it was found that the computer-generated SNS code encoding ten amino acids satisfied the six conditions when the compositions of G and C at the first codon position were around 55% and 45%, respectively, and when each of the four bases made up about one-fourth of the second codon position. The base compositions at the third position could not be restricted to a narrow range because of the degeneracy of the genetic code at that position. Even so, this means that a polypeptide chain composed of the ten SNS-encoded amino acids should fold into a water-soluble globular structure with high probability, because the SNS code satisfies the six conditions for the formation of a water-soluble globular structure.
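
As a concrete illustration of the SNS frame described above, the following minimal Python sketch lists the 16 SNS codons and the ten amino acids they encode in the standard genetic code, and then draws one random SNS-framed peptide using the reported first- and second-position compositions. It is not Dr. Ikehara's program: the function names, the 50/50 G/C split at the third position, and the seed are assumptions made for illustration, and the six structural conditions are not evaluated here.

```python
import random
from itertools import product

# Standard genetic code restricted to the 16 SNS codons
# (S = G or C, N = any base), in one-letter amino acid symbols.
SNS_TABLE = {
    "GGC": "G", "GGG": "G", "GCC": "A", "GCG": "A",
    "GAC": "D", "GAG": "E", "GUC": "V", "GUG": "V",
    "CGC": "R", "CGG": "R", "CCC": "P", "CCG": "P",
    "CAC": "H", "CAG": "Q", "CUC": "L", "CUG": "L",
}


def sns_codons():
    """Enumerate every codon that fits the (G/C) N (G/C) pattern."""
    return ["".join(bases) for bases in product("GC", "ACGU", "GC")]


def random_sns_peptide(length, p_g_first=0.55, p_g_third=0.5, seed=None):
    """Draw a random peptide codon by codon in the SNS frame.

    p_g_first follows the ~55% G / 45% C reported for the first position;
    the second position is uniform over the four bases.
    p_g_third is an assumed value: the text says the third position
    could not be restricted to a narrow range.
    """
    rng = random.Random(seed)
    residues = []
    for _ in range(length):
        first = "G" if rng.random() < p_g_first else "C"
        second = rng.choice("ACGU")
        third = "G" if rng.random() < p_g_third else "C"
        residues.append(SNS_TABLE[first + second + third])
    return "".join(residues)


codons = sns_codons()
amino_acids = sorted(set(SNS_TABLE[c] for c in codons))
print(len(codons), "codons ->", len(amino_acids), "amino acids:", "".join(amino_acids))
print(random_sns_peptide(30, seed=1))
```

Running it prints the codon and amino acid counts followed by one 30-residue peptide. Changing `p_g_first` shows how the first-position composition shifts the mix of residues.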

Source: Towards Revealing the Origin of Life, P.149, Fig. 7.4
*Average base compositions at the three codon positions of seven GC-rich P. aeruginosa genes.
Open and closed bars show the average base compositions of the seven P. aeruginosa GC-rich genes and the corresponding GC-NSF(a)s, respectively.
*GC-NSF(a) means a nonstop frame on the antisense strand of a GC-rich gene.

Source: Towards Revealing the Origin of Life, P.149, Fig. 7.5
*Dot representation of the base compositions at the three codon positions that were selected by determining whether an imaginary protein, computer-generated under the SNS coding system, satisfies the six conditions for water-soluble globular structure formation.
GNC Primeval Genetic Code Hypothesis
- Next, I supposed that the SNS code must have originated from a simpler code. To search for a genetic code more ancient and simpler than the SNS code, we used four protein structure indexes (hydropathy, α-helix, β-sheet, and turn/coil formation) as the minimum conditions for the formation of a water-soluble globular protein with appropriate secondary and tertiary structures. From the results, it was found that a GADV-protein encoded by the GNC code satisfies the four conditions well when the protein contains roughly equal amounts of the GADV-amino acids (Ikehara et al. 2002).
- The four GADV-amino acids are well suited to forming the respective secondary structures and an active center on a GADV-protein: [G] is effective for turn or coil formation, and [A] and [V] are effective for α-helix and β-sheet formation, respectively (Berg et al. 2002).
- Fortunately, the four amino acids encoded by the GNC code also include both a hydrophobic ([V]) and a hydrophilic ([D]) amino acid. This helps the polypeptide chain fold into a stable globular structure in water. Furthermore, this combination is the simplest of all sets of four amino acids that could be selected from the 20 natural amino acids, suggesting that the GNC code could be the most ancient genetic code. This conclusion agrees with the antiquity of the “GNC” code, which was first proposed by Eigen and Schuster (1977, 1979) from a standpoint independent of the coding ability of proteins.
- Furthermore, the idea that the universal genetic code originated from the GNC code encoding the four GADV-amino acids is also supported by the fact that the GADV-amino acids are consistent with the chronological order of amino acids published by Trifonov (2000).
- These findings support the idea that an amino acid composition containing roughly equal amounts of the GADV-amino acids is a protein 0th-order structure, because they suggest that a water-soluble globular GADV-protein could be produced with high probability by the random joining of GADV-amino acids encoded by the first GNC genetic code.
- In addition, the GNC-SNS primitive genetic code hypothesis supposes that the universal genetic code evolved from the GNC code (substantially singlet but formally triplet), through the SNS code (substantially doublet but formally triplet), to the universal genetic code (both substantially and formally triplet).
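
To make the "substantially singlet" GNC code and the random joining of GADV-amino acids more tangible, here is a minimal Python sketch under stated assumptions. It varies only the second codon base, translates the four GNC codons with the standard genetic code, and reports the composition and mean hydropathy of a random GADV-peptide. The Kyte-Doolittle hydropathy scale is used as a stand-in because the text above does not say which scale was applied; the function names and seed are hypothetical, and no threshold for "satisfying the four conditions" is implemented.

```python
import random
from collections import Counter

# The four GNC codons and the amino acids they encode in the standard code.
GNC_TABLE = {"GGC": "G", "GCC": "A", "GAC": "D", "GUC": "V"}

# Kyte-Doolittle hydropathy values (assumed scale; higher = more hydrophobic).
HYDROPATHY = {"G": -0.4, "A": 1.8, "D": -3.5, "V": 4.2}


def random_gadv_peptide(length, seed=None):
    """Join GADV residues at random by choosing only the second codon base."""
    rng = random.Random(seed)
    return "".join(GNC_TABLE["G" + rng.choice("GCAU") + "C"] for _ in range(length))


def mean_hydropathy(peptide):
    """Average hydropathy over all residues of the peptide."""
    return sum(HYDROPATHY[residue] for residue in peptide) / len(peptide)


peptide = random_gadv_peptide(100, seed=1)
print(peptide)
print(Counter(peptide))                      # roughly equal counts are expected
print(round(mean_hydropathy(peptide), 3))    # one random draw
print(round(mean_hydropathy("GADV"), 3))     # exactly equal composition
```

Because the first and third bases are fixed, each residue is decided by a single base. With exactly equal amounts of the four amino acids, the mean Kyte-Doolittle hydropathy comes to 0.525 on this assumed scale.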

Source: Towards Revealing the Origin of Life, P.150, Fig. 7.6
*Dot representation of the base compositions at the second codon position that were selected by determining whether an imaginary protein, computer-generated under the GNC coding system, satisfies the four structural conditions for water-soluble globular structure formation.
In this way, we have proposed the GNC-SNS primitive genetic code hypothesis, suggesting that the universal or standard genetic code originated from the GNC code encoding the four GADV-amino acids, through the SNS code (Ikehara 2002; Ikehara et al. 2002).

Source: Towards Revealing the Origin of Life, P.151, Fig. 7.7
*The GNC-SNS primitive genetic code hypothesis assumes that the universal or standard genetic code (both formally and substantially a triplet code) originated from the GNC primeval genetic code (formally triplet but substantially a singlet code) through the SNS primitive genetic code (formally triplet but substantially a doublet code), which is composed of 10 amino acids encoded by 16 codons. Brown and blue boxes indicate hydrophobic and hydrophilic amino acids, respectively.
Dr. Ikehara has proposed a new idea that adds a logical basis to Dr. Crick's Frozen-Accident Theory.
GNC Code Frozen-Accident Theory
“I have proposed a novel hypothesis on the origin of tRNA, assuming that modern tRNAs originated from one nonspecific AntiC-SL tRNA, with which one of four GADV-amino acids was randomly bound, and evolved later into four specific AntiC-SL tRNAs through the four nonspecific AntiC-SL tRNAs.
According to the new hypothesis, the correspondences between a codon/anticodon and a GADV-amino acid must have arisen at one time. However, the specific bindings of GADV-amino acids with GNC codons have not been detected despite the strenuous efforts of many researchers, suggesting that GADV-amino acids cannot bind strongly with the corresponding codons or anticodons. Then, how were the correspondence relations between GADV-amino acids and GNC codons established? I thought about the role of the most ancient AntiC-SL tRNAs in GADV-protein synthesis and I considered the process of how four non-specific AntiC-SL tRNAs were converted to four specific GADV-AntiC-SL tRNAs (Fig. 7.8). One idea flashed in my mind, the “GNC code-frozen accident theory”, about the origin of the genetic code. I feel now that the flash of Crick, about 50 years ago, about the code frozen-accident theory might be unconsciously inspired by the GNC-code frozen-accident theory.”

Source: Towards Revealing the Origin of Life, P.153, Fig. 7.8
*The “anticodon-stem loop hypothesis” on the origin of tRNA.
(A) The hypothesis assumes that modern tRNA originated from one nonspecific primeval AntiC-SL tRNA carrying one of four [GADV]-amino acids selected randomly. The nonspecific AntiC-SL equipped with CCA-end is shown with three small yellow circles at the 3′-end.
(B) Successively, four nonspecific AntiC-SLs were created by gene duplication of the first AntiC-SL gene.
(C) After that, two pairs of [GADV]-amino acids (Gly/Ala and Asp/Val) were randomly assigned to two AntiC-SL pairs, one consisting of AntiC-SL (GCC) and AntiC-SL (GGC) and the other of AntiC-SL (GAC) and AntiC-SL (GUC), and the assignments were frozen. The anticodons of the paired AntiC-SLs were necessarily selected because of the high stability of the triplet base pairs (GCC/GGC and GAC/GUC) (Taghavi et al. 2017). Consequently, four specific primeval AntiC-SL tRNAs, each carrying one specific amino acid (shown by blue and red letters in brackets), were obtained. Each specific tRNA is characterized by its 5′ acceptor stem sequence (light cyan circles) and anticodon (dark gray circles). Green double-headed arrows indicate strong interaction between two AntiC-loops through an A-U base pair. Small grayish-blue circles indicate nucleotides in nonspecific AntiC-stems.
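
The pairing described in (C) can be checked with a few lines of Python. This sketch is only an illustration and not part of the cited work: it takes the anticodon of each GNC codon to be its reverse complement (Watson-Crick pairing only) and groups anticodons that are reverse complements of one another. The function names are hypothetical.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# GNC codons and the amino acids they encode in the standard genetic code.
GNC_TABLE = {"GGC": "Gly", "GCC": "Ala", "GAC": "Asp", "GUC": "Val"}


def reverse_complement(rna):
    """Return the antiparallel pairing partner, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def complementary_anticodon_pairs(table):
    """Group anticodons whose sequences are reverse complements of each other.

    For the GNC set, the reverse complement of an anticodon is itself a
    GNC anticodon, so every anticodon has a partner within the set.
    """
    pairs = {}
    for codon, amino_acid in table.items():
        anticodon = reverse_complement(codon)
        # The codon sequence equals the partner anticodon's sequence.
        key = tuple(sorted((anticodon, codon)))
        pairs.setdefault(key, []).append((anticodon, amino_acid))
    return pairs


for members in complementary_anticodon_pairs(GNC_TABLE).values():
    print(" / ".join(f"{aa} (anticodon {ac})" for ac, aa in sorted(members)))
```

Under these assumptions the four anticodons fall into exactly two mutually complementary pairs, GCC/GGC carrying Gly and Ala and GAC/GUC carrying Val and Asp, which matches the two amino acid pairs named in the caption.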
Reference: Ikehara K (2021) Towards Revealing the Origin of Life. Springer Nature
