The Biology Project - Biochemistry


Dr. Margaret Oakley Dayhoff


The origin of the single-letter code for the amino acids

The origin of the single-letter code for the amino acids is of historical interest, and the story may help the student to learn the code. The reason for the code is simple enough: in the very early days of bioinformatics, even the fastest computers were rather clunky. Dr. Margaret Oakley Dayhoff, arguably the founder of the field of bioinformatics, shortened the three-letter designations to a single-letter code in an effort to reduce the size of the data files needed to describe the sequence of amino acids in a protein. The listing of amino acids, the three-letter and single-letter codes, and the explanation for the choice of each single letter are given below. Note that there are 20 amino acids commonly found in proteins, and 26 letters in the alphabet. As a result, most of the letters are used.

To develop a single-letter code for the amino acids, Dr. Dayhoff attempted to make the code as easy to remember as possible.  Of course, if the name of each amino acid began with a different letter, the code would be simple indeed.  For 6 of the amino acids, the first letter of the name is unique, making the code simple.  These are:

Amino Acid       3 letter code    Single letter code    Explanation
Cysteine         Cys              C                     First letter of the name
Histidine        His              H                     First letter of the name
Isoleucine       Ile              I                     First letter of the name
Methionine       Met              M                     First letter of the name
Serine           Ser              S                     First letter of the name
Valine           Val              V                     First letter of the name

For the other amino acids, the first letter of the name is not unique to a single amino acid, so Dr. Dayhoff assigned the letters A, G, L, P and T to the amino acids Alanine, Glycine, Leucine, Proline and Threonine, respectively, which occur more frequently in proteins than do the other amino acids having the same first letters.

Amino Acid       3 letter code    Single letter code    Explanation
Alanine          Ala              A                     First letter of the name
Glycine          Gly              G                     First letter of the name
Leucine          Leu              L                     First letter of the name
Proline          Pro              P                     First letter of the name
Threonine        Thr              T                     First letter of the name

For some of the other amino acids, the assigned letter is phonetically suggestive of the name.

Amino Acid       3 letter code    Single letter code    Explanation
Arginine         Arg              R                     aRginine
Phenylalanine    Phe              F                     Fenylalanine
Tyrosine         Tyr              Y                     tYrosine
Tryptophan       Trp              W                     tWiptophan (or, contains Double ring)

For the remaining 5 amino acids, Dr. Dayhoff was reaching somewhat to find an easy-to-remember connection between the single letter and the amino acid.  She assigned aspartic acid, asparagine, glutamic acid and glutamine the letters D, N, E and Q, respectively, noting that D and N are nearer the beginning of the alphabet than E and Q, and that Asp is smaller than Glu, while Asn is smaller than Gln. 

Amino Acid       3 letter code    Single letter code    Explanation
Aspartic Acid    Asp              D                     asparDic
Asparagine       Asn              N                     Contains N (or asparagiN)
Glutamic Acid    Glu              E                     gluE (or glutamEke)
Glutamine        Gln              Q                     Q-tamine

By the time Dr. Dayhoff got to lysine, there were not too many letters left, so she used the letter K, explaining that K is at least near L in the alphabet.

Amino Acid       3 letter code    Single letter code    Explanation
Lysine           Lys              K                     K is near L in the alphabet
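
The tables above can be gathered into a single lookup. The following is a minimal sketch in Python, not a reconstruction of Dr. Dayhoff's own software: the names THREE_TO_ONE and to_single_letter, the hyphen-separated input format, and the example peptide are all assumptions made for illustration. It converts a sequence written in three-letter codes to the single-letter code and compares the number of characters each form needs.

```python
# Three-letter to single-letter codes, copied from the tables above.
THREE_TO_ONE = {
    # First letter of the name is unique
    "Cys": "C", "His": "H", "Ile": "I", "Met": "M", "Ser": "S", "Val": "V",
    # First letter shared; assigned to the more frequent amino acid
    "Ala": "A", "Gly": "G", "Leu": "L", "Pro": "P", "Thr": "T",
    # Phonetically suggestive
    "Arg": "R", "Phe": "F", "Tyr": "Y", "Trp": "W",
    # The remaining five
    "Asp": "D", "Asn": "N", "Glu": "E", "Gln": "Q", "Lys": "K",
}


def to_single_letter(three_letter_sequence: str, separator: str = "-") -> str:
    """Convert a sequence such as 'Met-Lys-Trp' to 'MKW'.

    The separator-delimited input format is an assumption of this sketch.
    An unrecognized residue name raises KeyError.
    """
    residues = three_letter_sequence.split(separator)
    return "".join(THREE_TO_ONE[r.strip().capitalize()] for r in residues)


# Hypothetical nine-residue peptide, chosen only to illustrate the conversion.
peptide = "Met-Ala-Asp-Gln-Leu-Thr-Glu-Glu-Gln"
short = to_single_letter(peptide)

print(short)
print(len(peptide), "characters in three-letter form,", len(short), "in single-letter form")
```

Even without the separators, the three-letter form needs three characters per residue where the single-letter form needs one, which is the saving in file size described at the top of this page.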

Note about Dr. Margaret Oakley Dayhoff (1925-1983)

Professional Obituary

Dr. Margaret Oakley Dayhoff was a professor at Georgetown University Medical Center and a noted research biochemist at the National Biomedical Research Foundation where she pioneered the application of mathematics and computational methods to the field of biochemistry.  Dr. Dayhoff dedicated her career to applying the evolving computational technologies to support advances in biology and medicine, most notably the creation of protein and nucleic acid databases and tools to interrogate the databases.  Her PhD degree was from Columbia University in the Department of Chemistry, where she devised computational methods to calculate molecular resonance energies of several organic compounds.  She did postdoctoral studies at the Rockefeller Institute (now Rockefeller University) and the University of Maryland, and joined the newly established National Biomedical Research Foundation in 1959.

Dr. Dayhoff's work with proteins began in 1961 when she developed tools to aid protein chemists in determination of amino acid sequences by automatically overlapping the sequences of peptides.  She went on to initiate the "Atlas of Protein Sequence and Structure", and to develop many of the tools used today in database design and utilization.  In 1980, Dr. Dayhoff developed an on-line database system that could be accessed by telephone line, the first sequence database available for interrogation by remote computers.  Dr. Margaret Oakley Dayhoff, the founder of the field of bioinformatics, died before the field was recognized as a distinct area for investigation.  She was, indeed, a pioneer.

Dr. Dayhoff was extremely active in the Biophysical Society, and served the society as both its secretary and president.  One of her interests was in enhancing the ability of women to successfully pursue careers in the sciences.  She was well aware of the many challenges facing women in science, and worked hard to encourage and mentor women in scientific careers.  It is therefore fitting that the Margaret Oakley Dayhoff award was established to encourage young women to enter careers in scientific research.  This award is aimed towards women of very high promise who have not yet reached a position of high recognition within the structure of academic society.  It is administered through the Biophysical Society, and candidates are judged on achievement and promise in fields within the purview of the Biophysical Society.

 



The Biology Project
Department of Biochemistry and Molecular Biophysics
University of Arizona
August 25, 2003

http://www.biology.arizona.edu
All contents copyright © 2003. All rights reserved.