A Collection of Word Oddities and Trivia, Page 13

Last revision: Nov. 18, 2012


LONG WORDS - CHEMICAL NAMES

A 189,819-letter IUPAC name of titin, the largest known protein, which is responsible for the passive elasticity of muscle, can be seen being read aloud in a 3½-hour video clip on YouTube.

Shown below is a 1,185-letter chemical term for "Tobacco Mosaic Virus, Dahlemense Strain." This word has appeared in the American Chemical Society's Chemical Abstracts and is considered by some to be the longest real word.

ACETYLSERYLTYROSYLSERYLISOLEUCYL-
THREONYLSERYLPROLYLSERYLGLUTAMINYL-
PHENYLALANYLVALYLPHENYLALANYLLEUCYL-
SERYLSERYLVALYLTRYPTOPHYLALANYL-
ASPARTYLPROLYLISOLEUCYLGLUTAMYLLEUCYL-
LEUCYLASPARAGINYLVALYLCYSTEINYL-
THREONYLSERYLSERYLLEUCYLGLYCYL-
ASPARAGINYLGLUTAMINYLPHENYLALANYL-
GLUTAMINYLTHREONYLGLUTAMINYLGLUTAMINYL-
ALANYLARGINYLTHREONYLTHREONYL-
GLUTAMINYLVALYLGLUTAMINYLGLUTAMINYL-
PHENYLALANYLSERYLGLUTAMINYLVALYL-
TRYPTOPHYLLYSYLPROLYLPHENYLALANYL-
PROLYLGLUTAMINYLSERYLTHREONYLVALYL-
ARGINYLPHENYLALANYLPROLYLGLYCYL-
ASPARTYLVALYLTYROSYLLYSYLVALYLTYROSYL-
ARGINYLTYROSYLASPARAGINYLALANYLVALYL-
LEUCYLASPARTYLPROLYLLEUCYLISOLEUCYL-
THREONYLALANYLLEUCYLLEUCYLGLYCYL-
THREONYLPHENYLALANYLASPARTYLTHREONYL-
ARGINYLASPARAGINYLARGINYLISOLEUCYL-
ISOLEUCYLGLUTAMYLVALYLGLUTAMYL-
ASPARAGINYLGLUTAMINYLGLUTAMINYLSERYL-
PROLYLTHREONYLTHREONYLALANYLGLUTAMYL-
THREONYLLEUCYLASPARTYLALANYLTHREONYL-
ARGINYLARGINYLVALYLASPARTYLASPARTYL-
ALANYLTHREONYLVALYLALANYLISOLEUCYL-
ARGINYLSERYLALANYLASPARAGINYLISOLEUCYL-
ASPARAGINYLLEUCYLVALYLASPARAGINYL-
GLUTAMYLLEUCYLVALYLARGINYLGLYCYL-
THREONYLGLYCYLLEUCYLTYROSYLASPARAGINYL-
GLUTAMINYLASPARAGINYLTHREONYL-
PHENYLALANYLGLUTAMYLSERYLMETHIONYL-
SERYLGLYCYLLEUCYLVALYLTRYPTOPHYL-
THREONYLSERYLALANYLPROLYLALANYLSERINE

The spelling of the above word was taken from The Insomniac's Dictionary by Paul Hellweg. The spelling was corrected on March 29, 2000; there was a missing L in the next-to-last line. Thanks to Eric T. Ferguson, who spotted the error, and who, in fact, has suggested that the "word" be deleted from the website. He writes, "Your example is irrelevant. Any protein of known composition can be written out like this; the maximum length is indeterminate: just depends on how long a protein someone has decoded. Nobody ever writes out the name in this way. The same applies for DNA sequences, never written out in words but only in code letters. I suggest you drop the example."

Two artificial terms describing complex chemical compounds have appeared in the Guinness Book of World Records. However, these "words" have never been used by chemists and have never appeared in a chemical book or paper, so they have been withdrawn from Guinness.

One is a 3,641-letter chemical name describing bovine NADP-specific glutamate dehydrogenase, which contains 500 amino acids. Jeff Grant of Hastings, New Zealand, provided the word to Ross Eckler, editor of Word Ways. Two parts of the word, "glutamylasparginyl" and "glutaminyl," may not be accurate because of uncertainties in the chemical structure. One proposed resolution of the uncertainties would shorten the word to 3,639 letters. [Charles Turner]

The other, which appears below, is supposed to be a 1,913-letter chemical name for the tryptophan synthetase A protein:

methionylglutaminylarginyltyrosylglutamylserylleucylphenylalanylalanylglutaminylleucyllysylglutamylarginyllysylglutamylglycylalanylphenylalanylvalylprolylphenylalanylvalylthreonylleucylglycylaspartylprolylglycylisoleucylglutamylglutaminylserylleucyllysylisoleucylaspartylthreonylleucylisoleucylglutamylalanylglycylalanylaspartylalanylleucylglutamylleucylglycylisoleucylprolylphenylalanylserylaspartylprolylleucylalanylaspartylglycylprolylthreonylisoleucylglutaminylasparaginylalanylthreonylleucylarginylalanylphenylalanylalanylalanylglycylvalylthreonylprolylalanylglutaminylcysteinylphenylalanylglutamylmethionylleucylalanylleucylisoleucylarginylglutaminyllysylhistidylprolylthreonylisoleucylprolylisoleucylglycylleucylleucylmethionyltyrosylalanylasparaginylleucylvalylphenylalanylasparaginyllysylglycylisoleucylaspartylglutamylphenylalanyltyrosylalanylglutaminylcysteinylglutamyllysylvalylglycylvalylaspartylserylvalylleucylvalylalanylaspartylvalylprolylvalylglutaminylglutamylserylalanylprolylphenylalanylarginylglutaminylalanylalanylleucylarginylhistidylasparaginylvalylalanylprolylisoleucylphenylalanylisoleucylcysteinylprolylprolylaspartylalanylaspartylaspartylaspartylleucylleucylarginylglutaminylisoleucylalanylseryltyrosylglycylarginylglycyltyrosylthreonyltyrosylleucylleucylserylarginylalanylglycylvalylthreonylglycylalanylglutamylasparaginylarginylalanylalanylleucylprolylleucylasparaginylhistidylleucylvalylalanyllysylleucyllysylglutamyltyrosylasparaginylalanylalanylprolylprolylleucylglutaminylglycylphenylalanylglycylisoleucylserylalanylprolylaspartylglutaminylvalyllysylalanylalanylisoleucylaspartylalanylglycylalanylalanylglycylalanylisoleucylserylglycylserylalanylisoleucylvalyllysylisoleucylisoleucylglutamylglutaminylhistidylasparaginylisoleucylglutamylprolylglutamyllysylmethionylleucylalanylalanylleucyllysylvalylphenylalanylvalylglutaminylprolylmethionyllysylalanylalanylthreonylarginylserine

Fredrik Viklund writes, "Chemical terms should not in my opinion be listed as long words. Many, many compounds are so complex that their names would be horrific and probably beat the ones listed in all known sources. It is exceedingly hard to reconstruct the correct structure from the name, and many attempts are made to automate the process from structure to name and vice versa. Some systems are successful and commonly used in database searching. The long words starting with ACETYL-SERYL-TYROSYL-SERYL- and methionyl-glutaminyl-arginyl-tyrosyl-glutamyl- are spelled-out versions of the amino acid sequence of proteins. To have the longest word, it would only require finding a larger protein, and as proteins are discovered at a rate of hundreds to thousands per week it wouldn't be sporty to accept those names as 'words.' Similar spelling-out for DNA sequences would yield even longer words as the DNA is continuous for up to several hundred million bases where each base would be named something like 'uracilphosphate.'"
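To make the point about "spelled-out versions of the amino acid sequence" concrete, here is a minimal Python sketch of how such a word can be generated mechanically. It assumes the standard one-letter amino acid codes as input and uses the same "-yl" spellings that appear in the words on this page. The names ACYL, FREE, spell_out and letter_count are made up for this illustration, and the sketch does not handle modified ends such as the ACETYL that opens the first word above.

```python
# Residue names for the 20 standard amino acids, keyed by one-letter code.
# In a spelled-out name every residue except the last takes the "-yl" form.
ACYL = {
    "A": "alanyl", "R": "arginyl", "N": "asparaginyl", "D": "aspartyl",
    "C": "cysteinyl", "E": "glutamyl", "Q": "glutaminyl", "G": "glycyl",
    "H": "histidyl", "I": "isoleucyl", "L": "leucyl", "K": "lysyl",
    "M": "methionyl", "F": "phenylalanyl", "P": "prolyl", "S": "seryl",
    "T": "threonyl", "W": "tryptophyl", "Y": "tyrosyl", "V": "valyl",
}

# The last residue keeps the ordinary name of the amino acid.
FREE = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "E": "glutamic acid", "Q": "glutamine", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}


def spell_out(sequence: str) -> str:
    """Turn one-letter codes such as 'MQRYES' into one run-together name."""
    codes = sequence.upper()
    if not codes:
        raise ValueError("sequence must contain at least one residue")
    *body, last = codes  # an unknown code raises KeyError below
    return "".join(ACYL[code] for code in body) + FREE[last]


def letter_count(name: str) -> int:
    """Count letters only, ignoring spaces, hyphens and other punctuation."""
    return sum(ch.isalpha() for ch in name)


if __name__ == "__main__":
    word = spell_out("MQRYES")
    print(word)
    print(letter_count(word))
```

Fed the first five residues of the tryptophan synthetase word plus a final serine, this prints methionylglutaminylarginyltyrosylglutamylserine and a count of 47. The word simply grows with the input sequence, which is the objection both correspondents raise.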

According to The Top 10 of Everything, 2000, the tryptophan synthetase A protein is "an enzyme consisting of 267 amino acids" and the 1,913-letter name "has actually appeared in print in various publications" [Stuart Kidd].

6,8-DIDEOXY-6-(1-METHYL-4-PROPYL-2-PYRROLIDINECARBOXAMIDO)-1-THIO-D-ERYTHRO-D-GALACTO-OCTAPYRANOSIDE (100 characters including punctuation) is found in the OED2, although not as a vocabulary entry. Instead it appears in a citation for the word lincomycin [Stuart Kidd].

DIISOBUTYLPHENOXYETHOXYETHYLDIMETHYLBENZYLAMMONIUMCHLORIDE has appeared in the Journal of the American Veterinary Medical Association. It is the chemical name of a drug announced there in the 1950s [John Carroll].

The IUPAC nomenclature for organic chemical compounds is open-ended, giving rise to the 189,819-letter chemical name Methionylthreonylthreonyl...isoleucine (shown here in shortened form) for the protein known as titin, or sometimes connectin, which is involved in striated muscle formation. Titin is the largest known protein, consisting of 26,926 amino acids [Charles Turner].

