Many of those who are interested in the subject of evolution and life point out that the genetic code is a tremendous carrier of information, and often raise the question of where that information comes from.
Intuitively we know that there is more information in, say, a recipe for baking a cake than in the statement that “it is raining”, but it is difficult to intuitively define or quantify information.
Information is something that can be transmitted from ‘A’ to ‘B’ (through space or time or whatever) which gives ‘B’ the ability to know something that they didn’t know before. Information conveys some meaning.
By itself a steady white light shone from A to B can only convey a tiny bit of information, perhaps only that “there is a light at A”. And if the light flashes once per second it might convey that “there is a light at A that flashes once per second”. With the addition of a decoder, say a lighthouse signal book, a regularly flashing light might convey the information that “that is Portland Bill lighthouse”. However, in that case the additional information that “Portland Bill light transmits such and such a sequence of flashes” has already been transmitted, and so perhaps it’s more accurate to say that the flashing light ‘activates’ the previously transmitted information, or that the previously transmitted information ‘decodes’ the information in the flashing light.
Often information needs to be transmitted from A to B in a secure way so that ‘C’ and ‘D’ cannot understand it, i.e. the information from A cannot be activated by anyone other than B. The goal is to make the information indecipherable without the decoder or ‘key’. In that case, what might seem to be a string of random letters does actually contain a vast amount of information. Yet without the decoder, the highly informative signal and the string of random letters look very similar; in practice both signals have the same potential to carry information. Consider the following strings of letters and spaces:
- life exists on earth
- hLif eexist so neart
- kudw wzuara ib wlerg
- ne wvkdmtfcng cdjvgd
It is easy to see that the second contains the same information as the first, but with the letters moved one place to the right (the final ‘h’ wrapping round to the front), with the spaces kept in the same place.
The third sentence is less obviously not random, but there is a hint that it might convey the same information in that the word lengths are the same. After a little time sat at a typewriter one might realise that the key is to type the letter to the right of the one in the sequence above on a standard UK keyboard (with ‘l’, at the end of its row, wrapping round to ‘a’).
The fourth sentence is indeed random.
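The two keys described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for the purpose, case is ignored (so the capital ‘L’ in the second string is treated as ‘l’), and the wrap-around at the end of each keyboard row is an assumption inferred from ‘l’ standing in for ‘a’ in the third string.

```python
# Letter rows of a standard keyboard (the letter rows are the same on UK and US layouts).
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def rotate_letters(text, shift=1):
    """Move every letter `shift` places to the right, wrapping round,
    while leaving the spaces exactly where they are."""
    letters = [c for c in text if c != " "]
    if not letters:
        return text
    shift %= len(letters)
    rotated = letters[-shift:] + letters[:-shift] if shift else letters
    it = iter(rotated)
    return "".join(c if c == " " else next(it) for c in text)


def keyboard_shift(text, step):
    """Replace each letter by the key `step` places along its keyboard row.
    Wrapping at the end of a row is an assumption, not a documented rule."""
    out = []
    for c in text:
        for row in ROWS:
            if c in row:
                out.append(row[(row.index(c) + step) % len(row)])
                break
        else:
            out.append(c)  # spaces and anything else pass through unchanged
    return "".join(out)


plain = "life exists on earth"
print(rotate_letters(plain))                       # hlif eexist so neart
print(keyboard_shift(plain, -1))                   # kudw wzuara ib wlerg
print(keyboard_shift("kudw wzuara ib wlerg", +1))  # life exists on earth
```

In both cases the ‘key’ is a small amount of information (a rule and a direction) that B must already hold before the string can be activated.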
When C or D intercepts a string of letters from A, they may attempt to decode the string without knowing the key. For short strings this may be impossible, but for longer strings it may be possible to find, for instance, repeating patterns that can be matched to known phrases. We might look for the most common letter in the string and assume that it stands for the letter ‘e’, and so on. We then judge whether we have broken the code by whether the resulting new string of letters has any meaning. But once again, C or D must be able to recognise the meaning when they see it. They must, for instance, know the language that A and B speak, so they too have received some prior information by another route.
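As a rough sketch of that first step, the snippet below counts letters in the third and fourth strings from the list above. The function names are illustrative only. In the third string the most common letter is ‘w’, which does stand for ‘e’; in the random fourth string ‘d’ is just as common, but no substitution built on it yields anything meaningful. Counting alone cannot tell the two apart, which is why the final judgement rests on recognising meaning.

```python
from collections import Counter


def letter_counts(text):
    """Count how often each letter occurs, ignoring case, spaces and punctuation."""
    return Counter(c for c in text.lower() if c.isalpha())


def most_common_letter(text):
    """Return the single most frequent letter in the text."""
    return letter_counts(text).most_common(1)[0][0]


third = "kudw wzuara ib wlerg"
fourth = "ne wvkdmtfcng cdjvgd"

print(most_common_letter(third), letter_counts(third)["w"])    # w 3
print(most_common_letter(fourth), letter_counts(fourth)["d"])  # d 3
```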
We can represent a string of DNA bases by a string of letters (we have immediately introduced a ‘code’ that needs a decoder by doing this of course).
From our scientific experimentation we have discovered that many of these strings contain information. We have for example found that the machinery within the cell is able to convert the DNA string into proteins: the cell is able to decode the DNA. Knowing that DNA is a code has led to a lot of effort aimed at identifying what it does; at decoding it. The first step has been trying to identify the complete code, hence the Human Genome Project. Once the complete string has been generated then we can try to decode it.
According to the Human Genome Project website (http://www.ornl.gov/sci/techresources/Human_Genome/project/info.shtml) less than 2% of the complete string of human DNA actually contains the codes that define the amino acid sequences in proteins. About half of the genome contains repeating sequences that don’t code for protein, and are often called ‘junk’ DNA; since it was unknown what they do, the initial response was to reject them as junk. However, as the above-referenced site states: “Deriving meaningful knowledge from the DNA sequence will define research through the coming decades to inform our understanding of biological systems. This enormous task will require the expertise and creativity of tens of thousands of scientists from varied disciplines in both the public and private sectors worldwide.” Indeed, recent research by the ENCODE project suggests that most of the DNA is useful after all, not for making proteins but for controlling the process.
As an aside, the techniques used in the Human Genome Project have been applied to identifying the bacteria that caused the Black Death. It seems that the DNA of bacteria that caused the Black Death is not so different from plague bacteria around today; perhaps we should be worried… http://www.nature.com/nature/journal/vaop/ncurrent/full/nature10549.html
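To make the idea of ‘decoding’ concrete, here is a minimal sketch of how a string of DNA letters can be read three at a time and turned into a chain of amino acids. It is a simplification under stated assumptions: the table holds only a handful of entries from the standard genetic code, the example sequences are invented, and the real cell first copies DNA into messenger RNA, a step skipped here.

```python
# A small subset of the standard genetic code (DNA codon -> one-letter amino acid).
# "*" marks a stop codon. The full table has 64 entries.
CODON_TABLE = {
    "ATG": "M",  # methionine, also the usual start signal
    "TTT": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "GCT": "A",  # alanine
    "AAA": "K",  # lysine
    "TGG": "W",  # tryptophan
    "TAA": "*", "TAG": "*", "TGA": "*",
}


def translate(dna):
    """Read the string three letters at a time until a stop codon is reached.
    Codons missing from this partial table are shown as '?'."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino = CODON_TABLE.get(dna[i:i + 3], "?")
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)


# Invented example sequences, not taken from any real gene.
print(translate("ATGTTTGGCTAA"))     # MFG
print(translate("ATGAAATGGGCTTGA"))  # MKWA
```

The codon table plays the same role as the lighthouse signal book: without it, the string of letters is just a string of letters.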
A question often asked is, “where does all the information come from?”
Much of what we do generates information. Forensic science is highly developed at decoding clues to determine the likely course of events in criminal cases. North American Indian trackers can follow people for many miles based on the information left by footprints. Air crash investigators read the information left on the debris to try to determine what caused a given disaster. The information is physically recorded in the ‘clues’, and our knowledge and intelligence are able to ‘activate’ the information. In many cases the information can be traced back eventually to an intelligent source, although that cannot be concluded when, for example, decoding the information held in geological rock formations.
However, whilst all of these activities generate information, they are basically one-off events that need to be deciphered. None generates the sort of information found in this sentence, for instance. None generates information in a code-like format; none generates a sequence of instructions.
In all of our daily experience of instructional information transfer, of codes and deciphering, the information has been generated by an intelligent mind. So the question behind the question is, “is the information contained in the DNA code generated by an intelligent source?”
It is argued that an unintelligent machine cannot generate more information than is inherently within the machine. For example, can we imagine a computer program coming up with an equation that has not been already programmed into it? And it is then argued that the cell is a molecular machine and so unable to generate more information than is contained within it and hence there must be an external Intelligent Designer that has generated and implanted the information in the cell. However, I don’t find these arguments thorough.
A cellular machine operates within an environment. If, for example, a mutation changes the information the cell contains, then the survival or death of the mutated cell adds the information that the mutation was good or bad: the good mutation survives, the bad one fails, and more information is added to the DNA. It seems to me that this is a perfectly adequate explanation for the generation of the information in DNA, and is completely consistent with the type of God I describe in “The God of Science”.
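The argument above can be illustrated with a deliberately crude toy model; it is not a simulation of real biology. Everything in it is hypothetical: the `environment` string stands in for whatever conditions decide survival, `fitness` is an invented score, and a mutant simply replaces its parent whenever it survives at least as well.

```python
import random

BASES = "ACGT"


def fitness(genome, environment):
    """Toy survival score: the number of positions that suit the environment."""
    return sum(g == e for g, e in zip(genome, environment))


def evolve(genome, environment, generations, rng):
    """Repeatedly mutate one random base and let 'survival' decide the outcome."""
    for _ in range(generations):
        pos = rng.randrange(len(genome))
        mutant = genome[:pos] + rng.choice(BASES) + genome[pos + 1:]
        # Selection: a harmful mutation dies out, anything else is kept.
        if fitness(mutant, environment) >= fitness(genome, environment):
            genome = mutant
    return genome


rng = random.Random(0)          # fixed seed so that runs are repeatable
environment = "GATTACAGATTACA"  # hypothetical stand-in for the cell's surroundings
start = "".join(rng.choice(BASES) for _ in environment)
end = evolve(start, environment, 500, rng)

print(start, fitness(start, environment))
print(end, fitness(end, environment))
```

The mutation step knows nothing about the environment; only the keep-or-discard step does. Whatever match the final genome shows was therefore put there by the environment through survival, and was not already held inside the ‘machine’.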