It is important to note first that eukaryotes and prokaryotes share many similarities in their genetic material, transcription, translation, etc. However, some key differences also exist, and they will be explored in greater depth in this lecture.
One interesting phenomenon that occurs in eukaryotic organisms is that when an mRNA (with a cap and poly-A tail) is hybridized with a (denatured) DNA, one finds that they do not match up exactly. The coding sequences do indeed allow for a great amount of base pairing, but the DNA also experiences a looping effect, such that large strands of DNA loop out of the sequence. This DNA loop is the result of an intervening sequence - a sequence that is present in the genomic DNA but is absent in the mRNA product of that gene.
The DNA for a given gene in eukaryotes is organized into exons and introns. The EXons are those EXpressed sequences that become the mRNA, and the INtrons are those INtervening sequences that are removed in the process of making a mature mRNA. To remove the introns, the pre-mRNA is spliced at splice junctions found at the two ends of every intron. Splicing must be extremely accurate. If splicing is off by even one nucleotide, the coding sequence is garbled, because all of the codons downstream of the mistake will be out of the correct reading frame (they will be out of phase). If the "S" were removed from "SO REX THE BIG RED DOG..." while the word boundaries stayed where they were, we would end up with an unreadable sequence like "OR EXT HEB IGR EDD OG..."
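The effect of an inaccurate splice on the reading frame can be sketched in a few lines of Python. This is only a toy illustration: the "pre-mRNA" is an English phrase rather than a real sequence, the lower-case filler stands in for an intron, and the function names and coordinates are invented for the example.

```python
def codons(seq, size=3):
    """Split a sequence into consecutive groups of `size` letters."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def splice(pre_mrna, introns):
    """Remove each (start, end) intron (0-based, end exclusive) and join the exons."""
    exons = []
    pos = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[pos:start])
        pos = end
    exons.append(pre_mrna[pos:])
    return "".join(exons)


# Toy message: exons in upper case, one made-up intron in lower case.
pre_mrna = "THEBIG" + "xxxxxxxx" + "REDDOG"

correct = splice(pre_mrna, [(6, 14)])      # cut exactly at the splice junctions
off_by_one = splice(pre_mrna, [(6, 15)])   # second cut is one letter too late

print(codons(correct))
print(codons(off_by_one))
```

With the accurate splice the message reads in clean three-letter words; with the cut shifted by one letter, every group after the junction is out of frame.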
RNA splicing is carried out by snRNPs (small nuclear ribonucleoproteins, pronounced "snurps"). Each snurp is composed of an RNA component (an snRNA, or small nuclear RNA) and many proteins, and several snurps assemble on the pre-mRNA into a complex (the spliceosome) that is about the size of a ribosome. The snurps can recognize a splice junction and cut the pre-mRNA with great accuracy. In snurps, it is the RNA that acts as the catalyst; the proteins mainly hold the snurps in the correct configuration and stabilize them.
In some organisms, given the proper environment, the pre-RNA can fold in on itself, and cleave, removing the intron by itself - in the absence of any other enzymatic activity. This seems to have played a role in primitive evolutionary history.
The snRNAs in the snurps base pair with the pre-mRNA at splice junctions (and some other sites too). The snurps which are bound to the pre-mRNA at different locations interact with each other to facilitate the removal of the intron between the splice junctions and to join the adjacent exons.
Sometimes, under well-defined conditions, exons can be skipped during splicing. For example, say the DNA/pre-mRNA contains four exons, A, B, C, and D, with introns (represented by the dashes) in between: (A-B-C-D). If a splice junction is skipped, splicing in some tissues might lead to an A-B-D mRNA, where exon C was skipped. The splicing could also produce an A-C-D mRNA, where exon B was skipped. So two different mRNAs, containing different sequences and coding for different proteins, came from the exact same strand of DNA. This alternative splicing can aid in gene regulation and can have many different effects. For some organisms, sex is determined by which splicing is carried out. With HIV, one splicing pattern can lead to the virus remaining dormant, while another leads to the person developing AIDS.
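The A-B-C-D example can be written out as a small Python sketch. The exon sequences here are hypothetical, made up only so that the resulting mRNAs can be compared; the function name is likewise just for illustration.

```python
# Hypothetical exon sequences, invented for this illustration only.
EXONS = {"A": "AUGGCU", "B": "GAAUUC", "C": "CCGGGA", "D": "UGGUAA"}


def splice_exons(order, skipped=()):
    """Join exons in their original order, leaving out any listed in `skipped`.

    Returns a (label, sequence) pair, e.g. ("A-B-D", "AUGGCU...").
    """
    kept = [name for name in order if name not in skipped]
    return "-".join(kept), "".join(EXONS[name] for name in kept)


order = ["A", "B", "C", "D"]
full = splice_exons(order)
skip_c = splice_exons(order, skipped={"C"})
skip_b = splice_exons(order, skipped={"B"})

for label, mrna in (full, skip_c, skip_b):
    print(label, mrna)
```

The same four exons give different mRNA sequences depending on which exon is skipped, which is the point of the example above.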
There is great divergence of sequence between a given intron in different eukaryotic organisms. The exon sequences are much more conserved. This suggests that the actual sequence of the intron is not very important. If it were important, then any changes that occurred during evolution would be damaging and the organisms with the changes would not be likely to survive. This explains why some mutations in intron regions have no effect as they don't code for a protein, and are cut out before the mRNA is formed. However, some disorders occur when mutations occur at the splice junction or at a point in the intron which prevents splicing.
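One simple way to put a number on "divergence" is percent identity between two aligned sequences. The sketch below assumes the sequences are already aligned and of equal length; the sequences themselves are invented for illustration and are not taken from any real gene or species.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length, pre-aligned sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return 100 * matches / len(a)


# Made-up sequences standing in for the same exon and intron in two species.
exon_species1, exon_species2 = "ATGGCCAAGT", "ATGGCCAAAT"
intron_species1, intron_species2 = "GTAAGTCCTA", "GTCTGAATTA"

exon_identity = percent_identity(exon_species1, exon_species2)
intron_identity = percent_identity(intron_species1, intron_species2)

print("exon identity:  ", exon_identity)
print("intron identity:", intron_identity)
```

In this toy data the exon pair is far more similar than the intron pair, which is the pattern described above for real genes.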
One possibility is that there were no introns in the beginning, but that they just evolved and inserted themselves into viable genes. Another possibility is that there were tons of introns in the beginning and some have been lost in time through any number of mechanisms.
The generally accepted hypothesis says that the primitive genome contained no introns but consisted of short minigenes that each had very well-defined functions and structures. These primitive cells also had the ability to self-splice intervening sequences in the pre-mRNA. Over time, second copies of genes were made and were incorporated into the DNA at nearby as well as distant locations. Take for instance the single globin gene that, over time, was copied. The DNA then contained two copies of the globin gene, and variations could arise in the second copy that might increase the capabilities of globin. We now know these copies as the α- and β-globin genes, and in reality, due to crossing over, they actually exist on different chromosomes.
Processes like these might lead to evolution where minigenes were combined to form the genes as we recognize them today. Introns were the spaces between the minigenes that are now working together as a gene.
Pseudogenes are gene-like sequences that do not produce a product but we believe they were at one time genes. They might have promoters that don't work or they may have a mutation in the coding sequence that prevents a functional protein from being produced, or a mutation that interferes with normal splicing. Pseudogenes usually arise by gene duplication of a functional gene followed by errors collecting in one of the copies. Over time, enough mutations in a single copy might make a mutated copy of a gene no longer recognizable as a copy of its predecessor. One theory is that the "junk DNA" referred to before is simply copies of other genes that have, over time, mutated so much that they are unrecognizable as a copy.
This ability to vary a gene's capability has obviously played a great role in evolution, where survival of the fittest removed those organisms that were not best suited to their environment. When a mutation was not advantageous, the organisms carrying it would die out, and the mutation would be removed from the gene pool. When the mutation was advantageous, it would be retained. Another example of variations aiding in the functioning of an organism is with globin genes. Some globin gene products bind oxygen much more strongly than others. If a pregnant woman's own hemoglobin were the strongly binding form, the fetus would be unable to wrest the oxygen from the mother and would die. So it is the fetus's globin that has the stronger oxygen affinity, and it is able to pull oxygen away from the mother's blood to serve the fetus.
Globins (combined with heme) bind oxygen. Globin genes typically have three exons and two introns. The functional protein, called hemoglobin, consists of four globin subunits, each bound to its own heme group. Human adult hemoglobin has two α-globins and two β-globins.
Myoglobin consists of a single globin subunit plus heme and carries oxygen within muscles. Because of their similar sequences and gene organizations (both have three exons in exactly the same location along the gene) it is believed that both the globin and myoglobin are derived from a common ancestor gene.
Plants called legumes can use certain kinds of bacteria to obtain the nitrogen they need through a process called nitrogen fixation. Nodules containing the nitrogen-fixing bacteria line the roots. The bacteria and the plant have a symbiotic relationship; the plant provides the bacteria with food, and the bacteria fix nitrogen for the plant. Nitrogen fixation can only occur, however, when there is no oxygen present, so leghemoglobin binds oxygen within the nodules, allowing the bacteria to fix nitrogen. This is crucial to the process. The sequence of leghemoglobin is related to the sequence of the other globins, but interestingly, the middle exon is split in leghemoglobin, giving this particular globin gene four exons. Since the gene organization is close to that of the rest of the globin family, and the protein sequences of leghemoglobin and globin are related, it is clear that these genes all share a common ancestor. It is not known whether the ancestor had three or four exons. The leghemoglobin gene is believed to have been picked up at some point from animal DNA, but scientists do not know how that occurred.