Sunday, June 27, 2010

INTRODUCTION TO GENES

DNA code is a sequence of chemicals that encodes the information controlling how humans are made and how they work. It is a digital code, but it is not binary: it is quaternary, with 4 distinct symbols. The information is encoded in an ordered sequence of 4 different chemicals called "bases", typically denoted A, C, G, and T.
  • A: adenine
  • C: cytosine
  • G: guanine
  • T: thymine
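These 4 substances are the fundamental "bits" of information in the genetic code. They are usually counted in "base pairs" because there are actually 2 substances per "bit": each base on one strand of the double helix is bonded to a partner base on the other strand, with A always pairing with T and C always pairing with G, as discussed later. Everything else is built on top of this foundation of 4 DNA digits.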
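Because of the pairing rule (A with T, C with G), one strand completely determines the other. Here is a minimal Python sketch of that rule, treating each strand as a plain string of letters; the `PAIR` table and `complement` function are illustrative names, not from any particular library.

```python
# Pairing rule: each base bonds to exactly one partner on the opposite strand.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the partner strand, base by base, in the same left-to-right order."""
    return "".join(PAIR[base] for base in strand)

strand = "ATGCGT"
partner = complement(strand)
print(partner)              # TACGCA
print(complement(partner))  # ATGCGT: complementing twice gives back the original
```

Since the partner strand adds no new information, a base pair still carries only one base's worth of information, which is why storage estimates count one symbol per pair.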
The entirety of the human DNA code, called the "human genome", is about 3 billion bases in total. Every human being has 2 copies of this code, one from each parent, so the DNA in most human cells contains a total of around 6 billion bases. In computer terms, this is around 6 gigabytes of symbols at one byte per symbol, or about 1.5 gigabytes if compacted, since each A/C/G/T base carries 2 binary bits of information. Human chromosomal DNA molecules are linear: each is a twisted double helix with a start and an end, and contains no cycles.
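The "2 bits per base" arithmetic can be made concrete by packing four bases into each byte. This is a minimal Python sketch assuming an arbitrary mapping A=00, C=01, G=10, T=11 (any fixed 2-bit code would work); `pack` and `unpack` are hypothetical helper names, not a standard file format.

```python
# Assumed 2-bit code: A=00, C=01, G=10, T=11.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BASES = "ACGT"  # index i decodes the 2-bit value i

def pack(seq: str) -> bytes:
    """Pack bases four to a byte, first base in the highest two bits."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        chunk = seq[i:i + 4]
        byte = 0
        for base in chunk:
            byte = (byte << 2) | CODE[base]
        byte <<= 2 * (4 - len(chunk))  # left-align a short final chunk
        out.append(byte)
    return bytes(out)

def unpack(data: bytes, length: int) -> str:
    """Inverse of pack; `length` drops the padding in the last byte."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(BASES[(byte >> shift) & 0b11])
    return "".join(bases[:length])

seq = "GATTACA"
packed = pack(seq)
print(len(seq), "bases ->", len(packed), "bytes")  # 7 bases -> 2 bytes
print(unpack(packed, len(seq)))                     # GATTACA

# The same arithmetic scaled to the ~6 billion bases in a cell:
total_bases = 6_000_000_000
print(total_bases / 1e9)          # 6.0 (gigabytes at 1 byte per base)
print(total_bases * 2 / 8 / 1e9)  # 1.5 (gigabytes at 2 bits per base)
```

Real genome file formats add headers, line breaks, and markers for unknown bases, so actual files are larger than this idealized 2-bit figure.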
Chromosomes: These 6 billion or so base pairs are split amongst 46 chromosomes. Each person has 23 pairs of chromosomes, receiving one chromosome of each pair from each parent (23 from each), for a total of 46 chromosomes per human cell. A chromosome is the largest form of a DNA molecule, containing a long sequence of DNA bases; human chromosomes differ in length, ranging from about 45 million to about 250 million base pairs each. Chromosomes are independent molecules of DNA, each with the typical double helix, a start and an end, but no cycles. Chromosomes are physically large enough to be seen under a high-power light microscope, especially when they condense during cell division.
Genes: Each chromosome contains subsequences of DNA bases that encode particular features, and these are called "genes". Thus genes are not independent molecules, but are abstract sequences within chromosomes. Genes vary widely in length. Genes are too small to be seen directly under a microscope, so they are analyzed using indirect chemical, molecular, and computational methods. Early Human Genome Project results estimated the total number of distinct human genes at around 30,000; later estimates put the number of protein-coding genes closer to 20,000.
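To make the idea of a gene as a "sequence within a chromosome" concrete, here is a minimal Python sketch that treats a chromosome as a string and a gene as a pair of start/end coordinates into it. The `Gene` class, the gene names, the coordinates, and the toy sequence are all hypothetical; real genes are far longer and are located using genome annotation data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Gene:
    name: str   # hypothetical label, for illustration only
    start: int  # 0-based index of the first base (inclusive)
    end: int    # 0-based index one past the last base (exclusive)

def gene_sequence(chromosome: str, gene: Gene) -> str:
    """A gene is not a separate molecule: it is just a slice of its chromosome."""
    return chromosome[gene.start:gene.end]

# A toy "chromosome" and two made-up genes of different lengths.
chromosome = "TTGACATGGCCAAGTTTAGCCATGAAACCCGGGTAA"
genes = [Gene("geneA", 5, 18), Gene("geneB", 21, 36)]
for g in genes:
    print(g.name, g.end - g.start, gene_sequence(chromosome, g))
```

Storing coordinates rather than copies mirrors the point above: the gene's bases live only in the chromosome, and the gene is a named region of it.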
So the hierarchy of terminology for genetic components is something like:
  • Base pair: the smallest element, a single DNA base (A, C, G, or T) bonded to its partner base on the opposite strand
  • Gene: a medium-size sequence, ranging from a few hundred to over a million DNA base pairs, like a sub-module
  • Chromosome: a large sequence of tens to hundreds of millions of DNA base pairs, like a computer program file
  • Human genome: the entirety of human DNA program code, about 3 billion base pairs; each cell carries 2 copies of each of 23 distinct chromosomes, adding up to around 6 billion DNA base pairs
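The hierarchy above maps naturally onto nested data structures. This is a minimal Python sketch, assuming a cell's DNA is modeled as one set of 23 chromosomes from each parent; the class names are illustrative, the sex chromosomes are simplified to "number 23", and the sequences are tiny stand-ins rather than real data.

```python
from dataclasses import dataclass, field

@dataclass
class Chromosome:
    number: int  # 1-23 (sex chromosomes simplified to "number 23")
    bases: str   # DNA sequence; real ones are tens to hundreds of millions long

@dataclass
class CellDNA:
    """Two copies of each of the 23 distinct chromosomes, one set per parent."""
    maternal: list[Chromosome] = field(default_factory=list)
    paternal: list[Chromosome] = field(default_factory=list)

    def chromosome_count(self) -> int:
        return len(self.maternal) + len(self.paternal)

    def base_count(self) -> int:
        return sum(len(c.bases) for c in self.maternal + self.paternal)

def toy_set() -> list[Chromosome]:
    """Stand-in sequences (NOT real data): chromosome n is "ACGT" repeated n times."""
    return [Chromosome(n, "ACGT" * n) for n in range(1, 24)]

cell = CellDNA(maternal=toy_set(), paternal=toy_set())
print(cell.chromosome_count())  # 46
print(cell.base_count())        # 2208
```

With real sequences in place of the toy ones, `base_count` would come to the roughly 6 billion base pairs mentioned above.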
Every individual has a unique genetic program, though all human DNA also shares much common code. Many genes and other DNA subsequences are shuffled or modified from one generation to the next within a species, for example through recombination when egg and sperm cells are formed, before being passed on at conception. DNA does not usually change within a particular individual's body, though changes can occur rarely through mutations in individual cells (e.g. in some cancer cells) and through genetic damage such as from radiation or exposure to toxic chemicals.
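A simple way to picture "unique, but sharing much common code" is to compare two sequences position by position. This minimal Python sketch assumes the two sequences are already lined up base for base; the function name and both sequences are made up for illustration.

```python
def differing_positions(a: str, b: str) -> list[int]:
    """Indices where two equal-length, already-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

person1 = "ACGTTGCAACGT"
person2 = "ACGTTGCGACGT"  # made-up single-base difference at index 7
diffs = differing_positions(person1, person2)
shared = 1 - len(diffs) / len(person1)
print(diffs)                      # [7]
print(f"{shared:.1%} identical")  # 91.7% identical
```

This only catches single-base substitutions between sequences that already line up; segments that are inserted, deleted, or moved would need sequence alignment first, which this sketch does not attempt.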

CHROMOSOMES