These genes make up different DNA sequences called genotypes.
Genotypes along with environmental and developmental factors determine what the phenotypes will be.
Some genetic traits are instantly visible, such as eye color or the number of limbs, and some are not, such as blood type, the risk for specific diseases, or the thousands of basic biochemical processes that constitute life.
These alleles encode slightly different versions of a protein, which cause different phenotypical traits.
Usage of the term "having a gene" (e.g., "good genes," "hair colour gene") typically refers to containing a different allele of the same, shared gene.
The concept of gene continues to be refined as new phenomena are discovered.
Therefore, a broad, modern working definition of a gene is any discrete locus of heritable, genomic sequence that affects an organism's traits by being expressed as a functional product or by regulation of gene expression.
The term gene is inspired by the ancient Greek γόνος (gonos), meaning offspring and procreation.
Main article: History of genetics
Discovery of discrete inherited units
The existence of discrete inheritable units was first suggested by Gregor Mendel (1822–1884).
He described these mathematically as 2^n combinations, where n is the number of differing characteristics in the original peas.
Although he did not use the term gene, he explained his results in terms of discrete inherited units that give rise to observable physical characteristics.
Mendel was also the first to demonstrate independent assortment, the distinction between dominant and recessive traits, the distinction between a heterozygote and homozygote, and the phenomenon of discontinuous inheritance.
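The number of combinations for n independently assorting characteristics grows as 2^n, which can be checked by enumeration. The following is a minimal sketch, assuming each characteristic has two alleles written as an upper-case and a lower-case letter; the function name and allele symbols are illustrative and are not Mendel's notation.

```python
from itertools import product

def gamete_combinations(traits):
    """Enumerate every allele combination for independently assorting traits.

    `traits` is a list of (allele_1, allele_2) pairs, one pair per
    differing characteristic. This representation is an assumption
    made for the illustration.
    """
    return ["".join(combo) for combo in product(*traits)]

# Three differing characteristics, each with two alleles.
traits = [("A", "a"), ("B", "b"), ("C", "c")]
combos = gamete_combinations(traits)
print(len(combos))  # 2**3 = 8
print(combos)
```

Adding a fourth characteristic to the list doubles the count again, to 16.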
Prior to Mendel's work, the dominant theory of heredity was one of blending inheritance, which suggested that each parent contributed fluids to the fertilisation process and that the traits of the parents blended and mixed to produce the offspring.
Darwin used the term gemmule to describe hypothetical particles that would mix during reproduction.
Mendel's work went largely unnoticed after its first publication in 1866, but was rediscovered in the late 19th century by Hugo de Vries, Carl Correns, and Erich von Tschermak, who (claimed to have) reached similar conclusions in their own research.
Specifically, in 1889, Hugo de Vries published his book Intracellular Pangenesis, in which he postulated that different characters have individual hereditary carriers and that inheritance of specific traits in organisms comes in particles.
De Vries called these units "pangenes" (Pangens in German), after Darwin's 1868 pangenesis theory.
Twenty years later, in 1909, Wilhelm Johannsen introduced the term 'gene'; William Bateson had introduced the term 'genetics' a few years earlier, while Eduard Strasburger, amongst others, still used the term 'pangene' for the fundamental physical and functional unit of heredity.
Discovery of DNA
Advances in understanding genes and inheritance continued throughout the 20th century.
Deoxyribonucleic acid (DNA) was shown to be the molecular repository of genetic information by experiments in the 1940s to 1950s.
The structure of DNA was studied by Rosalind Franklin and Maurice Wilkins using X-ray crystallography, which led James D. Watson and Francis Crick to publish a model of the double-stranded DNA molecule whose paired nucleotide bases indicated a compelling hypothesis for the mechanism of genetic replication.
In the early 1950s the prevailing view was that the genes in a chromosome acted like discrete entities, indivisible by recombination and arranged like beads on a string.
The experiments of Benzer using mutants defective in the rII region of bacteriophage T4 (1955–1959) showed that individual genes have a simple linear structure and are likely to be equivalent to a linear section of DNA.
An automated version of the Sanger method was used in early phases of the Human Genome Project.
Modern synthesis and its successors
Main article: Modern synthesis (20th century)
In this view, the molecular gene transcribes as a unit, and the evolutionary gene inherits as a unit.
Related ideas emphasizing the centrality of genes in evolution were popularized by Richard Dawkins.
Main article: DNA
The vast majority of organisms encode their genes in long strands of DNA (deoxyribonucleic acid).
DNA consists of a chain made from four types of nucleotide subunits, each composed of: a five-carbon sugar (2-deoxyribose), a phosphate group, and one of the four bases adenine, cytosine, guanine, and thymine.
Two chains of DNA twist around each other to form a DNA double helix with the phosphate-sugar backbone spiraling around the outside, and the bases pointing inwards with adenine base pairing to thymine and guanine to cytosine.
The specificity of base pairing occurs because adenine and thymine align to form two hydrogen bonds, whereas cytosine and guanine form three hydrogen bonds.
The two strands in a double helix must, therefore, be complementary, with their sequence of bases matching such that the adenines of one strand are paired with the thymines of the other strand, and so on.
Due to the chemical composition of the pentose residues of the bases, DNA strands have directionality.
The two strands of a double-helix run in opposite directions.
Nucleic acid synthesis, including DNA replication and transcription, occurs in the 5'→3' direction, because new nucleotides are added via a dehydration reaction that uses the exposed 3' hydroxyl as a nucleophile.
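Base pairing and the antiparallel orientation of the two strands together mean that one strand determines the other. A minimal sketch, assuming sequences are plain strings over A, C, G and T written 5'→3' (the function name is illustrative):

```python
# Watson-Crick pairing: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand):
    """Return the complementary strand, written 5'->3'.

    The input is assumed to be written 5'->3'. Because the two strands
    run in opposite directions, the complement is read in reverse order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))

print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original sequence, which reflects the fact that each strand fully specifies its partner.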
The expression of genes encoded in DNA begins by transcribing the gene into RNA, a second type of nucleic acid that is very similar to DNA, but whose monomers contain the sugar ribose rather than deoxyribose.
RNA molecules are less stable than DNA and are typically single-stranded.
The genetic code is nearly the same for all known organisms.
A chromosome consists of a single, very long DNA helix on which thousands of genes are encoded.
The region of the chromosome at which a particular gene is located is called its locus.
Each locus contains one allele of a gene; however, members of a population may have different alleles at the locus, each with a slightly different gene sequence.
The majority of eukaryotic genes are stored on a set of large, linear chromosomes.
DNA packaged and condensed in this way is called chromatin.
The manner in which DNA is stored on the histones, as well as chemical modifications of the histone itself, regulate whether a particular region of DNA is accessible for gene expression.
In addition to genes, eukaryotic chromosomes contain sequences involved in ensuring that the DNA is copied without degradation of end regions and sorted into daughter cells during cell division: replication origins, telomeres and the centromere.
Replication origins are the sequence regions where DNA replication is initiated to make two copies of the chromosome.
Telomeres are long stretches of repetitive sequences that cap the ends of the linear chromosomes and prevent degradation of coding and regulatory regions during DNA replication.
The length of the telomeres decreases each time the genome is replicated and has been implicated in the aging process.
Similarly, some eukaryotic organelles contain a remnant circular chromosome with a small number of genes.
Prokaryotes sometimes supplement their chromosome with additional small circles of DNA called plasmids, which usually encode only a few genes and are transferable between individuals.
Whereas the chromosomes of prokaryotes are relatively gene-dense, those of eukaryotes often contain regions of DNA that serve no obvious function.
Simple single-celled eukaryotes have relatively small amounts of such DNA, whereas the genomes of complex multicellular organisms, including humans, contain an absolute majority of DNA without an identified function.
This DNA has often been referred to as "junk DNA".
However, more recent analyses suggest that, although protein-coding DNA makes up barely 2% of the human genome, about 80% of the bases in the genome may be expressed, so the term "junk DNA" may be a misnomer.
Structure and function
These include DNA regions that are not transcribed as well as untranslated regions of the RNA.
Flanking the open reading frame, genes contain a regulatory sequence that is required for their expression.
First, genes require a promoter sequence.
A gene can have more than one promoter, resulting in messenger RNAs (mRNA) that differ in how far they extend in the 5' end.
Highly transcribed genes have "strong" promoter sequences that form strong associations with transcription factors, thereby initiating transcription at a high rate.
Other genes have "weak" promoters that form weak associations with transcription factors and initiate transcription less frequently.
Additionally, genes can have regulatory regions many kilobases upstream or downstream of the open reading frame that alter expression.
These act by binding to transcription factors, which then cause the DNA to loop so that the regulatory sequence (and bound transcription factor) comes close to the RNA polymerase binding site.
For example, enhancers increase transcription by binding an activator protein which then helps to recruit the RNA polymerase to the promoter; conversely silencers bind repressor proteins and make the DNA less available for RNA polymerase.
Many prokaryotic genes are organized into operons, with multiple protein-coding sequences that are transcribed as a unit.
The term cistron in this context is equivalent to gene.
The transcription of an operon's mRNA is often controlled by a repressor that can occur in an active or inactive state depending on the presence of specific metabolites.
When active, the repressor binds to a DNA sequence at the beginning of the operon, called the operator region, and represses transcription of the operon; when the repressor is inactive transcription of the operon can occur (see e.g. Lac operon).
The products of operon genes typically have related functions and are involved in the same regulatory network.
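The repressor logic described above can be written as a small Boolean model. This is a deliberately simplified sketch, assuming an inducible operon of the lac type in which the metabolite (the inducer) inactivates the repressor; it ignores activator proteins and leaky expression, and the function names are hypothetical.

```python
def repressor_is_active(inducer_present):
    """In an inducible operon, the inducer metabolite inactivates the repressor."""
    return not inducer_present

def operon_transcribed(inducer_present):
    """Transcription proceeds only when no active repressor sits on the operator."""
    return not repressor_is_active(inducer_present)

for inducer in (False, True):
    print(f"inducer present={inducer}: transcribed={operon_transcribed(inducer)}")
```

In a repressible operon the first function would be inverted: the metabolite would activate the repressor instead of inactivating it.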
Defining exactly what section of a DNA sequence comprises a gene is difficult.
Regulatory regions of a gene such as enhancers do not necessarily have to be close to the coding sequence on the linear molecule because the intervening DNA can be looped out to bring the gene and its regulatory region into proximity.
Similarly, a gene's introns can be much larger than its exons.
Regulatory regions can even be on entirely different chromosomes and operate in trans to allow regulatory regions on one chromosome to come in contact with target genes on another chromosome.
Early work in molecular genetics suggested the concept that one gene makes one protein.
This concept (originally called the one gene-one enzyme hypothesis) emerged from an influential 1941 paper by George Beadle and Edward Tatum on experiments with mutants of the fungus Neurospora crassa.
Norman Horowitz, an early colleague on the Neurospora research, reminisced in 2004 that “these experiments founded the science of what Beadle and Tatum called biochemical genetics.
In actuality they proved to be the opening gun in what became molecular genetics and all the developments that have followed from that.” The one gene-one protein concept has been refined since the discovery of genes that can encode multiple proteins by alternative splicing and of coding sequences split into short sections across the genome whose mRNAs are concatenated by trans-splicing.
A broad operational definition is sometimes used to encompass the complexity of these diverse phenomena, where a gene is defined as a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
This definition categorizes genes by their functional products (proteins or RNA) rather than their specific DNA loci, with regulatory elements classified as gene-associated regions.
Main article: Gene expression
In all organisms, two steps are required to read the information encoded in a gene's DNA and produce the protein it specifies.
Second, that mRNA is translated to protein.
RNA-coding genes must still go through the first step, but are not translated into protein.
The nucleotide sequence of a gene's DNA specifies the amino acid sequence of a protein through the genetic code.
Sets of three nucleotides, known as codons, each correspond to a specific amino acid.
The principle that three sequential bases of DNA code for each amino acid was demonstrated in 1961 using frameshift mutations in the rIIB gene of bacteriophage T4 (see Crick, Brenner et al. experiment).
There are 64 possible codons (four possible nucleotides at each of three positions, hence 4^3 possible codons) and only 20 standard amino acids; hence the code is redundant and multiple codons can specify the same amino acid.
The correspondence between codons and amino acids is nearly universal among all known living organisms.
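The codon count and the redundancy of the code can be verified by building the table directly. The sketch below assumes the standard genetic code, written as a 64-character string with codons ordered U, C, A, G at each position; the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Pair each of the 4**3 codons with its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Count how many codons specify each amino acid.
degeneracy = Counter(CODON_TABLE.values())
n_stop = degeneracy.pop("*")

print(len(CODON_TABLE))                  # 64 codons
print(len(degeneracy))                   # 20 amino acids
print(n_stop)                            # 3 stop codons
print(degeneracy["L"], degeneracy["M"])  # 6 codons for leucine, 1 for methionine
```

Sixty-one codons are therefore shared among 20 amino acids, which is why several codons must map to the same amino acid.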
The mRNA acts as an intermediate between the DNA gene and its final protein product.
The gene's DNA is used as a template to generate a complementary mRNA.
To initiate transcription, the polymerase first recognizes and binds a promoter region of the gene.
Thus, a major mechanism of gene regulation is the blocking or sequestering of the promoter region, either through tight binding by repressor molecules that physically block the polymerase or by organizing the DNA so that the promoter region is not accessible.
In eukaryotes, transcription occurs in the nucleus, where the cell's DNA is stored.
Alternative splicing mechanisms can result in mature transcripts from the same gene having different sequences and thus coding for different proteins.
This is a major form of regulation in eukaryotic cells and also occurs in some prokaryotes.
Translation is carried out by ribosomes, large complexes of RNA and protein responsible for carrying out the chemical reactions to add new amino acids to a growing polypeptide chain by the formation of peptide bonds.
Each tRNA has three unpaired bases known as the anticodon that are complementary to the codon it reads on the mRNA.
When the tRNA binds to its complementary codon in an mRNA strand, the ribosome attaches its amino acid cargo to the new polypeptide chain, which is synthesized from amino terminus to carboxyl terminus.
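Reading codons one at a time and extending the chain from the amino terminus can be sketched in a few lines. This example assumes the standard genetic code and makes the simplifying assumption that translation starts at the first AUG in the sequence and ends at the first in-frame stop codon; real start-site selection is more involved, and the function name is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate an mRNA (written 5'->3') into one-letter amino acid codes.

    Assumption: translation begins at the first AUG and stops at the
    first in-frame stop codon or at the end of the sequence.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)  # chain grows from N-terminus to C-terminus
    return "".join(protein)

print(translate("GGAUGGCCUGGUAAUU"))  # MAW
```

The two leading bases and the bases after the stop codon are untranslated in this toy transcript, so only AUG, GCC and UGG contribute to the product.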
A cell regulates its gene expression depending on its external environment (e.g. available nutrients, temperature and other stresses), its internal environment (e.g. cell division cycle, metabolism, infection status), and its specific role if in a multicellular organism.
A typical protein-coding gene is first copied into RNA as an intermediate in the manufacture of the final protein product.
RNA-mediated epigenetic inheritance has also been observed in plants and very rarely in animals.
Organisms inherit their genes from their parents.
Asexual organisms simply inherit a complete copy of their parent's genome.
Sexual organisms have two copies of each chromosome because they inherit one complete set from each parent.
Each gene specifies a particular trait, with different sequences of the gene (alleles) giving rise to different phenotypes.
Most eukaryotic organisms (such as the pea plants Mendel worked on) have two alleles for each trait, one inherited from each parent.
Alleles at a locus may be dominant or recessive; dominant alleles give rise to their corresponding phenotypes when paired with any other allele for the same trait, whereas recessive alleles give rise to their corresponding phenotype only when paired with another copy of the same allele.
If you know the genotypes of the organisms, you can determine which alleles are dominant and which are recessive.
For example, if the allele specifying tall stems in pea plants is dominant over the allele specifying short stems, then pea plants that inherit one tall allele from one parent and one short allele from the other parent will also have tall stems.
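The pea example can be worked through as a cross between two heterozygous plants. This is a minimal sketch, assuming a single gene with a dominant tall allele written "T" and a recessive short allele written "t"; the symbols and function names are illustrative.

```python
from collections import Counter
from itertools import product

def cross(parent_1, parent_2):
    """Return offspring genotype counts for a single-gene cross.

    Each parent is a two-character string of alleles, e.g. "Tt".
    Every pairing of one allele from each parent is equally likely.
    """
    offspring = Counter()
    for allele_1, allele_2 in product(parent_1, parent_2):
        # Sort so that "tT" and "Tt" count as the same genotype
        # (upper-case letters sort before lower-case ones).
        offspring["".join(sorted((allele_1, allele_2)))] += 1
    return offspring

def phenotype(genotype, dominant="T"):
    """One copy of the dominant allele is enough to show its phenotype."""
    return "tall" if dominant in genotype else "short"

genotypes = cross("Tt", "Tt")
phenotypes = Counter()
for genotype, count in genotypes.items():
    phenotypes[phenotype(genotype)] += count

print(dict(genotypes))   # {'TT': 1, 'Tt': 2, 'tt': 1}
print(dict(phenotypes))  # {'tall': 3, 'short': 1}
```

The heterozygous Tt offspring are tall, so the 1:2:1 genotype ratio appears as a 3:1 ratio of tall to short plants.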
Although Mendelian inheritance remains a good model for many traits determined by single genes (including a number of well-known genetic disorders), it does not include the physical processes of DNA replication and cell division.
DNA replication and cell division
Because the DNA double helix is held together by base pairing, the sequence of one strand completely specifies the sequence of its complement; hence only one strand needs to be read by the enzyme to produce a faithful copy.
The process of DNA replication is semiconservative; that is, the copy of the genome inherited by each daughter cell contains one original and one newly synthesized strand of DNA.
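Semiconservative replication can be illustrated by tracking strands across generations. The sketch below is a toy model, assuming each duplex is represented as a pair of strand labels; the labels and function name are hypothetical.

```python
def replicate(duplexes):
    """Perform one round of semiconservative replication.

    Each duplex is a pair of strand labels. The two strands separate
    and each is paired with a newly synthesized strand labelled "new".
    """
    daughters = []
    for strand_a, strand_b in duplexes:
        daughters.append((strand_a, "new"))
        daughters.append((strand_b, "new"))
    return daughters

population = [("original", "original")]
for generation in range(1, 4):
    population = replicate(population)
    with_original = sum("original" in duplex for duplex in population)
    # Columns: generation, total duplexes, duplexes holding an original strand.
    print(generation, len(population), with_original)
```

The number of duplexes doubles each round, but only two ever contain an original strand, because the two starting strands are never destroyed or copied in place.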
The rate of DNA replication in living cells was first measured as the rate of phage T4 DNA elongation in phage-infected E. coli and found to be impressively rapid.
During the period of exponential DNA increase at 37 °C, the rate of elongation was 749 nucleotides per second.
After DNA replication is complete, the cell must physically separate the two copies of the genome and divide into two distinct membrane-bound cells.
In prokaryotes (bacteria and archaea) this usually occurs via a relatively simple process called binary fission, in which each circular genome attaches to the cell membrane and is separated into the daughter cells as the membrane invaginates to split the cytoplasm into two membrane-bound portions.
Binary fission is extremely fast compared to the rates of cell division in eukaryotes.
Eukaryotic cell division is a more complex process known as the cell cycle; DNA replication occurs during a phase of this cycle known as S phase, whereas the process of segregating chromosomes and splitting the cytoplasm occurs during M phase.
The duplication and transmission of genetic material from one generation of cells to the next is the basis for molecular inheritance and the link between the classical and molecular pictures of genes.
Organisms inherit the characteristics of their parents because the cells of the offspring contain copies of the genes in their parents' cells.
During the process of meiotic cell division, an event called genetic recombination or crossing-over can sometimes occur, in which a length of DNA on one chromatid is swapped with a length of DNA on the corresponding homologous non-sister chromatid.
This can result in reassortment of otherwise linked alleles.
The Mendelian principle of independent assortment asserts that each of a parent's two genes for each trait will sort independently into gametes; which allele an organism inherits for one trait is unrelated to which allele it inherits for another trait.
This is in fact only true for genes that do not reside on the same chromosome or are located very far from one another on the same chromosome.
The closer two genes lie on the same chromosome, the more closely they will be associated in gametes and the more often they will appear together (known as genetic linkage).
Genes that are very close are essentially never separated because it is extremely unlikely that a crossover point will occur between them.
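The effect of linkage on gametes can be expressed with a recombination fraction r, the probability that a crossover separates two loci. A minimal sketch, assuming a parent that carries alleles A and B on one chromosome and a and b on the homologous chromosome; the function name and allele symbols are illustrative.

```python
def gamete_frequencies(recombination_fraction):
    """Expected gamete frequencies from an AB/ab double heterozygote.

    The recombination fraction r ranges from 0 (complete linkage) to
    0.5 (independent assortment).
    """
    r = recombination_fraction
    if not 0 <= r <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    parental = (1 - r) / 2   # AB and ab, the original combinations
    recombinant = r / 2      # Ab and aB, produced by crossing-over
    return {"AB": parental, "ab": parental, "Ab": recombinant, "aB": recombinant}

for r in (0.0, 0.1, 0.5):
    print(r, gamete_frequencies(r))
```

At r = 0.5 all four gamete types are equally frequent, which is the independent-assortment case; as r approaches 0 the recombinant types disappear.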
Main article: Molecular evolution
DNA replication is for the most part extremely accurate; however, errors (mutations) do occur.
This means that each generation, each human genome accumulates 1–2 new mutations.
Small mutations can be caused by DNA replication and the aftermath of DNA damage and include point mutations in which a single base is altered and frameshift mutations in which a single base is inserted or deleted.
Additionally, DNA repair mechanisms can introduce mutational errors when repairing physical damage to the molecule.
The repair, even with mutation, is more important to survival than restoring an exact copy, for example when repairing double-strand breaks.
Most different alleles are functionally equivalent; however, some alleles can give rise to different phenotypic traits.
Some mutations do not change the amino acid sequence because multiple codons encode the same amino acid (synonymous mutations).
Other mutations can be neutral if they lead to amino acid sequence changes, but the protein still functions similarly with the new amino acid (e.g. conservative mutations).
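Whether a single-base substitution changes the encoded amino acid can be checked against the codon table. The sketch below assumes the standard genetic code and uses the common labels synonymous, missense, nonsense and stop-loss; it does not judge whether a missense change is conservative, and the function name is illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def classify_point_mutation(codon, position, new_base):
    """Classify a single-base substitution within one mRNA codon.

    `position` is the 0-based index of the altered base in the codon.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if before == after:
        return "synonymous"   # same amino acid (or still a stop)
    if after == "*":
        return "nonsense"     # amino acid replaced by a stop codon
    if before == "*":
        return "stop-loss"    # stop codon replaced by an amino acid
    return "missense"         # one amino acid replaced by another

print(classify_point_mutation("GCU", 2, "C"))  # synonymous (Ala -> Ala)
print(classify_point_mutation("GAG", 1, "U"))  # missense (Glu -> Val)
print(classify_point_mutation("UGG", 2, "A"))  # nonsense (Trp -> stop)
```

Many third-position changes, like the first example, are synonymous because of the redundancy of the code.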
Genetic disorders are the result of deleterious mutations and can be due to spontaneous mutation in the affected individual, or can be inherited.
These genes appear either from gene duplication within an organism's genome, where they are known as paralogous genes, or are the result of divergence of the genes after a speciation event, where they are known as orthologous genes, and often perform the same or similar functions in related organisms.
It is often assumed that the functions of orthologous genes are more similar than those of paralogous genes, although the difference is minimal.
The relationship between genes can be measured by comparing the sequence alignment of their DNA.
The degree of sequence similarity between homologous genes is called conserved sequence.
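One simple similarity measure is the percentage of identical positions in an alignment. This is a minimal sketch, assuming the two sequences have already been aligned to equal length with "-" marking gaps and that gapped columns are excluded from the count; other conventions exist, and the function name is illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of compared alignment columns with identical residues.

    Both inputs are assumed to be pre-aligned (equal length). Columns
    with a gap in either sequence are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for residue_a, residue_b in zip(aligned_a, aligned_b):
        if residue_a == gap or residue_b == gap:
            continue
        compared += 1
        if residue_a == residue_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

# 7 columns are compared (one has a gap) and 6 of them match.
print(round(percent_identity("ATGCCGTA", "ATGACG-A"), 1))  # 85.7
```

Producing the alignment itself is a separate problem that this sketch does not address.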
Most changes to a gene's sequence do not affect its function and so genes accumulate mutations over time by neutral molecular evolution.
Additionally, any selection on a gene will cause its sequence to diverge at a different rate.
The sequence differences between genes can be used for phylogenetic analyses to study how those genes have evolved and how the organisms they come from are related.
Origins of new genes
The resulting genes (paralogs) may then diverge in sequence and in function.
Sets of genes formed in this way compose a gene family.
Gene duplications and losses within a family are common and represent a major source of evolutionary biodiversity.
Sometimes, gene duplication may result in a nonfunctional copy of a gene, or a functional copy may be subject to mutations that result in loss of function; such nonfunctional genes are called pseudogenes.
"Orphan" genes, whose sequence shows no similarity to existing genes, are less common than gene duplicates.
The human genome contains an estimated 18 to 60 genes with no identifiable homologs outside humans.
Orphan genes arise primarily from either de novo emergence from previously non-coding sequence, or gene duplication followed by such rapid sequence change that the original relationship becomes undetectable.
De novo genes are typically shorter and simpler in structure than most eukaryotic genes, with few if any introns.
Over long evolutionary time periods, de novo gene birth may be responsible for a significant fraction of taxonomically-restricted gene families.
This mechanism is a common source of new genes in prokaryotes, sometimes thought to contribute more to genetic variation than gene duplication.
Number of genes
Genome size and the number of genes a genome encodes vary widely between organisms.
Conversely, plants can have extremely large genomes, with rice containing >46,000 protein-coding genes.
The total number of protein-coding genes (the Earth's proteome) is estimated to be 5 million sequences.
Although the number of base pairs of DNA in the human genome has been known since the 1960s, the estimated number of genes has changed over time as definitions of genes and methods of detecting them have been refined.
Initial theoretical predictions of the number of human genes were as high as 2,000,000.
Early experimental measures indicated there to be 50,000–100,000 transcribed genes (expressed sequence tags).
Subsequently, the sequencing in the Human Genome Project indicated that many of these transcripts were alternative variants of the same genes, and the total number of protein-coding genes was revised down to ~20,000 with 13 genes encoded on the mitochondrial genome.
With the GENCODE annotation project, that estimate has continued to fall to 19,000.
Every multicellular organism has all its genes in each cell of its body, but not every gene functions in every cell.
Main article: Essential gene
Essential genes are the set of genes thought to be critical for an organism's survival.
This definition assumes the abundant availability of all relevant nutrients and the absence of environmental stress.
Only a small portion of an organism's genes are essential.
In the budding yeast Saccharomyces cerevisiae the number of essential genes is slightly higher, at 1000 genes (~20% of their genes).
Although the number is more difficult to measure in higher eukaryotes, mice and humans are estimated to have around 2000 essential genes (~10% of their genes).
The synthetic organism, Syn 3, has a minimal genome of 473 essential genes and quasi-essential genes (necessary for fast growth), although 149 have unknown function.
Genetic and genomic nomenclature
Gene nomenclature has been established by the HUGO Gene Nomenclature Committee (HGNC), a committee of the Human Genome Organisation, for each known human gene in the form of an approved gene name and symbol (short-form abbreviation), which can be accessed through a database maintained by HGNC.
Symbols are chosen to be unique, and each gene has only one symbol (although approved symbols sometimes change).
Main article: Genetic engineering
Since the 1970s, a variety of techniques have been developed to specifically add, remove and edit genes in an organism.
The related term synthetic biology is sometimes used to refer to extensive genetic engineering of an organism.
Genetic engineering is now a routine research tool with model organisms.
However, the genomes of cells in an adult organism can be edited using gene therapy techniques to treat genetic diseases.
Credits to the contents of this page go to the authors of the corresponding Wikipedia page: en.wikipedia.org/wiki/Gene.