Phylogenetic tree
A phylogenetic tree or evolutionary tree is a branching diagram or "tree" showing the evolutionary relationships among various biological species or other entities—their phylogeny (/faɪˈlɒdʒəni/)—based upon similarities and differences in their physical or genetic characteristics.
All life on Earth is part of a single phylogenetic tree, indicating common ancestry.
In a rooted phylogenetic tree, each node with descendants represents the inferred most recent common ancestor of those descendants, and the edge lengths in some trees may be interpreted as time estimates.
Each node is called a taxonomic unit.
Internal nodes are generally called hypothetical taxonomic units, as they cannot be directly observed.
Trees are useful in fields of biology such as bioinformatics, systematics, and phylogenetics.
Unrooted trees illustrate only the relatedness of the leaf nodes and do not require the ancestral root to be known or inferred.
History
The idea of a "tree of life" arose from ancient notions of a ladder-like progression from lower into higher forms of life (such as in the Great Chain of Being).
Early representations of "branching" phylogenetic trees include a "paleontological chart" showing the geological relationships among plants and animals in the book Elementary Geology, by Edward Hitchcock (first edition: 1840).
Charles Darwin (1859) also produced one of the first illustrations and crucially popularized the notion of an evolutionary "tree" in his seminal book The Origin of Species.
Over a century later, evolutionary biologists still use tree diagrams to depict evolution because such diagrams effectively convey the concept that speciation occurs through the adaptive and semirandom splitting of lineages.
Over time, species classification has become less static and more dynamic.
The term phylogenetic, or phylogeny, derives from the two Ancient Greek words φῦλον (phûlon), meaning "race, lineage", and γένεσις (génesis), meaning "origin, source".
Properties
Rooted tree
A rooted phylogenetic tree is a directed tree with a unique node, the root, corresponding to the (usually imputed) most recent common ancestor of all the entities at the leaves of the tree.
The root node has no parent node and is the ancestor of all other nodes in the tree.
The root is therefore a node of degree 2, while other internal nodes have a minimum degree of 3 (where "degree" here refers to the total number of incoming and outgoing edges).
The most common method for rooting trees is the use of an uncontroversial outgroup—close enough to allow inference from trait data or molecular sequencing, but far enough to be a clear outgroup.
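As a minimal sketch of the degree rule described above, the snippet below stores a small rooted tree as a parent-to-children mapping and counts incoming plus outgoing edges at each node. The four-leaf tree and the node names (`root`, `x`, `y`) are hypothetical, chosen only for illustration.

```python
# Hypothetical rooted tree ((A,B),(C,D)), stored as parent -> list of children.
children = {
    "root": ["x", "y"],
    "x": ["A", "B"],
    "y": ["C", "D"],
}


def find_root(children):
    """Return the unique node that never appears as a child."""
    all_children = {kid for kids in children.values() for kid in kids}
    roots = [node for node in children if node not in all_children]
    if len(roots) != 1:
        raise ValueError("a rooted tree must have exactly one root")
    return roots[0]


def degree(children, node):
    """Total number of incoming and outgoing edges at a node."""
    incoming = sum(node in kids for kids in children.values())
    outgoing = len(children.get(node, []))
    return incoming + outgoing


root = find_root(children)
print(root, degree(children, root))   # root 2
print("x", degree(children, "x"))     # x 3
print("A", degree(children, "A"))     # A 1
```

In this bifurcating example the root has degree 2, the other internal nodes have degree 3, and each leaf has degree 1.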
Unrooted tree
Unrooted trees illustrate the relatedness of the leaf nodes without making assumptions about ancestry.
They do not require the ancestral root to be known or inferred.
Unrooted trees can always be generated from rooted ones by simply omitting the root.
By contrast, inferring the root of an unrooted tree requires some means of identifying ancestry.
This is normally done by including an outgroup in the input data so that the root is necessarily between the outgroup and the rest of the taxa in the tree, or by introducing additional assumptions about the relative rates of evolution on each branch, such as an application of the molecular clock hypothesis.
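The two directions described above can be sketched in a few lines. The code assumes an unrooted tree stored as an adjacency mapping and a rooted tree stored as a parent-to-children mapping. The function names, the reserved node name `"root"`, and the example taxa (with `O` as the outgroup) are all hypothetical.

```python
def root_with_outgroup(adjacency, outgroup):
    """Root an unrooted tree on the edge joining `outgroup` to the rest.

    adjacency: dict mapping each node to the set of its neighbours.
    Returns a dict mapping each internal node to its list of children.
    Assumes no existing node is already called "root".
    """
    (neighbour,) = adjacency[outgroup]  # a leaf has exactly one neighbour
    children = {"root": [outgroup, neighbour]}
    # Walk away from the new root, orienting every edge parent -> child.
    stack = [(neighbour, outgroup)]
    while stack:
        node, came_from = stack.pop()
        kids = sorted(n for n in adjacency[node] if n != came_from)
        if kids:
            children[node] = kids
            stack.extend((kid, node) for kid in kids)
    return children


def unroot(children, root="root"):
    """Omit a degree-2 root and join its two children directly."""
    adjacency = {}
    for parent, kids in children.items():
        for kid in kids:
            adjacency.setdefault(parent, set()).add(kid)
            adjacency.setdefault(kid, set()).add(parent)
    a, b = adjacency.pop(root)
    adjacency[a].discard(root)
    adjacency[b].discard(root)
    adjacency[a].add(b)
    adjacency[b].add(a)
    return adjacency


# Hypothetical unrooted tree: internal nodes u and v, leaves A, B, C, O.
unrooted = {
    "A": {"u"}, "B": {"u"}, "C": {"v"}, "O": {"v"},
    "u": {"A", "B", "v"},
    "v": {"u", "C", "O"},
}

rooted = root_with_outgroup(unrooted, "O")
print(rooted)
```

Rooting places the new root on the branch between the outgroup and the remaining taxa. Calling `unroot(rooted)` recovers the original adjacency mapping, which illustrates why unrooting needs no extra information while rooting does.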
Bifurcating versus multifurcating
Both rooted and unrooted trees can be either bifurcating or multifurcating.
A rooted bifurcating tree has exactly two descendants arising from each interior node (that is, it forms a binary tree), and an unrooted bifurcating tree takes the form of an unrooted binary tree, a free tree with exactly three neighbors at each internal node.
In contrast, a rooted multifurcating tree may have more than two children at some nodes and an unrooted multifurcating tree may have more than three neighbors at some nodes.
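The distinction can be checked mechanically. The sketch below assumes rooted trees are written as nested tuples with string leaves and unrooted trees as adjacency mappings. Both representations and the function names are illustrative assumptions, not a standard API.

```python
def is_rooted_bifurcating(tree):
    """True if every interior node of a nested-tuple tree has exactly two children."""
    if isinstance(tree, str):  # a leaf
        return True
    return len(tree) == 2 and all(is_rooted_bifurcating(child) for child in tree)


def is_unrooted_bifurcating(adjacency):
    """True if every node is a leaf (one neighbour) or has exactly three neighbours."""
    return all(len(neighbours) in (1, 3) for neighbours in adjacency.values())


binary_tree = (("A", "B"), ("C", "D"))
polytomy = (("A", "B", "C"), "D")          # one node with three children

quartet = {                                 # two internal nodes, three neighbours each
    "A": {"u"}, "B": {"u"}, "C": {"v"}, "D": {"v"},
    "u": {"A", "B", "v"}, "v": {"u", "C", "D"},
}
star = {                                    # one internal node with four neighbours
    "A": {"c"}, "B": {"c"}, "C": {"c"}, "D": {"c"},
    "c": {"A", "B", "C", "D"},
}

print(is_rooted_bifurcating(binary_tree), is_rooted_bifurcating(polytomy))
print(is_unrooted_bifurcating(quartet), is_unrooted_bifurcating(star))
```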
Labeled versus unlabeled
Both rooted and unrooted trees can be either labeled or unlabeled.
A labeled tree has specific values assigned to its leaves, while an unlabeled tree, sometimes called a tree shape, defines a topology only.
Some sequence-based trees built from a small genomic locus, such as Phylotree, feature internal nodes labeled with inferred ancestral haplotypes.
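One way to make the idea of a tree shape concrete is to discard the leaf labels and compare what remains. The following sketch assumes rooted trees written as nested tuples and uses a hypothetical `shape` function that returns a canonical string in which every leaf becomes `*` and the children of each node are sorted.

```python
def shape(tree):
    """Canonical, label-free description of a rooted nested-tuple tree."""
    if isinstance(tree, str):  # a leaf: forget its label
        return "*"
    # Sorting the child shapes makes the result independent of child order.
    return "(" + ",".join(sorted(shape(child) for child in tree)) + ")"


balanced_1 = (("A", "B"), ("C", "D"))
balanced_2 = (("A", "C"), ("B", "D"))     # different labeling, same shape
caterpillar = ((("A", "B"), "C"), "D")    # different shape

print(shape(balanced_1) == shape(balanced_2))
print(shape(balanced_1) == shape(caterpillar))
```

The two balanced trees are distinct labeled trees but share one shape, which is why labeled trees always outnumber unlabeled ones for the same number of leaves.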
Enumerating trees
The number of possible trees for a given number of leaf nodes depends on the specific type of tree, but there are always more labeled than unlabeled trees, more multifurcating than bifurcating trees, and more rooted than unrooted trees.
The last distinction is the most biologically relevant; it arises because there are many places on an unrooted tree to put the root.
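For bifurcating labeled trees with n leaves, the total number of rooted trees is:
$$(2n-3)!! = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}, \qquad n \ge 2$$
For bifurcating labeled trees with n leaves, the total number of unrooted trees is:
$$(2n-5)!! = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}, \qquad n \ge 3$$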
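These counts are the double factorials (2n − 3)!! for rooted and (2n − 5)!! for unrooted bifurcating labeled trees. A short sketch, with hypothetical function names, computes both and checks the link between them: an unrooted bifurcating tree with n leaves has 2n − 3 branches, and placing the root on any one of them gives a distinct rooted tree.

```python
def double_factorial(k):
    """Product k * (k - 2) * (k - 4) * ... down to 1; taken as 1 when k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def rooted_binary_trees(n):
    """Number of rooted bifurcating labeled trees with n leaves: (2n - 3)!!"""
    return double_factorial(2 * n - 3)


def unrooted_binary_trees(n):
    """Number of unrooted bifurcating labeled trees with n leaves: (2n - 5)!!"""
    return double_factorial(2 * n - 5)


for n in range(2, 11):
    # Each of the 2n - 3 branches of an unrooted tree is a possible root position.
    assert rooted_binary_trees(n) == unrooted_binary_trees(n) * (2 * n - 3)
    print(n, unrooted_binary_trees(n), rooted_binary_trees(n))
```

Treating the empty product as 1 is a convenience that also covers the degenerate cases of one and two leaves.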
Labeled leaves | Binary unrooted trees | Binary rooted trees | Multifurcating rooted trees | All possible rooted trees |
---|---|---|---|---|
1 | 1 | 1 | 0 | 1 |
2 | 1 | 1 | 0 | 1 |
3 | 1 | 3 | 1 | 4 |
4 | 3 | 15 | 11 | 26 |
5 | 15 | 105 | 131 | 236 |
6 | 105 | 945 | 1,807 | 2,752 |
7 | 945 | 10,395 | 28,813 | 39,208 |
8 | 10,395 | 135,135 | 524,897 | 660,032 |
9 | 135,135 | 2,027,025 | 10,791,887 | 12,818,912 |
10 | 2,027,025 | 34,459,425 | 247,678,399 | 282,137,824 |
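The rooted columns of the table can be reproduced with a recurrence on the number of interior nodes, commonly attributed to Felsenstein. Let T(n, m) be the number of rooted trees with n labeled leaves and m interior nodes. The n-th leaf is either attached to one of the m existing interior nodes, or it splits one of the n + m − 2 existing branches (counting a branch above the root) and creates a new interior node. The function names below are hypothetical, and the code is a sketch rather than a reference implementation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def trees(n, m):
    """Rooted trees with n labeled leaves and m interior nodes."""
    if n == 1:
        return 1 if m == 0 else 0      # a single leaf, no interior nodes
    if m < 1 or m > n - 1:
        return 0                       # impossible number of interior nodes
    # New leaf splits a branch (adds a node) or joins an existing interior node.
    return (n + m - 2) * trees(n - 1, m - 1) + m * trees(n - 1, m)


def table_row(n):
    """Return (binary rooted, multifurcating rooted, all rooted) for n leaves."""
    total = sum(trees(n, m) for m in range(n))
    binary = trees(n, n - 1)           # fully resolved trees have n - 1 interior nodes
    return binary, total - binary, total


for n in range(1, 11):
    print(n, *table_row(n))
```

Fully resolved (bifurcating) trees are the case m = n − 1. Every tree with fewer interior nodes contains at least one multifurcation.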
Special tree types
Construction
Main article: Computational phylogenetics
Phylogenetic trees built from a nontrivial number of input sequences are constructed using computational phylogenetics methods.
Distance-matrix methods such as neighbor-joining or UPGMA, which calculate genetic distance from multiple sequence alignments, are simplest to implement, but do not invoke an evolutionary model.
Many sequence alignment methods such as ClustalW also create trees by using the simpler algorithms (i.e. those based on distance) of tree construction.
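To illustrate the distance-matrix approach, the sketch below computes pairwise distances from a toy alignment and clusters them with UPGMA. It assumes the simplest distance, the proportion of differing sites (p-distance), and returns only the topology as nested tuples, without branch lengths. The sequences, taxon names, and function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(names, dist):
    """Cluster taxa by UPGMA and return the topology as nested tuples.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves in it
    dist = dict(dist)                        # work on a copy
    while len(clusters) > 1:
        # Join the closest pair of current clusters (first pair wins on ties).
        a, b = min(combinations(clusters, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the merged cluster is the size-weighted mean of the old distances.
        for other in clusters:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))]
                + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    (tree,) = clusters
    return tree


# Hypothetical aligned sequences, for illustration only.
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACCA",
    "D": "TCGAACCA",
}
distances = {
    frozenset((x, y)): p_distance(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}
tree = upgma(list(alignment), distances)
print(tree)
```

UPGMA always yields a rooted tree and implicitly assumes equal rates of change along all lineages. Neighbor-joining relaxes that assumption and returns an unrooted tree.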
Maximum parsimony is another simple method of estimating phylogenetic trees, but it assumes an implicit model of evolution (i.e. parsimony).
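Parsimony scoring can be sketched with Fitch's small-parsimony algorithm, which counts the minimum number of state changes a given rooted bifurcating tree requires at each aligned site. This is one common way to score a tree under parsimony, not the only one. The alignment, tree encoding (nested tuples), and function names are hypothetical.

```python
def fitch(tree, states):
    """Return (possible ancestral states, minimum changes) for one site."""
    if isinstance(tree, str):  # a leaf: its observed state, no changes
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    common = left_set & right_set
    if common:
        # Children can agree on a state, so no change is needed here.
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch change counts over all sites of an alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


# Hypothetical aligned sequences, for illustration only.
alignment = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACCA",
    "D": "TCGAACCA",
}
tree_1 = (("A", "B"), ("C", "D"))
tree_2 = (("A", "C"), ("B", "D"))

print(parsimony_score(tree_1, alignment))  # 4
print(parsimony_score(tree_2, alignment))  # 6
```

Under the parsimony criterion the first topology is preferred because it explains the same data with fewer changes.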
More advanced methods use the optimality criterion of maximum likelihood, often within a Bayesian framework, and apply an explicit model of evolution to phylogenetic tree estimation.
Identifying the optimal tree using many of these techniques is NP-hard, so heuristic search and optimization methods are used in combination with tree-scoring functions to identify a reasonably good tree that fits the data.
Tree-building methods can be assessed on the basis of several criteria:
- efficiency (how long does it take to compute the answer, how much memory does it need?)
- power (does it make good use of the data, or is information being wasted?)
- consistency (will it converge on the same answer repeatedly, if each time given different data for the same model problem?)
- robustness (does it cope well with violations of the assumptions of the underlying model?)
- falsifiability (does it alert us when it is not good to use, i.e. when assumptions are violated?)
Tree-building techniques have also gained the attention of mathematicians.
Trees can also be built using T-theory.
File formats
Images
Credits to the contents of this page go to the authors of the corresponding Wikipedia page: en.wikipedia.org/wiki/Phylogenetic tree.