Genome analyses are delivering unprecedented amounts of data from an abundance of organisms, raising expectations that in the near future, resolving the tree of life (TOL) will simply be a matter of data collection. However, recent analyses of some key clades in life's history have produced bushes and not resolved trees. The patterns observed in these clades are both important signals of biological history and symptoms of fundamental challenges that must be confronted. Here we examine how the combination of the spacing of cladogenetic events and the high frequency of independently evolved characters (homoplasy) limits the resolution of ancient divergences. Because some histories may not be resolvable by even vast increases in amounts of conventional data, the identification of new molecular characters will be crucial to future progress.
“… there is, after all, one true tree of life, the unique pattern of evolutionary branchings that actually happened. It exists. It is in principle knowable. We don't know it all yet. By 2050 we should – or if we do not, we shall have been defeated only at the terminal twigs, by the sheer number of species.”
Richard Dawkins 
Who are tetrapods' closest living relatives? Which is the earliest-branching animal phylum? Answers to such fundamental questions would be easy if the historical connections among all living organisms in the TOL were known. Obtaining an accurate depiction of the evolutionary history of all living organisms has been and remains one of biology's great challenges.
The discipline primarily responsible for assembling the TOL, molecular systematics, has produced many new insights by illuminating episodes in life's history, posing new hypotheses, and providing the evolutionary framework within which new discoveries can be interpreted. Molecular systematics has surmounted the confusion stemming from comparisons of morphologically disparate species to reveal unexpected evolutionary relationships such as the Afrotheria, a clade composed of strikingly different mammals including elephants, aardvarks, manatees, and golden moles. It has also aided the placement of the history of life in a temporal framework, shedding light on key evolutionary events independently of (and in many cases well before) the availability of fossil or biogeographic evidence. A notable example is the discovery that the Hawaiian drosophilid lineage predates by many millions of years the oldest extant Hawaiian island, having originated on islands now submerged.
With such powers in mind, the casual reader of the phylogenetics literature may find the table of contents of the May 2005 issue of Molecular Biology and Evolution somewhat bewildering. Two articles only a few pages apart paradoxically provide evidence for both rejecting and corroborating the existence of Ecdysozoa, a metazoan clade uniting moulting phyla such as arthropods and nematodes. Surely (at least) one of these studies must be wrong; and yet, identifying which is not as straightforward as one might think. Cases like the Ecdysozoa are a common sight in the molecular systematics literature [2,3,7–12]. How can it be that despite the availability of large amounts of data and powerful statistical techniques, evolutionary trees upon which experts agree have not been reached?
Here we discuss how and why certain critical parts of the TOL may be difficult to resolve, regardless of the quantity of conventional data available. We do not mean this essay to be a comprehensive review of molecular systematics. Rather, we have focused on the emerging evidence from genome-scale studies on several branches of the TOL that sharply contrasts with viewpoints—such as that in the opening quotation—which imply that the assembly of all branches of the TOL will simply be a matter of data collection. We view this difficulty in obtaining full resolution of particular clades—when given substantial data—as both biologically informative and a pressing methodological challenge. The recurring discovery of persistently unresolved clades (bushes) should force a re-evaluation of several widely held assumptions of molecular systematics. Now, as the field is transformed from a data-limited to an analysis-limited discipline, it is an opportune time to do so.
Stems and Branches: Trees and Bushes
The TOL has been molded by cladogenesis and extinction. Starting from a single lineage that undergoes cladogenesis and splits into two, the rate at which the lineages arising from this cladogenetic event undergo further cladogenetic events determines the lengths of the nascent stems. Once these stems have been generated, the only process that can modify their lengths is extinction. At its core, the elucidation of evolutionary relationships is the identification, through statistical means, of the tree's stems.
It is vital to appreciate that cladogenetic events typically begin as inconspicuous divergences between very similar populations. The subsequent divergences in phenotypic appearances are not phylogenetically informative. This is especially important to bear in mind for extant representatives of clades (Box 1) that originated hundreds of millions of years ago, in deep time. These forms represent the end products of long series of evolutionary changes. The features by which we recognize these clades today arose after the cladogenetic events we are trying to disentangle; their current divergence in body-plan architecture will be uninformative as to the time spans and branching order of the stems separating these clades.
Box 1. Glossary
Clade: A group of organisms is considered a clade when it includes all and only all of the descendants arising from a most recent common ancestor.
Homoplasy: Shared characters found in different branches of a phylogenetic tree not directly inherited from a common ancestor; these may arise by chance or selection.
Horizontal gene transfer: The occurrence of transfer of genes between genetically isolated populations or species. Gene transfer obscures the evolutionary history of organisms, because the phylogenies of genes that have undergone transfer differ from the overlying species phylogeny.
Hybridization: The occurrence of gene flow between genetically isolated populations.
Lineage sorting: The process by which incomplete sorting of ancestrally polymorphic alleles of molecular characters leads to character histories differing from the species' history. Lineage sorting typically occurs in stems spanning less than 2–3 million years, the exact time span being determined by population size and generation time.
Long-branch attraction: When the branches leading to certain species are very long, the rate of occurrence of parallel and convergent substitutions on these long branches can become sufficiently high to overwhelm the true historical signal at the stems.
Parsimony-informative characters (PICs): Those characters in a dataset that have two or more states that are each present in more than one species in the dataset. In a parsimony framework, the distribution of PICs determines the optimal phylogeny.
Rare genomic changes (RGCs): Rare mutational events (such as retroposon integrations, insertions and deletions in coding sequences, and gains and losses of introns) that generally exhibit lower levels of homoplasy, because they are less likely to occur in the same precise way independently.
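As an illustration of the PIC definition in Box 1, the following minimal Python sketch flags parsimony-informative columns in a toy alignment. The function names, the toy sequences, and the choice to ignore gap and missing-data symbols are assumptions made for illustration only; none of them comes from the datasets discussed in this essay.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two species."""
    counts = Counter(column)
    # Assumption of this sketch: gaps and missing data carry no signal.
    for symbol in "-?":
        counts.pop(symbol, None)
    shared_states = [state for state, n in counts.items() if n >= 2]
    return len(shared_states) >= 2


def parsimony_informative_sites(alignment):
    """Return the 0-based indices of parsimony-informative columns.

    `alignment` maps a species name to its aligned sequence (hypothetical data).
    """
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    return [
        i for i in range(length)
        if is_parsimony_informative("".join(seq[i] for seq in sequences))
    ]


# Toy alignment, invented for illustration.
toy_alignment = {
    "species_1": "ACGTA",
    "species_2": "ACGTC",
    "species_3": "ATGAC",
    "species_4": "ATCAG",
}
pic_sites = parsimony_informative_sites(toy_alignment)
print(pic_sites)
```

In this toy case only the second and fourth columns qualify: each has two states, and each state is shared by two species. A column in which a state appears in a single species adds no grouping information under parsimony, however variable it is.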
In the course of evolution, the relative rates of cladogenesis and extinction have differed enormously across clades, resulting in different tree shapes (Figure 1A). For example, the occurrence of cladogenetic events at widely spaced intervals generates clades characterized by long stems, and as time elapses, the phylogeny acquires a tree-like shape. In contrast, a radiation where a series of cladogenetic events occurs within a short time span generates a clade characterized by short stems. As the elapsed time since the radiation increases, the external branches lengthen and the phylogeny becomes bush-like.
The Shape of a Clade Influences its Resolvability
The relative shape of clades is a key determinant of the prospects for the accurate reconstruction of their history. This is because the amount of signal for a given stem is finite and proportional to the time span of the stem in question. In a parsimony framework, which we use here for simplicity of illustration, the signal for a given stem essentially equals the number of parsimony-informative characters (PICs; Box 1) supporting that stem (Figure 1B).
Because molecular characters typically have a few alternative states, the probability of several species acquiring the same nucleotide or amino acid independently (homoplasy; Box 1) is significant and can overwhelm the true historical signal given sufficient time, irrespective of the phylogenetic method used. Bush-shaped clades are characterized by longer external branches relative to the stems, and therefore more homoplastic changes are likely to occur on the external branches, thus generating characters that conflict with the true phylogenetic signal (Figure 1C).
One strategy to circumvent homoplasy has been the use of rare genomic changes (RGCs; Box 1). RGCs have more alternative states and thus are less vulnerable to homoplasy. Their solid support for a clade of cetaceans (whales and dolphins) and hippopotamuses within cetartiodactyls is a stellar example of their power. However, two caveats are worth mentioning in the use of all characters (RGCs as well as linear sequence data) for phylogenetic reconstruction purposes. First, all characters can be subject to horizontal gene transfer [20,21] (Box 1), which obscures organismic phylogenetic history. Second, when stems are short in absolute time span, characters can be influenced by population-level processes, such as the lineage sorting of ancestral polymorphisms and hybridization (Box 1). In all such cases, there is not a single true molecular phylogeny, because the species' DNA record is an amalgam of different evolutionary histories.
Thus, absolutely or relatively short stems present distinct challenges that could be described as the bane of the molecular systematist. Yet, it is precisely these stems—associated with some of the most interesting episodes in life's history—that most intrigue the evolutionist. Analyses of large molecular datasets from clades at different time depths of the TOL illustrate how short stems, whether placed just 6 million or 600 million years in the past, can confound phylogenetic resolution. Below, we describe four exemplar stems and dissect the major factors hindering phylogenetic resolution.
Bushes in the Tree of Life
The gorilla/chimp/human tree (5–8 million years ago). Whereas genomic analyses have shown that at the species level, chimpanzees are humans' closest relatives, many of the genes and genomic segments examined have followed different evolutionary paths [24–26]. Specifically, analyses of almost 100 genes (under two different optimality criteria) show that ~55% of genes support a human–chimpanzee clade, 40% are evenly split among the two alternative topologies, with the remaining genes being uninformative [25,26] (Figure 2A). Similarly, whereas 76% of PICs from a genome-scale survey support a human–chimpanzee clade, 24% of PICs disagree (Figure 2A).
[Figure 2. Four Notable Bushes at Different Temporal Depths of the TOL]
What can account for this conflict in such a recent clade? The short stem (~2 million years) leading to the human–chimpanzee clade strongly suggests that the culprit is lineage sorting [24,26]. The number of homoplastic characters is also surprising for a young clade, accounting for up to 32% of the conflict present in the PICs. Transposon-insertion RGCs also offer support for the human–chimpanzee clade (Figure 2A), but even these data include one character that conflicts with the species tree, yet another indicator of lineage sorting. And this may be too simplistic a view of how humans split from their primate relatives; the spatial distribution of genetic variation in primate genomes has raised the possibility of hybridization between the human and chimp lineages.
The phylogenetic patterns observed in these primates are by no means a unique circumstance on the TOL. Clades of similar age also exhibit multiple gene genealogies [28,29]. Given the complexity of the cladogenetic process revealed by the study of these young clades and the difficulties encountered in reconstructing their history, one can begin to anticipate the challenge of resolving clades with similarly short stems that originated deeper in time.
The elephant/sirenian/hyrax bush (57–65 million years ago). The relationships among elephants, sirenians, and hyraxes are uncertain, despite the availability of substantial amounts and kinds of molecular data (Figure 2B). Data from 20 nuclear genes have failed to resolve this stem [3,30], because only a handful of PICs are available to bear on the problem (Figure 2B). Most other mammalian stems at similar evolutionary depths are supported by many more PICs. Furthermore, only a single RGC has been identified for this stem, again in contrast with the many RGCs identified for other stems at similar evolutionary depths. Crucially, the phylogeny supported by nuclear PICs conflicts with the phylogeny supported by the single RGC, which in turn conflicts with the phylogeny supported by mitochondrial PICs (Figure 2B). The DNA record suggests that the three lineages split off from each other in quick succession, geologically speaking, but the phylogenetic relationships among the three orders cannot be resolved at present.
The coelacanth/lungfish/tetrapod bush (370–390 million years ago). The cladogenetic events that gave rise to the tetrapod, coelacanth, and lungfish lineages have also proven difficult to resolve. The analysis of 44 genes (under three different optimality criteria) and the approximately 300 PICs found therein equally support each of the three alternative phylogenies (Figure 2C). The lack of resolution is again suggestive of a short stem, a finding consistent with fossil evidence indicating that this stem is unlikely to have been longer than approximately 20 million years. The even distribution of the PICs across the three alternative phylogenies is explained by the even spread of homoplasy across the three long external branches leading to tetrapods, coelacanths, and lungfish. Indeed, this pattern of distribution of PICs is diagnostic of bushy clades. Despite more than a dozen molecular phylogenetic analyses over the last 15 years and the current availability of an abundance of molecular sequence data, the identity of the closest living relative of tetrapods is still uncertain.
The metazoan superbush (>550 million years ago). A similar inability of still larger datasets to resolve cladogenetic patterns is observed among metazoan clades that diverged even farther back in time. Many recent studies have reported support for many alternative conflicting phylogenies [5,6,9,10]. For example, Wolf and colleagues analyzed 507 genes by maximum likelihood, finding support for Coelomata, a clade that joins phyla possessing a true coelom (such as arthropods and chordates) to the exclusion of phyla without one, such as nematodes (leftmost tree in Figure 2D). In contrast, Dopazo and Dopazo analyzed 610 genes also by maximum likelihood and, after exclusion of genes evolving at a faster rate in nematodes, found support for Ecdysozoa (rightmost tree in Figure 2D).
Three observations generally hold true across metazoan datasets that indicate the pervasive influence of homoplasy at these evolutionary depths. First, a large fraction of single genes produce phylogenies of poor quality. For example, Wolf and colleagues omitted 35% of single genes from their data matrix, because those genes produced phylogenies at odds with conventional wisdom (Figure 2D). Second, in all studies, a large fraction of characters (genes, PICs, or RGCs) disagree with the optimal phylogeny, indicating the existence of serious conflict in the DNA record. For example, the majority of PICs conflict with the optimal topology in the Dopazo and Dopazo study. Third, the conflict among these and other studies in metazoan phylogenetics [11,12] is occurring at very “high” taxonomic levels, at or above the phylum level.
The problems illustrated by these four clades are representative of those encountered at a variety of time depths across the TOL [2,7,11,12,33]. What is exceptional about these clades is that they have received the greatest data collection efforts and analysis. The persistence of resolution problems in the face of (a) increasing amounts and different kinds of data and (b) state-of-the-art analytical methodology suggests that other less well analyzed, absolutely or relatively short stems in the TOL may pose similar challenges and be refractory to resolution with comparable datasets.
Why Hundreds of Genes Might Not Suffice
Excess homoplasy and the limits of phylogenetic resolution. Analyses of the four exemplar stems point to homoplasy as a major contributor to the observed lack of resolution. Homoplasy has long been appreciated in theoretical phylogenetics, with much effort invested into understanding its causes and providing corrections for them. However, the observed patterns (Figure 2) give cause for concern that the extent of homoplasy is much greater than expected under widely accepted models of sequence evolution and that the attendant consequences for the limits to phylogenetic resolution are not sufficiently appreciated.
For instance, theory and simulation analyses predict that a small fraction of substitutions will be homoplastic by chance (about 2–5%, depending upon model assumptions and evolutionary distances). However, analysis of the elephant/sirenian/hyrax dataset and the coelacanth/lungfish/tetrapod dataset indicates that the actual level of homoplasy is ~10% of amino acid substitutions in the first case (178 homoplastic/1,743 total substitutions) and ~15% in the second case (588 homoplastic/3,800 total substitutions), several times greater than expected [8,34]. Similar high levels of homoplasy exist in datasets from other bushy clades (unpublished data) and hold irrespective of analytical methodology.
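The arithmetic behind these percentages can be checked with a short Python sketch. The counts are those quoted above; the variable and function names are illustrative, and comparing against both ends of the 2–5% range is simply one way to read "several times greater than expected".

```python
# Homoplastic and total amino acid substitution counts quoted in the text.
datasets = {
    "elephant/sirenian/hyrax": (178, 1743),
    "coelacanth/lungfish/tetrapod": (588, 3800),
}

# Range of chance homoplasy quoted in the text (2-5%).
EXPECTED_LOW, EXPECTED_HIGH = 0.02, 0.05


def homoplasy_fraction(homoplastic, total):
    """Fraction of substitutions that are homoplastic."""
    return homoplastic / total


fractions = {
    name: homoplasy_fraction(h, t) for name, (h, t) in datasets.items()
}

for name, observed in fractions.items():
    print(
        f"{name}: {observed:.1%} observed, "
        f"{observed / EXPECTED_HIGH:.1f}x the 5% bound, "
        f"{observed / EXPECTED_LOW:.1f}x the 2% bound"
    )
```

Both observed fractions exceed even the upper end of the expected range by a factor of two to three, and the lower end by a factor of five or more.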
Many processes bias molecular evolution, including deviation in amino acid composition [36,37], unequal rates of evolution across sites or lineages, nonindependent substitutions, and selection; these increase levels of homoplasy and compound the challenge of accurate reconstruction. Although we may be uncertain at present as to the causes of homoplasy, there are substantial grounds for considering the role of selection. Purifying selection has been shown to constrain what changes are permitted at variable sites [36,43]. Furthermore, recent studies indicate that a significant fraction of genes [44,45], including many genes commonly used for molecular systematics [36,43,46–48], has been shaped by positive selection, accounting for perhaps 35–45% of all amino acid substitutions. The high levels of homoplasy observed may be the outcome of the action of selection on the proteome [36,47,49].
No matter what the causes, the consequence of greater-than-expected levels of homoplasy is the imposition of even greater limits on the resolution of clades in deep time. Homoplasy on the external branches can swamp the signal on the stems. For example, if only ~5% of substitutions are homoplastic, then a practical limit to stem resolution is reached when the ratio of external branch to stem length exceeds 20:1. Although the effect of homoplasy on phylogenetic reconstruction may be reduced by the addition of taxa [50,51], this is not always so [52–54]. Perhaps more importantly, several lineages exist for which no additional species can be sampled (Figure 2B and 2C). Thus, the accurate resolution of a <20-million-year-long stem in a 400-million-year-old clade (Figure 2C) or a <30-million-year-long stem in a 600-million-year-old clade (Figure 2D) may not be possible with current practices [33,55].
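One simplified way to read the 20:1 figure is sketched below in Python. It assumes, purely for illustration, that substitutions accumulate in proportion to branch length and that a fixed fraction of the changes on an external branch are homoplastic; the stem's signal is then matched by conflicting changes once that fraction times the external branch length equals the stem length. The function names and this proportionality model are assumptions of the sketch, not a method described in the essay.

```python
def limiting_branch_ratio(homoplasy_fraction):
    """External-branch:stem length ratio at which homoplasy matches signal.

    Assumes changes accumulate in proportion to branch length (a
    simplification made for this sketch).
    """
    return 1.0 / homoplasy_fraction


def stem_is_swamped(stem_length, external_length, homoplasy_fraction):
    """True if expected homoplastic change on the external branch is at
    least as large as the change accumulated along the stem."""
    return homoplasy_fraction * external_length >= stem_length


# ~5% homoplasy, as in the example in the text, gives a limit near 20:1.
print(limiting_branch_ratio(0.05))

# Higher homoplasy levels (~10% and ~15%, as reported for two of the
# datasets discussed above) lower the limit to roughly 10:1 and 7:1.
print(limiting_branch_ratio(0.10), limiting_branch_ratio(0.15))

# A stem shorter than 20 million years in a ~400-million-year-old clade
# (lengths in millions of years; 15 is an arbitrary illustrative value).
print(stem_is_swamped(stem_length=15, external_length=400,
                      homoplasy_fraction=0.05))
```

Under this reading, the two cases cited in the text (stems under 20 and 30 million years in clades of 400 and 600 million years) both sit at or beyond the 20:1 limit even at 5% homoplasy, and well beyond it at the higher observed levels.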
Barking up the wrong trees: Systematic bias in large datasets. A second major consequence of homoplasy is the risk of systematic bias in large dataset analyses. Specifically, long external branches typically harbor high levels of homoplasy, which can positively mislead phylogenetic inference, leading to the well-known phenomenon of long-branch attraction (Box 1). Therefore, when levels of homoplasy are high, caution must be used in interpreting high clade-support values. For example, in the case of metazoan superclades (Figure 2D) what has been reported in two different studies is not a lack of resolution but two apparently well supported but contradicting phylogenies.
A simple numerical example illustrates the issue. Consider a dataset in which 53 PICs support one phylogeny (call it phylogeny A) and 47 PICs support phylogeny B, which is in conflict with phylogeny A. After crunching the numbers, it can be shown that phylogeny A will be supported by a bootstrap value of ~72%. Now consider what happens to clade support if the character set is expanded but the proportion of PICs supporting each phylogeny remains the same. With 530 PICs supporting phylogeny A and 470 PICs supporting phylogeny B, the bootstrap value obtained in support of phylogeny A will increase to ~97%. Thus, given that investigations of metazoan clades use genome-scale datasets, the recovery of 100% support is not surprising. However, although it is natural to place confidence in such high support values, one must be wary when the number of homoplastic characters is high. Small differences between study designs (such as in dataset construction and the selection of characters or genes analyzed) skew the distribution of PICs and produce the observed absolute support for conflicting clade phylogenies [5,6,9–12]. Thus, a priori expectations of obtaining fully resolved topologies combined with the use of large amounts of data (which generate high support values) can make trees out of bushes.
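The numbers in this example can be approximated with a short Python sketch. It rests on a simplified model that is an assumption of the sketch rather than a description of how the quoted values were obtained: each bootstrap replicate redraws all PICs with replacement, each draw supports phylogeny A with probability equal to A's share of the PICs, the phylogeny with more supporting PICs wins the replicate, and exact ties are split evenly. Under that model, bootstrap support is a binomial tail probability. The function names are hypothetical.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n draws, computed in log space
    so that large n does not overflow or underflow."""
    log_pmf = (
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
    )
    return exp(log_pmf)


def bootstrap_support(pics_for_a, pics_for_b):
    """Expected share of bootstrap replicates won by phylogeny A.

    Simplifying assumptions: only two competing phylogenies, the majority
    of resampled PICs decides each replicate, and ties count as one half.
    """
    n = pics_for_a + pics_for_b
    p = pics_for_a / n
    support = 0.0
    for k in range(n + 1):
        if 2 * k > n:        # A holds the majority in this replicate
            support += binomial_pmf(k, n, p)
        elif 2 * k == n:     # exact tie, split evenly
            support += 0.5 * binomial_pmf(k, n, p)
    return support


small = bootstrap_support(53, 47)
large = bootstrap_support(530, 470)
print(f"53 vs 47 PICs:   {small:.1%}")
print(f"530 vs 470 PICs: {large:.1%}")
```

Under these assumptions the two values should land close to the ~72% and ~97% quoted above. The proportion of conflicting characters is identical in both cases; only the dataset size differs, which is why high support alone says little about how much conflict the data contain.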
What Will it Take to See the Trees?
Can we realistically hope to resolve diversification events spanning a few or even tens of millions of years that occurred in deep time? It is widely accepted that nucleotide data are of limited use for resolving deep divergences because of mutational saturation and homoplasy. Until the recent expansion in available data, it has not been possible to fully explore what the limits of the protein record might be. Like others in the field [5,8,9], we also had expectations that scaling up dataset size would be sufficient to resolve interesting groups [29,33]. The evidence presented here suggests that large amounts of conventional characters will not always suffice, even if analyzed by state-of-the-art methodology. Just as it would be futile to use radioisotopes with modest half lives to date ancient rocks, it appears unrealistic to expect conventional linear, homoplasy-sensitive sequences to reliably resolve series of events that transpired in a small fraction of deep time. Although we have known this from theory, we are now confronted with the actual pattern of molecular evolution.
We see two urgent priorities for the endeavour to assemble the TOL to succeed. First, the prevalence and causes of homoplasy need to be better understood so that improved models of molecular evolution that account for the noise in the protein record may be developed. It is perhaps indicative of the degree of difficulty involved in reconciling observed patterns in the molecular record with theoretical expectations that the area of theoretical phylogenetics is one in which much effort and progress has been made in recent years [18,59–61]. Second, molecular systematics must now move beyond conventional characters and mine genomic data for new, less-homoplastic characters such as RGCs.
What's Wrong with Bushes?
The identification of clades is of fundamental importance to molecular systematics. It is perhaps for this reason that over the years, systematists have emphasized reconstructing the topology of trees, while placing much less emphasis on the temporal information conveyed by unresolved stems. Currently, phylogenetic bushes are considered experimental failures. But that is seeing the glass as half empty. A bush in which series of cladogenetic events lie crammed and unresolved within a small section of a larger tree does harbour historical information [33,56]. Although it may be heresy to say so, it could be argued that knowing that strikingly different groups form a clade and that the time spans between the branching of these groups must have been very short makes the knowledge of the branching order among groups potentially a secondary concern.
For example, the lack of phylogenetic resolution at the base of the tetrapod/lungfish/coelacanth clade has not hampered in the least evolutionary research on the anatomical changes that occurred early on in the evolution of the tetrapod lineage [64,65]. Similarly, if the origin of most bilaterian phyla was compressed in time, more than 550 million years later it may matter little to know the exact relationships between most phyla to understand the evolution of the molecular tool kit that enabled the evolution of the body plans of the 35 or so animal phyla [66–68].
We submit that if the current efforts to assemble the TOL have, by 2050 (if not much sooner), assembled an arborescent bush of life, Dawkins' prediction will have come to fruition.
We thank Benjamin Prud'homme, Barry Williams, W. Ford Doolittle, and an anonymous referee for comments on the manuscript.
Abbreviations: RGC, rare genomic change; TOL, tree of life.
Competing interests. The authors have declared that no competing interests exist.
Antonis Rokas is Research Scientist at The Broad Institute of MIT and Harvard, Microbial Genome Analysis and Annotation, Cambridge, Massachusetts, United States of America. Sean B. Carroll is Investigator at the Howard Hughes Medical Institute and Professor at the University of Wisconsin Madison, R. M. Bock Laboratories, Madison, Wisconsin, United States of America.
Funding. SBC is supported by the Howard Hughes Medical Institute.
- Dawkins R. A devil's chaplain. New York: Houghton Mifflin; 2003. p. 272.
- Cracraft J, Donoghue MJ. Assembling the tree of life. Oxford: Oxford University Press; 2004. p. 576.
- Nishihara H, Satta Y, Nikaido M, Thewissen JGM, Stanhope MJ, et al. A retroposon analysis of Afrotherian phylogeny. Mol Biol Evol. 2005;22:1823–1833.
- Beverley SM, Wilson AC. Ancient origin for Hawaiian Drosophilinae inferred from protein comparisons. Proc Natl Acad Sci USA. 1985;82:4753–4757.
- Philippe H, Lartillot N, Brinkmann H. Multigene analyses of bilaterian animals corroborate the monophyly of Ecdysozoa, Lophotrochozoa and Protostomia. Mol Biol Evol. 2005;22:1246–1253.
- Philip GK, Creevey CJ, McInerney JO. The Opisthokonta and the Ecdysozoa may not be clades: Stronger support for the grouping of plant and animal than for animal and fungi and stronger support for the Coelomata than Ecdysozoa. Mol Biol Evol. 2005;22:1175–1184.
- Lockhart PJ, Penny D. The place of Amborella within the radiation of angiosperms. Trends Plant Sci. 2005;10:201–202.
- Takezaki N, Figueroa F, Zaleska-Rutczynska Z, Takahata N, Klein J. The phylogenetic relationship of tetrapod, coelacanth, and lungfish revealed by the sequences of 44 nuclear genes. Mol Biol Evol. 2004;21:1512–1524.
- Wolf YI, Rogozin IB, Koonin EV. Coelomata and not Ecdysozoa: Evidence from genome-wide phylogenetic analysis. Genome Res. 2004;14:29–36.
- Dopazo H, Dopazo J. Genome-scale evidence of the nematode-arthropod clade. Genome Biol. 2005;6:R41.
- Matus DQ, Copley RR, Dunn CW, Hejnol A, Eccleston H, et al. Broad taxon and gene sampling indicate that chaetognaths are protostomes. Curr Biol. 2006;16:R575–576.
- Marletaz F, Martin E, Perez Y, Papillon D, Caubit X, et al. Chaetognath phylogenomics: A protostome with deuterostome-like development. Curr Biol. 2006;16:R577–578.
- Budd GE, Jensen S. A critical reappraisal of the fossil record of the bilaterian phyla. Biol Rev. 2000;75:253–295.
- Simpson GG. The major features of evolution. New York: Columbia University Press; 1953. p. 434.
- Fiala KI, Sokal RR. Factors determining the accuracy of cladogram estimation: evaluation using computer simulation. Evolution. 1985;39:609–622.
- Lanyon SM. The stochastic mode of molecular evolution: What consequences for systematic investigations? Auk. 1988;105:565–573.
- Huelsenbeck JP. Performance of phylogenetic methods in simulation. Syst Biol. 1995;44:17–48.
- Felsenstein J. Inferring phylogenies. Sunderland (Massachusetts): Sinauer; 2003. p. 664.
- Nikaido M, Rooney AP, Okada N. Phylogenetic relationships among cetartiodactyls based on insertions of short and long interpersed elements: Hippopotamuses are the closest extant relatives of whales. Proc Natl Acad Sci USA. 1999;96:10261–10266.
- Doolittle WF. Phylogenetic classification and the universal tree. Science. 1999;284:2124–2129.
- Gogarten JP. Gene transfer: Gene swapping craze reaches eukaryotes. Curr Biol. 2003;13:R53–54.
- Degnan JH, Rosenberg NA. Discordance of species trees with their most likely gene trees. PLoS Genetics. 2006;2(5):e68. DOI: 10.1371/journal.pgen.0020068.
- Arnold ML. Natural hybridization and evolution. Oxford: Oxford University Press; 1997. p. 232.
- Patterson N, Richter DJ, Gnerre S, Lander ES, Reich D. Genetic evidence for complex speciation of humans and chimpanzees. Nature. 2006;441:1103–1108.
- Satta Y, Klein J, Takahata N. DNA archives and our nearest relative: The trichotomy problem revisited. Mol Phylogenet Evol. 2000;14:259–275.
- Chen FC, Li WH. Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am J Hum Genet. 2001;68:444–456.
- Salem AH, Ray DA, Xing JC, Callinan PA, Myers JS, et al. Alu elements and hominid phylogenetics. Proc Natl Acad Sci USA. 2003;100:12787–12791.
- Jennings WB, Edwards SV. Speciational history of Australian grass finches (Poephila) inferred from thirty gene trees. Evolution. 2005;59:2033–2047.
- Rokas A, Williams BL, King N, Carroll SB. Genome-scale approaches to resolving incongruence in molecular phylogenies. Nature. 2003;425:798–804.
- Amrine-Madsen H, Koepfli KP, Wayne RK, Springer MS. A new phylogenetic marker, apolipoprotein B, provides compelling evidence for eutherian relationships. Mol Phylogenet Evol. 2003;28:225–240.
- Murata Y, Nikaido M, Sasaki T, Cao Y, Fukumoto Y, et al. Afrotherian phylogeny as inferred from complete mitochondrial genomes. Mol Phylogenet Evol. 2003;28:253–260.
- Clack JA. Gaining ground: The origin and evolution of tetrapods. Bloomington: Indiana University Press; 2002. p. 369.
- Rokas A, Kruger D, Carroll SB. Animal evolution and the molecular signature of radiations compressed in time. Science. 2005;310:1933–1938.
- Zhang J, Kumar S. Detection of convergent and parallel evolution at the amino acid sequence level. Mol Biol Evol. 1997;14:527–536.
- O'Huigin C, Satta Y, Takahata N, Klein J. Contribution of homoplasy and of ancestral polymorphism to the evolution of genes in anthropoid primates. Mol Biol Evol. 2002;19:1501–1513.
- Naylor GJP, Brown WM. Structural biology and phylogenetic estimation. Nature. 1997;388:527–528.
- Hickey DA, Singer GA. Genomic and proteomic adaptations to growth at high temperature. Genome Biol. 2004;5:117.
- Pupko T, Galtier N. A covarion-based method for detecting molecular adaptation: Application to the evolution of primate mitochondrial genomes. Proc R Soc Lond B Biol Sci. 2002;269:1313–1316.
- Felsenstein J. Cases in which parsimony and compatibility methods will be positively misleading. Syst Zool. 1978;27:401–410.
- Averof M, Rokas A, Wolfe KH, Sharp PM. Evidence for a high frequency of simultaneous double-nucleotide substitutions. Science. 2000;287:1283–1286.
- Gillespie JH. The causes of molecular evolution. Oxford: Oxford University Press; 1991. p. 336.
- Sanderson MJ, Shaffer HB. Troubleshooting molecular phylogenetic analyses. Annu Rev Ecol Syst. 2002;33:49–72.
- Wells RS. Excessive homoplasy in an evolutionarily constrained protein. Proc R Soc Lond B Biol Sci. 1996;263:393–400.
- Smith NG, Eyre-Walker A. Adaptive protein evolution in Drosophila. Nature. 2002;415:1022–1024.
- Fay JC, Wyckoff GJ, Wu CI. Testing the neutral theory of molecular evolution with genomic data from Drosophila. Nature. 2002;415:1024–1026.
- Chamary JV, Parmley JL, Hurst LD. Hearing silence: Non-neutral evolution at synonymous sites in mammals. Nat Rev Genet. 2006;7:98–108.
- Bazin E, Glemin S, Galtier N. Population size does not influence mitochondrial genetic diversity in animals. Science. 2006;312:570–572.
- Wang HC, Xia X, Hickey D. Thermal adaptation of the small subunit ribosomal RNA gene: A comparative study. J Mol Evol. 2006;63:120–126.
- Bull JJ, Badgett MR, Wichman HA, Huelsenbeck JP, Hillis DM, et al. Exceptional convergent evolution in a virus. Genetics. 1997;147:1497–1507.
- Hillis DM. Inferring complex phylogenies. Nature. 1996;383:130–131.
- Graybeal A. Is it better to add taxa or characters to a difficult phylogenetic problem? Syst Biol. 1998;47:9–17.
- Rokas A, Carroll SB. More genes or more taxa? The relative contribution of gene number and taxon number to phylogenetic accuracy. Mol Biol Evol. 2005;22:1337–1344.
- Rosenberg MS, Kumar S. Incomplete taxon sampling is not a problem for phylogenetic inference. Proc Natl Acad Sci USA. 2001;98:10751–10756.
- Kim J. Large-scale phylogenies and measuring the performance of phylogenetic estimators. Syst Biol. 1998;47:43–60.[PubMed]
- Steel M, Hendy MD, Penny D. Reconstructing phylogenies from nucleotide pattern probabilities: A survey and some new results. Discrete Appl Math. 1998;88:367–396.
- Hoelzer GA, Melnick DJ. Patterns of speciation and limits to phylogenetic resolution. Trends Ecol Evol. 1994;9:104–107.[PubMed]
- Abouheif E, Zardoya R, Meyer A. Limitations of metazoan 18S rRNA sequence data: Implications for reconstructing a phylogeny of the animal kingdom and inferring the reality of the Cambrian explosion. J Mol Evol. 1998;47:394–405.[PubMed]
- Mossel E, Steel M. How much can evolved characters tell us about the tree that generated them. In: Gascuel O, editor. Mathematics of evolution and phylogeny. New York: Oxford University Press; 2005. pp. 384–412. editor.
- Huelsenbeck JP, Rannala B. Phylogenetic methods come of age: Testing hypotheses in an evolutionary context. Science. 1997;276:227–232.[PubMed]
- Penny D, McComish BJ, Charleston MA, Hendy MD. Mathematical elegance with biochemical realism: The covarion model of molecular evolution. J Mol Evol. 2001;53:711–723.[PubMed]
- Huelsenbeck JP, Ronquist F, Nielsen R, Bollback JP. Bayesian inference of phylogeny and its impact on evolutionary biology. Science. 2001;294:2310–2314.[PubMed]
- Rokas A, Holland PWH. Rare genomic changes as a tool for phylogenetics. Trends Ecol Evol. 2000;15:454–459.[PubMed]
- Sanderson MJ. Where have all the clades gone? A systematist's take on Inferring Phylogenies . Evolution. 2005;59:2056–2058.
- Shubin NH, Daeschler EB, Jenkins FA., Jr The pectoral fin of Tiktaalik roseae and the origin of the tetrapod limb. Nature. 2006;440:764–771.[PubMed]
- Daeschler EB, Shubin NH, Jenkins FA., Jr A Devonian tetrapod-like fish and the evolution of the tetrapod body plan. Nature. 2006;440:757–763.[PubMed]
- Ohno S. The notion of the Cambrian pananimalia genome. Proc Natl Acad Sci U S A. 1996;93:8475–8478.[PMC free article][PubMed]
- Nichols SA, Dirks W, Pearse JS, King N. Early evolution of animal cell signaling and adhesion genes. Proc Natl Acad Sci U S A. 2006;103:12451–12456.[PMC free article][PubMed]
- Kusserow A, Pang K, Sturm C, Hrouda M, Lentfer J, et al. Unexpected complexity of the Wnt gene family in a sea anemone. Nature. 2005;433:156–160.[PubMed]
- Venkatesh B, Erdmann MV, Brenner S. Molecular synapomorphies resolve evolutionary relationships of extant jawed vertebrates. Proc Natl Acad Sci USA. 2001;98:11382–11387.[PMC free article][PubMed]
Articles from PLoS Biology are provided here courtesy of Public Library of Science
Homologs of the Hh signalling network in C. elegans
Department of Biosciences and Nutrition and Center for Genomics and Bioinformatics, Karolinska Institutet and Life Sciences, Södertörns University College, Huddinge, Sweden
Department of Biochemistry, University of Bristol, Bristol, UK BS8 1TD
In Drosophila and vertebrates, Hedgehog (Hh) signalling is mediated by a cascade of genes, which play essential roles in cell proliferation and survival, and in patterning of the embryo, limb buds and organs. In C. elegans, this pathway has undergone considerable evolutionary divergence; genes encoding homologues of key pathway members, including Hh, Smoothened, Cos2, Fused and Suppressor of Fused, are absent. Surprisingly, over sixty proteins (i.e. WRT, GRD, GRL, and QUA), encoded by a set of genes collectively referred to as the Hh-related genes, and two co-orthologs (PTC-1,-3) of fly Patched, a Hh receptor, are present in C. elegans. Several of the Hh-related proteins are bipartite and all can potentially generate peptides with signalling activity, although none of these peptides shares obvious sequence similarity with Hh. In addition, the ptc-related (ptr) genes, which are present in a single copy in Drosophila and vertebrates and encode proteins closely related to Patched, have undergone an expansion in number in nematodes. A number of functions, including roles in molting, have been attributed to the C. elegans Hh-related, PTC and PTR proteins; most of these functions involve processes that are associated with the trafficking of proteins, sterols or sterol-modified proteins. Genes encoding other components of the Hh signalling pathway are also found in C. elegans, but their functions remain to be elucidated.
The Drosophila hedgehog (hh) mutant was identified by Nüsslein-Volhard and Wieschaus in their classic genetic screen that led to the identification of key embryonic segment polarity genes (Nüsslein-Volhard and Wieschaus, 1980). Subsequently, the single Drosophila and the multiple vertebrate Hh proteins, Sonic Hedgehog (Shh), Indian Hedgehog (Ihh) and Desert Hedgehog (Dhh), have collectively proven to be key effectors in patterning the embryo, limb buds and neural tube, organ formation, cell proliferation, axon guidance and cell survival (for reviews, see Ingham and McMahon, 2001; Lum and Beachy, 2004). Mutations in genes constituting the Hh signalling pathway are also implicated in various human syndromes and cancers (Bale, 2002; Beachy et al., 2004; Pasca di Magliano and Hebrok, 2003; Taipale and Beachy, 2001). Details of the pathway are summarised in Figure 1 (Ingham and McMahon, 2001; Lum and Beachy, 2004). The various proteins involved in the Hh and Ptc signalling pathway, including those present in C. elegans, are listed in Table 1.
What role does the Hh/Ptc signalling pathway play in C. elegans development? To address this question, it is first reasonable to ask whether the major proteins of the pathway are encoded by the C. elegans genome, a task that is simplified by the availability of the complete genome sequence (Table 1; C. elegans Sequencing Consortium, 1998). Surprisingly, genes encoding recognisable Hh and Smo proteins are absent. In addition, homologs for Cos2, Fu and Su(fu), which transduce the Hh signal from Smo to Ci, are absent. The Ci homolog itself, TRA-1, is not essential for embryonic pattern formation, but continues to act at the transcriptional level to control sexual cell fate decisions and male gonadal development (Hodgkin, 1983; Mathies et al., 2004), although there is no evidence that the nuclear localisation or activity of TRA-1 is controlled in the same manner as Ci (Kuwabara et al., 2000; Zarkower and Hodgkin, 1992). It has been speculated that the C. elegans sex determination pathway could be evolutionarily related to the Hh/Ptc pathway because additional similarities have also been noted between the two pathways (Kuwabara et al., 1992).
Table 1. Components of the Hh signaling network. Note: the function of many of the C. elegans components is not clear and may be different from that in Drosophila and/or mammals. The abbreviations used for the hh pathway in Figure 1 are highlighted in yellow in this table. Slashes separate synonymous gene names, parentheses contain the abbreviations of the full gene names, commas separate paralogs.
|Features||C. elegans||Drosophila||Vertebrates|
|Hint and associated domains||qua-1, wrt-1 - wrt-10, grd-1 - grd-17, grl-1 - grl-32, hog-1||hedgehog (hh)||Sonic hedgehog (Shh), Indian hedgehog (Ihh), Desert hedgehog (Dhh)|
|Palmitoylation, membrane bound O-acyl transferase, MBOAT||Y57G11C.17, ZC101.3||rasp/skinny hedgehog (ski)/sightless||Ski1/Skn/HHAT|
|12-Pass TM, SSD||che-14, ptd-2||dispatched (disp)||DISP|
|Hip module (homologs in plants, bacteria)||–||–||Hedgehog interacting protein (Hip1/Hip)|
|GPI-anchored external plasma membrane glycoprotein (growth arrest when overexpressed)||phg-1||–||Growth arrest specific 1 (Gas1)|
|Low density lipoprotein receptor family||lrp-1|| ||Megalin/gp330/LRP2|
|Glycosyltransferases involved in heparan sulfate proteoglycan (HSPG) synthesis, exostosin domain||rib-1 (lacks C-term. putative catalytic region with DXD motif)||tout-velu (ttv)||EXT1|
|Glycosyltransferases involved in heparan sulfate proteoglycan (HSPG) synthesis, exostosin domain||rib-2||brother of tout-velu (botv)||EXTL3|
|Glycosyltransferases involved in heparan sulfate proteoglycan (HSPG) synthesis, exostosin domain||–||sister of tout-velu (sotv)||EXT2|
|Glypican: heparan sulfate proteoglycan anchored to the cell membrane by a GPI-anchor||gpn-1||dally-like (dlp/dly)||Glypican-6|
|12-Pass TM, SSD||ptc-1, ptc-3||patched (ptc)||PTCH1, PTCH2|
|7TM serpentine receptors||–||smoothened (smo)||SMO|
|Serine/threonine kinase (homologs in plants)||–||fused (fu)||FU|
|Sufu domain (homologs in bacteria)||–||Suppressor of fused (Su(fu))||SUFU|
|Kinesin-like||–||costal 2/costa (cos)||KIF27, KIF7|
|Zinc finger transcription factor||tra-1||cubitus interruptus (ci)||Gli1, Gli2, Gli3|
The C. elegans genome encodes an abundance of what will be collectively referred to as Hh-related (Hh-r) proteins, three Patched proteins and 24 Patched-related proteins (PTR). Four Frizzled receptor homologues (see Wnt signaling), as well as a series of molecules participating in the early stages of Hh signalling are also present. Given our current understanding of the Hh/Ptc signal transduction pathway in other organisms, the presence of multiple C. elegans Hh-r, PTC and PTR proteins and the absence of Hh, Smo and other components is unexpected and raises a number of questions about the evolution and the function of these genes in C. elegans.
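The presence/absence pattern summarised in Table 1 can be made explicit with a small lookup. The following is only an illustrative sketch: the rows are transcribed by hand from Table 1, the names `TABLE1_ROWS` and `missing_in_c_elegans` are hypothetical, and a dash in the C. elegans column is assumed to mean that no recognisable homolog is listed.

```python
# Rows transcribed by hand from Table 1 as (vertebrate name, C. elegans entry).
# "-" stands for a dash in the C. elegans column (no homolog listed there).
TABLE1_ROWS = [
    ("Shh/Ihh/Dhh", "qua-1, wrt, grd, grl, hog-1"),
    ("Ski1/Skn/HHAT", "Y57G11C.17, ZC101.3"),
    ("DISP", "che-14, ptd-2"),
    ("Hip1/Hip", "-"),
    ("Gas1", "phg-1"),
    ("Megalin/gp330/LRP2", "lrp-1"),
    ("EXT1", "rib-1"),
    ("EXTL3", "rib-2"),
    ("EXT2", "-"),
    ("Glypican-6", "gpn-1"),
    ("PTCH1, PTCH2", "ptc-1, ptc-3"),
    ("SMO", "-"),
    ("FU", "-"),
    ("SUFU", "-"),
    ("KIF27, KIF7", "-"),
    ("Gli1, Gli2, Gli3", "tra-1"),
]


def missing_in_c_elegans(rows):
    """Return the vertebrate names of rows with no C. elegans entry."""
    return [vertebrate for vertebrate, worm in rows if worm == "-"]


print(missing_in_c_elegans(TABLE1_ROWS))
```

Note that the Hh row is not reported as missing, because Table 1 lists the Hh-related genes in that position even though a recognisable Hh itself is absent.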
2. Hh pathway components in C. elegans
The Hh proteins are composed of an N-terminal signalling domain, Hh-N, and a C-terminal autoprocessing domain, Hh-C. Hh-C has autoproteolytic activity, which cleaves Hh and subsequently attaches a cholesterol moiety to the Hh-N C-terminus (see scheme Figure 2A; Lee et al., 1994; Porter et al., 1996). Hh-C shares sequence similarity with the self-splicing inteins; hence, this region was named Hint (Dalgaard et al., 1997; Hall et al., 1997; Koonin, 1995). The 60 terminal residues of Hh-C are named the sterol recognition region (SRR) because they are required for adding cholesterol to Hh-N (Figure 2A; Beachy et al., 1997). Here, we refer to the combined Hint-SRR domain as the Hog domain.
Surprisingly, early database searches uncovered eight distinct C. elegans ORFs sharing similarity to Hh (Bürglin, 1996; Porter et al., 1996). Further investigation revealed that the sequence similarity shared by Hh and these ORFs was confined to the Hog domain. Analyses of the N-terminal regions revealed several new motifs (Bürglin, 1996; Porter et al., 1996), which were also found in other ORFs that lacked Hog domains. In mimicry of the bipartite “Hedge-hog” structure, the three new gene families were named warthog (wrt), groundhog (grd) and ground-like (grl), and the protein domains found at their respective N-terminal regions were called Wart, Ground and Ground-like (Aspöck et al., 1999; Bürglin, 1996). The final additions to the nomenclature are two previously unnamed single copy genes, qua-1 for quahog (T05C12.10/M110, Hao et al., 2006b), and an ORF consisting only of the Hog domain, hog-1 (W06B11.4). The structural organization of Hh and the Hh-r genes is shown in Figure 2B.
Over 60 Hh-r genes have been identified in C. elegans, most of which are expressed, although a few appear to be pseudogenes (e.g., grd-17; Figure 2B). Many of these genes share a one-to-one orthology with C. briggsae genes, although several have arisen through recent duplications within the C. elegans lineage. Preliminary searches of the ongoing Brugia malayi genome project reveal that representatives of the different C. elegans families are also present (Figure 2B). This clearly demonstrates that these Hh-r gene families were already present in the last common ancestor of C. elegans and B. malayi more than 300 million years ago.
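The figure of "over 60" can be checked against the gene ranges listed in the first row of Table 1. The sketch below is only a rough tally: it assumes that every number within a listed range (for example wrt-1 to wrt-10) corresponds to one gene, and it counts pseudogenes such as grd-17 as well. The helper name `expand` is hypothetical.

```python
import re


def expand(entry):
    """Expand 'wrt-1 - wrt-10' into ['wrt-1', ..., 'wrt-10'].

    Entries that are not ranges (e.g. 'qua-1') are returned unchanged.
    """
    m = re.fullmatch(r"(\w+)-(\d+) - \1-(\d+)", entry)
    if not m:
        return [entry]
    prefix, first, last = m.group(1), int(m.group(2)), int(m.group(3))
    return [f"{prefix}-{i}" for i in range(first, last + 1)]


# C. elegans cell of the Hh row in Table 1, copied as written there.
HH_RELATED = "qua-1, wrt-1 - wrt-10, grd-1 - grd-17, grl-1 - grl-32, hog-1"

genes = [g for entry in HH_RELATED.split(", ") for g in expand(entry)]
print(len(genes))  # 1 + 10 + 17 + 32 + 1 = 61
```

Under these assumptions the tally is 61, consistent with the statement that over 60 Hh-r genes have been identified.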
The Wart, Ground, Ground-like and Qua domains each have a characteristic pattern of conserved cysteine residues (Figure 2C). The Ground and Ground-like domains differ in their arrangement of these conserved cysteine residues (Aspöck et al., 1999). In some instances, it has been possible to predict likely disulfide bond arrangements because of pairwise cysteine substitutions (Figure 2C; Aspöck et al., 1999). The Wart domain is about twice the size of the Ground and Ground-like domains; however, the C-terminal half of the Wart domain has a cysteine arrangement similar to that of the Ground domain and both share a short conserved motif similarly positioned between cysteine residues (FDøI, FEøV, respectively, Figure 2C; Aspöck et al., 1999). Thus, it is likely that the C-terminal part of the Wart domain and the Ground domain were derived from the same ancestral domain. The short conserved FDøV “core” motif is also present in the Hh-N core, although its position relative to the cysteine residues is different (Figure 2C; Aspöck et al., 1999).
3. Hh-r genes, the structure
All Hh-r genes in C. elegans, apart from hog-1, are predicted to encode secreted molecules. Porter et al. (1996) have shown that WRT-1, similar to Hh, undergoes autoprocessing; thus it is expected that other Hh-r proteins with a Hog domain will also have autocatalytic properties. However, it remains unclear whether the Hh-r proteins can bind cholesterol or other sterols, because the region corresponding to the SRR of Hh is slightly different in the Hh-r proteins (see Figure 1A in Aspöck et al., 1999).
An expression survey of wrt and grd genes revealed that they are expressed primarily in ectodermal tissues, particularly the hypodermis and seam cells, but can also be found in neurons, neuron-associated cells and gland cells (Aspöck et al., 1999). qua-1 is also expressed in the hypodermis (Hao et al., 2006b). An internal deletion in qua-1 results in severe molting defects (Hao et al., 2006b). Functional studies of other wrt genes are in progress and preliminary results indicate that they also play important roles in development. For example, wrt-5 functions in the hypodermis during embryogenesis (Hao et al., 2006a). At present it is still unclear how Hh-r molecules function, and whether their activity could relate to the function of the PTC and PTR proteins.
In Drosophila, Hh is further modified by Skinny hedgehog (Ski), which palmitoylates the N-terminus of Hh-N, an essential process for Hh function (Chamoun et al., 2001). C. elegans has two co-orthologs, Y57G11C.17 and ZC101.3 (T.R. Bürglin, unpublished), but large-scale RNAi experiments have not uncovered any associated mutant phenotype. Modified Hh is secreted with the help of Dispatched; C. elegans has two Disp homologs, CHE-14 and PTD-2 (see below).
Secreted Hh forms a multimeric complex that moves through the extracellular matrix space. Heparan sulfate proteoglycans (HSPG) play an important role in Hh signal transduction. Three Drosophila genes, tout-velu (ttv), sister of tout-velu (sotv), and brother of tout-velu (botv), encode glycosyltransferases of the EXT family that are required for the biosynthesis of heparan sulfate and are essential for the movement of N-Hh (Han et al., 2004a; Takei et al., 2004). One target for these enzymes is Dally-like (Dlp), a glypican of the HSPG family that is also essential for Hh signalling (Desbordes and Sanson, 2003; Han et al., 2004b; Lum et al., 2003). C. elegans has two EXT family members, rib-1 and rib-2 (for review, see Berninsone and Hirschberg, 2002). rib-1 is orthologous to ttv and EXT1, and rib-2 is orthologous to botv and EXTL3; however, orthologs of other EXT members are absent (Figure 3). The EXT proteins have two domains: the exostosin domain and a 300 amino acid C-terminal region that is highly conserved between members and probably corresponds to the catalytic transferase domain, which is missing in rib-1. Homozygous rib-2 animals show delayed larval growth and reduced life span, while their progeny display more than 90% early embryonic lethality (Morio et al., 2003). The C. elegans Dally-like gene ortholog, gpn-1, shows no phenotype in large-scale RNAi screens.
Two other molecules that bind and restrict the availability and range of N-Hh are Hip and Growth arrest specific 1 (GAS1; for review, see Cohen, 2003). Hip is absent in both Drosophila and C. elegans. GAS1 is a GPI-linked protein identified by its ability to arrest cell cycle progression when overexpressed. The C. elegans ortholog phg-1/phas-1 is expressed in the pharynx and may play a similar role in mitotic cell cycle regulation (Agostoni et al., 2002). Megalin, a member of the family of low-density lipoprotein receptors, is a third protein that binds to N-Hh (see Cohen, 2003). The C. elegans homolog lrp-1 is expressed in the hypodermis; mutations in lrp-1 produce molting defects (Yochem et al., 1999).
4. Modification and function of Hh-r genes
Three ptc genes, ptc-1,-2,-3 have been identified in C. elegans, although ptc-2 is absent in C. briggsae and is likely to be a pseudogene arising from a recent duplication of ptc-1. In addition, 24 genes have been identified that encode proteins sharing both sequence and topological similarities to PTC; these proteins have been named PTR (for Patched-related; Figure 4).
Drosophila, human and mouse are each predicted to encode a single PTR protein; thus the PTR family appears to have undergone an expansion in C. elegans (Kuwabara et al., 2000). Moreover, all 24 ptr genes are present in C. briggsae and share a one to one orthology with C. elegans.
The C. elegans PTC and PTR proteins share characteristics common to all known Ptc proteins. The PTC and PTR proteins are predicted to have 12-membrane spanning domains with cytoplasmic N- and C-termini. The membrane domains can be further sub-divided into two cassettes of 1+5, which are separated by a large intracellular loop. Each cassette contains a large extracellular loop predicted to facilitate Hh binding (Figure 5). Carried within the first set of TM domains is a sterol-sensing domain (SSD), which was first identified in SCAP, a protein involved in cholesterol homeostasis (Radhakrishnan et al., 2004).
5. SSD Proteins: PTC and PTC-related proteins
ptc-1 is the first member of the ptc and ptr family to be analysed in detail (Kuwabara et al., 2000). Surprisingly, it was found that ptc-1 plays a crucial role in germline cytokinesis. Elimination of ptc-1 activity either by RNAi or mutation produces sterile adults that carry multinucleate oocytes and sperm. In other respects, development of a ptc-1 mutant is essentially wild type. In situ hybridization reveals that ptc-1 mRNA is maternally provided, and becomes enriched in P4, the germ line precursor. Immunocytochemistry performed using a polyclonal antibody capable of detecting PTC-1 protein in wild type adult hermaphrodites reveals that PTC-1 protein is enriched in the plasma membrane of mitotic germ cells undergoing proliferation, or in membranes undergoing rapid expansions, such as oocytes. PTC-1 protein is particularly enriched in vesicles at the apex of the membrane furrows separating individual germ nuclei.
An analysis of ptc-1 mutants further revealed that the absence of cytokinesis-generated membrane furrows leads to a loss of cell autonomous development in the germline syncytium, as evidenced by the presence of clusters of nuclei in M-phase (Kuwabara et al., 2000). The involvement of ptc-1 in cytokinesis has led to the hypothesis that PTC-1 could play a role in the transport of lipids, proteins or lipoproteins to the cleavage furrow. Importantly, a proteomic survey of proteins involved in mammalian cytokinesis reveals that Ptc-1 is a component of the mid-body (Skop et al., 2004). Hence, the involvement of PTC-1 in cytokinesis is likely to highlight an ancestral function for Ptc proteins, which is apparently independent of Smoothened.
A mutation has also been obtained for ptc-3. Preliminary phenotypic characterisation indicates that ptc-3 is an essential gene; mutants arrest in late embryogenesis (P.E.K., unpublished; Zugasti et al., 2005). A survey of the other ptr genes by RNAi reveals that many have roles in molting, but also affect growth and morphogenesis (Zugasti et al., 2005). It has also been demonstrated that C. elegans daf-6 encodes a Patched-related protein required for lumen formation (Perens and Shaham, 2005).
6. Role of PTC proteins in C. elegans
Another clue to the function of PTC-1 in worms is provided by the presence of an SSD and its connections to sterols and protein transport. The SSD of SCAP, the first identified SSD protein, is capable of binding directly to cholesterol (Radhakrishnan et al., 2004; Figure 4). Other SSD protein families that are closely related to PTC are the Dispatched family, which includes CHE-14 and PTD-2, and the Niemann-Pick Type C family, which includes the NCR-1, -2 proteins in C. elegans (Kuwabara and Labouesse, 2002). che-14 was first identified in a screen for animals that were defective in dye-filling (Perkins et al., 1986). Subsequent work demonstrated that che-14 promotes apical sorting in epithelial cells. che-14 mutants accumulate vesicles that are trapped near the apical surface, indicative of problems in protein secretion (Michaux et al., 2000). The phenotype of che-14 mutants reveals that the problem is more likely to be associated with defects in exocytosis, rather than in endocytosis. The presence of an SSD in CHE-14 led the authors to speculate that CHE-14 might be involved in the secretion of proteins containing sterol or GPI adducts. Similarly, Disp promotes the exocytosis of Hh away from the cells from which it was synthesised (Burke et al., 1999). A protein transported by CHE-14 might be WRT-6, which is expressed in sensory socket cells (Aspöck et al., 1999).
Niemann-Pick disease type C1 (NPC1) is a neurodegenerative disease associated with defects in cholesterol storage/transport. In C. elegans, ncr-1; ncr-2 double mutants are hypersensitive to cholesterol deprivation and inappropriately form dauers under favourable food conditions, a dauer-constitutive (Daf-c) phenotype (Sym et al., 2000). Recent studies further show that ncr-1 is hypersensitive to progesterone, a known inhibitor of cholesterol trafficking (Li et al., 2004). Hence, in C. elegans there is a strong link between SSD proteins and the transport of sterols, sterol-modified proteins or, in the case of dauer formation, a sterol-modified hormone.
C. elegans are cholesterol auxotrophs, requiring exogenous cholesterol for growth and survival (Chitwood, 1999; Hieb and Rothstein, 1968). Sterols are involved in at least two processes in the worm: the decision to enter diapause, which is mediated by a sterol-derived hormone, and molting, which is impaired by sterol depletion (Matyash et al., 2004). Animals depleted for cholesterol arrest in the second generation during early larval stages (Matyash et al., 2001; Merris et al., 2003). These L2 arrested larvae become dauer-like and fail to complete molting (Matyash et al., 2004).
Because worms require relatively little cholesterol, it has been postulated that cholesterol is unlikely to be an essential architectural component of membranes and could play a more important role in signalling (Matyash et al., 2001). Mutations in lrp-1 and qua-1, RNAi inhibition of most ptr genes, and post-embryonic RNAi of ptc gene activity all lead to defects in molting (Yochem et al., 1999; Zugasti et al., 2005; Hao et al., 2006b). Taken together, these observations have led to the speculation that the PTC and PTR proteins could promote the transport of sterols or sterol-modified proteins, possibly including sterol-modified Hh-r proteins.
7. Other SSD proteins provide clues to PTC and PTR function
C. elegans lacks the typical Hh signal transduction cascade found in flies and vertebrates that is composed of Smo, Fu, Su(fu), and Cos2. Since homologs of Fu and Su(fu) are found in plants and bacteria (T.R.B., unpublished), it follows that this part of the pathway was lost in nematodes. Concomitantly, we have observed a large increase in the number of C. elegans Hh-r and ptr genes. Several lines of evidence point towards the Hh-r expansion being nematode specific: 1) Homologs of the Hh-r genes have so far not been identified outside of the nematodes. 2) Conversely, Hh is present in both arthropods and vertebrates and only missing in nematodes. 3) Phylogenetic analyses of the Hog domain and motif similarity shared by the Wrt and Grd domains indicate that the wrt, grd and grl gene families are derived from a single ancestral gene (Aspöck et al., 1999); qua-1 may also be derived from the same ancestral gene, as its Hog domain is also more similar to those of the Wrt and Grd proteins than to that of Hh. 4) C. briggsae has fewer Hh-r genes than C. elegans, and B. malayi even fewer, which underscores the capacity of these genes to expand. The expansion of the ptr genes is likewise nematode specific, since Drosophila and humans have only a single copy.
There is a running debate whether or not nematodes should be classified as ecdysozoa. If nematodes are indeed ecdysozoa, or group at least within the protostome branch of ecdysozoa and lophotrochozoa (see chapter on evolution), then the loss, expansion and diversification of various Hh pathway components, which have been described here, are most parsimoniously explained by postulating that a C. elegans Hh signalling pathway did exist, but was disrupted at some unknown time in early nematode evolution. It also implies that the Hh-r genes were derived from Hh. On the other hand, if nematodes are an outgroup to coelomata, then one of several alternatives is that a complete Hh signalling pathway was never acquired.
It is clear that the Hh and Ptc homologs in C. elegans have undergone considerable evolutionary divergence when compared to their Drosophila and vertebrate counterparts. Studies in C. elegans are providing insights into the roles and mechanisms underlying the control of the Hh-r, PTC and PTC-related proteins and have established that these proteins can function independently of Smo. The involvement of Hh-r and ptr genes in molting suggests that at least some of the Hh-r and ptr genes can participate in similar pathways. Thus, it will be of particular interest to determine whether the Hh-r proteins and the PTC and PTR proteins have maintained a ligand/receptor relationship that could potentially mediate cell-cell signalling.
TRB would like to thank the Swedish Foundation for Strategic Research (SSF) for support. PEK thanks the MRC (UK) for funding. The authors would like to thank David Fitch for constructive discussions related to nematode phylogeny.
Agostoni, E., Gobessi, S., Petrini, E., Monte, M., and Schneider, C. (2002). Cloning and characterization of the C. elegans gas1 homolog: phas-1. Biochim. Biophys. Acta 1574, 1–9.
Alcedo, J., Ayzenzon, M., Von Ohlen, T., Noll, M., and Hooper, J.E. (1996). The Drosophila smoothened gene encodes a seven-pass membrane protein, a putative receptor for the hedgehog signal. Cell 86, 221–232.
Aspöck, G., Kagoshima, H., Niklaus, G., and Bürglin, T.R. (1999). Caenorhabditis elegans has scores of hedgehog-related genes: sequence and expression analysis. Genome Res. 9, 909–923.
Bale, A.E. (2002). Hedgehog signaling and human disease. Annu. Rev. Genomics Hum. Genet. 3, 47–65.
Beachy, P.A., Cooper, M.K., Young, K.E., von Kessler, D.P., Park, W.J., Hall, T.M., Leahy, D.J., and Porter, J.A. (1997). Multiple roles of cholesterol in hedgehog protein biogenesis and signaling. Cold Spring Harb. Symp. Quant. Biol. 62, 191–204.
Beachy, P.A., Karhadkar, S.S., and Berman, D.M. (2004). Tissue repair and stem cell renewal in carcinogenesis. Nature 432, 324–331.
Berninsone, P.M., and Hirschberg, C.B. (2002). The nematode Caenorhabditis elegans as a model to study the roles of proteoglycans. Glycoconj J. 19, 325–330.
Bürglin, T.R. (1996). Warthog and Groundhog, novel families related to Hedgehog. Curr. Biol. 6, 1047–1050.
Burke, R., Nellen, D., Bellotto, M., Hafen, E., Senti, K.A., Dickson, B.J., and Basler, K. (1999). Dispatched, a novel sterol-sensing domain protein dedicated to the release of cholesterol-modified hedgehog from signaling cells. Cell 99, 803–815.
C. elegans Sequencing Consortium (1998). Genome sequence of the nematode C. elegans: a platform for investigating biology. Science 282, 2012–2018.
Chamoun, Z., Mann, R.K., Nellen, D., von Kessler, D.P., Bellotto, M., Beachy, P.A., and Basler, K. (2001). Skinny hedgehog, an acyltransferase required for palmitoylation and activity of the hedgehog signal. Science 293, 2080–2084.
Chen, Y., and Struhl, G. (1998). In vivo evidence that Patched and Smoothened constitute distinct binding and transducing components of a Hedgehog receptor complex. Development 125, 4943–4948.
Chitwood, D.J. (1999). Biochemistry and function of nematode steroids. Crit. Rev. Biochem. Mol. Biol. 34, 273–284.
Cohen, M.M., Jr. (2003). The hedgehog signaling network. Am. J. Med. Genet. A 123, 5–28.
Dalgaard, J.Z., Moser, M.J., Hughey, R., and Mian, I.S. (1997). Statistical modeling, phylogenetic analysis and structure prediction of a protein splicing domain common to inteins and hedgehog proteins. J. Comput. Biol. 4, 193–214.
Desbordes, S.C., and Sanson, B. (2003). The glypican Dally-like is required for Hedgehog signalling in the embryonic epidermis of Drosophila. Development 130, 6245–6255.
Hall, T.M., Porter, J.A., Beachy, P.A., and Leahy, D.J. (1995). A potential catalytic site revealed by the 1.7-Å crystal structure of the amino-terminal signalling domain of Sonic hedgehog. Nature 378, 212–216.
Hall, T.M., Porter, J.A., Young, K.E., Koonin, E.V., Beachy, P.A., and Leahy, D.J. (1997). Crystal structure of a Hedgehog autoprocessing domain: homology between Hedgehog and self-splicing proteins. Cell 91, 85–97.
Han, C., Belenkaya, T.Y., Khodoun, M., Tauchi, M., and Lin, X. (2004a). Distinct and collaborative roles of Drosophila EXT family proteins in morphogen signalling and gradient formation. Development 131, 1563–1575.
Han, C., Belenkaya, T.Y., Wang, B., and Lin, X. (2004b). Drosophila glypicans control the cell-to-cell movement of Hedgehog by a dynamin-independent process. Development 131, 601–611.
Hao, L., Aspöck, G., and Bürglin, T.R. (2006a). The hedgehog-related gene wrt-5 is essential for hypodermal development in Caenorhabditis elegans. Dev. Biol., in press.
Hao, L., Mukherjee, K., Liegeois, S., Baillie, D., Labouesse, M., and Bürglin, T.R. (2006b). The hedgehog-related gene qua-1 is required for molting in Caenorhabditis elegans. Dev. Dyn., in press.
Hieb, W.F., and Rothstein, M. (1968). Sterol requirement for reproduction of a free-living nematode. Science 160, 778–780.
Hodgkin, J. (1983). Two types of sex determination in a nematode. Nature 304, 267–268.
Hooper, J.E. (2003). Smoothened translates Hedgehog levels into distinct responses. Development 130, 3951–3963.
Hooper, J.E., and Scott, M.P. (1989). The Drosophila patched gene encodes a putative membrane protein required for segmental patterning. Cell 59, 751–765.
Ingham, P.W., and McMahon, A.P. (2001). Hedgehog signaling in animal development: paradigms and principles. Genes Dev. 15, 3059–3087.
Ingham, P.W., Taylor, A.M., and Nakano, Y. (1991). Role of the Drosophila patched gene in positional signalling. Nature 353, 184–187.
Jia, J., Tong, C., and Jiang, J. (2003). Smoothened transduces Hedgehog signal by physically interacting with Costal2/Fused complex through its C-terminal tail. Genes Dev. 17, 2709–2720.
Koonin, E.V. (1995). A protein splice-junction motif in hedgehog family proteins. Trends Biochem. Sci. 20, 141–142.
Kuwabara, P.E., and Labouesse, M. (2002). The sterol-sensing domain: multiple families, a unique role? Trends Genet. 18, 193–201.
Kuwabara, P.E., Lee, M.H., Schedl, T., and Jefferis, G.S. (2000). A C. elegans patched gene, ptc-1, functions in germ-line cytokinesis. Genes Dev. 14, 1933–1944.
Kuwabara, P.E., Okkema, P.G., and Kimble, J. (1992). tra-2 encodes a membrane protein and may mediate cell communication in the Caenorhabditis elegans sex determination pathway. Mol. Biol. Cell 3, 461–473.
Lee, J.J., Ekker, S.C., von Kessler, D.P., Porter, J.A., Sun, B.I., and Beachy, P.A. (1994). Autoproteolysis in hedgehog protein biogenesis. Science 266, 1528–1537.
Lee, J.J., von Kessler, D.P., Parks, S., and Beachy, P.A. (1992). Secretion and localized transcription suggest a role in positional signaling for products of the segmentation gene hedgehog. Cell 71, 33–50.