Can we estimate diversification rates on extant phylogenies?
An exciting new preprint by Stilianos Louca and Matt Pennell^{1} just dropped on bioRxiv. I think it’s an interesting paper and wanted to quickly jot down my thoughts on it.
There’s been widespread use of software tools like BAMM, RPANDA, and SSE-class models to estimate how diversity is generated by varying diversification rates. The huge amount of attention paid to papers that highlight where our blind spots could be when estimating rates^{2} indicates that the field really cares about getting this stuff right, because it has important implications for all of our downstream analyses.
Nee et al. 1994^{3} showed that you could estimate speciation and extinction rates on an extant-only phylogeny by fitting two separate rates to a lineage-through-time plot (left panel). However, when rates are variable, it becomes impossible to identify whether there are, for example, two separate speciation rates, one older and one newer, or whether there is high extinction towards the present (right panel).^{4}
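To make the two-slope idea concrete, here is a minimal sketch of the expected lineage-through-time curve under a constant-rate birth-death process. It assumes complete sampling and unequal rates, and the function names and parameter values (speciation 1.0, extinction 0.5, clade age 30) are hypothetical choices of mine, not anything from the papers above. The slope of the log curve sits near speciation minus extinction deep in the past and climbs to the speciation rate at the present, which is what lets you recover both rates.

```python
import math

def expected_log_ltt(t, total_time, lam, mu):
    """Log of the expected number of lineages at time t (measured forward
    from the origin) that leave at least one descendant at the present.
    Assumes constant rates with lam != mu and complete sampling."""
    r = lam - mu
    # Probability that a lineage alive at time t survives to the present.
    survival = r / (lam - mu * math.exp(-r * (total_time - t)))
    return r * t + math.log(survival)

def ltt_slope(t, total_time, lam, mu, dt=1e-4):
    """Central finite-difference slope of the log LTT curve at time t."""
    upper = expected_log_ltt(t + dt / 2, total_time, lam, mu)
    lower = expected_log_ltt(t - dt / 2, total_time, lam, mu)
    return (upper - lower) / dt

# Hypothetical parameter values, chosen only for illustration.
LAM, MU, TOTAL_TIME = 1.0, 0.5, 30.0

deep_slope = ltt_slope(1.0, TOTAL_TIME, LAM, MU)                   # far from the present
recent_slope = ltt_slope(TOTAL_TIME - 1e-3, TOTAL_TIME, LAM, MU)   # just before the present

print(f"slope deep in the past: {deep_slope:.4f} (lam - mu = {LAM - MU})")
print(f"slope near the present: {recent_slope:.4f} (lam = {LAM})")
```

The catch described above is that a pure-birth process whose speciation rate rises towards the present would draw the same upturn in the curve.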
Dealing with these kinds of nonidentifiability issues is really critical when fitting rate-varying diversification models. For example, with incompletely sampled phylogenies, you need to know either the amount of incomplete sampling or the extinction rate, otherwise these are nonidentifiable.^{5} A paper that I was coauthor on also found that when both speciation and extinction are allowed to be variable, model fit suffers accordingly, likely due to identifiability problems, and our estimates of diversification rates can therefore vary wildly.^{6}
Louca and Pennell unify the findings of these (and other) previous works and show that for any extant phylogeny, where (potentially time-varying) speciation and extinction rates have been fit, you can construct an infinite number of alternative speciation and extinction rate histories that have the same likelihood. They also show that, when model and/or parameter space is limited, the “infinite plausible histories” usually collapse down into a single best-supported diversification history. This is why for many methods that fit diversification rates, we can generally find a way to limit the models chosen to avoid the identifiability problem.
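Here is a rough numerical sketch of how such alternative histories can be built, based on my reading of the preprint rather than on the authors’ own code. Writing τ for age before the present, the “pulled diversification rate” is r_p = λ − μ + (1/λ) dλ/dτ, and for any extinction history μ*(τ) you choose, solving dλ*/dτ = λ*(r_p + μ* − λ*) from the same present-day speciation rate should give a history with the same “pulled speciation rate” λ(1 − E), where E is the probability that a lineage leaves no sampled descendants. The reference model (constant λ = 1, μ = 0.5, complete sampling), the three extinction scenarios, and all function names are hypothetical choices for illustration.

```python
import math

LAM, MU = 1.0, 0.5   # hypothetical reference model: constant rates, complete sampling
R_P = LAM - MU       # pulled diversification rate; constant because LAM is constant

def reference_pulled_speciation(age):
    """Closed-form LAM * (1 - E(age)) for the constant-rate reference model."""
    r = LAM - MU
    return LAM * r / (LAM - MU * math.exp(-r * age))

def congruent_history(mu_star, max_age=10.0, steps=10000):
    """RK4-integrate lam*(age) and E*(age) for a chosen extinction function
    mu_star(age). Returns a list of (age, lam*, lam* * (1 - E*)) tuples."""
    def deriv(age, lam, ext):
        mu = mu_star(age)
        d_lam = lam * (R_P + mu - lam)                   # congruence-class ODE
        d_ext = mu - (lam + mu) * ext + lam * ext * ext  # no-sampled-descendant probability
        return d_lam, d_ext

    h = max_age / steps
    lam, ext = LAM, 0.0   # same present-day speciation rate; E(0) = 0 with full sampling
    history = [(0.0, lam, lam * (1.0 - ext))]
    for i in range(steps):
        age = i * h
        k1 = deriv(age, lam, ext)
        k2 = deriv(age + h / 2, lam + h / 2 * k1[0], ext + h / 2 * k1[1])
        k3 = deriv(age + h / 2, lam + h / 2 * k2[0], ext + h / 2 * k2[1])
        k4 = deriv(age + h, lam + h * k3[0], ext + h * k3[1])
        lam += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        ext += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        history.append(((i + 1) * h, lam, lam * (1.0 - ext)))
    return history

def max_mismatch(mu_star):
    """Largest gap between this history's pulled speciation rate and the reference."""
    return max(abs(pulled - reference_pulled_speciation(age))
               for age, _, pulled in congruent_history(mu_star))

# Three arbitrary extinction histories (all hypothetical).
extinction_scenarios = {
    "no extinction": lambda age: 0.0,
    "high constant extinction": lambda age: 0.9,
    "oscillating extinction": lambda age: 0.3 + 0.2 * math.sin(age),
}

for name, mu_star in extinction_scenarios.items():
    oldest_lam = congruent_history(mu_star)[-1][1]
    print(f"{name}: lam* at age 10 = {oldest_lam:.3f}, "
          f"max pulled-rate mismatch = {max_mismatch(mu_star):.2e}")
```

The speciation histories come out very different from one another, yet the pulled speciation rate, and with it the expected lineage-through-time curve, stays the same up to numerical error. Only three scenarios are shown, but any other extinction function that keeps the rates non-negative could be passed in the same way.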
My intuition tells me that there are lots of extensions to the general birth-death model where these results might also be relevant. We know from the Stadler paper^{5} that incomplete sampling can be nonidentifiable, but I think this finding is fully general to many other extensions of the birth-death model, such as the birth-death-preservation model used for fossil phylogenies. In that case, we can estimate extinction due to the presence of extinct lineages, but varying extinction and preservation rates also lead to a nonidentifiable model.^{7}
In light of these results, I think there’s a strong argument against trying to estimate a single true diversification history. There’s a reason why BAMM suggests using model averaging to summarize your posterior distribution of rate shift data, rather than just giving you a point estimate of the “best-supported” event configuration. This is also, I think, pretty important for the SSE-class models, as a model-averaging approach was shown to be quite good in a recent paper.^{8}
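As a toy illustration of what model averaging means in practice, here is a minimal sketch using Akaike weights. This is a stand-in I picked for simplicity: BAMM summarizes a Bayesian posterior rather than AIC scores, and I am not reproducing any package’s actual procedure. The AIC scores and rate estimates below are made-up numbers.

```python
import math

def akaike_weights(aic_values):
    """Convert AIC scores into weights that sum to one."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [value / total for value in relative]

def model_averaged_estimate(estimates, aic_values):
    """Weight each model's estimate by that model's Akaike weight."""
    weights = akaike_weights(aic_values)
    return sum(w * est for w, est in zip(weights, estimates))

# Hypothetical fits of three candidate models to the same tree.
aic_scores = [100.0, 101.0, 104.0]
net_diversification = [0.10, 0.14, 0.30]   # one estimate per model

weights = akaike_weights(aic_scores)
averaged = model_averaged_estimate(net_diversification, aic_scores)
print("weights:", [round(w, 3) for w in weights])
print("model-averaged net diversification:", round(averaged, 3))
```

The averaged value leans towards the best-scoring model without throwing away the others, which seems like the right attitude when many histories fit about equally well.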
Furthermore, I also wonder if this identifiability issue is the reason why, when simulating trees, the maximum likelihood estimates of a tree’s speciation and extinction rates don’t always match the generating parameters. There are a number of known pitfalls when simulating phylogenies,^{9} and our conditioning of the likelihood function can similarly mislead us,^{5} but it would be interesting to see if the results from this paper could be used to somehow improve the simulation of phylogenies as well.
Finally, I think these results suggest that the way we construct our models (e.g., assuming constant extinction) or impose our priors for Bayesian models is going to remain important for breaking ties along the flat parts of parameter space. The proof that there will always be a ridge on the likelihood surface if you permit all possible models and areas of parameter space suggests to me that, as a field, we should be thinking hard about how we encode our assumptions and biological knowledge into our models of diversification, and justifying those prior specifications adequately.
With all of the papers identifying the weaknesses of common analysis methods, it is easy to be discouraged about the state of comparative methods. I still recall the deep sense of despair at the first standalone Systematic Biology meeting in Ann Arbor, shortly after the Rabosky and Goldberg^{2} manuscript came out and frightened the attendees into worrying whether we could do inference on phylogenies at all. The field has probably recovered psychologically from that shock, but I think as long as we are careful with how we analyze our data, justify the assumptions that are encoded into our models, and test for areas where our methods can be misleading, there is still room for optimism in comparative methods.
References

1. Louca, S., & Pennell, M. W. (2019). Phylogenies of extant species are consistent with an infinite array of diversification histories. bioRxiv preprint. doi:10.1101/719435

2. Rabosky, D. L., & Goldberg, E. E. (2015). Model Inadequacy and Mistaken Inferences of Trait-Dependent Speciation. Systematic Biology, 64(2), 340–355. doi:10.1093/sysbio/syu131

3. Nee, S., May, R. M., & Harvey, P. H. (1994). The reconstructed evolutionary process. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 344(1309), 305–311. doi:10.1098/rstb.1994.0068

4. Rabosky, D. L. (2009). Extinction rates should not be estimated from molecular phylogenies. Evolution, 64(6), 1816–1824. doi:10.1111/j.1558-5646.2009.00926.x

5. Stadler, T. (2012). How Can We Improve Accuracy of Macroevolutionary Rate Estimates? Systematic Biology, 62(2), 321–329. doi:10.1093/sysbio/sys073

6. Burin, G., Alencar, L. R. V., Chang, J., Alfaro, M. E., & Quental, T. B. (2018). How Well Can We Estimate Diversity Dynamics for Clades in Diversity Decline? Systematic Biology, 68(1), 47–62. doi:10.1093/sysbio/syy037

7. Foote, M., Sadler, P. M., Cooper, R. A., & Crampton, J. S. (2019). Completeness of the known graptoloid palaeontological record. Journal of the Geological Society, jgs2019–061. doi:10.1144/jgs2019-061

8. Caetano, D. S., O’Meara, B. C., & Beaulieu, J. M. (2018). Hidden state models improve state-dependent diversification approaches, including biogeographical models. Evolution, 72(11), 2308–2324. doi:10.1111/evo.13602

9. Hartmann, K., Wong, D., & Stadler, T. (2010). Sampling Trees from Evolutionary Models. Systematic Biology, 59(4), 465–476. doi:10.1093/sysbio/syq026

Thanks to James Saulsbury and Caroline Parins-Fukuchi for suggestions that improved this blog post!