What R package for phylogenetics is the most popular?

While writing my first R package and its associated manuscript, I needed to talk about some other R packages in the phylogenetics research community. The most obvious choice would be to just cite the ones that I actually use, but that doesn’t necessarily mean that other practicing phylogeneticists do the same. I needed some stats on which phylogenetics packages are actually popular, and luckily the R ecosystem has the tools to make this easy. If you just want the results, skip straight to the popularity table in the Table section below.

Instructions

First, let’s install the CRAN Task Views package (ctv) and the CRAN logs API package (cranlogs). We’ll use the development version of cranlogs from GitHub, since it hasn’t been updated on CRAN in a while and some stuff has changed; installing from GitHub needs the devtools package.

install.packages(c("ctv", "devtools", "dplyr"))
devtools::install_github("metacran/cranlogs")  # development version of cranlogs

library(ctv)
library(cranlogs)

We can get a list of all the CRAN task views using the available.views() function. Annoyingly, there’s no way to filter and extract JUST the Phylogenetics task view, so we’ll have to write a short filter to extract it.[1]

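# Download metadata for every CRAN task view (a list of lists)
views <- available.views()

# Keep the element whose $name is "Phylogenetics"; [[1]] unwraps it from the list
phylo_ctv <- Filter(function(x) x$name == "Phylogenetics", views)[[1]]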
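Here’s a small, self-contained sketch of why the filter looks the way it does. It runs on a made-up list-of-lists standing in for available.views() (the view names and package lists are illustrative, not real ctv output) and compares Filter() with two other base R options: building a logical index with vapply() and then using single-bracket indexing, or using Find(), which returns the first matching element itself so you don’t need the trailing [[1]].

```r
# Toy stand-in for available.views(): a list of lists, each with a $name field.
# Illustrative values only, not real ctv output.
toy_views <- list(
  list(name = "Bayesian",
       packagelist = data.frame(name = c("pkgA", "pkgB"), stringsAsFactors = FALSE)),
  list(name = "Phylogenetics",
       packagelist = data.frame(name = c("ape", "phangorn"), stringsAsFactors = FALSE)),
  list(name = "Spatial",
       packagelist = data.frame(name = "pkgC", stringsAsFactors = FALSE))
)

is_phylo <- function(x) x$name == "Phylogenetics"

# Option 1: Filter() returns a (length-1) list, so [[1]] unwraps the element
phylo_filter <- Filter(is_phylo, toy_views)[[1]]

# Option 2: build the logical index by hand, then use single-bracket indexing
keep <- vapply(toy_views, is_phylo, logical(1))
phylo_index <- toy_views[keep][[1]]

# Option 3: Find() returns the first matching element itself (or NULL if none)
phylo_find <- Find(is_phylo, toy_views)

identical(phylo_filter, phylo_index)  # TRUE
identical(phylo_filter, phylo_find)   # TRUE
phylo_find$packagelist$name           # "ape" "phangorn"
```

Find() is arguably the closest match to “give me just the Phylogenetics view”, but all three pick out the same element.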

Now we can extract the names of the packages in the Phylogenetics task view and use them to query the CRAN logs server for each package’s daily downloads over the past year (October 2017 to October 2018).

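phylo_ctv$packagelist                         # data frame of the view's packages
phylo_packages <- phylo_ctv$packagelist$name

# Daily download counts for each package, 2017-10-01 through 2018-10-01
output <- cran_downloads(packages = phylo_packages, from = "2017-10-01", to = "2018-10-01")
head(output)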

Of course, what about the phylogenetics packages that aren’t in the Phylogenetics task view? Another way to find them is to check whether a package depends on the ape package. Pretty much every phylogenetics package uses ape in some form or another, so depending on ape is a reasonable proxy for being a “phylo” package.

Get a list of all the reverse dependencies of ape (the packages that depend on it) using devtools.[2]

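# CRAN packages that depend on ape
revdep_packages <- devtools::revdep("ape")

# Union of the task view list and the reverse dependencies, without duplicates
all_phylo_packages <- unique(c(phylo_packages, revdep_packages))
output2 <- cran_downloads(packages = all_phylo_packages, from = "2017-10-01", to = "2018-10-01")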
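To make “reverse dependency” concrete, here’s a toy base R sketch of what such a lookup does. It’s not how devtools implements revdep(); it just scans a small, made-up package table (phyloA, phyloB, plotX, ecoY and all their dependency fields are hypothetical) for packages whose dependency fields mention ape. It also shows why the choice of fields matters: counting Suggests pulls in packages that only use ape optionally. As far as I know, devtools::revdep() counts Suggests by default and lets you change that through its dependencies argument, but check ?devtools::revdep for your version.

```r
# Hypothetical package metadata, shaped loosely like available.packages():
# one row per package, dependency fields as comma-separated strings (NA if absent).
toy_db <- data.frame(
  Package  = c("ape", "phyloA", "phyloB", "plotX", "ecoY"),
  Depends  = c("R (>= 3.2.0)", "R (>= 3.2.0), ape (>= 5.0)", "ape, maps", NA, NA),
  Imports  = c("nlme, lattice", NA, "phyloA", "grid, scales", "permute"),
  Suggests = c(NA, "knitr", NA, NA, "ape"),
  stringsAsFactors = FALSE
)

# Turn a field such as "R (>= 3.2.0), ape (>= 5.0)" into c("R", "ape")
dep_names <- function(field) {
  if (is.na(field)) return(character(0))
  parts <- trimws(strsplit(field, ",")[[1]])
  sub("[[:space:]]*\\(.*\\)$", "", parts)  # drop version requirements like "(>= 5.0)"
}

# Names of packages whose chosen dependency fields mention `pkg`
reverse_deps <- function(pkg, db, fields = c("Depends", "Imports", "Suggests")) {
  hits <- vapply(seq_len(nrow(db)), function(i) {
    deps <- unlist(lapply(db[i, fields], dep_names))
    pkg %in% deps
  }, logical(1))
  db$Package[hits]
}

reverse_deps("ape", toy_db)                                    # "phyloA" "phyloB" "ecoY"
reverse_deps("ape", toy_db, fields = c("Depends", "Imports"))  # "phyloA" "phyloB"
```

Note that ecoY only shows up when Suggests is counted, which is the kind of loose connection that lets arguably peripheral packages into the list.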

By default, cran_downloads() returns one row per package per day, with that day’s number of downloads in the count column. We want to sum up these counts so we have a total number of downloads per package.

Aggregate these data using dplyr.

library(dplyr)

output2 %>%
  group_by(package) %>%
  summarise(downloads = sum(count)) %>%  # total downloads per package
  arrange(desc(downloads))               # most downloaded first
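If you’d rather not pull in dplyr, base R’s aggregate() can do the same sum-and-sort. Here’s a minimal sketch on a tiny made-up data frame with the same date, count, and package columns that cran_downloads() returns (the package names and counts are invented, not real download numbers).

```r
# Made-up data in the shape of cran_downloads() output: one row per package per day
toy_downloads <- data.frame(
  date    = as.Date(c("2017-10-01", "2017-10-02", "2017-10-01", "2017-10-02")),
  count   = c(20, 30, 120, 80),
  package = c("phyloA", "phyloA", "phyloB", "phyloB"),
  stringsAsFactors = FALSE
)

# Sum the daily counts within each package
totals <- aggregate(count ~ package, data = toy_downloads, FUN = sum)
names(totals)[names(totals) == "count"] <- "downloads"

# Most downloaded first
totals <- totals[order(totals$downloads, decreasing = TRUE), ]
totals
```

aggregate() returns the groups in alphabetical order, so the explicit order() call is what puts phyloB (200 downloads) ahead of phyloA (50).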

Exercises

  1. Filter the table to include only packages that appear in both the CRAN task view and the reverse dependencies list. (This will exclude, e.g., ggplot2 and other arguably peripheral packages.)

  2. Use the lubridate package to find the ten most popular packages in each year. (The CRAN logs go back to October 2012.)
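Here’s one possible sketch for both exercises, again on small made-up inputs (the package names, list memberships, and counts are all hypothetical) so it runs on its own. For Exercise 1 it flags each package with %in% against the two lists, which is also one way to produce CTV?/Revdep? style annotations. For Exercise 2 it swaps lubridate for base R’s format(date, "%Y") to get the year; lubridate::year() would work just as well.

```r
# Exercise 1: hypothetical per-package totals and the two source lists
toy_totals <- data.frame(
  package   = c("plotX", "ape", "phyloA", "phyloB", "ecoY"),
  downloads = c(5000, 400, 80, 40, 10),
  stringsAsFactors = FALSE
)
toy_ctv    <- c("ape", "phyloA", "phyloB")            # stand-in for the task view list
toy_revdep <- c("plotX", "phyloA", "phyloB", "ecoY")  # stand-in for revdep("ape")

toy_totals$ctv    <- toy_totals$package %in% toy_ctv
toy_totals$revdep <- toy_totals$package %in% toy_revdep
in_both <- toy_totals[toy_totals$ctv & toy_totals$revdep, ]
in_both$package  # "phyloA" "phyloB"

# Exercise 2: hypothetical daily counts spanning two years
toy_daily <- data.frame(
  date    = as.Date(c("2016-03-01", "2016-07-01", "2016-03-01", "2017-05-01", "2017-05-01")),
  count   = c(10, 15, 30, 50, 5),
  package = c("phyloA", "phyloA", "phyloB", "phyloA", "phyloB"),
  stringsAsFactors = FALSE
)
toy_daily$year <- format(toy_daily$date, "%Y")

# Total downloads per package per year
yearly <- aggregate(count ~ year + package, data = toy_daily, FUN = sum)

# Keep the top n packages (by yearly downloads) within each year
top_n_per_year <- function(df, n) {
  per_year <- lapply(split(df, df$year), function(d) {
    head(d[order(d$count, decreasing = TRUE), ], n)
  })
  do.call(rbind, per_year)
}

top_n_per_year(yearly, n = 1)  # phyloB for 2016, phyloA for 2017
```

With the real per-day data, n = 10 would give the ten most popular packages in each year.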

Table

Here’s the full version of the table, with total downloads from 2017-10-01 to 2018-10-01. There are some packages in here that are only peripherally associated with phylogenetics, but it gives a good picture of the state of the field. I’ve also annotated each package with which list it came from, the CRAN Task View list or the reverse dependencies list; a 🚫 marks a list that the package does not appear in.

Rank Package Downloads CTV? Revdep?
1 ggplot2 5624177 🚫
2 igraph 1248409 🚫
3 dendextend 440772
4 ape 433337 🚫
5 vegan 426398 🚫
6 ade4 336755 🚫
7 brms 90632 🚫
8 phangorn 78319
9 adegenet 68176 🚫
10 metafor 64462 🚫
11 data.tree 60224 🚫
12 Seurat 55905 🚫
13 MCMCglmm 48496
14 phytools 43721
15 HSAUR2 40079 🚫
16 HSAUR 38960 🚫
17 taxize 34790
18 rncl 31910
19 aqp 30681 🚫
20 pegas 29424
21 RNeXML 29278
22 rotl 28502
23 picante 26858 🚫
24 geiger 26292
25 phylobase 25190
26 HSAUR3 24462 🚫
27 FD 23007 🚫
28 EpiModel 21486 🚫
29 adephylo 20348
30 poppr 19874 🚫
31 vcfR 19346 🚫
32 geomorph 18793
33 adespatial 18084 🚫
34 ggimage 16587 🚫
35 BoSSA 15993
36 asnipe 14109 🚫
37 hierfstat 13967 🚫
38 caper 13847
39 DDD 11690
40 tidygraph 11612 🚫
41 DHARMa 11596 🚫
42 paleotree 11359
43 betapart 11111
44 polysat 10831 🚫
45 phyclust 10631
46 MVA 10280 🚫
47 GUniFrac 9007
48 enveomics.R 8696 🚫
49 AbSim 8432 🚫
50 stylo 8336 🚫
51 phylolm 8172
52 apTreeshape 8101
53 BioGeoBEARS 8007
54 expands 7764 🚫
55 mvMORPH 7579
56 BAMMtools 7344
57 sand 7014 🚫
58 diversitree 6946
59 homals 6933 🚫
60 tidytree 6378
61 convevol 6351
62 ecospat 6320 🚫
63 entropart 6299 🚫
64 phylotools 6182
65 rmetasim 6064 🚫
66 rphast 5890
67 corHMM 5836
68 apex 5750 🚫
69 bayou 5686
70 cati 5572
71 ouch 5520
72 hisse 5267 🚫
73 phyloclim 5223
74 rdryad 5188 🚫
75 dartR 5169 🚫
76 SYNCSA 5102 🚫
77 OutbreakTools 5057
78 TreeSim 4988
79 ALA4R 4943 🚫
80 ips 4873
81 PCPS 4858
82 metacoder 4796 🚫
83 OUwie 4792
84 aphid 4791 🚫
85 brranching 4670 🚫
86 warbleR 4645 🚫
87 MPSEM 4639
88 adhoc 4599
89 distory 4578
90 Momocs 4443 🚫
91 phyloTop 4426
92 ggmuller 4403
93 paleoTS 4392 🚫
94 BIEN 4380 🚫
95 HTSSIP 4198 🚫
96 strap 4153
97 nodiv 4103 🚫
98 BPEC 4095 🚫
99 scrm 4084 🚫
100 FinePop 4031 🚫
101 idendr0 4020 🚫
102 HMPTrees 4015
103 PHYLOGR 4011 🚫
104 evobiR 4000
105 outbreaker 3931
106 nLTT 3925
107 kmer 3891 🚫
108 markophylo 3885
109 DAMOCLES 3879
110 jaatha 3861 🚫
111 TESS 3860
112 SigTree 3823
113 strataG 3786 🚫
114 treeplyr 3732
115 phylogram 3726
116 treebase 3724
117 pmc 3717 🚫
118 surface 3701
119 gamclass 3648 🚫
120 TreePar 3647
121 PBD 3591
122 RAM 3546 🚫
123 Rphylip 3506
124 expoTree 3447
125 HyPhy 3419
126 adiv 3393
127 coalescentMCMC 3384 🚫
128 kdetrees 3330
129 adaptiveGPCA 3325 🚫
130 MAGNAMWAR 3324 🚫
131 phylocanvas 3302
132 iteRates 3301
133 BBMV 3295 🚫
134 CommEcol 3246 🚫
135 netdiffuseR 3191 🚫
136 pastis 3155
137 AnnotationBustR 3148 🚫
138 phyloland 3129
139 phyext2 3119
140 Canopy 2992 🚫
141 RPANDA 2952 🚫
142 BMhyb 2935 🚫
143 phylopath 2912 🚫
144 GLSME 2846 🚫
145 phylotate 2846 🚫
146 phylosignal 2820 🚫
147 shazam 2754 🚫
148 harrietr 2707 🚫
149 prioritizr 2705 🚫
150 BarcodingR 2657 🚫
151 msaR 2552 🚫
152 mvSLOUCH 2522 🚫
153 bcRep 2472 🚫
154 colordistance 2448 🚫
155 treeman 2417 🚫
156 sharpshootR 2409 🚫
157 BMhyd 2408 🚫
158 aptg 2385 🚫
159 qlcData 2383 🚫
160 sensiPhy 2344 🚫
161 GrammR 2229 🚫
162 treespace 2219 🚫
163 dispRity 2217
164 metricTester 2205 🚫
165 evolqg 2181 🚫
166 geomedb 2160 🚫
167 PhyloMeasures 2139 🚫
168 CNull 2107 🚫
169 taxlist 2098 🚫
170 pez 2079 🚫
171 phyreg 2061 🚫
172 structSSI 2047 🚫
173 MiSPU 2039 🚫
174 dcGOR 2031 🚫
175 lefse 2020 🚫
176 SeqFeatR 2006 🚫
177 HAP.ROR 1970 🚫
178 symmoments 1932 🚫
179 genBaRcode 1923 🚫
180 PhylogeneticEM 1912 🚫
181 windex 1900
182 phylocurve 1875 🚫
183 MonoPhy 1865 🚫
184 TreeSimGM 1844 🚫
185 spider 1840 🚫
186 Rsampletrees 1837 🚫
187 Rphylopars 1830 🚫
188 graphscan 1829 🚫
189 recluster 1811 🚫
190 paco 1804 🚫
191 phylosim 1768 🚫
192 ecolottery 1723 🚫
193 outbreaker2 1708 🚫
194 STEPCAM 1697 🚫
195 primerTree 1691 🚫
196 PhySortR 1675 🚫
197 gquad 1673 🚫
198 gromovlab 1669 🚫
199 indelmiss 1666 🚫
200 phybreak 1662 🚫
201 msap 1659 🚫
202 rase 1629 🚫
203 rdiversity 1629 🚫
204 perspectev 1617 🚫
205 ML.MSBD 1614 🚫
206 sidier 1611 🚫
207 pcrcoal 1588 🚫
208 StructFDR 1586 🚫
209 idar 1579 🚫
210 PIGShift 1534 🚫
211 jrich 1511 🚫
212 TotalCopheneticIndex 1509 🚫
213 subniche 1508 🚫
214 Plasmidprofiler 1495 🚫
215 TKF 1487 🚫
216 rwty 1485 🚫
217 TreeSearch 1474 🚫
218 PhyInformR 1402 🚫
219 skeleSim 1400 🚫
220 insect 1383 🚫
221 treeDA 1344 🚫
222 multilaterals 1337 🚫
223 CollessLike 1304 🚫
224 vhica 1299 🚫
225 motmot.2.0 1115 🚫
226 treedater 926
227 ratematrix 813
228 PVR 772 🚫
229 P2C2M 749 🚫
230 metaboGSE 747 🚫
231 RRphylo 692 🚫
232 ggrasp 677 🚫
233 CommT 634 🚫
234 FossilSim 529 🚫
235 POUMM 513 🚫
236 rhierbaps 391 🚫
237 RPS 390 🚫
238 balance 0 🚫
239 hillR 0 🚫
240 kmeRs 0 🚫
241 phylocomr 0 🚫
242 rr2 0 🚫
243 slouch 0 🚫
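  1. Note that we can’t use the typical filtering idiom with the single square bracket [, e.g. x[x$name == "Phylogenetics"], because $ doesn’t pull the name field out of every element of a list-of-lists the way it pulls a column out of a data frame. We’d first have to build the logical index by hand (e.g., with sapply()), which is essentially what Filter() does for us. See ?Extract for more details.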

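  2. The built-in tools package has tools::dependsOnPkgs(), but it only searches the packages you currently have installed. (tools::package_dependencies("ape", reverse = TRUE) does query the CRAN package database, though by default it only counts Depends, Imports, and LinkingTo.)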