Splitting a concatenated RAxML-style PHYLIP file
I’ve written a little script in Python 3 that will unconcatenate a RAxML-style PHYLIP + partitions file. I’ve used it recently to do a gene tree-species tree analysis for phylogenetic inference. It’s a little slow since it doesn’t manage file handles well (or at all), but it should use very little memory and therefore be able to handle very large concatenated alignments.
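The partitions file maps each partition to alignment columns. In the RAxML style, each line reads TYPE, NAME = RANGES, with 1-based, inclusive site ranges such as DNA, p1 = 1-30. Here's a minimal sketch of parsing that simple case. It is not the script's own code. It assumes ranges are plain START-END pairs or single sites separated by commas. It joins TYPE and NAME with an underscore only because the output file names in the example below (dna_DNA_p1.phylip) suggest that convention.

```python
import re

# One partition per line, e.g. "DNA, p1 = 1-30" or "DNA, gene2 = 31-45, 50-60".
PARTITION_LINE = re.compile(r"^\s*(\w+)\s*,\s*([^=]+?)\s*=\s*(.+?)\s*$")


def parse_partitions(lines):
    """Return [(name, [(start, end), ...]), ...] with 0-based, half-open column ranges."""
    partitions = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        match = PARTITION_LINE.match(line)
        if match is None:
            raise ValueError(f"cannot parse partition line: {line!r}")
        model, name, spec = match.groups()
        ranges = []
        for chunk in spec.split(","):
            chunk = chunk.strip()
            if "\\" in chunk:
                # Codon strides such as 1-60\3 are out of scope, as noted at the end of the post.
                raise ValueError(f"codon strides are not supported: {chunk!r}")
            start, _, end = chunk.partition("-")
            # RAxML sites are 1-based and inclusive; a single site like "12" has no end.
            ranges.append((int(start) - 1, int(end or start)))
        # Assumption: the output names (dna_DNA_p1...) suggest TYPE and NAME are joined by "_".
        partitions.append((f"{model}_{name}", ranges))
    return partitions
```

Each (start, end) pair works directly as a Python slice, so a partition's piece of a sequence is "".join(seq[s:e] for s, e in ranges).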
Here’s a short worked example using the data from the RAxML “hands-on” tutorial:
- Download the script and all data files:
curl -LO https://gist.githubusercontent.com/jonchang/34c2e8e473ec2e8f50574671e62c3365/raw/unconcatenate_phylip.py
curl -LO https://sco.h-its.org/exelixis/resource/download/hands-on/dna.phy
curl -LO https://sco.h-its.org/exelixis/resource/download/hands-on/simpleDNApartition.txt
- Run the script:
python3 unconcatenate_phylip.py dna.phy simpleDNApartition.txt
- Examine the output:
INFO: Working on 10 taxa
INFO: Wrote to dna_DNA_p1.phylip
INFO: Wrote to dna_DNA_p2.phylip
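One way to get the low memory use described at the top is to never hold the whole alignment at once. Here's a minimal sketch of that idea. The function names are hypothetical and this is not the script's actual code. It assumes sequential PHYLIP with one taxon per line and partitions given as 0-based, half-open column ranges. Because the taxon count is in the input header and each partition's width is the sum of its ranges, every output header can be written before any sequence is read.

```python
import os


def output_name(input_path, partition_name, fmt="phylip", prefix=""):
    """Build a name like 'dna_DNA_p1.phylip' from 'dna.phy' and 'DNA_p1'."""
    stem = os.path.splitext(os.path.basename(input_path))[0]
    return f"{prefix}{stem}_{partition_name}.{fmt}"


def split_phylip(input_path, partitions, prefix=""):
    """Write one sequential PHYLIP file per partition, reading one taxon at a time.

    `partitions` is a list of (name, [(start, end), ...]) with 0-based, half-open ranges.
    Assumes sequential (non-interleaved) PHYLIP: a "ntax nchar" header, then one
    "taxon sequence" record per line.
    """
    with open(input_path) as src:
        ntax, _nchar = (int(x) for x in src.readline().split())
        paths = []
        for name, ranges in partitions:
            path = output_name(input_path, name, prefix=prefix)
            width = sum(end - start for start, end in ranges)
            with open(path, "w") as out:
                out.write(f"{ntax} {width}\n")  # header is known before any sequence
            paths.append(path)
        for line in src:
            if not line.strip():
                continue  # tolerate blank lines
            taxon, *chunks = line.split()
            seq = "".join(chunks)  # allow sequences written in space-separated blocks
            for (_name, ranges), path in zip(partitions, paths):
                piece = "".join(seq[start:end] for start, end in ranges)
                # Reopen per taxon: slow, but only one output handle is open at a time.
                with open(path, "a") as out:
                    out.write(f"{taxon} {piece}\n")
    return paths
```

Reopening each output file in append mode for every taxon is slow. However, it keeps only one output file open at a time, which sidesteps per-process open-file limits when there are hundreds of partitions.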
By default, the output is written in PHYLIP format and named {INPUT_FILE}_{PARTITION_NAME}.{FORMAT}. For FASTA output, pass --type=fasta. You can also specify a prefix to add to the output file names with --prefix=subdir/, and optionally trim gaps and drop sequences consisting only of gaps with --trim. You can also enable detailed output with --verbose. Check the video above for a quick demonstration of all the options.
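The exact behavior of --trim isn't spelled out here, so the sketch below shows one plausible reading rather than the script's definition. It drops sequences made up entirely of gaps, then drops columns that are gaps in every remaining sequence. It also renders the two output layouts (sequential PHYLIP and unwrapped FASTA). The function names and the choice of '-' as the only gap character are assumptions.

```python
def trim_alignment(records, gap_chars="-"):
    """Drop all-gap sequences, then columns that are gaps in every remaining sequence.

    `records` is a list of (taxon, sequence) pairs of equal length. Treating only '-' as a
    gap is an assumption; pass gap_chars="-?" to also count '?' as missing data.
    """
    gaps = set(gap_chars)
    # Drop sequences made up entirely of gap characters.
    kept = [(taxon, seq) for taxon, seq in records if not set(seq) <= gaps]
    if not kept:
        return []
    width = len(kept[0][1])
    # A column survives if at least one remaining sequence has a non-gap character there.
    keep_cols = [i for i in range(width) if any(seq[i] not in gaps for _, seq in kept)]
    return [(taxon, "".join(seq[i] for i in keep_cols)) for taxon, seq in kept]


def format_records(records, fmt="phylip"):
    """Render (taxon, sequence) pairs as sequential PHYLIP or unwrapped FASTA text."""
    if fmt == "fasta":
        return "".join(f">{taxon}\n{seq}\n" for taxon, seq in records)
    width = len(records[0][1]) if records else 0
    header = f"{len(records)} {width}\n"
    return header + "".join(f"{taxon} {seq}\n" for taxon, seq in records)
```

Column trimming needs to see every sequence in a partition. In a low-memory setting, it would therefore run on each partition file after splitting, not during a single streaming pass over the concatenated alignment.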
This currently doesn’t support partition schemes that split by codon position (e.g., 1st + 2nd position versus 3rd position).
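RAxML writes codon-position partitions with a stride suffix. For example, DNA, codon1 = 1-60\3 means every third site starting at site 1. The script does not handle this notation. As a hypothetical sketch of what supporting it might involve, the range expansion could look like this:

```python
def expand_range(chunk):
    r"""Expand a 1-based, inclusive range such as '1-60\3' into 0-based column indices.

    A '\N' suffix means "every Nth site from START"; a plain 'START-END' (or a single
    site 'START') has an implicit stride of 1.
    """
    spec, _, stride = chunk.strip().partition("\\")
    start, _, end = spec.partition("-")
    # range() stops before its end, so the 1-based inclusive END maps directly onto it.
    return list(range(int(start) - 1, int(end or start), int(stride or 1)))


def partition_columns(spec):
    """Sorted union of all comma-separated ranges in one partition definition."""
    return sorted({i for chunk in spec.split(",") for i in expand_range(chunk)})


if __name__ == "__main__":
    seq = "ABCDEFGHI"
    first_and_second = partition_columns("1-9\\3, 2-9\\3")
    print("".join(seq[i] for i in first_and_second))  # ABDEGH
```

A 1st + 2nd position partition such as 1-60\3, 2-60\3 becomes a sorted list of column indices. The partition's piece of each sequence is then "".join(seq[i] for i in columns).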