Splitting a concatenated RAxML-style PHYLIP file

06 Sep 2017

I’ve written a little script in Python 3 that will unconcatenate a RAxML-style PHYLIP + partitions file. I’ve used it recently to do a gene tree-species tree analysis for phylogenetic inference. It’s a little slow since it doesn’t manage file handles well (or at all), but it should use very little memory and therefore be able to handle very large concatenated alignments.

Here’s a short worked example using the data from the RAxML “hands-on” tutorial.

Download the script and all data files

curl -LO https://gist.githubusercontent.com/jonchang/34c2e8e473ec2e8f50574671e62c3365/raw/unconcatenate_phylip.py
curl -LO https://sco.h-its.org/exelixis/resource/download/hands-on/dna.phy
curl -LO https://sco.h-its.org/exelixis/resource/download/hands-on/simpleDNApartition.txt

Run the script

python3 unconcatenate_phylip.py dna.phy simpleDNApartition.txt

Examine the output
```
INFO: Working on 10 taxa
INFO: Wrote to dna_DNA_p1.phylip
INFO: Wrote to dna_DNA_p2.phylip
```
By default, the output is written in the PHYLIP format and is named like {INPUT_FILE}_{PARTITION_NAME}.{FORMAT}. For FASTA output, pass --type=fasta. You can also specify a prefix to add to the output file names with --prefix=subdir/, and optionally trim gaps and drop sequences consisting of only gaps with --trim. Check the video above for a quick demonstration of all the options.

You can also enable detailed output with --verbose.
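To make the splitting step concrete, here is a minimal sketch of the general idea: parse each partition line into a 1-based inclusive column range, then slice every sequence by that range. This is not the script's actual code. The toy alignment, the function names, and the `toy_` file prefix are all made up for illustration, and joining the data type and partition name into a label like `DNA_p1` is only a guess based on the file names in the log output above.

```python
import re

# Hypothetical toy data for illustration; this is NOT the tutorial's dna.phy.
ALIGNMENT = {
    "taxonA": "ACGTAC--",
    "taxonB": "ACGAAC--",
    "taxonC": "----ACGT",
}
PARTITIONS_TEXT = """\
DNA, p1 = 1-4
DNA, p2 = 5-8
"""

# Matches lines like "DNA, p1 = 1-4": data type, name, 1-based inclusive range.
PARTITION_RE = re.compile(r"^\s*(\w+)\s*,\s*(\w+)\s*=\s*(\d+)\s*-\s*(\d+)\s*$")


def parse_partitions(text):
    """Return a list of (label, start, end) with 1-based inclusive coordinates."""
    partitions = []
    for line in text.splitlines():
        if not line.strip():
            continue
        match = PARTITION_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported partition line: {line!r}")
        data_type, name, start, end = match.groups()
        # Assumption: label format inferred from the output file names.
        partitions.append((f"{data_type}_{name}", int(start), int(end)))
    return partitions


def split_alignment(alignment, partitions, drop_gap_only=False):
    """Slice every sequence by each partition's column range."""
    result = {}
    for label, start, end in partitions:
        # Convert 1-based inclusive coordinates to a 0-based half-open slice.
        block = {taxon: seq[start - 1:end] for taxon, seq in alignment.items()}
        if drop_gap_only:
            # Keep only sequences with at least one non-gap character.
            block = {t: s for t, s in block.items() if set(s) - {"-"}}
        result[label] = block
    return result


def to_phylip(block):
    """Format one block as relaxed sequential PHYLIP text."""
    n_sites = len(next(iter(block.values()), ""))
    lines = [f"{len(block)} {n_sites}"]
    lines += [f"{taxon} {seq}" for taxon, seq in block.items()]
    return "\n".join(lines) + "\n"


blocks = split_alignment(
    ALIGNMENT, parse_partitions(PARTITIONS_TEXT), drop_gap_only=True
)
for label, block in blocks.items():
    print(f"toy_{label}.phylip")
    print(to_phylip(block), end="")
```

In this toy data, `taxonC` is all gaps in the first partition, so it is dropped from that block but kept in the second. Gap trimming within sequences is not shown here.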

This currently doesn’t support partition schemes that split by codon position, e.g., 1st + 2nd positions versus 3rd position.
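For context on why codon-position partitions are harder, RAxML writes them with a stride, as in `DNA, p1 = 1-60\3, 2-60\3`, so a partition is no longer one contiguous block of columns. The sketch below expands such a range specification into column indices. It is purely illustrative and not part of the script; `expand_columns` and `is_contiguous` are hypothetical helper names.

```python
import re

# One comma-separated piece of a range spec: "1-60" or "1-60\3".
PIECE_RE = re.compile(r"^(\d+)-(\d+)(?:\\(\d+))?$")


def expand_columns(spec):
    """Expand a range spec into sorted 0-based column indices."""
    columns = set()
    for piece in spec.split(","):
        match = PIECE_RE.match(piece.strip())
        if match is None:
            raise ValueError(f"cannot parse range piece: {piece!r}")
        start, end = int(match.group(1)), int(match.group(2))
        stride = int(match.group(3) or 1)  # no "\N" suffix means every column
        columns.update(range(start - 1, end, stride))
    return sorted(columns)


def is_contiguous(columns):
    """True if the columns form one unbroken run, so a plain slice would do."""
    return columns == list(range(columns[0], columns[-1] + 1))


first_second = expand_columns(r"1-9\3, 2-9\3")
third = expand_columns(r"3-9\3")

sequence = "ATGGCCTAA"  # toy 9-column sequence
print(first_second, "".join(sequence[i] for i in first_second))
print(third, "".join(sequence[i] for i in third))
print(is_contiguous(first_second))
```

A simple range like `1-4` expands to a contiguous run that one slice can extract, while the strided partitions need per-column selection.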

See the source on GitHub.

Jonathan Chang
