Splitting a concatenated RAxML-style PHYLIP file

I’ve written a little Python 3 script that splits (“unconcatenates”) a RAxML-style PHYLIP alignment and its partitions file into one alignment per partition. I used it recently to prepare the inputs for a gene tree-species tree analysis. It’s a little slow since it doesn’t manage file handles well (or at all), but it should use very little memory and should therefore be able to handle very large concatenated alignments.

Here’s a short worked example using the data from the RAxML “hands-on” tutorial:

  1. Download the script and all data files

     curl -LO https://gist.githubusercontent.com/jonchang/34c2e8e473ec2e8f50574671e62c3365/raw/unconcatenate_phylip.py
     curl -LO https://sco.h-its.org/exelixis/resource/download/hands-on/dna.phy
     curl -LO https://sco.h-its.org/exelixis/resource/download/hands-on/simpleDNApartition.txt
    
  2. Run the script

     python3 unconcatenate_phylip.py dna.phy simpleDNApartition.txt
    
  3. Examine the output

     INFO: Working on 10 taxa
     INFO: Wrote to dna_DNA_p1.phylip
     INFO: Wrote to dna_DNA_p2.phylip
    

    By default, the output is written in the PHYLIP format and is named like {INPUT_FILE}_{PARTITION_NAME}.{FORMAT}. For FASTA output, pass --type=fasta. You can also specify a prefix to add to the output file names with --prefix=subdir/, and optionally trim gaps and drop sequences consisting of only gaps with --trim. Check the video above for a quick demonstration of all the options.

    You can also enable detailed output with --verbose.
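For readers curious about what the splitting step amounts to, here is a minimal sketch of the idea. It is not the script's actual implementation: the function names `split_alignment` and `to_phylip`, the in-memory dictionary input, and the toy data are all assumptions made for illustration. Unlike the real script, this version holds the whole alignment in memory.

```python
def split_alignment(sequences, partitions):
    """Slice each aligned sequence into per-partition pieces.

    sequences:  dict mapping taxon name -> aligned sequence string
    partitions: dict mapping partition name -> (start, end), using
                1-based inclusive coordinates as in a partitions file
    Returns a dict mapping partition name -> {taxon: subsequence}.
    """
    result = {}
    for name, (start, end) in partitions.items():
        # Convert 1-based inclusive coordinates to a Python slice.
        result[name] = {
            taxon: seq[start - 1:end] for taxon, seq in sequences.items()
        }
    return result


def to_phylip(block):
    """Format one partition as relaxed sequential PHYLIP text."""
    ntax = len(block)
    nchar = len(next(iter(block.values())))
    lines = [f"{ntax} {nchar}"]
    lines += [f"{taxon} {seq}" for taxon, seq in block.items()]
    return "\n".join(lines) + "\n"


# Hypothetical toy alignment: two taxa, six sites, two partitions.
toy = {"Frog": "ACGTAC", "Mouse": "ACG-AC"}
parts = split_alignment(toy, {"p1": (1, 4), "p2": (5, 6)})
print(to_phylip(parts["p1"]), end="")
```

Each partition becomes its own small alignment with the same taxa, which is the shape of input that per-gene tree inference expects.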

The script currently doesn’t support partition files that split sites by codon position (e.g., 1st + 2nd positions in one partition and 3rd positions in another).
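To make the limitation concrete, the sketch below parses the simple `MODEL, name = start-end` form and rejects anything else, including the strided `1-60\3` notation that RAxML uses for codon positions. This is an assumption-laden illustration rather than the script's own parser: the regular expression and the function name `parse_partition_line` are hypothetical.

```python
import re

# Accepts only a single contiguous range, e.g. "DNA, p1 = 1-30".
SIMPLE_PARTITION = re.compile(
    r"^\s*([^,]+?)\s*,\s*([^=]+?)\s*=\s*(\d+)\s*-\s*(\d+)\s*$"
)


def parse_partition_line(line):
    """Return (model, name, start, end) for a simple partition line.

    Raises ValueError for anything that is not one contiguous range,
    such as codon-position partitions written with a "\\3" stride.
    """
    match = SIMPLE_PARTITION.match(line)
    if match is None:
        raise ValueError(f"unsupported partition definition: {line!r}")
    model, name, start, end = match.groups()
    start, end = int(start), int(end)
    if start > end:
        raise ValueError(f"start is after end in: {line!r}")
    return model, name, start, end


print(parse_partition_line("DNA, p1=1-30"))

try:
    # Codon-position style: 1st and 2nd positions in one partition.
    parse_partition_line(r"DNA, p1 = 1-60\3, 2-60\3")
except ValueError as err:
    print("rejected:", err)
```

Failing loudly on unsupported lines is a deliberate choice in this sketch, since silently treating `1-60\3` as `1-60` would produce wrong alignments.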

See the source on GitHub.