<?xml version="1.0" encoding="utf-8"?>
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>Jonathan Chang</title>
    <subtitle>evolutionary biologist</subtitle>
    <link href="https://jonathanchang.org/feed.xml" rel="self" type="application/atom+xml"/>
    <link href="https://jonathanchang.org/" rel="alternate" type="text/html"/>
    <id>https://jonathanchang.org/</id>
    <generator uri="https://jekyllrb.com/" version="4.3.3">Jekyll</generator>
    <updated>2024-02-21T00:34:51+00:00</updated>
    <author>
      <name>Jonathan Chang</name>
      <email>me@jonathanchang.org</email>
      <uri>https://jonathanchang.org</uri>
    </author>
    <rights>Copyright © 2024 Jonathan Chang</rights>
    <entry>
      <title><![CDATA[Download shapefiles from ESRI ArcGIS Online Story Maps]]></title>
      <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/downloading-esri-online-shapefiles/"/>
      <id>https://jonathanchang.org/blog/downloading-esri-online-shapefiles</id>
      <published>2022-11-04T00:00:00+00:00</published>
      <updated>2022-11-04T00:00:00+00:00</updated>
      <content type="html"><![CDATA[
     <p>Recently, we needed to get out some shapefiles from an <a href="https://www.arcgis.com/apps/MapSeries/index.html?appid=34603bd48c9f496fa2750a770f655013">ArcGIS Online map</a>. It’s immediately clear that there’s a lot of data, and no obvious way to get it from a download or share link anywhere on the app page. The desired solution is anything <em>but</em> taking a screenshot and tracing it in ImageJ, as that’s an absolute last resort. In this post, I’ll walk through how I managed to get those shapefiles downloaded, and hopefully provide some easy tips to do the same for other ArcGIS online maps.</p>
      <h2>The power of the web inspector</h2>
      <p>This is fundamentally a <a href="https://en.wikipedia.org/wiki/Web_scraping">web scraping task</a>, and I’ll start with opening the web developer tools in Firefox, by right-clicking a promising bit on the page (the map itself) and selecting “Inspect”. Looking through the HTML tree in the web inspector panel that pops up, I can see that while the shapefiles do appear to exist locally, these are parsed into a gnarly embedded <a href="https://en.wikipedia.org/wiki/Scalable_Vector_Graphics">SVG object</a>. This could be used to reconstruct the shapefile, but it seems like a big pain that I don’t want to deal with, so I move on from this avenue.</p>
      <p><img src="/uploads/2022/geojson/inspector-html.png" alt="Screenshot of an HTML source code tree, showing a complex SVG object." srcset="/uploads/2022/geojson/inspector-html.png 2x"></p>
      <p>Next, I’ll check out the network tab. I’ll need to refresh the page, and I can see that there are a ton of requests that go to a lot of different places. But, I suspect that any shapefile that’s loaded will likely be downloaded via <a href="https://en.wikipedia.org/wiki/XMLHttpRequest">XHR</a>, initiated from Javascript, and quite possibly hitting some API endpoint that probably speaks in JSON. I filter by JS and XHR and immediately see a request that pops out at me, to an endpoint at <code>services.arcgis.com</code> called <code>data</code> with a query payload of <code>f=json</code>. Inspecting that response object leads me to another API endpoint that appears to be what I want!</p>
      <p><img src="/uploads/2022/geojson/inspector-json.png" alt="Screenshot of the network panel of the web developer console, showing a JSON response object with interesting URL fields." srcset="/uploads/2022/geojson/inspector-json.png 2x"></p>
      <h2>ESRI API endpoints</h2>
      <p>I’m fairly familiar with ESRI’s REST APIs, and I know that I can <a href="https://services.arcgis.com/8df8p0NlLFEShl0r/arcgis/rest/services/FHA_Grades/FeatureServer/0">navigate to the API endpoint</a> in the browser and get a fairly good description of its data. I can also interactively query it there, without having to muck about with cURL in Terminal or anything like that. ESRI is quite humane in this respect, but again, there doesn’t seem to be an easy way to download the full shapefile directly from this endpoint, and I don’t feel quite up to the task of writing out a shapefile by copying and pasting a bunch of stuff.</p>
      <p><img src="/uploads/2022/geojson/esri-endpoint.png" alt="Screenshot of the ESRI REST API query tool, showing the result of a query with complex shapefile geometries." srcset="/uploads/2022/geojson/esri-endpoint.png 2x"></p>
      <p>A quick Google sojourn leads me to <code>pyesridump</code>, <a href="https://github.com/openaddresses/pyesridump">a wonderful tool</a> by the folks over at OpenAddresses. This is actually exactly what I needed! Install the <code>esri2geojson</code> command with <a href="https://pypa.github.io/pipx/">pipx</a>:</p>
      <div class="language-console?prompt=% highlighter-rouge">
        <div class="highlight">
          <pre class="highlight"><code data-lang="console?prompt=%"><span class="gp">%</span><span class="w"> </span>pipx <span class="nb">install </span>esridump
<span class="go">  installed package esridump 1.11.0, installed using Python 3.10.8
  These apps are now globally available
    - esri2geojson
done! ✨ 🌟 ✨
</span><span class="gp">%</span><span class="w"> </span>esri2geojson <span class="s2">"https://services.arcgis.com/8df8p0NlLFEShl0r/ArcGIS/rest/services/FHA_Grades/FeatureServer/0"</span> fha.geojson
<span class="go">2022-11-03 23:42:54,990 - cli.esridump - INFO - Built 1 requests using resultOffset method
</span></code></pre>
        </div>
      </div>
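    <p>The log line above mentions the <code>resultOffset</code> method. As a rough sketch of what that kind of paging looks like (not the actual implementation in <code>pyesridump</code>), the snippet below builds one query URL per page using the standard ArcGIS REST query parameters. The function name <code>build_query_urls</code> and the page size of 1000 are assumptions for illustration; the real limit depends on the server’s configuration, and not every server supports <code>f=geojson</code>.</p>

```python
from urllib.parse import urlencode

# Layer endpoint taken from the post.
LAYER_URL = (
    "https://services.arcgis.com/8df8p0NlLFEShl0r/arcgis/rest/services/"
    "FHA_Grades/FeatureServer/0"
)


def build_query_urls(layer_url, total_features, page_size=1000):
    """Return one query URL per page of features (hypothetical helper)."""
    urls = []
    for offset in range(0, total_features, page_size):
        params = {
            "where": "1=1",            # match every feature
            "outFields": "*",          # return all attribute fields
            "f": "geojson",            # assumes the server can emit GeoJSON
            "resultOffset": offset,    # where this page starts
            "resultRecordCount": page_size,
        }
        urls.append(layer_url + "/query?" + urlencode(params))
    return urls


# The FHA layer has 74 features, so a single request is enough.
urls = build_query_urls(LAYER_URL, total_features=74)
print(len(urls))
```

    <p>With 74 features and a page size of 1000, this yields a single URL, which matches the “Built 1 requests” message in the log.</p>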
    <p>Now to fire up R and see that everything looks right by plotting it.</p>
    <div class="language-console?lang=r&comments=true&output=plaintext&prompt=> highlighter-rouge">
      <div class="highlight">
        <pre class="highlight"><code data-lang="console?lang=r&comments=true&output=plaintext&prompt=>"><span class="gp">&gt;</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">sf</span><span class="p">)</span><span class="w">
</span>Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
<span class="gp">&gt;</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span>
<span class="gp">&gt;</span><span class="w"> </span><span class="n">xx</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read_sf</span><span class="p">(</span><span class="s2">"fha.geojson"</span><span class="p">)</span><span class="w">
</span>
<span class="gp">&gt;</span><span class="w"> </span><span class="n">xx</span><span class="w">
</span>Simple feature collection with 74 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -77.188 ymin: 38.79005 xmax: -76.8772 ymax: 39.0666
Geodetic CRS:  WGS 84
<span class="c"># A tibble: 74 × 5
</span>     FID Grade Shape__Area Shape__Length                         geometry
   &lt;int&gt; &lt;chr&gt;       &lt;dbl&gt;         &lt;dbl&gt;                    &lt;POLYGON [°]&gt;
 1     1 E5       2268576.         5849. ((-76.90432 38.85715, -76.9018 …
 2     2 G7      13563378.        19322. ((-76.93371 38.87391, -76.90942…
 3     3 H2       7772476.        12002. ((-76.88671 38.90218, -76.89007…
 4     4 H1      12964128.        20269. ((-76.90942 38.89269, -76.93095…
 5     5 G1       6516531.        19844. ((-76.93428 38.88311, -76.93574…
 6     6 C4       7199183.        16914. ((-76.93371 38.87391, -76.96229…
 7     7 H2       7328078.        14489. ((-76.96229 38.85169, -76.97798…
 8     8 E2       9790479.        21957. ((-76.98859 38.8399, -76.9885 3…
 9     9 F2       5253684.        15352. ((-76.99618 38.85609, -77.00305…
10    10 H1       1810343.         8218. ((-76.97203 38.89815, -76.9833 …
<span class="c"># … with 64 more rows
# ℹ Use `print(n = ...)` to see more rows
</span>
<span class="gp">&gt;</span><span class="w"> </span><span class="n">ggplot</span><span class="p">(</span><span class="n">xx</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_sf</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Grade</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">theme_minimal</span><span class="p">()</span><span class="w">
</span></code></pre>
      </div>
    </div>
    <picture>
      <source type="image/svg+xml" srcset="/uploads/2022/geojson/fha-shapefile.svg">
      <img src="/uploads/2022/geojson/fha-shapefile.png" alt="Plot of the FHA shapefile dataset of Washington, DC.">
    </picture>
    <p>It looks fantastic, and is ready for further data analysis now!</p>
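    <p>If R isn’t at hand, a similar sanity check can be sketched in Python using only the standard library. This is a minimal example, assuming the output is a GeoJSON FeatureCollection whose features carry a <code>Grade</code> property as in the table above; the helper name <code>summarize_grades</code> and the tiny sample data are made up for illustration.</p>

```python
import json
from collections import Counter


def summarize_grades(geojson_text):
    """Count features per Grade and collect the geometry types seen."""
    collection = json.loads(geojson_text)
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    grades = Counter()
    geometry_types = set()
    for feature in collection["features"]:
        grades[feature["properties"]["Grade"]] += 1
        geometry_types.add(feature["geometry"]["type"])
    return grades, geometry_types


def make_feature(grade):
    """Build a made-up polygon feature; coordinates are placeholders."""
    ring = [[-76.9, 38.8], [-76.8, 38.8], [-76.8, 38.9], [-76.9, 38.8]]
    return {
        "type": "Feature",
        "properties": {"Grade": grade},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


# Stand-in for the contents of fha.geojson (not the real data).
SAMPLE = json.dumps(
    {"type": "FeatureCollection",
     "features": [make_feature("H1"), make_feature("H2"), make_feature("H1")]}
)

grades, geometry_types = summarize_grades(SAMPLE)
print(dict(grades), geometry_types)
```

    <p>For the real file, the same function can be called on the text read from <code>fha.geojson</code>.</p>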
    <h2>A more direct route</h2>
    <p>After all of this, I was curious if there was a better way, so I did some digging. This ArcGIS Online tool is called ESRI Story Map Series, and the source code is actually <a href="https://github.com/Esri/storymap-series">available on GitHub</a>. Looking through the repository we can see it’s a Javascript app with a fairly rich library API, intended for ESRI’s customers to develop “story maps” with deep integrations to justify their hefty enterprise contracts. In the README, one of the <a href="https://github.com/Esri/storymap-series/blob/109e94458da8f297cd21b7ed877832b8a8ce9867/README.md#link-between-entries">code suggestions</a> points in an interesting direction, and I reopened the web inspector console to check it out.</p>
    <p>Based on the README example, I learned that the top-level object is called <code>app</code>, and that layers can be obtained through a method on the <code>app.map</code> object. I grub around in the app’s internal data structures using the Javascript console, and discover an interesting <code>_layers</code> key inside this object, which seems to have the relevant data that I’m interested in.</p>
    <p><img src="/uploads/2022/geojson/inspector-js.png" alt="Screenshot of the console panel of the web developer console, showing a Javascript data object corresponding to the ESRI map being shown in the map app." srcset="/uploads/2022/geojson/inspector-js.png 2x"></p>
    <p>The full invocation in the Javascript console to get the ESRI REST API endpoint is therefore:</p>
    <div class="language-javascript highlighter-rouge">
      <div class="highlight">
        <pre class="highlight"><code data-lang="javascript"><span class="nx">app</span><span class="p">.</span><span class="nx">map</span><span class="p">.</span><span class="nx">_layers</span><span class="p">.</span><span class="nx">FHA_Grades_4159</span><span class="p">.</span><span class="nx">url</span>
<span class="c1">// "https://services.arcgis.com/8df8p0NlLFEShl0r/arcgis/rest/services/FHA_Grades/FeatureServer/0" </span>
</code></pre>
      </div>
    </div>
    ]]></content>
</entry>
<entry>
  <title><![CDATA[Deploy your website to Neocities using GitHub Actions]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/deploying-your-static-site-to-neocities-using-github-actions/"/>
  <id>https://jonathanchang.org/blog/deploying-your-static-site-to-neocities-using-github-actions</id>
  <published>2022-01-26T19:01:00+00:00</published>
  <updated>2022-01-26T19:01:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>This personal website and a few of my hobby websites are hosted on <a href="https://neocities.org/">Neocities</a>, a free web hosting service reminiscent of the <a href="https://en.wikipedia.org/wiki/Yahoo!_GeoCities">now-defunct GeoCities</a>. Neocities makes it incredibly easy to start creating websites right away, <a href="https://neocities.org/tutorials">even for total beginners</a>, with their web interface.</p>
  <p>However, I prefer to have all of my websites version controlled with Git and hosted on GitHub. To minimize the amount of thinking I need to do when I want to publish something new, my ideal workflow is something like this:</p>
  <ol>
    <li>Edit website</li>
    <li><code>git commit &amp;&amp; git push</code></li>
    <li>Changes go live</li>
  </ol>
  <p>How do I accomplish this? By automating step 3 using <a href="https://docs.github.com/en/actions">GitHub Actions</a>.</p>
  <p><img src="/uploads/2022/neocities/preview.png" alt="Illustration of the GitHub Octocat, the GitHub Actions logo, and the Neocities Cat" /></p>
  <h2>General guidelines for automation</h2>
  <p>You often don’t need to dive straight into the deep end of automating “all the things” right away. It’s better to build it up over time, through a gradual process that might look something like this:</p>
  <ol>
    <li>Type commands into a terminal window</li>
    <li>Copy and paste from a playbook checked into version control</li>
    <li>Move the code into a script that you remind yourself to execute in README.md</li>
    <li>Set up a GitHub Action to run that script on each commit</li>
  </ol>
  <p>This kind of staged approach is important since you don’t want to fall into this classic trap:</p>
  <p><a href="https://xkcd.com/1319/"><img src="https://imgs.xkcd.com/comics/automation_2x.png" alt="I spend a lot of time on this task. I should write a program automating it! (Actually, you'll spend more time debugging and developing the automation code than actually doing the thing)" srcset="https://imgs.xkcd.com/comics/automation_2x.png 2x"></a></p>
  <p>Automation is actually a pretty ideal situation when deploying a website. There should be little in the way of configuration changes or other tweaks necessary once you’ve got everything set up, so the need to debug your code is hopefully minimized after you’ve gotten over that initial hump.</p>
  <h2>A simple example</h2>
  <p>Suppose your website has a single HTML file, which is transformed with some terminal commands, the result of which is deployed to Neocities.</p>
  <p>In this scenario, your HTML file has some text in it, but you also want to add a “Last updated” line that will automatically update without manual intervention.</p>
  <h3><code>index.html.template</code></h3>
  <p>This is a basic HTML file, and I’ve included a marker, <code>__LAST_UPDATED__</code> that will be replaced with the last updated date.</p>
  <div class="language-html highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="html"><span class="cp">&lt;!doctype html&gt;</span>
<span class="nt">&lt;html&gt;</span>
<span class="nt">&lt;body&gt;</span>
<span class="nt">&lt;h1&gt;</span>Dogs I’ve met<span class="nt">&lt;/h1&gt;</span>

<span class="nt">&lt;ul&gt;</span>
<span class="nt">&lt;li&gt;</span>Bacon<span class="nt">&lt;/li&gt;</span>
<span class="nt">&lt;li&gt;</span>Poutine<span class="nt">&lt;/li&gt;</span>
<span class="nt">&lt;li&gt;</span>Muffin<span class="nt">&lt;/li&gt;</span>
<span class="nt">&lt;/ul&gt;</span>

<span class="nt">&lt;p&gt;</span>This list was last updated on:
<span class="nt">&lt;em&gt;</span>__LAST_UPDATED__<span class="nt">&lt;/em&gt;</span>
<span class="nt">&lt;/p&gt;</span>

<span class="nt">&lt;/body&gt;</span>
<span class="nt">&lt;/html&gt;</span>
</code></pre>
    </div>
  </div>
<h3><code>Makefile</code></h3>
<p>This Makefile uses <code>sed</code> to replace the marker <code>__LAST_UPDATED__</code> with the current date and time, retrieved from the <code>/bin/date</code> command.</p>
<div class="language-make highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="make"><span class="nv">DATE</span> <span class="o">:=</span> <span class="p">$(</span>shell /bin/date<span class="p">)</span>
<span class="nl">site</span><span class="o">:</span> <span class="nf">index.html.template</span>
	<span class="nb">mkdir</span> <span class="nt">-p</span> _site
	<span class="nb">sed</span> <span class="nt">-e</span> <span class="s2">"s/__LAST_UPDATED__/</span><span class="p">$(</span><span class="s2">DATE</span><span class="p">)</span><span class="s2">/"</span> index.html.template <span class="o">&gt;</span> _site/index.html
</code></pre>
  </div>
</div>
<p>Now, you can imagine that the above <code>Makefile</code> could have a <code>deploy</code> target, where you <code>rsync</code> some files to a remote server. But we’ll use GitHub Actions here instead, so that your deploy step will happen after you <code>git push</code> your site up to GitHub.</p>
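<p>For readers less comfortable with <code>sed</code>, the substitution the Makefile performs can be sketched in Python. This is only an illustration, not part of the site’s build; the <code>render</code> function and the timestamp format are assumptions. One difference to note: the <code>sed</code> expression has no <code>g</code> flag and replaces only the first marker on each line, while <code>str.replace</code> replaces every occurrence.</p>

```python
from datetime import datetime, timezone

MARKER = "__LAST_UPDATED__"


def render(template_text, now=None):
    """Replace the marker with a timestamp (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Format chosen to resemble the default output of the date command.
    stamp = now.strftime("%a %b %d %H:%M:%S %Z %Y")
    return template_text.replace(MARKER, stamp)


template = "This list was last updated on: __LAST_UPDATED__"
fixed_time = datetime(2022, 1, 26, 19, 1, tzinfo=timezone.utc)
rendered = render(template, fixed_time)
print(rendered)
```

<p>Passing a fixed time makes the output reproducible; leaving it out uses the current time, as the Makefile does.</p>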
<h3><code>.gitignore</code></h3>
<p>Since your <code>_site</code> directory is a generated artifact, it’s a good idea to exclude it from version control.</p>
<div class="language-text highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="text">_site
</code></pre>
  </div>
</div>
<h3><code>.github/workflows/ci.yml</code></h3>
<p>This is the file that tells GitHub how to run Actions on your repository. It’s longer and more complicated than the other files, so I’ll break it up into different pieces. You can see the fully assembled YAML file at the end of this blog post.</p>
<p>First, we need to name the workflow, and specify which events this workflow should run on (<code>push</code> events to branches, and all <code>pull_request</code> events). Another interesting event is the <a href="https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#workflow_dispatch"><code>workflow_dispatch</code> event</a>, which lets you trigger workflows manually. This can be useful when your build script, for example, fetches data from an external resource that updates at a different cadence from your typical <code>git push</code>es.</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml"><span class="na">name</span><span class="pi">:</span> <span class="s">Build site</span>
<span class="na">on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">push</span><span class="pi">,</span> <span class="nv">pull_request</span><span class="pi">]</span>
</code></pre>
  </div>
</div>
<p>Define the job that will be run, and what <a href="https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners">kind of GitHub-hosted worker</a> to run the job on. We’ll use <code>ubuntu-latest</code> in this example. I recommend using this one whenever possible as it starts up the quickest (ideal for rapid iteration), and for private repositories, will cost the least in <a href="https://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#minute-multipliers">billable runner minutes</a>.</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml"><span class="na">jobs</span><span class="pi">:</span>
  <span class="na">build</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>
</code></pre>
  </div>
</div>
<p>Each job consists of several steps that are run in order, possibly with conditions and various configuration options. We can reuse parts of workflows that other people have created; in this case, it’s the <a href="https://github.com/actions/checkout">GitHub-created <code>actions/checkout</code> workflow</a>, which checks out the current Git repository and handles the various permissions and authentication for you.</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml">    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v2</span>
</code></pre>
  </div>
</div>
<p>The next step is to run our <code>make</code> command. If there’s no existing workflow that does something for you, it’s easy to define your own by using a <code>run:</code> option.</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml">      <span class="pi">-</span> <span class="na">run</span><span class="pi">:</span> <span class="s">make</span>
</code></pre>
  </div>
</div>
<p>Per the Makefile above, the last step will have created a new folder <code>_site</code> that contains the generated website. The next step deploys that website to Neocities, using <a href="https://github.com/bcomnes/deploy-to-neocities">Bret Comnes’s deploy-to-neocities workflow</a>.</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml">      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">bcomnes/deploy-to-neocities@v1</span>
</code></pre>
  </div>
</div>
<p>We’ll need to specify <a href="https://docs.github.com/en/actions/learn-github-actions/expressions">some additional conditions</a>. We don’t want to deploy the website if the previous step failed, so add a <code>success()</code> qualifier, and if we’re not on the <code>main</code> branch (say if we were developing something in a feature branch), we shouldn’t deploy either.</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml">        <span class="na">if</span><span class="pi">:</span> <span class="s">${{ success() &amp;&amp; github.ref == 'refs/heads/main' }}</span>
</code></pre>
  </div>
</div>
<p>Finally, we need to configure the <code>bcomnes/deploy-to-neocities</code> workflow with a few things, namely our Neocities API token (which we’ll set up later), the folder that we want to deploy to Neocities (<code>dist_dir: _site</code>), and whether we want to delete files that exist on Neocities but don’t exist in our deployment folder (<code>cleanup: true</code>).</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml">        <span class="na">with</span><span class="pi">:</span>
          <span class="na">api_token</span><span class="pi">:</span> <span class="s">${{ secrets.NEOCITIES_API_KEY }}</span>
          <span class="na">dist_dir</span><span class="pi">:</span> <span class="s">_site</span>
          <span class="na">cleanup</span><span class="pi">:</span> <span class="kc">true</span>
</code></pre>
  </div>
</div>
<h2>Adding your Neocities API token</h2>
<p>Once you’ve <a href="https://neocities.org/#new">created your Neocities site</a> (and optionally given them money for a <a href="https://neocities.org/supporter">Supporter account</a>), head on over to the Settings page and generate an API token:</p>
<p><img src="/uploads/2022/neocities/neocities_api_key.png" alt="Screenshot of the Neocities API token generation page" srcset="/uploads/2022/neocities/neocities_api_key.png 2x"></p>
<p>(The screenshot has a fake token, but it should be a 32 character hexadecimal string.)</p>
<p>Copy this token and add it as a <a href="https://docs.github.com/en/actions/security-guides/encrypted-secrets#creating-encrypted-secrets-for-a-repository">secret to your GitHub repository</a>, with the name <code>NEOCITIES_API_KEY</code>.</p>
<p><img src="/uploads/2022/neocities/github_secret.png" alt="Screenshot of the GitHub repository secrets settings page" srcset="/uploads/2022/neocities/github_secret.png 2x"></p>
<p>Once you’ve added this token, you’re all set! Just <code>git commit</code> your repository and <code>git push</code>, and assuming everything has been done correctly, you can <a href="https://docs.github.com/en/actions/quickstart#viewing-your-workflow-results">view your Actions workflow results</a> directly in the GitHub web interface and watch your files get uploaded to Neocities.</p>
<h2>A fallback deploy option</h2>
<p>One issue that I’ve experienced is sometimes the clever diffing algorithm in the <code>bcomnes/deploy-to-neocities</code> workflow fails to work. I’m not sure whether it’s because of something on the workflows side or something on Neocities’ side, but I <a href="https://github.com/jonchang/deploy-neocities">wrote my own GitHub Action, <code>jonchang/deploy-neocities</code></a> to serve as a fallback should the other workflow fail. My version is a bit slower, but it uses the official Neocities Ruby gem, and I haven’t had any problems with it even when the number of changes to make is large.</p>
<p>To set it up, first tweak the earlier workflow step to add a few new fields. In particular, the <code>id:</code> field lets you <a href="https://docs.github.com/en/actions/learn-github-actions/contexts#steps-context">refer back to this workflow step</a>, and <code>continue-on-error:</code> means the workflow can keep on running even if this particular step encounters a problem.</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml">      <span class="pi">-</span> <span class="na">if</span><span class="pi">:</span> <span class="s">${{ success() &amp;&amp; github.ref == 'refs/heads/main' }}</span>
        <span class="na">id</span><span class="pi">:</span> <span class="s">deploy</span>
        <span class="na">continue-on-error</span><span class="pi">:</span> <span class="kc">true</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">bcomnes/deploy-to-neocities@v1</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">api_token</span><span class="pi">:</span> <span class="s">${{ secrets.NEOCITIES_API_KEY }}</span>
          <span class="na">dist_dir</span><span class="pi">:</span> <span class="s">_site</span>
          <span class="na">cleanup</span><span class="pi">:</span> <span class="kc">true</span>
</code></pre>
  </div>
</div>
<p>Then add my GitHub Action. We gate this workflow step on whether it’s running on the <code>main</code> branch, and whether the previous step (<code>id: deploy</code>) had a “failure” outcome.</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml">      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Deploy to Neocities fallback</span>
        <span class="na">if</span><span class="pi">:</span> <span class="s">${{ steps.deploy.outcome == 'failure' &amp;&amp; github.ref == 'refs/heads/main' }}</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">jonchang/deploy-neocities@master</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">key</span><span class="pi">:</span> <span class="s">${{ secrets.NEOCITIES_API_KEY }}</span>
          <span class="na">dir</span><span class="pi">:</span> <span class="s">_site</span>
          <span class="na">clean</span><span class="pi">:</span> <span class="kc">true</span>
</code></pre>
  </div>
</div>
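<p>The interplay between the two <code>if:</code> conditions and <code>continue-on-error:</code> can be modelled in a few lines of Python. This is a toy model of the gating logic only, with made-up function names; it doesn’t talk to GitHub. It relies on the documented distinction that a step’s <code>outcome</code> is its result before <code>continue-on-error</code> is applied, while its <code>conclusion</code> is the result afterwards.</p>

```python
MAIN_REF = "refs/heads/main"


def should_run_primary(previous_steps_succeeded, ref):
    """Model of the first deploy step's condition: success() on main."""
    return previous_steps_succeeded and ref == MAIN_REF


def should_run_fallback(deploy_outcome, ref):
    """Model of the fallback condition: the deploy step failed, on main."""
    return deploy_outcome == "failure" and ref == MAIN_REF


def conclusion(outcome, continue_on_error):
    """A failed step with continue-on-error is reported as a success."""
    if outcome == "failure" and continue_on_error:
        return "success"
    return outcome


# Primary deploy fails on main: the job keeps going and the fallback runs.
print(conclusion("failure", continue_on_error=True))
print(should_run_fallback("failure", MAIN_REF))

# On a feature branch the primary step is skipped, so the fallback is too.
print(should_run_primary(True, "refs/heads/feature"))
print(should_run_fallback("skipped", "refs/heads/feature"))
```

<p>This is why the fallback checks <code>outcome</code>: with <code>continue-on-error: true</code>, the <code>conclusion</code> of a failed deploy step would read as a success.</p>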
<p>The full <code>.github/workflows/ci.yml</code> file should be as follows:</p>
<div class="language-yaml highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="yaml"><span class="na">name</span><span class="pi">:</span> <span class="s">Build site</span>

<span class="na">on</span><span class="pi">:</span> <span class="pi">[</span><span class="nv">push</span><span class="pi">,</span> <span class="nv">pull_request</span><span class="pi">]</span>

<span class="na">jobs</span><span class="pi">:</span>
  <span class="na">build</span><span class="pi">:</span>
    <span class="na">runs-on</span><span class="pi">:</span> <span class="s">ubuntu-latest</span>

    <span class="na">steps</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="na">uses</span><span class="pi">:</span> <span class="s">actions/checkout@v2</span>

      <span class="pi">-</span> <span class="na">run</span><span class="pi">:</span> <span class="s">make</span>

      <span class="pi">-</span> <span class="na">if</span><span class="pi">:</span> <span class="s">${{ success() &amp;&amp; github.ref == 'refs/heads/main' }}</span>
        <span class="na">id</span><span class="pi">:</span> <span class="s">deploy</span>
        <span class="na">continue-on-error</span><span class="pi">:</span> <span class="kc">true</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">bcomnes/deploy-to-neocities@v1</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">api_token</span><span class="pi">:</span> <span class="s">${{ secrets.NEOCITIES_API_KEY }}</span>
          <span class="na">dist_dir</span><span class="pi">:</span> <span class="s">_site</span>
          <span class="na">cleanup</span><span class="pi">:</span> <span class="kc">true</span>

      <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">Deploy to Neocities fallback</span>
        <span class="na">if</span><span class="pi">:</span> <span class="s">${{ steps.deploy.outcome == 'failure' &amp;&amp; github.ref == 'refs/heads/main' }}</span>
        <span class="na">uses</span><span class="pi">:</span> <span class="s">jonchang/deploy-neocities@master</span>
        <span class="na">with</span><span class="pi">:</span>
          <span class="na">key</span><span class="pi">:</span> <span class="s">${{ secrets.NEOCITIES_API_KEY }}</span>
          <span class="na">dir</span><span class="pi">:</span> <span class="s">_site</span>
          <span class="na">clean</span><span class="pi">:</span> <span class="kc">true</span>
</code></pre>
  </div>
</div>
<p>This whole blog post was actually written to explain why this fallback exists, but it felt kind of weird to say “Neocities sometimes has these problems but you should still support it”, so I decided to write an entire tutorial instead. Go build a website with Neocities, it’s fun and you won’t regret it! They run their own CDN somehow! It’s a really impressive piece of work.</p>
<p>Note that if you’re not interested in doing all of this work, there are a number of solutions that are similar, such as Cloudflare Pages.</p>
]]></content>
</entry>
<entry>
  <title><![CDATA[Three ways to check and fix ultrametric phylogenies]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/three-ways-to-check-and-fix-ultrametric-phylogenies/"/>
  <id>https://jonathanchang.org/blog/three-ways-to-check-and-fix-ultrametric-phylogenies</id>
  <published>2021-07-13T01:55:00+00:00</published>
  <updated>2021-07-13T01:55:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>A recent user <a href="https://github.com/jonchang/tact/issues/230">bug report</a> in my software <a href="https://github.com/jonchang/tact">TACT</a> led me to look into how phylogenetic software varies in the way they determine whether a given phylogenetic tree is ultrametric (where the root-to-tip distance is equal among all tips). If you infer an ultrametric phylogeny using something like BEAST or treePL, your supposedly ultrametric tree can still cause problems for other tools by virtue of <em>not being ultrametric enough</em>.</p>
  <p>This ultrametric purity test was previously a problem during the great BAMM controversy of 2017 (“BAMMghazi”), when the whales phylogeny used as an example in BAMM <a href="https://github.com/macroevolution/bammtools/issues/45">suddenly stopped being ultrametric</a> as measured by the R function <code>ape::is.ultrametric</code>.</p>
  <p>How, then, do tools differ in the way that they check for ultrametricity?</p>
  <p><img src="/uploads/2021/ultrametric/preview.jpg" alt="Illustration of a construction worker wielding a ‘stop’ sign in front of a breaching humpback whale. The label  is shown, describing the situation by means of analogy." /></p>
  <h2>Method 1: variance</h2>
  <p>This is <a href="https://github.com/FePhyFoFum/phyx/blob/f6559150a4cf7f78f2ec6f4edab232ce34b99fda/src/tree_utils.cpp#L486-L492">used in phyx</a>, and was <a href="https://github.com/cran/ape/blob/650e3dfdde5de98680ea96e0e3bc6e30f52a51b2/R/is.ultrametric.R#L35">used in ape prior to version 4.0</a>.</p>
  <p>Load the whales tree into R and compute the root-to-tip distances for all tips:</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">tre</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read.tree</span><span class="p">(</span><span class="s2">"https://raw.githubusercontent.com/macroevolution/bamm/ab1b69be13e9841d9e103170d0f61e4324f78676/examples/diversification/whales/whaletree.tre"</span><span class="p">)</span><span class="w">
</span><span class="n">N</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">Ntip</span><span class="p">(</span><span class="n">tre</span><span class="p">)</span><span class="w">
</span><span class="n">root_node</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">N</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="w">
</span><span class="n">root_to_tip</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">dist.nodes</span><span class="p">(</span><span class="n">tre</span><span class="p">)[</span><span class="m">1</span><span class="o">:</span><span class="n">N</span><span class="p">,</span><span class="w"> </span><span class="n">root_node</span><span class="p">]</span><span class="w">
</span></code></pre>
    </div>
  </div>
<p>Compute the variance using those distances:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">var</span><span class="p">(</span><span class="n">root_to_tip</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] 6.519647e-13</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>The default tolerance in ape is the square root of R’s <a href="https://en.wikipedia.org/wiki/Machine_epsilon">machine epsilon</a>, defined as the smallest positive floating-point number <em>x</em> such that 1 + <em>x</em> ≠ 1. This can vary from computer to computer, but on my laptop, this value is:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="nf">sqrt</span><span class="p">(</span><span class="n">.Machine</span><span class="o">$</span><span class="n">double.eps</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] 1.490116e-08</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>The whales tree is ultrametric using the variance method, since the variance (about 6.5e-13) is far smaller than the tolerance (about 1.5e-8).</p>
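<p>As a rough sketch, the variance check can be wrapped in a small function. The names <code>root_to_tip_distances</code> and <code>is_ultrametric_variance</code> are hypothetical and are not part of ape or phyx, and the toy Newick tree (with one tip deliberately lengthened by 1e-6) is only there so the example runs without a download:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r">library(ape)

# Hypothetical helper: root-to-tip distance for every tip of a rooted tree.
root_to_tip_distances &lt;- function(phy) {
    n &lt;- Ntip(phy)
    # ape numbers the tips 1..n and the root n + 1.
    dist.nodes(phy)[seq_len(n), n + 1]
}

# Hypothetical helper: TRUE when the variance of those distances is below tol.
is_ultrametric_variance &lt;- function(phy, tol = sqrt(.Machine$double.eps)) {
    var(root_to_tip_distances(phy)) &lt; tol
}

# Toy tree: tip D is 1e-6 longer than its sister C.
toy &lt;- read.tree(text = "((A:1,B:1):1,(C:1,D:1.000001):1);")
var(root_to_tip_distances(toy))  # roughly 2.5e-13
is_ultrametric_variance(toy)     # passes, much like the whales tree
</code></pre>
  </div>
</div>
<p>With four tips and a single tip off by 1e-6, the sample variance works out to about (1e-6)² / 4, so the toy tree slips under the tolerance just as the whales tree does.</p>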
<h2>Method 2: relative difference</h2>
<p>In ape 4.0, the <code>is.ultrametric</code> method <a href="https://github.com/cran/ape/blob/fa2a72e2814d26112792cf4676f6737abb7f7e0d/R/is.ultrametric.R#L35-L37">was changed</a> to use the relative difference of the minimum and maximum root-to-tip distances. I’ll get into why this was changed, but first, let’s look at how to calculate this value:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">min_tip</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">root_to_tip</span><span class="p">)</span><span class="w">
</span><span class="n">max_tip</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">root_to_tip</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="n">max_tip</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">min_tip</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">max_tip</span><span class="w">
</span><span class="c1">## [1] 1.115516e-07</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Immediately we can see why the whales tree stopped being ultrametric in ape 4.0: this relative difference (about 1.1e-7) is larger than the default tolerance (about 1.5e-8).</p>
<p>Why was this change made? It wasn’t merely to cause a lot of problems for everyone; instead, the answer is <a href="https://cran.r-project.org/web/packages/ape/ape.pdf#Rfn.is.ultrametric.1">in the documentation</a>, which tersely states:</p>
<blockquote>
  <p>The default criterion is invariant to linear changes of the branch lengths.</p>
</blockquote>
<p>What does this look like in practice? Let’s first scale the root-to-tip distance by multiplying everything by 1000:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">scaled_root_to_tip</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">root_to_tip</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="m">1000</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Now compare the variance and relative difference:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">var</span><span class="p">(</span><span class="n">scaled_root_to_tip</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] 6.519647e-07</span><span class="w">
</span><span class="n">min_tip</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">scaled_root_to_tip</span><span class="p">)</span><span class="w">
</span><span class="n">max_tip</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">scaled_root_to_tip</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="n">max_tip</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">min_tip</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">max_tip</span><span class="w">
</span><span class="c1">## [1] 1.115516e-07</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Uh oh! Our whales are no longer ultrametric per the variance statistic, even though <em>none of the branch lengths have changed</em> relative to each other. Variance scales with the square of the multiplier, so it grew a million-fold to about 6.5e-7, which is above the tolerance, while the relative difference did not move. You may not think that your phylogenies should be stretching like a taffy pull, but there are probably valid use cases for phylogenies like this.</p>
<p>The relative difference is the method that TACT currently uses to determine ultrametricity.</p>
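<p>To see the two criteria side by side, here is a minimal sketch on a toy tree. The helper names (<code>tip_depths</code>, <code>passes_variance</code>, <code>passes_reldiff</code>) and the Newick string are hypothetical and chosen only for illustration; neither function is the exact code used by ape, phyx, or TACT:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r">library(ape)

# Hypothetical helper: root-to-tip distance for every tip.
tip_depths &lt;- function(phy) {
    n &lt;- Ntip(phy)
    dist.nodes(phy)[seq_len(n), n + 1]
}

tol &lt;- sqrt(.Machine$double.eps)

# Method 1: variance of the root-to-tip distances.
passes_variance &lt;- function(phy) var(tip_depths(phy)) &lt; tol

# Method 2: relative difference between the deepest and shallowest tips.
passes_reldiff &lt;- function(phy) {
    d &lt;- tip_depths(phy)
    (max(d) - min(d)) / max(d) &lt; tol
}

# Toy tree with one tip 1e-6 too long, plus a copy stretched 1000-fold.
toy &lt;- read.tree(text = "((A:1,B:1):1,(C:1,D:1.000001):1);")
toy_scaled &lt;- toy
toy_scaled$edge.length &lt;- toy$edge.length * 1000

c(original = passes_variance(toy), scaled = passes_variance(toy_scaled))
c(original = passes_reldiff(toy), scaled = passes_reldiff(toy_scaled))
</code></pre>
  </div>
</div>
<p>On this toy tree the variance verdict flips from pass to fail after rescaling, while the relative difference fails both times, which is the invariance the ape documentation is describing.</p>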
<h2>Method 3: node ages</h2>
<p>The previous two methods both relied on comparing some aspect of root-to-tip distances. Here, we’ll actually compare the distances to the tips of <em>all nodes</em>. This is the method <a href="https://github.com/jeetsukumaran/DendroPy/blob/29fd294bf05d890ebf6a8d576c501e471db27ca1/src/dendropy/datamodel/treemodel.py#L5649-L5655">used in DendroPy</a> and the original impetus for this investigation. I’ll reimplement enough of this method in R to illustrate this technique.</p>
<p>First, reorder the tree for postorder traversal, and set up some convenience variables.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">tre_node_adjust</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">reorder</span><span class="p">(</span><span class="n">tre</span><span class="p">,</span><span class="w"> </span><span class="s2">"postorder"</span><span class="p">)</span><span class="w">
</span><span class="n">e1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tre_node_adjust</span><span class="o">$</span><span class="n">edge</span><span class="p">[,</span><span class="w"> </span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="c1"># parent node</span><span class="w">
</span><span class="n">e2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tre_node_adjust</span><span class="o">$</span><span class="n">edge</span><span class="p">[,</span><span class="w"> </span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="c1"># child node</span><span class="w">
</span><span class="n">EL</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tre_node_adjust</span><span class="o">$</span><span class="n">edge.length</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Also set up an <code>ages</code> variable that will hold internal calculations for how old a node should be.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">ages</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">numeric</span><span class="p">(</span><span class="n">N</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">tre_node_adjust</span><span class="o">$</span><span class="n">Nnode</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Next, start iterating…</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">ii</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="nf">seq_along</span><span class="p">(</span><span class="n">EL</span><span class="p">))</span><span class="w"> </span><span class="p">{</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>If we haven’t already stored an age for the parent node, go ahead and compute that now from the (left)<sup class="footnote-ref"><a href="#fn1" id="fnref1">1</a></sup> child node and the current edge length.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="w">    </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ages</span><span class="p">[</span><span class="n">e1</span><span class="p">[</span><span class="n">ii</span><span class="p">]]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">ages</span><span class="p">[</span><span class="n">e1</span><span class="p">[</span><span class="n">ii</span><span class="p">]]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ages</span><span class="p">[</span><span class="n">e2</span><span class="p">[</span><span class="n">ii</span><span class="p">]]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">EL</span><span class="p">[</span><span class="n">ii</span><span class="p">]</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Otherwise, retrieve the stored age for the parent node, and re-compute what the age should be based on the (right)<sup class="footnote-ref"><a href="#fn1" id="fnref1">1</a></sup> child node.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="w">    </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w">
        </span><span class="n">recorded_age</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ages</span><span class="p">[</span><span class="n">e1</span><span class="p">[</span><span class="n">ii</span><span class="p">]]</span><span class="w">
        </span><span class="n">new_age</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">ages</span><span class="p">[</span><span class="n">e2</span><span class="p">[</span><span class="n">ii</span><span class="p">]]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">EL</span><span class="p">[</span><span class="n">ii</span><span class="p">]</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Now test whether those ages differ. I could actually use either the variance or the relative difference method, but here I’ll just check whether the absolute difference is anything other than zero.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="w">        </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">recorded_age</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">new_age</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
            </span><span class="n">cat</span><span class="p">(</span><span class="n">sprintf</span><span class="p">(</span><span class="s2">"node %i age %.6f != %.6f\n"</span><span class="p">,</span><span class="w"> </span><span class="n">e1</span><span class="p">[</span><span class="n">ii</span><span class="p">],</span><span class="w"> </span><span class="n">recorded_age</span><span class="p">,</span><span class="w"> </span><span class="n">new_age</span><span class="p">))</span><span class="w">
        </span><span class="p">}</span><span class="w">
    </span><span class="p">}</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1">## node 154 age 3.291163 != 3.291164</span><span class="w">
</span><span class="c1">## node 153 age 4.570892 != 4.570893</span><span class="w">
</span><span class="c1">## node 151 age 5.263495 != 5.263494</span><span class="w">
</span><span class="c1">## node 150 age 6.975185 != 6.975185</span><span class="w">
</span><span class="c1">## node 146 age 3.048675 != 3.048675</span><span class="w">
</span><span class="c1">## node 145 age 4.452720 != 4.452720</span><span class="w">
</span><span class="c1">## node 143 age 6.047030 != 6.047031</span><span class="w">
</span><span class="c1">## node 142 age 8.209050 != 8.209050</span><span class="w">
</span><span class="c1">## node 135 age 5.616381 != 5.616380</span><span class="w">
</span><span class="c1">## node 133 age 14.061554 != 14.061555</span><span class="w">
</span><span class="c1">## node 132 age 17.939426 != 17.939427</span><span class="w">
</span><span class="c1">## node 130 age 24.698214 != 24.698213</span><span class="w">
</span><span class="c1">## node 127 age 5.096384 != 5.096384</span><span class="w">
</span><span class="c1">## node 125 age 5.796587 != 5.796586</span><span class="w">
</span><span class="c1">## node 124 age 6.375631 != 6.375632</span><span class="w">
</span><span class="c1">## node 121 age 7.176680 != 7.176680</span><span class="w">
</span><span class="c1">## node 120 age 7.677424 != 7.677424</span><span class="w">
</span><span class="c1">## node 116 age 13.042869 != 13.042870</span><span class="w">
</span><span class="c1">## node 114 age 11.028304 != 11.028304</span><span class="w">
</span><span class="c1">## node 113 age 14.540283 != 14.540283</span><span class="w">
</span><span class="c1">## node 112 age 15.669702 != 15.669702</span><span class="w">
</span><span class="c1">## node 108 age 31.621529 != 31.621528</span><span class="w">
</span><span class="c1">## node 104 age 22.044391 != 22.044391</span><span class="w">
</span><span class="c1">## node 103 age 33.799003 != 33.799004</span><span class="w">
</span><span class="c1">## node 100 age 11.382958 != 11.382958</span><span class="w">
</span><span class="c1">## node 93 age 26.063016 != 26.063016</span><span class="w">
</span><span class="c1">## node 90 age 8.816019 != 8.816019</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Many internal nodes have ages that differ depending on whether you use the left or right child node to compute the age. This metric allows you to pinpoint exactly where in your phylogeny precision issues could be causing ultrametricity issues. I can imagine this being quite useful when you are grafting subtrees onto a backbone phylogeny and trying to figure out if you did the math on your branch lengths correctly.</p>
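<p>The node-age loop can also be packaged as a function that returns the conflicting nodes instead of printing them. This is only a sketch: <code>find_age_conflicts</code> and its <code>tol</code> argument are hypothetical names, and using <code>NA</code> rather than zero to mark a node without a stored age is a swapped-in choice so that zero-length branches are not mistaken for unvisited nodes:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r">library(ape)

# Hypothetical helper: list internal nodes whose age depends on which child is used.
find_age_conflicts &lt;- function(phy, tol = 0) {
    phy &lt;- reorder(phy, "postorder")
    parent &lt;- phy$edge[, 1]
    child &lt;- phy$edge[, 2]
    len &lt;- phy$edge.length
    # Tips have age 0; NA marks an internal node with no age stored yet.
    ages &lt;- c(numeric(Ntip(phy)), rep(NA_real_, phy$Nnode))
    conflicts &lt;- list()
    for (ii in seq_along(len)) {
        candidate &lt;- ages[child[ii]] + len[ii]
        if (is.na(ages[parent[ii]])) {
            ages[parent[ii]] &lt;- candidate
        } else if (abs(ages[parent[ii]] - candidate) &gt; tol) {
            conflicts[[length(conflicts) + 1]] &lt;- data.frame(
                node = parent[ii],
                recorded = ages[parent[ii]],
                recomputed = candidate
            )
        }
    }
    # NULL when every node agrees, otherwise one row per conflict.
    do.call(rbind, conflicts)
}

# Toy tree: tip D is 1e-6 longer than its sister C.
toy &lt;- read.tree(text = "((A:1,B:1):1,(C:1,D:1.000001):1);")
find_age_conflicts(toy)              # flags the parent of C and D
find_age_conflicts(toy, tol = 1e-3)  # NULL: differences are within tolerance
</code></pre>
  </div>
</div>
<p>Setting <code>tol</code> above zero turns the exact comparison into an absolute-difference threshold, which may be more forgiving of ordinary floating-point noise.</p>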
<h2>Fix 1: extending the tips</h2>
<p>Now that we’re aware of the differences between different ultrametricity checks, let’s look at ways to correct for phylogenies that aren’t quite there.</p>
<p>One possibility is to simply extend the tips of the tree until the root-to-tip distances are completely equal. This is implemented in R as <code>BioGeoBEARS::extend_tips_to_ultrametricize</code><sup class="footnote-ref"><a href="#fn2" id="fnref2">2</a></sup> and <code>phytools::force.ultrametric(method = &quot;extend&quot;)</code>.<sup class="footnote-ref"><a href="#fn3" id="fnref3">3</a></sup></p>
<p>First, set up a copy of the whales tree and compute some important values, including the difference from each root-to-tip distance to their maximum:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">tre_extend</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tre</span><span class="w">
</span><span class="n">age_difference</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">root_to_tip</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">root_to_tip</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Next, grab the edges from the edge matrix that correspond to the tips. Note that the entries in <code>$edge.length</code> correspond to the <em>row numbers</em> in <code>$edge</code>, not the values in those rows! This is tricky and confusing. We can also assume that the tip edges appear in ascending order, from 1 to <em>N</em>.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">tip_edges</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tre_extend</span><span class="o">$</span><span class="n">edge</span><span class="p">[,</span><span class="w"> </span><span class="m">2</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">Ntip</span><span class="p">(</span><span class="n">tre_extend</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Finally, extend the tips outwards and confirm that the new tree is ultrametric:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">tre_extend</span><span class="o">$</span><span class="n">edge.length</span><span class="p">[</span><span class="n">tip_edges</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tre_extend</span><span class="o">$</span><span class="n">edge.length</span><span class="p">[</span><span class="n">tip_edges</span><span class="p">]</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">age_difference</span><span class="w">
</span><span class="n">is.ultrametric</span><span class="p">(</span><span class="n">tre_extend</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] TRUE</span><span class="w">
</span></code></pre>
  </div>
</div>
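<p>If you would rather not rely on the tip edges appearing in ascending order, one option is to look up each tip’s row explicitly with <code>match</code>. The function name <code>extend_tips</code> below is hypothetical, and this is a sketch of the idea rather than the code used by BioGeoBEARS or phytools:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r">library(ape)

# Hypothetical helper: lengthen each terminal branch to reach the deepest tip.
extend_tips &lt;- function(phy) {
    n &lt;- Ntip(phy)
    depths &lt;- dist.nodes(phy)[seq_len(n), n + 1]
    # Row of the edge matrix that ends at tip 1, tip 2, ..., tip n.
    tip_rows &lt;- match(seq_len(n), phy$edge[, 2])
    phy$edge.length[tip_rows] &lt;- phy$edge.length[tip_rows] + (max(depths) - depths)
    phy
}

# Toy tree: tip D is 1e-6 longer than its sister C.
toy &lt;- read.tree(text = "((A:1,B:1):1,(C:1,D:1.000001):1);")
is.ultrametric(toy)
is.ultrametric(extend_tips(toy))
# Works the same when the edge matrix is in a different order.
is.ultrametric(extend_tips(reorder(toy, "postorder")))
</code></pre>
  </div>
</div>
<p>Because <code>tip_rows</code> is indexed by tip number, the age differences line up with the right branches regardless of how the edge matrix happens to be sorted.</p>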
<p>We can write a simple function in R that compares two phylogenies with identical topologies but differing branch lengths.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">diff_edge_lengths</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">phy</span><span class="p">,</span><span class="w"> </span><span class="n">phy2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="n">diffs</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">phy2</span><span class="o">$</span><span class="n">edge.length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">phy</span><span class="o">$</span><span class="n">edge.length</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Use the <code>sign</code> function, which returns -1, 0, or 1 when the input is negative, zero, or positive, respectively. Then assign each of those values to a color (or lack thereof). This is <a href="https://colorbrewer2.org/#type=diverging&amp;scheme=PiYG&amp;n=11">ColorBrewer palette PiYG</a>.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="w">    </span><span class="n">cols</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">sign</span><span class="p">(</span><span class="n">diffs</span><span class="p">)</span><span class="w">
    </span><span class="n">cols</span><span class="p">[</span><span class="n">cols</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"#7fbc41"</span><span class="w">
    </span><span class="n">cols</span><span class="p">[</span><span class="n">cols</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">-1</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="s2">"#de77ae"</span><span class="w">
    </span><span class="n">cols</span><span class="p">[</span><span class="n">cols</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="kc">NA</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Plot the tree and report the results. The plot needs a bit of adjustment since the defaults of <code>ape::plot.phylo</code> are a bit aggravating.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="w">    </span><span class="n">plot</span><span class="p">(</span><span class="n">phy</span><span class="p">,</span><span class="w"> </span><span class="n">show.tip.label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w"> </span><span class="n">no.margin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
    </span><span class="n">edgelabels</span><span class="p">(</span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">15</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cols</span><span class="p">)</span><span class="w">
    </span><span class="n">sprintf</span><span class="p">(</span><span class="s2">"%i longer branches, %i shorter branches"</span><span class="p">,</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">diffs</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="m">0</span><span class="p">),</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">diffs</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>This function places squares on branches that have changed in length, with pink meaning a shorter branch and green meaning a longer branch. As expected, nearly all of the terminal branches have been extended to enforce ultrametricity.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">diff_edge_lengths</span><span class="p">(</span><span class="n">tre</span><span class="p">,</span><span class="w"> </span><span class="n">tre_extend</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] "85 longer branches, 0 shorter branches"</span><span class="w">
</span></code></pre>
  </div>
</div>
<p><img src="/uploads/2021/ultrametric/extend.png" alt="Phylogeny of whales, forced to be ultrametric via the “extend tips” method. 85 terminal branches have increased in length." /></p>
<h2>Fix 2: non-negative least squares</h2>
<p>This is the default approach used in the R function <code>phytools::force.ultrametric</code> and the topic of a <a href="http://blog.phytools.org/2016/08/fixing-ultrametric-tree-whose-edges-are.html">phytools blog post</a>.</p>
<blockquote>
  <p>This will give you the edge lengths that result in the distances between taxa with minimum sum of squared differences from the distances implied by your input tree, under the criterion that the resulting tree is ultrametric.</p>
</blockquote>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">tre_nnls</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">phangorn</span><span class="o">::</span><span class="n">nnls.tree</span><span class="p">(</span><span class="n">cophenetic</span><span class="p">(</span><span class="n">tre</span><span class="p">),</span><span class="w"> </span><span class="n">tre</span><span class="p">,</span><span class="w"> </span><span class="n">rooted</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span><span class="n">is.ultrametric</span><span class="p">(</span><span class="n">tre_nnls</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] TRUE</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Since it’s trying to minimize differences among the pairwise tip distance matrix, you’d expect many branches to be adjusted. Plotting the differences shows that this is indeed the case:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">diff_edge_lengths</span><span class="p">(</span><span class="n">tre</span><span class="p">,</span><span class="w"> </span><span class="n">tre_nnls</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] "86 longer branches, 85 shorter branches"</span><span class="w">
</span></code></pre>
  </div>
</div>
<p><img src="/uploads/2021/ultrametric/nnls.png" alt="Phylogeny of whales, forced to be ultrametric via the “non-negative least squares” method. 171 branches have changed length, with about half becoming longer and half becoming shorter, in different parts of the tree." /></p>
<h2>Fix 3: node adjustment</h2>
<p>This is the approach optionally used in DendroPy,<sup class="footnote-ref"><a href="#fn4" id="fnref4">4</a></sup> and is how TACT fixes ultrametricity issues if asked. Whenever a node’s age differs between its left and right children, correct one of the branch lengths so the node’s age will be calculated the same regardless of whether you’re using the left descendants or the right descendants.<sup class="footnote-ref"><a href="#fn1" id="fnref1">1</a></sup></p>
<p>Change the prior loop that checks node ages to additionally adjust the branch length:</p>
<div class="language-diff highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="diff"> for (ii in seq_along(EL)) {
     if (ages[e1[ii]] == 0) {
         ages[e1[ii]] &lt;- ages[e2[ii]] + EL[ii]
     } else {
         recorded_age &lt;- ages[e1[ii]]
         new_age &lt;- ages[e2[ii]] + EL[ii]
         if (recorded_age != new_age) {
             cat(sprintf("node %i age %.6f != %.6f\n", e1[ii], recorded_age, new_age))
<span class="gi">+            EL[ii] &lt;- recorded_age - ages[e2[ii]]
</span>         }
     }
 }
</code></pre>
  </div>
</div>
<p>Then, update the branch lengths in the phylogeny itself and confirm that it’s ultrametric.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">tre_node_adjust</span><span class="o">$</span><span class="n">edge.length</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">EL</span><span class="w">
</span><span class="n">is.ultrametric</span><span class="p">(</span><span class="n">tre_node_adjust</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] TRUE</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Plotting the differences shows that this method actually changes the fewest branches.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">diff_edge_lengths</span><span class="p">(</span><span class="n">tre</span><span class="p">,</span><span class="w"> </span><span class="n">tre_node_adjust</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] "13 longer branches, 14 shorter branches"</span><span class="w">
</span></code></pre>
  </div>
</div>
<p><img src="/uploads/2021/ultrametric/adjust_nodes.png" alt="Phylogeny of whales, forced to be ultrametric via the “node adjustment” method. 27 branches have changed length, with about half becoming longer and half becoming shorter, in different parts of the tree." /></p>
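<p>For reuse, the adjusted loop can be folded into a single function. This is a sketch under a few assumptions: <code>adjust_node_ages</code> is a hypothetical name, <code>NA</code> is used instead of zero to mark nodes without a stored age, and nothing here guards against a corrected branch length coming out negative, which could happen if a child were older than the age already stored for its parent:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r">library(ape)

# Hypothetical helper: make node ages agree by resizing later-visited branches.
adjust_node_ages &lt;- function(phy) {
    phy &lt;- reorder(phy, "postorder")
    parent &lt;- phy$edge[, 1]
    child &lt;- phy$edge[, 2]
    # Tips have age 0; NA marks an internal node with no age stored yet.
    ages &lt;- c(numeric(Ntip(phy)), rep(NA_real_, phy$Nnode))
    for (ii in seq_along(parent)) {
        candidate &lt;- ages[child[ii]] + phy$edge.length[ii]
        if (is.na(ages[parent[ii]])) {
            ages[parent[ii]] &lt;- candidate
        } else if (candidate != ages[parent[ii]]) {
            # Keep the stored age and resize this branch to match it.
            phy$edge.length[ii] &lt;- ages[parent[ii]] - ages[child[ii]]
        }
    }
    phy
}

# Toy tree: tip D is 1e-6 longer than its sister C.
toy &lt;- read.tree(text = "((A:1,B:1):1,(C:1,D:1.000001):1);")
fixed &lt;- adjust_node_ages(toy)
is.ultrametric(toy)
is.ultrametric(fixed)
</code></pre>
  </div>
</div>
<p>Note that the returned tree is in postorder, so its edge rows are not necessarily in the same order as the input tree’s. Keep that in mind before comparing the two trees’ <code>$edge.length</code> vectors element by element.</p>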
<h2>Issues with large phylogenies</h2>
<p>Another issue identified in the <a href="https://github.com/jonchang/tact/issues/230#issuecomment-871628907">bug report</a> was the inability to use <code>phytools::force.ultrametric</code> on large phylogenies. This is indeed the case when testing a random tree with 50,000 tips.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">ape</span><span class="p">)</span><span class="w">
</span><span class="p">(</span><span class="n">xx</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">rcoal</span><span class="p">(</span><span class="m">50000</span><span class="p">))</span><span class="w">
</span><span class="c1">## Phylogenetic tree with 50000 tips and 49999 internal nodes.</span><span class="w">
</span><span class="c1">## </span><span class="w">
</span><span class="c1">## Tip labels:</span><span class="w">
</span><span class="c1">##   t11339, t29898, t18919, t6336, t34524, t1665, ...</span><span class="w">
</span><span class="c1">## </span><span class="w">
</span><span class="c1">## Rooted; includes branch lengths.</span><span class="w">

</span><span class="n">is.ultrametric</span><span class="p">(</span><span class="n">xx</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] TRUE</span><span class="w">

</span><span class="n">phytools</span><span class="o">::</span><span class="n">force.ultrametric</span><span class="p">(</span><span class="n">xx</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"nnls"</span><span class="p">)</span><span class="w">
</span><span class="c1">## Error in double(nm * nm) : vector size cannot be NA</span><span class="w">
</span><span class="c1">## In addition: Warning message:</span><span class="w">
</span><span class="c1">## In nm * nm : NAs produced by integer overflow</span><span class="w">

</span><span class="n">phytools</span><span class="o">::</span><span class="n">force.ultrametric</span><span class="p">(</span><span class="n">xx</span><span class="p">,</span><span class="w"> </span><span class="n">method</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"extend"</span><span class="p">)</span><span class="w">
</span><span class="c1">## Error: vector memory exhausted (limit reached?)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>With <code>method = &quot;extend&quot;</code>, the current implementation calls <code>diag(vcv(tree))</code>, which requires creating an <em>N</em> by <em>N</em> matrix as a temporary value, where <em>N</em> is the number of tips. With 50,000 tips this implies a vector of length 2.5 billion, which exceeds R’s vector limit of 2.1 billion. This function could be optimized to avoid this storage requirement. With <code>method = &quot;nnls&quot;</code>, creating this <em>N</em> by <em>N</em> matrix may be unavoidable, so for large phylogenies consider using the tip extension or the node adjustment methods instead.</p>
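<p>To make the arithmetic above concrete, here’s a back-of-the-envelope sketch, written in Ruby rather than R and not part of the analysis scripts for this post. It counts the cells in an <em>N</em> by <em>N</em> matrix for 50,000 tips and compares that with the largest 32-bit signed integer. The 8 bytes per cell is an assumption (double-precision storage), and the constant names are just for illustration.</p>

```ruby
# Back-of-the-envelope size of an N-by-N matrix for a tree with N tips.
# Assumption: every cell is stored as an 8-byte double.
N_TIPS = 50_000
INT32_MAX = 2**31 - 1            # 2,147,483,647
BYTES_PER_DOUBLE = 8

CELLS = N_TIPS * N_TIPS          # 2,500,000,000 cells
OVERFLOWS_INT32 = CELLS > INT32_MAX
GIGABYTES = CELLS * BYTES_PER_DOUBLE / 1e9

puts "cells: #{CELLS}"
puts "exceeds 32-bit integer range: #{OVERFLOWS_INT32}"
puts "approximate size: #{GIGABYTES} GB"
```

<p>Under that assumption the temporary matrix alone would need roughly 20 GB, which is consistent with both the integer overflow and the memory exhaustion errors shown above.</p>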
<h2>Closing thoughts</h2>
<p>None of this really matters anyway, except in the special case of really big phylogenies, and even then not so much. If you’re certain your phylogeny is ultrametric, use any of these methods and you’ll be able to get your tree to be so precisely ultrametric that even the strictest tools can’t complain.</p>
<p>I’m serious! Look at how well the “extend tips” method works:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">is.ultrametric</span><span class="p">(</span><span class="n">tre_extend</span><span class="p">,</span><span class="w"> </span><span class="n">tolerance</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w">
</span><span class="c1">## [1] TRUE</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>This means there are literally no differences in root-to-tip distances even to a precision of 16 digits. <a href="https://www.jpl.nasa.gov/edu/news/2016/3/16/how-many-decimals-of-pi-do-we-really-need/">This is enough to leave the solar system</a>, so it’s probably good enough for your phylogeny, which likely has more interesting sources of error beyond numerical precision.</p>
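<p>If it’s not obvious why a tolerance of zero is such a strict test, here’s a simplified sketch in Ruby. The <code>ultrametric?</code> helper is hypothetical and is not how <code>ape</code> implements its check; it just compares the spread of a list of root-to-tip distances against a tolerance.</p>

```ruby
# Simplified stand-in for an ultrametricity check: the spread of the
# root-to-tip distances must not exceed the tolerance.
# (Hypothetical helper for illustration; this is not ape's implementation.)
def ultrametric?(distances, tolerance: 0.0)
  spread = distances.max - distances.min
  tolerance >= spread
end

# 0.1 + 0.2 is not exactly 0.3 in double precision.
NOISY = [0.3, 0.1 + 0.2, 0.3].freeze
EXACT = [1.5, 1.5, 1.5].freeze

puts ultrametric?(NOISY)                   # strict check fails
puts ultrametric?(NOISY, tolerance: 1e-8)  # passes with a small tolerance
puts ultrametric?(EXACT)                   # passes even with zero tolerance
```

<p>Tiny rounding differences like the one in <code>NOISY</code> are exactly what a zero-tolerance check refuses to forgive.</p>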
<h2>Notes</h2>
<ul>
  <li><a href="/uploads/2021/ultrametric/ultrametric.R">ultrametric.R</a></li>
  <li><a href="/uploads/2021/ultrametric/ultrametric.md">ultrametric.md</a></li>
</ul>
<section class="footnotes">
  <ol>
    <li id="fn1">
      <p>Left and right are arbitrary here and named only to help the reader distinguish between the two. <a href="#fnref1" class="footnote-backref">↩</a></p>
    </li>
    <li id="fn2">
      <p>BioGeoBEARS also has a function to average the heights of tip nodes. <a href="#fnref2" class="footnote-backref">↩</a></p>
    </li>
    <li id="fn3">
      <p>You can also read my <a href="https://github.com/macroevolution/bammtools/issues/45#issuecomment-289850728">embarrassing, non-working original attempt</a>. I’ve gotten better at R, I promise. <a href="#fnref3" class="footnote-backref">↩</a></p>
    </li>
    <li id="fn4">
      <p>Note that DendroPy lets you pick either the maximum or the minimum implied ages; my implementation here just picks the first one it sees, so is sensitive to rearrangements such as from a ladderized tree. <a href="#fnref4" class="footnote-backref">↩</a></p>
    </li>
  </ol>
</section>
]]></content>
</entry>
<entry>
  <title><![CDATA[Easily showcase your Google Scholar metrics in Jekyll]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/easily-showcase-your-google-scholar-metrics-in-jekyll/"/>
  <id>https://jonathanchang.org/blog/easily-showcase-your-google-scholar-metrics-in-jekyll</id>
  <published>2021-05-31T21:00:00+00:00</published>
  <updated>2021-05-31T21:00:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>Whether you’re applying for academic positions, comparing yourself to your colleagues, or determining whether people on Twitter really deserve their grant funding, it’s important to look at scholarly metrics such as <a href="https://en.wikipedia.org/wiki/H-index">h-index</a>, <a href="https://en.wikipedia.org/wiki/Author-level_metrics#i10-index">i10-index</a>, and number of citations. However, showing them off directly in your CV is a major bummer, especially when you’re an <a href="https://www.theguardian.com/higher-education-network/2015/feb/06/academic-superstars-stop-playing-to-the-cameras-and-get-back-to-your-labs">academic superstar</a> and are too busy to update these vital statistics for all the conferences, grants, and positions you’re applying for! In this blog post, I’ll tell you three easy steps to automatically fetch these stats from <a href="https://scholar.google.com/citations?user=z5el-twAAAAJ">Google Scholar</a> and display them on your online vita with <a href="https://jekyllrb.com/">Jekyll</a>.</p>
  <p>First, take the below code, save it as a new file, <code>scholar_stats.rb</code>, and place it in your <a href="https://jekyllrb.com/docs/plugins/installation/"><code>_plugins</code> folder</a>.<sup class="footnote-ref"><a href="#fn1" id="fnref1">1</a></sup></p>
  <div class="language-ruby highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="ruby"><span class="nb">require</span> <span class="s1">'open-uri'</span>
<span class="nb">require</span> <span class="s1">'nokogiri'</span>

<span class="k">module</span> <span class="nn">Jekyll</span>
  <span class="k">class</span> <span class="nc">ScholarStats</span> <span class="o">&lt;</span> <span class="no">Generator</span>
    <span class="c1"># Replace `SCHOLAR_ID` with your own Google Scholar ID</span>
    <span class="no">SCHOLAR_ID</span> <span class="o">=</span> <span class="s1">'XXXXXXXXXX'</span><span class="p">.</span><span class="nf">freeze</span>
    <span class="no">SCHOLAR_URL</span> <span class="o">=</span> <span class="s1">'http://scholar.google.com/citations?hl=en&amp;user='</span><span class="p">.</span><span class="nf">freeze</span>
    <span class="k">def</span> <span class="nf">generate</span><span class="p">(</span><span class="n">site</span><span class="p">)</span>
      <span class="n">doc</span> <span class="o">=</span> <span class="no">Nokogiri</span><span class="o">::</span><span class="no">HTML</span><span class="p">(</span><span class="no">URI</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="no">SCHOLAR_URL</span> <span class="o">+</span> <span class="no">SCHOLAR_ID</span><span class="p">).</span><span class="nf">open</span><span class="p">)</span>
      <span class="n">tbl</span> <span class="o">=</span> <span class="n">doc</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="s1">'table'</span><span class="p">).</span><span class="nf">first</span>
      <span class="n">tbl_data</span> <span class="o">=</span> <span class="p">{</span> <span class="s1">'id'</span> <span class="o">=&gt;</span> <span class="no">SCHOLAR_ID</span> <span class="p">}</span>
      <span class="n">tbl</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="s1">'tr'</span><span class="p">)[</span><span class="mi">1</span><span class="o">..</span><span class="p">].</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">tr</span><span class="o">|</span>
        <span class="n">cell_data</span> <span class="o">=</span> <span class="n">tr</span><span class="p">.</span><span class="nf">css</span><span class="p">(</span><span class="s1">'td'</span><span class="p">).</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:text</span><span class="p">)</span>
        <span class="n">tbl_data</span><span class="p">[</span><span class="n">cell_data</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nf">downcase</span><span class="p">.</span><span class="nf">sub</span><span class="p">(</span><span class="s1">'-'</span><span class="p">,</span> <span class="s1">'_'</span><span class="p">)]</span> <span class="o">=</span> <span class="n">cell_data</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="nf">to_i</span>
      <span class="k">end</span>
      <span class="n">site</span><span class="p">.</span><span class="nf">data</span><span class="p">[</span><span class="s1">'scholar'</span><span class="p">]</span> <span class="o">=</span> <span class="n">tbl_data</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre>
    </div>
  </div>
<p>Second, update the <code>SCHOLAR_ID</code> constant with your Google Scholar ID. This is taken from the link for your Google Scholar profile, like <code>https://scholar.google.com/citations?user=z5el-twAAAAJ</code>, and it’s the characters after the <code>user=</code> part. The <code>SCHOLAR_ID</code> for this Scholar profile would therefore be <code>z5el-twAAAAJ</code>.</p>
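<p>If you’d rather not pick the ID out of the link by eye, something like the following sketch should do it using only Ruby’s standard library. The <code>scholar_id_from_url</code> helper is a hypothetical name of mine and isn’t part of the plugin above.</p>

```ruby
require 'uri'

# Hypothetical helper: pull the `user` query parameter out of a
# Google Scholar profile URL. Returns nil if the parameter is missing.
def scholar_id_from_url(profile_url)
  query = URI.parse(profile_url).query || ''
  URI.decode_www_form(query).to_h['user']
end

puts scholar_id_from_url('https://scholar.google.com/citations?user=z5el-twAAAAJ')
```

<p>Paste whatever this prints into the <code>SCHOLAR_ID</code> constant.</p>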
<p>Finally, you might need to add these to your <code>Gemfile</code> and run <code>bundle install</code>:</p>
<div class="language-ruby highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="ruby"><span class="n">gem</span> <span class="s2">"nokogiri"</span>
<span class="n">gem</span> <span class="s2">"open-uri"</span>
</code></pre>
  </div>
</div>
<p>Or install them yourself with <code>gem install open-uri nokogiri</code>.</p>
<p>Once you’ve done this, you should have access to a few new variables in your Jekyll installation. You can use them in your Markdown or HTML documents using Liquid templates like so:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>* [Profile](https://scholar.google.com/citations?user={{ site.data.scholar.id }})
* Citations: {{ site.data.scholar.citations }}
* h-index: {{ site.data.scholar.h_index }}
* i10-index: {{ site.data.scholar.i10_index }}
</code></pre>
  </div>
</div>
<p>This Markdown code on my Jekyll site translates to:</p>
<ul>
  <li><a href="https://scholar.google.com/citations?user=z5el-twAAAAJ">Profile</a></li>
  <li>Citations: 2143</li>
  <li>h-index: 14</li>
  <li>i10-index: 17</li>
</ul>
<p>Now sit back, relax, and watch as your numbers go up!</p>
<section class="footnotes">
  <ol>
    <li id="fn1">
      <p>If you are using something like <a href="https://pages.github.com/">GitHub Pages</a> to publish your site, you might not be able to use custom plugins! Instead, generate your Jekyll site locally and then push the generated static files up to your git repository. <a href="#fnref1" class="footnote-backref">↩</a></p>
    </li>
  </ol>
</section>
]]></content>
</entry>
<entry>
  <title><![CDATA[Free yourself from the Spotify desktop client with spotifyd]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/setting-up-spotifyd-on-macos/"/>
  <id>https://jonathanchang.org/blog/setting-up-spotifyd-on-macos</id>
  <published>2020-05-18T12:33:00+00:00</published>
  <updated>2020-05-18T12:33:00+00:00</updated>
  <content type="html"><![CDATA[
     <p><a href="https://en.wikipedia.org/wiki/Electron_(software_framework)#Criticism">Electron apps are a plague</a>. If you’ve ever wondered why:</p>
  <ul>
    <li>your computer chugs to a halt once more than two of {Slack, Discord, Skype, Messenger, WhatsApp, Signal, GitHub Desktop, Steam, VS Code} are open on the same machine</li>
    <li>scrolling, or playing a GIF or whatever in those apps, is incredibly laggy</li>
    <li>every app download is now 100MB+</li>
    <li>the typing shortcuts you’re used to in macOS Just Don’t Work</li>
  </ul>
  <p>then it’s likely that <a href="https://josephg.com/blog/electron-is-flash-for-the-desktop/">Electron is to blame</a>. I’m writing this blog post on a maxed-out 2016 13” MacBook Pro, and it can barely keep up with all these Electron apps I need to keep running. We can only speculate why all these large companies with enormous engineering resources cannot use the money that I pay them for their services to make software that doesn’t suck, but that’s for another blog post.</p>
  <p>Lately I’ve gotten <em>especially</em> annoyed at all of the Electron-based junk running on my machine, since I have to work from home, which means needing to use Docker to run or test out various Linux things, which is another 2 gigs of my laptop’s precious memory eaten away. I decided to look for non-Electron alternative clients for all of those. Enter <a href="https://github.com/Spotifyd/spotifyd"><code>spotifyd</code></a> and <a href="https://github.com/Rigellute/spotify-tui"><code>spotify-tui</code></a>. After switching software, I have an extra half-gig of memory that isn’t being wasted running yet another instance of Chromium.</p>
  <p>In this blog post, I’ll show you how to set up these on your macOS machine. I assume basic familiarity with managing your machine via Terminal. You’ll also need a Spotify Premium account for any of this to work. The wood chipper that is modern society can’t operate without sacrificing a few limbs!</p>
  <h2>Installing and configuring <code>spotifyd</code></h2>
  <p>This is an always-on service (hence the <code>d</code> in its name, for <a href="https://en.wikipedia.org/wiki/Daemon_(computing)">daemon</a>) that will wait in the background and play music requested by whatever Spotify client we choose; in this case, <code>spotify-tui</code>.</p>
  <p>First, get <code>spotifyd</code> installed. I’ve added it to <a href="https://brew.sh">Homebrew</a> already, so if you need to get that set up first, go ahead.</p>
  <div class="language-console highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="console"><span class="gp">$</span><span class="w"> </span>brew <span class="nb">install </span>spotifyd
<span class="gp">==&gt;</span><span class="w"> </span>Downloading https://homebrew.bintray.com/bottles/spotifyd-0.2.24.catalina.bo
<span class="gp">==&gt;</span><span class="w"> </span>Downloading from https://akamai.bintray.com/3f/3f51d6a45bdb965dcc88e34411949
<span class="gp">#</span><span class="c">####################################################################### 100.0%</span>
<span class="gp">==&gt;</span><span class="w"> </span>Pouring spotifyd-0.2.24.catalina.bottle.tar.gz
<span class="gp">==&gt;</span><span class="w"> </span>Caveats
<span class="go">Configure spotifyd using these instructions:
</span><span class="gp">  https://github.com/Spotifyd/spotifyd#</span>configuration-file
<span class="go">
To have launchd start spotifyd now and restart at startup:
  sudo brew services start spotifyd
</span><span class="gp">==&gt;</span><span class="w"> </span>Summary
<span class="go">/usr/local/Cellar/spotifyd/0.2.24: 8 files, 8.5MB
</span></code></pre>
    </div>
  </div>
  <p>You’ll need to create a configuration file named <code>~/.config/spotifyd/spotifyd.conf</code> that specifies your login information and other details. You can <a href="https://github.com/Spotifyd/spotifyd#configuration-file">read the full instructions</a>, but I’ve annotated my own configuration here:</p>
  <div class="language-ini highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="ini"><span class="nn">[global]</span>
<span class="c"># Fill this in with your Spotify login.
</span><span class="py">username</span> <span class="p">=</span> <span class="s">USERNAME</span>

<span class="c"># You'll be using the macOS keychain to specify your password.
</span><span class="py">use_keyring</span> <span class="p">=</span> <span class="s">true</span>

<span class="c"># How this machine shows up in Spotify Connect.
</span><span class="py">device_name</span> <span class="p">=</span> <span class="s">"spotifyd"</span>
<span class="py">device_type</span> <span class="p">=</span> <span class="s">"computer"</span>

<span class="c"># This is the default location of Spotify's cache, so just replace LOGIN_NAME
</span><span class="c"># with your macOS login name (type `whoami` at a Terminal window).
</span><span class="py">cache_path</span> <span class="p">=</span> <span class="s">"/Users/LOGIN_NAME/Library/Application Support/Spotify/PersistentCache/Storage"</span>
<span class="py">no_audio_cache</span> <span class="p">=</span> <span class="s">false</span>

<span class="c"># Various playback options. Tweak these if Spotify is too quiet.
</span><span class="py">bitrate</span> <span class="p">=</span> <span class="s">320</span>
<span class="py">volume_normalisation</span> <span class="p">=</span> <span class="s">true</span>
<span class="py">normalisation_pregain</span> <span class="p">=</span> <span class="s">-10</span>

<span class="c"># These need to be set, but don't need to be changed.
</span><span class="py">backend</span> <span class="p">=</span> <span class="s">"rodio"</span>
<span class="py">mixer</span> <span class="p">=</span> <span class="s">"PCM"</span>
<span class="py">volume_controller</span> <span class="p">=</span> <span class="s">"softvol"</span>
<span class="py">zeroconf_port</span> <span class="p">=</span> <span class="s">1234</span>
</code></pre>
    </div>
  </div>
<p>Create and edit this file with <code>vim</code>, or whatever text editor you prefer:</p>
<div class="language-bash highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="bash"><span class="nb">mkdir</span> <span class="nt">-p</span> ~/.config/spotifyd
vim ~/.config/spotifyd/spotifyd.conf
</code></pre>
  </div>
</div>
<p>Next you’ll need to add your password to the system password manager. You can do this via the Keychain Access app, or just right in the Terminal:</p>
<div class="language-bash highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="bash">security add-generic-password <span class="nt">-s</span> spotifyd <span class="nt">-D</span> rust-keyring <span class="nt">-a</span> &lt;your username&gt; <span class="nt">-w</span> &lt;your password&gt;
</code></pre>
  </div>
</div>
<p>Be sure to use your <em>Spotify</em> username here, not your macOS username. You can confirm that it was added correctly by opening up Keychain Access and searching for <code>spotifyd</code>.</p>
<p><img src="/uploads/2020/spotify/keychain.png" alt="Proper spotifyd credentials in Keychain Access" /></p>
<p>This should be all the configuring you need to do. To test if it worked, first run <code>spotifyd</code> as just a plain app. After you run the following command, grant <code>spotifyd</code> access to the macOS Keychain and Firewall in the pop-up that appears:</p>
<div class="language-console highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="console"><span class="gp">$</span><span class="w"> </span>spotifyd <span class="nt">--no-daemon</span>
<span class="go">Loading config from "/Users/jonchang/.config/spotifyd/spotifyd.conf"
No proxy specified
Using software volume controller.
Failed to register IPv6 receiver: Os { code: 49, kind: AddrNotAvailable, message: "Can\'t assign requested address" }
Checking keyring for password
Connecting to AP "gae2-accesspoint-b-hzk2.ap.spotify.com:443"
Authenticated as "jonchang" !
Country: "US"
</span></code></pre>
  </div>
</div>
<p>If everything worked correctly, you should see output similar to what I have above. Open the official Spotify client on your phone or laptop, and confirm that there’s a new device in Spotify Connect:</p>
<p><img src="/uploads/2020/spotify/spotify-connect.png" alt="Spotify Connect with spotifyd visible" /></p>
<p>Press <code>CTRL-C</code> to stop <code>spotifyd</code>. Now we’ll use <code>brew services</code> to run <code>spotifyd</code> in the background:</p>
<div class="language-console highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="console"><span class="gp">$</span><span class="w"> </span>brew services start spotifyd
<span class="gp">==&gt;</span><span class="w"> </span>Successfully started <span class="sb">`</span>spotifyd<span class="sb">`</span> <span class="o">(</span>label: homebrew.mxcl.spotifyd<span class="o">)</span>
</code></pre>
  </div>
</div>
<p>If you still see <code>spotifyd</code> show up in Spotify Connect, it worked!</p>
<h2>Installing and configuring <code>spotify-tui</code></h2>
<p>The Terminal app <code>spotify-tui</code> is how you’ll actually control <code>spotifyd</code>: it shows you your playlists, gives you playback controls, and so on. There’s not that much involved, as the app itself will give you instructions that you can follow quite easily.</p>
<p>You’ll have to click through the Spotify Developer agreement and copy and paste some stuff, but it’s nothing too onerous. Just remember to say you’re making a non-commercial app, and set the “Redirect URI” in the Spotify Developer dashboard and everything should be peachy.</p>
<div class="language-console highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="console"><span class="gp">$</span><span class="w"> </span>brew <span class="nb">install </span>spotify-tui
<span class="gp">==&gt;</span><span class="w"> </span>Downloading https://homebrew.bintray.com/bottles/spotify-tui-0.19.0.catalina
<span class="gp">==&gt;</span><span class="w"> </span>Downloading from https://akamai.bintray.com/35/358ad4bae54d211d8be22d5ca3f46
<span class="gp">#</span><span class="c">####################################################################### 100.0%</span>
<span class="gp">==&gt;</span><span class="w"> </span>Pouring spotify-tui-0.19.0.catalina.bottle.tar.gz
<span class="go">/usr/local/Cellar/spotify-tui/0.19.0: 8 files, 7.7MB

</span><span class="gp">$</span><span class="w"> </span>spt
<span class="go">
   _________  ____  / /_(_) __/_  __      / /___  __(_)
  / ___/ __ \/ __ \/ __/ / /_/ / / /_____/ __/ / / / /
 (__  ) /_/ / /_/ / /_/ / __/ /_/ /_____/ /_/ /_/ / /
/____/ .___/\____/\__/_/_/  \__, /      \__/\__,_/_/
    /_/                    /____/

Config will be saved to /Users/jonchang/.config/spotify-tui/client.yml

How to get setup:

  1. Go to the Spotify dashboard - https://developer.spotify.com/dashboard/applications
  2. Click `Create a Client ID` and create an app
  3. Now click `Edit Settings`
  4. Add `http://localhost:8888/callback` to the Redirect URIs
  5. You are now ready to authenticate with Spotify!
</span></code></pre>
  </div>
</div>
<p>If you’ve set everything up correctly, you should see the text interface pop up like so:</p>
<p><img src="/uploads/2020/spotify/spotify-tui.png" alt="The spotify-tui text-mode interface" /></p>
<p>To be honest, I used it for a bit, and then decided that I didn’t really like text-mode interfaces all that much. Instead, I just control Spotify from my phone via Spotify Connect, so this hasn’t gotten that much use. Maybe one day I’ll teach myself Swift and write a native macOS Spotify Connect player…</p>
<h2>But I’m on Linux!</h2>
<p>I dunno, on Linux you’re generally expected to figure things out on your own, so maybe try <code>apt install spotifyd spotify-tui</code> followed by <code>sudo systemctl start spotifyd</code> and see if that works <code>¯\_(ツ)_/¯</code></p>
<h2>Is this all legal?</h2>
<p>Probably not. While it <em>would</em> be pretty weird for Spotify to sue or ask to imprison their own paying customers, I can’t predict how Spotify’s CEO might aim to Maximize Shareholder Value in the future.</p>
<h2>Updates</h2>
<p><strong>January 2021</strong>: <code>spt</code> has been updated to fix a lot of the issues I previously had so now I use it a lot more over the iPhone app via Spotify Connect!</p>
<p><strong>June 2021</strong>: I got tired of Spotify and now subscribe to Apple Music instead. Now I can get a gorgeous music discovery and player interface while also doing other things at the same time!</p>
]]></content>
</entry>
<entry>
  <title><![CDATA[Continuous integration using GitHub Actions for Homebrew on Linux]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/continuous-integration-using-github-actions-for-homebrew-on-linux/"/>
  <id>https://jonathanchang.org/blog/continuous-integration-using-github-actions-for-homebrew-on-linux</id>
  <published>2020-04-10T08:12:00+00:00</published>
  <updated>2020-04-10T08:12:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>Homebrew is <a href="https://formulae.brew.sh/analytics/">arguably the most popular package manager for macOS</a>, but it hosts a small and growing userbase on Linux as well! This blog post will explain how the Homebrew maintainers build binary packages (<em>bottles</em>) on Linux.</p>
  <h2>How binaries used to be built on Linux</h2>
  <p><em>Taps</em>, in Homebrew parlance, are special directories in your Homebrew installation, which contain <em>formulae</em> that describe how to build and install software. In other package managers, taps are variously called repositories, channels, streams, and so on. The default tap on macOS is called <a href="https://github.com/Homebrew/homebrew-core">Homebrew/homebrew-core</a>, while the default tap on Linux is a fork of this called <a href="https://github.com/Homebrew/linuxbrew-core">Homebrew/linuxbrew-core</a>. For brevity, I’ll be calling these <code>Homebrew/core</code> and <code>Linuxbrew/core</code>. <em>Bottles</em> are binary packages that we ship to users. This saves a lot of time, as otherwise users would need to build everything from source, an approach that is unsustainable for both users and maintainers.</p>
  <p><img src="/uploads/2020/linuxbrew-ci/linux-workflow.svg" alt="The old workflow for Linux." /></p>
  <p>The above diagram shows how we used to build formulae for Linux. As <code>Linuxbrew/core</code> is a fork of <code>Homebrew/core</code>, maintainers would wait for changes to happen upstream in <code>Homebrew/core</code>, then use a <a href="https://github.com/Homebrew/homebrew-linux-dev">special command</a> <code>brew merge-homebrew</code> to update around 5–10 formulae. This merging process introduces merge conflicts, particularly in the <code>bottle do</code> block that signals the availability of binary packages:</p>
  <div class="language-diff highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="diff"><span class="gd">&lt;&lt;&lt;&lt;&lt;&lt;&lt; HEAD
</span>    sha256 "bd66be269cbfe387920651c5f4f4bc01e0793034d08b5975f35f7fdfdb6c61a7" =&gt; :sierra
    sha256 "7071cb98f72c73adb30afbe049beaf947fabfeb55e9f03e0db594c568d77d69d" =&gt; :el_capitan
    sha256 "c7c0fe2464771bdcfd626fcbda9f55cb003ac1de060c51459366907edd912683" =&gt; :yosemite
    sha256 "95d4c82d38262a4bc7ef4f0a10ce2ecf90e137b67df15f8bf8df76e962e218b6" =&gt; :x86_64_linux
<span class="gh">=======
</span>    sha256 "ee6db42174fdc572d743e0142818b542291ca2e6ea3c20ff6a47686589cdc274" =&gt; :sierra
    sha256 "e079a92a6156e2c87c59a59887d0ae0b6450d6f3a9c1fe14838b6bc657faefaa" =&gt; :el_capitan
    sha256 "c334f91d5809d2be3982f511a3dfe9a887ef911b88b25f870558d5c7e18a15ad" =&gt; :yosemite
<span class="gi">&gt;&gt;&gt;&gt;&gt;&gt;&gt; homebrew/master
</span></code></pre>
    </div>
  </div>
  <p>In this diff, <code>Linuxbrew/core</code>’s version (above the <code>=======</code>) has an <code>:x86_64_linux</code> tag indicating the availability of binaries for Linux, but because upstream’s repository <code>homebrew</code> updated the <code>sha256</code> hash of its macOS bottles, Git can’t figure out how to merge these changes. These merge conflicts have to be resolved manually by maintainers; in this case, the conflict is resolved in favor of <code>Homebrew/core</code>’s version of the bottle block.</p>
  <p>Once the maintainer fixes the merge conflicts, <code>brew merge-homebrew</code> will automatically open a pull request against <code>Linuxbrew/core</code>. To close the pull request and update <code>Linuxbrew/core</code> with these new formulae, the maintainer runs <code>brew pull --clean</code> and <code>git push</code>.</p>
  <p>You’ll notice that in fixing the above merge conflict, the Linux bottles are no longer described, meaning that users who <code>brew install</code> that formula will have to build from source rather than downloading binary packages. To actually build bottles for Linux, a maintainer needs to run <code>brew find-formulae-to-bottle</code> to identify which formulae need new binaries. This is piped into <code>brew build-bottle-pr</code>, which will open individual pull requests for each formula requesting a bottle build. Here’s an example of a <a href="https://github.com/Homebrew/linuxbrew-core/pull/19249">small core merge</a> of a single formula and its <a href="https://github.com/Homebrew/linuxbrew-core/pull/19250">corresponding bottle pull request</a>:</p>
  <div class="language-diff highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="diff"><span class="gh">diff --git a/Formula/mpv.rb b/Formula/mpv.rb
index 5d2a63fe4e8..f7f6ee454be 100644
</span><span class="gd">--- a/Formula/mpv.rb
</span><span class="gi">+++ b/Formula/mpv.rb
</span><span class="p">@@ -1,3 +1,4 @@</span>
<span class="gi">+# mpv: Build a bottle for Linux
</span> class Mpv &lt; Formula
   desc "Media player based on MPlayer and mplayer2"
   homepage "https://mpv.io"
</code></pre>
    </div>
  </div>
  <p>If you look at the diff, you’ll see that it only adds a comment, so it doesn’t seem like the change in this pull request would actually do anything. But this actually signals to <a href="https://github.com/Homebrew/homebrew-test-bot"><code>brew test-bot</code></a> that it should build a new bottle for Linux. Behind the scenes, once the bottle is successfully built, the Azure Pipelines release job uploads the bottles to Bintray in an “unpublished” state. The release job also pushes a commit containing bottle data (its SHA-256 hash) to a fork of the <code>Linuxbrew/core</code> repository, and tags it with a pull request number (e.g., <code>pr-1234</code>).</p>
  <div class="language-diff highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="diff"><span class="gh">diff --git a/Formula/mpv.rb b/Formula/mpv.rb
index 5d2a63fe4e8..f73774e9328 100644
</span><span class="gd">--- a/Formula/mpv.rb
</span><span class="gi">+++ b/Formula/mpv.rb
</span><span class="p">@@ -8,5 +8,6 @@</span> class Mpv &lt; Formula
   bottle do
     sha256 "dd0fe84dea1268524e18d210595e31b295906e334ae8114124b94a94d130de60" =&gt; :catalina
     sha256 "22c3aa2fb8ec77b5125c836badf0ad7889b512280f54f310c5a6ab8e77099fa6" =&gt; :mojave
     sha256 "0477b20f9a166d746d84c2a7d0b191159c6825512fe66c38ddf9ca6c43403d97" =&gt; :high_sierra
<span class="gi">+    sha256 "41a811990283f63ce8d2132715ee2ada8b15fd29b11df5427d5ae0b40e947816" =&gt; :x86_64_linux
</span>   end
</code></pre>
    </div>
  </div>
  <p>After all this happens, a maintainer will then run <code>brew pull --bottle</code>, which cherry-picks the bottle commit to the maintainer’s local copy of <code>Linuxbrew/core</code>, and also publishes the bottles on Bintray. However, prior to pushing the new bottle descriptors to GitHub, the bottling request comment <code># foo: Build a bottle for Linux</code> needs to be removed, to avoid cluttering <code>Linuxbrew/core</code> with unnecessary comments. Maintainers use a <a href="https://github.com/Homebrew/homebrew-linux-dev">custom command</a> <code>brew squash-bottle-pr</code> to remove these comments when necessary.</p>
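  <p>As a rough illustration of what removing the request comment involves, here is a minimal shell sketch. It is not the implementation of <code>brew squash-bottle-pr</code>; the function name <code>strip_bottle_request</code> is made up for this example, and it assumes the comment sits on a line by itself in exactly the form shown above.</p>

```shell
# Hypothetical helper (not part of Homebrew): read a formula on stdin and
# drop any line of the form "# NAME: Build a bottle for Linux".
strip_bottle_request() {
  sed '/^# [^:]*: Build a bottle for Linux$/d'
}

# Example input: the request comment followed by two ordinary formula lines.
printf '%s\n' \
  '# mpv: Build a bottle for Linux' \
  '  desc "Media player based on MPlayer and mplayer2"' \
  '  homepage "https://mpv.io"' |
  strip_bottle_request
```

  <p>Only the marker line is deleted; every other line, including ordinary comments, passes through unchanged.</p>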
  <p>Finally, after confirming that everything is correct, the maintainer then pushes the bottling commits to GitHub. Linux users can now update and download the new bottles with <code>brew update</code> and <code>brew install</code>. Success! 🎉</p>
  <h3>Sidebar</h3>
  <p>This process, with all its horrors, was actually a significant improvement. Homebrew on Linux previously used CircleCI for its automation, which required an <a href="https://github.com/Linuxbrew/linuxbrew-lambda">AWS Lambda worker</a> to transfer bottles from Circle to Bintray. After the migration to Azure Pipelines, this Rube Goldberg machine was decommissioned.</p>
  <h2>How binaries are now built on Linux</h2>
  <p>You might have noticed how frustratingly manual this process is. In my diagrams, the solid lines indicate manual actions that maintainers need to do, while the dashed lines indicate automatic steps executed by continuous integration. There is a distressing amount of manual intervention needed for Homebrew maintainers on Linux, which can lead to frustration and burnout from the high workload.</p>
  <p><img src="/uploads/2020/linuxbrew-ci/jonathan-hacks-outdoors.jpg" alt="Jonathan hacking on continuous integration on his laptop outdoors in the cold." /></p>
  <p><em>Jonathan trying to update and fix Homebrew’s GitHub Actions at the Homebrew annual meeting. “Just one more commit and it’ll work…”</em></p>
  <p>The Homebrew on Linux maintainers have now migrated to a much more sustainable system using GitHub Actions. Thanks to funding from Homebrew’s supporters on <a href="https://github.com/sponsors/">GitHub Sponsors</a> and <a href="https://github.com/sponsors/">Patreon</a>, a number of maintainers were able to fly to Brussels for Homebrew’s annual meeting and hack on continuous integration. This focused, in-person work paid huge dividends: Homebrew on Linux now has a much more manageable and automated bottling infrastructure, as can be seen in this figure:</p>
  <p><img src="/uploads/2020/linuxbrew-ci/linux-workflow-new.svg" alt="The newer workflow for Linux." /></p>
  <p>You’ll notice that much of the previously manual work is now completely automated. After merges from <code>Homebrew/core</code>, the GitHub Actions runners will automatically attempt to build bottles, without a maintainer needing to open individual pull requests to request build jobs. If the automated tests don’t find any problems when building the updated formulae, the maintainer’s work ends there — new binaries are automatically uploaded and published without further intervention.</p>
  <p>If one or more bottles fail to build, GitHub Actions <a href="https://github.com/Homebrew/linuxbrew-core/pull/19992#issuecomment-608367511">notifies the maintainer by posting an issue comment</a>. The maintainer can then open a <a href="https://github.com/Homebrew/linuxbrew-core/pull/19994">pull request to fix the build failure</a>, and, once tests pass on that pull request, merge in the changes. GitHub Actions will then automatically publish the binary bottles for the fixed formula.</p>
  <h2>Conclusion</h2>
  <p>What, in practice, do these changes mean for maintainers, contributors, and users?</p>
  <p>For maintainers, there is far less work needed to get new maintainers up to speed. The only <em>unique</em> aspect of maintaining <code>Linuxbrew/core</code> is now the <code>brew merge-homebrew</code> workflow, and it is a single command with relatively few sharp edges. Everything else just involves fixing Homebrew formulae to build on Linux, which is a relatively straightforward task with our <a href="https://hub.docker.com/r/homebrew/brew">Docker container</a>. We hope to automate merges from <code>Homebrew/core</code> in the short term, which would remove yet another source of manual intervention for Linux maintainers, and in the long-term, to merge <code>Linuxbrew/core</code> into <code>Homebrew/core</code>.</p>
  <p>For contributors, the contribution experience is much more straightforward. It’s now very obvious when builds have failed, and the new merge-based workflow means that contributors see their pull requests as “merged”, which I believe is both psychologically rewarding (to “see the purple”) and reduces confusion compared to the old rebase workflow, which marked pull requests as “closed” in red.</p>
  <p>For users, the most obvious benefit is that updates from <code>Homebrew/core</code> now happen much more rapidly due to lower maintainer burdens. In addition, we’re now able to bottle many more formulae. The increased number of bottles means more reliable software for users and fewer support requests that maintainers need to handle.</p>
  <p><img src="/uploads/2020/linuxbrew-ci/bottles.svg" alt="Statistics on the number of formulae bottled over time." /></p>
  <p><em>Top panel: Number of bottled formulae (solid line) and total formulae (dashed line) over time. Bottom panel: Percent of total formulae that are bottled on Linux.</em> <a href="/uploads/2020/linuxbrew-ci/stats.csv">Download the Linux bottling statistics</a>.</p>
  <p>After most of the bugs were ironed out of the new GitHub Actions process, the Linux maintainers attempted to build binary bottles for every single formula, from A to Z. You can clearly see the huge jump in the number of formulae with binaries available in the early part of 2020, after we had migrated to our new continuous integration infrastructure.</p>
  <p>Our goal in Brussels was to ensure that a maintainer could merge a pull request with passing tests from their phone’s web browser, and have everything else done behind the scenes by GitHub Actions. I’m proud to say that we achieved that, and we’ll hopefully be able to build on that expertise to modernize the continuous integration infrastructure for Homebrew on macOS.</p>
  <p>This work was only made possible thanks to travel funds provided by <a href="https://github.com/Homebrew/brew/#donations">Homebrew’s sponsors</a>. I’d also like to thank the Homebrew maintainers, especially Issy Long, Michka Popoff, and Sean Molenaar, for feedback on this post prior to publication, and the other Linux maintainers, especially Dawid Dziurla, for checking my work and proposing fixes to my terminally buggy code. Shaun Jackman prototyped the initial workflow in <a href="https://github.com/brewsci/">Brewsci</a>, which inspired the solution we settled on.</p>
  <p>In my next post, I’ll talk about the work we’ve done to migrate to GitHub Actions on macOS. While you wait, <a href="/blog/maintain-your-own-homebrew-repository-with-binary-bottles/">read about how binaries are currently built on macOS</a>, check out the <a href="https://mikemcquaid.com/2017/09/29/homebrew-ci-evolution/">history of Homebrew’s macOS infrastructure</a>, or <a href="https://www.macstadium.com/customers/homebrew">read about how we use Orka</a> for macOS hosting.</p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Can we estimate diversification rates on extant phylogenies?]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/nonidentifiability-in-diversification-rate-models/"/>
  <id>https://jonathanchang.org/blog/nonidentifiability-in-diversification-rate-models</id>
  <published>2019-08-01T04:14:00+00:00</published>
  <updated>2019-08-01T04:14:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>An exciting new paper by Stilianos Louca and Matt Pennell, <a href="https://www.nature.com/articles/s41586-020-2176-1">“Extant timetrees are consistent with a myriad of diversification histories”</a><sup class="footnote-ref"><a href="#fn1" id="fnref1">1</a></sup>, is out now in <em>Nature</em>. I think it’s an interesting paper and wanted to quickly jot down my thoughts on it.</p>
  <p>There’s been widespread use of software tools like BAMM, RPANDA, and SSE-class models to estimate how diversity is generated by varying diversification rates. The huge amount of attention on papers that highlight where our blind spots could be when estimating rates<sup class="footnote-ref"><a href="#fn2" id="fnref2">2</a></sup> indicates that the field really cares about getting this stuff right, because it has important implications for all of our downstream analyses.</p>
  <p>Nee et al. 1994<sup class="footnote-ref"><a href="#fn3" id="fnref3">3</a></sup> showed that you could estimate speciation and extinction rates on an extant-only phylogeny by fitting two separate rates to a lineage-through-time plot (left panel). However, when rates are variable, it becomes impossible to identify whether there are, for example, two separate speciation rates, one older and one newer, or whether there is high extinction towards the present (right panel).<sup class="footnote-ref"><a href="#fn4" id="fnref4">4</a></sup></p>
  <p><img src="/uploads/2019/divrates/nee_vs_rabosky.png" alt="Lineage-through-time plot showing an increase in the number of lineages towards the present. Estimating speciation and extinction rates can use the “uptick” in the number of lineages close to the present to estimate speciation rate without the effect of extinction. Or, the “uptick” can estimate time-variable speciation rates on a phylogeny to infer the new rate of speciation closer to the present. Modified from Nee 2006 and Rabosky 2010." /></p>
  <p>Dealing with these kinds of non-identifiability issues is really critical when fitting rate-varying diversification models. For example, with incompletely sampled phylogenies, you need to know either the amount of incomplete sampling or the extinction rate, otherwise these are non-identifiable.<sup class="footnote-ref"><a href="#fn5" id="fnref5">5</a></sup> A paper that I co-authored also found that when both speciation and extinction are allowed to vary, model fit suffers substantially, likely due to identifiability problems, and our estimates of diversification rates can therefore vary wildly.<sup class="footnote-ref"><a href="#fn6" id="fnref6">6</a></sup></p>
  <p>Louca and Pennell unify the findings of these (and other) previous works and show that for <strong>any extant phylogeny</strong>, where (potentially time-varying) speciation and extinction rates have been fit, you can construct an infinite number of alternative speciation and extinction rate histories that have the <strong>same likelihood</strong>.  They also show that, when model and/or parameter space is limited, the “infinite plausible histories” usually collapse down into a single best-supported diversification history. This is why for many methods that fit diversification rates, we can generally find a way to limit the models chosen to avoid the identifiability problem.</p>
  <p><img src="/uploads/2019/divrates/louca2019.png" alt="Lineage-through-time plot of four similar looking models, but with very different diversification rate histories. Modified from Figure 1 of Louca &amp; Pennell 2019." /></p>
  <p>Many extensions to the general birth-death model are likely also affected by these results. We know from the Stadler paper<sup class="footnote-ref"><a href="#fn5" id="fnref5">5</a></sup> that incomplete sampling can be non-identifiable, but I think this finding is fully general to many other extensions of the birth-death model, such as the birth-death-preservation model used for fossil phylogenies. In that case, we can estimate extinction due to the presence of extinct lineages, but varying extinction and preservation rates also lead to a non-identifiable model.<sup class="footnote-ref"><a href="#fn7" id="fnref7">7</a></sup></p>
  <p>In light of this paper, I think there’s a strong argument against trying to estimate a single true diversification history. For example, BAMM suggests using model averaging to summarize your posterior distribution of rate shift data, rather than just giving you a point estimate of the “best-supported” event configuration. Model averaging has also been shown to be quite good for SSE-class models, as pointed out in a recent paper.<sup class="footnote-ref"><a href="#fn8" id="fnref8">8</a></sup></p>
  <p>I also suspect that identifiability is the reason why, when simulating trees, the maximum likelihood estimates of a tree’s speciation and extinction rates don’t always match the generating parameters. There are a number of known pitfalls when simulating phylogenies,<sup class="footnote-ref"><a href="#fn9" id="fnref9">9</a></sup> and our conditioning of the likelihood function can similarly mislead us,<sup class="footnote-ref"><a href="#fn5" id="fnref5">5</a></sup> but it would be interesting to see if the results from this paper could be used to somehow improve the simulation of phylogenies as well.</p>
  <p>Finally, I think these results suggest that the way we construct our models (e.g., assuming constant extinction) and the priors we impose on Bayesian models are going to remain important for breaking ties along the flat parts in parameter space. There will always be a ridge on the likelihood surface if you permit all possible models and areas of parameter space, so when analyzing diversification rates, practitioners should always consider how to encode assumptions and biological knowledge into their diversification rate models, and should justify those assumptions and priors adequately.</p>
  <p>With all of the papers identifying the weaknesses of common analysis methods, it is easy to be discouraged about the state of comparative methods. I still recall the deep sense of despair at the first standalone Systematic Biology meeting in Ann Arbor, shortly after the Rabosky and Goldberg<sup class="footnote-ref"><a href="#fn2" id="fnref2">2</a></sup> manuscript was published and frightened the attendees into worrying whether we could do inference on phylogenies at all. I don’t think the alarmism and despair are really warranted, because as long as we are careful with how we analyze our data, justify the assumptions that are encoded into our models, and test for areas where our methods can be misleading, there is plenty of room to do great work with comparative methods.</p>
  <span style="display:none">
    <p><sup class="footnote-ref"><a href="#fn10" id="fnref10">10</a></sup></p>
  </span>
  <h2>References</h2>
  <section class="footnotes">
    <ol>
      <li id="fn1">
        <p>Louca, S., &amp; Pennell, M. W. (2020). Extant timetrees are consistent with a myriad of diversification histories. Nature. doi:<a href="https://www.nature.com/articles/s41586-020-2176-1">10.1038/s41586-020-2176-1</a> <a href="#fnref1" class="footnote-backref">↩</a></p>
      </li>
      <li id="fn2">
        <p>Rabosky, D. L., &amp; Goldberg, E. E. (2015). Model Inadequacy and Mistaken Inferences of Trait-Dependent Speciation. Systematic Biology, 64(2), 340–355. doi:<a href="https://doi.org/10.1093/sysbio/syu131">10.1093/sysbio/syu131</a> <a href="#fnref2" class="footnote-backref">↩</a></p>
      </li>
      <li id="fn3">
        <p>Nee, S., May, R. M., &amp; Harvey, P. H. (1994). The reconstructed evolutionary process. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 344(1309), 305–311. doi:<a href="https://doi.org/10.1098/rstb.1994.0068">10.1098/rstb.1994.0068</a> <a href="#fnref3" class="footnote-backref">↩</a></p>
      </li>
      <li id="fn4">
        <p>Rabosky, D. L. (2009). Extinction rates should not be estimated from molecular phylogenies. Evolution, 64(6), 1816–1824. doi:<a href="https://doi.org/10.1111/j.1558-5646.2009.00926.x">10.1111/j.1558-5646.2009.00926.x</a> <a href="#fnref4" class="footnote-backref">↩</a></p>
      </li>
      <li id="fn5">
        <p>Stadler, T. (2012). How Can We Improve Accuracy of Macroevolutionary Rate Estimates? Systematic Biology, 62(2), 321–329. doi:<a href="https://doi.org/10.1093/sysbio/sys073">10.1093/sysbio/sys073</a> <a href="#fnref5" class="footnote-backref">↩</a></p>
      </li>
      <li id="fn6">
        <p>Burin, G., Alencar, L. R. V., Chang, J., Alfaro, M. E., &amp; Quental, T. B. (2018). How Well Can We Estimate Diversity Dynamics for Clades in Diversity Decline? Systematic Biology, 68(1), 47–62. doi:<a href="https://doi.org/10.1093/sysbio/syy037">10.1093/sysbio/syy037</a> <a href="#fnref6" class="footnote-backref">↩</a></p>
      </li>
      <li id="fn7">
        <p>Foote, M., Sadler, P. M., Cooper, R. A., &amp; Crampton, J. S. (2019). Completeness of the known graptoloid palaeontological record. Journal of the Geological Society, jgs2019–061. doi:<a href="https://doi.org/10.1144/jgs2019-061">10.1144/jgs2019-061</a> <a href="#fnref7" class="footnote-backref">↩</a></p>
      </li>
      <li id="fn8">
        <p>Caetano, D. S., O’Meara, B. C., &amp; Beaulieu, J. M. (2018). Hidden state models improve state-dependent diversification approaches, including biogeographical models. Evolution, 72(11), 2308–2324. doi:<a href="https://doi.org/10.1111/evo.13602">10.1111/evo.13602</a> <a href="#fnref8" class="footnote-backref">↩</a></p>
      </li>
      <li id="fn9">
        <p>Hartmann, K., Wong, D., &amp; Stadler, T. (2010). Sampling Trees from Evolutionary Models. Systematic Biology, 59(4), 465–476. doi:<a href="https://doi.org/10.1093/sysbio/syq026">10.1093/sysbio/syq026</a> <a href="#fnref9" class="footnote-backref">↩</a></p>
      </li>
      <li id="fn10">
        <p>Thanks to James Saulsbury and Tomomi Parins-Fukuchi for suggestions that improved this blog post! <a href="#fnref10" class="footnote-backref">↩</a></p>
      </li>
    </ol>
  </section>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Maintain your own Homebrew repository, with binary bottles]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/maintain-your-own-homebrew-repository-with-binary-bottles/"/>
  <id>https://jonathanchang.org/blog/maintain-your-own-homebrew-repository-with-binary-bottles</id>
  <published>2019-05-06T10:53:00+00:00</published>
  <updated>2019-05-06T10:53:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>I’ve been using the <a href="https://brew.sh">Homebrew package manager</a> for nearly a decade. After a few years of contributions I was asked to become a maintainer and was recently elected to serve on the Homebrew project leadership committee. Homebrew is an excellent cross-platform package manager, supporting macOS, Linux, and Windows 10.</p>
  <p>That being said, Homebrew does not package everything. Many things are too niche, specialized, or <em>complicated</em> for the Homebrew maintainers to build and distribute. <a href="https://formulae.brew.sh/analytics/install-on-request/365d/">Homebrew has over a million installs</a>, yet has a small team of only about 20 volunteer maintainers who deal with this huge responsibility. Homebrew relies extremely heavily on its community to report and fix bugs that crop up in the packages that <em>they</em> use, since maintainers can’t be expected to rigorously check the correctness of 4,000+ packages.</p>
  <h2>Getting started with taps</h2>
  <p>Suppose that your software doesn’t fit the requirements for the main Homebrew repository, but you’d still like to distribute it somehow. Homebrew’s built-in packages can be extended using <a href="https://docs.brew.sh/How-to-Create-and-Maintain-a-Tap">third-party repositories, called taps</a>. A tap is just a GitHub repository whose name starts with <code>homebrew-</code> and that contains some Homebrew formula files. Homebrew has a built-in command to get you up and running immediately, <code>brew tap-new</code>.</p>
  <div class="language-console highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="console"><span class="gp">$</span><span class="w"> </span>brew tap-new jonchang/biology
<span class="gp">$</span><span class="w"> </span><span class="nb">cd</span> <span class="si">$(</span>brew <span class="nt">--repo</span> jonchang/biology<span class="si">)</span>
<span class="gp">$</span><span class="w"> </span><span class="nb">pwd</span>
<span class="go">/usr/local/Homebrew/Library/Taps/jonchang/homebrew-biology
</span></code></pre>
    </div>
  </div>
  <p>Now I can create a new formula for this tap with <code>brew create --tap=jonchang/biology &lt;URL&gt;</code> and edit it in the usual manner. For help with this step, see the <a href="https://docs.brew.sh/Formula-Cookbook">Formula Cookbook</a>.</p>
  <p>If your software requires compilation, however, one problem is that your users will need to build your software from source. This might be fine if the build process is straightforward, but if not, you might find yourself spending a lot of time debugging obscure compiler failures for your users.</p>
  <h2>Bottling formulae for your tap</h2>
  <p>Homebrew can distribute precompiled binaries of your software, <a href="https://docs.brew.sh/Bottles">called <em>bottles</em></a>. By default, everything in the main Homebrew repository (<a href="https://github.com/Homebrew/homebrew-core">Homebrew/homebrew-core</a>) is bottled. This provides a superior user experience and saves time when installing software that takes a long time to compile (e.g., GCC).</p>
  <p>However, this process can be quite complicated and involved. <a href="https://gist.github.com/maelvls/068af21911c7debc4655cdaa41bbf092">This GitHub Gist</a> and the flowchart below (by <a href="https://github.com/maelvls/">@maelvls</a>) give an excellent overview of the Homebrew/homebrew-core process, including its internal Jenkins installation and the pull request workflow for maintainers. While it is <em>possible</em> to set up something like this using Travis by following the linked Gist, for a tap where you are bottling only a few formulae for software that is infrequently updated, I think it is overkill since you can easily spend far too much time figuring out why Travis has broken yet again.</p>
  <p><a href="https://gist.github.com/maelvls/068af21911c7debc4655cdaa41bbf092"><img src="/uploads/2019/homebrew/homebrew-core-workflow.png" alt="A flow chart showing the Homebrew/homebrew-core workflow for building binary packages (bottles). It is very complicated." /></a></p>
  <p>Here, I lay out instructions on how to compile binary bottles on your own machine, using a Docker Linux build, as well as a virtualized macOS installation.</p>
  <h3>Bintray setup</h3>
  <p><a href="https://bintray.com/signup/oss">Sign up for an Open Source Bintray account</a>. Homebrew uses Bintray to distribute its own binaries, and accounts are free for open source software.</p>
  <p>Next, set up the Bintray repository where your bottles will be uploaded. This should be named like <code>https://bintray.com/USER/bottles-TAP</code>. You may also want to set a default license for your Bintray repository, since the Bintray OSS plan requires that all packages you distribute via their service also be open source software.</p>
  <p><img src="/uploads/2019/homebrew/1-bintray-repo.png" alt="Bintray repository creation screen. The name begins with 'bottles-' and the default license is set to BSD." srcset="/uploads/2019/homebrew/1-bintray-repo.png 2x"></p>
  <p>If you didn’t specify a default open source license, you’ll also want to <em>Add New Package</em> in your new repository and set the license field appropriately. The name of the package should match the name of the formula you’re building a bottle for.</p>
  <p><img src="/uploads/2019/homebrew/2-bintray-package.png" alt="Bintray package creation screen. The license is set to BSD." srcset="/uploads/2019/homebrew/2-bintray-package.png 2x"></p>
  <h3>Linux setup</h3>
  <p>Install Docker (<code>brew cask install docker</code>) and ensure the Docker service is running by opening <code>Docker.app</code> and checking the Docker icon in the menu bar. Then, run the following commands to download the Docker image for Homebrew on Linux, and enter an interactive shell within that Docker container:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">docker pull homebrew/brew
docker run <span class="nt">-it</span> <span class="nt">--name</span><span class="o">=</span>brew homebrew/brew
</code></pre>
    </div>
  </div>
  <p>Even if you’re following this guide on Linux, I still recommend using the Docker image, as this ensures that the build system for the software you’re packaging will not link against something you’ve already installed on your machine. As Homebrew on Linux does not yet support sandboxed builds, it’s possible that opportunistic linking during the packaging process will cause problems for your users. Homebrew attempts to detect and warn for this, but using the Docker image is still safer.</p>
  <h3>macOS Setup</h3>
  <p>By default, the main Homebrew repository builds binary bottles for the last three macOS versions. Building bottles for all three systems is a lot of work, but you can mitigate this by building a single bottle for the oldest version of macOS that you intend to support, and <a href="https://github.com/Homebrew/brew/pull/5100">Homebrew will attempt to use that bottle for newer macOS versions</a>. This is not always guaranteed to work correctly, and you should do a few tests to ensure your software still runs, but it should reduce how many separate bottles you need to build.</p>
  <p>I’ll use macOS Sierra here, since that’s the oldest macOS that Homebrew supports at the time of this writing. You can run an older version of macOS, but note that <a href="https://brew.sh/2019/02/02/homebrew-2.0.0/">Homebrew does not support OS X Mountain Lion (10.8) or older</a>, so building bottles for very old systems is not possible.</p>
  <p>I used <a href="https://medium.com/@twister.mr/installing-macos-to-virtualbox-1fcc5cf22801">this guide</a> to set up a Sierra VM with VirtualBox (<code>brew cask install virtualbox</code>). You must be running Apple hardware in order to legally virtualize macOS (<a href="https://www.apple.com/legal/sla/docs/macOS1014.pdf#page=2">macOS 10.14 EULA, §2B(iii)</a>). Once macOS is set up, <a href="https://brew.sh">install Homebrew per the usual instructions</a>.</p>
  <p>If you’re already running macOS with a working Homebrew installation, you can also just build a bottle on your own system by following the same instructions.</p>
  <h3>Bottling for a single system</h3>
  <p>We’ll be using the <a href="https://github.com/Homebrew/homebrew-test-bot/">Homebrew Test Bot</a> for these commands. This is the same script that Homebrew uses for its own continuous integration infrastructure. You’ll need to specify a few arguments:</p>
  <p><code>--root-url</code> will point to your Bintray repository URL. This will usually start with <code>dl.bintray.com</code>; for example, mine is <code>https://dl.bintray.com/jonchang/bottles-biology</code>.</p>
  <p><code>--bintray-org</code> is your Bintray organization name; in this case, your GitHub name.</p>
  <p><code>--tap</code> is the short form of your repository; that is, if your repository is located at <a href="https://github.com/jonchang/homebrew-biology">https://github.com/jonchang/homebrew-biology</a>, the tap is <code>jonchang/biology</code>.</p>
  <p>The last argument will be the formula that you are building bottles for. You can specify multiple formulae here, but you should ensure that the fully-qualified name of your formula is used.</p>
  <p>To speed things up, you can also pass the <code>--skip-setup</code> option to bypass some build environment tests.</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">brew test-bot <span class="nt">--root-url</span><span class="o">=</span>BINTRAY_ROOT <span class="nt">--bintray-org</span><span class="o">=</span>BINTRAY_USER <span class="nt">--tap</span><span class="o">=</span>USER/REPO USER/REPO/FORMULA
</code></pre>
    </div>
  </div>
  <p>The <code>test-bot</code> command will create several files related to the formula you are building: a <code>.bottle.tar.gz</code> bottle and a <code>.json</code> metadata file, named as <code>FORMULA--VERSION.OS</code>.</p>
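  <p>To make the relationship between these arguments concrete, here is a small shell sketch that assembles the command line from three pieces of information. The helper name <code>test_bot_command</code> is hypothetical, and it assumes your Bintray organization matches your GitHub user name and that your Bintray repository follows the <code>bottles-TAP</code> naming described earlier. It only prints the command, so you can inspect it before running anything.</p>

```shell
# Hypothetical helper: print (not run) a test-bot invocation for one formula.
# Arguments: user name, tap short name, formula name.
# Assumes the Bintray organization and the GitHub user share the same name.
test_bot_command() {
  user=$1 tap=$2 formula=$3
  printf 'brew test-bot --root-url=%s --bintray-org=%s --tap=%s %s\n' \
    "https://dl.bintray.com/$user/bottles-$tap" \
    "$user" \
    "$user/$tap" \
    "$user/$tap/$formula"
}

# FORMULA is a placeholder for the name of your own formula.
test_bot_command jonchang biology FORMULA
```

  <p>With the values above, this prints <code>brew test-bot --root-url=https://dl.bintray.com/jonchang/bottles-biology --bintray-org=jonchang --tap=jonchang/biology jonchang/biology/FORMULA</code>. Note that the last argument is the fully-qualified formula name.</p>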
  <h3>Bottling for multiple systems</h3>
  <p>If you are bottling for multiple systems (e.g., multiple versions of macOS, or macOS and Linux), first follow the <a href="#bottling-for-a-single-system">single system steps</a> for each system you’re packaging for. Then, copy over all <code>.json</code> files and <code>.bottle.tar.gz</code> files to a folder on a single machine. This should be a new folder created for this specific purpose and should only have the JSON and bottle tarballs.</p>
  <p>To copy from Docker, run:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">docker <span class="nb">cp </span>brew:/home/linuxbrew/. <span class="nb">.</span>
</code></pre>
    </div>
  </div>
  <p>To copy from VirtualBox, run:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">scp <span class="nt">-P2222</span> user@localhost:<span class="k">*</span>.<span class="o">{</span>json,tar.gz<span class="o">}</span> <span class="nb">.</span>
</code></pre>
    </div>
  </div>
  <p>You’ll want to check with <code>ls</code> that all of the files have the same name, save for the extension and the OS. One common problem here is that bottles will have different rebuild numbers. For example, a file might be named <code>formula--0.1.sierra.bottle.1.tar.gz</code>, implying that it is on <code>rebuild 1</code>, while another might be named <code>formula--0.1.mojave.bottle.tar.gz</code>, implying that it is on <code>rebuild 0</code>. You can fix this by renaming the file, and editing the JSON file to correct the rebuild field.</p>
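  <p>If you have more than a handful of bottles, checking rebuild numbers by eye gets tedious. The following is a minimal sketch that reads the rebuild number off each filename, assuming the naming pattern shown in the examples above. The function name <code>bottle_rebuild</code> is made up for this post; it is not a Homebrew command.</p>

```shell
# Hypothetical helper: print the rebuild number encoded in a bottle filename.
# "NAME.bottle.tar.gz" means rebuild 0; "NAME.bottle.N.tar.gz" means rebuild N.
bottle_rebuild() {
  case $1 in
    *.bottle.tar.gz) echo 0 ;;
    *.bottle.*.tar.gz)
      n=${1##*.bottle.}    # strip everything through ".bottle."
      echo "${n%.tar.gz}"  # strip the trailing ".tar.gz"
      ;;
    *) return 1 ;;         # not a bottle tarball
  esac
}

# List each bottle in the current folder with its rebuild number.
for f in *.bottle*.tar.gz; do
  [ -e "$f" ] || continue  # the glob matched nothing
  printf '%s\trebuild %s\n' "$f" "$(bottle_rebuild "$f")"
done
```

  <p>If the rebuild numbers in the second column differ for the same formula and version, that is the mismatch to fix before uploading.</p>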
  <h3>Uploading the bottles</h3>
  <p>Once you’re satisfied everything is correct, it’s time to upload your bottles to Bintray. You’ll need to specify your Bintray username and access token by setting the <code>HOMEBREW_BINTRAY_USER</code> and <code>HOMEBREW_BINTRAY_KEY</code> environment variables. Your key can be found by clicking <em>Edit Profile</em> -&gt; <em>API Key</em> when you are logged in to Bintray. Then, in the folder where your bottles and JSON files are, run:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">brew pr-upload <span class="nt">--bintray-org</span><span class="o">=</span>BINTRAY_USER <span class="nt">--root-url</span><span class="o">=</span>BINTRAY_ROOT
</code></pre>
    </div>
  </div>
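  <p>Since the upload depends on both credentials being present, a small pre-flight check can save a failed run. This is only a sketch: <code>check_bintray_env</code> is a hypothetical helper name, not a Homebrew command, and it checks nothing beyond whether the two variables named above are non-empty.</p>

```shell
# Report which of the two Bintray credential variables are unset or empty.
# Returns 0 when both are set, 1 otherwise.
check_bintray_env() {
  missing=0
  for name in HOMEBREW_BINTRAY_USER HOMEBREW_BINTRAY_KEY; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
```

  <p>You could run <code>check_bintray_env</code> in the same shell session just before <code>brew pr-upload</code>; it prints nothing when both variables are set.</p>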
  <p>If you don’t want to publish the bottles right away, you can additionally pass <code>--no-publish</code> to <code>brew pr-upload</code>, and the files will be uploaded to Bintray but kept private. Once you’ve checked the bottles, you can go to your package on the Bintray website and hit “Publish”.</p>
  <p><img src="/uploads/2019/homebrew/3-bintray-publish.png" alt="Bintray unpublished package notice banner." srcset="/uploads/2019/homebrew/3-bintray-publish.png 2x"></p>
  <p>Now, from your tap repository, run <code>git log -u</code> to ensure that the <code>bottle do</code> block looks correct, then <code>git push</code> so your users will see the new bottles. Finally, run <code>brew install FORMULA</code> to check that your bottle is getting downloaded and installed properly.</p>
  <h3>Cleaning up</h3>
  <ul>
    <li>Remove the temporary folder where you stored bottles</li>
    <li><code>docker rm brew</code> if you used Docker</li>
    <li><code>VBoxManage controlvm 'macOS Sierra' savestate</code> if you used VirtualBox</li>
  </ul>
  <h3>Updates</h3>
  <p>You’ll need to update your bottles in a few situations. If you release a new version of your software, update the <code>url</code> and <code>sha256</code> in your formula and, if needed, the <code>version</code> field. <a href="https://jonathanchang.org/blog/updating-homebrew-formulae-when-your-software-gets-a-new-version/">See <code>brew bump-formula-pr</code> for a quick way to do this</a>. When a library your formula <code>depends_on</code> has breaking changes (e.g., Boost), you’ll need to increment the <code>revision</code> number; if this isn’t already present in your formula, just add <code>revision 1</code>. You can remove this when your software gets a new version.</p>
  <p>In both cases, you’ll also want to remove the <code>bottle do</code> block (since your old bottles are now invalid), then <code>git commit</code> and <code>git push</code> your changes. At this point you can build new bottles as above.</p>
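  <p>For the <code>sha256</code> field mentioned above, you need the SHA-256 checksum of the new release tarball. Here is a minimal sketch for computing it by hand. <code>file_sha256</code> is a hypothetical helper name, and the sketch assumes that either <code>sha256sum</code> or <code>shasum</code> is installed.</p>

```shell
# Print the SHA-256 checksum of a file, using whichever tool is installed.
# Assumes sha256sum (common on Linux) or shasum (ships with macOS) is available.
file_sha256() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d ' ' -f 1
  else
    shasum -a 256 "$1" | cut -d ' ' -f 1
  fi
}
```

  <p>Call it with the path to the tarball you downloaded from your formula’s new <code>url</code>, then paste the printed value into the formula.</p>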
  <h3>Future work</h3>
  <p>There’s some work planned to get <code>brew test-bot</code> to support more third-party repositories with GitHub Actions, but this will take some time to get off the ground. Hopefully, this bottling experience will be much better in the future, but for now I think this is the solution that best balances ease of setup and ease of use.</p>
  <p>There’s a lot you can automate here; I recommend also writing down an “update playbook” that you place in the README of your tap so you don’t forget these steps when releasing new versions of your software. This blog post is, in fact, the guide for my personal workflow. Happy bottling! 🍺</p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Introducing fishtree and fishtreeoflife.org]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/announcing-fishtree-an-r-package-to-access-phylogenetic-data-for-ray-finned-fishes/"/>
  <id>https://jonathanchang.org/blog/announcing-fishtree-an-r-package-to-access-phylogenetic-data-for-ray-finned-fishes</id>
  <published>2019-03-29T00:00:00+00:00</published>
  <updated>2019-03-29T00:00:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>In our recent publication (<a href="https://doi.org/10.1038/s41586-018-0273-1">Rabosky et al. 2018</a>) we assembled a huge phylogeny of ray-finned fishes: the most comprehensive to date! While all of our data are <a href="https://doi.org/10.5061/dryad.fc71cp4">accessible via Dryad</a>, we felt like we could go the extra mile to make it easy to repurpose and reuse our work. I’m pleased to report that this effort has resulted in two resources for the community: the <a href="https://fishtreeoflife.org">Fish Tree of Life website</a>, and the <a href="https://cran.r-project.org/package=fishtree"><strong>fishtree</strong> R package</a>. The package is available on CRAN now, and you can install it with:</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"fishtree"</span><span class="p">)</span><span class="w">
</span></code></pre>
    </div>
  </div>
  <p>The source is on GitHub in the repository <a href="https://github.com/jonchang/fishtree">jonchang/fishtree</a>. The manuscript describing these resources has been published in <em>Methods in Ecology and Evolution</em> (<a href="https://doi.org/10.1111/2041-210X.13182">Chang et al. 2019</a>).</p>
  <picture>
    <source type="image/svg+xml" srcset="/uploads/2019/fishtree-manuscript-fig-s1.svg">
    <img src="/uploads/2019/fishtree-package.png" alt="Figure S1 from our manuscript showing areas on the fish tree of life of long branch attraction." srcset="/uploads/2019/fishtree-manuscript-fig-s1.png 2x">
  </picture>
  <h2>Website: fishtreeoflife.org</h2>
  <p>The Fish Tree of Life website is intended to serve as a quick resource for when you need to look up information about ray-finned fishes. There are two primary types of pages on the website: <a href="https://fishtreeoflife.org/taxonomy/"><em>taxonomy</em> pages</a> and <a href="https://fishtreeoflife.org/fossils/"><em>fossil</em> pages</a>.</p>
  <p><img src="/uploads/2019/fishtree-website.png" alt="A portion of the backbone phylogeny leading to the taxonomy pages." srcset="/uploads/2019/fishtree-website.png 2x"></p>
  <p>For example, the taxonomy page for <a href="https://fishtreeoflife.org/taxonomy/family/Acanthuridae/">Acanthuridae</a>, the surgeonfishes, indicates that our phylogeny sampled most of the species in this family, and that one fossil calibration was used to date this group. You’ll also see that all associated taxonomic ranks (both more inclusive and less inclusive, if applicable) are listed.</p>
  <p>The download links lead you to subsetted versions of the phylogeny and character matrix constructed for this group. If you’re only interested in the surgeonfishes, you don’t have to download the entire phylogeny to get what you’re interested in.</p>
  <p>The fossil section links to a single species, <a href="https://fishtreeoflife.org/fossils/proacanthurus-tenuis/"><em>Proacanthurus tenuis</em>†</a>, that was used to calibrate the crown age of Acanthuridae. Fossil pages will all list what taxon they calibrate, as well as the minimum age that fossil informs, the computed maximum age, the placement authority reference, the age authority reference, and the fossil locality.</p>
  <p>The computed maximum age is based on the WHETA algorithm and the outgroup sequence listed. If you’re interested in the details, consult the <a href="https://fishtreeoflife.org/methods/#fossil-calibrations">Methods § Fossil Calibration</a> page.</p>
  <p>One thing that we’re especially proud of is that the website is completely static and nearly all text, and loads lightning fast. This is to support researchers working in areas where fast Internet is not available. Our only concession to vanity is on the home page, derived from Figure 3 of the <em>Nature</em> manuscript. The fishes were illustrated by <a href="https://www.lifesciencestudios.com/">Julie Johnson</a>; clicking on the different fish will take you to the appropriate taxon page.</p>
  <h2>R package: fishtree</h2>
  <p>In addition to the website, we’ve also developed an R package that interfaces with the underlying data.</p>
  <div class="language-console?lang=r&output=plaintext&prompt=>&comments=true highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="console?lang=r&output=plaintext&prompt=>&comments=true"><span class="gp">&gt;</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">fishtree</span><span class="p">)</span><span class="w">
</span><span class="gp">&gt;</span><span class="w"> </span><span class="n">library</span><span class="p">(</span><span class="n">ape</span><span class="p">)</span><span class="w">
</span><span class="gp">&gt;</span><span class="w"> </span><span class="n">phy</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">fishtree_phylogeny</span><span class="p">(</span><span class="n">rank</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Acanthuridae"</span><span class="p">)</span><span class="w">
</span><span class="gp">&gt;</span><span class="w"> </span><span class="n">phy</span><span class="w">
</span>
Phylogenetic tree with 67 tips and 66 internal nodes.

Tip labels:
Acanthurus_mata, Acanthurus_blochii, Acanthurus_xanthopterus, Acanthurus_bariene, Acanthurus_dussumieri, Acanthurus_leucocheilus, …

Rooted; includes branch lengths.

<span class="gp">&gt;</span><span class="w"> </span><span class="n">par</span><span class="p">(</span><span class="n">mfrow</span><span class="o">=</span><span class="nf">c</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w">
</span><span class="gp">&gt;</span><span class="w"> </span><span class="n">plot</span><span class="p">(</span><span class="n">phy</span><span class="p">,</span><span class="w"> </span><span class="n">show.tip.label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span><span class="gp">&gt;</span><span class="w"> </span><span class="n">ltt.plot</span><span class="p">(</span><span class="n">phy</span><span class="p">)</span><span class="w">
</span></code></pre>
    </div>
  </div>
  <picture>
  <source type="image/svg+xml" srcset="/uploads/2019/fishtree-package.svg">
  <img src="/uploads/2019/fishtree-package.png" alt="A phylogeny of Acanthuridae with a lineage-through-time plot." srcset="/uploads/2019/fishtree-package.png 2x">
</picture>
<p>The R package permits easy access to downloads of the phylogeny, sequence alignments, and taxonomic information for the ray-finned fishes. Not only are the pre-computed per-taxon subsets available, but the relevant functions also accept a list of species and will subset the larger dataset to return a sequence matrix or phylogeny including only those species.</p>
<p>Our intent with the R package is to enable more complex analyses with this broad dataset. One example that we showcase in <a href="https://doi.org/10.1111/2041-210X.13182">our manuscript</a> reanalyzes portions of the fish tree of life with <a href="https://cme.h-its.org/exelixis/web/software/raxml/index.html">RAxML</a> and <a href="http://stat.sys.i.kyoto-u.ac.jp/prog/consel/">CONSEL</a> and other programs to search for areas that might have been affected by long branch attraction. The code and data for this analysis are <a href="https://doi.org/10.5061/dryad.6vg974n">available on Dryad</a>.</p>
<p>Two other analyses are presented in the supplement of the manuscript; a more accessible web version is available in the <a href="https://cran.r-project.org/package=fishtree">Vignettes section on CRAN</a>. These cover <a href="https://cran.r-project.org/web/packages/fishtree/vignettes/comparative-analysis.html">a comparative analysis</a> that replicates a previous experiment in the tetraodontid fishes (<a href="https://doi.org/10.1111/jeb.12112">Santini et al. 2013</a>), and a <a href="https://cran.r-project.org/web/packages/fishtree/vignettes/community-analysis.html">community phylogenetics analysis</a> looking at community structure in reef-associated fishes.</p>
<p>You may have noticed that each taxonomy page on the website <a href="https://fishtreeoflife.org/api/taxonomy/family/Acanthuridae.json">links to a JSON API file</a>. The <strong>fishtree</strong> package consumes these JSON files under the hood; you are welcome to use these directly, but we can’t guarantee that they won’t change in the future.</p>
<h2>The future</h2>
<p>There’s still plenty to work on for both the R package and the website, as well as ray-finned fish phylogenetics in general. Our plans for the R package are to extend its functionality to include the fossil data we have on the website.</p>
<p>The website is essentially feature-complete, though there’s a lot of polish that could be done, and we could certainly incorporate other data sources or come up with new ways to slice the data for easy consumption. The most important feature on our list is the ability to switch between alternate topologies; for example, how would the taxon pages look if we used older fish phylogenies (e.g., <a href="https://doi.org/10.1038/ncomms2958">Rabosky et al. 2013</a>) instead?</p>
<p>These and other features will have to wait. I have a ton of papers to write and jobs to apply for. My promise to the community is that this website and R package will continue to be maintained as long as I’m involved in scientific research. If you’re interested in helping out, pull requests are welcome on either of the GitHub repositories!</p>
<h2>Feedback</h2>
<p>If you encounter any problems with the R package, please <a href="https://github.com/jonchang/fishtree/issues/new/choose">open an issue on GitHub</a>. If you spot any bugs with the website, please <a href="https://github.com/jonchang/fishtreeoflife.org/issues/new">open an issue on GitHub for the website</a>. Feature requests are welcome as well, but I can’t guarantee I’ll get around to implementing your suggestions.</p>
<p>Finally, if you spot errors in the phylogeny (e.g., a rogue taxon or something like that), please report them in this <a href="https://docs.google.com/forms/d/e/1FAIpQLSeyE_NT5WiQA3Er62ZJzIHrRnOP0ASzPYrh294Nr5pOm4kTDg/viewform">Google Form</a> so I can collate these kinds of fixes in a future update.</p>
<p>I’m excited to see what kinds of research these resources will enable! If you publish a paper or other analysis with this, please send me an email or <a href="https://twitter.com/chang_jon">tweet at me</a> so I can check it out!</p>
]]></content>
</entry>
<entry>
  <title><![CDATA[What R package for phylogenetics is the most popular?]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/what-r-package-for-phylogenetics-is-the-most-popular/"/>
  <id>https://jonathanchang.org/blog/what-r-package-for-phylogenetics-is-the-most-popular</id>
  <published>2018-11-19T23:49:00+00:00</published>
  <updated>2018-11-19T23:49:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>While writing my first R package and its associated manuscript, I needed to talk about some other R packages in the phylogenetics research community. The most obvious choice would be to just cite the ones that I actually use, but that doesn’t necessarily mean that other practicing phylogeneticists do the same. I needed to get some stats on which phylogenetics packages were actually popular, but luckily the R ecosystem has the tools to make this easy.</p>
  <p><a href="#table">Skip straight to the popularity table</a>.</p>
  <h2>Instructions</h2>
  <p>First, let’s install the <a href="https://cran.r-project.org/package=ctv">CRAN Task Views</a> package and the <a href="https://github.com/metacran/cranlogs">CRAN-logs</a> API package. We’ll use the development version of <code>cranlogs</code> since it hasn’t been updated on CRAN in a while and some stuff has changed.</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"ctv"</span><span class="p">)</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"metacran/cranlogs"</span><span class="p">)</span><span class="w">

</span><span class="n">library</span><span class="p">(</span><span class="n">ctv</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">cranlogs</span><span class="p">)</span><span class="w">
</span></code></pre>
    </div>
  </div>
<p>We can get a list of all the CRAN task views using the <code>available.views</code> function. Annoyingly, there’s no way to filter and extract JUST the <a href="https://cran.r-project.org/view=Phylogenetics">Phylogenetics task view</a>, so we’ll have to write a short filter to extract it.<sup class="footnote-ref"><a href="#fn1" id="fnref1">1</a></sup></p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">available.views</span><span class="p">()</span><span class="w">

</span><span class="n">phylo_ctv</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">Filter</span><span class="p">(</span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">)</span><span class="w"> </span><span class="n">x</span><span class="o">$</span><span class="n">name</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"Phylogenetics"</span><span class="p">,</span><span class="w"> </span><span class="n">available.views</span><span class="p">())[[</span><span class="m">1</span><span class="p">]]</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Now we can extract the list of packages that are associated with the “Phylogenetics” task view, and using that list of packages, query the CRAN-logs server to figure out the most popular phylogenetics packages in the last year.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">phylo_ctv</span><span class="o">$</span><span class="n">packagelist</span><span class="w">
</span><span class="n">phylo_packages</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">phylo_ctv</span><span class="o">$</span><span class="n">packagelist</span><span class="o">$</span><span class="n">name</span><span class="w">

</span><span class="n">output</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cran_downloads</span><span class="p">(</span><span class="n">to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"2018-10-01"</span><span class="p">,</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"2017-10-01"</span><span class="p">,</span><span class="w"> </span><span class="n">packages</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">phylo_packages</span><span class="p">)</span><span class="w">
</span><span class="n">head</span><span class="p">(</span><span class="n">output</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Of course, what about the phylogenetics packages that aren’t in the Phylogenetics task view? Another approach is to check whether a package depends on the <code>ape</code> package. Pretty much every phylogenetics package uses <code>ape</code> in some form or another, so this is also a good proxy for what counts as a “phylo” package.</p>
<p>Get a list of all the reverse dependencies of <code>ape</code> using <code>devtools</code>.<sup class="footnote-ref"><a href="#fn2" id="fnref2">2</a></sup></p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">revdep_packages</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">devtools</span><span class="o">::</span><span class="n">revdep</span><span class="p">(</span><span class="s2">"ape"</span><span class="p">)</span><span class="w">
</span><span class="n">all_phylo_packages</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">unique</span><span class="p">(</span><span class="nf">c</span><span class="p">(</span><span class="n">phylo_packages</span><span class="p">,</span><span class="w"> </span><span class="n">revdep_packages</span><span class="p">))</span><span class="w">
</span><span class="n">output2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">cran_downloads</span><span class="p">(</span><span class="n">to</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"2018-10-01"</span><span class="p">,</span><span class="w"> </span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"2017-10-01"</span><span class="p">,</span><span class="w"> </span><span class="n">packages</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">all_phylo_packages</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>By default, the output from cranlogs is the number of downloads for a given package on a given date. We want to sum up all of these counts so we have a total number of downloads per package.</p>
<p>Aggregate these data using <code>dplyr</code>.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">dplyr</span><span class="p">)</span><span class="w">

</span><span class="n">output2</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">group_by</span><span class="p">(</span><span class="n">package</span><span class="p">)</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">summarise</span><span class="p">(</span><span class="n">downloads</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">sum</span><span class="p">(</span><span class="n">count</span><span class="p">))</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">arrange</span><span class="p">(</span><span class="o">-</span><span class="n">downloads</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<h2>Exercises</h2>
<ol>
  <li>
    <p>Filter the table to only include packages in both the CRAN task view and reverse dependencies list. (This will exclude e.g., <code>ggplot2</code> and other arguably-peripheral packages.)</p>
  </li>
  <li>
    <p>Use the <a href="https://r4ds.had.co.nz/dates-and-times.html"><code>lubridate</code> package</a> to find the ten most popular packages by year. (The CRAN logs go back to October 2012.)</p>
  </li>
  <li>
    <p>Check out Brian O’Meara’s <a href="https://github.com/bomeara/summarizetaskview/blob/fd990fb7a19cf03cb0c5da9d87c1c808534658cc/README.md">excellent work showing how popular phylogenetics packages changed over time</a>!</p>
  </li>
</ol>
<h2>Table</h2>
<p>Here’s the full version of the table. There are some packages in here that are only peripherally associated with phylogenetics, but it gives a good picture of what the state of the field looks like. I’ve also annotated each package with which list it came from: the CRAN Task View list or the reverse dependencies list.</p>
<!--
output2 %>% group_by(package) %>% summarise(downloads = sum(count)) %>% mutate(in_ctv = package %in% phylo_packages, in_revdep = package %in% revdep_packages) %>% arrange(-downloads) %>% transmute(rank = row_number(), package = paste0("[", package, "](https://cran.r-project.org/package=", package, ")"), downloads, in_ctv, in_revdep) %>% knitr::kable()
-->
<table>
  <thead>
    <tr>
      <th align="right">Rank</th>
      <th align="left">Package</th>
      <th align="right">Downloads</th>
      <th align="left">CTV?</th>
      <th align="left">Revdep?</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="right">1</td>
      <td align="left"><a href="https://cran.r-project.org/package=ggplot2">ggplot2</a></td>
      <td align="right">5624177</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">2</td>
      <td align="left"><a href="https://cran.r-project.org/package=igraph">igraph</a></td>
      <td align="right">1248409</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">3</td>
      <td align="left"><a href="https://cran.r-project.org/package=dendextend">dendextend</a></td>
      <td align="right">440772</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">4</td>
      <td align="left"><a href="https://cran.r-project.org/package=ape">ape</a></td>
      <td align="right">433337</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">5</td>
      <td align="left"><a href="https://cran.r-project.org/package=vegan">vegan</a></td>
      <td align="right">426398</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">6</td>
      <td align="left"><a href="https://cran.r-project.org/package=ade4">ade4</a></td>
      <td align="right">336755</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">7</td>
      <td align="left"><a href="https://cran.r-project.org/package=brms">brms</a></td>
      <td align="right">90632</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">8</td>
      <td align="left"><a href="https://cran.r-project.org/package=phangorn">phangorn</a></td>
      <td align="right">78319</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">9</td>
      <td align="left"><a href="https://cran.r-project.org/package=adegenet">adegenet</a></td>
      <td align="right">68176</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">10</td>
      <td align="left"><a href="https://cran.r-project.org/package=metafor">metafor</a></td>
      <td align="right">64462</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">11</td>
      <td align="left"><a href="https://cran.r-project.org/package=data.tree">data.tree</a></td>
      <td align="right">60224</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">12</td>
      <td align="left"><a href="https://cran.r-project.org/package=Seurat">Seurat</a></td>
      <td align="right">55905</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">13</td>
      <td align="left"><a href="https://cran.r-project.org/package=MCMCglmm">MCMCglmm</a></td>
      <td align="right">48496</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">14</td>
      <td align="left"><a href="https://cran.r-project.org/package=phytools">phytools</a></td>
      <td align="right">43721</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">15</td>
      <td align="left"><a href="https://cran.r-project.org/package=HSAUR2">HSAUR2</a></td>
      <td align="right">40079</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">16</td>
      <td align="left"><a href="https://cran.r-project.org/package=HSAUR">HSAUR</a></td>
      <td align="right">38960</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">17</td>
      <td align="left"><a href="https://cran.r-project.org/package=taxize">taxize</a></td>
      <td align="right">34790</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">18</td>
      <td align="left"><a href="https://cran.r-project.org/package=rncl">rncl</a></td>
      <td align="right">31910</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">19</td>
      <td align="left"><a href="https://cran.r-project.org/package=aqp">aqp</a></td>
      <td align="right">30681</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">20</td>
      <td align="left"><a href="https://cran.r-project.org/package=pegas">pegas</a></td>
      <td align="right">29424</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">21</td>
      <td align="left"><a href="https://cran.r-project.org/package=RNeXML">RNeXML</a></td>
      <td align="right">29278</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">22</td>
      <td align="left"><a href="https://cran.r-project.org/package=rotl">rotl</a></td>
      <td align="right">28502</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">23</td>
      <td align="left"><a href="https://cran.r-project.org/package=picante">picante</a></td>
      <td align="right">26858</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">24</td>
      <td align="left"><a href="https://cran.r-project.org/package=geiger">geiger</a></td>
      <td align="right">26292</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">25</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylobase">phylobase</a></td>
      <td align="right">25190</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">26</td>
      <td align="left"><a href="https://cran.r-project.org/package=HSAUR3">HSAUR3</a></td>
      <td align="right">24462</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">27</td>
      <td align="left"><a href="https://cran.r-project.org/package=FD">FD</a></td>
      <td align="right">23007</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">28</td>
      <td align="left"><a href="https://cran.r-project.org/package=EpiModel">EpiModel</a></td>
      <td align="right">21486</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">29</td>
      <td align="left"><a href="https://cran.r-project.org/package=adephylo">adephylo</a></td>
      <td align="right">20348</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">30</td>
      <td align="left"><a href="https://cran.r-project.org/package=poppr">poppr</a></td>
      <td align="right">19874</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">31</td>
      <td align="left"><a href="https://cran.r-project.org/package=vcfR">vcfR</a></td>
      <td align="right">19346</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">32</td>
      <td align="left"><a href="https://cran.r-project.org/package=geomorph">geomorph</a></td>
      <td align="right">18793</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">33</td>
      <td align="left"><a href="https://cran.r-project.org/package=adespatial">adespatial</a></td>
      <td align="right">18084</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">34</td>
      <td align="left"><a href="https://cran.r-project.org/package=ggimage">ggimage</a></td>
      <td align="right">16587</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">35</td>
      <td align="left"><a href="https://cran.r-project.org/package=BoSSA">BoSSA</a></td>
      <td align="right">15993</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">36</td>
      <td align="left"><a href="https://cran.r-project.org/package=asnipe">asnipe</a></td>
      <td align="right">14109</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">37</td>
      <td align="left"><a href="https://cran.r-project.org/package=hierfstat">hierfstat</a></td>
      <td align="right">13967</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">38</td>
      <td align="left"><a href="https://cran.r-project.org/package=caper">caper</a></td>
      <td align="right">13847</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">39</td>
      <td align="left"><a href="https://cran.r-project.org/package=DDD">DDD</a></td>
      <td align="right">11690</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">40</td>
      <td align="left"><a href="https://cran.r-project.org/package=tidygraph">tidygraph</a></td>
      <td align="right">11612</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">41</td>
      <td align="left"><a href="https://cran.r-project.org/package=DHARMa">DHARMa</a></td>
      <td align="right">11596</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">42</td>
      <td align="left"><a href="https://cran.r-project.org/package=paleotree">paleotree</a></td>
      <td align="right">11359</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">43</td>
      <td align="left"><a href="https://cran.r-project.org/package=betapart">betapart</a></td>
      <td align="right">11111</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">44</td>
      <td align="left"><a href="https://cran.r-project.org/package=polysat">polysat</a></td>
      <td align="right">10831</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">45</td>
      <td align="left"><a href="https://cran.r-project.org/package=phyclust">phyclust</a></td>
      <td align="right">10631</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">46</td>
      <td align="left"><a href="https://cran.r-project.org/package=MVA">MVA</a></td>
      <td align="right">10280</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">47</td>
      <td align="left"><a href="https://cran.r-project.org/package=GUniFrac">GUniFrac</a></td>
      <td align="right">9007</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">48</td>
      <td align="left"><a href="https://cran.r-project.org/package=enveomics.R">enveomics.R</a></td>
      <td align="right">8696</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">49</td>
      <td align="left"><a href="https://cran.r-project.org/package=AbSim">AbSim</a></td>
      <td align="right">8432</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">50</td>
      <td align="left"><a href="https://cran.r-project.org/package=stylo">stylo</a></td>
      <td align="right">8336</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">51</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylolm">phylolm</a></td>
      <td align="right">8172</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">52</td>
      <td align="left"><a href="https://cran.r-project.org/package=apTreeshape">apTreeshape</a></td>
      <td align="right">8101</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">53</td>
      <td align="left"><a href="https://cran.r-project.org/package=BioGeoBEARS">BioGeoBEARS</a></td>
      <td align="right">8007</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">54</td>
      <td align="left"><a href="https://cran.r-project.org/package=expands">expands</a></td>
      <td align="right">7764</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">55</td>
      <td align="left"><a href="https://cran.r-project.org/package=mvMORPH">mvMORPH</a></td>
      <td align="right">7579</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">56</td>
      <td align="left"><a href="https://cran.r-project.org/package=BAMMtools">BAMMtools</a></td>
      <td align="right">7344</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">57</td>
      <td align="left"><a href="https://cran.r-project.org/package=sand">sand</a></td>
      <td align="right">7014</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">58</td>
      <td align="left"><a href="https://cran.r-project.org/package=diversitree">diversitree</a></td>
      <td align="right">6946</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">59</td>
      <td align="left"><a href="https://cran.r-project.org/package=homals">homals</a></td>
      <td align="right">6933</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">60</td>
      <td align="left"><a href="https://cran.r-project.org/package=tidytree">tidytree</a></td>
      <td align="right">6378</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">61</td>
      <td align="left"><a href="https://cran.r-project.org/package=convevol">convevol</a></td>
      <td align="right">6351</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">62</td>
      <td align="left"><a href="https://cran.r-project.org/package=ecospat">ecospat</a></td>
      <td align="right">6320</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">63</td>
      <td align="left"><a href="https://cran.r-project.org/package=entropart">entropart</a></td>
      <td align="right">6299</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">64</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylotools">phylotools</a></td>
      <td align="right">6182</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">65</td>
      <td align="left"><a href="https://cran.r-project.org/package=rmetasim">rmetasim</a></td>
      <td align="right">6064</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">66</td>
      <td align="left"><a href="https://cran.r-project.org/package=rphast">rphast</a></td>
      <td align="right">5890</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">67</td>
      <td align="left"><a href="https://cran.r-project.org/package=corHMM">corHMM</a></td>
      <td align="right">5836</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">68</td>
      <td align="left"><a href="https://cran.r-project.org/package=apex">apex</a></td>
      <td align="right">5750</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">69</td>
      <td align="left"><a href="https://cran.r-project.org/package=bayou">bayou</a></td>
      <td align="right">5686</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">70</td>
      <td align="left"><a href="https://cran.r-project.org/package=cati">cati</a></td>
      <td align="right">5572</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">71</td>
      <td align="left"><a href="https://cran.r-project.org/package=ouch">ouch</a></td>
      <td align="right">5520</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">72</td>
      <td align="left"><a href="https://cran.r-project.org/package=hisse">hisse</a></td>
      <td align="right">5267</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">73</td>
      <td align="left"><a href="https://cran.r-project.org/package=phyloclim">phyloclim</a></td>
      <td align="right">5223</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">74</td>
      <td align="left"><a href="https://cran.r-project.org/package=rdryad">rdryad</a></td>
      <td align="right">5188</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">75</td>
      <td align="left"><a href="https://cran.r-project.org/package=dartR">dartR</a></td>
      <td align="right">5169</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">76</td>
      <td align="left"><a href="https://cran.r-project.org/package=SYNCSA">SYNCSA</a></td>
      <td align="right">5102</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">77</td>
      <td align="left"><a href="https://cran.r-project.org/package=OutbreakTools">OutbreakTools</a></td>
      <td align="right">5057</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">78</td>
      <td align="left"><a href="https://cran.r-project.org/package=TreeSim">TreeSim</a></td>
      <td align="right">4988</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">79</td>
      <td align="left"><a href="https://cran.r-project.org/package=ALA4R">ALA4R</a></td>
      <td align="right">4943</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">80</td>
      <td align="left"><a href="https://cran.r-project.org/package=ips">ips</a></td>
      <td align="right">4873</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">81</td>
      <td align="left"><a href="https://cran.r-project.org/package=PCPS">PCPS</a></td>
      <td align="right">4858</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">82</td>
      <td align="left"><a href="https://cran.r-project.org/package=metacoder">metacoder</a></td>
      <td align="right">4796</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">83</td>
      <td align="left"><a href="https://cran.r-project.org/package=OUwie">OUwie</a></td>
      <td align="right">4792</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">84</td>
      <td align="left"><a href="https://cran.r-project.org/package=aphid">aphid</a></td>
      <td align="right">4791</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">85</td>
      <td align="left"><a href="https://cran.r-project.org/package=brranching">brranching</a></td>
      <td align="right">4670</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">86</td>
      <td align="left"><a href="https://cran.r-project.org/package=warbleR">warbleR</a></td>
      <td align="right">4645</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">87</td>
      <td align="left"><a href="https://cran.r-project.org/package=MPSEM">MPSEM</a></td>
      <td align="right">4639</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">88</td>
      <td align="left"><a href="https://cran.r-project.org/package=adhoc">adhoc</a></td>
      <td align="right">4599</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">89</td>
      <td align="left"><a href="https://cran.r-project.org/package=distory">distory</a></td>
      <td align="right">4578</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">90</td>
      <td align="left"><a href="https://cran.r-project.org/package=Momocs">Momocs</a></td>
      <td align="right">4443</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">91</td>
      <td align="left"><a href="https://cran.r-project.org/package=phyloTop">phyloTop</a></td>
      <td align="right">4426</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">92</td>
      <td align="left"><a href="https://cran.r-project.org/package=ggmuller">ggmuller</a></td>
      <td align="right">4403</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">93</td>
      <td align="left"><a href="https://cran.r-project.org/package=paleoTS">paleoTS</a></td>
      <td align="right">4392</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">94</td>
      <td align="left"><a href="https://cran.r-project.org/package=BIEN">BIEN</a></td>
      <td align="right">4380</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">95</td>
      <td align="left"><a href="https://cran.r-project.org/package=HTSSIP">HTSSIP</a></td>
      <td align="right">4198</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">96</td>
      <td align="left"><a href="https://cran.r-project.org/package=strap">strap</a></td>
      <td align="right">4153</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">97</td>
      <td align="left"><a href="https://cran.r-project.org/package=nodiv">nodiv</a></td>
      <td align="right">4103</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">98</td>
      <td align="left"><a href="https://cran.r-project.org/package=BPEC">BPEC</a></td>
      <td align="right">4095</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">99</td>
      <td align="left"><a href="https://cran.r-project.org/package=scrm">scrm</a></td>
      <td align="right">4084</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">100</td>
      <td align="left"><a href="https://cran.r-project.org/package=FinePop">FinePop</a></td>
      <td align="right">4031</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">101</td>
      <td align="left"><a href="https://cran.r-project.org/package=idendr0">idendr0</a></td>
      <td align="right">4020</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">102</td>
      <td align="left"><a href="https://cran.r-project.org/package=HMPTrees">HMPTrees</a></td>
      <td align="right">4015</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">103</td>
      <td align="left"><a href="https://cran.r-project.org/package=PHYLOGR">PHYLOGR</a></td>
      <td align="right">4011</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">104</td>
      <td align="left"><a href="https://cran.r-project.org/package=evobiR">evobiR</a></td>
      <td align="right">4000</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">105</td>
      <td align="left"><a href="https://cran.r-project.org/package=outbreaker">outbreaker</a></td>
      <td align="right">3931</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">106</td>
      <td align="left"><a href="https://cran.r-project.org/package=nLTT">nLTT</a></td>
      <td align="right">3925</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">107</td>
      <td align="left"><a href="https://cran.r-project.org/package=kmer">kmer</a></td>
      <td align="right">3891</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">108</td>
      <td align="left"><a href="https://cran.r-project.org/package=markophylo">markophylo</a></td>
      <td align="right">3885</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">109</td>
      <td align="left"><a href="https://cran.r-project.org/package=DAMOCLES">DAMOCLES</a></td>
      <td align="right">3879</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">110</td>
      <td align="left"><a href="https://cran.r-project.org/package=jaatha">jaatha</a></td>
      <td align="right">3861</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">111</td>
      <td align="left"><a href="https://cran.r-project.org/package=TESS">TESS</a></td>
      <td align="right">3860</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">112</td>
      <td align="left"><a href="https://cran.r-project.org/package=SigTree">SigTree</a></td>
      <td align="right">3823</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">113</td>
      <td align="left"><a href="https://cran.r-project.org/package=strataG">strataG</a></td>
      <td align="right">3786</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">114</td>
      <td align="left"><a href="https://cran.r-project.org/package=treeplyr">treeplyr</a></td>
      <td align="right">3732</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">115</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylogram">phylogram</a></td>
      <td align="right">3726</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">116</td>
      <td align="left"><a href="https://cran.r-project.org/package=treebase">treebase</a></td>
      <td align="right">3724</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">117</td>
      <td align="left"><a href="https://cran.r-project.org/package=pmc">pmc</a></td>
      <td align="right">3717</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">118</td>
      <td align="left"><a href="https://cran.r-project.org/package=surface">surface</a></td>
      <td align="right">3701</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">119</td>
      <td align="left"><a href="https://cran.r-project.org/package=gamclass">gamclass</a></td>
      <td align="right">3648</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">120</td>
      <td align="left"><a href="https://cran.r-project.org/package=TreePar">TreePar</a></td>
      <td align="right">3647</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">121</td>
      <td align="left"><a href="https://cran.r-project.org/package=PBD">PBD</a></td>
      <td align="right">3591</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">122</td>
      <td align="left"><a href="https://cran.r-project.org/package=RAM">RAM</a></td>
      <td align="right">3546</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">123</td>
      <td align="left"><a href="https://cran.r-project.org/package=Rphylip">Rphylip</a></td>
      <td align="right">3506</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">124</td>
      <td align="left"><a href="https://cran.r-project.org/package=expoTree">expoTree</a></td>
      <td align="right">3447</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">125</td>
      <td align="left"><a href="https://cran.r-project.org/package=HyPhy">HyPhy</a></td>
      <td align="right">3419</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">126</td>
      <td align="left"><a href="https://cran.r-project.org/package=adiv">adiv</a></td>
      <td align="right">3393</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">127</td>
      <td align="left"><a href="https://cran.r-project.org/package=coalescentMCMC">coalescentMCMC</a></td>
      <td align="right">3384</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">128</td>
      <td align="left"><a href="https://cran.r-project.org/package=kdetrees">kdetrees</a></td>
      <td align="right">3330</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">129</td>
      <td align="left"><a href="https://cran.r-project.org/package=adaptiveGPCA">adaptiveGPCA</a></td>
      <td align="right">3325</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">130</td>
      <td align="left"><a href="https://cran.r-project.org/package=MAGNAMWAR">MAGNAMWAR</a></td>
      <td align="right">3324</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">131</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylocanvas">phylocanvas</a></td>
      <td align="right">3302</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">132</td>
      <td align="left"><a href="https://cran.r-project.org/package=iteRates">iteRates</a></td>
      <td align="right">3301</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">133</td>
      <td align="left"><a href="https://cran.r-project.org/package=BBMV">BBMV</a></td>
      <td align="right">3295</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">134</td>
      <td align="left"><a href="https://cran.r-project.org/package=CommEcol">CommEcol</a></td>
      <td align="right">3246</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">135</td>
      <td align="left"><a href="https://cran.r-project.org/package=netdiffuseR">netdiffuseR</a></td>
      <td align="right">3191</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">136</td>
      <td align="left"><a href="https://cran.r-project.org/package=pastis">pastis</a></td>
      <td align="right">3155</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">137</td>
      <td align="left"><a href="https://cran.r-project.org/package=AnnotationBustR">AnnotationBustR</a></td>
      <td align="right">3148</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">138</td>
      <td align="left"><a href="https://cran.r-project.org/package=phyloland">phyloland</a></td>
      <td align="right">3129</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">139</td>
      <td align="left"><a href="https://cran.r-project.org/package=phyext2">phyext2</a></td>
      <td align="right">3119</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">140</td>
      <td align="left"><a href="https://cran.r-project.org/package=Canopy">Canopy</a></td>
      <td align="right">2992</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">141</td>
      <td align="left"><a href="https://cran.r-project.org/package=RPANDA">RPANDA</a></td>
      <td align="right">2952</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">142</td>
      <td align="left"><a href="https://cran.r-project.org/package=BMhyb">BMhyb</a></td>
      <td align="right">2935</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">143</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylopath">phylopath</a></td>
      <td align="right">2912</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">144</td>
      <td align="left"><a href="https://cran.r-project.org/package=GLSME">GLSME</a></td>
      <td align="right">2846</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">145</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylotate">phylotate</a></td>
      <td align="right">2846</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">146</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylosignal">phylosignal</a></td>
      <td align="right">2820</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">147</td>
      <td align="left"><a href="https://cran.r-project.org/package=shazam">shazam</a></td>
      <td align="right">2754</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">148</td>
      <td align="left"><a href="https://cran.r-project.org/package=harrietr">harrietr</a></td>
      <td align="right">2707</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">149</td>
      <td align="left"><a href="https://cran.r-project.org/package=prioritizr">prioritizr</a></td>
      <td align="right">2705</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">150</td>
      <td align="left"><a href="https://cran.r-project.org/package=BarcodingR">BarcodingR</a></td>
      <td align="right">2657</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">151</td>
      <td align="left"><a href="https://cran.r-project.org/package=msaR">msaR</a></td>
      <td align="right">2552</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">152</td>
      <td align="left"><a href="https://cran.r-project.org/package=mvSLOUCH">mvSLOUCH</a></td>
      <td align="right">2522</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">153</td>
      <td align="left"><a href="https://cran.r-project.org/package=bcRep">bcRep</a></td>
      <td align="right">2472</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">154</td>
      <td align="left"><a href="https://cran.r-project.org/package=colordistance">colordistance</a></td>
      <td align="right">2448</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">155</td>
      <td align="left"><a href="https://cran.r-project.org/package=treeman">treeman</a></td>
      <td align="right">2417</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">156</td>
      <td align="left"><a href="https://cran.r-project.org/package=sharpshootR">sharpshootR</a></td>
      <td align="right">2409</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">157</td>
      <td align="left"><a href="https://cran.r-project.org/package=BMhyd">BMhyd</a></td>
      <td align="right">2408</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">158</td>
      <td align="left"><a href="https://cran.r-project.org/package=aptg">aptg</a></td>
      <td align="right">2385</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">159</td>
      <td align="left"><a href="https://cran.r-project.org/package=qlcData">qlcData</a></td>
      <td align="right">2383</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">160</td>
      <td align="left"><a href="https://cran.r-project.org/package=sensiPhy">sensiPhy</a></td>
      <td align="right">2344</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">161</td>
      <td align="left"><a href="https://cran.r-project.org/package=GrammR">GrammR</a></td>
      <td align="right">2229</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">162</td>
      <td align="left"><a href="https://cran.r-project.org/package=treespace">treespace</a></td>
      <td align="right">2219</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">163</td>
      <td align="left"><a href="https://cran.r-project.org/package=dispRity">dispRity</a></td>
      <td align="right">2217</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">164</td>
      <td align="left"><a href="https://cran.r-project.org/package=metricTester">metricTester</a></td>
      <td align="right">2205</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">165</td>
      <td align="left"><a href="https://cran.r-project.org/package=evolqg">evolqg</a></td>
      <td align="right">2181</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">166</td>
      <td align="left"><a href="https://cran.r-project.org/package=geomedb">geomedb</a></td>
      <td align="right">2160</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">167</td>
      <td align="left"><a href="https://cran.r-project.org/package=PhyloMeasures">PhyloMeasures</a></td>
      <td align="right">2139</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">168</td>
      <td align="left"><a href="https://cran.r-project.org/package=CNull">CNull</a></td>
      <td align="right">2107</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">169</td>
      <td align="left"><a href="https://cran.r-project.org/package=taxlist">taxlist</a></td>
      <td align="right">2098</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">170</td>
      <td align="left"><a href="https://cran.r-project.org/package=pez">pez</a></td>
      <td align="right">2079</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">171</td>
      <td align="left"><a href="https://cran.r-project.org/package=phyreg">phyreg</a></td>
      <td align="right">2061</td>
      <td align="left">✅</td>
      <td align="left">🚫</td>
    </tr>
    <tr>
      <td align="right">172</td>
      <td align="left"><a href="https://cran.r-project.org/package=structSSI">structSSI</a></td>
      <td align="right">2047</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">173</td>
      <td align="left"><a href="https://cran.r-project.org/package=MiSPU">MiSPU</a></td>
      <td align="right">2039</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">174</td>
      <td align="left"><a href="https://cran.r-project.org/package=dcGOR">dcGOR</a></td>
      <td align="right">2031</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">175</td>
      <td align="left"><a href="https://cran.r-project.org/package=lefse">lefse</a></td>
      <td align="right">2020</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">176</td>
      <td align="left"><a href="https://cran.r-project.org/package=SeqFeatR">SeqFeatR</a></td>
      <td align="right">2006</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">177</td>
      <td align="left"><a href="https://cran.r-project.org/package=HAP.ROR">HAP.ROR</a></td>
      <td align="right">1970</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">178</td>
      <td align="left"><a href="https://cran.r-project.org/package=symmoments">symmoments</a></td>
      <td align="right">1932</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">179</td>
      <td align="left"><a href="https://cran.r-project.org/package=genBaRcode">genBaRcode</a></td>
      <td align="right">1923</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">180</td>
      <td align="left"><a href="https://cran.r-project.org/package=PhylogeneticEM">PhylogeneticEM</a></td>
      <td align="right">1912</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">181</td>
      <td align="left"><a href="https://cran.r-project.org/package=windex">windex</a></td>
      <td align="right">1900</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">182</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylocurve">phylocurve</a></td>
      <td align="right">1875</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">183</td>
      <td align="left"><a href="https://cran.r-project.org/package=MonoPhy">MonoPhy</a></td>
      <td align="right">1865</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">184</td>
      <td align="left"><a href="https://cran.r-project.org/package=TreeSimGM">TreeSimGM</a></td>
      <td align="right">1844</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">185</td>
      <td align="left"><a href="https://cran.r-project.org/package=spider">spider</a></td>
      <td align="right">1840</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">186</td>
      <td align="left"><a href="https://cran.r-project.org/package=Rsampletrees">Rsampletrees</a></td>
      <td align="right">1837</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">187</td>
      <td align="left"><a href="https://cran.r-project.org/package=Rphylopars">Rphylopars</a></td>
      <td align="right">1830</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">188</td>
      <td align="left"><a href="https://cran.r-project.org/package=graphscan">graphscan</a></td>
      <td align="right">1829</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">189</td>
      <td align="left"><a href="https://cran.r-project.org/package=recluster">recluster</a></td>
      <td align="right">1811</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">190</td>
      <td align="left"><a href="https://cran.r-project.org/package=paco">paco</a></td>
      <td align="right">1804</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">191</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylosim">phylosim</a></td>
      <td align="right">1768</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">192</td>
      <td align="left"><a href="https://cran.r-project.org/package=ecolottery">ecolottery</a></td>
      <td align="right">1723</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">193</td>
      <td align="left"><a href="https://cran.r-project.org/package=outbreaker2">outbreaker2</a></td>
      <td align="right">1708</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">194</td>
      <td align="left"><a href="https://cran.r-project.org/package=STEPCAM">STEPCAM</a></td>
      <td align="right">1697</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">195</td>
      <td align="left"><a href="https://cran.r-project.org/package=primerTree">primerTree</a></td>
      <td align="right">1691</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">196</td>
      <td align="left"><a href="https://cran.r-project.org/package=PhySortR">PhySortR</a></td>
      <td align="right">1675</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">197</td>
      <td align="left"><a href="https://cran.r-project.org/package=gquad">gquad</a></td>
      <td align="right">1673</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">198</td>
      <td align="left"><a href="https://cran.r-project.org/package=gromovlab">gromovlab</a></td>
      <td align="right">1669</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">199</td>
      <td align="left"><a href="https://cran.r-project.org/package=indelmiss">indelmiss</a></td>
      <td align="right">1666</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">200</td>
      <td align="left"><a href="https://cran.r-project.org/package=phybreak">phybreak</a></td>
      <td align="right">1662</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">201</td>
      <td align="left"><a href="https://cran.r-project.org/package=msap">msap</a></td>
      <td align="right">1659</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">202</td>
      <td align="left"><a href="https://cran.r-project.org/package=rase">rase</a></td>
      <td align="right">1629</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">203</td>
      <td align="left"><a href="https://cran.r-project.org/package=rdiversity">rdiversity</a></td>
      <td align="right">1629</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">204</td>
      <td align="left"><a href="https://cran.r-project.org/package=perspectev">perspectev</a></td>
      <td align="right">1617</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">205</td>
      <td align="left"><a href="https://cran.r-project.org/package=ML.MSBD">ML.MSBD</a></td>
      <td align="right">1614</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">206</td>
      <td align="left"><a href="https://cran.r-project.org/package=sidier">sidier</a></td>
      <td align="right">1611</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">207</td>
      <td align="left"><a href="https://cran.r-project.org/package=pcrcoal">pcrcoal</a></td>
      <td align="right">1588</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">208</td>
      <td align="left"><a href="https://cran.r-project.org/package=StructFDR">StructFDR</a></td>
      <td align="right">1586</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">209</td>
      <td align="left"><a href="https://cran.r-project.org/package=idar">idar</a></td>
      <td align="right">1579</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">210</td>
      <td align="left"><a href="https://cran.r-project.org/package=PIGShift">PIGShift</a></td>
      <td align="right">1534</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">211</td>
      <td align="left"><a href="https://cran.r-project.org/package=jrich">jrich</a></td>
      <td align="right">1511</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">212</td>
      <td align="left"><a href="https://cran.r-project.org/package=TotalCopheneticIndex">TotalCopheneticIndex</a></td>
      <td align="right">1509</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">213</td>
      <td align="left"><a href="https://cran.r-project.org/package=subniche">subniche</a></td>
      <td align="right">1508</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">214</td>
      <td align="left"><a href="https://cran.r-project.org/package=Plasmidprofiler">Plasmidprofiler</a></td>
      <td align="right">1495</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">215</td>
      <td align="left"><a href="https://cran.r-project.org/package=TKF">TKF</a></td>
      <td align="right">1487</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">216</td>
      <td align="left"><a href="https://cran.r-project.org/package=rwty">rwty</a></td>
      <td align="right">1485</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">217</td>
      <td align="left"><a href="https://cran.r-project.org/package=TreeSearch">TreeSearch</a></td>
      <td align="right">1474</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">218</td>
      <td align="left"><a href="https://cran.r-project.org/package=PhyInformR">PhyInformR</a></td>
      <td align="right">1402</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">219</td>
      <td align="left"><a href="https://cran.r-project.org/package=skeleSim">skeleSim</a></td>
      <td align="right">1400</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">220</td>
      <td align="left"><a href="https://cran.r-project.org/package=insect">insect</a></td>
      <td align="right">1383</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">221</td>
      <td align="left"><a href="https://cran.r-project.org/package=treeDA">treeDA</a></td>
      <td align="right">1344</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">222</td>
      <td align="left"><a href="https://cran.r-project.org/package=multilaterals">multilaterals</a></td>
      <td align="right">1337</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">223</td>
      <td align="left"><a href="https://cran.r-project.org/package=CollessLike">CollessLike</a></td>
      <td align="right">1304</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">224</td>
      <td align="left"><a href="https://cran.r-project.org/package=vhica">vhica</a></td>
      <td align="right">1299</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">225</td>
      <td align="left"><a href="https://cran.r-project.org/package=motmot.2.0">motmot.2.0</a></td>
      <td align="right">1115</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">226</td>
      <td align="left"><a href="https://cran.r-project.org/package=treedater">treedater</a></td>
      <td align="right">926</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">227</td>
      <td align="left"><a href="https://cran.r-project.org/package=ratematrix">ratematrix</a></td>
      <td align="right">813</td>
      <td align="left">✅</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">228</td>
      <td align="left"><a href="https://cran.r-project.org/package=PVR">PVR</a></td>
      <td align="right">772</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">229</td>
      <td align="left"><a href="https://cran.r-project.org/package=P2C2M">P2C2M</a></td>
      <td align="right">749</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">230</td>
      <td align="left"><a href="https://cran.r-project.org/package=metaboGSE">metaboGSE</a></td>
      <td align="right">747</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">231</td>
      <td align="left"><a href="https://cran.r-project.org/package=RRphylo">RRphylo</a></td>
      <td align="right">692</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">232</td>
      <td align="left"><a href="https://cran.r-project.org/package=ggrasp">ggrasp</a></td>
      <td align="right">677</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">233</td>
      <td align="left"><a href="https://cran.r-project.org/package=CommT">CommT</a></td>
      <td align="right">634</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">234</td>
      <td align="left"><a href="https://cran.r-project.org/package=FossilSim">FossilSim</a></td>
      <td align="right">529</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">235</td>
      <td align="left"><a href="https://cran.r-project.org/package=POUMM">POUMM</a></td>
      <td align="right">513</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">236</td>
      <td align="left"><a href="https://cran.r-project.org/package=rhierbaps">rhierbaps</a></td>
      <td align="right">391</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">237</td>
      <td align="left"><a href="https://cran.r-project.org/package=RPS">RPS</a></td>
      <td align="right">390</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">238</td>
      <td align="left"><a href="https://cran.r-project.org/package=balance">balance</a></td>
      <td align="right">0</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">239</td>
      <td align="left"><a href="https://cran.r-project.org/package=hillR">hillR</a></td>
      <td align="right">0</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">240</td>
      <td align="left"><a href="https://cran.r-project.org/package=kmeRs">kmeRs</a></td>
      <td align="right">0</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">241</td>
      <td align="left"><a href="https://cran.r-project.org/package=phylocomr">phylocomr</a></td>
      <td align="right">0</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">242</td>
      <td align="left"><a href="https://cran.r-project.org/package=rr2">rr2</a></td>
      <td align="right">0</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
    <tr>
      <td align="right">243</td>
      <td align="left"><a href="https://cran.r-project.org/package=slouch">slouch</a></td>
      <td align="right">0</td>
      <td align="left">🚫</td>
      <td align="left">✅</td>
    </tr>
  </tbody>
</table>
<section class="footnotes">
  <ol>
    <li id="fn1">
      <p>Note that we can’t use the typical filtering mechanism using the single square bracket <code>[</code> because of the way lists work. In particular, there’s no good destructuring syntax for lists-of-lists as there is for simple vectors. See <code>?Extract</code> for more details. <a href="#fnref1" class="footnote-backref">↩</a></p>
    </li>
    <li id="fn2">
      <p>The builtin package <code>tools</code> also has it, but it only returns packages that you have currently installed. <a href="#fnref2" class="footnote-backref">↩</a></p>
    </li>
  </ol>
</section>
]]></content>
</entry>
<entry>
  <title><![CDATA[Animating and labeling figures with ImageMagick]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/animating-and-labeling-figures-with-imagemagick/"/>
  <id>https://jonathanchang.org/blog/animating-and-labeling-figures-with-imagemagick</id>
  <published>2018-08-07T18:52:00+00:00</published>
  <updated>2018-08-07T18:52:00+00:00</updated>
  <content type="html"><![CDATA[
     <p><a href="https://www.imagemagick.org">ImageMagick</a> is an incredible command-line tool that lets you edit and convert images of all sorts. Suppose you wanted to <a href="https://fishtreeoflife.org/rabosky-et-al-2018-update/">compare some figures that you’ve generated</a>, and label the figures with their respective filenames so that you know which is which. Here’s a quick worked example in R and Terminal that gets you going.</p>
  <p>First, let’s generate some figures to compare. The <a href="https://www.tidyverse.org/articles/2018/07/ggplot2-3-0-0/">latest version of ggplot2 (3.0)</a> recently added the <a href="https://cran.r-project.org/web/packages/viridis/vignettes/intro-to-viridis.html">new viridis color palettes</a> as an option, and by default, ordered factors now use the viridis palettes when assigned to the color or fill aesthetic. Let’s plot the default <code>diamonds</code> dataset to compare the two:</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">ggplot2</span><span class="p">)</span><span class="w">
</span><span class="c1"># Uses viridis by default</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">carat</span><span class="p">,</span><span class="w"> </span><span class="n">price</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">clarity</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_point</span><span class="p">()</span><span class="w">
</span><span class="n">ggsave</span><span class="p">(</span><span class="s2">"diamonds_ordered.png"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Unorder the factor to use the unordered palette</span><span class="w">
</span><span class="n">ggplot</span><span class="p">(</span><span class="n">diamonds</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">carat</span><span class="p">,</span><span class="w"> </span><span class="n">price</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">clarity</span><span class="p">,</span><span class="w"> </span><span class="n">ordered</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)))</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">geom_point</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">labs</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"clarity"</span><span class="p">)</span><span class="w">
</span><span class="n">ggsave</span><span class="p">(</span><span class="s2">"diamonds_unordered.png"</span><span class="p">)</span><span class="w">
</span></code></pre>
    </div>
  </div>
<p>Next, let’s fire up ImageMagick to convert these figures to an animated GIF. Sadly we can’t use this in papers yet, but it can easily be blogged or tweeted about. Run <code>brew install imagemagick</code> if you don’t have it already, then:</p>
<div class="language-bash highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="bash">convert <span class="nt">-resize</span> 33% <span class="nt">-gravity</span> north <span class="nt">-undercolor</span> <span class="s1">'#ffffff80'</span> <span class="nt">-pointsize</span> 16 <span class="nt">-annotate</span> 0 <span class="s2">"%f"</span> diamonds_<span class="k">*</span>.png <span class="nt">-set</span> delay 100 <span class="nt">-loop</span> 0 diamonds_flicker.gif
</code></pre>
  </div>
</div>
<p>This does a few things:</p>
<ul>
  <li>resizes the image to 1/3 the original size</li>
  <li>adds the filename to the top of the image, with a slightly transparent white background</li>
  <li>sets a delay of 100 ticks between frames (one second, since ImageMagick counts delay in hundredths of a second by default)</li>
</ul>
<p>Here’s the end result:</p>
<p><img src="/uploads/2018/diamonds_flicker.gif" alt="A comparison of the two figure drawing methods" srcset="/uploads/2018/diamonds_flicker.gif 2x"></p>
<p>There’s a lot of cool stuff that ImageMagick can do (basically every kind of image manipulation imaginable), so <a href="https://www.imagemagick.org/Usage/">check out the manual</a> and start hacking!</p>
]]></content>
</entry>
<entry>
  <title><![CDATA[Removing PAUP's expiration date and version check]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/removing-paup-s-expiration-date-and-version-check/"/>
  <id>https://jonathanchang.org/blog/removing-paup-s-expiration-date-and-version-check</id>
  <published>2018-06-05T07:18:00+00:00</published>
  <updated>2018-06-05T07:18:00+00:00</updated>
  <content type="html"><![CDATA[
     <h2>Bottom line</h2>
  <p>PAUP* phones home for a version check every time it is started. This function is undocumented and the software itself does not alert the user that this occurs.</p>
  <p>PAUP* also has an expiration date, set to July 2018 at the time of this writing, after which the program cannot be used.</p>
  <p><em>Note: PAUP (as of version 4a166) now no longer expires. The version check still exists, though.</em></p>
  <p>I would prefer to remove these features.</p>
  <p>Copy and paste this into Terminal:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">curl <span class="nt">-L</span> http://phylosolutions.com/paup-test/paup4a161_osx.gz | gzcat | perl <span class="nt">-pe</span> <span class="s1">'s/p(hylosolutions.com)/_$1/g; s/\x{81}(\x{fa}\x{e2}\x{07})/\x{c3}$1/'</span> <span class="o">&gt;</span> paup4a161_fixed
<span class="nb">chmod </span>a+x paup4a161_fixed
./paup4a161_fixed
</code></pre>
    </div>
  </div>
  <p>This downloads PAUP for macOS and gently applies the Stick of Correction.</p>
  <p>This won’t necessarily work on Linux so you’ll have to follow the detailed steps below to apply the specific correction. If you figure it out for Linux please tweet or email me and I’ll update this article.</p>
  <h2>Introduction</h2>
  <p><a href="http://paup.phylosolutions.com/">PAUP*</a> is a great piece of software that has some misfeatures that I’d like to correct. Its silent version checks and the <a href="https://en.wikipedia.org/wiki/Time_bomb_(software)">time-bomb</a> function that makes the software stop working after a certain date are pretty user-hostile. The closed-source nature of PAUP also means that we can’t just go into the source and delete the offending code.</p>
  <p>But given enough know-how we can still modify this closed-source binary executable. The rest of this blog post will detail the general way I approached the problem and how I patched PAUP to get around these mandatory date and version checks.</p>
  <h2>PAUP phones home</h2>
  <p><img src="/uploads/2018/paup_snitch.png" alt="Little Snitch alerts us to PAUP’s version check" /></p>
  <p>I recently bought and installed <a href="https://www.obdev.at/products/littlesnitch/index.html">Little Snitch</a>, which alerts you when software is connecting to the Internet, and lets you allow or deny the connection attempt. I started up PAUP* and it surprisingly asked to connect to the Internet. I forbade the connection and started to dig deeper to find out what was going on.</p>
  <blockquote>
    <p>Note: Attempt to get current version info from server was not successful (no network connection available, or connection failed)</p>
  </blockquote>
  <p>The Little Snitch dialog suggested a connection attempt to phylosolutions.com, which is where the download for PAUP is hosted. Let’s see if we can find that website encoded in the binary using <code>strings</code>, which takes a binary (or any other file) and tries to extract human-readable text out of it.</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">strings paup4a161_osx | <span class="nb">grep </span>phylosolutions
</code></pre>
    </div>
  </div>
  <p>The output of which is:</p>
  <div class="highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>http://phylosolutions.com/paup-distribution/current_dev_version
## (snipped)
</code></pre>
    </div>
  </div>
  <p>That first output line is interesting, so let’s visit that site with <code>curl</code>:</p>
  <div class="highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>$ curl http://phylosolutions.com/paup-distribution/current_dev_version
160	4.0a (build 160)
</code></pre>
    </div>
  </div>
<p>Looks like it’s returning the latest build number and a human readable version string, delimited by tabs.</p>
<p>This suggests that we can simply change the requested URL to an invalid one, so that this network connection is never made.</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>perl -pi -e 's/p(hylosolutions.com)/_$1/g' paup4a161_osx
</code></pre>
  </div>
</div>
<p>This tells Perl to <code>-e</code>xecute a line of code, <code>-p</code>rinting the result of editing the file <code>paup4a161_osx</code> <code>-i</code>n place, using a regular expression to change the first character of <code>phylosolutions.com</code> to an underscore.</p>
<p>Note that the replacement string must have the same number of characters as the input string since we’re patching an executable file. (Try excluding the underscore to see what happens).</p>
<p>We can confirm this worked since running the updated executable results in a new error message and no connection attempt:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>./paup4a161_osx
</code></pre>
  </div>
</div>
<blockquote>
  <p>Note: Attempt to get current version info from server was not successful (network connection failed with error code = 6)</p>
</blockquote>
<h2>PAUP expires</h2>
<p>I was intrigued to see that PAUP announces an expiration date when you start the program:</p>
<blockquote>
  <p>This is an alpha-test version that is still changing rapidly.</p>
  <p>It will expire on 1 Jul 2018.</p>
</blockquote>
<p>When we ran <code>strings</code> above we also saw an expiration message:</p>
<blockquote>
  <p>This version of PAUP has expired.  Visit <a href="http://paup.phylosolutions.com">http://paup.phylosolutions.com</a> to obtain a newer version.</p>
</blockquote>
<p>How does software “expire”? Let’s test this out by artificially setting our system time to August 1, 2018:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>sudo date 0801000018
</code></pre>
  </div>
</div>
<p>PAUP* now announces that it is expired and unceremoniously exits! We can’t run PAUP in August, but this can be fixed. First we should restore our original system time:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>sudo ntpdate -u time.apple.com
</code></pre>
  </div>
</div>
<p>We could take the same approach as before, looking for strings that correspond to dates. But it seems unlikely that the date would be coded as a string, and I’m not sure what the exact date would be (July 1? June 30? etc.) so I’m going to take a different approach and look for debugging symbols that might correspond to what we’re looking for. Luckily the PAUP binary hasn’t had these stripped out:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>nm paup4a161_osx | grep -i expir
</code></pre>
  </div>
</div>
<p>Output:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>0000000100027cb6 T _checkIfExpired
00000001003ce998 b _gVersionHasExpired
0000000100027c44 T _isExpiredOrUnlicensed
0000000100027c3c T _isVersionExpired
0000000100027c92 T _showExpiredMessage
0000000100027c4c t _testExpiration
</code></pre>
  </div>
</div>
<p>All very promising. The last function, <code>testExpiration</code>, seems particularly meaty. I fire up LLDB, a debugger, and ask it to disassemble that function for me:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code> $ lldb paup4a161_osx 
(lldb) target create "paup4a161_osx"
Current executable set to 'paup4a161_osx' (x86_64).
(lldb) di -n testExpiration
paup4a161_osx`testExpiration:
paup4a161_osx[0x100027c4c] &lt;+0&gt;:  xorl   %eax, %eax
paup4a161_osx[0x100027c4e] &lt;+2&gt;:  cmpl   $0x7e2, %edx              ; imm = 0x7E2 
paup4a161_osx[0x100027c54] &lt;+8&gt;:  jl     0x100027c86               ; &lt;+58&gt;
paup4a161_osx[0x100027c56] &lt;+10&gt;: jg     0x100027c86               ; &lt;+58&gt;
paup4a161_osx[0x100027c58] &lt;+12&gt;: je     0x100027c62               ; &lt;+22&gt;
paup4a161_osx[0x100027c5a] &lt;+14&gt;: movb   $0x0, 0x3a6d37(%rip)      ; gExportMissingSymbol + 7
paup4a161_osx[0x100027c61] &lt;+21&gt;: retq   
# (rest of the disassembly snipped)
</code></pre>
  </div>
</div>
<p>This is the translated machine code that our computer is actually running when PAUP calls this function. The important line is <code>cmpl $0x7e2, %edx</code>, followed by the <code>jl</code>, <code>jg</code>, and <code>je</code> instructions. This basically says “compare what’s in edx to the constant value 0x7e2, then jump to one memory location or another depending on whether they’re equal (<code>je</code>) or not (<code>jl</code> and <code>jg</code>).”</p>
<p>I know I’m on the right track now since hexadecimal 0x7e2 is equal to decimal 2018 — the current year.</p>
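<p>As a quick sanity check on that conversion, here is a minimal sketch using the shell’s own <code>printf</code>. The helper name <code>hex_to_dec</code> is made up for this example.</p>

```shell
# Convert a hexadecimal constant (as seen in the disassembly) to decimal.
# hex_to_dec is a hypothetical helper, not part of any tool used above.
hex_to_dec() {
    printf '%d\n' "$1"
}

hex_to_dec 0x7e2   # prints 2018
```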
<p>I’m still a novice at assembly but my interpretation is that PAUP only runs if the current date is between March 1, 2018 and July 1, 2018.</p>
<div class="language-nasm highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="nasm"><span class="nf">xorl</span>   <span class="o">%</span><span class="nb">eax</span><span class="p">,</span> <span class="o">%</span><span class="nb">eax</span>                <span class="c1">;</span>
<span class="nf">cmpl</span>   <span class="kc">$</span><span class="mh">0x7e2</span><span class="p">,</span> <span class="o">%</span><span class="nb">edx</span>              <span class="c1">; 0x7e2 = 2018</span>
<span class="nf">jl</span>     <span class="mh">0x100027c86</span>               <span class="c1">; if edx != 2018: return true</span>
<span class="nf">jg</span>     <span class="mh">0x100027c86</span>               <span class="c1">;</span>
<span class="nf">je</span>     <span class="mh">0x100027c62</span>               <span class="c1">; edx == 2018</span>
<span class="nf">movb</span>   <span class="kc">$</span><span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x3a6d37</span><span class="p">(</span><span class="o">%</span><span class="nv">rip</span><span class="p">)</span>      <span class="c1">;</span>
<span class="nf">retq</span>                             <span class="c1">;</span>
<span class="nf">cmpl</span>   <span class="kc">$</span><span class="mh">0x3</span><span class="p">,</span> <span class="o">%</span><span class="nb">esi</span>                <span class="c1">; 0x3 = 3</span>
<span class="nf">jl</span>     <span class="mh">0x100027c7b</span>               <span class="c1">; if esi &lt; 3: return true</span>
<span class="nf">jne</span>    <span class="mh">0x100027c6f</span>               <span class="c1">;</span>
<span class="nf">testl</span>  <span class="o">%</span><span class="nb">edi</span><span class="p">,</span> <span class="o">%</span><span class="nb">edi</span>                <span class="c1">;</span>
<span class="nf">jle</span>    <span class="mh">0x100027c7b</span>               <span class="c1">;</span>
<span class="nf">jmp</span>    <span class="mh">0x100027c5a</span>               <span class="c1">;</span>
<span class="nf">cmpl</span>   <span class="kc">$</span><span class="mh">0x7</span><span class="p">,</span> <span class="o">%</span><span class="nb">esi</span>                <span class="c1">; 0x7 = 7</span>
<span class="nf">jg</span>     <span class="mh">0x100027c7b</span>               <span class="c1">; if esi &gt; 7: return true</span>
<span class="nf">jne</span>    <span class="mh">0x100027c5a</span>               <span class="c1">;</span>
<span class="nf">cmpl</span>   <span class="kc">$</span><span class="mh">0x1</span><span class="p">,</span> <span class="o">%</span><span class="nb">edi</span>                <span class="c1">; 0x1 = 1</span>
<span class="nf">jle</span>    <span class="mh">0x100027c5a</span>               <span class="c1">; probably tests for day, hour, etc. in a loop</span>
<span class="nf">pushq</span>  <span class="kc">$</span><span class="mh">0x1</span>                      <span class="c1">;</span>
<span class="nf">popq</span>   <span class="o">%</span><span class="nb">rax</span>                      <span class="c1">;</span>
<span class="nf">movb</span>   <span class="kc">$</span><span class="mh">0x1</span><span class="p">,</span> <span class="mh">0x3a6d13</span><span class="p">(</span><span class="o">%</span><span class="nv">rip</span><span class="p">)</span>      <span class="c1">; return true?</span>
<span class="nf">retq</span>                             <span class="c1">;</span>
<span class="nf">pushq</span>  <span class="kc">$</span><span class="mh">0x1</span>                      <span class="c1">;</span>
<span class="nf">popq</span>   <span class="o">%</span><span class="nb">rax</span>                      <span class="c1">;</span>
<span class="nf">movb</span>   <span class="kc">$</span><span class="mh">0x1</span><span class="p">,</span> <span class="mh">0x3a6d08</span><span class="p">(</span><span class="o">%</span><span class="nv">rip</span><span class="p">)</span>      <span class="c1">; return true?</span>
<span class="nf">retq</span>
<span class="nf">nop</span>
</code></pre>
  </div>
</div>
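<p>To make my reading of the branches concrete, here is a rough sketch of the same logic as a shell function. It assumes that the three registers hold the year (<code>edx</code>), month (<code>esi</code>), and day (<code>edi</code>), which is my guess from the constants involved; the function name <code>within_window</code> is made up for this example.</p>

```shell
# Hypothetical restatement of the date check, following the annotated
# disassembly above. Arguments: year, month, day.
# Exit status 0 means "inside the window"; 1 means "treated as expired".
within_window() {
    year=$1
    month=$2
    day=$3
    [ "$year" -eq 2018 ] || return 1      # cmpl $0x7e2, %edx ; jl / jg
    [ "$month" -ge 3 ] || return 1        # cmpl $0x3, %esi ; jl
    [ "$month" -le 7 ] || return 1        # cmpl $0x7, %esi ; jg
    if [ "$month" -eq 3 ]; then
        [ "$day" -ge 1 ] || return 1      # testl %edi, %edi ; jle
    fi
    if [ "$month" -eq 7 ]; then
        [ "$day" -le 1 ] || return 1      # cmpl $0x1, %edi ; jle
    fi
    return 0
}

# The date we set earlier with "sudo date":
if within_window 2018 8 1; then echo "runs"; else echo "expired"; fi   # prints "expired"
```

<p>If this reading is right, July 1 is the last day that passes the check, which lines up with the expiration notice.</p>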
<p>Returning to <code>lldb</code> we ask for the exact disassembled bytes of this function:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>(lldb) dis -b -n testExpiration
paup4a161_osx`testExpiration:
paup4a161_osx[0x100027c4c] &lt;+0&gt;:  33 c0                 xorl   %eax, %eax
paup4a161_osx[0x100027c4e] &lt;+2&gt;:  81 fa e2 07 00 00     cmpl   $0x7e2, %edx
paup4a161_osx[0x100027c54] &lt;+8&gt;:  7c 30                 jl     0x100027c86
paup4a161_osx[0x100027c56] &lt;+10&gt;: 7f 2e                 jg     0x100027c86
paup4a161_osx[0x100027c58] &lt;+12&gt;: 74 08                 je     0x100027c62
paup4a161_osx[0x100027c61] &lt;+21&gt;: c3                    retq
# (snipped)
</code></pre>
  </div>
</div>
<p>We want to patch out the byte sequence <code>81 fa e2 07</code> since that corresponds to the first critical <code>cmpl</code> instruction. (N.B.: 0x7e2 is stored as <code>e2 07</code>, in <a href="https://en.wikipedia.org/wiki/Endianness">little-endian order</a>). Let’s use <code>xxd</code> to ensure that this sequence is unique:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>xxd -p paup4a161_osx | tr -d '\n' | grep -o '81fae207'
</code></pre>
  </div>
</div>
<p>This prints a single match, confirming that the hex sequence <code>81 fa e2 07</code> occurs only once in the file and is safe to patch out.</p>
<p>Normally <code>xxd</code> will split its output in a nice human-readable format (try running <code>xxd paup4a161_osx | less</code> to see), but we switch to a plain hex dump with <code>xxd -p</code> and strip its line breaks with <code>tr -d '\n'</code>, since the hex sequence we’re looking for might otherwise get split across multiple lines. We then tell <code>grep</code> to <code>-o</code>nly print out the matching sequence, so we’re not deluged with the entire binary file in hex! With that confirmed, a <code>perl</code> one-liner does the actual patching:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>perl -pi -e 's/\x{81}(\x{fa}\x{e2}\x{07})/\x{c3}$1/' paup4a161_osx
</code></pre>
  </div>
</div>
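<p>This replaces the first byte <code>0x81</code> with <code>0xc3</code>, which is the opcode for <code>ret</code>. Since the preceding <code>xorl %eax, %eax</code> has already zeroed the return register, the function now returns 0 as soon as it is called, before any of the date comparisons run.</p>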
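<p>If you’d rather try this out on something other than the real binary first, here is a small sketch that rehearses the same idea on a scratch file. It swaps in a different technique from the <code>perl</code> substitution: it overwrites a single byte at a known offset with <code>dd</code>. The helper names and the offset of 2 are specific to this toy file and are not taken from the PAUP binary.</p>

```shell
# Scratch file holding the first two instructions of testExpiration
# (bytes written with octal escapes so this works in any POSIX shell):
#   33 c0              xorl %eax, %eax
#   81 fa e2 07 00 00  cmpl $0x7e2, %edx
scratch=$(mktemp)
printf '\063\300\201\372\342\007\000\000' > "$scratch"

# Print a file as one unbroken lowercase hex string.
hex_dump() {
    od -An -v -tx1 "$1" | tr -d ' \t\n'
}

# Count how many times a hex pattern occurs in a file.
count_pattern() {
    hex_dump "$1" | grep -o "$2" | wc -l | tr -d ' '
}

# Overwrite one byte at a given offset; conv=notrunc keeps the rest intact.
patch_byte() {
    printf "$2" | dd of="$1" bs=1 seek="$3" conv=notrunc 2>/dev/null
}

count_pattern "$scratch" 81fae207    # prints 1
patch_byte "$scratch" '\303' 2       # octal 303 = 0xc3 (ret), over 0x81 at offset 2
hex_dump "$scratch"; echo            # prints 33c0c3fae2070000
count_pattern "$scratch" 81fae207    # prints 0
```

<p>One caveat with searching a flat hex string: a match could in principle start halfway through a byte, so a count above one deserves a closer look before deciding that the sequence really is repeated.</p>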
<p>We can verify that this worked by setting the date into the future again and running PAUP. Success!</p>
<h2>Conclusion</h2>
<p>I hope this was a useful tutorial on how to confidently navigate closed-source binaries and modify them to suit your own needs. I also hope that this encourages phylogenetics software developers to release their code as open-source, so that others can more easily change the software without resorting to these hacks.</p>
<p>I understand that the developers probably have good reasons to do version checks, though users should be informed and agree to such checks. I think there is less of an argument for including time-bombs that disable the software. Ultimately it’s on the users to use the software appropriately and attempting to restrict user freedoms in this way is both unnecessary and futile.</p>
<!---
96b7b9eea4d05d9aa3c361b3ec9dcd2c9a820874b574e17bb355f64ccfe2f08b  paup4a161_fixed
cf2fc5890a516c2539f864c1269de30211704b9ad491bed5672d199f72d79754  paup4a161_osx
6480de7b813aa5a7ea4258efcd5183ee005063797c8582322f3ea4dc9dbe45fa  paup4a161_osx.gz
--->
]]></content>
</entry>
<entry>
  <title><![CDATA[How to partially rasterize a figure plotted with R]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/how-to-partially-rasterize-a-figure-plotted-with-r/"/>
  <id>https://jonathanchang.org/blog/how-to-partially-rasterize-a-figure-plotted-with-r</id>
  <published>2018-05-25T17:25:00+00:00</published>
  <updated>2018-05-25T17:25:00+00:00</updated>
  <content type="html"><![CDATA[
     <style>img { background: #ccc }</style>
  <p>If you work with datasets that are big enough in R you will eventually encounter situations where your plots are so complex that they do things like crash Preview.app on macOS. For me this happens a lot when I generate huge scatterplots with very dense overplotting. These don’t add much information to the figure but nevertheless must be rendered by your PDF viewer, slowing it down and generally making a mess of things.</p>
  <p>I recently encountered a situation where a journal’s editing office couldn’t handle a particularly complex figure and requested that the figure be converted into a raster format. This is less than ideal compared to a vector format like PDF: you can’t do things like select text from a rasterized PNG and it’s generally just less usable. (<em><a href="http://guides.lib.umich.edu/c.php?g=282942&amp;p=1885352">More info on raster vs. vector images</a></em>). Would it be possible to convert the complex parts of the figure to a raster format while keeping everything else vectorized?</p>
  <p>The answer is yes! And it can all be done in R, with no fiddly conversions by hand and trying to place things precisely in Illustrator.</p>
  <p>Let’s use the built-in <code>mtcars</code> dataset as an example, and include some colors and a legend:</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">mpg</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">wt</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">mtcars</span><span class="o">$</span><span class="n">cyl</span><span class="p">),</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">19</span><span class="p">)</span><span class="w">
</span><span class="n">legend</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">34</span><span class="p">,</span><span class="w"> </span><span class="n">legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">levels</span><span class="p">(</span><span class="n">factor</span><span class="p">(</span><span class="n">mtcars</span><span class="o">$</span><span class="n">cyl</span><span class="p">)),</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">),</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">19</span><span class="p">)</span><span class="w">
</span></code></pre>
    </div>
  </div>
  <p><img src="/uploads/2018/mtcars.png" alt="" /></p>
  <p>Note that the legend overlaps the plot area. If you were to simply plot the entire thing as a PNG and then crop out the plot area, you’d either also have to rasterize the legend (and lose the ability to edit the text in Illustrator later) or manually erase the legend (let’s avoid doing things by hand).</p>
  <p>Let’s modify this code step by step. First set up our PDF device, with an output size of 7 by 7 inches.</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">pdf</span><span class="p">(</span><span class="s2">"mtcars.pdf"</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">7</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">7</span><span class="p">)</span><span class="w">
</span></code></pre>
    </div>
  </div>
  <p>Next set up the plot axes and legend. These are the same plot commands as before, but here <code>type = &quot;n&quot;</code> is specified, so that only the axes are set up, but no data are actually plotted.</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">plot</span><span class="p">(</span><span class="n">mpg</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">wt</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">mtcars</span><span class="o">$</span><span class="n">cyl</span><span class="p">),</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"n"</span><span class="p">)</span><span class="w">
</span><span class="n">legend</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">34</span><span class="p">,</span><span class="w"> </span><span class="n">legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">levels</span><span class="p">(</span><span class="n">factor</span><span class="p">(</span><span class="n">mtcars</span><span class="o">$</span><span class="n">cyl</span><span class="p">)),</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">),</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">19</span><span class="p">)</span><span class="w">
</span></code></pre>
    </div>
  </div>
  <p>Now we must figure out how big our plot area actually is. To do so, use the <code>par</code> function to extract the plot limits. This returns a 4-element vector, where the first two elements are the x-coordinates and the last two elements are the y-coordinates of the plot area.</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">coords</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">par</span><span class="p">(</span><span class="s2">"usr"</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1]  1.35656  5.58044  9.46000 34.84000</span><span class="w">
</span></code></pre>
    </div>
  </div>
  <p>However, these coordinates are in “user” space, meaning that they don’t correspond to the physical dimensions in the plot device. Use the <code>grconvertX</code> and <code>grconvertY</code> functions to convert from user space to plot device space, in inches:</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">gx</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">grconvertX</span><span class="p">(</span><span class="n">coords</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">],</span><span class="w"> </span><span class="s2">"user"</span><span class="p">,</span><span class="w"> </span><span class="s2">"inches"</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] 0.82 6.58</span><span class="w">

</span><span class="n">gy</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">grconvertY</span><span class="p">(</span><span class="n">coords</span><span class="p">[</span><span class="m">3</span><span class="o">:</span><span class="m">4</span><span class="p">],</span><span class="w"> </span><span class="s2">"user"</span><span class="p">,</span><span class="w"> </span><span class="s2">"inches"</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] 1.02 6.18</span><span class="w">

</span><span class="n">width</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">gx</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">gx</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] 5.76</span><span class="w">

</span><span class="n">height</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">gy</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">gy</span><span class="p">)</span><span class="w">
</span><span class="c1"># [1] 5.16</span><span class="w">
</span></code></pre>
    </div>
  </div>
<p>Now set up a raster device with the dimensions computed from the vector (PDF) device. Note that the PDF device is still active at this point.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">png</span><span class="p">(</span><span class="s2">"mtcars_panel.png"</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">width</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">height</span><span class="p">,</span><span class="w"> </span><span class="n">units</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"in"</span><span class="p">,</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">300</span><span class="p">,</span><span class="w"> </span><span class="n">bg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"transparent"</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Since the plot axes are handled in the vector device, it’s unnecessary to set those up. So avoid the high level <code>plot</code> commands and instead set up the plot areas from scratch. <code>plot.window</code> needs the x and y limits computed earlier, but by default R will expand the limits so that a data point right on the edge of the specified limits doesn’t get cut off.</p>
<p>Tell R to turn off this feature by setting <code>xaxs</code> and <code>yaxs</code> to <code>&quot;i&quot;</code>. Also turn off the plot margins with <code>par(mar = c(0,0,0,0))</code> since that will just be empty space. Margins are set through <code>par</code> rather than <code>plot.window</code>, so make that call before <code>plot.new</code>.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">par</span><span class="p">(</span><span class="n">mar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">))</span><span class="w">
</span><span class="n">plot.new</span><span class="p">()</span><span class="w">
</span><span class="n">plot.window</span><span class="p">(</span><span class="n">coords</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">],</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">3</span><span class="o">:</span><span class="m">4</span><span class="p">],</span><span class="w"> </span><span class="n">xaxs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"i"</span><span class="p">,</span><span class="w"> </span><span class="n">yaxs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"i"</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Finally, plot the data points as before and close the PNG device.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">points</span><span class="p">(</span><span class="n">mpg</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">wt</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">mtcars</span><span class="o">$</span><span class="n">cyl</span><span class="p">),</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">19</span><span class="p">)</span><span class="w">
</span><span class="n">dev.off</span><span class="p">()</span><span class="w">
</span><span class="c1"># pdf</span><span class="w">
</span><span class="c1">#   2</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Now there are two figures that look like this, one PDF and one PNG:</p>
<p><img src="/uploads/2018/mtcars_axes.png" alt="mtcars axes" />
  <img src="/uploads/2018/mtcars_panel.png" alt="mtcars panel" /></p>
<p>To combine these, read in the generated PNG file using the <code>png</code> library, and then plot it using the <code>rasterImage</code> function. The relevant code looks like this:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">png</span><span class="p">)</span><span class="w">
</span><span class="n">panel</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">readPNG</span><span class="p">(</span><span class="s2">"mtcars_panel.png"</span><span class="p">)</span><span class="w">
</span><span class="n">rasterImage</span><span class="p">(</span><span class="n">panel</span><span class="p">,</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">3</span><span class="p">],</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">2</span><span class="p">],</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">4</span><span class="p">])</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Note that the coordinates for <code>rasterImage</code> must be specified in a different order than for the <code>plot.window</code> function from before.</p>
<p>Wrap up by closing the PDF device.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">dev.off</span><span class="p">()</span><span class="w">
</span><span class="c1"># null device </span><span class="w">
</span><span class="c1">#           1 </span><span class="w">
</span></code></pre>
  </div>
</div>
<p>All together, here is the entire script. It’s a bit different from what’s written above; in particular, I save the rasterized plot area to a temporary file to avoid cluttering up our working directory.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">png</span><span class="p">)</span><span class="w">
</span><span class="n">pdf</span><span class="p">(</span><span class="s2">"mtcars.pdf"</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">7</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">7</span><span class="p">)</span><span class="w">

</span><span class="c1"># Set up plot axes and legend</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="n">mpg</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">wt</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">mtcars</span><span class="o">$</span><span class="n">cyl</span><span class="p">),</span><span class="w"> </span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"n"</span><span class="p">)</span><span class="w">
</span><span class="n">legend</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">34</span><span class="p">,</span><span class="w"> </span><span class="n">legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">levels</span><span class="p">(</span><span class="n">factor</span><span class="p">(</span><span class="n">mtcars</span><span class="o">$</span><span class="n">cyl</span><span class="p">)),</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">1</span><span class="o">:</span><span class="m">3</span><span class="p">),</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">19</span><span class="p">)</span><span class="w">

</span><span class="c1"># Extract plot area in both user and physical coordinates</span><span class="w">
</span><span class="n">coords</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">par</span><span class="p">(</span><span class="s2">"usr"</span><span class="p">)</span><span class="w">
</span><span class="n">gx</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">grconvertX</span><span class="p">(</span><span class="n">coords</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">],</span><span class="w"> </span><span class="s2">"user"</span><span class="p">,</span><span class="w"> </span><span class="s2">"inches"</span><span class="p">)</span><span class="w">
</span><span class="n">gy</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">grconvertY</span><span class="p">(</span><span class="n">coords</span><span class="p">[</span><span class="m">3</span><span class="o">:</span><span class="m">4</span><span class="p">],</span><span class="w"> </span><span class="s2">"user"</span><span class="p">,</span><span class="w"> </span><span class="s2">"inches"</span><span class="p">)</span><span class="w">
</span><span class="n">width</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">gx</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">gx</span><span class="p">)</span><span class="w">
</span><span class="n">height</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">max</span><span class="p">(</span><span class="n">gy</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nf">min</span><span class="p">(</span><span class="n">gy</span><span class="p">)</span><span class="w">

</span><span class="c1"># Get a temporary file name for our rasterized plot area</span><span class="w">
</span><span class="n">tmp</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">tempfile</span><span class="p">()</span><span class="w">
<p></span><span class="c1"># Can increase resolution from 300 if higher quality is desired.</span><span class="w">
</span><span class="n">png</span><span class="p">(</span><span class="n">tmp</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">width</span><span class="p">,</span><span class="w"> </span><span class="n">height</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">height</span><span class="p">,</span><span class="w"> </span><span class="n">units</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"in"</span><span class="p">,</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">300</span><span class="p">,</span><span class="w"> </span><span class="n">bg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"transparent"</span><span class="p">)</span><span class="w">
</span><span class="c1"># Margins are set through par(), before plot.new()</span><span class="w">
</span><span class="n">par</span><span class="p">(</span><span class="n">mar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="m">0</span><span class="p">))</span><span class="w">
</span><span class="n">plot.new</span><span class="p">()</span><span class="w">
</span><span class="n">plot.window</span><span class="p">(</span><span class="n">coords</span><span class="p">[</span><span class="m">1</span><span class="o">:</span><span class="m">2</span><span class="p">],</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">3</span><span class="o">:</span><span class="m">4</span><span class="p">],</span><span class="w"> </span><span class="n">xaxs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"i"</span><span class="p">,</span><span class="w"> </span><span class="n">yaxs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"i"</span><span class="p">)</span><span class="w">
</span><span class="n">points</span><span class="p">(</span><span class="n">mpg</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="n">wt</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">mtcars</span><span class="p">,</span><span class="w"> </span><span class="n">col</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">factor</span><span class="p">(</span><span class="n">mtcars</span><span class="o">$</span><span class="n">cyl</span><span class="p">),</span><span class="w"> </span><span class="n">pch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">19</span><span class="p">)</span><span class="w">
</span><span class="n">dev.off</span><span class="p">()</span><span class="w"></p>
<p></span><span class="c1"># Windows users may have trouble with transparent plot backgrounds; if this is the case,</span><span class="w">
</span><span class="c1"># set bg = "white" above and move the legend plot command below the raster plot command.</span><span class="w">
</span><span class="n">panel</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">readPNG</span><span class="p">(</span><span class="n">tmp</span><span class="p">)</span><span class="w">
</span><span class="n">rasterImage</span><span class="p">(</span><span class="n">panel</span><span class="p">,</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">3</span><span class="p">],</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">2</span><span class="p">],</span><span class="w"> </span><span class="n">coords</span><span class="p">[</span><span class="m">4</span><span class="p">])</span><span class="w"></p>
<p></span><span class="n">dev.off</span><span class="p">()</span><span class="w">
</span></code></pre>
  </div>
</div>
</p>
<h2>Exercises</h2>
<ol>
  <li>What would you need to change to plot a different type of data, e.g., a line plot or a 3D plot?</li>
  <li>How would you apply this to a multi-panel figure?</li>
  <li>How might this be accomplished with <code>ggplot2</code> graphics? (Hint: <code>annotation_raster</code>, <code>theme_void</code>)</li>
</ol>
<h2>Postscript</h2>
<p>An alternative way to do this would be to write to a null device and use <code>dev.capture</code> to rasterize and copy the figure to the active device. However, that approach doesn’t appear to work consistently across platforms and devices, so I’ve taken the more portable approach presented here.</p>
]]></content>
</entry>
<entry>
  <title><![CDATA[Painless (almost) multiple-choice exams in LaTeX]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/painless-almost-multiple-choice-exams-in-latex/"/>
  <id>https://jonathanchang.org/blog/painless-almost-multiple-choice-exams-in-latex</id>
  <published>2018-02-24T00:00:00+00:00</published>
  <updated>2018-02-24T00:00:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>Worked example files:</p>
  <ul>
    <li><a href="/uploads/2018/latex/example.tex">example.tex</a></li>
    <li><a href="/uploads/2018/latex/mcexam.sty">Modified mcexam.sty</a></li>
  </ul>
  <p><a href="https://www.overleaf.com?r=606b3729&amp;rm=d&amp;rs=b">Sign up for Overleaf, an online LaTeX editor, with my referral code!</a></p>
  <h2>Requirements</h2>
  <p>If you, like me, write exams, you’ll find a serious lack of free, quality tools for the job. In my experience, most instructors that I’ve worked with basically slap something together in Microsoft Word and call it a day. While this is fine for exams and quizzes for courses with fewer than 80 students or so, it rapidly falls apart once you’re administering a multiple choice exam to 600 students. Some of the pitfalls of the Word approach:</p>
  <ol>
    <li>
      <p>Cannot easily scramble question order, to create multiple versions of an exam.</p>
    </li>
    <li>
      <p>Cannot easily scramble answer order, again for multiple versions of an exam.</p>
    </li>
    <li>
      <p>Cannot easily generate an “answer” version where correct answers are bolded or circled or whatever. I need this to refer back to when writing the exam, and when students have questions about the exam.</p>
    </li>
    <li>
      <p>Cannot easily generate a “short” answer key where question numbers and answers are just listed together. I need this to fill in scantron answer keys.</p>
    </li>
    <li>
      <p>Cannot easily generate a large-print version for students that need this type of accommodation.</p>
    </li>
  </ol>
  <p>I also have an aversion to commercial, GUI-only software using formats that aren’t easily accessible to humans to store questions. If I want to remix an exam or use old questions, I have to laboriously copy and paste questions, make sure the numbering is right, etc. Computers should be able to do this for me.</p>
  <h2>LaTeX to the rescue?</h2>
  <p>For writing text, LaTeX is basically what I turn to. There are many packages that purport to do all of the above. I also have a few more requirements that are specific to LaTeX:</p>
  <ol>
    <li>
      <p>I need to be able to say that some questions are “grouped” together, so I can create sets of problems that are all asking about a common prompt.</p>
    </li>
    <li>
      <p>Must have an easier syntax for both multiple choice and true/false questions. Meaning if I have to write <code>\begin{enumerate} \item[A]</code> etc. for each question I will lose it.</p>
    </li>
    <li>
      <p>When scrambling answer order, I need to be able to specify that some questions shouldn’t have their answers scrambled. For example, questions with “none of the above” or “all of the above” as a possible response should not have the answers scrambled since those options should appear last.</p>
    </li>
  </ol>
  <p>Well, which packages are good? As you can tell from the title, it’s not great. I feel the pain of <a href="https://tex.stackexchange.com/questions/360518/making-exams-with-multiple-choice-questions-in-scrambled-versions">this poster on TeX Stack Exchange</a>; it seems like everything just doesn’t quite fit right. My requirements are basically the same as theirs, after all.</p>
  <h2>Some options</h2>
  <p>Here’s a listing of everything I’ve tried so far:</p>
  <p><a href="https://ctan.org/pkg/exam"><em>exam</em></a> - Does not permit randomization. But, it has an <em>incredible</em> syntax for multiple choice questions, true/false, matching questions, and short answer/essay type questions. I definitely use this one for shorter in-class quizzes that students complete in 20 minutes or so.</p>
  <p><a href="https://ctan.org/pkg/examdesign"><em>examdesign</em></a> - It <em>almost</em> does everything I need. Randomization, great syntax for multiple choice questions, except it doesn’t have good ergonomics for True/False questions, as True/False questions MUST be in a separate “section” from regular MC questions. Which doesn’t make sense since True/False questions are basically multiple-choice questions, except they have a fixed set of answers. Forcing True/False questions into a separate section has the downside that (1) you can’t randomize the order of MC and TF questions, (2) the TF questions don’t <em>look</em> like the MC questions (bad if you are writing a scantron exam), and (3) if you just write TF questions as multiple-choice questions, you MUST include answer choices A. True and B. False <em>every time</em> even though you know it’s a True/False question.</p>
  <p><a href="https://ctan.org/pkg/esami"><em>esami</em></a> - The documentation is terrible. It’s written by Italians and they have not bothered to translate their macros to English, instead supplying brief lessons about the Italian language. Ok, normally I’m fine with bad documentation—scientists are awful at it too and usually I can figure it out if they provide working examples. But I can’t get their examples to work either, because the error messages are in Italian and also talk about macros that are not defined. Great.</p>
  <p><a href="https://ctan.org/pkg/probsoln"><em>probsoln</em></a> - This one is focused on math, and also has no built-in syntax for MC questions.</p>
  <p><a href="https://www.auto-multiple-choice.net"><em>automultiplechoice</em></a> - Not strictly LaTeX only, but it has such a horrific syntax for multiple choice questions that I just ran away screaming.</p>
  <h2>The solution</h2>
  <p>Is everything terrible?</p>
  <p><strong>No</strong>!! I found something that almost, <em>almost</em> works just right: <a href="https://ctan.org/pkg/mcexam"><em>mcexam</em></a>. It does it all: permutation of both question order and answer order, question grouping, answer permutation customization, and a flexible enough syntax that I can include “image” and “table” answers, and also write a few macros to get True/False questions looking and working correctly. It can generate answer key versions (where answers are placed next to question text), short answer keys, as well as an instructor “concept” version that shows in one document how questions and answers are permuted. It also permits some nifty item analysis using an external R script. Sweet!</p>
  <p>There are still a few pain points though, so I’m going to document how I brutally hacked at <em>mcexam</em> to get it to do what I want. You can also <a href="/uploads/2018/latex/example.tex">download an example file</a> that includes everything below and can be compiled after you make one change to <code>mcexam.sty</code> as detailed below.</p>
  <h3>Useful tips for writing questions</h3>
  <h4>A macro for True/False questions</h4>
  <div class="language-tex highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="tex"><span class="k">\global\def\qtrue</span><span class="p">{</span><span class="nt">\begin{mcanswers}</span>[permutenone]<span class="k">\answer</span><span class="na">[correct]</span><span class="p">{</span>1<span class="p">}{}</span><span class="k">\answer</span><span class="p">{</span>2<span class="p">}{}</span><span class="nt">\end{mcanswers}</span><span class="p">}</span>
<span class="k">\global\def\qfalse</span><span class="p">{</span><span class="nt">\begin{mcanswers}</span>[permutenone]<span class="k">\answer</span><span class="p">{</span>1<span class="p">}{}</span><span class="k">\answer</span><span class="na">[correct]</span><span class="p">{</span>2<span class="p">}{}</span><span class="nt">\end{mcanswers}</span><span class="p">}</span>
</code></pre>
    </div>
  </div>
  <p>Note that to get this to work, you have to hack at <code>mcexam.sty</code>. Copy that file from the CTAN archive into the same folder as your exam <code>.tex</code> file, then comment out this line with a <code>%</code>:</p>
  <div class="language-tex highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="tex"><span class="k">\xifinlist</span><span class="p">{</span><span class="k">\a</span><span class="p">}{</span><span class="k">\mc</span>@answernumVals<span class="p">}{}{</span><span class="k">\PackageError</span><span class="p">{</span>mcexam<span class="p">}{</span>Question <span class="k">\q</span>: answernum <span class="k">\a\space</span> is not specified.<span class="p">}{}}</span>
</code></pre>
    </div>
  </div>
  <p>You can also just <a href="/uploads/2018/latex/mcexam.sty">download a pre-modified version of this file</a>. Drop it in the same folder as your <code>.tex</code> file.</p>
  <h4>A macro to show how many questions are in a question group</h4>
  <div class="language-tex highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="tex"><span class="k">\global\def\numq</span><span class="na">[#1]</span><span class="p">{</span>[Questions <span class="k">\the\numexpr\value</span><span class="p">{</span>setmcquestionsi<span class="p">}</span>+1<span class="k">\relax</span>--<span class="k">\the\numexpr\value</span><span class="p">{</span>setmcquestionsi<span class="p">}</span>+#1<span class="k">\relax</span>]<span class="p">}</span>
</code></pre>
    </div>
  </div>
  <p>With the above two macros, you can easily make a series of True/False questions like this:</p>
  <div class="language-tex highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="tex"><span class="nt">\begin{mcquestioninstruction}</span>
<span class="k">\numq</span><span class="na">[5]</span> Which of the following foods can be eaten on a
         ketogenic diet? Mark A for True and B for False:
<span class="nt">\end{mcquestioninstruction}</span>

<span class="k">\question</span>         Blueberries <span class="k">\qtrue</span>
<span class="k">\question</span><span class="na">[follow]</span> Eggs        <span class="k">\qtrue</span>
<span class="k">\question</span><span class="na">[follow]</span> Steak       <span class="k">\qtrue</span>
<span class="k">\question</span><span class="na">[follow]</span> Bread       <span class="k">\qfalse</span>
<span class="k">\question</span><span class="na">[follow]</span> Cupcakes    <span class="k">\qfalse</span>
</code></pre>
    </div>
  </div>
<p>And there won’t be any extraneous A. True B. False options thanks to that macro!</p>
<h4>Save space with <code>multicol</code></h4>
<p>If you have a multiple choice question with really short options, use the <code>multicols</code> environment from the <code>multicol</code> package to save some vertical space by aligning the options all on one line.</p>
<p>First, add <code>\usepackage{multicol}</code> in your preamble, then in the question:</p>
<div class="language-tex highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="tex"><span class="k">\question</span> Which of the following numbers is prime?

<span class="nt">\begin{multicols}</span><span class="p">{</span>4<span class="p">}</span>
<span class="nt">\begin{mcanswerslist}</span>[ordinal]
<span class="k">\answer</span> 12
<span class="k">\answer</span> 16
<span class="k">\answer</span><span class="na">[correct]</span> 17
<span class="k">\answer</span> 20
<span class="nt">\end{mcanswerslist}</span>
<span class="nt">\end{multicols}</span>
</code></pre>
  </div>
</div>
<p>This example also shows off one of the neat ergonomic features for answer scrambling, option <code>ordinal</code>, which permutes answer options forward and backward but keeps their relative order. So for each version you’ll see 12, 16, 17, 20; or 20, 17, 16, 12. This works well when options have a natural ordering. There’s also <code>fixlast</code> for “None of the above” type options, and also one where you manually specify permissible permutations.</p>
<h3>Other formatting stuff</h3>
<p>By default, <code>mcexam</code> will permit questions and their associated answer choices to be split across multiple pages, which is poor test ergonomics. Ensure that questions and their associated answer choices are printed on the same page:</p>
<div class="language-tex highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="tex"><span class="k">\usepackage</span><span class="p">{</span>calc<span class="p">}</span>
<span class="k">\renewenvironment</span><span class="p">{</span>setmcquestion<span class="p">}</span>
<span class="p">{</span><span class="nt">\begin{minipage}</span>[t]<span class="p">{</span><span class="k">\linewidth</span>-<span class="k">\labelwidth</span><span class="p">}}</span>
<span class="p">{</span><span class="nt">\end{minipage}</span><span class="k">\par</span><span class="p">}</span>
</code></pre>
  </div>
</div>
<p>Similar to the above, force question “instructions” to be printed on the same page:</p>
<div class="language-tex highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="tex"><span class="k">\renewenvironment</span><span class="p">{</span>setmcquestioninstruction<span class="p">}</span>
<span class="p">{</span><span class="nt">\begin{minipage}</span><span class="p">{</span><span class="k">\textwidth</span><span class="p">}}</span>
<span class="p">{</span><span class="nt">\end{minipage}</span><span class="p">}</span>
</code></pre>
  </div>
</div>
<p>This saves some space in the answer formatting by reducing the space between options:</p>
<div class="language-tex highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="tex"><span class="k">\renewenvironment</span><span class="p">{</span>setmcanswers<span class="p">}{}{}</span>
<span class="k">\setlist</span><span class="na">[setmcquestions]</span><span class="p">{</span>label=<span class="k">\mcquestionlabelfmt</span><span class="p">{</span>*<span class="p">}</span>.
                        ,ref=<span class="k">\mcquestionlabelfmt</span><span class="p">{</span>*<span class="p">}</span>
                        ,itemsep=0.5<span class="k">\baselineskip</span>
                        ,topsep=1<span class="k">\baselineskip</span>
                        <span class="p">}</span>
</code></pre>
  </div>
</div>
<p>Use Arabic numerals instead of Roman numerals for test form versioning. Our scantrons don’t use Roman numerals on the version field, so this avoids some headaches when people misinterpret Roman numerals.</p>
<div class="language-tex highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="tex"><span class="k">\renewcommand\mcversionlabelfmt</span><span class="na">[1]</span><span class="p">{</span><span class="k">\arabic</span><span class="p">{</span>#1<span class="p">}}</span>
</code></pre>
  </div>
</div>
<p>Ensure there’s always one blank page at the end of the exam (so if students turn over their exam early they don’t see question text!) You will also need to pass the <code>twoside</code> option to your document class.</p>
<div class="language-tex highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="tex"><span class="k">\mcifoutput</span><span class="p">{</span>exam<span class="p">}{</span>
    <span class="k">\AtEndDocument</span><span class="p">{</span><span class="k">\ifodd\value</span><span class="p">{</span>page<span class="p">}</span>
        <span class="k">\newpage\thispagestyle</span><span class="p">{</span>empty<span class="p">}</span><span class="k">\hbox</span><span class="p">{</span>This page intentionally left blank.<span class="p">}</span><span class="k">\newpage\thispagestyle</span><span class="p">{</span>empty<span class="p">}</span><span class="k">\hbox</span><span class="p">{}</span>
        <span class="k">\else</span>
            <span class="k">\newpage\thispagestyle</span><span class="p">{</span>empty<span class="p">}</span><span class="k">\hbox</span><span class="p">{}</span>
        <span class="k">\fi</span><span class="p">}</span>
<span class="p">}</span>
</code></pre>
  </div>
</div>
<p>Set fancy headers and footers on each page that show the page number and exam version. Ensures that students and proctors can see if there was a printing error that caused them to miss a few pages.</p>
<div class="language-tex highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="tex"><span class="k">\usepackage</span><span class="p">{</span>fancyhdr,lastpage<span class="p">}</span>
<span class="k">\pagestyle</span><span class="p">{</span>fancy<span class="p">}</span>
<span class="k">\fancyhf</span><span class="p">{}</span>
<span class="k">\renewcommand</span><span class="p">{</span><span class="k">\headrulewidth</span><span class="p">}{</span>0pt<span class="p">}</span> 
<span class="k">\renewcommand</span><span class="p">{</span><span class="k">\footrulewidth</span><span class="p">}{</span>1pt<span class="p">}</span>
<span class="k">\lfoot</span><span class="p">{</span><span class="k">\mctheversion</span><span class="p">}</span>
<span class="k">\rfoot</span><span class="p">{</span>Page <span class="k">\thepage\ </span>of <span class="k">\pageref</span><span class="p">{</span>LastPage<span class="p">}}</span>
</code></pre>
  </div>
</div>
<p>Fix question spacing wackiness due to grouped questions.</p>
<div class="language-tex highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="tex"><span class="k">\raggedbottom</span>
</code></pre>
  </div>
</div>
<h3>Generating everything at once</h3>
<p>I’m too lazy to manually go in and generate the concept version, the answers version, etc. by renaming files. So I wrote a simple shell script to do the same with the help of some clever macros. Place this in your preamble:</p>
<div class="language-tex highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="tex"><span class="k">\usepackage</span><span class="p">{</span>etoolbox<span class="p">}</span>

<span class="k">\ifdef</span><span class="p">{</span><span class="k">\myoutput</span><span class="p">}{}{</span><span class="k">\def\myoutput</span><span class="p">{</span>concept<span class="p">}}</span>
<span class="k">\ifdef</span><span class="p">{</span><span class="k">\myversion</span><span class="p">}{}{</span><span class="k">\def\myversion</span><span class="p">{</span>1<span class="p">}}</span>

<span class="k">\usepackage</span>[output=<span class="k">\myoutput</span>
,numberofversions=2
,version=<span class="k">\myversion</span>
,seed=4
,randomizequestions=true
,randomizeanswers=true
,writeRfile=true
]<span class="p">{</span>mcexam<span class="p">}</span>
</code></pre>
  </div>
</div>
<p>Basically, what this says is, if the macro <code>\myoutput</code> is not defined, set it to “concept”, and if the macro <code>\myversion</code> isn’t defined, set it to “1”. But we can actually define macros on the command line, using some clever trickery:</p>
<div class="language-sh highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="sh">xelatex <span class="s1">'\def\myversion{1} \def\myoutput{exam} \input{example.tex}'</span>
</code></pre>
  </div>
</div>
<p>This pre-defines those macros, then reads in the rest of your LaTeX file afterwards. Combined with the <code>-jobname</code> option, you can specify what file name each run should have. Here’s my full shell script:</p>
<div class="language-sh highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="sh"><span class="c">#!/bin/bash</span>

xelatex <span class="nt">-jobname</span><span class="o">=</span>EXAM1 <span class="s1">'\def\myversion{1} \def\myoutput{exam} \input{example.tex}'</span> &amp;
xelatex <span class="nt">-jobname</span><span class="o">=</span>EXAM2 <span class="s1">'\def\myversion{2} \def\myoutput{exam} \input{example.tex}'</span> &amp;
xelatex <span class="nt">-jobname</span><span class="o">=</span>CONCEPT <span class="s1">'\def\myoutput{concept} \input{example.tex}'</span> &amp;
xelatex <span class="nt">-jobname</span><span class="o">=</span>KEY <span class="s1">'\def\myoutput{key} \input{example.tex}'</span> &amp;
xelatex <span class="nt">-jobname</span><span class="o">=</span>ANSWERS1 <span class="s1">'\def\myversion{1} \def\myoutput{answers} \input{example.tex}'</span> &amp;
xelatex <span class="nt">-jobname</span><span class="o">=</span>ANSWERS2 <span class="s1">'\def\myversion{2} \def\myoutput{answers} \input{example.tex}'</span> &amp;

<span class="nb">wait</span>
</code></pre>
  </div>
</div>
<p>All six of these jobs will run in parallel. You might have to run it multiple times if you use <code>\pageref{LastPage}</code> or other cross-references. You can then clean up the intermediate files with <code>rm *.log *.aux</code>.</p>
<h3>Randomization seeds</h3>
<p>If you have a lot of grouped questions, it’s possible that your questions will be randomized in such a way that one version of the exam is significantly longer than the others. To that end, you should iterate over a few seeds and check them by hand to ensure that there isn’t anything weird going on. It also helps you see if you forgot to group a set of questions (common when copying question sets from Word or text documents). Note that if you add or remove questions, or change the grouping of questions, the previous seed will no longer generate the same exam. It is thus <strong>critical</strong> you don’t modify the test or change the seed until after you’ve conducted item analysis.</p>
<h3>Item analysis</h3>
<p>Let’s go over how to use the R script for item analysis. When the option <code>writeRfile</code> is set to true, <code>mcexam</code> will also write an R file of the same name as your tex file. This R file provides a single function, <code>mcprocessanswers</code>, that un-permutes the questions and writes an <code>.ana</code> file of the same name, which is used by the <code>mcexam</code> package in LaTeX. This function takes three arguments: a student ID, the exam version they took, and an answers matrix, where 1 = A, 2 = B, etc.</p>
<p>First we load up all of the packages and the <code>mcexam</code> analysis code:</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">library</span><span class="p">(</span><span class="n">tidyverse</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">readxl</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">magrittr</span><span class="p">)</span><span class="w">
</span><span class="n">source</span><span class="p">(</span><span class="s2">"exam.r"</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Our exam scoring service at UCLA provides the student responses as an Excel sheet, with one student per row, and their answer choices as the columns Q001 through Q200 or so.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">version1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read_xlsx</span><span class="p">(</span><span class="s2">"version1.xlsx"</span><span class="p">)</span><span class="w">
</span><span class="n">version2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">read_xlsx</span><span class="p">(</span><span class="s2">"version2.xlsx"</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Here I’m extracting student identifiers from the sheet. The ID number doesn’t have to be an ID number, it just has to be some kind of unique identifier.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">id1</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">version1</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">transmute</span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">str_c</span><span class="p">(</span><span class="n">IDNum</span><span class="p">,</span><span class="w"> </span><span class="n">LastName</span><span class="p">,</span><span class="w"> </span><span class="n">FirstName</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">" "</span><span class="p">))</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">extract2</span><span class="p">(</span><span class="s2">"id"</span><span class="p">)</span><span class="w">
</span><span class="n">id2</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">version2</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">transmute</span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">str_c</span><span class="p">(</span><span class="n">IDNum</span><span class="p">,</span><span class="w"> </span><span class="n">LastName</span><span class="p">,</span><span class="w"> </span><span class="n">FirstName</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">" "</span><span class="p">))</span><span class="w"> </span><span class="o">%&gt;%</span><span class="w"> </span><span class="n">extract2</span><span class="p">(</span><span class="s2">"id"</span><span class="p">)</span><span class="w">
</span><span class="n">id</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">id1</span><span class="p">,</span><span class="w"> </span><span class="n">id2</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
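<p>Since the identifier has to be unique, it’s worth checking for collisions before going further. Here’s a minimal sketch of such a check using made-up identifiers (the names below are hypothetical and not part of the scoring sheet):</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"># Hypothetical identifiers built the same way as above; the last two collide.
toy_id &lt;- c("123 Doe Jane", "456 Roe Richard", "456 Roe Richard")

# anyDuplicated() returns 0 when every element is unique, otherwise the
# index of the first repeated element.
first_repeat &lt;- anyDuplicated(toy_id)

# List the offending identifiers so they can be fixed in the spreadsheet.
repeated &lt;- toy_id[duplicated(toy_id)]
if (first_repeat &gt; 0) print(repeated)
</code></pre>
  </div>
</div>
<p>On the real data, the same idea would be something like <code>stopifnot(anyDuplicated(id) == 0)</code>.</p>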
<p>Generate the version numbers based on which spreadsheet the score came from.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">version</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">rep_len</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">id1</span><span class="p">)),</span><span class="w"> </span><span class="n">rep_len</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="nf">length</span><span class="p">(</span><span class="n">id2</span><span class="p">)))</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>This line extracts the columns that correspond to the questions in the exam and combines them into a single data frame.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">answers</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">bind_rows</span><span class="p">(</span><span class="n">select</span><span class="p">(</span><span class="n">version1</span><span class="p">,</span><span class="w"> </span><span class="n">Q001</span><span class="o">:</span><span class="n">Q062</span><span class="p">),</span><span class="w"> </span><span class="n">select</span><span class="p">(</span><span class="n">version2</span><span class="p">,</span><span class="w"> </span><span class="n">Q001</span><span class="o">:</span><span class="n">Q062</span><span class="p">))</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>This line converts the answers A, B, C, etc. into 1, 2, 3.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">answers</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="n">apply</span><span class="p">(</span><span class="n">answers</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="n">match</span><span class="p">,</span><span class="w"> </span><span class="nb">LETTERS</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>If the student did not answer a question, recode it to an invalid, dummy value “9”.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">answers</span><span class="p">[</span><span class="nf">is.na</span><span class="p">(</span><span class="n">answers</span><span class="p">)]</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="m">9</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Generate the <code>.ana</code> analysis file.</p>
<div class="language-r highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="r"><span class="n">mcprocessanswers</span><span class="p">(</span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">version</span><span class="p">,</span><span class="w"> </span><span class="n">answers</span><span class="p">)</span><span class="w">
</span></code></pre>
  </div>
</div>
<p>Once you have that analysis file, set the <code>output</code> to <code>analysis</code> and check out the item analysis! The best statistics to look at are proportion correct, which indicates question difficulty, and item-rest correlation, which tells you whether people who scored high on the exam also scored high on that question (you want this to be positive).</p>
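<p>To make those two statistics concrete, here is a minimal Python sketch that computes them by hand for a tiny, made-up set of scored answers. The <code>scores</code> matrix and the helper <code>pearson</code> are hypothetical and only for illustration; this is not how the analysis file itself is produced.</p>

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical scored exam: rows are students, columns are questions
# (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]

n_items = len(scores[0])

# Proportion correct: the fraction of students who got each question right.
proportion_correct = [
    sum(row[i] for row in scores) / len(scores) for i in range(n_items)
]

# Item-rest correlation: correlate each question with the total score on
# all the *other* questions, so the question is not correlated with itself.
item_rest = []
for i in range(n_items):
    item = [row[i] for row in scores]
    rest = [sum(row) - row[i] for row in scores]
    item_rest.append(pearson(item, rest))
```

<p>In this toy data the second question has a positive item-rest correlation, while the third has a negative one, which is the kind of question you would want to take a closer look at.</p>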
]]></content>
</entry>
<entry>
  <title><![CDATA[Splitting a concatenated RAxML-style PHYLIP file]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/splitting-a-concatenated-raxml-style-phylip-file/"/>
  <id>https://jonathanchang.org/blog/splitting-a-concatenated-raxml-style-phylip-file</id>
  <published>2017-09-06T00:00:00+00:00</published>
  <updated>2017-09-06T00:00:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>I’ve written a little script in Python 3 that will unconcatenate a RAxML-style PHYLIP + partitions file. I’ve used it recently to do a gene tree-species tree analysis for phylogenetic inference. It’s a little slow since it doesn’t manage file handles well (or at all), but it should use very little memory and therefore be able to handle very large concatenated alignments.</p>
  <script type="text/javascript" src="https://asciinema.org/a/QQ9dQszdX7Rvuq80nvrRXnQk8.js" id="asciicast-QQ9dQszdX7Rvuq80nvrRXnQk8" async></script>
  <p>Here’s a short worked example using the data from the <a href="https://sco.h-its.org/exelixis/web/software/raxml/hands_on.html">RAxML “hands-on” tutorial</a>:</p>
  <ol>
    <li>
      <p>Download the script and all data files</p>
      <div class="language-bash highlighter-rouge">
        <div class="highlight">
          <pre class="highlight"><code data-lang="bash">curl <span class="nt">-LO</span> https://gist.githubusercontent.com/jonchang/34c2e8e473ec2e8f50574671e62c3365/raw/unconcatenate_phylip.py
curl <span class="nt">-LO</span> https://sco.h-its.org/exelixis/resource/download/hands-on/dna.phy
curl <span class="nt">-LO</span> https://sco.h-its.org/exelixis/resource/download/hands-on/simpleDNApartition.txt
</code></pre>
        </div>
      </div>
    </li>
    <li>
      <p>Run the script</p>
      <div class="language-bash highlighter-rouge">
        <div class="highlight">
          <pre class="highlight"><code data-lang="bash">python3 unconcatenate_phylip.py dna.phy simpleDNApartition.txt
</code></pre>
        </div>
      </div>
    </li>
    <li>
      <p>Examine the output</p>
      <div class="highlighter-rouge">
        <div class="highlight">
          <pre class="highlight"><code>INFO: Working on 10 taxa
INFO: Wrote to dna_DNA_p1.phylip
INFO: Wrote to dna_DNA_p2.phylip
</code></pre>
        </div>
      </div>
      <p>By default, the output is written in the PHYLIP format and is named like <code>{INPUT_FILE}_{PARTITION_NAME}.{FORMAT}</code>. For FASTA output, pass <code>--type=fasta</code>. You can also specify a prefix to add to the output file names with <code>--prefix=subdir/</code>, and optionally trim gaps and drop sequences consisting of only gaps with <code>--trim</code>. Check the video above for a quick demonstration of all the options.</p>
      <p>You can also enable detailed output with <code>--verbose</code>.</p>
    </li>
  </ol>
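  <p>The core idea can be sketched in a few lines of Python. This is a simplified illustration and not the script itself: it assumes each partition line looks like <code>DNA, p1 = 1-30</code> with 1-based, inclusive column ranges, and the function names <code>parse_partitions</code> and <code>split_alignment</code>, as well as the toy sequences, are made up for this example.</p>

```python
import re


def parse_partitions(text):
    """Parse lines like 'DNA, p1 = 1-30' into (name, start, end) tuples."""
    partitions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = re.fullmatch(r"(\w+)\s*,\s*(\w+)\s*=\s*(\d+)\s*-\s*(\d+)", line)
        if match is None:
            # Anything other than a single contiguous range is rejected.
            raise ValueError(f"unsupported partition line: {line!r}")
        model, name, start, end = match.groups()
        partitions.append((f"{model}_{name}", int(start), int(end)))
    return partitions


def split_alignment(seqs, partitions):
    """Slice every sequence by each partition's 1-based inclusive range."""
    return {
        name: {taxon: seq[start - 1:end] for taxon, seq in seqs.items()}
        for name, start, end in partitions
    }


# Toy concatenated alignment (hypothetical data): taxon name -> sequence.
seqs = {"Cow": "ATGGCATATC", "Carp": "ATGGCACACC"}
partitions = parse_partitions("DNA, p1 = 1-4\nDNA, p2 = 5-10")
chunks = split_alignment(seqs, partitions)
```

  <p>Here <code>chunks</code> maps each partition name to its own small alignment, which could then be written out as one file per partition.</p>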
  <p><strong>This currently doesn’t support partition formats that partition by codon position (e.g., 1st + 2nd position and 3rd position).</strong></p>
  <p><a href="https://gist.github.com/jonchang/34c2e8e473ec2e8f50574671e62c3365">See the source on GitHub</a>.</p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Pushing to someone else's pull request on GitHub]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/pushing-to-a-pull-request-on-github/"/>
  <id>https://jonathanchang.org/blog/pushing-to-a-pull-request-on-github</id>
  <published>2016-12-30T00:00:00+00:00</published>
  <updated>2016-12-30T00:00:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>If you’re a maintainer of an open source repository on GitHub, you often want to make a small change to a pull request but don’t want to wait for the original author to make changes or open a brand new pull request. <a href="https://help.github.com/articles/committing-changes-to-a-pull-request-branch-created-from-a-fork/">GitHub now allows upstream maintainers, with permission, to push to downstream forks</a>, but the provided instructions don’t meet my needs. Specifically, I don’t want to have to re-clone an entire repository for every contributor, and I don’t want to pollute my local repository with a bunch of remotes pointing to contributors’ forks.</p>
  <p>It took a little while to figure out some problems with credentials, but basically you can just push directly to SSH URLs, assuming you have your SSH keys set up properly.</p>
  <ol>
    <li>
      <p>Open Terminal and install <code>hub</code> if you haven’t already, then navigate to your git repository. If this is your first time using <code>hub</code> you might be prompted for your GitHub credentials.</p>
      <div class="language-sh highlighter-rouge">
        <div class="highlight">
          <pre class="highlight"><code data-lang="sh">brew <span class="nb">install </span>hub
<span class="nb">cd </span>my-git-repo
</code></pre>
        </div>
      </div>
    </li>
    <li>
      <p>Fetch the changes from the downstream fork, add a tracking branch, and switch to that branch. You’ll need the GitHub pull request number and the origin set to the proper GitHub repository:</p>
      <div class="language-sh highlighter-rouge">
        <div class="highlight">
          <pre class="highlight"><code data-lang="sh">hub <span class="nb">pr </span>checkout PR_NUMBER
</code></pre>
        </div>
      </div>
    </li>
    <li>
      <p>Make changes as necessary and commit them.</p>
    </li>
    <li>
      <p>Push your changes to the downstream fork:</p>
      <div class="language-sh highlighter-rouge">
        <div class="highlight">
          <pre class="highlight"><code data-lang="sh">hub push
</code></pre>
        </div>
      </div>
    </li>
  </ol>
  <p>You should now see the changes reflected in the pull request online!</p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Updating Homebrew formulae when your software gets a new version]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/updating-homebrew-formulae-when-your-software-gets-a-new-version/"/>
  <id>https://jonathanchang.org/blog/updating-homebrew-formulae-when-your-software-gets-a-new-version</id>
  <published>2016-12-05T00:00:00+00:00</published>
  <updated>2016-12-05T00:00:00+00:00</updated>
  <content type="html"><![CDATA[
     <p>To get bioinformatics software on my Mac I rely on the <a href="http://brew.sh">Homebrew package manager</a>. Unfortunately, the <a href="https://github.com/brewsci/homebrew-bio">Brewsci/bio tap</a>, where most biology software lives, doesn’t always update right after a new version is released. While you <em>could</em> wait for someone to update it for you, it’s faster and more fun to do it yourself, and you also get to learn about contributing to Homebrew!</p>
  <p>I’ve written a brief tutorial / account that goes over my experience updating the <a href="http://www.drive5.com/muscle/">muscle</a> aligner from version 3.8.31 to 3.8.1551. It also covers some things I commonly experience when updating Homebrew formulae, and assumes some basic familiarity with the macOS Terminal.</p>
  <p><strong>NOTE</strong>: This is the full tutorial which you may find useful, but for uncomplicated revisions (“Basic Steps” only) you can use:</p>
  <div class="highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>brew install hub &amp;&amp; brew bump-formula-pr muscle
</code></pre>
    </div>
  </div>
  <p>with <code>--url=...</code> and <code>--sha256=...</code> or <code>--tag=...</code> and <code>--revision=...</code> arguments to <code>brew bump-formula-pr</code>.</p>
  <h2>Basic steps</h2>
  <ol>
    <li>
      <p>If you don’t have one already, <a href="https://github.com/join">sign up for a GitHub account</a>! All Homebrew development is done over GitHub.</p>
    </li>
    <li>
      <p>Install <a href="https://hub.github.com/">hub</a> via <code>brew install hub</code> in the Terminal.</p>
    </li>
    <li>
      <p><code>cd $(brew --repo brewsci/bio)</code></p>
    </li>
    <li>
      <p>Fork the Brewsci/bio repository with <code>hub fork</code>. If this is your first time using <code>hub</code> you will be prompted for your GitHub username and password.</p>
    </li>
    <li>
      <p>Create a branch to work on the updated formula with <code>git checkout -b muscle-3.8.1551</code>.</p>
    </li>
    <li>
      <p>Edit <code>muscle.rb</code> in your preferred editor. <code>brew edit muscle</code> will default to <code>vim</code>. If you have <a href="https://www.sublimetext.com/">Sublime Text</a> or <a href="https://macromates.com/">TextMate</a> installed it might open in those editors. You can change this to e.g., <code>nano</code> with <code>VISUAL=nano brew edit muscle</code>.</p>
    </li>
    <li>
      <p>In <code>muscle.rb</code>, set the <code>url</code> to the URL of the latest release. In the case of <code>muscle</code> I also had to update the <code>version</code> field.</p>
    </li>
    <li>
      <p>Back in the Terminal (outside of your editor), run <code>brew fetch muscle</code> to get the latest release. <code>brew</code> will complain about a hash mismatch as the formula still contains the old hash. Don’t worry about this – just copy the provided hash into the <code>sha256</code> field of <code>muscle.rb</code> in your editor.</p>
    </li>
    <li>
      <p>The formula has a <code>revision</code> field, so remove that line as it’s a new version.</p>
    </li>
  </ol>
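  <p>The value that goes into the <code>sha256</code> field is the SHA-256 digest of the downloaded archive, so you can also compute it independently if you want to double-check what <code>brew fetch</code> reports. A minimal Python sketch follows; the helper name <code>sha256_of_file</code> and the stand-in file are assumptions for illustration, and in practice you would point it at the real tarball.</p>

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a downloaded release archive (hypothetical file and contents).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "release.tar.gz")
    with open(demo_path, "wb") as handle:
        handle.write(b"abc")
    demo_digest = sha256_of_file(demo_path)
```

  <p>Reading in chunks keeps memory use flat even for large source archives.</p>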
  <p>Now run <code>brew install -vsd --git muscle</code>. Hopefully the new release compiles!</p>
  <p>The <code>v</code> turns on verbose mode so you can see what’s happening, <code>s</code> says to build from source instead of using binary bottles, <code>d</code> enables debugging so you can enter the build directory in case anything goes wrong, and <code>--git</code> turns the build directory into a git repository so you can make changes and put them into the formula as patches to fix minor build problems.</p>
  <h2>When things go wrong</h2>
  <p>It didn’t build! Homebrew errors out with a message <code>No such file or directory - src/globalsosx.cpp</code>. If we drop into a shell (thanks to the handy debug option) we can see pretty readily that the <code>src/</code> directory no longer exists in this updated version of <code>muscle</code>. Exit out of the debugging shell by pressing <code>Ctrl-D</code> twice.</p>
  <p>Looking at the <code>muscle</code> formula, we can see that the error is caused by a patch that fixes build failures on newer versions of macOS:</p>
  <div class="language-ruby highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="ruby">  <span class="k">def</span> <span class="nf">install</span>
    <span class="c1"># This patch makes 3.8.31 build on OSX &gt;= Lion.</span>
    <span class="c1"># It has been reported upstream but not fixed yet.</span>
    <span class="n">inreplace</span> <span class="s2">"src/globalsosx.cpp"</span><span class="p">,</span>
              <span class="s2">"#include &lt;mach/task_info.h&gt;"</span><span class="p">,</span>
              <span class="s2">"#include &lt;mach/vm_statistics.h&gt;</span><span class="se">\n</span><span class="s2">#include &lt;mach/task_info.h&gt;"</span>

    <span class="c1"># This patch makes 3.8.31 build on RHEL 7.x</span>
    <span class="c1"># It ONLY affects Linux (in an "if Linux" clause in the 'mk' script)</span>
    <span class="c1"># It is unnecessary to create a static binary</span>
    <span class="n">inreplace</span> <span class="s2">"src/mk"</span><span class="p">,</span>
              <span class="s2">"LINK_OPTS=-static"</span><span class="p">,</span>
              <span class="s2">"LINK_OPTS="</span>

    <span class="n">cd</span> <span class="s2">"src"</span> <span class="k">do</span>
      <span class="nb">system</span> <span class="s2">"make"</span>
      <span class="n">bin</span><span class="p">.</span><span class="nf">install</span> <span class="s2">"muscle"</span>
    <span class="k">end</span>
  <span class="k">end</span>
</code></pre>
    </div>
  </div>
<p>Usually when adding patches, you would also report the bug and submit any relevant patches to the upstream software developers. Here we can see that the bug was reported, so let’s see if it’s been fixed by removing all the patches. We should also change the <code>cd &quot;src&quot; do</code> line as the <code>src/</code> directory no longer exists.</p>
<div class="language-ruby highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="ruby">  <span class="k">def</span> <span class="nf">install</span>
    <span class="nb">system</span> <span class="s2">"make"</span>
    <span class="n">bin</span><span class="p">.</span><span class="nf">install</span> <span class="s2">"muscle"</span>
  <span class="k">end</span>
</code></pre>
  </div>
</div>
<p>Rerun <code>brew install -vsd --git muscle</code> to see if this fixes things. Looks like there’s still a build error:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>ld: library not found for -lcrt0.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)
</code></pre>
  </div>
</div>
<p>Build errors can be exotic and difficult to debug, but we’re in luck here: if you examine the <code>Makefile</code> provided by <code>muscle</code> (by dropping into the debug shell and running <code>cat Makefile</code>) you can see that the developer of <code>muscle</code> added a handy tip for macOS users:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code># On OSX, using -static gives the error "ld: can't locate file for: -lcrt0.o",
# this is fixed by deleting "-static" from the LDLIBS line.
</code></pre>
  </div>
</div>
<p>Let’s include this patch in our new version of <code>muscle</code>. Homebrew provides a function <code>inreplace</code> that lets you make simple changes to files. For more complicated patches, it’s better to run <code>brew install --interactive --git muscle</code> and follow the instructions to include a full diff file into the formula.</p>
<p>Remove all instances of <code>-static</code> and try building it again with <code>brew install -vsd --git muscle</code>.</p>
<div class="language-ruby highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="ruby">  <span class="k">def</span> <span class="nf">install</span>
    <span class="c1"># Fix build per Makefile instructions</span>
    <span class="n">inreplace</span> <span class="s2">"Makefile"</span><span class="p">,</span> <span class="s2">"-static"</span><span class="p">,</span> <span class="s2">""</span>

    <span class="nb">system</span> <span class="s2">"make"</span>
    <span class="n">bin</span><span class="p">.</span><span class="nf">install</span> <span class="s2">"muscle"</span>
  <span class="k">end</span>
</code></pre>
  </div>
</div>
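<p>If it helps to picture what that <code>inreplace</code> call does to the file, here is a rough Python analogue. It is only a sketch of the idea (rewrite the file with every occurrence of one string swapped for another, and complain if there was nothing to replace), not Homebrew’s implementation; the function below and the one-line <code>Makefile</code> contents are made up for the example.</p>

```python
import tempfile
from pathlib import Path


def inreplace(path, before, after):
    """Replace every occurrence of `before` in a file, failing if none exist."""
    path = Path(path)
    text = path.read_text()
    if before not in text:
        # Failing loudly catches patches that silently stopped applying.
        raise ValueError(f"expected to find {before!r} in {path}")
    path.write_text(text.replace(before, after))


# Hypothetical one-line Makefile, just to show the effect of the edit.
with tempfile.TemporaryDirectory() as tmp:
    makefile = Path(tmp) / "Makefile"
    makefile.write_text("LDLIBS = -lm -static\n")
    inreplace(makefile, "-static", "")
    patched = makefile.read_text()
```

<p>Failing when the search string is missing is useful for a formula: if a later release drops <code>-static</code> on its own, the stale patch shows up as an error instead of quietly doing nothing.</p>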
<h2>When things (finally) go right</h2>
<p>Success! Test that everything went well with <code>brew test -v muscle</code> and check that the style and syntax of the formula are correct with <code>brew audit --strict --online muscle</code>. <code>test</code> works great, but <code>audit</code> complained and said:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>brewsci/bio/muscle:
  * Stable: version 3.8.1551 is redundant with version scanned from URL
</code></pre>
  </div>
</div>
<p>Remove the <code>version</code> line and rerun <code>audit</code> to ensure that there are no further issues.</p>
<h2>Submitting your changes</h2>
<p>Commit the changes with <code>git commit muscle.rb -m &quot;muscle 3.8.1551&quot;</code> and push them up to your fork with <code>git push -u &lt;USERNAME&gt;</code>, replacing <code>&lt;USERNAME&gt;</code> with your own GitHub username. Then run <code>hub pull-request</code> and fill out the details of the pull request template and a short message explaining your changes (if relevant).</p>
<p>You’ll now need to wait while the build infrastructure compiles the changes to your proposed formula. A maintainer will eventually come along and either merge your pull request or ask you to make some changes. Once your new formula is merged, all other users of Homebrew get to enjoy your work! Pat yourself on the back for helping others out and contributing to an important piece of open-source software.</p>
<h2>Closing notes</h2>
<p>There’s a lot of edge cases to building software, which is why package managers exist so regular folk don’t have to deal with these problems. I’ve spent <a href="https://twitter.com/chang_jon/status/504859294303805440">too</a> <a href="https://twitter.com/chang_jon/status/505082689037545472">much</a> <a href="https://twitter.com/chang_jon/status/515599346381762561">time</a> diagnosing arcane linker errors; I contribute to Homebrew so no one else has to needlessly suffer.</p>
<p>If you regularly use scientific software that isn’t already in Homebrew or Brewsci, please consider <a href="https://github.com/Homebrew/homebrew-core/blob/master/CONTRIBUTING.md">contributing a formula</a>, or, if your software is too unstable or specialized, <a href="https://docs.brew.sh/How-to-Create-and-Maintain-a-Tap">starting a Homebrew tap for your own tools</a>. I maintain <a href="https://github.com/jonchang/homebrew-biology">my own tap</a> that I add to whenever I come across bio software that doesn’t already exist in Homebrew-science. Consult the <a href="https://docs.brew.sh/Formula-Cookbook">Formula Cookbook</a> for tips on crafting Homebrew formulae.</p>
]]></content>
</entry>
<entry>
  <title><![CDATA[Setting up Samba home folder shares for a CentOS 7 server]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/setting-up-samba-home-folder-shares-for-a-centos-7-server/"/>
  <id>https://jonathanchang.org/blog/setting-up-samba-home-folder-shares-for-a-centos-7-server</id>
  <published>2016-02-29T21:53:39+00:00</published>
  <updated>2016-02-29T21:53:39+00:00</updated>
  <content type="html"><![CDATA[
     <p>CentOS 7 has made life so much easier compared to the <a href="/blog/setting-up-samba-home-folder-shares-for-a-centos-6-server-and-mac-os-x-client">last time</a>. All of the following commands need to be run as the superuser.</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">yum <span class="nb">install </span>samba samba-client samba-common
systemctl <span class="nb">enable </span>smb
systemctl <span class="nb">enable </span>nmb
setsebool <span class="nt">-P</span> samba_enable_home_dirs on
firewall-cmd <span class="nt">--permanent</span> <span class="nt">--zone</span><span class="o">=</span>public <span class="nt">--add-service</span><span class="o">=</span>samba
firewall-cmd <span class="nt">--reload</span>
</code></pre>
    </div>
  </div>
  <p>Depending on how your system is set up, you might also need to set your Samba password:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">smbpasswd <span class="nt">-a</span> YOUR_USER_NAME
</code></pre>
    </div>
  </div>
  <p>If you have symlinks in your home directory to other bits of the system you will also need to edit <code>/etc/samba/smb.conf</code>. In the <code>[homes]</code> section, add:</p>
  <div class="language-conf highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="conf"><span class="n">follow</span> <span class="n">symlinks</span> = <span class="n">yes</span>
<span class="n">wide</span> <span class="n">links</span> = <span class="n">yes</span>
</code></pre>
    </div>
  </div>
  <p>Then, in the <code>[global]</code> section, add:</p>
  <div class="language-conf highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="conf"><span class="n">unix</span> <span class="n">extensions</span> = <span class="n">no</span>
</code></pre>
    </div>
  </div>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Snow Leopard Server with Mavericks clients, and why to avoid Mac Server]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/snow-leopard-server-with-mavericks-clients/"/>
  <id>https://jonathanchang.org/blog/snow-leopard-server-with-mavericks-clients</id>
  <published>2015-12-01T06:54:08+00:00</published>
  <updated>2015-12-01T06:54:08+00:00</updated>
  <content type="html"><![CDATA[
     <p><em>Note: the below text was written in 2014 but never posted. I have kept the same tense since I don’t want to rewrite it.</em></p>
  <p>Our lab has a setup where we have multiple workstation iMacs that connect to a Mac Mini server, with network accounts stored on the server. Previously, both server and clients were running Snow Leopard, but I’ve just upgraded two of the clients to Mavericks while keeping the server on Snow Leopard. (Upgrading to Mavericks server costs money). I’ve been running this setup for a few weeks now and nothing major seems to have broken. For details of our lab’s setup, and why you should avoid Mac Server, read on…</p>
  <p>In 2011 my advisor purchased an expensive Mac Mini Snow Leopard server for the lab, to store data generated by the lab and do other things that only servers can do. The lab had several iMacs that served as workstations for lab members, as well as a Mac Pro used for data analysis. Data were originally stored on the Mac Pro as well, but with the new server we could separate those tasks and use the Mac Pro only for analysis work.</p>
  <p>At the time, I didn’t do too much research into Mac Server and assumed that things would “just work” out of the box. I also figured that the best way to handle the “data storage” aspect was to store everything on the server, including user accounts. In essence, this means that user home directories were stored server-side and all home folder access and user authentication is done over the network.</p>
  <p>The network account system has a couple of advantages. Every member of the lab could have their own account, files, and settings, that would work regardless of which computer they signed in on. It’s actually really cool to be able to just sign out of one computer and then sit down at another and keep going right where you left off. Central control of user data and user accounts also meant that backups were also centralized, hopefully insuring against data being lost because it was on an old machine that was rarely used.</p>
  <p>However, the drawbacks are pretty significant. The major one is that the server is a single point of failure for every single person. Originally, if the file server was down, you couldn’t get access to the shared lab data, but your own data were still intact on your workstation. Now, if either the server or the network were down, you couldn’t log in to your own computer, never mind access your files.</p>
  <p>Other quality of life issues include:</p>
  <ul>
    <li>Time Machine backups won’t work on the server due to bad interactions with Open Directory databases. We use rsnapshot to the other internal drive and CCC to an external drive.</li>
    <li>Home directories served over AFP prevent multiple accounts being signed into the same workstation. This is because the first user signed in will mount the AFP share under /Users with their own permissions, so the second user will try to mount the same share, fail due to permission issues, and throw a cryptic error message confusing everyone.</li>
    <li>Home directories served under NFS don’t play nice with many programs, especially ones that assume that all user directories on Macs are stored under /Users. I also encountered a bug with Dropbox not working since it was trying to acquire a file lock and failing (probably because Apple’s NFS implementation was bad, or perhaps because NFS in general is bad). To Dropbox’s credit, I emailed their support team and they eventually fixed the bug.</li>
    <li>Every client computer’s DNS must point to the server computer. This actually isn’t necessary if your upstream DNS assigns a fixed hostname for your server. Ours does, but the problem is that I originally configured the server to have a different hostname than its actual name. Once past the initial setup, changing the server’s name is so onerous that most guides I consulted recommended doing a clean install of Mac Server with the correct hostname. By running your own DNS and pointing all client computers to your server, the server can lie to itself and its connecting clients about its own name and thus everyone is happy. Note that it’s actually possible to change the hostname: during initial setup I made a typo in one of the boxes that you must enter the hostname into and in doing so created a difficult to track down bug that caused client computers to sometimes stall during login for minutes at a time. Tracking down this one involved looking at a lot of log files and noticing attempts to connect to an incorrectly-spelled host. I fixed this one by looking through all the configuration files for the misspelled name and correcting it.</li>
    <li>You will be terrified of restarting the server for updates. Every time I have updated the server, it refuses to work correctly in spectacular and exciting new ways. This one has probably shortened my life expectancy by a couple of years, at least.</li>
  </ul>
  <p><strong>Update 2015</strong>: Basically, I don’t think using OS X server is Worth It. Thankfully Apple seems to have scaled back their server offerings so hopefully no one else is falling into this trap. Our new setup uses a dedicated network appliance (Synology NAS) to serve files, combined with several (quite beefy) shared Linux workstations. All devices can connect to the Synology NAS over NFS. The issue of having shared desktops but personal accounts is no longer a problem, since all analyses are performed on the workstations, and everyone has their own personal laptop that they use to remotely log in to the shared workstations. Although this is less secure in the sense that it’s possible for people to trample over each other’s files, in practice this hasn’t really been that big of an issue. There are separate horrors associated with accessing an NFS share on Linux systems with automount, but I’ll save that for a separate post.</p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[SICB 2015 presentation slides!]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/sicb-2015-presentation-slides/"/>
  <id>https://jonathanchang.org/blog/sicb-2015-presentation-slides</id>
  <published>2015-02-08T00:11:07+00:00</published>
  <updated>2015-02-08T00:11:07+00:00</updated>
  <content type="html"><![CDATA[
     <p>I presented my crowdsourced morphometrics work with the Encyclopedia of Life at the Society for Integrative and Comparative Biology last month! I got some great feedback, both in person and via Twitter. Check out the slides via the link below!</p>
  <p>Chang, Jonathan (2015): Crowdsourced morphometric data are as accurate as traditionally collected data. figshare. <a href="http://dx.doi.org/10.6084/m9.figshare.1284494">http://dx.doi.org/10.6084/m9.figshare.1284494</a></p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Setting up Samba home folder shares for a CentOS 6 server and Mac OS X client]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/setting-up-samba-home-folder-shares-for-a-centos-6-server-and-mac-os-x-client/"/>
  <id>https://jonathanchang.org/blog/setting-up-samba-home-folder-shares-for-a-centos-6-server-and-mac-os-x-client</id>
  <published>2014-11-26T21:59:28+00:00</published>
  <updated>2014-11-26T21:59:28+00:00</updated>
  <content type="html"><![CDATA[
     <p><em>Update</em>: <a href="/blog/setting-up-samba-home-folder-shares-for-a-centos-7-server/">Setting up home directory shares is now <strong>much</strong> easier on CentOS 7</a>.</p>
  <p>On Mac OS X if you want to share your home folder over the network with authentication, you only have to tick a check box in System Preferences and It Just Works™. On CentOS Linux? Well…</p>
  <p>Today I wanted to access my home folder on our Linux analysis machine over the network on a Mac OS X client. Although I could have just done everything in a Terminal, I like the pretty graphics of Finder and being able to see my files without typing <code>ls -l</code>. The Linux machine in question is one I installed CentOS 6 on a while back. (Which, by the way, was a big mistake, since CentOS apparently does not maintain packages for things younger than a decade).</p>
  <p>I first looked into using NFS since apparently that’s the thing you use for Linux machines, but if you don’t want NFS to share your files with the entire world, you need to set up a Kerberos key distribution service. That is unappealing given that I just want to access my own files over the network. So I settled on Samba instead. (Apple Filing Protocol, the only other option for an OS X client, is 100% out of the question because it is awful and I’m pretty sure it’s not supported on Linux).</p>
  <h2>Configuration files</h2>
  <p>There are roughly a dozen configuration files you need to edit in order for Samba to work properly. I don’t actually know which files I need to edit; I just kept doing things I found on the Internet until Samba started working.</p>
  <p><strong>/etc/samba/smb.conf</strong>: We want to let each user access their own home directory over Samba.</p>
  <div class="language-conf highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="conf">[<span class="n">global</span>]
   <span class="n">workgroup</span> = <span class="n">WORKGROUP</span>
   <span class="n">server</span> <span class="n">string</span> = <span class="n">Samba</span> <span class="n">Server</span>
   <span class="n">netbios</span> <span class="n">name</span> = <span class="n">SAMBA</span>
   <span class="c"># change hosts allow to the subnet you want to share files across
</span>   <span class="n">hosts</span> <span class="n">allow</span> = <span class="m">192</span>.<span class="m">168</span>.<span class="m">0</span>.
   <span class="n">log</span> <span class="n">file</span> = /<span class="n">var</span>/<span class="n">log</span>/<span class="n">samba</span>/<span class="n">log</span>.%<span class="n">m</span>
   <span class="n">max</span> <span class="n">log</span> <span class="n">size</span> = <span class="m">50</span>
   <span class="n">security</span> = <span class="n">user</span>
   <span class="n">map</span> <span class="n">to</span> <span class="n">guest</span> = <span class="n">bad</span> <span class="n">user</span>
   <span class="n">passdb</span> <span class="n">backend</span> = <span class="n">tdbsam</span>

<span class="c"># this will let people log into their own home directories
</span>[<span class="n">homes</span>]
   <span class="n">comment</span> = <span class="n">Home</span> <span class="n">Directories</span>
   <span class="n">browseable</span> = <span class="n">no</span>
   <span class="n">writable</span> = <span class="n">yes</span>
   <span class="n">valid</span> <span class="n">users</span> = %<span class="n">S</span>
   <span class="n">create</span> <span class="n">mask</span> = <span class="m">0700</span>
   <span class="n">directory</span> <span class="n">mask</span> = <span class="m">0700</span>
</code></pre>
    </div>
  </div>
<p><strong>/etc/sysconfig/iptables</strong>: I don’t know if these are all needed (as far as I can tell, Samba only uses UDP 137 and 138 and TCP 139 and 445), but it seems to work. Again, change the subnet to the one you actually want to share across (to match <code>hosts allow</code> in <code>smb.conf</code>).</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>-A INPUT -s 192.168.0.0/24 -m state --state NEW -p udp --dport 137 -j ACCEPT
-A INPUT -s 192.168.0.0/24 -m state --state NEW -p tcp --dport 137 -j ACCEPT
-A INPUT -s 192.168.0.0/24 -m state --state NEW -p udp --dport 138 -j ACCEPT
-A INPUT -s 192.168.0.0/24 -m state --state NEW -p tcp --dport 138 -j ACCEPT
-A INPUT -s 192.168.0.0/24 -m state --state NEW -p tcp --dport 139 -j ACCEPT
-A INPUT -s 192.168.0.0/24 -m state --state NEW -p udp --dport 139 -j ACCEPT
-A INPUT -s 192.168.0.0/24 -m state --state NEW -p tcp --dport 445 -j ACCEPT
-A INPUT -s 192.168.0.0/24 -m state --state NEW -p udp --dport 445 -j ACCEPT
</code></pre>
  </div>
</div>
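<p>If you would rather not maintain eight near-identical lines by hand, here is a minimal sketch that prints just the four rules for the ports Samba uses. The <code>samba_rules</code> function is a name I made up, and the subnet is an assumption that should match <code>hosts allow</code>; it only prints text, so you still have to paste the output into <code>/etc/sysconfig/iptables</code> yourself.</p>

```shell
#!/bin/sh
# Sketch: print one iptables rule per port that Samba listens on.
# samba_rules is a hypothetical helper, not part of iptables or Samba.
samba_rules() {
    subnet="$1"
    # NetBIOS name service and datagrams use UDP; sessions and SMB use TCP.
    for spec in udp:137 udp:138 tcp:139 tcp:445; do
        proto="${spec%%:*}"   # text before the colon
        port="${spec##*:}"    # text after the colon
        printf '%s\n' "-A INPUT -s $subnet -m state --state NEW -p $proto --dport $port -j ACCEPT"
    done
}

# Assumption: the same subnet as "hosts allow" in smb.conf.
samba_rules "192.168.0.0/24"
```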
<p>Then restart the services with</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code># service iptables restart
iptables: Flushing firewall rules:                         [  OK  ]
iptables: Setting chains to policy ACCEPT: filter          [  OK  ]
iptables: Unloading modules:                               [  OK  ]
iptables: Applying firewall rules:                         [  OK  ]
# service smb restart
Shutting down SMB services:                                [  OK  ]
Starting SMB services:                                     [  OK  ]
</code></pre>
  </div>
</div>
<p>We also need to let SELinux know that we’re not doing any terrorist activities on our Samba share:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code># setsebool -P samba_enable_home_dirs on
</code></pre>
  </div>
</div>
<p>Now to test it locally with <code>smbclient</code>:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>$ smbclient //localhost/myuser -U myuser
Enter myuser's password:
Domain=[WORKGROUP] OS=[Unix] Server=[Samba 3.6.23-12.el6]
tree connect failed: NT_STATUS_ACCESS_DENIED
</code></pre>
  </div>
</div>
<p>OK, so that bit in <code>smb.conf</code> about <code>passdb backend = tdbsam</code> requiring “no further configuration” is apparently a total lie. Luckily there exists <code>smbpasswd</code> for backwards compatibility, so let’s just use that:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code># smbpasswd -a myuser
New SMB password:
Retype new SMB password:
Added user myuser.
</code></pre>
  </div>
</div>
<p>Trying <code>smbclient</code> again yields:</p>
<div class="highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code>$ smbclient //localhost/myuser -U myuser
Enter myuser's password:
Domain=[WORKGROUP] OS=[Unix] Server=[Samba 3.6.23-12.el6]
smb: \&gt;
</code></pre>
  </div>
</div>
<p>Victory! Now to test it on OS X. In Finder, click Go =&gt; Connect to Server… and enter <code>smb://192.168.0.101</code> (or whatever your server is called) and type in your credentials. Hopefully it works!</p>
<h2>Footnote</h2>
<p>I don’t know why it is so complicated. I suspect that ultimately it’s my own fault for installing CentOS but honestly I’m inclined to think that everything involving Linux is awful and terrible.</p>
]]></content>
</entry>
<entry>
  <title><![CDATA[Using RMarkdown, knitr, and pandoc in TeXShop on Mac]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/using-rmarkdown-knitr-and-pandoc-in-texshop-on-mac/"/>
  <id>https://jonathanchang.org/blog/using-rmarkdown-knitr-and-pandoc-in-texshop-on-mac</id>
  <published>2014-09-26T07:18:36+00:00</published>
  <updated>2014-09-26T07:18:36+00:00</updated>
  <content type="html"><![CDATA[
     <p>Most people will use RStudio for this sort of workflow, but I use <a href="http://pages.uoregon.edu/koch/texshop/">TeXShop</a> because I prefer the side-by-side editing view with plain text on the left and the formatted version on the right. I don’t think RStudio supports it. Also, TeXShop feels like a native OS X application. RStudio can’t really shed its Qt roots no matter how hard it tries.</p>
  <p>Here’s how to get TeXShop working with your RMarkdown workflow.</p>
  <p>I assume you’ve already installed <a href="http://johnmacfarlane.net/pandoc/installing.html">pandoc</a> and <a href="https://tug.org/mactex/">MacTeX</a>. If you don’t already have RMarkdown installed, load up an R session:</p>
  <div class="language-r highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="r"><span class="n">install.packages</span><span class="p">(</span><span class="s2">"devtools"</span><span class="p">)</span><span class="w">
</span><span class="n">devtools</span><span class="o">::</span><span class="n">install_github</span><span class="p">(</span><span class="s2">"rstudio/rmarkdown"</span><span class="p">,</span><span class="w"> </span><span class="n">dependencies</span><span class="o">=</span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span></code></pre>
    </div>
  </div>
  <p>First, you’ll need to add a custom RMarkdown engine for TeXShop. Engines live in <code>~/Library/TeXShop/Engines</code> and are simple executable script files with the extension <code>.engine</code>. Let’s add an rmarkdown engine. Open up the Terminal:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">vim ~/Library/TeXShop/Engines/rmarkdown.engine        <span class="c"># or nano, etc.</span>
<span class="nb">chmod </span>a+x ~/Library/TeXShop/Engines/rmarkdown.engine
</code></pre>
    </div>
  </div>
  <p>Inside that <code>rmarkdown.engine</code> file you can just paste in these contents:</p>
  <div class="language-sh highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="sh"><span class="c">#!/bin/bash</span>

Rscript <span class="nt">-e</span> <span class="s2">"rmarkdown::render(</span><span class="se">\"</span><span class="nv">$1</span><span class="se">\"</span><span class="s2">, encoding='UTF-8')"</span>
</code></pre>
    </div>
  </div>
<p>TeXShop will pass in the name of your RMarkdown file as the first argument to your script, so you can pass it to R inside the variable <code>$1</code>. Note that you might have to change the encoding argument to <code>rmarkdown::render</code> if you have TeXShop saving files in something other than UTF-8. It’s important to get this right, otherwise non-ASCII characters will cause random paragraphs to turn into <code>NA</code>s. (Fixing this bug is Someone Else’s Problem because the workaround is adequate and there are only so many hours in the day.)</p>
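<p>For what it’s worth, here is a slightly more defensive sketch of the same engine. It checks that it was actually handed a file, and passes the path to R as a trailing argument instead of splicing it into the R expression. The <code>render_rmd</code> function name and the usage message are my own inventions; the <code>encoding</code> argument is the same assumption as above.</p>

```shell
#!/bin/bash
# Sketch of a more defensive rmarkdown.engine (render_rmd is a made-up name).
render_rmd() {
    # TeXShop passes the file to typeset as the first argument.
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "usage: rmarkdown.engine FILE.Rmd" >&2
        return 2
    fi
    # Hand the path to R as a trailing argument so that quotes in the
    # file name cannot break the R expression.
    Rscript -e "rmarkdown::render(commandArgs(trailingOnly=TRUE)[1], encoding='UTF-8')" "$1"
}

# Only render when called with arguments, as TeXShop does.
[ "$#" -eq 0 ] || render_rmd "$@"
```

<p>The design choice here is just to keep the file name out of the quoted R code, so a path containing a double quote no longer produces a syntax error in R.</p>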
<p>Finally you need to get TeXShop to recognize <code>.Rmd</code> files. By default TeXShop will refuse to let you “typeset” files with extensions that it doesn’t recognize. Though TeXShop does support plain <code>.md</code> files, the RMarkdown package will not knit these and will bypass any R code found in plain Markdown files. So you must write with the <code>.Rmd</code> extension. Fortunately there’s a hidden preference that you can tweak. Simply open Terminal and type:</p>
<div class="language-sh highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="sh">defaults write TeXShop OtherTeXExtensions <span class="nt">-array-add</span> <span class="s2">"Rmd"</span>
defaults write TeXShop OtherTeXExtensions <span class="nt">-array-add</span> <span class="s2">"rmd"</span>
</code></pre>
  </div>
</div>
<p>Now when you open <code>.Rmd</code> files, simply select the “rmarkdown” engine from the drop down list in the toolbar and type away.</p>
<p><em>Other hidden preferences can be found in TeXShop’s extensive help files. I actually started grepping the source code to hack in this functionality until I figured out that the documentation for TeXShop was actually quite good. I’ve been spoiled by scientific software for too long.</em></p>
]]></content>
</entry>
<entry>
  <title><![CDATA[Fixing pandoc "out of memory" errors on Windows]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/fixing-pandoc-out-of-memory-errors-on-windows/"/>
  <id>https://jonathanchang.org/blog/fixing-pandoc-out-of-memory-errors-on-windows</id>
  <published>2014-09-21T21:37:29+00:00</published>
  <updated>2014-09-21T21:37:29+00:00</updated>
  <content type="html"><![CDATA[
     <p><img src="/uploads/2014/pandoc_oom_1.jpg" alt="pandoc crashes due to address space exhaustion." /></p>
  <p>Recently I’ve been using the rmarkdown + knitr + pandoc workflow to write manuscripts. Markdown with Pandoc is roughly a million times easier to use than the equivalent LaTeX workflow. With the addition of RMarkdown and knitr included in RStudio, you can also weave R plots and output directly into your manuscripts. I was inspired to do this by Carl Boettiger’s <a href="http://www.carlboettiger.info/">online lab notebook</a> and Rich FitzJohn’s “<a href="https://github.com/richfitz/wood">how much of the world is woody</a>” reproducible research GitHub repository.</p>
  <p>I was working with a Markdown document in RStudio when, after adding a bunch of citations, <code>pandoc.exe</code> was crashing with an out-of-memory error. My Windows PC has 8 gigabytes of RAM and I found it unlikely that pandoc could consume <em>that</em> much memory. After checking the Task Manager, it was clear that pandoc was only consuming about 1.8 GB of memory, suggesting that it was not a true out-of-memory error, but rather virtual memory address space exhaustion†.</p>
  <p>Luckily for us there is a utility that comes with <a href="http://www.visualstudio.com/en-us/products/visual-studio-express-vs.aspx">Microsoft Visual Studio</a> (it’s free!) that allows us to poke around in the executable file’s headers and forcefully enable a special flag that should help alleviate this issue. Once you have VS installed, start up the developer command prompt in elevated mode (Shift-Right-click – Run as Administrator) and type into the terminal:</p>
  <div class="highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>editbin /LARGEADDRESSAWARE "C:\Program Files\RStudio\bin\pandoc\pandoc.exe"
editbin /LARGEADDRESSAWARE "C:\Program Files\RStudio\bin\pandoc\pandoc-citeproc.exe"
</code></pre>
    </div>
  </div>
  <p>You can then use <code>dumpbin /headers &quot;C:\Program Files\RStudio\bin\pandoc\pandoc.exe&quot; | more</code> and look for “Application can handle large (&gt;2GB) addresses” to confirm that the fix worked.</p>
  <p><strong>Update</strong>: <a href="https://www.techpowerup.com/forums/threads/large-address-aware.112556/">The Large Address Aware app</a> can do the same with a GUI and no need to download the entirety of VS.</p>
  <p><img src="/uploads/2014/pandoc_oom_2.jpg" alt="pandoc successfully using &gt;2GB of memory." /></p>
  <p>†<em>Technical details:</em> 32-bit Windows systems can address up to 4GB of RAM, but all versions of Windows limit each program to 2GB by default, since the other 2GB of address space is reserved by the kernel. Windows XP introduced the <code>/LARGEADDRESSAWARE</code> flag that allowed 32-bit programs to address up to 3GB of RAM on 32-bit systems, and 4GB on 64-bit systems. There was also Physical Address Extension, which allowed &gt;4GB addressing, but I don’t think anyone outside of the server realm ever used it.</p>
  <p>All that is really necessary to make 32-bit programs use the full 4GB of address space is to set a special linker flag and make sure that your code doesn’t have faulty assumptions about how Windows lays out its memory. For example, if you assumed that the kernel always reserved the upper half of the address space for itself, you could <a href="https://en.wikipedia.org/wiki/Tagged_pointer">smuggle data into the top bit of user-space pointers</a>, and that trick breaks as soon as a large-address-aware program is handed an address above 2GB.</p>
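<p>To put numbers on those limits, here is a small back-of-the-envelope sketch in shell arithmetic. It only does the sums; <code>uses_high_half</code> is a hypothetical helper that reports whether an address sits in the half of the address space that used to belong to the kernel, which is exactly where a pointer’s top bit is set.</p>

```shell
#!/bin/sh
# Back-of-the-envelope address-space arithmetic for a 32-bit process.
gib=$((1024 * 1024 * 1024))   # bytes in one GiB

echo "default user space:           $((2 * gib)) bytes"
echo "large-address-aware (32-bit): $((3 * gib)) bytes"
echo "large-address-aware (64-bit): $((4 * gib)) bytes"

# uses_high_half ADDRESS: succeeds when ADDRESS lies at or above 2 GiB,
# that is, when the top bit of a 32-bit pointer is set.
# (A hypothetical helper for illustration only.)
uses_high_half() {
    [ "$(($1))" -ge "$((2 * gib))" ]
}
```

<p>With this, <code>uses_high_half 2147483647</code> fails and <code>uses_high_half 2147483648</code> succeeds, which is the boundary that pandoc was bumping into at around 1.8 GB of real usage.</p>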
  ]]></content>
</entry>
<entry>
  <title><![CDATA[How accurate are crowdsourced morphometricians?]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/how-accurate-are-crowdsourced-morphometricians/"/>
  <id>https://jonathanchang.org/blog/how-accurate-are-crowdsourced-morphometricians</id>
  <published>2014-08-23T02:55:46+00:00</published>
  <updated>2014-08-23T02:55:46+00:00</updated>
  <content type="html"><![CDATA[
     <p><em>Previously: <a href="/blog/building-a-web-based-image-markup-system">Building a web-based image markup system</a></em></p>
  <p>One of the main goals of my Encyclopedia of Life project is to speed up the collection of phenotypic data through crowdsourcing. However, we cannot expect that the typical crowdsourced worker has the same domain-specific knowledge that an expert scientist has. But does this make a difference when digitizing the shape of fishes?</p>
  <p>To look for a difference, I constructed an experiment where crowdsourced Amazon Mechanical Turk workers would digitize the same set of 5 images 5 times each. I then asked some expert fish morphologists to digitize the same images using the same instructions. This setup allowed me to examine how consistent marks were for each group of workers, and also compare the two to see if their marks differed on average. The results are below:</p>
  <p><img src="/uploads/2014/08/1_compare.jpg" alt="Landmarks by MTurk workers" /></p>
  <p><img src="/uploads/2014/08/2_compare.jpg" alt="Landmarks by experts" /></p>
  <p>Can you spot the difference between the two images? The top image shows landmarks averaged across several MTurk workers, while the bottom image is from a fish morphologist following the same protocol. The length of each line indicates the amount of error in each x,y direction.</p>
  <p>Many landmarks are marked in qualitatively the same way. However, there is a difference, especially in the fin landmarks. The expert consistently uses the most anterior and posterior fin rays and marks them accordingly; turkers instead tend towards the point that more intuitively defines the shape of the fin.</p>
  <p>Both approaches are correct in a sense, though they are looking at very different aspects of fish morphology. This discrepancy is in part due to a difference in how turkers interpret the protocol. I am currently working to further refine this protocol in order to reduce this difference and get results that are nearly indistinguishable from traditionally collected data sets.</p>
  <p>All of our protocols and code are open source, available on GitHub: <a href="https://github.com/jonchang/eol-mturk-landmark">1</a> <a href="https://github.com/jonchang/fake-mechanical-turk">2</a></p>
  <p><em>Many thanks to the Mechanical Turk workers and Tina Marcroft for digitizing images, and Matt McGee, Adam Summers, and Brian Sidlauskas for helping to clarify the protocol.</em></p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Count missing characters in FASTA files with a shell one-liner]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/count-missing-characters-in-fasta-files-with-a-shell-one-liner/"/>
  <id>https://jonathanchang.org/blog/count-missing-characters-in-fasta-files-with-a-shell-one-liner</id>
  <published>2014-05-13T21:32:19+00:00</published>
  <updated>2014-05-13T21:32:19+00:00</updated>
  <content type="html"><![CDATA[
     <p>One of the best things about working with FASTA and PHYLIP files is that they are relatively simple file formats and thus are easy to parse with command-line tools. There are certainly a lot of negatives to these file types, but they are handy for certain types of tasks.</p>
  <p>I needed to find out how much data was missing from our FASTA and PHYLIP multiple sequence alignments. While it would be straightforward to write a one-off Python or R script, for these simple tasks the power of a full programming environment isn’t strictly necessary. Here I show you how to build up a small shell pipeline to count missing characters.</p>
  <p>To count the total number of characters in a file:</p>
  <div class="language-sh highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="sh"><span class="nb">wc</span> <span class="nt">-c</span> file.fasta
</code></pre>
    </div>
  </div>
  <p>Count the total number of <em>sequence</em> characters in a file (<code>wc -c</code> also counts the newline that ends each line, so this slightly overcounts):</p>
  <div class="language-sh highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="sh"><span class="nb">grep</span> <span class="nt">-v</span> <span class="s2">"^&gt;"</span> file.fasta | <span class="nb">wc</span> <span class="nt">-c</span>
</code></pre>
    </div>
  </div>
  <p>Do this for all sequence files:</p>
  <div class="language-sh highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="sh">find <span class="nb">.</span> <span class="nt">-name</span> <span class="s2">"*.fasta"</span> <span class="nt">-exec</span> sh <span class="nt">-c</span> <span class="s1">'grep -v "^&gt;" "$1" | wc -c'</span> <span class="nt">--</span> <span class="o">{}</span> <span class="se">\;</span>
</code></pre>
    </div>
  </div>
  <p>…and add them up:</p>
  <div class="highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code>find . -name "*.fasta" -exec sh -c 'grep -v "^&gt;" "$1" | wc -c' -- {} \; | paste -s -d+ - | bc
</code></pre>
    </div>
  </div>
  <p>Count the number of gap characters (<code>-</code>) in a file (assumes no hyphens in names). Since <code>fgrep -o</code> prints each match on its own line, count lines rather than bytes:</p>
  <div class="language-sh highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="sh">fgrep <span class="nt">-o</span> - file.fasta | <span class="nb">wc</span> <span class="nt">-l</span>
</code></pre>
    </div>
  </div>
  <p>Get the proportion of missing data (gaps divided by total number of characters) in a file:</p>
  <div class="language-sh highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="sh"><span class="o">(</span>fgrep <span class="nt">-o</span> - file.fasta | <span class="nb">wc</span> <span class="nt">-l</span> <span class="o">&amp;&amp;</span> <span class="nb">grep</span> <span class="nt">-v</span> <span class="s2">"^&gt;"</span> file.fasta | <span class="nb">wc</span> <span class="nt">-c</span><span class="o">)</span> | <span class="nb">paste</span> <span class="nt">-s</span> <span class="nt">-d</span>/ - | bc <span class="nt">-l</span>
</code></pre>
    </div>
  </div>
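<p>If the one-liner is getting hard to read, the same idea can be sketched as a few small shell functions. This is my own variation under stated assumptions, not the pipeline above: the function names are made up, it uses <code>tr</code> and <code>awk</code> in place of <code>fgrep -o</code> and <code>bc</code>, and it skips header lines when counting gaps, so hyphens in sequence names are not counted.</p>

```shell
#!/bin/sh
# Hypothetical helpers; the names are mine, not from any FASTA toolkit.

# gap_count FILE: number of "-" characters outside header lines.
gap_count() {
    grep -v '^>' "$1" | tr -cd '-' | wc -c | tr -d ' '
}

# seq_count FILE: number of sequence characters, newlines excluded.
seq_count() {
    grep -v '^>' "$1" | tr -d '\n' | wc -c | tr -d ' '
}

# missing_fraction FILE: gaps divided by sequence characters,
# or "NA" when the file has no sequence characters at all.
missing_fraction() {
    awk -v gaps="$(gap_count "$1")" -v total="$(seq_count "$1")" \
        'BEGIN { if (total > 0) printf "%.4f\n", gaps / total; else print "NA" }'
}
```

<p>Paste these into a shell session and then run <code>missing_fraction file.fasta</code> to get the proportion for one file.</p>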
  <p>Do the previous, but for all files in the current directory:</p>
  <div class="language-sh highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="sh">find <span class="nb">.</span> <span class="nt">-name</span> <span class="s2">"*.fasta"</span> <span class="nt">-exec</span> sh <span class="nt">-c</span> <span class="s1">'(fgrep -o - "$1" | wc -l &amp;&amp; grep -v "^&gt;" "$1" | wc -c) | paste -s -d/ - | bc -l'</span> <span class="nt">--</span> <span class="o">{}</span> <span class="se">\;</span>
</code></pre>
    </div>
  </div>
  <p>Add up the number of missing characters for all files in a certain directory:</p>
  <div class="language-sh highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="sh">find folder/ <span class="nt">-name</span> <span class="s2">"*.fasta"</span> <span class="nt">-exec</span> sh <span class="nt">-c</span> <span class="s1">'fgrep -o - "$1" | wc -l'</span> <span class="nt">--</span> <span class="o">{}</span> <span class="se">\;</span> | <span class="nb">paste</span> <span class="nt">-s</span> <span class="nt">-d</span>+ - | bc
</code></pre>
    </div>
  </div>
  <p>Add up the number of missing characters for all files in all directories in the current directory, in comma-separated format:</p>
  <div class="language-sh highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="sh"><span class="k">while </span><span class="nb">read </span>f<span class="p">;</span> <span class="k">do </span><span class="nb">echo</span> <span class="nt">-n</span> <span class="nv">$f</span>,<span class="p">;</span> find <span class="nv">$f</span> <span class="nt">-name</span> <span class="s2">"*.fasta"</span> <span class="nt">-exec</span> sh <span class="nt">-c</span> <span class="s1">'fgrep -o - "$1" | wc -l'</span> <span class="nt">--</span> <span class="o">{}</span> <span class="se">\;</span> | <span class="nb">paste</span> <span class="nt">-s</span> <span class="nt">-d</span>+ - | bc<span class="p">;</span> <span class="k">done</span> &lt; &lt;<span class="o">(</span>find <span class="nb">.</span> <span class="nt">-mindepth</span> 1 <span class="nt">-maxdepth</span> 1 <span class="nt">-type</span> d<span class="o">)</span>
</code></pre>
    </div>
  </div>
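<p>The per-directory loop can also be sketched without process substitution, so it runs in plain <code>sh</code> as well as <code>bash</code>. Again this is a variation of mine rather than the exact pipeline above: <code>gaps_per_dir</code> is a hypothetical name, it uses a glob instead of a second <code>find</code>, and it ignores header lines when counting gaps.</p>

```shell
#!/bin/sh
# gaps_per_dir ROOT: print "directory,gap count" for each direct
# subdirectory of ROOT. A hypothetical helper, named for this post.
gaps_per_dir() {
    for dir in "$1"/*/; do
        [ -d "$dir" ] || continue     # the glob matched nothing
        # -h stops grep from prefixing lines with file names, which
        # could themselves contain hyphens.
        total=$(find "$dir" -name '*.fasta' -exec grep -hv '^>' {} + |
            tr -cd '-' | wc -c | tr -d ' ')
        printf '%s,%s\n' "${dir%/}" "$total"
    done
}
```

<p>Calling <code>gaps_per_dir .</code> prints one comma-separated line per subdirectory of the current directory, with directories that contain no FASTA files reported as zero.</p>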
  <p>Other variations are left as an exercise to the reader. I hope this has been an enjoyable journey through the wonderful world of shell pipelines.</p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Building a web-based image markup system]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/building-a-web-based-image-markup-system/"/>
  <id>https://jonathanchang.org/blog/building-a-web-based-image-markup-system</id>
  <published>2014-02-14T20:11:35+00:00</published>
  <updated>2014-02-14T20:11:35+00:00</updated>
  <content type="html"><![CDATA[
     <p><img src="/uploads/2014/02/ipad.jpg" alt="Turker working on an iPad" /></p>
  <p>A web-based service like Amazon Mechanical Turk needs a web-based interface for turkers to crowdsource data on fish shape. Existing software to digitize images requires a separate download, and most of it runs only on Windows. Distributing this software to hundreds of crowdsourced workers and ensuring it works on their computers can be quite a challenging task.</p>
  <p>To that end, we’ve had to develop an image digitization interface using only technologies that work in your browser. Specifically, we use the HTML5 <code>canvas</code> element, which is flexible enough to allow arbitrary graphics to be drawn in your browser window, but also gives us the power to use JavaScript to record the marks that turkers then submit to our servers.</p>
  <p>The elegance of using the web as a platform to drive our crowdsourcing effort is that as long as you have a way to browse the internet, you should be able to contribute your work to our research. Our interface is agnostic to technology choice: we’ve tested it on an iPad and it works quite well.</p>
  <p><a href="https://jonchang.github.io/eol-mturk-landmark/">Click here</a> to see what the interface looks like, or <a href="https://github.com/jonchang/eol-mturk-landmark/">visit GitHub</a> to peek at the source code.</p>
  <p><em>Up next: <a href="/blog/how-accurate-are-crowdsourced-morphometricians/">I discuss the accuracy of crowdsourced landmarks</a>.</em></p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Evolution 2013 presentation, notes and slides]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/evolution-2013-presentation-notes-and-slides/"/>
  <id>https://jonathanchang.org/blog/evolution-2013-presentation-notes-and-slides</id>
  <published>2013-10-03T04:53:03+00:00</published>
  <updated>2013-10-03T04:53:03+00:00</updated>
  <content type="html"><![CDATA[
     <p><img src="/uploads/2013/10/fish-landmarks.png" alt="Fish landmarks" /></p>
  <p>Our pilot study worked! Over a dozen Mechanical Turk workers helped to digitize our pilot study sample of fishes. For more details and results, <a href="/uploads/2013/06/chang.evolution.2013.pdf">check out the slides I presented at Evolution</a>.</p>
  <p>This was my first year attending Evolution, and it was exciting showing my work with the Encyclopedia of Life to an amazing group of scientists. I got lots of great feedback from people interested in my idea and methods. Based on this feedback, the next, larger experiment is going to incorporate a semi-landmark approach, where workers will outline a shape, such as the curve of the dorsal fin, that then gets subsampled down into a series of points. Additionally, the study will include a much larger number of landmarks, particularly those that more closely correspond to functional and performance aspects of fish anatomy. These are steps that I plan to take after discussions with several people, including Matt McGee and Thomas Claverie of the Wainwright lab, and Bruno Frederich, a visiting scholar in the Alfaro lab from the University of Liege.</p>
  <p>I’m currently working hard on writing the code and protocols for the next study. As always, all of the study materials are freely available <a href="https://github.com/jonchang/eol-mturk-landmark">online at GitHub</a>.</p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Encyclopedia of Life Rubenstein Fellowship: Crowdsourced morphology]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/encyclopedia-of-life-rubenstein-fellowship-crowdsourced-morphology/"/>
  <id>https://jonathanchang.org/blog/encyclopedia-of-life-rubenstein-fellowship-crowdsourced-morphology</id>
  <published>2013-06-19T15:50:34+00:00</published>
  <updated>2013-06-19T15:50:34+00:00</updated>
  <content type="html"><![CDATA[
     <p><img src="/uploads/2013/06/landmarks.jpg" alt="Landmarks!" /></p>
  <p>This is a rather late announcement, but my project proposal, “<a href="http://eol.org/users/72305">Using massively crowd-sourced data to examine morphological impacts of extinction risk in ray-finned fishes</a>”, was selected for the <a href="http://eol.org/info/fellows">2013 Rubenstein Fellows Program</a>. I’m happy to announce that the first pilot study of my project is starting today. These first steps are the result of much preparation with my collaborators, Dan Rabosky at Michigan and Michael Alfaro here at UCLA.</p>
  <p>The overall goal of this project is to quantify morphological disparity across the radiation of ray-finned fishes, which account for around 50% of all vertebrate diversity. We recently showed that the rate of speciation and body size evolution are correlated across fishes (Rabosky et al. 2013, <em>Nat Commun</em>). However, size is just one facet of phenotypic diversity. Shape is likely to tell us quite a lot more about both morphological and ecological diversity, so with shape information we may be able to explain the factors that drive fish biodiversity, and also help identify patterns useful for fish conservation.</p>
  <p><img src="/uploads/2013/06/rabosky2013circular.png" alt="Figure from Rabosky et al. 2013 Nat Commun showing correlated rate of diversification and body size evolution" /></p>
  <p>We will then be able to test for a relationship between shape and human exploitation, i.e., are there fishes with certain body shapes that tend to be more or less vulnerable to overfishing? This type of data and analysis can also allow us to get at many deep questions in macroevolution, including if there is convergence in body shape in certain environments or habitats, or if certain groups of fishes enjoy a faster rate of shape evolution.</p>
  <p>There are tens of thousands of high-quality photographs of fishes on the Internet, in well-curated collections like the Encyclopedia of Life. Numerous expert editors have taken the time to identify photographs down to the species level. However, collecting landmark data for geometric morphometrics is the rate-limiting step for this type of analysis. My hope is that crowdsourcing will help solve this bottleneck!</p>
  <p>The pilot study that I’m starting today is only a small piece of the puzzle. We’ve chosen the triggerfishes, a group that we already have high-quality photographs of due to a previously published study (Dornburg et al. 2011, <em>Syst Biol</em>). This is so we can readily identify areas we need to improve for our larger analysis. The source code for the web-facing portion of this study is <a href="https://github.com/jonchang/eol-mturk-landmark">available on GitHub</a>, and you can <a href="https://jonchang.github.io/eol-mturk-landmark/">play around with a version of it online</a>.</p>
  <p><em>(Top figure: An unsuspecting Pseudobalistes naufragium is viciously landmarked by the author; Bottom figure: A phylogeny of the ray-finned fishes, with warm colors corresponding with fast body size change and longer branches with rapid speciation. Courtesy of Dan Rabosky)</em></p>
  ]]></content>
</entry>
<entry>
  <title><![CDATA[Some utilities for dealing with character data]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/some-utilities-for-dealing-with-character-data/"/>
  <id>https://jonathanchang.org/blog/some-utilities-for-dealing-with-character-data</id>
  <published>2013-03-20T20:00:08+00:00</published>
  <updated>2013-03-20T20:00:08+00:00</updated>
  <content type="html"><![CDATA[
     <h2>Concatenation</h2>
  <p><img src="/uploads/2013/03/sequencematrix.png" alt="SequenceMatrix" /></p>
  <p>Concatenating character matrices seems like it ought to be easy, but I’ve had far too many issues with data formats to naively make that kind of statement. Mesquite never seemed to give output that could be consumed by other programs, Geneious had import issues, and all of my own hand-rolled techniques would always seem to hit edge cases.</p>
  <p><a href="http://code.google.com/p/sequencematrix/">SequenceMatrix</a> is quite a robust little piece of software that will generally “just work”. It’s a cross-platform GUI program (anywhere that Java runs) and will fill your missing data with gaps, make sure that all taxa are represented in your concatenated matrix, fix species names by removing GenBank junk, and so on. Best of all, it supports drag and drop.</p>
  <h2>Conversion</h2>
  <p>I haven’t yet found an equivalently easy GUI program that will convert character data (e.g., DNA, RNA, morphology) from one format to another, so I wrote my own script. Currently the supported formats are:</p>
  <ul>
    <li>FASTA</li>
    <li>Nexus</li>
    <li>Phylip (relaxed)</li>
  </ul>
  <p>The script uses <a href="https://dendropy.org">DendroPy</a> for most of its heavy lifting, and automatically parallelizes the work using Python’s <code>multiprocessing</code> module.
    Here’s how you might use it:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash"><span class="c"># convert all fasta files in the current directory to nexus files</span>
./convert_characters.py <span class="k">*</span>.fasta <span class="nt">--input-format</span><span class="o">=</span>fasta <span class="nt">--output-format</span><span class="o">=</span>nexus <span class="nt">--type</span><span class="o">=</span>dna
<span class="c"># convert all fasta files in a subdirectory to nexus files in a different subdirectory</span>
<span class="c"># this uses the short forms of the options</span>
./convert_characters.py alignments/<span class="k">*</span>.fasta <span class="nt">-ifasta</span> <span class="nt">-onexus</span> <span class="nt">-tdna</span> <span class="nt">--prefix</span><span class="o">=</span>nexus/ <span class="nt">--basename</span>
</code></pre>
    </div>
  </div>
<p><a href="https://gist.github.com/jonchang/5151081/raw/convert_characters.py">Download the script from GitHub</a>.
  (<a href="https://gist.github.com/jonchang/5151081">gist link</a>)</p>
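<p>If you’re curious what a conversion like this involves, here’s a rough sketch of the FASTA to relaxed Phylip case using plain <code>awk</code>. To be clear, this is not how <code>convert_characters.py</code> works (that goes through DendroPy). The helper name <code>fasta_to_phylip</code> is made up, and the sketch assumes the input is already aligned and that the first word of each header is the taxon name.</p>
<div class="language-bash highlighter-rouge">
  <div class="highlight">
    <pre class="highlight"><code data-lang="bash"># Hypothetical helper, not part of convert_characters.py:
# reads aligned FASTA on stdin, writes sequential relaxed Phylip on stdout.
fasta_to_phylip() {
  awk '
    /^>/ { name[++n] = substr($1, 2); next }      # header line: first word is the taxon name
    { gsub(/[ \t\r]/, ""); seq[n] = seq[n] $0 }   # sequence lines may wrap, so join them
    END {
      len = length(seq[1])
      for (i = 1; n >= i; i++)
        if (length(seq[i]) != len) {
          print "sequences differ in length, is this an alignment?" > "/dev/stderr"
          exit 1
        }
      print n, len                                # Phylip header: taxa, then characters
      for (i = 1; n >= i; i++) print name[i], seq[i]
    }
  '
}

# made-up example data: two taxa, the first with a wrapped sequence
printf '>taxon_a\nACGT\nAC\n>taxon_b\nAC-TAC\n' | fasta_to_phylip
# prints:
# 2 6
# taxon_a ACGTAC
# taxon_b AC-TAC
</code></pre>
  </div>
</div>
<p>Relaxed Phylip is just a header line with the number of taxa and characters, followed by one name and sequence per line, which is why the whole alignment has to be read before anything can be printed.</p>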
]]></content>
</entry>
<entry>
  <title><![CDATA[Git: Moving old commits to a new branch]]></title>
  <link rel="alternate" type="text/html" href="https://jonathanchang.org/blog/git-moving-old-commits-to-a-new-branch/"/>
  <id>https://jonathanchang.org/blog/git-moving-old-commits-to-a-new-branch</id>
  <published>2012-10-08T07:26:20+00:00</published>
  <updated>2012-10-08T07:26:20+00:00</updated>
  <content type="html"><![CDATA[
     <p>Before I started judiciously applying the <a href="/blog/git-topic-branch-workflow/">topic branch workflow</a> in Git, I did a lot of commits against <code>master</code> and my pull requests were really messed up. I needed to clean this up the other day: I wanted to move a couple of really old commits off into their own topic branch, but my local repository and upstream had diverged significantly.</p>
  <p>For some reason, it wasn’t possible to just cherry-pick everything, and exporting the commits as patches to apply later didn’t seem to work either. Instead I found a pretty good solution with rebase, one of my favorite tools.</p>
  <p>First, make sure you’re on <code>master</code> (or whatever the default upstream branch is) and save a reference to your current local <code>HEAD</code> as <code>master-backup</code>:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">git checkout <span class="nt">-b</span> master-backup master
git checkout master
</code></pre>
    </div>
  </div>
  <p>Add the upstream repository as a remote and fetch upstream’s commits.</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">git remote add upstream https://github.com/..../whatever.git
git fetch upstream
</code></pre>
    </div>
  </div>
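  <p>Reset the local <code>master</code> to upstream’s <code>master</code>. Now <code>master</code> should match <code>upstream/master</code> and <code>master-backup</code> should still retain your original commits. Note that <code>git reset --hard</code> also throws away uncommitted changes in your working tree, so commit or stash anything you want to keep first.</p>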
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">git reset <span class="nt">--hard</span> upstream/master
</code></pre>
    </div>
  </div>
  <p>Create a new topic branch for your fix, based on the backup of your old master branch.</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">git checkout <span class="nt">-b</span> topic-branch master-backup
</code></pre>
    </div>
  </div>
  <p>Rebase the new topic branch onto <code>master</code>, which replays your commits on top of upstream’s history.</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">git rebase master
</code></pre>
    </div>
  </div>
  <p>Then you can just force push everything up to GitHub and open a pull request from the topic branch like normal. In my case I had several different commits tangled up into <code>master</code>, so I had to repeat the last two steps a few times.</p>
  <p>Once you’re done, you can delete the old backup master using:</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">git branch <span class="nt">-D</span> master-backup
</code></pre>
    </div>
  </div>
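  <p>If you’d like to try the whole sequence somewhere safe first, here’s a sketch that rebuilds the situation in a throwaway directory. Everything in it is made up for the demo: the <code>upstream</code> and <code>work</code> directories, the file names, and the commit messages. It also skips the force push, since there’s no GitHub remote to push to.</p>
  <div class="language-bash highlighter-rouge">
    <div class="highlight">
      <pre class="highlight"><code data-lang="bash">set -eu

# throwaway sandbox; all names below are invented for this demo
sandbox=$(mktemp -d)
cd "$sandbox"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# a stand-in "upstream" repository with one commit on master
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo "base" > upstream/README
git -C upstream add README
git -C upstream commit -q -m "upstream: initial commit"

# clone it and commit straight to master (the habit this post cleans up)
git clone -q upstream work
echo "my fix" > work/fix.txt
git -C work add fix.txt
git -C work commit -q -m "local: fix made directly on master"

# meanwhile upstream moves on, so the two histories diverge
echo "news" > upstream/NEWS
git -C upstream add NEWS
git -C upstream commit -q -m "upstream: unrelated change"

# the steps from this post, in order
cd work
git checkout -q -b master-backup master
git checkout -q master
git remote add upstream "$sandbox/upstream"
git fetch -q upstream
git reset -q --hard upstream/master
git checkout -q -b topic-branch master-backup
git rebase -q master
git branch -q -D master-backup

# master now matches upstream, and the fix lives only on topic-branch
echo "master:"
git log --format=%s master
echo "only on topic-branch:"
git log --format=%s master..topic-branch
# prints:
# master:
# upstream: unrelated change
# upstream: initial commit
# only on topic-branch:
# local: fix made directly on master
</code></pre>
    </div>
  </div>
  <p>The <code>master..topic-branch</code> range at the end is a handy check in a real repository too: it lists exactly the commits a pull request from the topic branch would contain.</p>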
  ]]></content>
</entry>
</feed>