Fixing pandoc "out of memory" errors on Windows

Figure: pandoc crashes due to address space exhaustion.

Recently I’ve been using the rmarkdown+knitr+pandoc workflow to write manuscripts. Markdown with Pandoc is roughly a million times easier to use than the equivalent LaTeX workflow. With RMarkdown and knitr, both included in RStudio, you can also weave R plots and output directly into your manuscripts. I was inspired to do this by Carl Boettiger’s online lab notebook and Rich FitzJohn’s “how much of the world is woody” reproducible research GitHub repository.

I was working with a Markdown document in RStudio when, after I added a bunch of citations, pandoc.exe started crashing with an out-of-memory error. My Windows PC has 8 GB of RAM, and I found it unlikely that pandoc could consume that much memory. The Task Manager showed that pandoc was only using about 1.8 GB when it died, which is just under the 2 GB that a 32-bit Windows process gets by default. That suggested it was not a true out-of-memory error, but rather exhaustion of the process’s virtual address space†.

Luckily for us, Microsoft Visual Studio (it’s free!) comes with a utility, editbin, that lets us modify an executable file’s headers and set the flag that lifts this limit. Once you have VS installed, start the developer command prompt in elevated mode (Shift-right-click, then Run as Administrator) and type into the terminal:

editbin /LARGEADDRESSAWARE "C:\Program Files\RStudio\bin\pandoc\pandoc.exe"
editbin /LARGEADDRESSAWARE "C:\Program Files\RStudio\bin\pandoc\pandoc-citeproc.exe"

To confirm that the fix worked, run:

dumpbin /headers "C:\Program Files\RStudio\bin\pandoc\pandoc.exe" | more

and look for “Application can handle large (>2GB) addresses” in the output.
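The same flag can also be read straight from the file, without any Visual Studio tools. The Python sketch below is only an illustration and not part of the original workflow. It assumes the standard PE layout (the offset of the PE signature is stored at 0x3C, and the 16-bit Characteristics field closes the 20-byte COFF header) and tests the IMAGE_FILE_LARGE_ADDRESS_AWARE bit (0x0020). The helper names is_large_address_aware and make_fake_header are made up for this example.

```python
import struct

# Bit in the COFF "Characteristics" field that editbin /LARGEADDRESSAWARE sets.
IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020


def is_large_address_aware(data: bytes) -> bool:
    """Return True if the PE image bytes have the large-address-aware bit set."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    # e_lfanew: offset of the PE signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    # Characteristics is the last 2-byte field of the 20-byte COFF header,
    # which starts right after the 4-byte signature.
    (characteristics,) = struct.unpack_from("<H", data, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)


def make_fake_header(characteristics: int) -> bytes:
    """Build a minimal fake header for the demo (hypothetical, not a real exe)."""
    dos = bytearray(0x40)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 0x40)  # PE signature follows immediately
    coff = struct.pack("<HHIIIHH", 0x014C, 0, 0, 0, 0, 0, characteristics)
    return bytes(dos) + b"PE\0\0" + coff


if __name__ == "__main__":
    print(is_large_address_aware(make_fake_header(0x0122)))  # flag set
    print(is_large_address_aware(make_fake_header(0x0102)))  # flag clear
```

To check a real binary, pass in the file’s bytes, for example the result of open(path, "rb").read() for pandoc.exe.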

Update: The Large Address Aware app does the same thing through a GUI, without the need to download the entirety of VS.

Figure: pandoc successfully using >2GB of memory.

† Technical details: a 32-bit process has a 4GB virtual address space, but by default Windows gives the program only the lower 2GB, since the upper 2GB is reserved for the kernel. An executable marked with the /LARGEADDRESSAWARE flag can address up to 3GB on 32-bit systems (provided the system is booted with the /3GB option) and the full 4GB on 64-bit systems. There is also Physical Address Extension (PAE), which lets a 32-bit system use more than 4GB of physical RAM, but I don’t think anyone outside of the server realm ever used it.

All that is really necessary to let a 32-bit program use the full 4GB of address space is to set a special linker flag and make sure that your code doesn’t make faulty assumptions about how Windows lays out its memory. For example, if you knew that the kernel always reserved the upper half of the address space for itself, you could smuggle a flag into the top bit of user-space pointers, because that bit is always zero for addresses below 2GB. Code that does this breaks as soon as it is handed an address above 2GB.
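To make that failure mode concrete, here is a small Python sketch that models 32-bit pointers as plain integers. It is purely illustrative: the names tag and untag and the example addresses are invented. While user space ends at 2GB, the only pointer bit guaranteed to be zero is the top one (bit 31), so that is the bit the sketch uses.

```python
from typing import Tuple

TAG_BIT = 0x80000000    # bit 31, the top bit of a 32-bit pointer
ADDR_MASK = 0x7FFFFFFF  # the low 31 bits


def tag(ptr: int) -> int:
    """Hide a one-bit flag in the top bit, assuming that bit is always 0."""
    return ptr | TAG_BIT


def untag(word: int) -> Tuple[int, bool]:
    """Split a word back into (pointer, flag) under the same assumption."""
    return word & ADDR_MASK, bool(word & TAG_BIT)


low_ptr = 0x7FFE0000   # below 2GB: what a default 32-bit process sees
high_ptr = 0x80010000  # above 2GB: possible once the flag is set

# The round trip works while every pointer stays below 2GB.
assert untag(tag(low_ptr)) == (low_ptr, True)

# An untagged high pointer already has bit 31 set, so it is misread as
# "flagged" and its address is truncated to the wrong location.
assert untag(high_ptr) == (0x00010000, True)
```

This is why the flag is opt-in: Windows cannot know whether a given executable plays tricks like this, so it only hands out addresses above 2GB to programs that declare they can cope with them.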