Zotero and Org-roam academic research workflow

If you write it down, it’s capital-S Science. At least, that’s what one of my grade-school teachers told me (faced with my “samples” collection sourced from field and river near my house).

Science isn’t a lot more complicated than that, but professional academia definitely seems to be. When I decided I wanted to make a go of serious Astro again, it unceremoniously dunked me into a sudden deluge of journal papers, voluminous and near-inscrutable. It’s, well… intimidating: a strange, rarefied, seemingly impenetrable world.

I suddenly, desperately, needed to track and note what I was extracting from these papers in a way that let me separate and distill ideas, link them to other ideas for later reference, cite them conventionally while writing, and somehow keep it all organized and keep it from exploding on me. And all this before tackling the 100+ papers a day flooding through my arXiv feed.

The Goal

My system needed to:

  1. Source, store, organize, manage, and reference papers
  2. Capture notes on those papers and link them to other ideas
  3. Integrate with my existing workflows so as not to create a silo
  4. Make citations easy when I’m writing

First off, I have to thank the Redditors of r/emacs and various Mastodon peeps, since posting questions about how real academics go about this garnered a lot of genuinely helpful opinions and advice from people doing it in situ. In fact, the variety of ways to approach this problem was initially a problem in itself. There were a dizzying number of ways to skin the cat in reference management, and solutions ranged from roll-your-own, bespoke emacs library systems, to numerous packages that ease the affair, to a range of hybrid solutions.

Interestingly, I noticed a lot of packages and solutions focused on the citations portion of the problem, whereas I was much more interested in managing the knowledge I was gleaning from the papers and how I’d integrate everything together (to the point that I’d actually be able to write something as ambitious as a paper someday).
This, more than anything, convinced me I needed to adapt the personal knowledge management system I’d already crafted (based on org-roam) and work out how to integrate scientific reference management and citations alongside the rest of my personal knowledge management.

So, since the knowledge (and, by extension, my notes on it) was the critical part for me, I was happy to use a purpose-specific application for reference management. This led me to the excellent, open source, free, and extensible reference manager Zotero. Making that the focus of my scientific library, and adding a few packages to handle org-roam integration, citations, and bibliographies, left me with a facile, seamless system that plugged right into my existing workflows and worked shockingly efficiently.

Here’s how it all fits together.

The Setup

Zotero

Zotero first. Just grab it. Trust me.

It rocks and is purpose-built for scientific reference management from the ground up, as well as being simple to use, open source, free, extensible, and cross-platform. Don’t bother trying to reinvent the wheel in emacs, roll your own reference management system, or use one of the reference managers locked into the scientific journal monopolies of Elsevier etc. Zotero has been a delightful no-brainer, and a decision whose wisdom seems to compound daily the more I use it.

Zotero screen shot

Zotero has a web connector, which is where much of its power comes from. Install it in your main browser when Zotero prompts you. It works with all major browsers.

When you run across papers in various online journals and sources, this lets Zotero grab any paper when you click the browser extension in your toolbar, and drop a copy into your local library with all the bibliographic metadata you could possibly want (I should also mention there are plugins for both Word and LibreOffice if you author in those apps, though I can’t comment on them, since I’m more of a plain-text markup guy when I write).

It’s a pleasure to use, and it even lets you mark up the copies of your PDFs in the library so you can refer to them later while making notes.

To make Zotero work with the system I’ve set up here, there is one plugin you need to install, though setup is easy. It exports a textual representation of your Zotero bibliographic information in BibTeX format, the LaTeX standard for bibliographic data, which is widely supported by other programs and commonly used with emacs.

The exported BibTeX file (which updates automatically whenever you change your Zotero library) acts as a plain-text representation of your Zotero database that emacs can refer to and interact with.

For example, here’s an actual entry from my Zotero library:

@article{meechEarlyPhotometryComet1986,
  title = {Early Photometry of Comet p/{{Halley}}: {{Development}} of the {{Coma}}},
  shorttitle = {Early Photometry of Comet p/{{Halley}}},
  author = {Meech, K. J. and Jewitt, D. and Ricker, G. R.},
  year = {1986},
  month = jun,
  journal = {Icarus},
  volume = {66},
  pages = {561--574},
  issn = {0019-1035},
  doi = {10.1016/0019-1035(86)90091-6},
  urldate = {2023-03-21},
  abstract = {Broadband charge-coupled device photometry of Comet p/Halley at heliocentric distances R = 5.9 AU (1984 October) and R = 5.1 AU (1985 January) is presented. The mean brightness at R = 5.1 AU is greater than expected from an asteroidal brightness model fitted to earlier photometry. It is likely that this brightness increase is due to the release of dust grains from the nucleus beginning at about R = 5.9 AU. Simple thermal equilibrium sublimation models of a water-ice nucleus are shown to be consistent with weak activity even at R = 5.9 AU, provided the nucleus is dark (Bond albedo A {$<$} 0.15) and slowly rotating. The brightness of the comet varies on time scales from hours to days, with a range of nearly 1.0 mag at R = 5.9 AU, reduced to about 0.3 mag at R = 5.1 AU. The decrease in the range of the short-term variations is explained by the increased contribution from the coma to the total brightness of the comet. We find no convincing evidence for a dominant period in the short-term variations.},
  keywords = {Astronomical Photometry,Astrophysics,Charge Coupled Devices,Comet Nuclei,Halley'S Comet,Ice,Thermodynamic Equilibrium},
  annotation = {ADS Bibcode: 1986Icar...66..561M},
  file = {/Users/daryl/Documents/Zotero/storage/2PK36S5L/Meech et al. - 1986 - Early photometry of comet pHalley Development of.pdf}
}
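Because the export is plain text, anything in emacs can read it. As a rough sketch (the helper name my/bib-entry-field is hypothetical, not part of any package), the built-in bibtex.el functions bibtex-search-entry and bibtex-parse-entry are enough to pull a single field, such as the PDF path in the file field above, out of the exported file by citation key:

```elisp
(require 'bibtex)

(defun my/bib-entry-field (bibfile citekey field)
  "Return FIELD of the entry CITEKEY in BIBFILE, or nil if not found.
Hypothetical helper for illustration; it only uses built-in bibtex.el."
  (with-temp-buffer
    (insert-file-contents bibfile)
    ;; bibtex-mode sets up the entry-type regexps the search relies on.
    (bibtex-mode)
    ;; Moves point to the start of the entry, or returns nil if missing.
    (when (bibtex-search-entry citekey)
      ;; A non-nil CONTENT argument strips the {braces} around values.
      (let ((entry (bibtex-parse-entry t)))
        (cdr (assoc-string field entry t))))))
```

For the entry above, something like (my/bib-entry-field "~/Documents/org/refs/biblio.bib" "meechEarlyPhotometryComet1986" "file") should hand back the path to the stored PDF.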

In Zotero, install the “Better BibTeX” plugin: it’s an easy download that you then install from a file. Once it’s installed, export your library to a sensible directory inside your emacs org-mode tree and set the export to keep updating automatically.

Once done, this provides the key file for org-roam and friends to interact with. In my setup, which riffs off Tiago Forte’s Second Brain PARA system, the BibTeX file (which I simply call biblio.bib) exports to the ~/Documents/org/refs directory. That’s the parent directory for most “reference” material in my knowledge system: astro research, the CRM, and Resonance Calendar material (if you’re interested in emulating the whole system, there’s a long post on my setup).

Another handy addition, depending on how many of the papers you need sit behind ridonkulous paywalls, is to have Zotero’s PDF lookup grab your papers from the excellent Sci-Hub, dedicated to open science (depending on whether you see this as pirating or liberating science). This doesn’t affect me much, as virtually all my papers appear on arXiv (cause astronomy and astrophysics), but genetics and medical peeps may appreciate it more as publishers’ monopoly power grows and they extract monopoly rents for useful research.

In Zotero, open Settings and go to the Advanced section.

  1. Open the Config Editor.
  2. Accept the warning that you are bypassing the safety protocols.
  3. Search for the key extensions.zotero.findPDFs.resolvers.
  4. Right-click it and pick Modify.

Then, in the text input, clear out everything and enter the following:

{
  "name": "Sci-Hub",
  "method": "GET",
  "url": "https://sci-hub.se/{doi}",
  "mode": "html",
  "selector": "#pdf",
  "attribute": "src",
  "automatic": true
}

This should make Zotero check Sci-Hub for your paper and put the PDF in your library. Enjoy.

Oh, one big bugbear: Zotero really needs a dark mode theme. You can apparently bend it to make this happen, but this is something that should be native by now.

Org-roam

The second pillar of the system is org-roam. Org-roam is an org-mode take on the bidirectional-linking, Zettelkasten, and PKM (Personal Knowledge Management) ideas behind tools like Roam Research, Obsidian, and Logseq, built for taking notes and linking ideas together. It was originally designed to make it easier to collect concepts together, publish, and author. I use it in ways it wasn’t intended for, from managing a bespoke CRM system to keep track of folks, to enhancing the TODO system of org-agenda, to actually taking notes on things I consume (often through my Resonance Calendar) and linking those to other ideas.

Adding this to an existing org-roam setup needs three elements:

  1. org-roam-bibtex (ORB), which lets org-roam interact with the BibTeX file mentioned above,
  2. org-ref with ivy-bibtex, and
  3. A research capture “template” in org-roam, which I use specifically for academic research notes so they’re handled separately.

I should also note that of all three systems I looked at, this was by far the “easiest” to set up in terms of altering my init.el and having it all work together. Very few changes were needed. As mentioned, it also slotted almost seamlessly into the way I already work.

Generally, I log a link in my daily notes to whatever I’m about to start consuming, and when I create that note it’s backed by a relevant org-roam capture template (so a new contact gets the peep template, a new Resonance Calendar item gets the rez template, etc.). So, on my daily page, I run M-x orb-insert-link, which fires off org-roam-bibtex to list all the references in my BibTeX file and lets me narrow down, by typing, to the one I want to create an entry for.

I select it with Return, and org-roam asks which template I want to use. I choose the “Paper” template, and org-roam creates the file with an appropriate ID and a cite reference link to the BibTeX entry (and thereby to the paper’s PDF in Zotero).
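If you run orb-insert-link (and org-roam’s org-roam-node-insert, for ordinary links) many times a day, binding them to keys saves the M-x trip. A minimal sketch; the C-c n prefix and the specific keys are hypothetical choices of mine, not something either package sets up for you:

```elisp
;; Hypothetical key choices: pick whatever is free in your own config.
(with-eval-after-load 'org
  ;; Link to a paper from biblio.bib, creating its note via a template.
  (define-key org-mode-map (kbd "C-c n b") #'orb-insert-link)
  ;; Plain org-roam link to any existing (or new) node.
  (define-key org-mode-map (kbd "C-c n i") #'org-roam-node-insert))
```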

So, it creates a file with this template:

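;; One entry for `org-roam-capture-templates' (org-roam v2 syntax).
;; The head string is inserted verbatim, so its headings must start at
;; column 0 or org-mode won't treat them as headings.
("p" "Paper" plain "%?"
 :target (file+head "~/Documents/org/refs/astro/${citekey}.org"
"#+TITLE: ${title}
#+CREATED: %u
#+MODIFIED:

* ${title}
:PROPERTIES:
:Tags:
:Start:
:Fin:
:Killed:
:END:

** Actions

** Key Ideas

** Notes
")
 :unnarrowed t)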

It’s important to note that ORB creates the link using the citation key from Zotero. That key is also the file name, and generally what you’ll end up searching on when looking for material, so configure it to taste in Zotero (Better BibTeX lets you change the citation key format) if you don’t like the default. Think of the citation key, which is also what gets inserted when you cite, as the “primary key” for looking things up in Zotero, and as what holds the whole system together from a plain-text perspective.
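Because the citation key doubles as the file name, getting from a key to its note is just string assembly. A small sketch, assuming the ~/Documents/org/refs/astro/ directory from the Paper template above; my/paper-notes-dir, my/paper-note-path and my/open-paper-note are hypothetical names:

```elisp
(defvar my/paper-notes-dir "~/Documents/org/refs/astro/"
  "Directory the Paper capture template writes ${citekey}.org files to.
Hypothetical variable; keep it in sync with the template's :target path.")

(defun my/paper-note-path (citekey)
  "Return the absolute path of the note file for CITEKEY."
  (expand-file-name (concat citekey ".org") my/paper-notes-dir))

(defun my/open-paper-note (citekey)
  "Visit the note for CITEKEY.
This bypasses the capture template, so create new notes with
orb-insert-link and use this only to jump to existing ones."
  (interactive "sCitation key: ")
  (find-file (my/paper-note-path citekey)))
```

With the entry shown earlier, the key meechEarlyPhotometryComet1986 maps to .../refs/astro/meechEarlyPhotometryComet1986.org, which is the whole trick: one key, one file, one PDF.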

For example, for a recent paper I was reading on a comet sublimation model, the finished notes looked something like this:

Org Roam Bibtex assisted Entry

The beautiful thing about this is that I can use org-roam’s insert-link command (org-roam-node-insert) while typing a note to link to other ideas, papers, and even related information from other fields. I find it very powerful.

Here’s the actual init.el setup in case you want to duplicate the above for yourself:

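(use-package org-ref
  :straight t
  :config
  ;; bibtex-completion settings shared by org-ref and ivy-bibtex.
  (setq bibtex-completion-bibliography '("~/Documents/org/refs/biblio.bib") ; Better BibTeX export
        bibtex-completion-notes-path "~/Documents/org/refs/astro"           ; one note per citekey
        bibtex-completion-pdf-field "file"                                  ; PDF path written by Zotero
        bibtex-completion-pdf-open-function
        (lambda (fpath)
          ;; macOS `open'; on Linux, "xdg-open" plays the same role.
          (call-process "open" nil 0 nil fpath))))

(use-package ivy-bibtex
  :straight t
  :after org-ref)

(use-package org-roam-bibtex
  :straight t
  :after org-roam
  :config
  (require 'org-ref)
  ;; Org-roam v2 has no global `org-roam-mode' minor mode to hook into,
  ;; so turn ORB on directly once it has loaded.
  (org-roam-bibtex-mode 1))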

Credit where credit is due: a significant amount of the setup here is taken from this YouTube video that someone posted in response to my Mastodon question.

And that’s it. Other than making sure bibtex-completion-bibliography matches the Better BibTeX export path from the Zotero section, you’re all done, and this should work pretty much flawlessly.
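Since a mismatched path is the one thing likely to trip you up, a cheap sanity check at startup can help. This is a sketch under my own assumptions (the helper my/missing-bibliographies is hypothetical), meant to sit after the use-package forms in init.el:

```elisp
(defun my/missing-bibliographies (files)
  "Return the members of FILES (a path or list of paths) that don't exist.
Hypothetical helper; non-string entries are skipped."
  (let ((missing '()))
    (dolist (f (if (listp files) files (list files)))
      (when (and (stringp f) (not (file-exists-p (expand-file-name f))))
        (push f missing)))
    (nreverse missing)))

;; Runs once init.el has finished loading, i.e. after the use-package
;; forms have set `bibtex-completion-bibliography'.
(add-hook 'after-init-hook
          (lambda ()
            (dolist (f (my/missing-bibliographies
                        (bound-and-true-p bibtex-completion-bibliography)))
              (display-warning
               'bibtex-completion
               (format "Bibliography not found: %s (check the Better BibTeX export path)" f)))))
```

If Zotero’s export and your init.el disagree, you get a warning buffer at startup instead of an empty completion list later.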

Writing Papers

I default to writing in markdown or org-mode these days. So, for my academic writing (such as it is), I plan to start most drafts in org-mode, write raw LaTeX for things like mathematical formulas (see my setup; I’m quite pleased with how it works), and then use emacs’ powerful export capabilities to export the whole thing to LaTeX for typesetting and submission to journals (at least in theory).

When writing your paper, citations are pretty simple. You run org-ref-insert-cite-link and get a proper citation link inserted into the document. Easy peasy. If you need to refer back to the paper, you can open up the PDFs and such, though I find I do most of the heavy lifting in my article notes, and the paper drafts end up being more the solidification and collection of the various org-roam pieces I’ve gathered on a topic.

Once you export, that link will be turned into the citation format you specify in your export settings, and everything is then good to go.
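For the LaTeX side, org-ref’s documentation suggests something along these lines: let latexmk drive BibTeX during export. A minimal sketch, assuming latexmk and a TeX distribution are installed (I’ve left out the -shell-escape flag the org-ref docs include, which you only need for things like minted). The document itself also needs a bibliography link, e.g. bibliography:~/Documents/org/refs/biblio.bib, and usually a bibliographystyle: link, so the exported LaTeX knows which .bib file to use:

```elisp
;; -bibtex makes latexmk run BibTeX as needed, so the \cite{...} commands
;; exported from org-ref cite links are resolved in the final PDF.
;; -f keeps going past recoverable errors; nonstopmode avoids prompts.
(with-eval-after-load 'ox-latex
  (setq org-latex-pdf-process
        '("latexmk -bibtex -f -pdf -interaction=nonstopmode %f")))
```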

I’m planning to loop back and revisit this section once I’ve actually pumped out a paper, so the post covers not just the construction of the system but the actual goal and output of the process.

Fin

And that’s the blow-by-blow on how to set up a note-friendly research management system that will make even your PI happy. The thing I’m happiest about is that this adds “academic superpowers” to my existing PKM system and slots into the way I already work. I’d argue it’s a work in progress until more papers get pumped out into the scientific community, but so far it seems to be delivering strongly on my original goals.

I hope this post was really useful for you. I have to say I spent a considerable amount of time investigating numerous approaches (there was a viable alternative using org-cite, citar, org-roam, and vertico, which also seemed legit but would have involved considerable changes to my setup).

Please let me know if you end up using this setup or incorporating some of it into your own. Also, let me know if you think there’s a better tool, package, or setup I should try that would address academic needs even better while still fitting existing workflows, or any shortfalls you see in my system. I’m interested in what works for people.

Feel free to mention or ping me on the elephant site (@awws on Mastodon) or email me at hola@wakatara.com.