Three ways to collaborate on writing
Document-sharing tools for scientists.
10 January 2020
Jeffrey M. Perkel
Science is a collaborative exercise, and whether they are writing papers or grant applications, or soliciting feedback from their thesis committee, researchers frequently must author, edit, and share documents with their colleagues.
Most familiar in this toolbox, perhaps, are the Google productivity applications (Docs, Sheets, and Slides) and Microsoft Office, which can also be used in a collaborative, joint-editing mode. But a rich suite of alternatives also exists.
One of them is the collaborative authoring platform HackMD. For example, as part of her weekly lab meetings, Kirstie Whitaker, a Research Fellow at the Alan Turing Institute in London, gives her research team a bit of homework. “Everybody in the lab has to share a piece of information for everybody to learn from, and everybody has to ask a question,” she says.
“We all have something that we can give to a conversation, and we can always be learning from the others,” explains Whitaker. “And we can ask questions to try and build those connections and build community.”
To collect the answers, Whitaker creates a shareable document on HackMD. She shares the file’s URL on the team Slack, and members populate the document with their replies. Later, she condenses the meeting minutes, also collected on HackMD, into a public blog post - a public record of her lab’s activities that is in keeping with Whitaker’s open-science ethos.
Collaborative authoring tools like HackMD are generally cloud-based, and allow multiple authors to work on a document, sometimes simultaneously, without mailing it from person to person.
Privacy features allow authors to determine who can and cannot read and edit their work, while versioning tools let them review how the document has changed over time.
And, because these systems generally use plain-text file formats, the resulting documents can easily be downloaded and edited offline with any basic text editor, and/or synched to version control systems like GitHub.
Which to use depends on your technical skill, and that of your collaborators, as well as privacy considerations. Here, we consider three options.
HackMD (free for personal use; paid accounts start at $5/month) is a browser-based editor for files written in Markdown.
Markdown is a simple plain-text file format in which formatting options - bold text, italics, hyperlinks, highlights, and so on - are encoded in the text itself. That makes it simple to track changes, including formatting, using version control software such as Git.
“Markdown is a really nice way of having raw text with just a little bit of formatting that allows the page to render nicely,” Whitaker explains. “But it's very lightweight and it's really, really good for version control.”
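To see why plain-text formatting suits version control, consider a minimal Python sketch that compares two versions of a short Markdown note using the standard-library `difflib` module. The note text, the file names, and the `markdown_diff` helper are all hypothetical, made up for illustration; tools such as Git produce similar line-by-line comparisons.

```python
import difflib

# Two hypothetical versions of a Markdown note (invented for this example).
before = """# Lab meeting

Share one tip and ask one question.
See the project page for details.
"""

after = """# Lab meeting

Share **one tip** and ask *one question*.
See the [project page](https://example.org) for details.
"""


def markdown_diff(old, new):
    """Return a unified diff of two Markdown strings as a list of lines."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="before.md",  # hypothetical file names
            tofile="after.md",
            lineterm="",
        )
    )


diff_lines = markdown_diff(before, after)
print("\n".join(diff_lines))
```

Because bold, italics, and links are ordinary characters in the file, the formatting changes appear in the diff as removed (`-`) and added (`+`) lines, just as a change of wording would.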
The HackMD editing interface includes two side-by-side panes. On the left, the Markdown file is presented within a simple editing interface; on the right, the file is rendered as a formatted web page.
Syntax highlighting in the editing interface provides visual cues, while a toolbar provides point-and-click formatting options, such as boldface, checkboxes, and hyperlinks.
HackMD files can be public or private, and multiple collaborators can work on the files simultaneously. “It's like Google Docs - I could do simultaneous editing by multiple people live and see the results,” says Titus Brown, a bioinformatician at the University of California, Davis, who uses HackMD. An optional browser plug-in called ‘HackMD-it’ allows users to search and open files, and to edit GitHub files on HackMD.
Brown uses HackMD for authoring blog posts, and for soliciting input as he fleshes out ideas.
“I often use HackMD for things that I'm gonna show other people while I'm working on them,” he says.
As for Whitaker, in addition to her lab meetings she also uses HackMD to facilitate in-person workshops. At the end of the workshop, she invites participants to provide anonymous feedback in an exercise she calls “Plusses and Deltas”, in which users are asked to write a sentence indicating what they liked (‘plusses’) and another for what they would change (‘deltas’).
“That's just a really fun experience,” she says, “to look at the end of the session and see this document being, whoosh! - being populated all in one go.”
When multiple authors edit a file, keeping track of the changes can become a burden. The “track changes” feature of Microsoft Word and Google Docs shows how a file has changed.
But as more and more collaborators make changes to a file, it becomes harder to work out what’s happening: It isn’t possible, for instance, to view only the changes made by one individual, or at one specific point in time.
At the University of Pennsylvania in Philadelphia, bioinformatician Casey Greene was anticipating precisely this problem as he set about coordinating a massive review of deep learning in biology. So, Daniel Himmelstein on his team, in collaboration with Anthony Gitter at the University of Wisconsin, Madison, developed a solution: Manubot.
Manubot (manubot.org) “is a series of software packages, as well as some of the glue to tie them together, that lets you turn a GitHub repository into a self-published manuscript,” Greene says.
Simply clone the Manubot ‘rootstock’ repository on GitHub. Then, write your manuscript in Markdown, push it to your rootstock clone, and Manubot will automatically error-check the document, create a bibliography, pull in (and number) figures, and output a formatted HTML, Word, or PDF file.
Anybody can contribute to or edit the text by finding the repository on GitHub, editing the page in their web browser, and “pushing” their changes back to GitHub as a “pull request”, which invites the document maintainers to evaluate what they’ve done. Once those changes are approved, the Manubot software kicks in, ensuring that errors (for instance, unresolvable citations) have not been introduced.
Manubot provides tremendous flexibility for creating bibliographies, Greene says, allowing users to specify references with a DOI, PubMed or arXiv ID, URL, or ISBN.
“I hate having to maintain a library that has some alternative identifiers that aren't DOIs,” he explains. “So for me, cite-by-persistent-identifier is just a killer feature.”
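As a rough illustration of cite-by-persistent-identifier, the Python sketch below scans Markdown text for citations written as `@prefix:identifier` inside square brackets. The prefixes (`doi`, `pmid`, `arxiv`, `url`, `isbn`) and the bracket style are assumptions based on the identifier types listed above, and the identifiers in the sample are placeholders, not real references. This is not Manubot's own code; consult the Manubot documentation for the exact syntax it accepts.

```python
import re

# Assumed citation style: "@prefix:identifier", e.g. [@doi:10.1234/example].
# The identifier runs until whitespace, a closing bracket, or a semicolon.
CITATION_PATTERN = re.compile(r"@(doi|pmid|arxiv|url|isbn):([^\s\];]+)")


def extract_citations(markdown_text):
    """Return unique (prefix, identifier) pairs in order of first appearance."""
    seen = []
    for match in CITATION_PATTERN.finditer(markdown_text):
        pair = (match.group(1), match.group(2))
        if pair not in seen:
            seen.append(pair)
    return seen


# Placeholder identifiers, invented for this example.
sample = (
    "Deep learning is widely used in biology [@doi:10.1234/example.5678]. "
    "Earlier work [@pmid:00000000; @arxiv:0000.00000] is cited again "
    "here [@doi:10.1234/example.5678]."
)

citations = extract_citations(sample)
for prefix, identifier in citations:
    print(prefix, identifier)
```

Collecting the identifiers is only the first step; a tool like Manubot would then look up each one to retrieve the full bibliographic record, so authors never maintain a separate reference library by hand.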
That said, Manubot does require that authors be comfortable working with GitHub - a bar that some of Greene’s collaborators are unwilling to hurdle. Indeed, of the tools described here, Manubot is probably the most technically demanding. Still, he says, “I would say if it's an open, collaborative paper, I don't know of a better tool.”
For researchers in the physics, mathematics, and computer science worlds, serious document authoring involves LaTeX, a typesetting language akin to software code.
So it’s perhaps no surprise that when researchers at CERN, a particle physics laboratory located near Geneva, launched a pilot study in 2016 to assess researchers’ enthusiasm for an organization-wide collaborative authoring system, they overwhelmingly opted for one based on LaTeX.
Led by computing engineer Nikos Kasioumis and Valeria Brancolini, then a publisher at CERN, the study asked 45 employees across a range of disciplines and departments to consider three collaborative writing platforms: Overleaf, a web-based LaTeX and what-you-see-is-what-you-get “rich text” editor; Authorea, a web-based platform that supports LaTeX, Markdown, and rich text; and doDOC.
Overleaf (free personal plans; student plans start at $8/month) led the pack with 63% of the vote; Authorea got 23%, followed by doDOC with 14%.
And since the trial ended in 2017, the larger CERN community has embraced its choice, Kasioumis says, with Overleaf adoption jumping 11-fold at CERN, to about 4,500 users. (Overleaf is owned by Digital Science, a firm operated by the Holtzbrinck Publishing Group, which has a share in Nature Index’s publisher, Springer Nature.)
Markus Aicheler, an engineer on CERN’s Compact Linear Collider (CLIC) project, was one participant in the trial. Several years ago he used LaTeX to collaboratively edit an 800-page “conceptual design report” for the CLIC experiment.
But since each editor had a different computational setup, with different LaTeX libraries installed, moving the text from person to person was painful, he says. Plus, they were using a version-control system called SVN, with which none of the editors was proficient.
“It worked out pretty much okay in the end, but we lost a lot of time and sweat and blood,” he says.
Searching for a smoother path
So, when it came time to put together the CLIC “project implementation plan” in 2017, Aicheler and his coeditors were looking to make the process easier while still sticking with LaTeX.
Like HackMD, Overleaf provides a two-paned interface: a LaTeX code editor on the left, with a live rendering of the output on the right. (LaTeX newbies can opt to edit in a “rich-text” mode with buttons for common formatting options.)
The software provides both ‘track changes’ and commenting features, but changes are also logged using Git, and Overleaf projects can be synched to GitHub for offline editing. There’s also an extensive library of LaTeX templates (including journal templates) to choose from, and some journals allow researchers to submit manuscripts for review directly from Overleaf.
Patrick Koppenburg is editorial board chair for one of CERN’s Large Hadron Collider experiments. The board is a group of 11 individuals that reviews each of the 50 or so papers that emerge annually from that effort. He and his colleagues use Overleaf to help coordinate communication between his committee and manuscript authors.
Originally, Koppenburg explains, each member of the three-member panels that take the lead on each paper would read and comment on a PDF copy of the manuscript simultaneously, often producing the same comments. Now, they can edit and comment on the manuscript text directly - a more efficient, if also more drawn-out, process.
For Aicheler, the killer feature of Overleaf is its ability to handle references and generate bibliographies. Traditional LaTeX workflows require users to import their references from a reference manager in the form of a “.bib” file, or to key them in manually.
That presents a challenge for collaborative projects, because as different researchers modify the reference database, those lists can get out of sync. Overleaf allows users to connect a group reference library directly from Mendeley or Zotero, meaning their master database is always up to date, even if multiple researchers are using it.
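The drift problem is easy to picture with a small Python sketch that compares the citation keys in two collaborators' “.bib” files. The entries, keys, and helper functions here are hypothetical, and the regular expression is a simplification rather than a full BibTeX parser; it only illustrates how separate copies of a reference list can diverge.

```python
import re

# Matches the entry type and citation key that open a BibTeX entry,
# e.g. "@article{smith2019,". A simplified pattern, not a full parser.
ENTRY_PATTERN = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

# BibTeX blocks that are not bibliography entries.
NON_ENTRY_TYPES = {"comment", "string", "preamble"}


def citation_keys(bib_text):
    """Return the set of citation keys found in BibTeX source text."""
    return {
        match.group(2)
        for match in ENTRY_PATTERN.finditer(bib_text)
        if match.group(1).lower() not in NON_ENTRY_TYPES
    }


def compare_bibs(mine, theirs):
    """Return sorted keys found only in `mine` and only in `theirs`."""
    mine_keys = citation_keys(mine)
    their_keys = citation_keys(theirs)
    return sorted(mine_keys - their_keys), sorted(their_keys - mine_keys)


# Hypothetical reference lists kept by two collaborators.
alice_bib = """
@article{smith2019, title = {A hypothetical paper}, year = {2019}}
@book{jones2015, title = {A hypothetical book}, year = {2015}}
"""

bob_bib = """
@article{smith2019, title = {A hypothetical paper}, year = {2019}}
@inproceedings{lee2018, title = {A hypothetical talk}, year = {2018}}
"""

only_alice, only_bob = compare_bibs(alice_bib, bob_bib)
print("Only in Alice's file:", only_alice)
print("Only in Bob's file:", only_bob)
```

A shared group library removes the need for this kind of check, because every collaborator draws on the same master list.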
“References used to be a pain in LaTeX, now it's a breeze,” Aicheler says. “It really changed 180 degrees - this is now really a kick-ass feature.”