"A love letter to your future self": What scientists need to know about FAIR data
Following these guiding principles for sharing data can help researchers get ahead.
11 February 2019
Nature Index 360°
The idea that scientific data should be FAIR (Findable, Accessible, Interoperable, and Reusable) is increasingly endorsed by scientific institutions, including the United States National Academies of Sciences, Engineering, and Medicine, the European Commission, and the Wellcome Trust. But it has yet to gain much traction among the people who ultimately matter: the scientists generating the data. The 2018 State of Open Data report, published by Digital Science, found that just 15% of researchers were “familiar with FAIR principles”. (Digital Science is operated by the Holtzbrinck Publishing Group, which also has a majority share in the publisher of the Nature Index.)
So what do scientists need to know about FAIR? Nature Index spoke to Kate LeMay, senior research data specialist at the Australian Research Data Commons, and Lambert Heller, leader of the Open Science Lab at TIB, the German National Library of Science and Technology.
Is FAIR the same as open data?
Increasingly, researchers are encouraged to share their data openly. But open data and FAIR data are distinct concepts. “When we’re talking about open data,” LeMay says, “we’re generally referring to data that can be downloaded freely from the internet.” But researchers need to do more than simply post their data on the web for it to be useful.
“FAIR means thinking about the people who could benefit from your data,” explains Heller. “It means adding persistent identifiers like DOIs [digital object identifiers] to the data, having a stable URL so the data doesn’t ‘disappear’, adding metadata that describes the data and a license stating the conditions under which it can be reused. It also means presenting the data in a standardized way so it’s machine readable.”
Data can also be FAIR but not open. The guiding principle, Heller says, is for data to be “as open as possible, as closed as necessary”. A classic example is medical data, where access has to be controlled to ensure patient privacy and confidentiality. In such cases, LeMay explains, the FAIR approach would be to make the metadata publicly available and provide information about the conditions for accessing the data itself.
What exactly is metadata?
Metadata describe the data and are critical to helping users discover relevant datasets. “We love rich metadata,” says LeMay. “We want to know who made the data, where it was made, what it contains, who to credit, how to reference the dataset.”
Metadata can also include keywords, field of science classification codes, the DOIs of related papers, the researchers’ ORCID identifiers, and the codes for the grants that supported the research. For help with metadata, LeMay recommends talking to university librarians, whose experience with cataloguing books and journals has put them at the forefront of data archiving and curation.
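As a rough illustration of the elements LeMay and Heller list, a metadata record could be sketched as a small data structure. The field names, the DOI, the ORCID identifier, the grant code and the URL-free values below are all hypothetical placeholders rather than any repository's actual schema; in practice the fields would follow whatever metadata standard a repository or research community has adopted.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """Illustrative record only; field names are assumptions, not a standard."""
    title: str
    creators: list[str]
    creator_orcids: list[str]
    doi: str                      # persistent identifier for the dataset
    license: str                  # conditions under which the data can be reused
    keywords: list[str] = field(default_factory=list)
    related_paper_dois: list[str] = field(default_factory=list)
    grant_ids: list[str] = field(default_factory=list)
    access_conditions: str = "open"  # or a note on how to request access


def missing_fields(meta: DatasetMetadata) -> list[str]:
    """Return the names of fields that were left empty."""
    return [name for name, value in asdict(meta).items()
            if value in ("", [], None)]


# All values below are made-up placeholders.
record = DatasetMetadata(
    title="Example coastal water temperature survey",
    creators=["A. Researcher"],
    creator_orcids=["0000-0000-0000-0000"],
    doi="10.1234/example-dataset",
    license="CC BY 4.0",
    keywords=["oceanography", "temperature"],
    related_paper_dois=["10.1234/example-paper"],
    grant_ids=["EXAMPLE-GRANT-001"],
)

print(json.dumps(asdict(record), indent=2))
print("Missing fields:", missing_fields(record))
```

For restricted data such as medical records, the same record could be published openly with `access_conditions` describing how to apply for access, while the data itself stays under controlled access.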
Why does FAIR data need a license?
Having found and accessed the data, researchers need to know how they can reuse it. For open data, LeMay recommends scientists apply a Creative Commons license, which states any restrictions on re-use — whether, for example, the data can be modified or used commercially, and who should be credited. “The advantage of Creative Commons,” she says, “is that it’s internationally recognised and has been tested in many legal jurisdictions.” However, if access to the data is restricted (as in the case of medical data), a Creative Commons license is inappropriate because it entails a loss of control over who has access to the data. In such cases, LeMay says, a repository needs to have its own template licenses.
Why do we need data standards?
True interoperability (the I in FAIR) requires that the data and metadata follow predetermined standards with a consistent structure and agreed vocabularies and ontologies (keywords for describing the data). This allows the data to be interrogated automatically and datasets to be merged. The challenge, LeMay notes, is that different research fields have different cultures and requirements for data and metadata. “There needs to be community ownership of these data standards,” she says. “We can’t just impose them on researchers.”
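To see why agreed vocabularies matter for automation, consider a minimal sketch in which two labs describe their datasets with different free-text keywords. The vocabulary, the terms and the lab keywords here are invented for illustration and do not come from any real community standard; the point is only that a shared mapping lets software recognise that two datasets describe the same thing.

```python
# Hypothetical agreed vocabulary: maps free-text variants to one preferred term.
VOCABULARY = {
    "sea surface temperature": "sea_surface_temperature",
    "sst": "sea_surface_temperature",
    "salinity": "sea_water_salinity",
}


def normalise_keywords(keywords):
    """Return (standard terms, keywords not found in the vocabulary)."""
    standard, unknown = [], []
    for keyword in keywords:
        term = VOCABULARY.get(keyword.strip().lower())
        if term is None:
            unknown.append(keyword)
        elif term not in standard:
            standard.append(term)
    return standard, unknown


# Two labs describing comparable measurements in their own words.
lab_a_keywords = ["SST", "Salinity"]
lab_b_keywords = ["sea surface temperature", "chlorophyll"]

terms_a, unknown_a = normalise_keywords(lab_a_keywords)
terms_b, unknown_b = normalise_keywords(lab_b_keywords)

# Terms both datasets share after normalisation: candidates for merging.
shared = sorted(set(terms_a) & set(terms_b))
print("Shared terms:", shared)
print("Not in vocabulary:", unknown_a + unknown_b)
```

Keywords that fall outside the vocabulary are reported rather than silently dropped, which mirrors LeMay's point that standards need community input: unrecognised terms are a signal that the vocabulary may need extending.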
Where should FAIR data be stored?
Most researchers currently share their data either as supplementary material to a journal article or in an independent data repository. Although the FAIR guidelines don’t state a preference, Heller notes that repositories such as Zenodo, Figshare and the Open Science Framework offer useful tools that help researchers make their uploaded data FAIR; for example, generating a DOI and populating the metadata. Repositories also allow researchers to make data curation part of their ongoing workflow, rather than an additional task at the end of the publication process. “It’s never too early to make your data reusable,” he says. “If I could add one more letter to FAIR it would be T for timely.”
What’s in it for researchers?
Many scientific journals and research funders now require scientists to share their data openly. Nature, for example, recently endorsed the Enabling FAIR Data initiative, which requires authors in the Earth, space and environmental sciences to share their supporting data on community repositories, where available. The American Chemical Society's author guidelines state that supplementary information submitted with a manuscript will be automatically hosted on Figshare "to promote open data discoverability and use of your research outputs." (See more details in the 'Useful resources' section below, particularly 'FAIRsharing.org' and 'Publisher commitment'.)
But as LeMay argues, there are both altruistic and selfish reasons for researchers to take the next step and make their data FAIR. “Most people get into research because they want to make a difference,” she says. “That includes making your data as useful as possible.” FAIR can also be good for career advancement, particularly for early-career researchers. “FAIR helps you demonstrate the impact of your research when people re-use and cite your dataset,” LeMay says. “It gets your name out there and can lead to new collaborations.”
Heller agrees. However, the true benefit of FAIR, he argues, is in providing a framework for researchers to manage their own data so they can themselves find it, understand it, and reuse it. “As a scientist, you should treat your data like a love letter to your future self,” he says.
Useful resources
Introducing FAIR: The first formal publication introducing the FAIR Data Principles, including their rationale and some examples.
FAIR self-assessment tool: An online questionnaire for measuring the extent to which datasets are FAIR, with links for more information. Created by the Australian Research Data Commons.
Publisher commitment: List of signatories to the Coalition for Publishing Data in the Earth and Space Sciences Statement of Commitment. The list includes Springer, Elsevier, and several other publishers of journals in the Nature Index. The site includes a helpful FAQ.
FAIRsharing.org: A resource on existing data and metadata standards and policies, including those linked to specific funders and journals.
re3data.org: A detailed registry of more than 2,000 research data repositories to assist in finding the right repository for your data.
License finder: A simple tool for choosing the right Creative Commons license.
Further learning: Webinar series provided by the Australian National Data Service for those in the business of creating and curating research data.
Correction 19/03/2019: Springer has signed onto the Coalition for Publishing Data in the Earth and Space Sciences. An earlier version suggested it was Springer Nature.
This is the first article in the Nature Index 360° series, which takes an all-round look at key topics in scientific research performance and publishing.