At the last meeting of EUA’s Open Science Expert Group, we discussed the need for data to be “born” FAIR. What does that entail?
FAIR data
Just a recap: For data to be openly available (often called Open Data), they also need to be FAIR, which stands for Findable, Accessible, Interoperable, and Reusable. However, not all FAIR data needs to be open. At RITMO, we often find that we need to protect data for copyright and/or privacy reasons. Still, we try to make as much of our data as possible openly available. Other times, we make aggregated data available to support the conclusions in our papers.
FAIRification is tedious
The problem today is that data is often “FAIRified” at the end of the research process, typically as part of the requirements of a conference organizer or journal editor. More and more conferences and journals require data to be made available, and I fully support this. How would anyone be able to verify findings if the underlying data and analysis scripts are not available?
Other times, FAIRification happens at the end of a project because external funders have started demanding data sharing. Not all data ends up being published on; in fact, I would imagine that most of it does not. Some data could and should probably be deleted, but other data may have value and could be shared so that others can benefit from it.
Whatever the reason for FAIRifying data, willingly or unwillingly, many researchers find the process tedious and time-consuming. One reason is that it requires a lot of manual labor at a point when the research feels “done” and one wants to move on to new projects. Things would have been much easier if the data had been FAIR from the data collection stage, that is, “born” FAIR.
Some data are more FAIR than others
Some data types embed metadata from the moment they are captured. A digital photo is a good example: it typically contains information about the time, the device used, the location, and so on. This makes it much easier to organize photos and to search the metadata instead of the content of the photo itself (the pixels).
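To illustrate, here is a minimal Python sketch of reading such embedded metadata (EXIF tags) with the Pillow library; the filename is just a placeholder:

```python
# A minimal sketch: read embedded EXIF metadata from a photo with Pillow.
# "photo.jpg" is a hypothetical placeholder filename.
from PIL import Image
from PIL.ExifTags import TAGS

img = Image.open("photo.jpg")
exif = img.getexif()

for tag_id, value in exif.items():
    tag_name = TAGS.get(tag_id, tag_id)  # map numeric EXIF tag IDs to readable names
    print(f"{tag_name}: {value}")
```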
Other devices, including research equipment, embed varying amounts of metadata. I have not yet found any device that properly FAIRifies data from the start. Most of the proprietary systems we work with in the fourMs Lab store some information in their own software, but once we export a CSV file to work in other (non-commercial) software, little of this is preserved.
What to include?
Some key information would be good to have in any digitally stored file:
- timestamp of creation
- file ID (such as a DOI)
- author ID (such as an ORCID)
- copyright information (such as a license)
- privacy level information (“color” code)
There are lots of other things that could also be added, but those would be file- or domain-specific. However, if all files stored on any device carried this information, things would be so much easier in the long run!
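As a sketch of how this could look in practice, here is a minimal Python example that stores these five pieces of information as a JSON “sidecar” file next to a data file. All identifiers and filenames are hypothetical placeholders:

```python
# A minimal sketch: write core metadata as a JSON "sidecar" next to a data file.
# All identifiers and filenames below are hypothetical placeholders.
import json
from datetime import datetime, timezone

metadata = {
    "created": datetime.now(timezone.utc).isoformat(),     # timestamp of creation
    "file_id": "10.1234/example-doi",                       # file ID (e.g. a DOI)
    "author_id": "https://orcid.org/0000-0000-0000-0000",   # author ID (ORCID)
    "license": "CC-BY-4.0",                                 # copyright information
    "privacy_level": "green",                               # privacy level ("color" code)
}

with open("recording.csv.meta.json", "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)
```

A sidecar file like this is only a workaround, of course; the ideal would be for recording devices and software to embed such information in the files themselves at creation time, so that the data really are “born” FAIR.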