Written by Kellie Snow, Research Data Librarian
Why ‘open data’?
The concept of ‘open data’ has attracted increased attention in recent years as part of the wider open research movement. Many funders and publishers now have policies around data sharing, meaning researchers must often comply in order to receive grant funding and publish papers. Yet an emphasis on compliance often muddies the original meaning of open data, and in some cases creates a reactive, rather than proactive, culture towards making data widely available. So why should research data be made open, apart from to satisfy these requirements?
Firstly, there is the more immediate argument that research data is the evidence for claims made in articles. Therefore, if researchers are going to be open and accountable for the work they do (especially if they are publicly funded), that evidence should also be available so it can be scrutinised in the same way as the article itself.
But there is a second, broader, perhaps more aspirational reason: making research data open means a worldwide audience, beyond the researcher’s own group, institution and potentially even discipline, can also see the data and make use of it. It has the potential to increase citations, foster collaborations, and enable unforeseen applications of research data. The notion of reuse and waste reduction isn’t confined to environmental issues – openly sharing and reusing data also reduces needless duplication of research and in turn saves effort and money. As a global community we can instead drive knowledge forward and move ever closer to answering those research questions so key to our societies and our planet.
Making data FAIR
That’s all very admirable in theory, but what are the practicalities of making data open? How do we know that others will understand it correctly, or that they will use it in an appropriate manner? The FAIR data principles, published in 2016, go some way to addressing these concerns by promoting research data that is Findable, Accessible, Interoperable, and Reusable. Or, to explain more fully, data that is made available to others should be:
- Findable – rich in metadata and documentation with a persistent identifier (such as a DOI) so it is discoverable.
- Accessible – retrievable through a trustworthy data repository/archive/centre and understandable to humans and machines. Metadata should always be available even if the data is not.
- Interoperable – with metadata that uses formal and shared languages. Open file formats should be used wherever possible.
- Reusable – clear usage licences and accurate information on provenance.
This set of guiding principles is extremely valuable in pushing an agenda for data that is not only openly available with clear usage guidelines, but actually understandable to others. If data becomes FAIR, it in turn becomes available to all, since anyone, anywhere can potentially access, understand and use it.
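To make the four principles a little more concrete, here is a minimal sketch in Python of a dataset description and a simple completeness check. The field names, the placeholder DOI and the repository name are all hypothetical and chosen purely for illustration; they are not taken from any formal metadata standard or from a particular repository, which will usually define its own schema.

```python
# Hypothetical field names for illustration only; real repositories
# define their own metadata schemas.
FAIR_FIELDS = {
    "identifier": "Findable: a persistent identifier such as a DOI",
    "description": "Findable: documentation describing the data",
    "repository": "Accessible: where the data can be retrieved",
    "file_format": "Interoperable: ideally an open file format",
    "licence": "Reusable: a clear usage licence",
    "provenance": "Reusable: how and by whom the data was created",
}


def missing_fair_fields(record):
    """Return the names of FAIR_FIELDS that are absent or empty in record."""
    return [name for name in FAIR_FIELDS if not record.get(name)]


# An invented example record; every value here is a placeholder.
example_record = {
    "identifier": "doi:10.0000/placeholder",
    "description": "Survey responses, anonymised, with a codebook.",
    "repository": "Example institutional data repository",
    "file_format": "CSV",
    "licence": "CC BY 4.0",
    "provenance": "Collected by the project team via online survey.",
}

for field in missing_fair_fields(example_record):
    print(f"Missing {field}: {FAIR_FIELDS[field]}")
```

A check like this only confirms that the basic descriptive fields have been filled in. Whether the documentation is genuinely understandable to others still needs human judgement.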
Consent is key
When discussing open data there is always a caveat that not all data can be made openly available. In some instances, there are good reasons why data cannot be shared more widely. These include commercial restrictions, the protection of human participants or plant and animal species, and national security. Here is where research data deviates from the open access path that academic papers can more easily follow. The phrase “as open as possible, as closed as necessary” is increasingly being used to describe how data restrictions should be approached. Whilst we can appreciate the instances where data sharing of any kind is not possible, there is a sliding scale between fully open and fully closed where valuable research data can still be made available to some degree, if it is handled correctly.
First and foremost, for any data concerning participants, consent is key. Making individuals aware at the outset that their data may be shared and used by others means anonymised data can often be made available once it has been suitably prepared. For datasets where sensitive information cannot be fully removed, options of restricted access, often via a data repository, mean access for bona fide researchers is still possible. This might not be fully “open”, but it is a lot more accessible to those who could meaningfully do something with the data than simply discounting it as unfit for sharing.
Open data at Cardiff University
Here at Cardiff University we are working to support researchers in openly sharing data and encouraging a culture of open research throughout the institution. Our Open Research Task and Finish Group are working on a series of recommendations to embed open research activities at school and college levels, and to improve training and guidance around several areas including open data. EPSRC researchers can already publish metadata records and provide access to datasets through our CRIS, and an institutional research data repository is planned for next year. The data repository will allow University researchers to openly publish FAIR datasets whilst applying access controls for more sensitive data. It is hoped these developments will provide researchers with the facilities and support needed to effectively share their data with the world, encouraging collaboration, research integrity, and the progression of knowledge.
Open data for the future
So, data sharing is now expected in many circumstances, but to be truly open we need to think carefully about how and where this is done. Data stored on a USB stick and available through the creator is a world away from correctly prepared, documented, licensed datasets, deposited in an archive with citable persistent identifiers. If we adopt some of the principles outlined in this blog, then there is the possibility to make open data sharing a valuable exercise for everyone. Embrace the true meaning of open data and one day the world might just thank you for it.