Your research is important, and so is the data that it is based on. Making your data available is an important part of your work both as a researcher and scholar. When publishing your data you will need to:
- Describe your data using a standard metadata schema for your field
- Describe your research process and methodology
- Get a DOI for you and your data and code
- Select a repository where your materials can be made available to other researchers
Here are some common ways to publish and share data:
Data Journals
The “data journal” is an emerging alternative. In data journals the data is the focus and the article is descriptive of the data set. This enables the data to be cited in a very familiar form.
- F1000Research – F1000Research is an open research publishing platform for researchers in all subject areas.
- GigaScience – GigaScience is an open access, open data, open peer-review journal focusing on ‘big data’ research from the life and biomedical sciences.
- Scientific Data – Scientific Data is a peer-reviewed, open-access journal for descriptions of datasets, and research that advances the sharing and reuse of scientific data.
Disciplinary Repository
Disciplinary repositories offer high visibility within a particular field. Not all repositories are committed to long-term preservation of data, and their mission and focus may change over time. Some are only available to subscribers.
Not all of the repositories listed here accept researcher-produced datasets, and not all of them can ensure long-term preservation of your data; contact each one for more details.
- Cambridge Structural Database – Established in 1965, the CSD is the world’s repository for small-molecule organic and metal-organic crystal structures. Containing the results of over half-a-million x-ray and neutron diffraction analyses this unique database of accurate 3D structures has become an essential resource to scientists around the world.
- DataCite – A not-for-profit organization that aims to establish easier access to research data on the Internet; increase acceptance of research data as legitimate, citable contributions to the scholarly record; and support data archiving that will permit results to be verified and re-purposed for future study. DataCite makes research more effective by connecting research outputs and resources, from data and preprints to images and samples. DataCite supports the creation and management of DOIs and metadata records, enhances research workflows with service integration, and enables the discovery and reuse of research outputs and resources.
- DataONE – A community driven program providing access to data across multiple member repositories, supporting enhanced search and discovery of Earth and environmental data. DataONE promotes best practices in data management through responsive educational resources and materials. DataONE envisions researchers, educators, and the public using DataONE to better understand and conserve life on earth and the environment that sustains it.
- DRYAD – Dryad is an open data publishing platform and a community committed to the open availability and routine re-use of all research data.
- GenBank – GenBank® is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences (Nucleic Acids Research, 2013 Jan;41(D1):D36-42). GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Nucleotide Archive (ENA), and GenBank at NCBI. These three organizations exchange data on a daily basis.
- ICPSR – The Inter-university Consortium for Political and Social Research is an international consortium of more than 810 academic institutions and research organizations. ICPSR provides leadership and training in data access, curation, and methods of analysis for the social science research community. The ICPSR maintains a data archive of more than 350,000 files of research in the social and behavioral sciences. It hosts 23 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.
- NSSDCA – The NASA Space Science Data Coordinated Archive serves as the permanent archive for NASA space science mission data. “Space science” means astronomy and astrophysics, solar and space plasma physics, and planetary and lunar science. As the permanent archive, NSSDCA teams with NASA’s discipline-specific space science “active archives,” which provide access to data to researchers and, in some cases, to the general public. NSSDCA also serves as NASA’s permanent archive for space physics mission data. It provides access to several geophysical models and to data from some non-NASA missions.
- NIH-Supported Data Sharing Resources – This page shows a list of NIH-supported data repositories that accept submissions of appropriate data from NIH-funded investigators.
- re3data.org – The Registry of Research Data Repositories is a global registry of research data repositories from different academic disciplines.
- Scientific Data’s list of recommended repositories – Scientific Data mandates the release of datasets but does not itself host data. Instead, it asks authors to submit datasets to an appropriate public data repository. This is its list of recommended data repositories.
Institutional Repository
The mission of an institutional repository is to permanently preserve the scholarly output of the institution. Here at the Missouri University of Science and Technology, our institutional repository is known as Scholars’ Mine. Scholars’ Mine serves this function, and preserves text, audio, video, data and more. Scholars’ Mine is designed to meet the needs of scholars in all disciplines, and operates according to widely accepted standards for preservation and access.
Journals
Some journals publish data associated with their published articles. This will provide good visibility, but is often tied to a journal subscription, limiting access. Compliance with documentation standards and long-term preservation vary considerably from journal to journal.
Self-publishing
Self-publishing occurs through individual, institutional, or third-party websites. The researcher assumes the responsibility for vetting their own data for quality and documentation, as well as preserving an accessible version of the data as file formats change in the future. Tools are emerging that focus on the broad sharing of data while allowing individual researchers or research centers to manage their own data on a remote server. The long-term implications of this approach are uncertain at this point.
It is not necessary to choose only one of these options. In fact, there are advantages to using multiple publishing options. Most of these options do not require an exclusive granting of rights, making it possible to deposit data in multiple locations, which maximizes both current visibility and long-term preservation.
Citing Data
Citing data is highly recommended, both to provide reliable access to specific datasets and to give credit to the producers of useful data. Data citation standards are just beginning to emerge in many disciplines. In the absence of a specific standard, a data citation should include the following:
- Author or Responsible Party (such as: study PI, sample collector, government agency)
- Name of the Data Element used (e.g., a specific Table/Map/dataset with any applicable unique IDs)
- Name of the Database
- Name of the Publication (if applicable)
- Name of the Repository (if applicable)
- Version identifier (Study number, edition, year, version, etc.)
- Date accessed
- URL used
If specific steps were required to subset, analyze, or access the data, the citation should also include:
- parameters selected
- software used
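The elements listed above can be gathered into a small record and joined into a citation string. The following is a minimal sketch, not a recognized citation style: the `DataCitation` class, its field names, the ordering of the parts, and the example values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataCitation:
    """Hypothetical container for the data citation elements listed above."""
    author: str                       # author or responsible party
    data_element: str                 # specific table, map, or dataset used
    database: str                     # name of the database
    version: str                      # study number, edition, year, version
    date_accessed: str                # date the data was accessed
    url: str                          # URL used
    publication: Optional[str] = None  # if applicable
    repository: Optional[str] = None   # if applicable
    parameters: Optional[str] = None   # only if the data was subset or filtered
    software: Optional[str] = None     # only if software was needed for access

    def format(self) -> str:
        """Join the available elements in a fixed, illustrative order."""
        parts = [self.author, self.data_element, self.database]
        for optional in (self.publication, self.repository):
            if optional:
                parts.append(optional)
        parts.append(self.version)
        if self.parameters:
            parts.append(f"Parameters: {self.parameters}")
        if self.software:
            parts.append(f"Software: {self.software}")
        parts.append(f"Accessed {self.date_accessed}")
        parts.append(self.url)
        return ". ".join(parts) + "."


# Entirely made-up example values.
example = DataCitation(
    author="Example Agency",
    data_element="Table 3: Annual rainfall",
    database="Example Climate Database",
    version="Version 2.1",
    date_accessed="2024-01-15",
    url="https://example.org/data/table3",
)
print(example.format())
```

Keeping the optional elements as `None` by default means the same record works whether or not a publication, repository, or processing steps apply. The output should still be checked against whatever style guide your discipline or journal uses.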
If you have a DOI, you can use the CrossCite DOI data citation formatter or the DataCite citation formatter to create citations corresponding to a variety of citation styles.
Most citation style guides and manuals now include data as a resource type. The citation formatters mentioned above produce output that only approximates each style's requirements, so confirm that a generated citation fully follows the relevant style guide before using it.
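A formatted citation for a DOI can also be requested programmatically. The sketch below assumes the DOI content negotiation service, in which the `https://doi.org/` resolver returns a formatted citation when the request carries an `Accept: text/x-bibliography` header with a `style` parameter. The function names are hypothetical, the set of supported style names depends on the service, and not every DOI registration agency is guaranteed to support this, so treat it as a starting point to verify rather than a definitive recipe.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a request asking for a formatted citation.

    Assumption: the resolver honors the text/x-bibliography media type
    and the given citation style name for this DOI.
    """
    accept = f"text/x-bibliography; style={style}"
    return urllib.request.Request(DOI_RESOLVER + doi, headers={"Accept": accept})


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request and return the citation text (needs network access)."""
    request = build_citation_request(doi, style)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


# Building the request does not touch the network.
request = build_citation_request("10.5334/dsj-2020-005")
print(request.full_url)
print(request.get_header("Accept"))
```

Calling `fetch_citation("10.5334/dsj-2020-005")` would perform the actual lookup for the article cited later on this page. As with the web-based formatters, the returned text should be checked against the style guide.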
Here are some additional examples of guidelines:
- American Geophysical Union (AGU) author guidelines for citing data sets
- Federation of Earth Science Information Partners (ESIP) Interagency Data Stewardship/Citations
- Citing and linking to the Gene Expression Omnibus (NCBI) database
- The Inter-university Consortium for Political and Social Research (ICPSR) provides recommended citation procedures
- DataCite citation examples
Citing Code
Citing code is as important as citing data, and for similar reasons: you’re providing appropriate credit, facilitating reproducibility, and ensuring future researchers can find and use the code.
A code citation should include:
- Creator (i.e., authors or organization who developed the software)
- Title
- Identifier (e.g., DOI or other persistent link)
- Date of publication
- Version
- Publisher (e.g., repository name)
The Force11 Software Citation Implementation Working Group has developed principles for software citation. Their GitHub page has examples of citing software in both APA and Chicago Style.
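Before a code citation is written out, it can help to check that every element listed above is present. The following is a minimal sketch: the field names, the helper functions, and the output layout (which only loosely resembles an APA-style software reference) are assumptions for illustration, and the example record is invented.

```python
# Field names mirror the code citation elements listed above (hypothetical keys).
REQUIRED_FIELDS = ("creator", "title", "identifier", "date", "version", "publisher")


def missing_fields(record: dict) -> list:
    """Return the required elements that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def format_code_citation(record: dict) -> str:
    """Format a complete record; raise ValueError if any element is missing."""
    missing = missing_fields(record)
    if missing:
        raise ValueError("Missing citation fields: " + ", ".join(missing))
    return (
        "{creator} ({date}). {title} (Version {version}) "
        "[Computer software]. {publisher}. {identifier}"
    ).format(**record)


# Entirely made-up example record.
record = {
    "creator": "Example Lab",
    "title": "rainfall-tools",
    "identifier": "https://doi.org/10.0000/example",
    "date": "2024",
    "version": "1.2.0",
    "publisher": "Example Repository",
}
print(format_code_citation(record))
```

Failing loudly on a missing element is a deliberate choice here: a citation without a version or identifier makes it much harder for a later reader to find the exact code that was used.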
Data Availability Statements
When publishing an article using research data, the journal may require a data availability statement that briefly describes if and how readers can access the data that informs the research. This chart shows some sample language you might use for a data availability statement.
Data Availability | Sample Language |
---|---|
Data openly available in a public repository that issues datasets with DOIs | The datasets generated during and/or analyzed during the current study are available in the [repository name, e.g. “Iowa Research Online”] at [http://doi.org/[doi]] |
Data available on request due to privacy/ethical restrictions | The datasets generated during and/or analyzed during the current study are not publicly available due to [explanation of restrictions, e.g. “their containing private information”] but are available from the corresponding author on reasonable request. |
Data available on request from the authors | The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request. |
Data sharing not applicable – no new data generated | Data sharing not applicable to this article as no datasets were generated or analyzed during the current study. |
Data available within the article or its supplementary materials | All data generated or analyzed during this study are included in this published article [and/or] its supplementary information files. |
Data subject to third party restrictions | The data that support the findings of this study are available from [third party name] but restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are, however, available from the authors upon reasonable request and with permission of [third party name]. |
Hrynaszkiewicz, I, Simons, N, Hussain, A, Grant, R and Goudie, S. 2020. “Developing a Research Data Policy Framework for All Journals and Publishers.” Data Science Journal, DOI: http://doi.org/10.5334/dsj-2020-005
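If you prepare statements for several manuscripts, the sample language in the chart can be kept as fill-in templates. This is a small sketch covering four of the rows; the dictionary keys, placeholder names, and function name are hypothetical, while the wording is taken from the chart.

```python
# Sample language from the chart above, with bracketed parts turned into
# named placeholders. Keys and placeholder names are illustrative only.
STATEMENTS = {
    "public_repository": (
        "The datasets generated during and/or analyzed during the current "
        "study are available in the {repository} at {doi_url}"
    ),
    "restricted": (
        "The datasets generated during and/or analyzed during the current "
        "study are not publicly available due to {restriction} but are "
        "available from the corresponding author on reasonable request."
    ),
    "on_request": (
        "The datasets generated during and/or analyzed during the current "
        "study are available from the corresponding author on reasonable request."
    ),
    "not_applicable": (
        "Data sharing not applicable to this article as no datasets were "
        "generated or analyzed during the current study."
    ),
}


def data_availability_statement(kind: str, **details: str) -> str:
    """Fill in the template for the given kind of data availability.

    Raises KeyError if the kind is unknown or a placeholder is not supplied.
    """
    return STATEMENTS[kind].format(**details)


print(
    data_availability_statement(
        "restricted", restriction="their containing private information"
    )
)
```

Always check the target journal's own requirements, since publishers may prescribe different wording.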
Additional examples of data availability statements from publishers: