Basic Guidance for Repositories:
- Needs of Repositories
- List of Repositories
- Generalist Repositories
- Social Sciences Repositories
- Materials Science Repositories
- Physics Repositories
- Solid Earth Sciences Repositories
- Ocean Sciences Repositories
- Geomagnetism and Palaeomagnetism Repositories
- Ecology Repositories
- Climate Sciences Repositories
- Biogeochemistry and Geochemistry Repositories
- Astronomy & Planetary Sciences Repositories
- Earth and Environmental Sciences Repositories
- Chemistry and Chemical Biology Repositories
- Health Sciences Repositories
- Biological Sciences Repositories
- Nucleic Acid Sequence Repositories
- Protein Sequence Repositories
- Molecular & Supramolecular Structure Repositories
- Neuroscience Repositories
- Omics Repositories
- Taxonomy & Species Diversity Repositories
- Mathematical & Modelling Resources Repositories
- Cytometry and Immunology Repositories
- Imaging Repositories
- Organism-Focused Resources Repositories
Needs of Repositories:
Authors must deposit data to a data repository as part of the article submission process; articles will not otherwise be sent for review. If data have not been deposited to a repository prior to article submission, authors can upload their data to figshare or the Dryad Digital Repository during the submission process. Data may also be deposited to these resources temporarily, if the main host repository does not support confidential peer review.
Repositories need to meet our requirements for anonymous peer-review, data access, preservation, resource stability, and suitability for use by all researchers with the appropriate types of data. Data repositories should meet all of the following requirements:
- Ensure long-term persistence and preservation of datasets in their published form (minimum of 5 years after publication).
- Provide stable persistent identifiers for submitted datasets (e.g. DataCite DOIs).
- Allow public access to data without barriers, such as logins or paywalls, unless required for sensitive human datasets requiring access registration and/or acceptance of terms such as Data Usage Agreements.
- Support open licences (CC0 and CC-BY). Exceptions will only be permitted for human-derived data and should be discussed with the editorial team prior to manuscript submission.
- Provide for confidential review of submitted datasets without the requirement for reviewers to provide identifying information, as well as embargoed data for authors during peer review if required.
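The five requirements above can be treated as a simple checklist when assessing a candidate repository. The sketch below is illustrative only: the `RepositoryProfile` class, its field names and the example values are hypothetical and do not correspond to any journal or repository API.

```python
from dataclasses import dataclass, fields


@dataclass
class RepositoryProfile:
    """Hypothetical self-assessment of a repository against the listed criteria."""
    preserves_at_least_5_years: bool    # long-term persistence (minimum 5 years)
    issues_persistent_ids: bool         # e.g. DataCite DOIs
    open_access_without_barriers: bool  # no logins or paywalls, except sensitive human data
    supports_open_licences: bool        # CC0 and CC-BY
    supports_confidential_review: bool  # anonymous reviewer access, embargo if required


def unmet_criteria(profile: RepositoryProfile) -> list[str]:
    """Return the names of the criteria the repository does not satisfy."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


# Example values are made up for illustration.
candidate = RepositoryProfile(True, True, True, True, False)
print(unmet_criteria(candidate))
```

A repository is suitable only when the returned list is empty, since all of the requirements must be met.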
Authors may use external resources such as DataCite’s Repository Finder and the FAIRsharing registry to find an appropriate repository for their data. As of 2021, this list will not be expanded further; the use of alternative data repositories not included here is therefore acceptable, provided they meet the above criteria. Authors should ensure that policies of the repository [adopted/adapted from Nature]
The generalist repositories listed below are able to accept data from all researchers, regardless of location or funding source:
| Repository Name | Information on fees/costs | Size limits | Integrated with Scientific Data’s manuscript submission system | Re3data / FAIRsharing entry |
|---|---|---|---|---|
| Dryad Digital Repository | $120 USD for first 20 GB, and $50 USD for each additional 10 GB | None stated | Yes | view FAIRsharing entry |
| figshare | 100 GB free per Scientific Data manuscript | 1 TB per dataset | Yes. To qualify for the 100 GB of free storage, data must be uploaded to figshare via our submission system. | view FAIRsharing entry |
| Harvard Dataverse | Contact repository for datasets over 1 TB | 2.5 GB per file, 10 GB per dataset | No | view re3data entry |
| Open Science Framework | Free of charge | 5 GB per file, multiple files can be uploaded | No | view FAIRsharing entry |
| Zenodo | Donations towards sustainability encouraged | 50 GB per dataset | No | view re3data entry |
| Science Data Bank | Free of charge | 8 GB per file, no limit to dataset size | No | view FAIRsharing entry |
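As a worked example of the tiered fee quoted in the table ($120 USD for the first 20 GB, then $50 USD for each additional 10 GB), the sketch below estimates the charge for a given dataset size. It assumes that any partial 10 GB block above 20 GB is billed as a full block; the table does not say how partial blocks are handled, so this should be confirmed with the repository. The function name is hypothetical.

```python
import math


def estimate_fee_usd(size_gb: float) -> int:
    """Estimate the deposit fee in USD under the tiered pricing quoted in the table.

    Assumption: a partial 10 GB block above the first 20 GB counts as a full block.
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    base_fee, base_gb = 120, 20    # $120 covers the first 20 GB
    extra_fee, extra_gb = 50, 10   # $50 per additional 10 GB
    extra_blocks = math.ceil(max(0.0, size_gb - base_gb) / extra_gb)
    return base_fee + extra_fee * extra_blocks


for size in (15, 20, 25, 45):
    print(f"{size} GB -> ${estimate_fee_usd(size)}")
```

Under these assumptions a 45 GB dataset needs three additional blocks (25 GB over the base, rounded up), giving $120 + 3 × $50 = $270.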
- Climate Sciences Repositories:
World Data Center for Climate at DKRZ (WDCC) view re3data entry
- Astronomy and Planetary Sciences Repositories:
SIMBAD Astronomical Database view re3data entry
UK Solar System Data Centre view re3data entry
- Earth and Environmental Sciences Repositories:
- Chemistry and Chemical Biology Repositories:
ioChem-BD Computational Chemistry Datasets view re3data entry
NCBI PubChem BioAssay view FAIRsharing entry
NCBI PubChem Substance view FAIRsharing entry
Beilstein-Institut, STRENDA view FAIRsharing entry
- Health Sciences Repositories:
Some of the repositories in this section are suitable for datasets requiring restricted data access, which may be required to preserve study participant anonymity in clinical datasets. Authors should contact repositories directly to determine which have data access controls best suited to the specific requirements of their study.
- Biological Sciences Repositories:
- Nucleic Acid Sequence-
Novel DNA sequence, novel RNA sequence, and novel genome assembly data must be deposited to repositories that are part of the International Nucleotide Sequence Database Collaboration (INSDC) or to those which are working towards INSDC inclusion (as listed below), unless there are privacy or ethics restrictions that prevent open sharing of such data. These data may in addition be deposited to regional and national repositories as required. For human data that require special controls, please see our recommended health sciences repositories.
| Data Type | Repository Options | Data and Metadata Standards |
|---|---|---|
| Raw sequencing data (reads or traces); Genome assemblies; Annotated sequences; Sample metadata | INSDC repositories | Browse data and metadata standards endorsed by the Genomic Standards Consortium |
| Genetic variation data | dbSNP (human variations less than 50 bp); dbVar (human variations greater than 50 bp); ClinVar (human genotype & phenotype); European Variation Archive (EVA) (all species); Genome Sequence Archive for Human (GSA-Human) | |
- Protein Sequence-
UniProtKB view FAIRsharing entry
- Molecular & Supramolecular Structure-
These repositories accept structural data for small molecules; peptides and proteins (all); and larger assemblies (EMDB). Small molecule crystallographic data should be uploaded to Dryad or figshare before manuscript submission, and should include a .cif file and structure factors for each structure. Both the structure factors and the structural output must have been checked using the IUCr’s checkCIF routine, and a copy of the output must be included at submission, together with a justification for any alerts reported.
- Neuroscience-
These data repositories all accept human-derived data (NeuroMorpho.org and G-Node also accept data from other organisms). Please note that human-subject data submitted to OpenNeuro (formerly OpenfMRI) must be de-identified.
NeuroMorpho.org view FAIRsharing entry
OpenNeuro (formerly OpenfMRI) view FAIRsharing entry
G-Node view FAIRsharing entry
Neuroimaging Informatics Tools and Resources Collaboratory (NITRC) view FAIRsharing entry
EBRAINS view FAIRsharing entry
- Omics-
– Functional Genomics-
Please refer to the MIAME standard for microarray data. Molecular interaction data should be deposited with a member of the International Molecular Exchange Consortium (IMEx), following the MIMIx recommendations. For data linking genotyping and phenotyping information in human subjects, we strongly recommend submission to dbGaP, EGA or JGA, which have mechanisms in place to handle sensitive data.
– Metabolomics & Proteomics-
Metabolomics data should be submitted following the MSI guidelines. Authors can submit proteomics data to members of the ProteomeXchange consortium (listed below), following the MIAPE recommendations.
MassIVE view FAIRsharing entry
MetaboLights view FAIRsharing entry
PeptideAtlas view FAIRsharing entry
PRIDE view FAIRsharing entry
Panorama Public view FAIRsharing entry
- Taxonomy & Species Diversity-
- Mathematical & Modelling Resources-
BioModels Database view FAIRsharing entry
Kinetic Models of Biological Systems (KiMoSys) view FAIRsharing entry
The Network Data Exchange (NDEx) view FAIRsharing entry
- Cytometry and Immunology-
FlowRepository view FAIRsharing entry
ImmPort view FAIRsharing entry
- Organism-Focused Resources-
These resources provide information specific to a particular organism or disease pathogen. They may accept phenotype information, sequences, genome annotations and gene expression patterns, among other types of data. Incorporating data into these resources can be very valuable for promoting reuse within these specific communities; however, where applicable, data records should be submitted both to a community repository and to one suitable for the type of data (e.g. transcriptome profiling; please see above).