Basic Guidance for Repositories:

Needs of Repositories:
Authors must deposit data in a data repository as part of the article submission process; articles will not otherwise be sent for review. If data have not been deposited in a repository before article submission, authors can upload their data to figshare or the Dryad Digital Repository during the submission process. Data may also be deposited in these resources temporarily if the main host repository does not support confidential peer review.

Repositories need to meet our requirements for anonymous peer review, data access, preservation, resource stability, and suitability for use by all researchers with the appropriate types of data. Data repositories should meet all of the following requirements:

  • Ensure long-term persistence and preservation of datasets in their published form (minimum of 5 years after publication).
  • Provide stable persistent identifiers for submitted datasets (e.g. DataCite DOIs).
  • Allow public access to data without barriers, such as logins or paywalls, unless required for sensitive human datasets requiring access registration and/or acceptance of terms such as Data Usage Agreements.
  • Support open licences (CC0 and CC-BY). Exceptions will only be permitted for human-derived data and should be discussed with the editorial team prior to manuscript submission.
  • Provide for confidential review of submitted datasets without requiring reviewers to provide identifying information, and allow authors to embargo data during peer review if required.

Authors may use external resources such as DataCite's Repository Finder and the FAIRsharing registry to find an appropriate repository for their data. As of 2021, the list below will not be expanded further; alternative data repositories not included in it are therefore acceptable, provided they meet the above criteria. Authors should ensure that the policies of their chosen repository meet these criteria. [Adopted/adapted from Nature]

List of Repositories:

  1. Generalist Repositories:
    The generalist repositories listed below are able to accept data from all researchers, regardless of location or funding source:
    Repository name | Fees/costs | Size limits | Integrated with Scientific Data's manuscript submission system | re3data / FAIRsharing entry
    Dryad Digital Repository | $120 USD for the first 20 GB, plus $50 USD for each additional 10 GB | None stated | Yes | view FAIRsharing entry
    figshare | 100 GB free per Scientific Data manuscript | 1 TB per dataset | Yes (to qualify for the 100 GB of free storage, data must be uploaded to figshare via our submission system) | view FAIRsharing entry
    Harvard Dataverse | Contact the repository for datasets over 1 TB | 2.5 GB per file, 10 GB per dataset | No | view re3data entry
    Open Science Framework | Free of charge | 5 GB per file; multiple files can be uploaded | No | view FAIRsharing entry
    Zenodo | Donations towards sustainability encouraged | 50 GB per dataset | No | view re3data entry
    Science Data Bank | Free of charge | 8 GB per file; no limit on dataset size | No | view FAIRsharing entry
  2. Social Sciences Repositories:

  3. Materials Science Repositories:

  4. Physics Repositories:

  5. Solid Earth Sciences Repositories:
  6. Ocean Sciences Repositories:

  7. Geomagnetism and Palaeomagnetism Repositories:

  8. Ecology Repositories:

  9. Climate Sciences Repositories:
    World Data Center for Climate at DKRZ (WDCC) view re3data entry
  10. Biogeochemistry and Geochemistry Repositories:

  11. Astronomy and Planetary Sciences Repositories:
    SIMBAD Astronomical Database view re3data entry
    UK Solar System Data Centre view re3data entry
  12. Earth and Environmental Sciences Repositories:
    NASA Goddard Earth Sciences Data and Information Services Center view re3data entry
    NERC Data Centres view re3data entry
    PANGAEA view re3data entry
    National Tibetan Plateau/Third Pole Environment Data Center view FAIRsharing entry
    NOAA National Centers for Environmental Information (DOIs only assigned to deposited data on request) view re3data entry
    HydroShare (CUAHSI) view FAIRsharing entry
  13. Chemistry and Chemical Biology Repositories:
    ioChem-BD Computational Chemistry Datasets view re3data entry
    NCBI PubChem BioAssay view FAIRsharing entry
    NCBI PubChem Substance view FAIRsharing entry
    Beilstein-Institut, STRENDA view FAIRsharing entry
  14. Health Sciences Repositories:
    Some of the repositories in this section are suitable for datasets requiring restricted data access, which may be necessary to preserve study participant anonymity in clinical datasets. Authors should contact repositories directly to determine which data access controls are best suited to the specific requirements of their study.
  15. Biological Sciences Repositories:
    1. Nucleic Acid Sequence-
      Novel DNA sequence, novel RNA sequence, and novel genome assembly data must be deposited in repositories that are part of the International Nucleotide Sequence Database Collaboration (INSDC) or in those working towards INSDC inclusion (as listed below), unless privacy or ethics restrictions prevent open sharing of such data. These data may additionally be deposited in regional and national repositories as required. For human data that require special controls, please see our recommended health sciences repositories.

      Raw sequencing data (reads or traces), genome assemblies, annotated sequences and sample metadata:
        Repository options: INSDC repositories; Genome Sequence Archive (GSA)
        Data and metadata standards: browse the data and metadata standards endorsed by the Genome Standards Consortium
      Genetic variation data:
        Repository options: dbSNP (human variations shorter than 50 bp); dbVar (human variations longer than 50 bp); ClinVar (human genotype & phenotype); European Variation Archive (EVA) (all species); Genome Sequence Archive for Human (GSA-Human)
    2. Protein Sequence-
      UniProtKB view FAIRsharing entry
    3. Molecular & Supramolecular Structure-
      These repositories accept structural data for small molecules; peptides and proteins (all); and larger assemblies (EMDB). Small molecule crystallographic data should be uploaded to Dryad or figshare before manuscript submission and should include a .cif file and structure factors for each structure. Both the structure factors and the structural output must have been checked using the IUCr's checkCIF routine, and a copy of the output must be included at submission, together with a justification for any alerts reported.

      Protein Circular Dichroism Data Bank (PCDDB) view FAIRsharing entry
      Crystallography Open Database (COD) view FAIRsharing entry
      Coherent X-ray Imaging Data Bank (CXIDB) view FAIRsharing entry
      Biological Magnetic Resonance Data Bank (BMRB) view FAIRsharing entry
      Electron Microscopy Data Bank (EMDB) view FAIRsharing entry
      Worldwide Protein Data Bank (wwPDB) view FAIRsharing entry
      Structural Biology Data Grid view FAIRsharing entry
      Cambridge Structural Database (CSD) – managed by the Cambridge Crystallographic Data Centre (CCDC)
      Inorganic Crystal Structure Database (ICSD), deposition via CCDC
    4. Neuroscience-
      These data repositories all accept human-derived data (NeuroMorpho.org and G-Node also accept data from other organisms). Please note that human-subject data submitted to OpenNeuro (formerly OpenfMRI) must be de-identified.

      NeuroMorpho.org view FAIRsharing entry
      OpenNeuro (formerly OpenfMRI) view FAIRsharing entry
      G-Node view FAIRsharing entry
      Neuroimaging Informatics Tools and Resources Collaboratory (NITRC) view FAIRsharing entry
      EBRAINS view FAIRsharing entry
    5. Omics-
      Functional Genomics: Please refer to the MIAME standard for microarray data. Molecular interaction data should be deposited with a member of the International Molecular Exchange Consortium (IMEx), following the MIMIx recommendations. For data linking genotype and phenotype information in human subjects, we strongly recommend submission to dbGaP, EGA or JGA, which have mechanisms in place to handle sensitive data.

      ArrayExpress view FAIRsharing entry
      Gene Expression Omnibus (GEO) view FAIRsharing entry
      GenomeRNAi view FAIRsharing entry
      dbGaP view FAIRsharing entry
      The European Genome-phenome Archive (EGA) view FAIRsharing entry
      Database of Interacting Proteins (DIP) view FAIRsharing entry
      IntAct view FAIRsharing entry
      Japanese Genotype-phenotype Archive (JGA) view FAIRsharing entry
      NCBI PubChem BioAssay view FAIRsharing entry
      Genomic Expression Archive (GEA) view FAIRsharing entry
      GWAS Catalog view FAIRsharing entry

      Metabolomics & Proteomics-
      Metabolomics data should be submitted following the MSI guidelines. Authors can submit proteomics data to members of the ProteomeXchange consortium (listed below), following the MIAPE recommendations.

      MassIVE view FAIRsharing entry
      MetaboLights view FAIRsharing entry
      PeptideAtlas view FAIRsharing entry
      PRIDE view FAIRsharing entry
      Panorama Public view FAIRsharing entry
    6. Taxonomy & Species Diversity-
      Environmental Data Initiative (formerly LTER Network Information System Data Portal) view re3data entry
      Global Biodiversity Information Facility (GBIF) view FAIRsharing entry
      Integrated Taxonomic Information System (ITIS) view FAIRsharing entry
      KNB: The Knowledge Network for Biocomplexity view FAIRsharing entry
      Morphobank.org view FAIRsharing entry
      Movebank Data Repository view FAIRsharing entry
    7. Mathematical & Modelling Resources-
      BioModels Database view FAIRsharing entry
      Kinetic Models of Biological Systems (KiMoSys) view FAIRsharing entry
      The Network Data Exchange (NDEx) view FAIRsharing entry
    8. Cytometry and Immunology-
      FlowRepository view FAIRsharing entry
      ImmPort view FAIRsharing entry
    9. Imaging-
      Image Data Resource view FAIRsharing entry
      The Cancer Imaging Archive view FAIRsharing entry
      SICAS Medical Image Repository view FAIRsharing entry
      Coherent X-ray Imaging Data Bank (CXIDB) view FAIRsharing entry
      Cell Image Library view FAIRsharing entry
    10. Organism-Focused Resources-
      These resources provide information specific to a particular organism or disease pathogen. They may accept phenotype information, sequences, genome annotations and gene expression patterns, among other types of data. Incorporating data into these resources can be very valuable for promoting reuse within these specific communities; however, where applicable, data records should be submitted both to a community repository and to one suitable for the type of data (e.g. transcriptome profiling; please see above).

      Eukaryotic Pathogen Database Resources (EuPathDB) view FAIRsharing entry
      FlyBase view FAIRsharing entry
      Influenza Research Database view FAIRsharing entry
      Mouse Genome Informatics (MGI) view FAIRsharing entry
      Rat Genome Database (RGD) view FAIRsharing entry
      VectorBase view FAIRsharing entry
      WormBase view FAIRsharing entry
      Xenbase view FAIRsharing entry
      Zebrafish Model Organism Database (ZFIN) view FAIRsharing entry