Inspired by the work of scientists around the world and the game-changing efforts of projects like Creative Commons, the Wikimedia Foundation, and the Free Software movement, we hope to engage the larger community in an open and fruitful discussion on issues concerning the use and reuse of scientific data, including how to balance openness against the need to make ends meet in an increasingly competitive environment.
If you would like to join our efforts to highlight the use and reuse of data in the sciences, please feel free to contact us on our tracker, create a pull request against our repository, or join our forum.
We are not lawyers and this is not legal advice: all institutions and groups have their own perspectives and counsel. We are a group of scientists, engineers, librarians, and specialists who are concerned about the use and reuse of increasingly interconnected, derived, and reprocessed data. We want to make sure that data-driven scientific endeavors can work with one another in meaningful ways without undue legal concerns.
The (Re)usable Data Project is meant to provide a resource that looks at some of the issues around the reuse of scientific data and to open a conversation about how to deal with them.
We also want to actively work with the community in considering our criteria and in making sure that our information about scientific data resources is up-to-date and correct. If you have any questions, concerns, or see any problems, please open a ticket on our GitHub tracker.
The initial driving concern of this project is the use
and reuse of biological and biomedical data. However,
this is a general problem in the scientific community
and needs to be addressed directly.
For each resource, using our criteria, we attempt
to objectively assign zero to five stars for how well
we believe a resource's data may be built upon, edited,
modified, and redistributed.
Grossly speaking:
If you see any problems with our determinations or would like to make corrections or clarifications, please open a ticket for us on our issue tracker.
This is a short overview of the criteria that we use when evaluating a resource's data license for use and reuse. We have attempted to balance many needs (credit, mutability, commercialization, redistribution, etc.) and focused on trying to objectively see how licenses can interact across resources.
To learn more about how we look at resource data licenses, please see our criteria and license type pages.
You may also explore our data with simple visualizations here.
Name | Tags | Grade | Description | License Info | License Issues |
---|---|---|---|---|---|
Alliance of Genome Resources (AGR) | biology, MOD, functional annotation, disease-gene association, orthology, phenotype and disease models | ★★★★★ | The primary mission of the Alliance of Genome Resources (the Alliance) is to develop and maintain sustainable genome information resources that facilitate the use of diverse model organisms in understanding the genetic and genomic basis of human biology, health and disease. | permissive | |
ArrayExpress | biology, microarray experiments, functional genomics, high-throughput, microarray, sequencing | ★★★★½ | ArrayExpress Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments, and provides these data for reuse to the research community. | permissive | |
Bgee | biomedical, x-species, expression data, curated data, biology, evo-devo, curated experiment annotations, RNA-Seq experiments, scRNA-Seq experiments, microarray experiments, in situ hybridization experiments, EST libraries | ★★★★★ | Bgee is a database to retrieve and compare gene expression patterns in multiple animal species, produced from multiple data types (RNA-Seq, scRNA-Seq, Affymetrix, in situ hybridization, and EST data). | permissive | |
BioCyc Database Collection (BioCyc, public) | biology, genomic resource, sequence, gene structure, pathways, reactions, functional annotation | ★★★ | BioCyc is a collection of 20,028 Pathway/Genome Databases (PGDBs) for model eukaryotes and for thousands of microbes, plus software tools for exploring them. BioCyc is an encyclopedic reference that contains curated data from 130,000 publications. | restrictive | |
BioGRID | biology, cross-species, protein-protein interaction | ★★★★★ | BioGRID is an interaction repository with data compiled through comprehensive curation efforts. Our current index is version 3.4.155 and searches 63,959 publications for 1,507,991 protein and genetic interactions, 27,785 chemical associations and 38,559 post translational modifications from major model organism species. All data are freely provided via our search index and available for download in standardized formats. | permissive | |
BRENDA Tissue Ontology | biology, ontology, enzyme sources | ★★★★ | A structured controlled vocabulary for the source of an enzyme. It comprises terms of tissues, cell lines, cell types and cell cultures from uni- and multicellular organisms. | permissive | |
Cancer Biomarkers database | oncology, interaction, cancer, drug, biomarker, oncology | ★★★★★ | The Cancer Biomarkers database is curated and maintained by several clinical and scientific experts in the field of precision oncology. | permissive | |
Catalogue of Life | biology, custom, biodiversity, distribution, biogeography, taxonomy, ontology | ★★ | The Catalogue of Life is the most comprehensive and authoritative global index of species currently available. It consists of a single integrated species checklist and taxonomic hierarchy. The Catalogue holds essential information on the names, relationships and distributions of over 1.6 million species. | restrictive | |
CATH Protein Structure Database | biology, protein families, protein family, superfamily, classification protein structure | ★★★★★ | CATH is a classification of protein structures downloaded from the Protein Data Bank. | permissive | |
ChEMBL | biology, biochemical, bioactive drug-like small molecules | ★★★ | ChEMBL is a database of bioactive drug-like small molecules, it contains 2-D structures, calculated properties (e.g. logP, Molecular Weight, Lipinski Parameters, etc.) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data). | copyleft | |
All copyrightable materials on this site are © 2019 the (Re)usable Data Project under the CC-BY 4.0 license.
The (Re)usable Data Project is funded by the National Center for Advancing Translational Sciences (NCATS) OT3 TR002019 as part of the Biomedical Data Translator project and U24TR002306 as part of the CTSA Program National Center for Data to Health (CD2H).
The (Re)usable Data Project would like to acknowledge
the assistance of many more people than can be listed
here. Please visit the about
page for the full list.