BERDL Collections
Explore the data collections available in the KBase Data Lakehouse.
Primary Research Collections
Pangenome Collection
kbase_ke_pangenome
Pangenome data for 293,059 genomes across 27,690 microbial species derived from GTDB r214. Includes core/accessory gene classification, functional annotations, and ANI relationships.
Fitness Browser
kescience_fitnessbrowser
Gene fitness data from transposon mutant experiments across 40+ bacterial organisms. Identify essential genes and condition-specific fitness effects.
KBase Genomes
kbase_genomes
Structural genomics data including contigs, features, and protein sequences from the KBase genome repository.
ModelSEED Biochemistry
kbase_msd_biochemistry
Biochemistry reference data for metabolic modeling. Reactions, compounds, and pathway mappings from ModelSEED.
Phenotype Collection
kbase_phenotype
Experimental phenotype data including growth conditions, measurements, and phenotypic outcomes.
Domain-Specific Collections
PhageFoundry Browsers
phagefoundry
Phage genome browsers for specific bacterial hosts including Klebsiella, P. aeruginosa, Acinetobacter, and P. viridiflava.
ENIGMA CORAL
enigma_coral
Subsurface microbial ecology and geochemistry data from the ENIGMA SFA project at the Oak Ridge Reservation (ORR), Tennessee. Covers environmental sampling (groundwater from boreholes), 16S amplicon c...
NMDC Multi-omics
nmdc_arkin
Multi-omics analysis data from the National Microbiome Data Collaborative. Includes functional annotations, embeddings, metabolomics, proteomics, lipidomics, and microbial trait data for integrated mi...
NMDC BioSamples
nmdc_ncbi_biosamples
Harmonized NCBI BioSample metadata from the National Microbiome Data Collaborative. Standardized attributes, environmental triads, and dimensional statistics for biosample integration.
PlanetMicrobe
planetmicrobe
Marine microbial ecology data including oceanographic sampling campaigns, environmental samples, sequencing experiments, and taxonomic/functional profiles from metagenomic and amplicon studies.
PROTECT Pathogen Browser
protect_genomedepot
Pathogen genome browser using the GenomeDepot format. Contains genome, gene, annotation, strain, sample, and taxon data for pathogen surveillance and research.
Reference Collections
Foundational reference data used across other collections.
| Collection | ID | Description |
|---|---|---|
| UniRef Clusters | kbase_uniref | Protein sequence clusters at 50%, 90%, and 100% identity thresholds from UniRef. |
| UniProt Annotations | kbase_uniprot | UniProt protein annotations for bacterial and archaeal proteins. |
| Ontologies | kbase_ontology_source | Ontology reference data including GO, KEGG, EC numbers, and other controlled vocabularies. |
Cross-Collection Analysis
Many research questions benefit from combining data across collections. For example:
- Pangenome + Fitness: Which accessory genes show fitness effects under specific conditions?
- Pangenome + Biochemistry: Which metabolic pathways are enriched in the core versus the accessory genome?
- Fitness + Phenotype: How do gene fitness scores correlate with measured phenotypes?
Check individual collection pages for documented cross-collection query patterns.
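As a rough illustration of the Pangenome + Fitness pattern, the sketch below joins a gene classification table to a fitness table on a shared gene identifier. Everything in it is an assumption made for the example: the table names (`pangenome_gene`, `gene_fitness`), the column names, the toy rows, and the fitness threshold are hypothetical and do not reflect the actual schemas of `kbase_ke_pangenome` or `kescience_fitnessbrowser`. It uses an in-memory SQLite database only so that the join logic can run anywhere.

```python
import sqlite3

# Hypothetical, simplified schemas. Real table and column names in
# kbase_ke_pangenome and kescience_fitnessbrowser may differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pangenome_gene (gene_id TEXT PRIMARY KEY, category TEXT);
CREATE TABLE gene_fitness (gene_id TEXT, condition_name TEXT, fitness REAL);
""")

# Toy rows, made up purely to exercise the query.
conn.executemany("INSERT INTO pangenome_gene VALUES (?, ?)", [
    ("g1", "core"),
    ("g2", "accessory"),
    ("g3", "accessory"),
])
conn.executemany("INSERT INTO gene_fitness VALUES (?, ?, ?)", [
    ("g1", "acetate", -0.1),
    ("g2", "acetate", -2.4),
    ("g3", "acetate", 0.2),
    ("g2", "glucose", -0.3),
])


def accessory_genes_with_effect(conn, condition, threshold=1.0):
    """Return (gene_id, fitness) pairs for accessory genes whose absolute
    fitness score under `condition` is at least `threshold`."""
    query = """
        SELECT p.gene_id, f.fitness
        FROM pangenome_gene AS p
        JOIN gene_fitness AS f ON f.gene_id = p.gene_id
        WHERE p.category = 'accessory'
          AND f.condition_name = ?
          AND ABS(f.fitness) >= ?
        ORDER BY f.fitness
    """
    return conn.execute(query, (condition, threshold)).fetchall()


hits = accessory_genes_with_effect(conn, "acetate")
print(hits)
```

In practice, the two collections would need a documented key that links pangenome gene clusters to Fitness Browser genes. That mapping is not described here, so the shared `gene_id` above stands in for whatever join key the individual collection pages specify.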