Microbial Discovery Forge

Collection Network

Each collection below is listed with its number of connections. Edges come from explicit links (schema relationships) and from projects that use multiple collections (project co-usage).

🧬 Pangenome Collection 8 connections

📊 Fitness Browser 9 connections

💬 KBase Genomes 2 connections

⚗ ModelSEED Biochemistry 5 connections

🌱 Phenotype Collection 1 connection

🦠 PhageFoundry Browsers 5 connections

🪰 ENIGMA CORAL 2 connections

🔬 NMDC Multi-omics 3 connections

🔬 NMDC BioSamples 1 connection

🌊 PlanetMicrobe 0 connections

🦠 PROTECT Pathogen Browser 4 connections

📄 UniRef Clusters 4 connections

📄 UniProt Annotations 0 connections

📄 Ontologies 0 connections
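The two edge sources described above (schema relationships and project co-usage) can be combined into one undirected network. The following is a minimal sketch, not the site's implementation: the function names `build_edges` and `connection_counts` are hypothetical, and the input covers only the two schema-linked pairs and the two explorer projects whose collection tags appear on this page (assuming the project identifiers `webofmicrobes_explorer` and `acinetobacter_adp1_explorer` correspond to those explorer cards). Because the input is a subset, the counts it produces will not match the connection counts listed above.

```python
from collections import defaultdict
from itertools import combinations

# Schema-linked pairs listed on this page.
SCHEMA_LINKS = [
    ("Fitness Browser", "Phenotype Collection"),
    ("NMDC BioSamples", "NMDC Multi-omics"),
]

# Collections tagged on two explorer projects (a small illustrative subset).
PROJECT_COLLECTIONS = {
    "webofmicrobes_explorer": [
        "Pangenome Collection", "Fitness Browser", "ModelSEED Biochemistry",
    ],
    "acinetobacter_adp1_explorer": [
        "Pangenome Collection", "Fitness Browser", "ModelSEED Biochemistry",
        "PhageFoundry Browsers", "UniRef Clusters",
    ],
}


def build_edges(schema_links, project_collections):
    """Map each unordered collection pair to the evidence that links it."""
    edges = defaultdict(set)
    for a, b in schema_links:
        edges[frozenset((a, b))].add("schema")
    # Every pair of collections used by the same project is a co-usage edge.
    for project, collections in project_collections.items():
        for a, b in combinations(sorted(set(collections)), 2):
            edges[frozenset((a, b))].add(f"project:{project}")
    return dict(edges)


def connection_counts(edges):
    """Count distinct neighbours per collection (one per unordered pair)."""
    counts = defaultdict(int)
    for pair in edges:
        for collection in pair:
            counts[collection] += 1
    return dict(counts)


edges = build_edges(SCHEMA_LINKS, PROJECT_COLLECTIONS)
counts = connection_counts(edges)
for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{name}: {n}")
```

Keying each edge by a `frozenset` means a pair linked by both a schema relationship and several projects still counts as a single connection, while the evidence set records why the edge exists.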

Cross-Collection Join Paths

Documented ways to connect data across collections. Each path shows the relationship and linking strategy between two collections.

🧬 Pangenome Collection ⟶ 📊 Fitness Browser

Source: kbase_ke_pangenome

Target: kescience_fitnessbrowser

Bridging projects:

pseudomonas_carbon_ecology, cf_formulation_design, amr_cofitness_networks, metal_fitness_atlas, amr_pangenome_atlas, pathway_capability_dependency, aromatic_catabolism_network, metabolic_capability_dependency, acinetobacter_adp1_explorer, bacdive_phenotype_metal_tolerance, conservation_vs_fitness, fitness_modules, webofmicrobes_explorer, respiratory_chain_wiring, cofitness_coinheritance, counter_ion_effects, field_vs_lab_fitness, costly_dispensable_genes, fw300_metabolic_consistency, amr_fitness_cost, truly_dark_genes, functional_dark_matter, snipe_defense_system, resistance_hotspots

🧬 Pangenome Collection ⟶ ⚗ ModelSEED Biochemistry

Source: kbase_ke_pangenome

Target: kbase_msd_biochemistry

Bridging projects:

cf_formulation_design, acinetobacter_adp1_explorer, webofmicrobes_explorer, essential_metabolome, fw300_metabolic_consistency

🧬 Pangenome Collection ⟶ 💬 KBase Genomes

Source: kbase_ke_pangenome

Target: kbase_genomes

Bridging projects:

resistance_hotspots

📊 Fitness Browser ⟶ 🌱 Phenotype Collection

Source: kescience_fitnessbrowser

Target: kbase_phenotype

Schema-linked collections. See individual collection pages for join examples.

⚗ ModelSEED Biochemistry ⟶ 📊 Fitness Browser

Source: kbase_msd_biochemistry

Target: kescience_fitnessbrowser

Bridging projects:

cf_formulation_design, acinetobacter_adp1_explorer, webofmicrobes_explorer, fw300_metabolic_consistency

🪰 ENIGMA CORAL ⟶ 🧬 Pangenome Collection

Source: enigma_coral

Target: kbase_ke_pangenome

Bridging projects:

enigma_contamination_functional_potential, field_vs_lab_fitness

🪰 ENIGMA CORAL ⟶ 📊 Fitness Browser

Source: enigma_coral

Target: kescience_fitnessbrowser

Bridging projects:

field_vs_lab_fitness

🔬 NMDC Multi-omics ⟶ 🧬 Pangenome Collection

Source: nmdc_arkin

Target: kbase_ke_pangenome

Bridging projects:

amr_environmental_resistome, functional_dark_matter, nmdc_community_metabolic_ecology, prophage_ecology, phb_granule_ecology

🔬 NMDC BioSamples ⟶ 🔬 NMDC Multi-omics

Source: nmdc_ncbi_biosamples

Target: nmdc_arkin

Schema-linked collections. See individual collection pages for join examples.
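The join paths above can be treated as a small lookup table keyed by source and target database. The sketch below is illustrative only: `JoinPath`, `find_path`, and `projects_spanning` are hypothetical names, not part of any BERDL API, and for brevity the table omits the Pangenome Collection ⟶ Fitness Browser and NMDC Multi-omics ⟶ Pangenome Collection paths. Database identifiers and project names are copied from the listings above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinPath:
    source: str
    target: str
    bridging_projects: tuple = ()
    schema_linked: bool = False


# Subset of the documented join paths (two longer entries omitted).
JOIN_PATHS = [
    JoinPath("kbase_ke_pangenome", "kbase_msd_biochemistry", (
        "cf_formulation_design", "acinetobacter_adp1_explorer",
        "webofmicrobes_explorer", "essential_metabolome",
        "fw300_metabolic_consistency")),
    JoinPath("kbase_ke_pangenome", "kbase_genomes", ("resistance_hotspots",)),
    JoinPath("kescience_fitnessbrowser", "kbase_phenotype", schema_linked=True),
    JoinPath("kbase_msd_biochemistry", "kescience_fitnessbrowser", (
        "cf_formulation_design", "acinetobacter_adp1_explorer",
        "webofmicrobes_explorer", "fw300_metabolic_consistency")),
    JoinPath("enigma_coral", "kbase_ke_pangenome", (
        "enigma_contamination_functional_potential", "field_vs_lab_fitness")),
    JoinPath("enigma_coral", "kescience_fitnessbrowser", ("field_vs_lab_fitness",)),
    JoinPath("nmdc_ncbi_biosamples", "nmdc_arkin", schema_linked=True),
]


def find_path(a, b, paths=JOIN_PATHS):
    """Return the documented path between two databases in either direction."""
    for path in paths:
        if {path.source, path.target} == {a, b}:
            return path
    return None


def projects_spanning(min_paths=2, paths=JOIN_PATHS):
    """List projects that bridge at least `min_paths` of the given join paths."""
    seen = Counter(p for path in paths for p in path.bridging_projects)
    return sorted(name for name, n in seen.items() if n >= min_paths)


path = find_path("kescience_fitnessbrowser", "kbase_msd_biochemistry")
print(path.source, "->", path.target, path.bridging_projects)
print(projects_spanning())
```

Looking a pair up in either direction reflects that a path is documented once (source to target) but can be traversed from either side. A project that appears under several paths, such as `field_vs_lab_fitness`, is a natural starting point when looking for worked join examples.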

Explorer Project Highlights

Deep-dive explorations of BERDL collections, characterizing their content, cross-collection links, and research potential.

AlphaEarth Embeddings, Geography & Environment Explorer

Completed

What do AlphaEarth environmental embeddings capture, and how do they relate to geographic coordinates and NCBI environment labels?

Pangenome Collection

1. Environmental samples show 3.4x stronger geographic signal than human-associated samples
2. AlphaEarth embeddings encode real geographic signal — not noise
3. Strong clinical/human sampling bias in the AlphaEarth subset
4. 36% of coordinates flagged as potential institutional addresses
5. UMAP reveals fine-grained embedding structure with environment-correlated clusters
6. Embedding space also shows taxonomic structure

PaperBLAST Data Explorer

Completed

What does the `kescience_paperblast` collection contain, how current is it, and what are its coverage patterns across organisms, domains of life, and functional databases?

Fitness Browser

Finding 1: One organism dominates nearly half of all literature
Finding 2: 65.6% of genes have exactly one paper
Finding 3: Literature inequality is extreme — Lorenz curves
Finding 4: Bacterial research is concentrated on pathogens
Finding 5: 345K protein families from 816K sequences
Finding 6: 55% of protein families are dark or dim

Web of Microbes Data Explorer

Completed

What does the `kescience_webofmicrobes` exometabolomics collection contain, which organisms overlap with the Fitness Browser, and how well do metabolite uptake/release profiles connect to pangenome-pr...

Pangenome Collection, Fitness Browser, ModelSEED Biochemistry

1. WoM Action Encoding Uses Four Distinct Semantics, Not Three
2. Two Direct Fitness Browser Strain Matches Plus Two Genus-Level Matches
3. 19 WoM-Produced Metabolites Are Tested as FB Carbon/Nitrogen Sources
4. 26.8% of WoM Metabolites Have Definitive ModelSEED Links (68.5% with Ambiguous Formula Matches)
5. ENIGMA Isolates Show Distinct "Metabolic Novelty Rates"
6. All WoM Genera Have Pangenome Species Clades

Acinetobacter baylyi ADP1 Data Explorer

Completed

What is the scope and structure of a comprehensive ADP1 database, and how do its annotations, metabolic models, and phenotype data intersect with BERDL collections (pangenome, biochemistry, fitness, P...

Pangenome Collection, Fitness Browser, ModelSEED Biochemistry, PhageFoundry Browsers, UniRef Clusters

1. Rich Multi-Omics Database with 6 Data Modalities
2. Strong BERDL Connectivity: 4 of 5 Connection Types at >90% Match
3. Pangenome Cluster ID Bridge: 100% Mapping via Gene Junction Table
4. FBA and TnSeq Essentiality Agree 74% of the Time
5. Condition-Specific Fitness: Urea and Quinate Stand Apart
6. Essential Genes Are 6x More Likely to Have COG Annotations
7. Highly Conserved Core Metabolism Across 14 Genomes
8. 87% of Growth Predictions Depend on Gapfilled Reactions
