Architecture

Microbial Discovery Forge architecture diagram

Two Experiences

Microbial Discovery Forge Team

Paramvir S. Dehal

Project lead and primary contact; BERIL PI and KBase AI/ML Team Lead; scientific lead and primary developer of Microbial Discovery Forge

psdehal@lbl.gov | ORCID

Chris Mungall

Ontologies and metadata/data harmonization guidance

William J. (Bill) Riehl

Engineering support

Dileep Kishore

Skills development and evaluations contributions

Justin Reese

Evaluations contributions

Nomi Harris

Project management support

Mikaela Cashman

Data collections

Partner / Enabling Effort

Adam Arkin

Vision/leadership for BERDL (KBase)

Chris Henry

Driving BERDL data resources and curation

Gazi Mahmud

BERDL architect

Kjiersten Fagnan

BERIL co-PI

Ratna Saripalli

BERIL co-PI

Contributing Projects and Resources

Infrastructure

BERIL

The BER Intelligent Layer is the broader program providing AI integration capabilities for DOE BER data resources. The Research Observatory is developed under the BERIL project umbrella.

KBase

The Department of Energy Systems Biology Knowledgebase provides the platform ecosystem, community infrastructure, and the AI/ML team that supports this work.

BERDL

The BER Data Lakehouse is the underlying data resource, hosting 35+ databases of curated scientific datasets across 9 tenants. Developed by the KBase team, BERDL provides the data foundation for all Observatory analyses.

Data Partners

NMDC

The National Microbiome Data Collaborative provides multi-omics microbiome data, including annotations, metabolomics, and proteomics. These data are used in Observatory projects for environmental and functional analyses.

ENIGMA

The Ecosystems and Networks Integrated with Genes and Molecular Assemblies Scientific Focus Area (SFA) provides environmental microbiology data from Oak Ridge field sites, including genomes, communities, and strain isolates.

JGI

The Joint Genome Institute provides upstream genome sequencing, assembly, and annotation pipelines. JGI's GOLD and IMG databases supply the foundational genomic data underlying BERDL collections.

PhageFoundry

The Phage Foundry provides species-specific genome browsers for phage-host interaction research, with curated data for Acinetobacter, Klebsiella, Pseudomonas, and other priority pathogens.

What is BERDL?

The KBase BER Data Lakehouse (BERDL) is a Delta Lakehouse for computational biology research, hosting 35 databases of curated scientific datasets across 9 tenants. It provides:

  • Multiple data collections including pangenomes, mutant fitness data, biochemistry, multi-omics, marine ecology, phage research, and more
  • Spark SQL access for large-scale queries
  • JupyterHub integration for interactive analysis
  • REST API for programmatic access
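
As a rough illustration of the Spark SQL access listed above, the sketch below builds a bounded preview query for a first look at a table. The database, table, and column names are hypothetical placeholders, not actual BERDL identifiers, and the helper `build_preview_query` is invented for this example. It assumes queries are run from a notebook that already has a SparkSession available.

```python
import re

# Conservative pattern for SQL identifiers; anything else is rejected.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_preview_query(database, table, columns=None, limit=100):
    """Return a bounded SELECT statement for a first look at a table.

    Hypothetical helper: identifiers are validated and a LIMIT is always
    applied so an exploratory query cannot scan a whole table by accident.
    """
    for name in [database, table] + list(columns or []):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    select_list = ", ".join(columns) if columns else "*"
    return f"SELECT {select_list} FROM {database}.{table} LIMIT {limit}"


# Placeholder names, chosen only for illustration.
query = build_preview_query(
    "example_pangenome", "genome", ["genome_id", "species"], limit=10
)
print(query)

# In a notebook with an active SparkSession named `spark` (an assumption),
# the query could then be run with:
#     spark.sql(query).show()
```

Always attaching a `LIMIT` is a design choice for exploration; drop it once the query is known to be selective enough.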

Access BERDL JupyterHub

Primary BERDL Collections

Explore All 14 Collections

AI Integration

BERIL provides skills and plugins for AI assistants that enable:

  • Schema exploration - Understand available tables and columns
  • Query generation - Generate SQL queries for common analysis patterns
  • Data interpretation - Help interpret results in biological context
  • Cross-collection analysis - Combine data from multiple collections

AI assistants with BERIL skills can help researchers explore the data lakehouse more efficiently, reducing the overhead of learning new schemas and query patterns.
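
To make the workflow above concrete, here is a minimal sketch of schema exploration followed by a cross-collection join. It uses Python's built-in `sqlite3` as a local stand-in for Spark SQL, and the two tables, their columns, and their rows are made-up toy data rather than real BERDL collections. It is not how BERIL skills are implemented, only an illustration of the steps they are described as supporting.

```python
import sqlite3

# In-memory database standing in for two collections (toy data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE genome (genome_id TEXT PRIMARY KEY, species TEXT);
    CREATE TABLE fitness (genome_id TEXT, gene TEXT, score REAL);
    INSERT INTO genome VALUES ('g1', 'Species A'), ('g2', 'Species B');
    INSERT INTO fitness VALUES ('g1', 'geneX', -2.5),
                               ('g1', 'geneY', 0.1),
                               ('g2', 'geneX', -0.3);
""")


def describe_schema(connection):
    """Schema exploration: map each table name to its column names."""
    tables = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        # PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
        table: [col[1] for col in connection.execute(f"PRAGMA table_info({table})")]
        for table in tables
    }


def shared_columns(schema, left, right):
    """Columns present in both tables: candidate join keys."""
    return [col for col in schema[left] if col in schema[right]]


schema = describe_schema(conn)
join_keys = shared_columns(schema, "fitness", "genome")

# Cross-collection analysis: join on the shared key and filter.
rows = conn.execute("""
    SELECT g.species, f.gene, f.score
    FROM fitness AS f
    JOIN genome AS g ON f.genome_id = g.genome_id
    WHERE f.score < -1.0
    ORDER BY f.score
""").fetchall()

print(schema)
print(join_keys)
print(rows)
```

Inspecting the schema first and deriving join keys from it mirrors the order of the list above: understand the tables, then generate the query, then interpret the result.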

How to Cite

If you use the Microbial Discovery Forge or its findings in your work, please cite:

Paramvir S. Dehal. Microbial Discovery Forge — v0.1, 2026. Built on the KBase BER Data Lakehouse (BERDL).

Getting Started

For Researchers

  1. Browse existing projects for inspiration
  2. Explore collections to understand available data
  3. Read query pitfalls before writing SQL
  4. Access JupyterHub at hub.berdl.kbase.us

For Contributors

For contributions, please contact psdehal@lbl.gov.

  1. Pick a research idea or propose your own
  2. Create a project folder in projects/
  3. Document findings in docs/discoveries.md
  4. Share pitfalls and learnings as you go
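
Steps 2 and 3 could be scripted roughly as follows. Only the `projects/` folder and `docs/discoveries.md` come from the steps above; the `README.md` file, the entry format, and the function name `scaffold_project` are assumptions made for illustration, so check the repository's actual conventions before relying on them.

```python
import tempfile
from datetime import date
from pathlib import Path


def scaffold_project(repo_root, name, summary):
    """Create projects/<name>/ and append a dated entry to docs/discoveries.md.

    Hypothetical layout: the README.md file and the entry format are
    illustrative assumptions, not the project's documented convention.
    """
    root = Path(repo_root)
    project_dir = root / "projects" / name
    # Fail loudly rather than overwrite an existing project folder.
    project_dir.mkdir(parents=True, exist_ok=False)
    (project_dir / "README.md").write_text(
        f"# {name}\n\n{summary}\n", encoding="utf-8"
    )

    discoveries = root / "docs" / "discoveries.md"
    discoveries.parent.mkdir(parents=True, exist_ok=True)
    # Append so that earlier findings are preserved.
    with discoveries.open("a", encoding="utf-8") as handle:
        handle.write(f"\n## {date.today().isoformat()} {name}\n\n{summary}\n")
    return project_dir


# Demonstrate in a throwaway directory instead of a real checkout.
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold_project(
        tmp, "example_project", "One-line description of the idea."
    )
    print(created.relative_to(tmp))
```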

Resources