AI Co-Scientist
A human-in-the-loop AI research workflow that plans analyses, executes them on BERDL, and turns results into reusable knowledge.
How It Works
The co-scientist follows a closed-loop research workflow. Each project moves through four stages, and the knowledge gained compounds across projects.
Plan
Define a research question and hypothesis. The co-scientist drafts a RESEARCH_PLAN.md with specific analyses and expected outcomes.
Run
Execute analyses as Jupyter notebooks on BERDL JupyterHub. Query collections via Spark SQL, generate figures, and produce data outputs.
Learn
Synthesize findings into a REPORT.md with literature context.
Capture pitfalls, performance tips, and discoveries for the shared knowledge base.
Reuse
Skills, memory, and data products from each project accumulate, so the next project starts with everything the system has learned so far.
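The closed loop can be sketched as a small Python model. This is a hypothetical illustration, not the co-scientist's implementation: `KnowledgeBase`, `Project`, and `run_project` are invented names, every stage is a stub, and skills are left out for brevity. What it shows is that the shared store persists between projects, so the second project's plan starts from the lesson the first one recorded.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical shared store that outlives any single project."""
    memory: list[str] = field(default_factory=list)         # pitfalls, tips, discoveries
    data_products: list[str] = field(default_factory=list)  # reusable outputs


@dataclass
class Project:
    question: str
    plan: str = ""
    outputs: list[str] = field(default_factory=list)
    report: str = ""


def run_project(question: str, kb: KnowledgeBase) -> Project:
    """One pass through Plan -> Run -> Learn -> Reuse (all steps are stubs)."""
    project = Project(question)
    # Plan: draft the research plan, informed by lessons from earlier projects.
    project.plan = f"Plan for {question!r} drawing on {len(kb.memory)} earlier lesson(s)"
    # Run: stand-in for executing notebooks and producing figures or data.
    project.outputs.append(f"output of {question!r}")
    # Learn: write the report and record a lesson in shared memory.
    project.report = f"Findings on {question!r}"
    kb.memory.append(f"lesson from {question!r}")
    # Reuse: publish outputs so the next project can build on them.
    kb.data_products.extend(project.outputs)
    return project


kb = KnowledgeBase()
first = run_project("question 1", kb)
second = run_project("question 2", kb)
print(second.plan)
```

Because the knowledge base is passed in rather than created inside `run_project`, each new project inherits the memory and data products of every project before it.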
Skills
Reusable tools the co-scientist invokes during research (15 in all; a selection is listed below). Each skill encapsulates domain knowledge and workflow patterns learned from prior projects.
berdl
Query the KBase BERDL (BER Data Lakehouse) databases. Use when the user asks to explore pangenome data, query species information, get genome statisti...
berdl-discover
Discover and document BERDL databases. Use when the user wants to explore a new database, generate documentation for a database, or create a module fi...
berdl-ingest
Ingest a local dataset into the BERDL Lakehouse from a local (off-cluster) machine. Handles data format detection and preparation, MinIO upload, and D...
berdl-minio
Retrieve and use BERDL MinIO credentials and transfer result artifacts between BERDL object storage and the local machine. Use when exported query res...
berdl-query
Run SQL queries from a local machine against a provisioned BERDL Spark cluster using spark_connect_remote. Use when the user wants remote Spark comput...
berdl-review
Run an independent AI review of a project or research plan. Use when you want feedback without the full /submit checklist.
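The berdl and berdl-query skills both come down to issuing SQL against BERDL collections. As a minimal sketch of that query pattern, the example below uses Python's built-in sqlite3 in place of a Spark session so it runs anywhere; the `genome` table, its columns, and the toy rows are hypothetical and do not reflect the actual BERDL schema. In a JupyterHub notebook, the same SQL string would instead be passed to a Spark session's `spark.sql(...)`.

```python
import sqlite3

# Local stand-in for a Spark session: an in-memory SQLite database.
# The table, columns, and rows are hypothetical toy data, not the BERDL schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genome (genome_id TEXT, species TEXT, n_genes INTEGER)")
conn.executemany(
    "INSERT INTO genome VALUES (?, ?, ?)",
    [
        ("g1", "Species A", 4500),
        ("g2", "Species A", 4700),
        ("g3", "Species B", 4200),
    ],
)

# Plain SQL that Spark SQL also accepts; in a BERDL notebook this string
# would go to spark.sql(query) rather than to SQLite.
query = """
SELECT species, COUNT(*) AS n_genomes, AVG(n_genes) AS mean_genes
FROM genome
GROUP BY species
ORDER BY n_genomes DESC
"""
rows = conn.execute(query).fetchall()
for species, n_genomes, mean_genes in rows:
    print(f"{species}: {n_genomes} genome(s), mean {mean_genes:.0f} genes")
conn.close()
```

Keeping the query as a standalone string makes it easy to prototype locally and then hand the same text to Spark once a cluster is available.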
Example: The 5,526 Costly + Dispensable Genes
Research Question: What characterizes genes that are simultaneously burdensome (fitness improves when deleted) and not conserved in the pangenome? Are they mobile elements, recent acquisitions, degraded pathways, or something else?
Plan: RESEARCH_PLAN.md
Run: 3 notebooks, 6 figures
Learn: REPORT.md with 547 references
Reuse: 0 data products
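The selection criterion in the research question can be sketched as a simple filter. This assumes each gene carries a fitness score where a positive value means the mutant lacking the gene grows better (the gene is costly) and a flag marking whether the gene is core in the pangenome. The `Gene` fields, the `costly_and_dispensable` helper, and the 0.5 threshold are all hypothetical and are not the project's actual data model or cutoff.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    fitness: float  # hypothetical: positive means losing the gene improves growth
    is_core: bool   # hypothetical: True if conserved across the pangenome


def costly_and_dispensable(genes: list[Gene], min_fitness: float = 0.5) -> list[Gene]:
    """Return genes that are burdensome (fitness gain on loss) and not core.

    min_fitness is an illustrative threshold, not the project's cutoff.
    """
    return [g for g in genes if g.fitness > min_fitness and not g.is_core]


genes = [
    Gene("geneA", fitness=1.2, is_core=False),   # costly and accessory: selected
    Gene("geneB", fitness=1.0, is_core=True),    # costly but core: excluded
    Gene("geneC", fitness=-2.0, is_core=False),  # loss hurts fitness: excluded
    Gene("geneD", fitness=0.1, is_core=False),   # below threshold: excluded
]
print([g.gene_id for g in costly_and_dispensable(genes)])
```

Characterizing the selected set (mobile elements, recent acquisitions, degraded pathways) would be a separate annotation step applied to this filtered list.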
Shared Memory
Knowledge captured during research that helps future projects avoid mistakes and build on prior findings.
Get Started
The co-scientist runs on BERDL JupyterHub with AI assistance via Claude Code and BERIL skills.