Primary Research Collections

Domain-Specific Collections

Reference Collections

Foundational reference data used across other collections.

Collection          | ID                    | Description
--------------------|-----------------------|------------
UniRef Clusters     | kbase_uniref          | Protein sequence clusters at 50%, 90%, and 100% identity thresholds from UniRef.
UniProt Annotations | kbase_uniprot         | UniProt protein annotations for bacterial and archaeal proteins.
Ontologies          | kbase_ontology_source | Ontology reference data including GO, KEGG, EC numbers, and other controlled vocabularies.

Cross-Collection Analysis

Many research questions benefit from combining data across collections. For example:

  • Pangenome + Fitness: Which accessory genes show fitness effects under specific conditions?
  • Pangenome + Biochemistry: Which metabolic pathways are enriched in core vs. accessory genomes?
  • Fitness + Phenotype: How do gene fitness scores correlate with measured phenotypes?

Check individual collection pages for documented cross-collection query patterns.
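
As a rough illustration of the Pangenome + Fitness pattern, the sketch below joins a gene-category table to a fitness table and keeps accessory genes whose fitness effect passes a cutoff. It runs against an in-memory SQLite database so it is self-contained. The table names (pangenome_genes, fitness_scores), column names, sample rows, and the threshold of 1.0 are all hypothetical and do not reflect the actual schemas of any collection; consult the collection pages for the real tables and join keys.

```python
import sqlite3

# NOTE: every table name, column name, and data row below is a hypothetical
# placeholder for illustration; it is not the schema of any real collection.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pangenome_genes (gene_id TEXT PRIMARY KEY, category TEXT);
CREATE TABLE fitness_scores (gene_id TEXT, condition_name TEXT, fitness REAL);
INSERT INTO pangenome_genes VALUES
  ('geneA', 'core'), ('geneB', 'accessory'), ('geneC', 'accessory');
INSERT INTO fitness_scores VALUES
  ('geneA', 'heat', -2.5),
  ('geneB', 'heat', -1.8),
  ('geneB', 'salt', 0.1),
  ('geneC', 'heat', 0.2);
""")


def accessory_genes_with_effect(conn, condition_name, threshold=1.0):
    """Return (gene_id, fitness) pairs for accessory genes whose absolute
    fitness score under the given condition is at least `threshold`."""
    rows = conn.execute(
        """
        SELECT p.gene_id, f.fitness
        FROM pangenome_genes AS p
        JOIN fitness_scores AS f ON f.gene_id = p.gene_id
        WHERE p.category = 'accessory'
          AND f.condition_name = ?
          AND ABS(f.fitness) >= ?
        ORDER BY p.gene_id
        """,
        (condition_name, threshold),
    )
    return rows.fetchall()


hits = accessory_genes_with_effect(conn, "heat")
print(hits)  # [('geneB', -1.8)]
```

The same shape (join on a shared gene identifier, then filter on a category from one collection and a measurement from the other) should carry over to the other combinations listed above, provided the collections expose a common key.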