02 Map Pathways To Fitness
Jupyter notebook from the Metabolic Capability vs Metabolic Dependency project.
NB02: Map Pathways to Fitness Browser Genes
Requires: BERDL JupyterHub (Spark access)
Purpose: Extract Fitness Browser gene-fitness data and map genes to GapMind pathway categories
by keyword-matching SEED role descriptions (seed_desc), used as a proxy for pathway membership. This approach replaces the
DIAMOND-based link table from conservation_vs_fitness by querying BERDL databases directly.
Inputs:
- data/gapmind_genome_pathways.csv (from NB01)
- data/gapmind_pathway_summary.csv (from NB01)
- BERDL: kescience_fitnessbrowser.*, kbase_ke_pangenome.gtdb_metadata
Outputs:
- data/organism_metadata.csv: FB organism info
- data/organism_mapping.tsv: FB org → GapMind species mapping
- data/seed_annotations.csv: SEED annotations (role descriptions) per gene
- data/gene_fitness_aggregates.csv: Mean |t|, max |t| per gene
- data/essential_genes.tsv: Protein-coding genes absent from genefitness
- data/pathway_fitness_metrics.csv: Per-organism per-pathway fitness summary
Runtime: ~15-25 minutes (Spark aggregation over 27M genefitness rows)
import pandas as pd
import numpy as np
import sys
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')
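# get_spark_session() is not imported here; it is assumed to be predefined by the BERDL JupyterHub kernel
spark = get_spark_session()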
PROJECT_ROOT = Path('/home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency')
DATA_DIR = PROJECT_ROOT / 'data'
DATA_DIR.mkdir(exist_ok=True, parents=True)
# Also make src importable
sys.path.insert(0, str(PROJECT_ROOT / 'src'))
from pathway_utils import categorize_pathway, classify_pathway_dependency
print(f'Spark session: {spark}')
print(f'Data directory: {DATA_DIR}')
Spark session: <pyspark.sql.connect.session.SparkSession object at 0x70b52dc5d2b0>
Data directory: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/data
1. Explore Fitness Browser Schema
Inspect what tables are available and understand their structure.
print('=== Tables in kescience_fitnessbrowser ===')
spark.sql('SHOW TABLES IN kescience_fitnessbrowser').show(50, truncate=False)
print('\n=== Schema: organism ===')
spark.sql('DESCRIBE kescience_fitnessbrowser.organism').show(50, truncate=False)
print('\n=== Schema: gene ===')
spark.sql('DESCRIBE kescience_fitnessbrowser.gene').show(50, truncate=False)
print('\n=== Schema: genefitness ===')
spark.sql('DESCRIBE kescience_fitnessbrowser.genefitness').show(50, truncate=False)
print('\n=== Schema: seedannotation ===')
spark.sql('DESCRIBE kescience_fitnessbrowser.seedannotation').show(50, truncate=False)
=== Tables in kescience_fitnessbrowser ===
+------------------------+---------------------------------+-----------+
|namespace               |tableName                        |isTemporary|
+------------------------+---------------------------------+-----------+
|kescience_fitnessbrowser|organism                         |false      |
|kescience_fitnessbrowser|gene                             |false      |
|kescience_fitnessbrowser|ortholog                         |false      |
|kescience_fitnessbrowser|experiment                       |false      |
|kescience_fitnessbrowser|genefitness                      |false      |
|kescience_fitnessbrowser|cofit                            |false      |
|kescience_fitnessbrowser|specificphenotype                |false      |
|kescience_fitnessbrowser|genedomain                       |false      |
|kescience_fitnessbrowser|genefeature                      |false      |
|kescience_fitnessbrowser|straindataseek                   |false      |
|kescience_fitnessbrowser|compounds                        |false      |
|kescience_fitnessbrowser|mediacomponents                  |false      |
|kescience_fitnessbrowser|locusxref                        |false      |
|kescience_fitnessbrowser|besthitkegg                      |false      |
|kescience_fitnessbrowser|keggmember                       |false      |
|kescience_fitnessbrowser|kgroupdesc                       |false      |
|kescience_fitnessbrowser|kgroupec                         |false      |
|kescience_fitnessbrowser|besthitswissprot                 |false      |
|kescience_fitnessbrowser|swissprotdesc                    |false      |
|kescience_fitnessbrowser|besthitmetacyc                   |false      |
|kescience_fitnessbrowser|metacycpathway                   |false      |
|kescience_fitnessbrowser|metacycpathwayreaction           |false      |
|kescience_fitnessbrowser|metacycpathwayreactionpredecessor|false      |
|kescience_fitnessbrowser|metacycpathwayprimarycompound    |false      |
|kescience_fitnessbrowser|metacycpathwaycoverage           |false      |
|kescience_fitnessbrowser|metacycreaction                  |false      |
|kescience_fitnessbrowser|metacycreactioncompound          |false      |
|kescience_fitnessbrowser|metacycreactionec                |false      |
|kescience_fitnessbrowser|metacyccompound                  |false      |
|kescience_fitnessbrowser|specog                           |false      |
|kescience_fitnessbrowser|conservedcofit                   |false      |
|kescience_fitnessbrowser|seedannotation                   |false      |
|kescience_fitnessbrowser|seedclass                        |false      |
|kescience_fitnessbrowser|seedroles                        |false      |
|kescience_fitnessbrowser|seedannotationtoroles            |false      |
|kescience_fitnessbrowser|seedrolereaction                 |false      |
|kescience_fitnessbrowser|seedreaction                     |false      |
|kescience_fitnessbrowser|ecinfo                           |false      |
|kescience_fitnessbrowser|keggcompound                     |false      |
|kescience_fitnessbrowser|keggconf                         |false      |
|kescience_fitnessbrowser|keggmap                          |false      |
|kescience_fitnessbrowser|reannotation                     |false      |
|kescience_fitnessbrowser|reannotationec                   |false      |
|kescience_fitnessbrowser|publication                      |false      |
|kescience_fitnessbrowser|scaffoldseq                      |false      |
|kescience_fitnessbrowser|fitbyexp_dyella79                |false      |
|kescience_fitnessbrowser|fitbyexp_kang                    |false      |
|kescience_fitnessbrowser|fitbyexp_burk376                 |false      |
|kescience_fitnessbrowser|fitbyexp_sb2b                    |false      |
|kescience_fitnessbrowser|fitbyexp_ralstoniauw163          |false      |
+------------------------+---------------------------------+-----------+
only showing top 50 rows
=== Schema: organism ===
+----------+---------+-------+
|col_name  |data_type|comment|
+----------+---------+-------+
|orgId     |string   |NULL   |
|division  |string   |NULL   |
|genus     |string   |NULL   |
|species   |string   |NULL   |
|strain    |string   |NULL   |
|taxonomyId|string   |NULL   |
+----------+---------+-------+
=== Schema: gene ===
+----------+---------+-------+
|col_name  |data_type|comment|
+----------+---------+-------+
|orgId     |string   |NULL   |
|locusId   |string   |NULL   |
|sysName   |string   |NULL   |
|scaffoldId|string   |NULL   |
|begin     |string   |NULL   |
|end       |string   |NULL   |
|type      |string   |NULL   |
|strand    |string   |NULL   |
|gene      |string   |NULL   |
|desc      |string   |NULL   |
|GC        |string   |NULL   |
+----------+---------+-------+
=== Schema: genefitness ===
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|orgId   |string   |NULL   |
|locusId |string   |NULL   |
|expName |string   |NULL   |
|fit     |string   |NULL   |
|t       |string   |NULL   |
+--------+---------+-------+
=== Schema: seedannotation ===
+---------+---------+-------+
|col_name |data_type|comment|
+---------+---------+-------+
|orgId    |string   |NULL   |
|locusId  |string   |NULL   |
|seed_desc|string   |NULL   |
+---------+---------+-------+
2. Get Fitness Browser Organism Metadata
organisms = spark.sql('SELECT * FROM kescience_fitnessbrowser.organism').toPandas()
print(f'FB organisms: {len(organisms)}')
print('\nColumns:', organisms.columns.tolist())
print('\nSample:')
print(organisms.head(10).to_string())
organisms.to_csv(DATA_DIR / 'organism_metadata.csv', index=False)
print(f'\nSaved to: {DATA_DIR}/organism_metadata.csv')
FB organisms: 48
Columns: ['orgId', 'division', 'genus', 'species', 'strain', 'taxonomyId']
Sample:
orgId division genus species strain taxonomyId
0 acidovorax_3H11 Betaproteobacteria Acidovorax sp. GW101-3H11 12916
1 ANA3 Gammaproteobacteria Shewanella sp. ANA-3 94122
2 azobra Alphaproteobacteria Azospirillum brasilense Sp245 1064539
3 BFirm Betaproteobacteria Burkholderia phytofirmans PsJN 398527
4 Btheta Bacteroidetes Bacteroides thetaiotaomicron VPI-5482 226186
5 Burk376 Betaproteobacteria Paraburkholderia bryophila 376MFSha3.1 1169143
6 Caulo Alphaproteobacteria Caulobacter crescentus NA1000 565050
7 Cola Bacteroidetes Echinicola vietnamensis KMM 6221, DSM 17526 926556
8 Cup4G11 Betaproteobacteria Cupriavidus basilensis FW507-4G11 68895
9 Dda3937 Gammaproteobacteria Dickeya dadantii 3937 198628
Saved to: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/data/organism_metadata.csv
3. Inspect Raw GapMind Data
Check the sequence_scope column to see if it contains gene identifiers.
If so, we can use it for more precise pathway-gene mapping.
print('Raw gapmind_pathways sample (pre-aggregation):')
raw_sample = spark.sql("""
SELECT *
FROM kbase_ke_pangenome.gapmind_pathways
WHERE pathway = 'his'
LIMIT 10
""").toPandas()
print(raw_sample.to_string())
print('\n\nDistinct sequence_scope values:')
sc_vals = spark.sql("""
SELECT sequence_scope, COUNT(*) as n
FROM kbase_ke_pangenome.gapmind_pathways
GROUP BY sequence_scope
ORDER BY n DESC
LIMIT 20
""").toPandas()
print(sc_vals.to_string())
Raw gapmind_pathways sample (pre-aggregation):
genome_id pathway clade_name metabolic_category sequence_scope nHi nMed nLo score score_category score_simplified
0 GCA_021627005.1 his s__Cryptobacteroides_sp900544195--GB_GCA_900544195.1 aa core 8 1 2 3.9 steps_missing_low 0.0
1 GCA_021630165.1 his s__Cryptobacteroides_sp900544195--GB_GCA_900544195.1 aa core 9 1 1 6.9 steps_missing_low 0.0
2 GCA_021623465.1 his s__Cryptobacteroides_sp900544195--GB_GCA_900544195.1 aa core 9 1 1 6.9 steps_missing_low 0.0
3 GCA_021635085.1 his s__Cryptobacteroides_sp900544195--GB_GCA_900544195.1 aa core 5 1 5 -5.1 steps_missing_low 0.0
4 GCA_021621345.1 his s__Cryptobacteroides_sp900544195--GB_GCA_900544195.1 aa core 9 1 1 6.9 steps_missing_low 0.0
5 GCA_900753265.1 his s__Cryptobacteroides_sp900544195--GB_GCA_900544195.1 aa core 9 1 1 6.9 steps_missing_low 0.0
6 GCA_934693025.1 his s__Cryptobacteroides_sp900544195--GB_GCA_900544195.1 aa core 9 1 1 6.9 steps_missing_low 0.0
7 GCA_934700295.1 his s__Cryptobacteroides_sp900544195--GB_GCA_900544195.1 aa core 1 1 9 -17.1 steps_missing_low 0.0
8 GCA_934725755.1 his s__Cryptobacteroides_sp900544195--GB_GCA_900544195.1 aa core 9 1 1 6.9 steps_missing_low 0.0
9 GCA_016715315.1 his s__Rubrivivax_sp016709385--GB_GCA_016709385.1 aa core 10 0 1 8.0 steps_missing_low 0.0
Distinct sequence_scope values:
sequence_scope n
0 aux 116097525
1 all 110719804
2 core 78653951
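4. Get SEED Subsystem Annotations
SEED annotations give each gene a functional role description that overlaps with GapMind pathway steps. The sequence_scope column takes only three values (aux, all, core) and carries no gene identifiers, so direct gene-pathway links are unavailable here. We therefore use the SEED role descriptions as a proxy for pathway membership.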
print('Sample SEED annotations:')
seed_sample = spark.sql('SELECT * FROM kescience_fitnessbrowser.seedannotation LIMIT 10').toPandas()
print(seed_sample.to_string())
print('\nColumns:', seed_sample.columns.tolist())
Sample SEED annotations:
orgId locusId seed_desc
0 Pedo557 CA265_RS14400 DNA topoisomerase IB (poxvirus type) (EC 5.99.1.2)
1 Pedo557 CA265_RS14405 Methionyl-tRNA formyltransferase (EC 2.1.2.9)
2 Pedo557 CA265_RS14410 Aminodeoxychorismate lyase (EC 4.1.3.38)
3 Pedo557 CA265_RS14415 Ribosomal large subunit pseudouridine synthase D (EC 4.2.1.70)
4 Pedo557 CA265_RS14425 1-aminocyclopropane-1-carboxylate deaminase (EC 3.5.99.7)
5 Pedo557 CA265_RS14430 Alpha-L-fucosidase (EC 3.2.1.51)
6 Pedo557 CA265_RS14440 Transcriptional regulator, HxlR family
7 Pedo557 CA265_RS14445 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100)
8 Pedo557 CA265_RS14455 Probable L-lysine-epsilon aminotransferase (EC 2.6.1.36) (L-lysine aminotransferase) (Lysine 6-aminotransferase)
9 Pedo557 CA265_RS14465 Glutaryl-CoA dehydrogenase (EC 1.3.99.7)
Columns: ['orgId', 'locusId', 'seed_desc']
# Pull all SEED annotations: manageable size (~180K rows for 48 organisms)
# The seedannotation table has `seed_desc` (gene role description), not `subsystem`.
# We match GapMind pathways against these role descriptions via keywords.
seed_all = spark.sql("""
SELECT orgId, locusId, seed_desc
FROM kescience_fitnessbrowser.seedannotation
""").toPandas()
print(f'Total SEED annotations: {len(seed_all):,}')
print(f'Unique role descriptions: {seed_all["seed_desc"].nunique():,}')
print(f'Organisms with annotations: {seed_all["orgId"].nunique()}')
print('\nTop 40 most common role descriptions:')
print(seed_all.groupby('seed_desc').size().sort_values(ascending=False).head(40).to_string())
seed_all.to_csv(DATA_DIR / 'seed_annotations.csv', index=False)
print(f'\nSaved to: {DATA_DIR}/seed_annotations.csv')
Total SEED annotations: 177,519
Unique role descriptions: 23,049
Organisms with annotations: 48
Top 40 most common role descriptions:
seed_desc
Mobile element protein    1494
Transcriptional regulator, LysR family    983
diguanylate cyclase/phosphodiesterase (GGDEF & EAL domains) with PAS/PAC sensor(s)    543
Methyl-accepting chemotaxis protein I (serine chemoreceptor protein)    532
Probable transmembrane protein    522
3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100)    501
Permease of the drug/metabolite transporter (DMT) superfamily    450
Methyl-accepting chemotaxis protein    401
Permeases of the major facilitator superfamily    371
membrane protein, putative    320
Transcriptional regulator    310
Glutamate synthase [NADPH] large chain (EC 1.4.1.13)    289
Phosphonate ABC transporter phosphate-binding periplasmic component (TC 3.A.1.9.1)    278
Enoyl-CoA hydratase (EC 4.2.1.17)    265
putative membrane protein    261
Transcriptional regulator, GntR family    242
Transcriptional regulator, MarR family    231
Transcriptional regulator, GntR family domain / Aspartate aminotransferase (EC 2.6.1.1)    230
Probable Co/Zn/Cd efflux system membrane fusion protein    220
FIG074102: hypothetical protein    219
D-3-phosphoglycerate dehydrogenase (EC 1.1.1.95)    218
Transcriptional regulator, IclR family    218
Aldehyde dehydrogenase (EC 1.2.1.3)    213
Glycosyltransferase    211
Histone acetyltransferase HPA2 and related acetyltransferases    211
Alcohol dehydrogenase (EC 1.1.1.1)    208
Signal transduction histidine kinase    205
Glutathione S-transferase (EC 2.5.1.18)    204
Tricarboxylate transport protein TctC    198
Aspartate aminotransferase (EC 2.6.1.1)    197
Ferrichrome-iron receptor    197
Cobalt-zinc-cadmium resistance protein CzcA; Cation efflux system protein CusA    195
Transcriptional regulator, AsnC family    186
UDP-glucose 4-epimerase (EC 5.1.3.2)    183
Outer membrane protein assembly factor YaeT precursor    178
Transcriptional regulator, ArsR family    178
Short-chain dehydrogenase/reductase SDR    177
PROBABLE TRANSMEMBRANE PROTEIN    177
Nucleoside-diphosphate-sugar epimerases    176
High-affinity branched-chain amino acid transport system permease protein LivH (TC 3.A.1.4.1)    175
Saved to: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/data/seed_annotations.csv
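Many role descriptions in the output above embed EC numbers, which could serve as a second, more structured key alongside keyword matching. The following is a minimal sketch, not part of the notebook's pipeline; the helper name extract_ec_numbers and the regular expression are assumptions introduced here for illustration.

```python
import re

# Matches complete EC numbers ("EC 1.1.1.100") and partial ones ("EC 4.1.1.-")
EC_PATTERN = re.compile(r"EC (\d+(?:\.(?:\d+|-)){3})")

def extract_ec_numbers(seed_desc: str) -> list:
    """Return every EC number found in a SEED role description (hypothetical helper)."""
    return EC_PATTERN.findall(seed_desc)

# Example descriptions copied from the notebook outputs
examples = [
    "3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100)",
    "Transcriptional regulator, HxlR family",
    "Ornithine decarboxylase (EC 4.1.1.17) / Arginine decarboxylase (EC 4.1.1.19)",
    "3-polyprenyl-4-hydroxybenzoate carboxy-lyase UbiX (EC 4.1.1.-)",
]
ec_by_desc = {desc: extract_ec_numbers(desc) for desc in examples}
```

Descriptions without an EC number yield an empty list, and multi-function descriptions yield one entry per EC number, so the result can be exploded into one row per gene and EC number if needed.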
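5. Build GapMind → SEED Keyword Mapping
The seed_desc column contains gene role descriptions (e.g. "Histidinol dehydrogenase (EC 1.1.1.23)").
We match these against per-pathway keyword lists using case-insensitive substring search. Amino acid biosynthesis genes often
include the pathway name or a characteristic enzyme name in their description (e.g. "Arginine biosynthesis protein ArgJ"),
so the proxy is most reliable for those categories. Substring search also admits false positives (for example, 'acetyl-coa synthetase' matches "Acetoacetyl-CoA synthetase"), so the matched descriptions should be spot-checked.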
# Keyword map: GapMind pathway name → list of strings to search in seed_desc values
# Each keyword is checked as a case-insensitive substring of the role description.
# Amino acid biosynthesis descriptions typically contain the amino acid name directly.
PATHWAY_SEED_KEYWORDS = {
# ── Amino acid biosynthesis ──────────────────────────────────────
'alanine': ['alanine biosyn', 'alanine aminotransferase'],
'arg': ['arginine biosyn', 'argininosuccinate', 'carbamoyl phosphate synthase', 'ornithine carbamoyltransferase'],
'arginine': ['arginine catab', 'arginine deiminase', 'arginine decarboxylase'],
'asn': ['asparagine biosyn', 'asparagine synthetase'],
'asparagine': ['asparagine biosyn', 'asparagine synthetase'],
'aspartate': ['aspartate biosyn', 'aspartate transaminase', 'aspartate aminotransferase'],
'chorismate': ['chorismate', 'shikimate', '3-dehydroquinate', 'EPSP synthase', 'aromatic amino acid biosyn'],
'cys': ['cysteine biosyn', 'cysteine synthase', 'serine acetyltransferase'],
'cysteine': ['cysteine biosyn', 'cysteine synthase', 'serine acetyltransferase'],
'gln': ['glutamine synthetase', 'glutamine biosyn'],
'glu': ['glutamate biosyn', 'glutamate synthase', 'glutamine oxoglutarate aminotransferase'],
'glutamate': ['glutamate biosyn', 'glutamate synthase', 'glutamate dehydrogenase'],
'glutamine': ['glutamine synthetase', 'glutamine biosyn'],
'gly': ['glycine biosyn', 'serine hydroxymethyltransferase', 'glycine cleavage', 'threonine aldolase'],
'glycine': ['glycine biosyn', 'serine hydroxymethyltransferase', 'glycine cleavage'],
'his': ['histidine biosyn', 'histidinol', 'imidazoleglycerol', 'phosphoribosyl-atp'],
'histidine': ['histidine biosyn', 'histidinol', 'imidazoleglycerol'],
'ile': ['isoleucine biosyn', 'threonine dehydratase', 'acetolactate synthase', 'dihydroxyacid dehydratase', 'branched-chain amino acid biosyn'],
'isoleucine': ['isoleucine biosyn', 'threonine dehydratase'],
'leu': ['leucine biosyn', '2-isopropylmalate', 'branched-chain amino acid biosyn'],
'leucine': ['leucine biosyn', '2-isopropylmalate'],
'lys': ['lysine biosyn', 'diaminopimelate', 'aspartate kinase', 'aspartate semialdehyde'],
'lysine': ['lysine biosyn', 'diaminopimelate', 'lysine aminotransferase'],
'met': ['methionine biosyn', 'homocysteine methyltransferase', 'cystathionine', 'O-succinylhomoserine'],
'methionine': ['methionine biosyn', 'methionine synthase', 'cystathionine'],
'phe': ['phenylalanine biosyn', 'chorismate mutase', 'prephenate dehydratase', 'aromatic amino acid biosyn'],
'phenylalanine':['phenylalanine biosyn', 'phenylalanine aminotransferase'],
'pro': ['proline biosyn', 'gamma-glutamyl kinase', 'pyrroline-5-carboxylate'],
'proline': ['proline biosyn', 'gamma-glutamyl kinase', 'pyrroline-5-carboxylate'],
'ser': ['serine biosyn', 'phosphoserine', 'phosphoglycerate dehydrogenase'],
'serine': ['serine biosyn', 'phosphoserine', 'phosphoglycerate dehydrogenase'],
'thr': ['threonine biosyn', 'homoserine kinase', 'threonine synthase', 'aspartate kinase'],
'threonine': ['threonine biosyn', 'homoserine kinase', 'threonine synthase'],
'trp': ['tryptophan biosyn', 'anthranilate', 'indole-3-glycerol phosphate', 'tryptophan synthase'],
'tryptophan': ['tryptophan biosyn', 'anthranilate', 'tryptophan synthase'],
'tyr': ['tyrosine biosyn', 'prephenate dehydrogenase', 'chorismate mutase', 'aromatic amino acid biosyn'],
'tyrosine': ['tyrosine biosyn', 'prephenate dehydrogenase'],
'val': ['valine biosyn', 'acetolactate synthase', 'dihydroxyacid dehydratase', 'branched-chain amino acid biosyn'],
'valine': ['valine biosyn', 'acetolactate synthase'],
# ── Carbon source utilization ─────────────────────────────────────
'2-oxoglutarate': ['2-oxoglutarate', 'alpha-ketoglutarate', '2-oxoglutarate dehydrogenase'],
'4-hydroxybenzoate': ['4-hydroxybenzoate', 'hydroxybenzoate'],
'acetate': ['acetate kinase', 'phosphotransacetylase', 'acetyl-coa synthetase'],
'arabinose': ['arabinose isomerase', 'ribulokinase', 'l-arabinose', 'arabinose transport'],
'cellobiose': ['cellobiose', 'beta-glucosidase', 'phospho-beta-glucosidase'],
'citrate': ['citrate lyase', 'citrate synthase', 'citrate transport'],
'D-alanine': ['d-alanine', 'alanine racemase'],
'D-lactate': ['d-lactate', 'd-lactate dehydrogenase'],
'D-serine': ['d-serine', 'd-serine dehydratase', 'd-serine deaminase'],
'deoxyinosine': ['purine nucleoside phosphorylase', 'deoxyinosine', 'nucleoside catab'],
'deoxyribose': ['deoxyribose', '2-deoxyribose', 'deoxyribose-phosphate aldolase'],
'ethanol': ['alcohol dehydrogenase', 'aldehyde dehydrogenase', 'ethanol oxidation'],
'fructose': ['fructokinase', 'fructose-bisphosphate', 'fructose pts', 'fructose transport'],
'galactose': ['galactokinase', 'galactose-1-phosphate', 'galactose mutarotase', 'gal operon'],
'glycerol': ['glycerol kinase', 'glycerol-3-phosphate', 'glycerol facilitator'],
'L-lactate': ['l-lactate dehydrogenase', 'l-lactate permease'],
'L-malate': ['malate dehydrogenase', 'malate permease', 'malic enzyme'],
'lactose': ['lactose permease', 'beta-galactosidase', 'lactose pts'],
'maltose': ['maltose-binding protein', 'maltodextrin', 'alpha-glucosidase'],
'mannose': ['mannose-6-phosphate isomerase', 'mannose pts', 'mannose transport'],
'NAG': ['n-acetylglucosamine', 'glucosamine-6-phosphate', 'nagB', 'nagA'],
'ribose': ['ribose transport', 'ribokinase', 'd-ribose'],
'sorbitol': ['sorbitol-6-phosphate', 'glucitol', 'l-iditol dehydrogenase'],
'sucrose': ['sucrose-6-phosphate', 'sucrose pts', 'invertase', 'sucrose phosphorylase'],
'thymidine': ['thymidine phosphorylase', 'thymine permease'],
'trehalose': ['trehalose-6-phosphate', 'trehalose pts', 'trehalase'],
'xylose': ['xylose isomerase', 'xylulokinase', 'd-xylose'],
# ── Other ────────────────────────────────────────────────────────
'citrulline': ['citrulline', 'ornithine carbamoyltransferase', 'argininosuccinate'],
'putrescine': ['putrescine', 'spermidine synthase', 'ornithine decarboxylase', 'agmatinase'],
}
def find_matching_descs(pathway: str, all_descs: np.ndarray) -> list:
    """Return seed_desc values that match keywords for the given GapMind pathway."""
    # Pathways without an entry fall back to their own lowercased name as the keyword
    keywords = PATHWAY_SEED_KEYWORDS.get(pathway, [pathway.lower()])
    matches = [
        desc for desc in all_descs
        if any(kw.lower() in str(desc).lower() for kw in keywords)
    ]
    return matches
# Load GapMind pathway list from NB01 summary
pathway_summary = pd.read_csv(DATA_DIR / 'gapmind_pathway_summary.csv')
pathways_list = pathway_summary['pathway'].tolist()
all_descs = seed_all['seed_desc'].dropna().unique()
# Build the mapping: pathway → matched seed_desc values
pathway_to_descs = {}
no_match = []
for pathway in sorted(pathways_list):
    matched = find_matching_descs(pathway, all_descs)
    pathway_to_descs[pathway] = matched
    if not matched:
        no_match.append(pathway)
matched_count = sum(1 for v in pathway_to_descs.values() if v)
print(f'Pathways with ≥1 seed_desc match: {matched_count} / {len(pathways_list)}')
print(f'Pathways with NO match: {len(no_match)}')
if no_match:
    print(' No-match pathways:', sorted(no_match))
print('\nMapping sample (first 20 pathways, up to 3 matched descs each):')
for pw, descs in sorted(pathway_to_descs.items())[:20]:
    print(f' {pw} ({len(descs)} matches): {descs[:3]}')
Pathways with ≥1 seed_desc match: 76 / 80
Pathways with NO match: 4
 No-match pathways: ['deoxyribonate', 'myoinositol', 'phenylalanine', 'tyrosine']
Mapping sample (first 20 pathways, up to 3 matched descs each):
 2-oxoglutarate (27 matches): ['Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)', '2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)', '4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)']
 4-hydroxybenzoate (12 matches): ['3-polyprenyl-4-hydroxybenzoate carboxy-lyase UbiX (EC 4.1.1.-)', '4-hydroxybenzoate transporter', 'P-hydroxybenzoate hydroxylase (EC 1.14.13.2)']
 D-alanine (15 matches): ['D-alanyl-D-alanine carboxypeptidase (EC 3.4.16.4)', 'UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate--D-alanyl-D-alanine ligase (EC 6.3.2.10)', 'D-serine/D-alanine/glycine transporter']
 D-lactate (4 matches): ['D-lactate dehydrogenase (EC 1.1.1.28)', 'Predicted D-lactate dehydrogenase, Fe-S protein, FAD/FMN-containing', 'D-Lactate dehydrogenase, cytochrome c-dependent (EC 1.1.2.4)']
 D-serine (5 matches): ['D-serine dehydratase transcriptional activator', 'D-serine/D-alanine/glycine transporter', 'D-serine deaminase (EC 4.3.1.18)']
 L-lactate (8 matches): ['Predicted L-lactate dehydrogenase, Iron-sulfur cluster-binding subunit YkgF', 'L-lactate permease', 'L-lactate dehydrogenase (EC 1.1.2.3)']
 L-malate (10 matches): ['Putative malate dehydrogenase (EC 1.1.1.37), similar to archaeal MJ1425', '3-isopropylmalate dehydrogenase (EC 1.1.1.85)', 'NADP-dependent malic enzyme (EC 1.1.1.40)']
 NAG (34 matches): ['N-acetylglucosamine-6-phosphate deacetylase (EC 3.5.1.25)', 'UDP-N-acetylglucosamine 4,6-dehydratase (EC 4.2.1.-)', 'UDP-N-acetylglucosamine 2-epimerase (EC 5.1.3.14)']
 acetate (7 matches): ['Acetoacetyl-CoA synthetase (EC 6.2.1.16)', 'Acetyl-CoA synthetase (ADP-forming) alpha and beta chains, putative', 'Acetoacetyl-CoA synthetase (EC 6.2.1.16) / Long-chain-fatty-acid--CoA ligase (EC 6.2.1.3)']
 alanine (1 matches): ['D-alanine aminotransferase (EC 2.6.1.21)']
 arabinose (21 matches): ['4-amino-4-deoxy-L-arabinose transferase and related glycosyltransferases of PMT family', '4-amino-4-deoxy-L-arabinose transferase', 'Hydrolase, alpha/beta fold family functionally coupled to Phosphoribulokinase']
 arg (5 matches): ['Argininosuccinate lyase (EC 4.3.2.1)', 'Ornithine carbamoyltransferase (EC 2.1.3.3)', 'Argininosuccinate synthase (EC 6.3.4.5)']
 arginine (9 matches): ['Biosynthetic arginine decarboxylase (EC 4.1.1.19)', 'Ornithine decarboxylase (EC 4.1.1.17) / Arginine decarboxylase (EC 4.1.1.19)', 'Arginine deiminase (EC 3.5.3.6)']
 asn (2 matches): ['Asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4)', 'Asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4) AsnH']
 asparagine (2 matches): ['Asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4)', 'Asparagine synthetase [glutamine-hydrolyzing] (EC 6.3.5.4) AsnH']
 aspartate (7 matches): ['Aspartate aminotransferase (EC 2.6.1.1)', 'Transcriptional regulator, GntR family domain / Aspartate aminotransferase (EC 2.6.1.1)', 'Histidinol-phosphate aminotransferase (EC 2.6.1.9) @ Aspartate aminotransferase (EC 2.6.1.1)']
 cellobiose (14 matches): ['Periplasmic beta-glucosidase (EC 3.2.1.21)', 'Beta-glucosidase (EC 3.2.1.21)', 'PTS system, cellobiose-specific IIC component (EC 2.7.1.69)']
 chorismate (36 matches): ['Aminodeoxychorismate lyase (EC 4.1.3.38)', 'Shikimate 5-dehydrogenase I alpha (EC 1.1.1.25)', 'Shikimate kinase I (EC 2.7.1.71)']
 citrate (24 matches): ['Uncharacterized transporter, similarity to citrate transporter', 'Iron(III) dicitrate transport protein FecA @ Iron siderophore receptor protein', 'Methylisocitrate lyase (EC 4.1.3.30)']
 citrulline (5 matches): ['Argininosuccinate lyase (EC 4.3.2.1)', 'Ornithine carbamoyltransferase (EC 2.1.3.3)', 'Argininosuccinate synthase (EC 6.3.4.5)']
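The mapping sample above includes some substring false positives: 'acetyl-coa synthetase' matches "Acetoacetyl-CoA synthetase", and 'malate dehydrogenase' matches "3-isopropylmalate dehydrogenase". One possible refinement, sketched below as an assumption rather than the notebook's method, is to require that a keyword not be glued to a preceding letter or digit. The helper name keyword_matches is hypothetical.

```python
import re

def keyword_matches(keyword: str, desc: str) -> bool:
    """True if keyword occurs in desc and is not directly preceded by a letter or digit."""
    # Only the leading edge is anchored, so prefix keywords such as 'arginine biosyn' still work
    pattern = r"(?<![A-Za-z0-9])" + re.escape(keyword)
    return re.search(pattern, desc, flags=re.IGNORECASE) is not None

# (keyword, description) pairs taken from the keyword map and the output above
cases = [
    ("acetyl-coa synthetase", "Acetoacetyl-CoA synthetase (EC 6.2.1.16)"),
    ("acetyl-coa synthetase", "Acetyl-CoA synthetase (ADP-forming) alpha and beta chains, putative"),
    ("malate dehydrogenase", "3-isopropylmalate dehydrogenase (EC 1.1.1.85)"),
    ("malate dehydrogenase", "Putative malate dehydrogenase (EC 1.1.1.37), similar to archaeal MJ1425"),
]
results = [keyword_matches(kw, desc) for kw, desc in cases]
```

This rule rejects the two embedded matches and keeps the two genuine ones. It does not help where the unwanted prefix is joined by a hyphen, as in 'alanine aminotransferase' matching "D-alanine aminotransferase"; those cases would need explicit exclusions.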
6. Extract Gene-Level Fitness Aggregates
Query mean |fit|, mean |t-score|, max |t-score|, and approximate median |t-score| per gene across all conditions. This takes ~5-10 minutes (aggregates 27M rows → ~180K gene records).
gene_fitness = spark.sql("""
    -- fit and t are stored as strings, so cast them explicitly before aggregating
    SELECT
        orgId,
        locusId,
        COUNT(*) AS n_conditions,
        AVG(ABS(CAST(fit AS DOUBLE))) AS mean_abs_fit,
        AVG(ABS(CAST(t AS DOUBLE))) AS mean_abs_t,
        MAX(ABS(CAST(t AS DOUBLE))) AS max_abs_t,
        percentile_approx(ABS(CAST(t AS DOUBLE)), 0.5) AS median_abs_t
    FROM kescience_fitnessbrowser.genefitness
    GROUP BY orgId, locusId
""").toPandas()
print(f'Fitness data: {len(gene_fitness):,} gene records')
print(f'Organisms: {gene_fitness["orgId"].nunique()}')
print('\nSample:')
print(gene_fitness.head(10).to_string())
print('\nMean |t| distribution:')
print(gene_fitness['mean_abs_t'].describe())
gene_fitness.to_csv(DATA_DIR / 'gene_fitness_aggregates.csv', index=False)
print(f'\nSaved to: {DATA_DIR}/gene_fitness_aggregates.csv')
Fitness data: 182,447 gene records
Organisms: 48
Sample:
  orgId  locusId  n_conditions  mean_abs_fit  mean_abs_t  max_abs_t  median_abs_t
0  ANA3  7022501           107      0.065629    0.533129   1.597991      0.458542
1  ANA3  7022518           107      0.298570    1.018012   4.382178      0.864578
2  ANA3  7022523           107      0.327679    0.793829   2.290653      0.736008
3  ANA3  7022525           107      0.204495    0.806039   3.869253      0.662241
4  ANA3  7022527           107      0.309969    1.188399   2.761472      1.205314
5  ANA3  7022535           107      0.387406    1.143489   3.471375      1.099596
6  ANA3  7022550           107      0.291871    0.683936   2.526782      0.618861
7  ANA3  7022556           107      0.166246    0.654944   2.304738      0.571546
8  ANA3  7022561           107      0.087301    0.581252   1.901286      0.481979
9  ANA3  7022572           107      0.472985    0.672740   3.883407      0.469636
Mean |t| distribution:
count    182447.000000
mean          1.074785
std           1.007088
min           0.114294
25%           0.674028
50%           0.789880
75%           1.000928
max          22.283387
Name: mean_abs_t, dtype: float64
Saved to: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/data/gene_fitness_aggregates.csv
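To make the aggregate columns concrete, here is the same per-gene summary computed with pandas on a tiny invented frame. The organism, locus, and experiment names and all values are made up for illustration; only the column names follow the genefitness schema.

```python
import pandas as pd

# Toy genefitness rows; fit and t arrive as strings, as in the real table
toy = pd.DataFrame({
    "orgId":   ["orgA", "orgA", "orgA", "orgA"],
    "locusId": ["g1", "g1", "g2", "g2"],
    "expName": ["set1", "set2", "set1", "set2"],
    "fit":     ["-2.0", "0.5", "0.1", "-0.1"],
    "t":       ["-6.0", "1.0", "0.4", "-0.2"],
})
toy[["fit", "t"]] = toy[["fit", "t"]].astype(float)
toy["abs_fit"] = toy["fit"].abs()
toy["abs_t"] = toy["t"].abs()

# One row per (orgId, locusId), mirroring the GROUP BY in the Spark query
agg = (
    toy.groupby(["orgId", "locusId"])
       .agg(n_conditions=("expName", "count"),
            mean_abs_fit=("abs_fit", "mean"),
            mean_abs_t=("abs_t", "mean"),
            max_abs_t=("abs_t", "max"))
       .reset_index()
)
```

In this toy data, g1 has one strong phenotype (|t| = 6.0) and one weak one, so its max |t| is much larger than its mean |t|; g2 is quiet in both conditions. The gap between the two statistics is what separates condition-specific genes from uniformly unimportant ones.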
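7. Identify Essential Genes
Putative essential genes = protein-coding genes (type='1') with no entries in genefitness. Absence from genefitness is consistent with no viable transposon mutants being recovered under library conditions, but it can also reflect genes with too few usable insertions to estimate fitness, so this set likely overestimates true essentiality.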
essential = spark.sql("""
SELECT
g.orgId,
g.locusId,
g.desc AS gene_desc
FROM kescience_fitnessbrowser.gene g
LEFT JOIN (
SELECT DISTINCT orgId, locusId
FROM kescience_fitnessbrowser.genefitness
) gf
ON g.orgId = gf.orgId
AND g.locusId = gf.locusId
WHERE g.type = '1'
AND gf.locusId IS NULL
""").toPandas()
print(f'Putative essential genes: {len(essential):,}')
print(f'Organisms: {essential["orgId"].nunique()}')
print('\nEssential genes per organism:')
print(essential.groupby('orgId').size().sort_values(ascending=False).to_string())
essential.to_csv(DATA_DIR / 'essential_genes.tsv', sep='\t', index=False)
print(f'\nSaved to: {DATA_DIR}/essential_genes.tsv')
Putative essential genes: 41,059
Organisms: 48
Essential genes per organism:
orgId
BFirm                      1760
pseudo1_N1B4               1639
Burk376                    1408
Magneto                    1334
azobra                     1310
RalstoniaGMI1000           1103
WCS417                     1092
RalstoniaUW163             1091
Smeli                      1087
acidovorax_3H11            1040
Dino                       1007
RalstoniaBSBF1503          1007
Cup4G11                     985
pseudo6_N2E2                968
psRCH2                      920
PS                          886
PV4                         861
SyringaeB728a_mexBdelta     852
RalstoniaPSI07              837
HerbieS                     837
Koxy                        824
SyringaeB728a               818
pseudo5_N2C3_1              813
pseudo13_GW456_L13          806
MR1                         805
Putida                      794
Korea                       789
Phaeo                       781
SynE                        771
Marino                      762
Btheta                      762
pseudo3_N2E3                742
ANA3                        717
Dyella79                    706
DvH                         678
Ddia6719                    677
Cola                        675
Miya                        669
Pedo557                     605
Dda3937                     599
Caulo                       581
Ponti                       577
DdiaME23                    571
Keio                        561
SB2B                        546
Methanococcus_S2            503
Kang                        474
Methanococcus_JJ            429
Saved to: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/data/essential_genes.tsv
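The LEFT JOIN with an IS NULL filter in the query above is an anti-join: keep the genes that have no partner row in genefitness. A minimal plain-Python sketch of the same logic on invented rows follows; all identifiers and the non-coding type code '5' are made up for illustration.

```python
# Toy gene table: (orgId, locusId, type); type '1' marks protein-coding genes
genes = [
    ("orgA", "g1", "1"),
    ("orgA", "g2", "1"),
    ("orgA", "g3", "5"),  # non-coding feature, dropped by the type filter
    ("orgB", "g1", "1"),
]
# Toy fitness rows: (orgId, locusId, expName)
fitness_rows = [
    ("orgA", "g1", "set1"),
    ("orgA", "g1", "set2"),
    ("orgB", "g9", "set1"),
]

# Equivalent of SELECT DISTINCT orgId, locusId FROM genefitness
measured = {(org, locus) for org, locus, _ in fitness_rows}

# Anti-join: protein-coding genes whose (orgId, locusId) key was never measured
putative_essential = [
    (org, locus)
    for org, locus, gene_type in genes
    if gene_type == "1" and (org, locus) not in measured
]
```

The key is the pair (orgId, locusId) because a locus identifier alone is not guaranteed to be unique across organisms: here orgA/g1 has fitness data while orgB/g1 does not, and joining on locusId alone would wrongly treat orgB/g1 as measured.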
8. Match FB Organisms to GapMind Species
Use NCBI taxonomy IDs to link FB organisms to pangenome species clades. GapMind pathway completeness data can then be retrieved for those species.
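One way such a link could be built is a two-pass match, sketched below with pandas on invented rows. This is an illustration under stated assumptions, not the notebook's actual mapping code: it assumes gtdb_metadata exposes ncbi_taxid, ncbi_species_taxid, accession, and gtdb_taxonomy columns, that taxonomy IDs are stored as strings in both tables, and that gtdb_taxonomy is a semicolon-separated lineage ending in the species rank. All values in the toy frames are made up.

```python
import pandas as pd

# Toy stand-ins for the FB organism table and gtdb_metadata (values invented)
fb = pd.DataFrame({
    "orgId": ["orgA", "orgB", "orgC"],
    "taxonomyId": ["111", "222", "999"],
})
gtdb = pd.DataFrame({
    "accession": ["GB_GCA_000000001.1", "RS_GCF_000000002.1"],
    "ncbi_taxid": ["111", "333"],
    "ncbi_species_taxid": ["110", "222"],
    "gtdb_taxonomy": [
        "d__Bacteria;p__X;c__X;o__X;f__X;g__Genus1;s__Genus1 alpha",
        "d__Bacteria;p__X;c__X;o__X;f__X;g__Genus2;s__Genus2 beta",
    ],
})
# Species label = last rank of the lineage string (assumed format)
gtdb["gtdb_species"] = gtdb["gtdb_taxonomy"].str.split(";").str[-1]

# Pass 1: exact match on the strain-level taxid
strain = fb.merge(gtdb, left_on="taxonomyId", right_on="ncbi_taxid", how="inner")
strain["match_level"] = "strain"

# Pass 2: organisms still unmatched fall back to the species-level taxid
rest = fb[~fb["orgId"].isin(strain["orgId"])]
species = rest.merge(gtdb, left_on="taxonomyId", right_on="ncbi_species_taxid", how="inner")
species["match_level"] = "species"

cols = ["orgId", "taxonomyId", "accession", "gtdb_species", "match_level"]
mapping = pd.concat([strain[cols], species[cols]], ignore_index=True)
unmatched = sorted(set(fb["orgId"]) - set(mapping["orgId"]))
```

Recording match_level keeps strain-level and species-level links distinguishable downstream, and the unmatched list shows which organisms would need manual curation or a name-based fallback.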
# Inspect gtdb_metadata schema
print('=== gtdb_metadata schema ===')
spark.sql('DESCRIBE kbase_ke_pangenome.gtdb_metadata').show(50, truncate=False)
print('\nSample rows:')
spark.sql('SELECT * FROM kbase_ke_pangenome.gtdb_metadata LIMIT 5').show(5, truncate=False)
=== gtdb_metadata schema ===
+---------------------------------------+---------+-------+
|col_name                               |data_type|comment|
+---------------------------------------+---------+-------+
|accession                              |string   |NULL   |
|ambiguous_bases                        |string   |NULL   |
|checkm_completeness                    |string   |NULL   |
|checkm_contamination                   |string   |NULL   |
|checkm_marker_count                    |string   |NULL   |
|checkm_marker_lineage                  |string   |NULL   |
|checkm_marker_set_count                |string   |NULL   |
|checkm_strain_heterogeneity            |string   |NULL   |
|coding_bases                           |string   |NULL   |
|coding_density                         |string   |NULL   |
|contig_count                           |string   |NULL   |
|gc_count                               |string   |NULL   |
|gc_percentage                          |string   |NULL   |
|genome_size                            |string   |NULL   |
|gtdb_genome_representative             |string   |NULL   |
|gtdb_representative                    |string   |NULL   |
|gtdb_taxonomy                          |string   |NULL   |
|gtdb_type_designation_ncbi_taxa        |string   |NULL   |
|gtdb_type_designation_ncbi_taxa_sources|string   |NULL   |
|gtdb_type_species_of_genus             |string   |NULL   |
|l50_contigs                            |string   |NULL   |
|l50_scaffolds                          |string   |NULL   |
|longest_contig                         |string   |NULL   |
|longest_scaffold                       |string   |NULL   |
|lsu_23s_contig_len                     |string   |NULL   |
|lsu_23s_count                          |string   |NULL   |
|lsu_23s_length                         |string   |NULL   |
|lsu_23s_query_id                       |string   |NULL   |
|lsu_5s_contig_len                      |string   |NULL   |
|lsu_5s_count                           |string   |NULL   |
|lsu_5s_length                          |string   |NULL   |
|lsu_5s_query_id                        |string   |NULL   |
|lsu_silva_23s_blast_align_len          |string   |NULL   |
|lsu_silva_23s_blast_bitscore           |string   |NULL   |
|lsu_silva_23s_blast_evalue             |string   |NULL   |
|lsu_silva_23s_blast_perc_identity      |string   |NULL   |
|lsu_silva_23s_blast_subject_id         |string   |NULL   |
|lsu_silva_23s_taxonomy                 |string   |NULL   |
|mean_contig_length                     |string   |NULL   |
|mean_scaffold_length                   |string   |NULL   |
|mimag_high_quality                     |string   |NULL   |
|mimag_low_quality                      |string   |NULL   |
|mimag_medium_quality                   |string   |NULL   |
|n50_contigs                            |string   |NULL   |
|n50_scaffolds                          |string   |NULL   |
|ncbi_assembly_level                    |string   |NULL   |
|ncbi_assembly_name                     |string   |NULL   |
|ncbi_assembly_type                     |string   |NULL   |
|ncbi_bioproject                        |string   |NULL   |
|ncbi_biosample                         |string   |NULL   |
+---------------------------------------+---------+-------+
only showing top 50 rows

Sample rows (5 rows; identifier and taxonomy columns shown):

-RECORD 0--------------------------------------------------------
 accession                       | RS_GCF_000246985.2
 gtdb_taxonomy                   | d__Archaea;p__Methanobacteriota_B;c__Thermococci;o__Thermococcales;f__Thermococcaceae;g__Thermococcus_A;s__Thermococcus_A alcaliphilus
 gtdb_type_designation_ncbi_taxa | type strain of species
 gtdb_type_species_of_genus      | f
 ncbi_organism_name              | Thermococcus litoralis DSM 5473
 ncbi_species_taxid              | 2265
 ncbi_taxid                      | 523849
-RECORD 1--------------------------------------------------------
 accession                       | RS_GCF_000980135.1
 gtdb_taxonomy                   | d__Archaea;p__Halobacteriota;c__Methanosarcinia;o__Methanosarcinales;f__Methanosarcinaceae;g__Methanosarcina;s__Methanosarcina mazei
 gtdb_type_designation_ncbi_taxa | not type material
 gtdb_type_species_of_genus      | f
 ncbi_organism_name              | Methanosarcina mazei
 ncbi_species_taxid              | 2209
 ncbi_taxid                      | 2209
-RECORD 2--------------------------------------------------------
 accession                       | RS_GCF_000337075.1
 gtdb_taxonomy                   | d__Archaea;p__Halobacteriota;c__Halobacteria;o__Halobacteriales;f__Haloferacaceae;g__Halorubrum;s__Halorubrum hochstenium
 gtdb_type_designation_ncbi_taxa | not type material
 gtdb_type_species_of_genus      | f
 ncbi_organism_name              | Halorubrum hochstenium ATCC 700873
 ncbi_species_taxid              | 1227480
 ncbi_taxid                      | 1227481
-RECORD 3--------------------------------------------------------
 accession                       | RS_GCF_000979515.1
 gtdb_taxonomy                   | d__Archaea;p__Halobacteriota;c__Methanosarcinia;o__Methanosarcinales;f__Methanosarcinaceae;g__Methanosarcina;s__Methanosarcina mazei
 gtdb_type_designation_ncbi_taxa | not type material
 gtdb_type_species_of_genus      | f
 ncbi_organism_name              | Methanosarcina mazei
 ncbi_species_taxid              | 2209
 ncbi_taxid                      | 2209
-RECORD 4--------------------------------------------------------
 accession                       | RS_GCF_000762265.1
 gtdb_taxonomy                   | d__Archaea;p__Methanobacteriota;c__Methanobacteria;o__Methanobacteriales;f__Methanobacteriaceae;g__Methanobacterium;s__Methanobacterium formicicum
 gtdb_type_designation_ncbi_taxa | not type material
 gtdb_type_species_of_genus      | f
 ncbi_organism_name              | Methanobacterium formicicum
 ncbi_species_taxid              | 2162
 ncbi_taxid                      | 2162

All columns in the sample output, in order:
accession, ambiguous_bases, checkm_completeness, checkm_contamination, checkm_marker_count, checkm_marker_lineage, checkm_marker_set_count, checkm_strain_heterogeneity, coding_bases, coding_density, contig_count, gc_count, gc_percentage, genome_size, gtdb_genome_representative, gtdb_representative, gtdb_taxonomy, gtdb_type_designation_ncbi_taxa, gtdb_type_designation_ncbi_taxa_sources, gtdb_type_species_of_genus, l50_contigs, l50_scaffolds, longest_contig, longest_scaffold, lsu_23s_contig_len, lsu_23s_count, lsu_23s_length, lsu_23s_query_id, lsu_5s_contig_len, lsu_5s_count, lsu_5s_length, lsu_5s_query_id, lsu_silva_23s_blast_align_len, lsu_silva_23s_blast_bitscore, lsu_silva_23s_blast_evalue, lsu_silva_23s_blast_perc_identity, lsu_silva_23s_blast_subject_id, lsu_silva_23s_taxonomy, mean_contig_length, mean_scaffold_length, mimag_high_quality, mimag_low_quality, mimag_medium_quality, n50_contigs, n50_scaffolds, ncbi_assembly_level, ncbi_assembly_name, ncbi_assembly_type, ncbi_bioproject, ncbi_biosample, ncbi_contig_count, ncbi_contig_n50, ncbi_country, ncbi_date, ncbi_genbank_assembly_accession, ncbi_genome_category, ncbi_genome_representation, ncbi_isolate, ncbi_isolation_source, ncbi_lat_lon, ncbi_molecule_count, ncbi_ncrna_count, ncbi_organism_name, ncbi_protein_count, ncbi_refseq_category, ncbi_rrna_count, ncbi_scaffold_count, ncbi_scaffold_l50, ncbi_scaffold_n50, ncbi_scaffold_n75, ncbi_scaffold_n90, ncbi_seq_rel_date, ncbi_spanned_gaps, ncbi_species_taxid, ncbi_ssu_count, ncbi_strain_identifiers, ncbi_submitter, ncbi_taxid, ncbi_taxonomy, ncbi_taxonomy_unfiltered, ncbi_total_gap_length, ncbi_total_length, ncbi_translation_table, ncbi_trna_count, ncbi_type_material_designation, ncbi_ungapped_length, ncbi_unspanned_gaps, ncbi_wgs_master, protein_count, scaffold_count, ssu_contig_len, ssu_count, ssu_gg_blast_align_len, ssu_gg_blast_bitscore, ssu_gg_blast_evalue, ssu_gg_blast_perc_identity, ssu_gg_blast_subject_id, ssu_gg_taxonomy, ssu_length, ssu_query_id, ssu_silva_blast_align_len, ssu_silva_blast_bitscore, ssu_silva_blast_evalue, ssu_silva_blast_perc_identity, ssu_silva_blast_subject_id, ssu_silva_taxonomy, total_gap_length, trna_aa_count, trna_count, trna_selenocysteine_count
# Get NCBI taxonomy IDs from FB organisms
# Look for column names containing 'tax', 'ncbi', 'id'
print('FB organism columns:', organisms.columns.tolist())
print()
for col in organisms.columns:
    print(f' {col}: {organisms[col].head(3).tolist()}')
FB organism columns: ['orgId', 'division', 'genus', 'species', 'strain', 'taxonomyId']

 orgId: ['acidovorax_3H11', 'ANA3', 'azobra']
 division: ['Betaproteobacteria', 'Gammaproteobacteria', 'Alphaproteobacteria']
 genus: ['Acidovorax', 'Shewanella', 'Azospirillum']
 species: ['sp.', 'sp.', 'brasilense']
 strain: ['GW101-3H11', 'ANA-3', 'Sp245']
 taxonomyId: ['12916', '94122', '1064539']
# ── Determine the taxid column name from organism table ──────────────────────
# Adjust TAX_COL and GTDB_TAX_COL below if the schema inspection above
# shows different column names.
# NOTE: each next(...) returns the FIRST column whose name contains the
# substrings, so check the detected names against the sample rows before joining.
TAX_COL = next(
    (c for c in organisms.columns if 'tax' in c.lower() or 'ncbi' in c.lower()),
    None
)
print(f'Detected FB taxid column: {TAX_COL}')

# Query GTDB metadata schema to find taxid column
# CAUTION: 'gtdb_type_designation_ncbi_taxa' also contains 'tax' and 'ncbi' and
# precedes 'ncbi_taxid' in the schema; it holds type-designation labels, not taxids.
gtdb_schema = spark.sql('DESCRIBE kbase_ke_pangenome.gtdb_metadata').toPandas()
GTDB_TAX_COL = next(
    (r['col_name'] for _, r in gtdb_schema.iterrows()
     if 'tax' in r['col_name'].lower() and 'ncbi' in r['col_name'].lower()),
    None
)
print(f'Detected GTDB taxid column: {GTDB_TAX_COL}')

# Also look for species/clade ID column
# CAUTION: 'gtdb_type_species_of_genus' matches 'species' but is a t/f flag,
# not a species or clade name.
GTDB_SPECIES_COL = next(
    (r['col_name'] for _, r in gtdb_schema.iterrows()
     if 'species' in r['col_name'].lower() or 'clade' in r['col_name'].lower()),
    None
)
print(f'Detected GTDB species/clade column: {GTDB_SPECIES_COL}')
Detected FB taxid column: taxonomyId
Detected GTDB taxid column: gtdb_type_designation_ncbi_taxa
Detected GTDB species/clade column: gtdb_type_species_of_genus
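The detected GTDB columns above are a type-designation label and a t/f flag rather than a taxid and a species name, because the substring test accepts the first partial match. A minimal sketch of a stricter selection follows, assuming exact names should win over substring matches. `pick_column` is a hypothetical helper, not part of the notebook, and the column list is a subset copied from the sample-row header.

```python
def pick_column(columns, preferred, required_substrings=()):
    """Return the first exact (case-insensitive) match from `preferred`;
    otherwise the first column containing every required substring;
    otherwise None."""
    by_lower = {c.lower(): c for c in columns}
    for name in preferred:
        if name.lower() in by_lower:
            return by_lower[name.lower()]
    if required_substrings:
        for c in columns:
            if all(s in c.lower() for s in required_substrings):
                return c
    return None


# Subset of the gtdb_metadata column names shown in the sample rows
gtdb_columns = [
    'accession', 'gtdb_taxonomy', 'gtdb_type_designation_ncbi_taxa',
    'gtdb_type_species_of_genus', 'ncbi_organism_name',
    'ncbi_species_taxid', 'ncbi_taxid', 'ncbi_taxonomy',
]

taxid_col = pick_column(
    gtdb_columns,
    preferred=['ncbi_taxid', 'ncbi_species_taxid'],
    required_substrings=('ncbi', 'taxid'),
)
print(taxid_col)  # ncbi_taxid
```

Requiring the full substring 'taxid' in the fallback also excludes 'gtdb_type_designation_ncbi_taxa', which only contains 'taxa'.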
# Build organism → GapMind species mapping via taxid join
# If TAX_COL or GTDB_TAX_COL are None, update them manually based on schema output above
if TAX_COL and GTDB_TAX_COL and GTDB_SPECIES_COL:
    # Get distinct species with GapMind data
    gapmind_species_df = pd.read_csv(DATA_DIR / 'gapmind_species_summary.csv')
    gapmind_species_set = set(gapmind_species_df['species'].tolist())

    # Build taxid → clade mapping from GTDB metadata
    gtdb_map = spark.sql(f"""
        SELECT DISTINCT {GTDB_TAX_COL} AS ncbi_taxid, {GTDB_SPECIES_COL} AS clade_name
        FROM kbase_ke_pangenome.gtdb_metadata
        WHERE {GTDB_TAX_COL} IS NOT NULL
    """).toPandas()
    gtdb_map['ncbi_taxid'] = gtdb_map['ncbi_taxid'].astype(str)
    print(f'GTDB taxid→clade mappings: {len(gtdb_map):,}')
    print(gtdb_map.head(10).to_string())

    # Match FB organisms to GTDB clades (both keys compared as strings)
    org_map = organisms.copy()
    org_map[TAX_COL] = org_map[TAX_COL].astype(str)
    org_map = org_map.merge(gtdb_map, left_on=TAX_COL, right_on='ncbi_taxid', how='left')

    # Check which clades have GapMind data
    org_map['has_gapmind'] = org_map['clade_name'].isin(gapmind_species_set)
    matched = org_map[org_map['has_gapmind']]
    print(f'\nFB organisms with GapMind species match: {len(matched)} / {len(organisms)}')
    print(matched[['orgId', 'clade_name']].to_string())
else:
    print('WARNING: Could not auto-detect taxid columns.')
    print('Manually set TAX_COL and GTDB_TAX_COL based on schema output above.')
    org_map = organisms.copy()
    org_map['clade_name'] = np.nan
    org_map['has_gapmind'] = False
    matched = org_map.head(0)

# Also try genus-species name matching as fallback
# (runs after either branch; only fills organisms that are still unmatched)
if 'genus' in organisms.columns and 'species' in organisms.columns:
    gapmind_species_df = pd.read_csv(DATA_DIR / 'gapmind_species_summary.csv')
    for _, row in organisms.iterrows():
        genus = str(row.get('genus', '')).strip()
        species = str(row.get('species', '')).strip()
        # genus/species are interpolated into the regex unescaped,
        # so the '.' in 'sp.' matches any character
        name_match = gapmind_species_df['species'].str.contains(
            f'{genus}.*{species}', case=False, na=False, regex=True
        )
        org_rows = org_map['orgId'] == row['orgId']
        if name_match.any() and not org_map.loc[org_rows, 'has_gapmind'].any():
            # Take the first matching GapMind species name
            clade = gapmind_species_df.loc[name_match, 'species'].iloc[0]
            org_map.loc[org_rows, 'clade_name'] = clade
            org_map.loc[org_rows, 'has_gapmind'] = True

# Save organism mapping
org_map.to_csv(DATA_DIR / 'organism_mapping.tsv', sep='\t', index=False)
print(f'\nSaved organism mapping: {DATA_DIR}/organism_mapping.tsv')
GTDB taxid→clade mappings: 5
ncbi_taxid clade_name
0 type strain of species t
1 not type material f
2 type strain of subspecies f
3 type strain of heterotypic synonym f
4 type strain of species f
FB organisms with GapMind species match: 0 / 48
Empty DataFrame
Columns: [orgId, clade_name]
Index: []
Saved organism mapping: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/data/organism_mapping.tsv
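In the output above, the `ncbi_taxid` column of the GTDB map holds labels such as "type strain of species", and the taxid join matched 0 of 48 organisms. A small guard run before the merge could flag this kind of key mismatch early. The sketch below assumes that valid taxids are digit-only strings; `looks_like_taxid` and its 0.9 threshold are hypothetical choices, and the example values are copied from the outputs above.

```python
import pandas as pd


def looks_like_taxid(series, min_fraction=0.9):
    """Return True when at least `min_fraction` of the non-null values
    consist only of digits."""
    values = series.dropna().astype(str).str.strip()
    if values.empty:
        return False
    return bool(values.str.fullmatch(r'\d+').mean() >= min_fraction)


# Values copied from the notebook outputs
fb_taxids = pd.Series(['12916', '94122', '1064539'])
detected_gtdb = pd.Series([
    'type strain of species', 'not type material', 'type strain of subspecies',
])

print(looks_like_taxid(fb_taxids))      # True
print(looks_like_taxid(detected_gtdb))  # False
```

Calling the check on both join keys before `merge` would turn a silent zero-match result into an explicit warning.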
9. Compute Pathway-Level Fitness Metrics¶
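For each (organism, GapMind pathway) pair, genes whose SEED role description (`seed_desc`) matches the pathway serve as a proxy for pathway gene membership. Their per-gene fitness values are then aggregated into gene counts, percent essential, and the mean, max, and median of mean |t|.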
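The aggregation described above can also be expressed with pandas merges and a single `groupby`. The sketch below runs on toy data only: the frames are hypothetical stand-ins for `seed_all`, the pathway-to-description lookup, `gene_fitness`, and `essential`. Note one difference from a membership test: `count` tallies non-null `mean_abs_t` values rather than genes present in the fitness table.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the notebook's tables (values are illustrative only)
seed = pd.DataFrame({
    'orgId': ['A', 'A', 'A', 'B'],
    'locusId': ['g1', 'g2', 'g3', 'g1'],
    'seed_desc': ['role X', 'role X', 'role Y', 'role X'],
})
desc_to_pathway = pd.DataFrame({
    'seed_desc': ['role X', 'role Y'],
    'pathway': ['p1', 'p1'],
})
fitness = pd.DataFrame({
    'orgId': ['A', 'A'], 'locusId': ['g1', 'g2'], 'mean_abs_t': [1.0, 3.0],
})
essential = pd.DataFrame({'orgId': ['A', 'B'], 'locusId': ['g3', 'g1']})

# One row per (organism, gene, pathway), with fitness and essentiality attached
genes = (
    seed.merge(desc_to_pathway, on='seed_desc')[['orgId', 'locusId', 'pathway']]
    .drop_duplicates()
    .merge(fitness, on=['orgId', 'locusId'], how='left')
    .merge(essential.assign(is_essential=True), on=['orgId', 'locusId'], how='left')
)
genes['is_essential'] = genes['is_essential'].notna()

# Aggregate per (organism, pathway)
metrics = (
    genes.groupby(['orgId', 'pathway'])
    .agg(
        n_seed_genes=('locusId', 'size'),
        n_with_fitness=('mean_abs_t', 'count'),
        n_essential=('is_essential', 'sum'),
        mean_abs_t=('mean_abs_t', 'mean'),
        max_abs_t=('mean_abs_t', 'max'),
        median_abs_t=('mean_abs_t', 'median'),
    )
    .reset_index()
)
metrics['pct_essential'] = 100.0 * metrics['n_essential'] / metrics['n_seed_genes']
print(metrics)
```

Groups without any fitness data keep NaN for the three |t| summaries, matching the NaN rows in the loop-based output.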
# Build gene → pathway assignments via SEED role description matching
# For each pathway, find all genes annotated to matching seed_desc values

# Index: (orgId, locusId) → mean_abs_t, max_abs_t
fitness_idx = gene_fitness.set_index(['orgId', 'locusId'])

# Essential gene set: (orgId, locusId)
essential_set = set(zip(essential['orgId'], essential['locusId']))

pathway_metrics = []
for pathway in sorted(pathways_list):
    matched_descs = pathway_to_descs.get(pathway, [])
    if not matched_descs:
        continue

    # Genes annotated to this pathway via matching seed_desc values
    pathway_genes_seed = seed_all[
        seed_all['seed_desc'].isin(matched_descs)
    ][['orgId', 'locusId']].drop_duplicates()
    if len(pathway_genes_seed) == 0:
        continue

    # Group by organism
    for org_id, org_genes in pathway_genes_seed.groupby('orgId'):
        loci = org_genes['locusId'].tolist()

        # Genes with fitness data
        loci_in_fitness = [
            locus for locus in loci
            if (org_id, locus) in fitness_idx.index
        ]

        # Essential genes in this pathway
        n_essential = sum(1 for locus in loci if (org_id, locus) in essential_set)
        n_genes = len(loci)
        n_with_fitness = len(loci_in_fitness)
        pct_essential = 100.0 * n_essential / n_genes if n_genes > 0 else np.nan

        if n_with_fitness > 0:
            # All three summaries are taken over the per-gene mean |t|;
            # max_abs_t here is the largest per-gene mean, not the per-gene max_abs_t column
            t_scores = fitness_idx.loc[
                [(org_id, locus) for locus in loci_in_fitness], 'mean_abs_t'
            ].values
            mean_abs_t = float(np.nanmean(t_scores))
            max_abs_t = float(np.nanmax(t_scores))
            median_abs_t = float(np.nanmedian(t_scores))
        else:
            mean_abs_t = np.nan
            max_abs_t = np.nan
            median_abs_t = np.nan

        pathway_metrics.append({
            'orgId': org_id,
            'pathway': pathway,
            'pathway_category': categorize_pathway(pathway),
            'n_seed_genes': n_genes,
            'n_with_fitness': n_with_fitness,
            'n_essential': n_essential,
            'pct_essential': pct_essential,
            'mean_abs_t': mean_abs_t,
            'max_abs_t': max_abs_t,
            'median_abs_t': median_abs_t,
            'matched_seed_descs': '|'.join(matched_descs),
        })

pathway_metrics_df = pd.DataFrame(pathway_metrics)
print(f'Pathway-level fitness metrics: {len(pathway_metrics_df):,} records')
print(f'Organisms: {pathway_metrics_df["orgId"].nunique()}')
print(f'Pathways covered: {pathway_metrics_df["pathway"].nunique()}')
print('\nSample:')
print(pathway_metrics_df.head(20).to_string())
Pathway-level fitness metrics: 3,065 records
Organisms: 48
Pathways covered: 76
Sample:
orgId pathway pathway_category n_seed_genes n_with_fitness n_essential pct_essential mean_abs_t max_abs_t median_abs_t matched_seed_descs
0 ANA3 2-oxoglutarate amino_acid 4 1 3 75.000000 0.632157 0.632157 0.632157 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
1 BFirm 2-oxoglutarate amino_acid 9 5 4 44.444444 0.767611 0.833964 0.753904 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
2 Btheta 2-oxoglutarate amino_acid 7 2 5 71.428571 10.138245 10.813140 10.138245 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 
10|2-oxoglutarate oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
3 Burk376 2-oxoglutarate amino_acid 7 3 4 57.142857 1.040629 1.421618 0.911640 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
4 Caulo 2-oxoglutarate amino_acid 3 0 3 100.000000 NaN NaN NaN Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate oxidoreductase, 
delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
5 Cola 2-oxoglutarate amino_acid 5 2 3 60.000000 1.122539 1.436413 1.122539 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
6 Cup4G11 2-oxoglutarate amino_acid 11 8 3 27.272727 0.788078 1.107266 0.772708 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 
10|2-oxoglutarate oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
7 Dda3937 2-oxoglutarate amino_acid 8 6 2 25.000000 1.052219 1.713227 0.905339 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
8 Ddia6719 2-oxoglutarate amino_acid 6 4 2 33.333333 1.043230 1.209794 1.032591 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 
10|2-oxoglutarate oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
9 DdiaME23 2-oxoglutarate amino_acid 6 4 2 33.333333 0.881028 1.181621 0.891282 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 
10|2-oxoglutarate oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
10 Dino 2-oxoglutarate amino_acid 4 1 3 75.000000 1.948995 1.948995 1.948995 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
11 DvH 2-oxoglutarate amino_acid 10 8 2 20.000000 0.907357 1.250661 0.922363 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
12 Dyella79 2-oxoglutarate amino_acid 6 2 4 66.666667 0.884932 1.245765 0.884932 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 
10|2-oxoglutarate oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
13 HerbieS 2-oxoglutarate amino_acid 6 3 3 50.000000 0.819252 0.976382 0.757750 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 
10|2-oxoglutarate oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
14 Kang 2-oxoglutarate amino_acid 2 0 2 100.000000 NaN NaN NaN Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate oxidoreductase, 
delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
15 Keio 2-oxoglutarate amino_acid 7 6 1 14.285714 1.331262 3.057372 0.825957 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
16 Korea 2-oxoglutarate amino_acid 5 2 3 60.000000 1.233377 1.271412 1.233377 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
17 Koxy 2-oxoglutarate amino_acid 9 6 3 33.333333 0.731788 0.863819 0.720408 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
18 MR1 2-oxoglutarate amino_acid 4 1 3 75.000000 0.765044 0.765044 0.765044 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 10|2-oxoglutarate 
oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
19 Magneto 2-oxoglutarate amino_acid 6 3 3 50.000000 1.125600 1.654769 1.286561 Dihydrolipoamide succinyltransferase component (E2) of 2-oxoglutarate dehydrogenase complex (EC 2.3.1.61)|2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)|4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Dihydrolipoamide dehydrogenase of 2-oxoglutarate dehydrogenase (EC 1.8.1.4)|Alpha-ketoglutarate-dependent taurine dioxygenase (EC 1.14.11.17)|5-aminovalerate aminotransferase (EC 2.6.1.48) / Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|4-hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16) @ 2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14)|Fe(2+)/alpha-ketoglutarate-dependent dioxygenase LpxO|Gamma-aminobutyrate:alpha-ketoglutarate aminotransferase (EC 2.6.1.19)|Pyoverdin biosynthesis protein PvdH, L-2,4-diaminobutyrate:2-oxoglutarate aminotransferase (EC 2.6.1.76)|2-dehydro-3-deoxyphosphogluconate aldolase (EC 4.1.2.14) / 4-Hydroxy-2-oxoglutarate aldolase (EC 4.1.3.16)|Alpha-ketoglutarate permease|2-oxoglutarate/malate translocator|Siderophore biosynthesis diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide acyltransferase (E2) component, and related enzymes|Achromobactin biosynthesis protein AcsF; Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Pyruvate/2-oxoglutarate dehydrogenase complex, dihydrolipoamide dehydrogenase (E3) component, and related enzymes|Gamma-butyrobetaine,2-oxoglutarate dioxygenase (EC 1.14.11.1)|2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)|2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)|Diaminobutyrate--2-oxoglutarate aminotransferase (EC 2.6.1.76)|Coenzyme B synthesis from 2-oxoglutarate: steps 5, 9, and 13|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [small subunit]|Coenzyme B synthesis from 2-oxoglutarate: steps 1, 6, and 
10|2-oxoglutarate oxidoreductase, delta subunit, putative (EC 1.2.7.3)|Coenzyme B synthesis from 2-oxoglutarate: steps 4, 7, 8, 11, and 12 [large subunit]|2-oxoglutarate oxidoreductase, gamma subunit (EC 1.2.7.3)
# Filter to records with enough genes for reliable metrics
MIN_SEED_GENES = 3
pathway_metrics_filtered = pathway_metrics_df[
    pathway_metrics_df['n_seed_genes'] >= MIN_SEED_GENES
].copy()
print(f'Records with ≥{MIN_SEED_GENES} SEED genes: {len(pathway_metrics_filtered):,}')
print(f'Organisms: {pathway_metrics_filtered["orgId"].nunique()}')
print(f'Pathways: {pathway_metrics_filtered["pathway"].nunique()}')
print('\nPathway coverage:')
print(pathway_metrics_filtered.groupby('pathway_category')[['pathway', 'orgId']].nunique())
print('\nmean_abs_t distribution by category:')
print(pathway_metrics_filtered.groupby('pathway_category')['mean_abs_t'].describe())
Records with ≥3 SEED genes: 2,063
Organisms: 48
Pathways: 74

Pathway coverage:
                  pathway  orgId
pathway_category
amino_acid             43     48
carbon                 16     48
other                  15     48

mean_abs_t distribution by category:
                   count      mean       std       min       25%       50%       75%        max
pathway_category
amino_acid        1246.0  1.841079  1.323796  0.443576  0.921716  1.361938  2.282729  15.461652
carbon             383.0  1.232448  0.806703  0.390014  0.792297  0.973844  1.352725   8.509163
other              372.0  1.777302  1.584412  0.496418  0.800839  1.081759  2.180193  15.461652
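In the output above, the per-category counts in the mean_abs_t table sum to 2,001, while 2,063 records pass the filter. One plausible explanation is that pandas excludes missing values from describe() counts, so records with no mean_abs_t (or no category) drop out of that table. The sketch below reproduces the filter on a toy frame to show the effect. The column names follow the notebook, but every value, and the 'arginine' and 'glucose' pathway rows, are invented for illustration.

```python
import numpy as np
import pandas as pd

MIN_SEED_GENES = 3  # same threshold as the notebook cell above

# Toy stand-in for pathway_metrics_df; all values are invented for illustration.
toy_metrics = pd.DataFrame({
    'orgId': ['MR1', 'MR1', 'Magneto', 'Magneto'],
    'pathway': ['2-oxoglutarate', 'arginine', '2-oxoglutarate', 'glucose'],
    'pathway_category': ['amino_acid', 'amino_acid', 'amino_acid', 'carbon'],
    'n_seed_genes': [4, 2, 6, 3],
    'mean_abs_t': [0.77, 1.10, 1.13, np.nan],  # last record has no fitness value
})

# Keep only records with enough SEED genes.
toy_filtered = toy_metrics[toy_metrics['n_seed_genes'] >= MIN_SEED_GENES].copy()

# Records that pass the filter vs. records that also have a fitness value.
n_records = len(toy_filtered)
n_with_fitness = int(toy_filtered['mean_abs_t'].notna().sum())

# count() (like the 'count' row of describe()) ignores missing values.
counts_by_category = toy_filtered.groupby('pathway_category')['mean_abs_t'].count()

print(f'Records passing filter: {n_records}')
print(f'Records with mean_abs_t: {n_with_fitness}')
print(counts_by_category)
```

Running the same notna() check on pathway_metrics_filtered would confirm whether missing mean_abs_t values account for the gap in the real data.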
# Save the pathway fitness metrics (unfiltered: the MIN_SEED_GENES filter above is not applied here)
pathway_metrics_df.to_csv(DATA_DIR / 'pathway_fitness_metrics.csv', index=False)
print(f'Saved: {DATA_DIR}/pathway_fitness_metrics.csv ({len(pathway_metrics_df):,} rows)')
# Summary stats: list every CSV/TSV in the data directory with its size
print('\n=== Completion Summary ===')
for f in sorted(DATA_DIR.glob('*.csv')) + sorted(DATA_DIR.glob('*.tsv')):
    size_mb = f.stat().st_size / 1024**2
    print(f' {f.name}: {size_mb:.2f} MB')
Saved: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/data/pathway_fitness_metrics.csv (3,065 rows)

=== Completion Summary ===
 gapmind_genome_pathways.csv: 1669.98 MB
 gapmind_pathway_summary.csv: 0.01 MB
 gapmind_species_summary.csv: 1.45 MB
 gene_fitness_aggregates.csv: 16.96 MB
 organism_metadata.csv: 0.00 MB
 pathway_fitness_metrics.csv: 3.47 MB
 seed_annotations.csv: 11.36 MB
 essential_genes.tsv: 2.19 MB
 organism_mapping.tsv: 0.01 MB
Completion
Outputs generated:
- data/organism_metadata.csv: All 48 FB organisms with metadata
- data/organism_mapping.tsv: FB org → GapMind species clade mapping
- data/seed_annotations.csv: SEED subsystem annotations per gene
- data/gene_fitness_aggregates.csv: Mean/max |t-score| per gene
- data/essential_genes.tsv: Putative essential genes
- data/pathway_fitness_metrics.csv: Per-organism per-pathway fitness metrics
Limitation: Gene-pathway assignment uses SEED subsystems as a proxy for pathway membership.
Pathways with no SEED match, or with poorly annotated SEED subsystems, will have fewer genes
and noisier metrics. Inspect the matched_subsystems column to see which SEED subsystems were
used for each pathway.
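One way to carry out that inspection is to split the column into one row per matched entry. This is a minimal sketch, assuming matched_subsystems stores its matches as a single pipe-delimited string, as the table output earlier in the notebook suggests. The toy frame and the subsystems_long name are hypothetical; the three entries are copied from the MR1 row shown above.

```python
import pandas as pd

# Toy stand-in for one row of pathway_fitness_metrics.csv (illustrative only).
toy_row = pd.DataFrame({
    'orgId': ['MR1'],
    'pathway': ['2-oxoglutarate'],
    'matched_subsystems': [
        '2-oxoglutarate dehydrogenase E1 component (EC 1.2.4.2)'
        '|Alpha-ketoglutarate permease'
        '|2-oxoglutarate/malate translocator'
    ],
})

# Split the pipe-delimited string and expand to one row per matched entry.
subsystems_long = (
    toy_row
    .assign(subsystem=toy_row['matched_subsystems'].str.split('|'))
    .explode('subsystem')
    .drop(columns='matched_subsystems')
    .reset_index(drop=True)
)

print(subsystems_long)
```

In long form, entries that match only on a shared substrate name (a permease or translocator, for example) are easier to spot and exclude before interpreting a pathway's metrics.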
Next step: Run NB03 to classify pathways as active dependencies vs latent capabilities.