03 Classify Pathway Dependencies
Jupyter notebook from the Metabolic Capability vs Metabolic Dependency project.
NB03: Classify Pathway Dependencies
Purpose: Classify each (organism, pathway) pair with a complete GapMind prediction as:
- Active dependency: mean |t| > 2.0 OR >20% essential genes → organism relies on this pathway
- Latent capability: mean |t| < 1.0 AND <5% essential genes → pathway present but not needed
- Intermediate: meets neither set of criteria → conditionally important or partially redundant
Tests H1: Not all genomically complete pathways are functionally important.
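Inputs (local, no Spark required):
- data/pathway_fitness_metrics.csv (from NB02)
- data/gapmind_genome_pathways.csv (from NB01)
- data/organism_mapping.tsv (from NB02; optional: if it is missing, classification runs without the GapMind completeness join)
- data/gapmind_pathway_summary.csv (from NB01)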
Outputs:
- data/pathway_classification.csv: per-organism, per-pathway class
- figures/nb03_stacked_bar.png: classification by pathway category
- figures/nb03_scatter.png: fitness vs. essentiality scatter
- figures/nb03_organism_overview.png: % active/latent per organism
Runtime: < 2 minutes (local pandas only)
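Only the NB02 metrics file is checked explicitly in the loading cell; a missing GapMind file would surface only when `pd.read_csv` raises. A minimal fail-fast sketch over all the inputs listed above, assuming the same `data/` layout. `check_inputs`, `REQUIRED_INPUTS` and `OPTIONAL_INPUTS` are hypothetical names, and organism_mapping.tsv is treated as optional because the notebook falls back to classifying without the GapMind join when it is absent.

```python
from pathlib import Path

# Required inputs per the list above; organism_mapping.tsv is optional because
# the notebook falls back to a classification without the GapMind join.
REQUIRED_INPUTS = [
    'pathway_fitness_metrics.csv',
    'gapmind_genome_pathways.csv',
    'gapmind_pathway_summary.csv',
]
OPTIONAL_INPUTS = ['organism_mapping.tsv']


def check_inputs(data_dir: Path) -> list[str]:
    """Raise if any required input is missing; return the missing optional inputs."""
    missing = [name for name in REQUIRED_INPUTS if not (data_dir / name).exists()]
    if missing:
        raise FileNotFoundError(
            f'Missing required inputs in {data_dir}: {missing}. Run NB01/NB02 first.'
        )
    return [name for name in OPTIONAL_INPUTS if not (data_dir / name).exists()]
```

In the notebook this would be called as `check_inputs(DATA_DIR)` right after `DATA_DIR` is defined; the returned list names any optional files that are absent.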
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import chi2_contingency, kruskal
from pathlib import Path
import sys
import warnings
warnings.filterwarnings('ignore')
sns.set_style('whitegrid')
plt.rcParams['font.size'] = 11
# Absolute path to the author's checkout; change this to run the notebook elsewhere
PROJECT_ROOT = Path('/home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency')
DATA_DIR = PROJECT_ROOT / 'data'
FIG_DIR = PROJECT_ROOT / 'figures'
FIG_DIR.mkdir(exist_ok=True)
sys.path.insert(0, str(PROJECT_ROOT / 'src'))
from pathway_utils import classify_pathway_dependency, categorize_pathway
print(f'Data: {DATA_DIR}')
print(f'Figures: {FIG_DIR}')
Data: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/data
Figures: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/figures
1. Load Input Data
# ── Pathway fitness metrics from NB02 ────────────────────────────────────────
metrics_path = DATA_DIR / 'pathway_fitness_metrics.csv'
if not metrics_path.exists():
    raise FileNotFoundError(
        f'{metrics_path} not found. Run NB02 first.'
    )
metrics = pd.read_csv(metrics_path)
print(f'Pathway fitness metrics: {len(metrics):,} records')
print(f' Organisms: {metrics["orgId"].nunique()}')
print(f' Pathways: {metrics["pathway"].nunique()}')
# ── GapMind pathway completeness (genome-level) from NB01 ────────────────────
gapmind_path = DATA_DIR / 'gapmind_genome_pathways.csv'
print('\nLoading GapMind data (this may take ~1 min)...')
gapmind = pd.read_csv(gapmind_path, dtype={
    'genome_id': str, 'species': str, 'pathway': str,
    'best_score': np.int8, 'is_complete': np.int8
})
print(f'GapMind records: {len(gapmind):,}')
# ── Organism mapping (FB orgId → GapMind clade_name) ────────────────────────
org_map_path = DATA_DIR / 'organism_mapping.tsv'
if org_map_path.exists():
    org_map = pd.read_csv(org_map_path, sep='\t')
    print(f'\nOrganism mapping: {len(org_map)} organisms')
    # Show the columns so the orgId and clade_name fields can be checked
    print(' Columns:', org_map.columns.tolist())
else:
    print('\nWARNING: organism_mapping.tsv not found — will classify without GapMind completeness join')
    org_map = None
# ── Pathway summary from NB01 ────────────────────────────────────────────────
pathway_summary = pd.read_csv(DATA_DIR / 'gapmind_pathway_summary.csv')
Pathway fitness metrics: 3,065 records
 Organisms: 48
 Pathways: 76
Loading GapMind data (this may take ~1 min)...
GapMind records: 23,424,480
Organism mapping: 48 organisms
 Columns: ['orgId', 'division', 'genus', 'species', 'strain', 'taxonomyId', 'ncbi_taxid', 'clade_name', 'has_gapmind']
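The GapMind table has 23,424,480 rows, but this section only uses `species`, `pathway` and `is_complete` (for the completion-rate groupby). If later cells do not need `genome_id` or `best_score` (an assumption, since those cells are outside this section), passing `usecols` to `pd.read_csv` avoids loading the string-heavy `genome_id` column at all. The sketch below runs on a hypothetical three-row stand-in for the file.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical three-row stand-in for gapmind_genome_pathways.csv (same header).
csv_text = (
    "genome_id,species,pathway,best_score,is_complete\n"
    "G1,s__A,glycine,5,1\n"
    "G2,s__A,glycine,3,0\n"
    "G3,s__B,glycine,5,1\n"
)

# Read only the columns the completion-rate groupby needs; genome_id and
# best_score are skipped by the parser rather than loaded and discarded.
gapmind_lean = pd.read_csv(
    io.StringIO(csv_text),
    usecols=['species', 'pathway', 'is_complete'],
    dtype={'species': str, 'pathway': str, 'is_complete': np.int8},
)
print(gapmind_lean.dtypes)
```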
2. Join GapMind Completeness with Fitness Metrics
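For each FB organism, find its matched GapMind species and attach the species-level pathway completion rate (the fraction of that species' genomes in which GapMind predicts the pathway complete). The intent is to classify only pathways predicted complete for each organism; note that the cell below records this rate but does not filter on it, so organisms without a GapMind match and pathways with low completion rates are still classified.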
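One way to restrict classification to pathways predicted complete for each organism is to filter on the attached species-level rate after the join. A minimal sketch on hypothetical toy tables that reuse the notebook's column names; `COMPLETE_THRESHOLD` is a hypothetical cutoff (a value of 1.0 would require every genome of the species to be complete), and the `toy_` prefix keeps the example from overwriting the notebook's real variables.

```python
import pandas as pd

# Hypothetical toy inputs with the same column names as the notebook's tables.
toy_gapmind = pd.DataFrame({
    'species': ['s__A', 's__A', 's__A', 's__A', 's__B'],
    'pathway': ['glycine', 'glycine', 'serine', 'serine', 'glycine'],
    'is_complete': [1, 1, 1, 0, 0],
})
toy_metrics = pd.DataFrame({
    'orgId': ['OrgA', 'OrgA', 'OrgB', 'OrgC'],
    'pathway': ['glycine', 'serine', 'glycine', 'glycine'],
})
# OrgC has no GapMind match, like the 7 unmatched organisms in the real data.
toy_org_clade = pd.DataFrame({'orgId': ['OrgA', 'OrgB'],
                              'clade_name': ['s__A', 's__B']})

# Same aggregation as the notebook: mean of 0/1 flags = fraction of genomes complete.
toy_completion = (
    toy_gapmind.groupby(['species', 'pathway'])['is_complete']
    .mean()
    .reset_index()
    .rename(columns={'is_complete': 'species_completion_rate', 'species': 'clade_name'})
)
toy_merged = (
    toy_metrics
    .merge(toy_org_clade, on='orgId', how='left')
    .merge(toy_completion, on=['clade_name', 'pathway'], how='left')
)

# Hypothetical cutoff: keep a pair when at least half of the species' genomes are
# predicted complete. NaN rates (no GapMind match) compare False and drop out.
COMPLETE_THRESHOLD = 0.5
toy_complete = toy_merged[toy_merged['species_completion_rate'] >= COMPLETE_THRESHOLD]
print(toy_merged)
print(toy_complete[['orgId', 'pathway']])
```

Filtering after the left joins keeps the join counts reported below comparable, while making the excluded pairs (no match, or low completion) an explicit, separate step.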
if org_map is not None and 'clade_name' in org_map.columns:
    # Detect the orgId column in org_map (falls back to the first column)
    org_id_col = next(
        (c for c in org_map.columns if c.lower() in ('orgid', 'org_id')),
        org_map.columns[0]
    )
    print(f'orgId column in org_map: {org_id_col}')
    # Build orgId → clade_name lookup for organisms with GapMind matches
    org_clade = org_map[org_map['clade_name'].notna()][[org_id_col, 'clade_name']].copy()
    org_clade.columns = ['orgId', 'clade_name']
    print(f'Organisms with GapMind clade: {len(org_clade)}')
    # Species-level pathway completion rate from NB01 data:
    # fraction (0 to 1) of genomes in each GapMind species with the pathway predicted complete
    print('Computing species-level pathway completion rates...')
    species_completion = (
        gapmind.groupby(['species', 'pathway'])['is_complete']
        .mean()
        .reset_index()
        .rename(columns={'is_complete': 'species_completion_rate', 'species': 'clade_name'})
    )
    # Join: metrics → org_clade → species_completion (left joins keep every metrics row)
    merged = (
        metrics
        .merge(org_clade, on='orgId', how='left')
        .merge(species_completion, on=['clade_name', 'pathway'], how='left')
    )
    print(f'Merged records: {len(merged):,}')
    has_completion = merged['species_completion_rate'].notna().sum()
    print(f'Records with GapMind completeness: {has_completion:,} '
          f'({100 * has_completion / len(merged):.1f}%)')
else:
    print('Using fitness metrics without GapMind completeness join.')
    merged = metrics.copy()
    merged['clade_name'] = np.nan
    merged['species_completion_rate'] = np.nan
print('\nMerged dataset sample:')
# Preview without matched_seed_descs (a long pathway-level string repeated on every row)
print(merged.drop(columns=['matched_seed_descs'], errors='ignore').head(10).to_string())
orgId column in org_map: orgId
Organisms with GapMind clade: 41
Computing species-level pathway completion rates...
Merged records: 3,065
Records with GapMind completeness: 2,650 (86.5%)
Merged dataset sample:
orgId pathway pathway_category n_seed_genes n_with_fitness n_essential pct_essential mean_abs_t max_abs_t median_abs_t clade_name species_completion_rate
0 ANA3 2-oxoglutarate amino_acid 4 1 3 75.000000 0.632157 0.632157 0.632157 s__Shewanella_sp000203935--RS_GCF_000203935.1 1.0
1 BFirm 2-oxoglutarate amino_acid 9 5 4 44.444444 0.767611 0.833964 0.753904 s__Paraburkholderia_phytofirmans--RS_GCF_000020125.1 1.0
2 Btheta 2-oxoglutarate amino_acid 7 2 5 71.428571 10.138245 10.813140 10.138245 s__Bacteroides_thetaiotaomicron--RS_GCF_000011065.1 0.0
3 Burk376 2-oxoglutarate amino_acid 7 3 4 57.142857 1.040629 1.421618 0.911640 s__Paraburkholderia_bryophila--RS_GCF_003269035.1 1.0
4 Caulo 2-oxoglutarate amino_acid 3 0 3 100.000000 NaN NaN NaN NaN NaN
5 Cola 2-oxoglutarate amino_acid 5 2 3 60.000000 1.122539 1.436413 1.122539 NaN NaN
6 Cup4G11 2-oxoglutarate amino_acid 11 8 3 27.272727 0.788078 1.107266 0.772708 s__Cupriavidus_basilensis--RS_GCF_008801925.2 1.0
7 Dda3937 2-oxoglutarate amino_acid 8 6 2 25.000000 1.052219 1.713227 0.905339 s__Dickeya_dadantii--RS_GCF_000406145.1 1.0
8 Ddia6719 2-oxoglutarate amino_acid 6 4 2 33.333333 1.043230 1.209794 1.032591 s__Dickeya_dianthicola--RS_GCF_000365305.1 1.0
9 DdiaME23 2-oxoglutarate amino_acid 6 4 2 33.333333 0.881028 1.181621 0.891282 s__Dickeya_dianthicola--RS_GCF_000365305.1 1.0
3. Apply Pathway Classification
Thresholds (from RESEARCH_PLAN.md):
- Active dependency: mean |t| > 2.0 OR pct_essential > 20%
- Latent capability: mean |t| < 1.0 AND pct_essential < 5%
- Intermediate: all remaining pairs (neither criterion met)
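Classification applies only to records with sufficient data (≥3 SEED genes with fitness values). In the merged sample above, n_with_fitness + n_essential equals n_seed_genes on every row, which suggests essential genes carry no fitness values. If so, this filter also drops pathways whose genes are mostly essential (e.g., 2-oxoglutarate in ANA3, Btheta and Caulo), which would meet the active-dependency criterion on pct_essential alone.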
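`classify_pathway_dependency` lives in `src/pathway_utils.py`, which is not shown here. The thresholds above can be sketched as a vectorised rule; this is an illustrative re-implementation, not the project's function. Its strict `>`/`<` boundary handling and the `'unknown'` label for rows missing both metrics (the plotting cell's colour map includes an `unknown` class) are assumptions.

```python
import numpy as np
import pandas as pd

# Thresholds as stated above; pct_essential is a percentage (0-100), not a fraction.
ACTIVE_T, ACTIVE_PCT = 2.0, 20.0   # active: mean |t| > 2.0 OR > 20% essential
LATENT_T, LATENT_PCT = 1.0, 5.0    # latent: mean |t| < 1.0 AND < 5% essential


def classify_dependency(df: pd.DataFrame) -> pd.Series:
    """Vectorised sketch of the active / latent / intermediate rule (not the project's function)."""
    t = df['mean_abs_t']
    pct = df['pct_essential']
    # Comparisons involving NaN evaluate to False.
    active = (t > ACTIVE_T) | (pct > ACTIVE_PCT)
    latent = (t < LATENT_T) & (pct < LATENT_PCT)
    # Assumption: rows with neither metric get the 'unknown' label.
    missing = t.isna() & pct.isna()
    # np.select picks the first matching condition, so 'unknown' takes priority.
    labels = np.select(
        [missing, active, latent],
        ['unknown', 'active_dependency', 'latent_capability'],
        default='intermediate',
    )
    return pd.Series(labels, index=df.index, name='dependency_class')


# First row uses values similar to the Btheta 2-oxoglutarate row above.
example = pd.DataFrame({
    'mean_abs_t':    [10.14, 0.5, 0.5, 1.5, 2.0, np.nan],
    'pct_essential': [71.4, 2.0, 10.0, 2.0, 20.0, np.nan],
})
print(classify_dependency(example).tolist())
# → ['active_dependency', 'latent_capability', 'intermediate', 'intermediate', 'intermediate', 'unknown']
```

Because both comparisons are strict, a pair sitting exactly on a threshold (the fifth example row) lands in intermediate under this sketch; whether `classify_pathway_dependency` treats boundaries the same way cannot be confirmed from this notebook.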
MIN_GENES_FOR_CLASSIFICATION = 3
# Filter to records with enough data
classifiable = merged[
    merged['n_with_fitness'] >= MIN_GENES_FOR_CLASSIFICATION
].copy()
# Apply the classification row by row using the pathway_utils function
classifiable['dependency_class'] = classifiable.apply(
    lambda r: classify_pathway_dependency(
        mean_abs_t=r['mean_abs_t'],
        pct_essential=r['pct_essential'],
    ),
    axis=1
)
# Recompute pathway_category with categorize_pathway (overwrites the column carried over from NB02)
classifiable['pathway_category'] = classifiable['pathway'].apply(categorize_pathway)
print(f'Classifiable records: {len(classifiable):,}')
print(f'Organisms: {classifiable["orgId"].nunique()}')
print(f'Pathways: {classifiable["pathway"].nunique()}')
print('\nClassification breakdown:')
class_counts = classifiable['dependency_class'].value_counts()
print(class_counts.to_string())
print(f'\n% Active: {100 * class_counts.get("active_dependency", 0) / len(classifiable):.1f}%')
print(f'% Latent: {100 * class_counts.get("latent_capability", 0) / len(classifiable):.1f}%')
print(f'% Intermediate: {100 * class_counts.get("intermediate", 0) / len(classifiable):.1f}%')
print('\nClass by pathway category:')
print(pd.crosstab(classifiable['pathway_category'], classifiable['dependency_class']))
Classifiable records: 1,695
Organisms: 48
Pathways: 74
Classification breakdown:
dependency_class
active_dependency 881
intermediate 547
latent_capability 267
% Active: 52.0%
% Latent: 15.8%
% Intermediate: 32.3%
Class by pathway category:
dependency_class active_dependency intermediate latent_capability
pathway_category
amino_acid 467 220 48
carbon 355 320 217
other 59 7 2
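The category × class crosstab above is the table the chi-square cell tests. A sketch that recomputes the Pearson statistic by hand from those counts and cross-checks it with `scipy.stats.chi2_contingency`: expected counts under independence are row total × column total / grand total, and the statistic sums (O − E)²/E over all nine cells. With these counts it gives χ² ≈ 163.60 on 4 degrees of freedom, matching the notebook, and every expected count exceeds 5 (the smallest, other × latent, is about 10.7), so the large-sample approximation is reasonable. This test addresses whether class composition differs between categories; the share of latent capabilities (15.8% of classifiable pairs) bears more directly on H1.

```python
import numpy as np
from scipy import stats

# Observed counts copied from the crosstab above
# (rows: amino_acid, carbon, other; columns: active, intermediate, latent).
observed = np.array([
    [467, 220, 48],
    [355, 320, 217],
    [59, 7, 2],
])

# Expected counts under independence: row total * column total / grand total.
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_counts = row_totals * col_totals / observed.sum()

# Pearson statistic and its p-value from the chi-square survival function.
chi2_stat = ((observed - expected_counts) ** 2 / expected_counts).sum()
chi2_dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)
chi2_p = stats.chi2.sf(chi2_stat, chi2_dof)

# Cross-check against SciPy (its Yates correction only applies when dof == 1).
ref_stat, ref_p, ref_dof, ref_expected = stats.chi2_contingency(observed)
print(f'chi2={chi2_stat:.2f}, dof={chi2_dof}, p={chi2_p:.2e}, '
      f'min expected={expected_counts.min():.1f}')
```

The names are prefixed (`chi2_stat`, `expected_counts`) so pasting the sketch into the notebook does not clobber the `chi2`, `expected` and `stat` variables used by the test cells.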
# ── Chi-square: dependency class × pathway category ──────────────────────────
contingency_table = pd.crosstab(
    classifiable['pathway_category'],
    classifiable['dependency_class']
)
print('Contingency table (category × class):')
print(contingency_table.to_string())
chi2, p, dof, expected = chi2_contingency(contingency_table)
print(f'\nChi-square test: χ²={chi2:.2f}, df={dof}, p={p:.2e}')
if p < 0.05:
    print('→ Significant: dependency class distribution differs across pathway categories (H1 supported)')
else:
    print('→ Not significant: no clear category effect')
# ── Kruskal-Wallis: mean |t| across categories ────────────────────────────────
groups = [
    classifiable.loc[classifiable['pathway_category'] == cat, 'mean_abs_t'].dropna().values
    for cat in classifiable['pathway_category'].unique()
]
groups = [g for g in groups if len(g) > 0]
if len(groups) >= 2:
    stat, kw_p = kruskal(*groups)
    print(f'\nKruskal-Wallis test (mean |t| by category): H={stat:.2f}, p={kw_p:.2e}')
# ── Latent capability rate per category ──────────────────────────────────────
print('\nLatent capability rate by pathway category:')
# Fraction of each category's pairs classified as latent (mean of a boolean flag)
latent_by_cat = (
    classifiable['dependency_class'].eq('latent_capability')
    .groupby(classifiable['pathway_category'])
    .mean()
    .rename('latent_rate')
    .reset_index()
)
latent_by_cat['latent_pct'] = (latent_by_cat['latent_rate'] * 100).round(1)
print(latent_by_cat.to_string(index=False))
# ── Per-organism summary ──────────────────────────────────────────────────────
org_summary = classifiable.groupby('orgId')['dependency_class'].value_counts(
    normalize=True
).unstack(fill_value=0).reset_index()
org_summary.columns.name = None
print('\nPer-organism classification rates (sample):')
print(org_summary.head(15).to_string(index=False))
Contingency table (category × class):
dependency_class active_dependency intermediate latent_capability
pathway_category
amino_acid 467 220 48
carbon 355 320 217
other 59 7 2
Chi-square test: χ²=163.60, df=4, p=2.47e-34
→ Significant: dependency class distribution differs across pathway categories (H1 supported)
Kruskal-Wallis test (mean |t| by category): H=315.78, p=2.68e-69
Latent capability rate by pathway category:
pathway_category latent_rate latent_pct
amino_acid 0.065306 6.5
carbon 0.243274 24.3
other 0.029412 2.9
Per-organism classification rates (sample):
orgId active_dependency intermediate latent_capability
ANA3 0.684211 0.263158 0.052632
BFirm 0.423077 0.423077 0.153846
Btheta 0.457143 0.371429 0.171429
Burk376 0.416667 0.416667 0.166667
Caulo 1.000000 0.000000 0.000000
Cola 0.612903 0.193548 0.193548
Cup4G11 0.390244 0.390244 0.219512
Dda3937 0.555556 0.355556 0.088889
Ddia6719 0.227273 0.545455 0.227273
DdiaME23 0.478261 0.282609 0.239130
Dino 0.375000 0.406250 0.218750
DvH 0.538462 0.384615 0.076923
Dyella79 0.621622 0.270270 0.108108
HerbieS 0.523810 0.357143 0.119048
Kang 0.409091 0.363636 0.227273
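With n = 1,695, even a modest association yields a very small p-value, so it can help to look at the size of the effect and at which cells drive it. The sketch below is a minimal NumPy illustration using the counts printed in the contingency table above. It recomputes Pearson's χ² by hand, converts it to Cramér's V, and reports Pearson residuals, (observed − expected)/√expected. Cramér's V and the residual table are illustrative additions, not part of the original notebook.

```python
import numpy as np

# Observed counts copied from the contingency table printed above
# rows: amino_acid, carbon, other; columns: active, intermediate, latent
categories = ["amino_acid", "carbon", "other"]
observed = np.array([
    [467, 220, 48],
    [355, 320, 217],
    [59, 7, 2],
], dtype=float)

n = observed.sum()
# Expected counts under independence: row total * column total / n
expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n

# Pearson chi-square statistic (no continuity correction; df = 4 for a 3x3 table)
chi2_stat = ((observed - expected) ** 2 / expected).sum()

# Cramér's V rescales chi-square to [0, 1] using n and the smaller table dimension
k = min(observed.shape) - 1
cramers_v = np.sqrt(chi2_stat / (n * k))

# Pearson residuals: positive means more pairs than expected under independence
residuals = (observed - expected) / np.sqrt(expected)

print(f"chi2 = {chi2_stat:.2f}, Cramér's V = {cramers_v:.3f}")
print("columns: active, intermediate, latent")
for cat, row in zip(categories, residuals):
    print(f"{cat:<12}" + "  ".join(f"{r:+6.2f}" for r in row))
```

On these counts the statistic matches the notebook's χ² of 163.60, with V ≈ 0.22. The largest residuals fall in the latent column, with carbon above expectation and amino acid below it, so that column is where the category effect is concentrated.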
CLASS_COLORS = {
    'active_dependency': '#e74c3c',
    'intermediate': '#f39c12',
    'latent_capability': '#3498db',
    'unknown': '#bdc3c7',
}
# Stacked bar: category × class
cat_class = pd.crosstab(
    classifiable['pathway_category'],
    classifiable['dependency_class'],
    normalize='index'
) * 100
# Ensure all class columns present
for cls in ['active_dependency', 'intermediate', 'latent_capability', 'unknown']:
    if cls not in cat_class.columns:
        cat_class[cls] = 0.0
fig, ax = plt.subplots(figsize=(10, 6))
bottom = np.zeros(len(cat_class))
x = np.arange(len(cat_class))
for cls in ['active_dependency', 'intermediate', 'latent_capability', 'unknown']:
    if cls in cat_class.columns:
        vals = cat_class[cls].values
        ax.bar(x, vals, bottom=bottom, color=CLASS_COLORS[cls],
               label=cls.replace('_', ' ').title(), width=0.6)
        bottom += vals
ax.set_xticks(x)
ax.set_xticklabels([c.replace('_', ' ').title() for c in cat_class.index],
                   fontsize=11)
ax.set_ylabel('Percentage of Pathway-Organism Pairs', fontsize=11)
ax.set_title('Pathway Dependency Classification by Functional Category', fontsize=13)
ax.legend(loc='upper right', fontsize=10)
ax.set_ylim(0, 110)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda y, _: f'{y:.0f}%'))
# Add chi-square annotation
ax.text(0.02, 0.97, f'χ²={chi2:.1f}, p={p:.2e}',
        transform=ax.transAxes, va='top', fontsize=9,
        bbox=dict(boxstyle='round', facecolor='white', alpha=0.8))
plt.tight_layout()
plt.savefig(FIG_DIR / 'nb03_stacked_bar.png', dpi=150, bbox_inches='tight')
plt.close()
print('Saved: figures/nb03_stacked_bar.png')
Saved: figures/nb03_stacked_bar.png
Figure 2: Scatter Plot of Mean |t| vs % Essential
plot_df = classifiable.dropna(subset=['mean_abs_t', 'pct_essential']).copy()
fig, ax = plt.subplots(figsize=(10, 7))
for cls, color in CLASS_COLORS.items():
    subset = plot_df[plot_df['dependency_class'] == cls]
    if len(subset) == 0:
        continue
    ax.scatter(
        subset['mean_abs_t'], subset['pct_essential'],
        c=color, alpha=0.4, s=12, label=f'{cls.replace("_", " ").title()} (n={len(subset)})',
        linewidths=0
    )
# Decision boundary lines
ax.axvline(x=2.0, color='#e74c3c', linestyle='--', alpha=0.5, lw=1.5, label='Active threshold (|t|=2.0)')
ax.axvline(x=1.0, color='#3498db', linestyle='--', alpha=0.5, lw=1.5, label='Latent threshold (|t|=1.0)')
ax.axhline(y=20, color='#e74c3c', linestyle=':', alpha=0.5, lw=1.5, label='Active threshold (20% essential)')
ax.axhline(y=5, color='#3498db', linestyle=':', alpha=0.5, lw=1.5, label='Latent threshold (5% essential)')
ax.set_xlabel('Mean |t-score| (per-pathway fitness importance)', fontsize=11)
ax.set_ylabel('% Essential Genes in Pathway', fontsize=11)
ax.set_title('Pathway Classification: Fitness Importance vs Gene Essentiality', fontsize=13)
ax.legend(loc='upper right', fontsize=8, markerscale=2)
ax.set_xlim(left=0)
ax.set_ylim(bottom=0)
plt.tight_layout()
plt.savefig(FIG_DIR / 'nb03_scatter.png', dpi=150, bbox_inches='tight')
plt.close()
print('Saved: figures/nb03_scatter.png')
Saved: figures/nb03_scatter.png
Figure 3: Per-Organism Latent Capability Rate
# Compute % latent and % active per organism
org_class = classifiable.groupby(['orgId', 'dependency_class']).size().unstack(fill_value=0)
org_class_pct = org_class.div(org_class.sum(axis=1), axis=0) * 100
for cls in ['active_dependency', 'intermediate', 'latent_capability', 'unknown']:
    if cls not in org_class_pct.columns:
        org_class_pct[cls] = 0.0
org_class_pct = org_class_pct.sort_values('latent_capability', ascending=True)
fig, ax = plt.subplots(figsize=(12, max(6, len(org_class_pct) * 0.35)))
bottom = np.zeros(len(org_class_pct))
y = np.arange(len(org_class_pct))
for cls in ['active_dependency', 'intermediate', 'latent_capability', 'unknown']:
    vals = org_class_pct[cls].values
    ax.barh(y, vals, left=bottom, color=CLASS_COLORS[cls],
            label=cls.replace('_', ' ').title(), height=0.7)
    bottom += vals
ax.set_yticks(y)
ax.set_yticklabels(org_class_pct.index, fontsize=9)
ax.set_xlabel('% of Complete Pathways', fontsize=11)
ax.set_title('Active vs Latent Pathway Classification per Organism\n(ordered by % latent capabilities)', fontsize=12)
ax.legend(loc='lower right', fontsize=9)
ax.xaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.0f}%'))
ax.set_xlim(0, 105)
plt.tight_layout()
plt.savefig(FIG_DIR / 'nb03_organism_overview.png', dpi=150, bbox_inches='tight')
plt.close()
print('Saved: figures/nb03_organism_overview.png')
Saved: figures/nb03_organism_overview.png
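The notebook computes per-organism class shares two ways: value_counts(normalize=True) for the printed summary table, and size().unstack() followed by division by row totals for Figure 3. The sketch below checks, on a hypothetical toy table (organism IDs and labels invented for illustration), that the two routes give the same row-normalized shares. The only difference is the ×100 scaling used for the figure.

```python
import pandas as pd

# Hypothetical toy records; organism IDs and labels are invented for illustration
toy = pd.DataFrame({
    "orgId": ["A", "A", "A", "A", "B", "B", "B"],
    "dependency_class": [
        "active_dependency", "active_dependency", "intermediate", "latent_capability",
        "active_dependency", "intermediate", "intermediate",
    ],
})

# Route 1: per-organism value_counts normalized to fractions (as in the summary table)
route1 = (
    toy.groupby("orgId")["dependency_class"]
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)

# Route 2: raw counts per (organism, class) divided by each organism's total (as in Figure 3)
counts = toy.groupby(["orgId", "dependency_class"]).size().unstack(fill_value=0)
route2 = counts.div(counts.sum(axis=1), axis=0)

# Same shares regardless of row/column order; Figure 3 multiplies by 100 for percentages
pd.testing.assert_frame_equal(route1, route2, check_names=False, check_dtype=False, check_like=True)
print((route2 * 100).round(1))
```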
6. Save Classification
# Select output columns, keeping only those present in `classifiable`
output_cols = [
    'orgId', 'pathway', 'pathway_category',
    'dependency_class',
    'mean_abs_t', 'max_abs_t', 'median_abs_t',
    'n_seed_genes', 'n_with_fitness', 'n_essential', 'pct_essential',
]
if 'clade_name' in classifiable.columns:
    output_cols.append('clade_name')
if 'species_completion_rate' in classifiable.columns:
    output_cols.append('species_completion_rate')
output_cols = [c for c in output_cols if c in classifiable.columns]
classification_out = classifiable[output_cols].copy()
classification_out.to_csv(DATA_DIR / 'pathway_classification.csv', index=False)
print(f'Saved: {DATA_DIR}/pathway_classification.csv ({len(classification_out):,} rows)')
# Print final summary
print('\n=== NB03 Summary ===')
print(f'Total classified records: {len(classification_out):,}')
for cls, count in classification_out['dependency_class'].value_counts().items():
    pct = 100 * count / len(classification_out)
    print(f'  {cls:<22}: {count:>6,} ({pct:.1f}%)')
print(f'\nOrganisms: {classification_out["orgId"].nunique()}')
print(f'Pathways: {classification_out["pathway"].nunique()}')
latent_rate = (classification_out['dependency_class'] == 'latent_capability').mean()
print(f'\nOverall latent capability rate: {latent_rate:.1%}')
if latent_rate >= 0.10:
    print('→ H1 supported: ≥10% of complete pathways are latent capabilities')
else:
    print('→ H1 weakly supported or not supported at 10% threshold')
Saved: /home/cjneely/repos/BERIL-research-observatory/projects/metabolic_capability_dependency/data/pathway_classification.csv (1,695 rows)

=== NB03 Summary ===
Total classified records: 1,695
  active_dependency     :    881 (52.0%)
  intermediate          :    547 (32.3%)
  latent_capability     :    267 (15.8%)

Organisms: 48
Pathways: 74

Overall latent capability rate: 15.8%
→ H1 supported: ≥10% of complete pathways are latent capabilities
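The H1 check compares a single point estimate (15.8%) with the 10% criterion. The sketch below is a minimal illustration of a Wilson score interval for 267 latent pairs out of 1,695, using only the standard library. It assumes the pairs are independent draws. They are not, because pairs share organisms and pathways, so the interval is probably too narrow. Read it as a rough floor on the uncertainty, not as part of the notebook's method.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (about 95% coverage for z = 1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# Counts from the NB03 summary above
n_latent, n_total = 267, 1695
lo, hi = wilson_interval(n_latent, n_total)
print(f"latent rate = {n_latent / n_total:.1%}, Wilson 95% CI = [{lo:.1%}, {hi:.1%}]")
print("lower bound above 10% criterion:", lo > 0.10)
```

On these counts the interval is roughly 14% to 18%. Under the independence assumption, sampling noise alone is therefore unlikely to pull the rate below 10%, and the choice of classification threshold matters more.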
7. Threshold Sensitivity Analysis
How sensitive is the 15.8% latent fraction to the choice of classification thresholds?
We vary active_t_threshold (default 2.0) and latent_t_threshold (default 1.0) over ±25% of their defaults,
leave the essential-gene cutoffs at the classifier's defaults, and report the resulting latent capability rate.
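The rule stated at the top of the notebook can be written as a small function, which shows which thresholds can move the latent fraction. This is a hypothetical restatement: classify_sketch is not the project's classify_pathway_dependency, whose code is not shown here. It assumes pct_essential is on a 0 to 100 scale, as the 5% and 20% lines in the scatter plot suggest. It also assumes missing metrics map to 'unknown'.

```python
import math

def classify_sketch(
    mean_abs_t: float,
    pct_essential: float,
    active_t: float = 2.0,
    latent_t: float = 1.0,
    active_ess_pct: float = 20.0,
    latent_ess_pct: float = 5.0,
) -> str:
    """Hypothetical restatement of the NB03 rule; pct_essential is in percent (0-100)."""
    # Assumption: missing metrics map to 'unknown' (a class that appears in CLASS_COLORS)
    if math.isnan(mean_abs_t) or math.isnan(pct_essential):
        return "unknown"
    # Active: strong fitness effect OR many essential genes
    if mean_abs_t > active_t or pct_essential > active_ess_pct:
        return "active_dependency"
    # Latent: weak fitness effect AND few essential genes
    if mean_abs_t < latent_t and pct_essential < latent_ess_pct:
        return "latent_capability"
    return "intermediate"

# A latent pair keeps its label for every active_t above latent_t
for active_t in (1.5, 2.0, 2.5):
    print(active_t, classify_sketch(0.8, 2.0, active_t=active_t))  # latent_capability each time

print(classify_sketch(1.5, 10.0))  # intermediate: between thresholds on both axes
print(classify_sketch(0.5, 25.0))  # active_dependency: >20% essential overrides low |t|
```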
from pathway_utils import classify_pathway_dependency
# Threshold grid: ±25% around defaults
active_thresholds = [1.5, 1.75, 2.0, 2.25, 2.5]  # default 2.0
latent_thresholds = [0.75, 0.875, 1.0, 1.125, 1.25]  # default 1.0
records = []
for at in active_thresholds:
    for lt in latent_thresholds:
        if lt >= at:
            continue  # thresholds must not cross
        labels = classifiable.apply(
            lambda r: classify_pathway_dependency(
                mean_abs_t=r['mean_abs_t'],
                pct_essential=r['pct_essential'],
                active_t_threshold=at,
                latent_t_threshold=lt,
            ),
            axis=1
        )
        n = len(labels)
        records.append({
            'active_t': at,
            'latent_t': lt,
            'pct_latent': round(100 * (labels == 'latent_capability').sum() / n, 1),
            'pct_active': round(100 * (labels == 'active_dependency').sum() / n, 1),
            'pct_intermediate': round(100 * (labels == 'intermediate').sum() / n, 1),
        })
sens_df = pd.DataFrame(records)
print('Threshold sensitivity: % latent capability')
print('(rows = active threshold, columns = latent threshold)\n')
pivot = sens_df.pivot(index='active_t', columns='latent_t', values='pct_latent')
print(pivot.to_string())
# Read the default cell from the grid rather than hard-coding it
print(f'\nDefault (active_t=2.0, latent_t=1.0): {pivot.loc[2.0, 1.0]}% latent')
print(f'Range across all threshold combinations: '
      f'{sens_df["pct_latent"].min()}% – {sens_df["pct_latent"].max()}%')
print(f'Standard deviation: {sens_df["pct_latent"].std():.1f} percentage points')
Threshold sensitivity: % latent capability
(rows = active threshold, columns = latent threshold)

latent_t  0.750  0.875  1.000  1.125  1.250
active_t
1.50        4.7   11.2   15.8   18.7   21.1
1.75        4.7   11.2   15.8   18.7   21.1
2.00        4.7   11.2   15.8   18.7   21.1
2.25        4.7   11.2   15.8   18.7   21.1
2.50        4.7   11.2   15.8   18.7   21.1

Default (active_t=2.0, latent_t=1.0): 15.8% latent
Range across all threshold combinations: 4.7% – 21.1%
Standard deviation: 5.9 percentage points
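The grid above calls the classifier once per row for each threshold pair and holds the essential-gene cutoffs fixed. The sketch below is a vectorized NumPy version of just the latent part of the rule, and it also varies the latent essential-gene cutoff. The arrays are synthetic placeholders, not project data. With real data, t and ess would come from classifiable['mean_abs_t'].to_numpy() and classifiable['pct_essential'].to_numpy(). The denominator counts only rows with both metrics present, which may differ from the notebook's len(labels) if missing values occur.

```python
import numpy as np

def latent_fraction(t: np.ndarray, ess: np.ndarray, latent_t: float, latent_ess: float) -> float:
    """Share of pairs with mean |t| < latent_t AND % essential < latent_ess.

    Active cutoffs are not needed: with latent_t < active_t and latent_ess < active_ess,
    no pair can satisfy both the latent and the active condition.
    """
    valid = ~(np.isnan(t) | np.isnan(ess))
    latent = valid & (t < latent_t) & (ess < latent_ess)
    return latent.sum() / valid.sum()

# Synthetic placeholder data, NOT project results
rng = np.random.default_rng(0)
t = rng.gamma(shape=2.0, scale=0.8, size=1000)  # stand-in for mean |t|
ess = rng.uniform(0, 40, size=1000)             # stand-in for % essential genes

latent_t_grid = [0.75, 0.875, 1.0, 1.125, 1.25]
latent_ess_grid = [2.5, 5.0, 7.5]
grid = np.array([[latent_fraction(t, ess, lt, le) for lt in latent_t_grid]
                 for le in latent_ess_grid])
print(np.round(100 * grid, 1))  # rows = latent essential cutoff, columns = latent |t| cutoff
```

Each cutoff widens or narrows a nested set of pairs, so the latent fraction is non-decreasing along both axes of this grid.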
Completion
Outputs generated:
- data/pathway_classification.csv: per-organism, per-pathway dependency class
- figures/nb03_stacked_bar.png: classification breakdown by functional category
- figures/nb03_scatter.png: fitness vs essentiality scatter with decision boundaries
- figures/nb03_organism_overview.png: active vs latent breakdown per organism
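Interpretation:
- H1 test: 15.8% of the 1,695 complete pathway-organism pairs (267) are latent capabilities, above the 10% criterion, so genomic completeness does not imply functional dependency. The conclusion depends on the latent |t| threshold: across the sensitivity grid the latent fraction ranges from 4.7% (latent_t = 0.75) to 21.1% (latent_t = 1.25), and it falls below 10% only at latent_t = 0.75.
- Threshold sensitivity: the latent fraction does not change with active_t, as the rule implies. Because latent_t < active_t, a pair with mean |t| < latent_t and <5% essential genes can never meet the active condition (mean |t| > active_t or >20% essential genes).
- Category differences: the prior expectation was that amino acid biosynthesis would show higher latent rates than carbon utilization, because bacteria can scavenge amino acids from the environment (cross-feeding). The data show the opposite: 6.5% of amino acid pathway pairs are latent versus 24.3% of carbon pathway pairs (χ²=163.60, df=4, p=2.47e-34).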
Next step: Run NB04 to test whether latent capabilities predict gene loss (Black Queen Hypothesis).