02 Wom Fb Integration
Jupyter notebook from the Metabolic Consistency of Pseudomonas FW300-N2E3 project.
NB02: WoM ↔ Fitness Browser Gene-Level Integration
For each WoM-produced metabolite that has a corresponding FB carbon/nitrogen source experiment, identify genes with significant fitness effects and annotate them functionally.
Key Question: When FW300-N2E3 produces compound X on rich medium (WoM), do genes important for growing on X as a sole carbon/nitrogen source (FB) reveal the underlying biosynthetic or catabolic pathways?
Inputs:
- data/metabolite_crosswalk.tsv (WoM↔FB metabolite mapping from NB01)
- data/fb_experiments.tsv (FB experiment metadata)
- BERDL: kescience_fitnessbrowser.genefitness (per-gene fitness scores)
- BERDL: kescience_fitnessbrowser.seedannotation (SEED functional annotations)
Outputs:
- data/wom_fb_gene_table.tsv (genes with significant fitness for overlapping metabolites)
- data/wom_fb_summary.tsv (per-metabolite summary of fitness hits)
- figures/fitness_hits_per_metabolite.png (bar chart of gene counts per metabolite)
import os
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
spark = get_spark_session()
DATA_DIR = '../data'
FIG_DIR = '../figures'
os.makedirs(FIG_DIR, exist_ok=True)
FB_ORG = 'pseudo3_N2E3'
# Load crosswalk from NB01
crosswalk = pd.read_csv(f'{DATA_DIR}/metabolite_crosswalk.tsv', sep='\t')
fb_exps = pd.read_csv(f'{DATA_DIR}/fb_experiments.tsv', sep='\t')
# Filter to WoM metabolites matched to FB
matched = crosswalk[crosswalk['fb_matched'] == True].copy()
print(f"WoM metabolites with FB matches: {len(matched)}")
print(matched[['wom_compound', 'wom_action', 'fb_condition']].to_string(index=False))
WoM metabolites with FB matches: 28
wom_compound wom_action fb_condition
Cytosine E Cytidine
betaine E Betaine
carnitine E Carnitine Hydrochloride
lactate E Sodium D,L-Lactate; Sodium D-Lactate; Sodium L-Lactate
lysine E L-Lysine
sarcosine E Sarcosine
thymine E Thymine
trans-aconitate E trans-Aconitate
tyrosine E L-tyrosine disodium salt
valine E L-Valine
4-aminobutanoate I 4-aminobutanoate
5-oxo-proline I 5-oxo-proline
Adenine I Adenine hydrochloride hydrate
Adenosine I Adenosine
Guanine I Guanine
Malate I L-Malic acid disodium salt monohydrate
Uracil I Uridine
alanine I L-Alanine; D-Alanine
arginine I L-Arginine
aspartate I L-Aspartic Acid
glutamic acid I L-Glutamic acid monopotassium salt monohydrate
glycine I Glycine
inosine I Inosine
nicotinamide I Nicotinamide
phenylalanine I L-Phenylalanine
proline I L-Proline
trehalose I D-Trehalose dihydrate
tryptophan I L-Tryptophan
1. Extract Gene Fitness for Overlapping Metabolites
Pull per-gene fitness scores for all experiments matching the overlapping conditions. A gene is considered a significant fitness hit if |fitness| > 1 and |t| > 4.
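The significance rule above can be sketched on toy data as follows. The helper name `is_significant`, its parameter names, and the toy values are illustrative assumptions, not part of the notebook; only the thresholds (|fitness| > 1 and |t| > 4) come from the text.

```python
import pandas as pd

def is_significant(df, fit_min=1.0, t_min=4.0):
    """Hypothetical helper: True where |fit| > fit_min and |t| > t_min."""
    return (df["fit"].abs() > fit_min) & (df["t"].abs() > t_min)

# Toy records (invented values, not Fitness Browser data)
toy = pd.DataFrame({
    "locusId": ["g1", "g2", "g3", "g4"],
    "fit": [-2.5, -1.2, 0.3, 1.8],
    "t": [-6.0, -3.1, 0.5, 4.5],
})

hits = toy[is_significant(toy)]
print(hits["locusId"].tolist())  # g2 fails the |t| cut, g3 fails both
```

Both cuts must pass: a large fitness value with a weak t statistic (g2 here) is not counted as a hit.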
# Build the list of FB condition names to query
all_fb_conditions = set()
for conds in matched['fb_condition']:
    for c in str(conds).split('; '):
        all_fb_conditions.add(c.strip())
print(f"FB conditions to query: {len(all_fb_conditions)}")
for c in sorted(all_fb_conditions):
    print(f" {c}")
FB conditions to query: 31
 4-aminobutanoate
 5-oxo-proline
 Adenine hydrochloride hydrate
 Adenosine
 Betaine
 Carnitine Hydrochloride
 Cytidine
 D-Alanine
 D-Trehalose dihydrate
 Glycine
 Guanine
 Inosine
 L-Alanine
 L-Arginine
 L-Aspartic Acid
 L-Glutamic acid monopotassium salt monohydrate
 L-Lysine
 L-Malic acid disodium salt monohydrate
 L-Phenylalanine
 L-Proline
 L-Tryptophan
 L-Valine
 L-tyrosine disodium salt
 Nicotinamide
 Sarcosine
 Sodium D,L-Lactate
 Sodium D-Lactate
 Sodium L-Lactate
 Thymine
 Uridine
 trans-Aconitate
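As an alternative to the nested loop, the "; "-delimited crosswalk entries can be expanded with pandas `str.split` and `explode`. This is a minimal sketch on a toy frame; `toy_matched`, `long_form`, and `fb_to_wom_toy` are names invented here, and the toy rows only imitate the crosswalk layout.

```python
import pandas as pd

# Toy crosswalk rows in the same layout as the matched table
toy_matched = pd.DataFrame({
    "wom_compound": ["lactate", "valine"],
    "fb_condition": ["Sodium D-Lactate; Sodium L-Lactate", "L-Valine"],
})

# One row per (WoM compound, FB condition) pair
long_form = (
    toy_matched
    .assign(fb_condition=toy_matched["fb_condition"].str.split(";"))
    .explode("fb_condition")
)
long_form["fb_condition"] = long_form["fb_condition"].str.strip()

# Reverse lookup from FB condition name to WoM compound
fb_to_wom_toy = dict(zip(long_form["fb_condition"], long_form["wom_compound"]))
print(sorted(fb_to_wom_toy))
```

The long form yields both the set of conditions to query and the reverse mapping from one pass, so the two cannot drift apart.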
# Query gene fitness for all relevant experiments
# Use a single query with IN clause for efficiency
# CAST fit and t to double — they are stored as strings in BERDL
condition_list = "', '".join(all_fb_conditions)
fitness_df = spark.sql(f"""
SELECT gf.locusId, gf.expName,
CAST(gf.fit AS DOUBLE) as fit,
CAST(gf.t AS DOUBLE) as t,
e.condition_1, e.expGroup,
g.sysName, g.gene, g.desc
FROM kescience_fitnessbrowser.genefitness gf
JOIN kescience_fitnessbrowser.experiment e
ON gf.orgId = e.orgId AND gf.expName = e.expName
JOIN kescience_fitnessbrowser.gene g
ON gf.orgId = g.orgId AND gf.locusId = g.locusId
WHERE gf.orgId = '{FB_ORG}'
AND e.condition_1 IN ('{condition_list}')
AND e.expGroup IN ('carbon source', 'nitrogen source')
AND g.type = '1'
""").toPandas()
print(f"Total gene-experiment records: {len(fitness_df)}")
print(f"Unique experiments: {fitness_df['expName'].nunique()}")
print(f"Unique genes: {fitness_df['locusId'].nunique()}")
print(f"Unique conditions: {fitness_df['condition_1'].nunique()}")
print(f"\nColumn dtypes: fit={fitness_df['fit'].dtype}, t={fitness_df['t'].dtype}")
Total gene-experiment records: 221056
Unique experiments: 44
Unique genes: 5024
Unique conditions: 24

Column dtypes: fit=float64, t=float64
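The IN clause in the query is built by plain string joining, which would produce invalid SQL if a condition name ever contained a single quote. A minimal guard is sketched below. The helper `sql_in_list` is hypothetical, and it refuses such names rather than escaping them, because escaping rules depend on the SQL dialect.

```python
def sql_in_list(values):
    """Hypothetical helper: render values as a quoted, comma-separated SQL list.

    Raises ValueError if any value contains a single quote instead of
    guessing at the dialect's escaping rules.
    """
    values = sorted(str(v) for v in values)
    bad = [v for v in values if "'" in v]
    if bad:
        raise ValueError(f"Condition names contain single quotes: {bad}")
    return ", ".join(f"'{v}'" for v in values)

conditions = {"L-Valine", "Sodium D,L-Lactate"}
clause = f"e.condition_1 IN ({sql_in_list(conditions)})"
print(clause)
```

Sorting the values also makes the generated query text reproducible between runs, since Python set iteration order is not guaranteed.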
# Diagnose which queried conditions returned no data
conditions_with_data = set(fitness_df['condition_1'].unique())
conditions_queried = all_fb_conditions
missing_conditions = sorted(conditions_queried - conditions_with_data)
print(f"Conditions queried: {len(conditions_queried)}")
print(f"Conditions with data: {len(conditions_with_data)}")
print(f"Conditions with NO data: {len(missing_conditions)}")
if missing_conditions:
    print(f"\n--- Missing conditions diagnostic ---")
    # Check if these conditions exist in the experiment table at all
    n_absent = 0
    for cond in missing_conditions:
        exp_match = fb_exps[fb_exps['condition_1'] == cond]
        if len(exp_match) == 0:
            n_absent += 1
            print(f" {cond}: NO experiments exist for {FB_ORG} with this condition name")
        else:
            # No significance threshold has been applied yet, so an empty result here
            # means the query filters (expGroup, gene type) excluded these experiments
            n_exps = len(exp_match)
            groups = exp_match['expGroup'].unique()
            print(f" {cond}: {n_exps} experiment(s) exist ({', '.join(groups)}) "
                  f"but none were returned by the fitness query")
    if n_absent:
        print(f"\n Explanation: These {n_absent} conditions were mapped in the")
        print(f" WoM→FB crosswalk but have no experiments in the Fitness Browser for {FB_ORG}.")
        print(f" This means these compounds were not tested as C/N sources for this organism,")
        print(f" or the condition names in the crosswalk don't exactly match the FB experiment names.")
Conditions queried: 31
Conditions with data: 24
Conditions with NO data: 7

--- Missing conditions diagnostic ---
 4-aminobutanoate: NO experiments exist for pseudo3_N2E3 with this condition name
 5-oxo-proline: NO experiments exist for pseudo3_N2E3 with this condition name
 Betaine: NO experiments exist for pseudo3_N2E3 with this condition name
 Guanine: NO experiments exist for pseudo3_N2E3 with this condition name
 Nicotinamide: NO experiments exist for pseudo3_N2E3 with this condition name
 Sarcosine: NO experiments exist for pseudo3_N2E3 with this condition name
 trans-Aconitate: NO experiments exist for pseudo3_N2E3 with this condition name

 Explanation: These 7 conditions were mapped in the
 WoM→FB crosswalk but have no experiments in the Fitness Browser for pseudo3_N2E3.
 This means these compounds were not tested as C/N sources for this organism,
 or the condition names in the crosswalk don't exactly match the FB experiment names.
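One way to test the second explanation (names that differ only in case or spacing) is to compare normalized names. The sketch below is an assumption about how such a check could look. `normalize` and `find_near_exact` are hypothetical helpers, and the compound names are placeholders rather than real Fitness Browser conditions.

```python
def normalize(name):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(str(name).lower().split())

def find_near_exact(missing, known):
    """Hypothetical helper: map each missing name to known names equal after normalization."""
    index = {}
    for k in known:
        index.setdefault(normalize(k), []).append(k)
    return {m: index.get(normalize(m), []) for m in missing}

# Placeholder names for illustration only
known_toy = ["compound a", "Compound  B ", "Compound C"]
result = find_near_exact(["Compound A", "Compound Z"], known_toy)
print(result)
```

An empty list for a missing name suggests the compound is genuinely untested under that name, while a non-empty list points to a crosswalk spelling issue. Salt or hydrate suffixes would need a looser comparison than this.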
# Filter for significant fitness effects
sig = fitness_df[(fitness_df['fit'].abs() > 1) & (fitness_df['t'].abs() > 4)].copy()
sig['direction'] = np.where(sig['fit'] > 0, 'beneficial', 'detrimental')
print(f"Significant fitness hits (|fit|>1, |t|>4): {len(sig)}")
print(f" Detrimental (gene important for growth): {(sig['direction']=='detrimental').sum()}")
print(f" Beneficial (gene inhibits growth): {(sig['direction']=='beneficial').sum()}")
print(f"\nUnique genes with significant fitness: {sig['locusId'].nunique()}")
print(f"Conditions with hits: {sig['condition_1'].nunique()}")
Significant fitness hits (|fit|>1, |t|>4): 4764
 Detrimental (gene important for growth): 4438
 Beneficial (gene inhibits growth): 326

Unique genes with significant fitness: 601
Conditions with hits: 24
# Map FB conditions back to WoM compound names
fb_to_wom = {}
for _, row in matched.iterrows():
    for c in str(row['fb_condition']).split('; '):
        fb_to_wom[c.strip()] = row['wom_compound']
sig['wom_compound'] = sig['condition_1'].map(fb_to_wom)
# Summary per metabolite
# n_genes counts unique loci; n_detrimental and n_beneficial count gene-experiment
# records, so their sum exceeds n_genes when a condition has replicate experiments
met_summary = sig.groupby(['wom_compound', 'condition_1', 'expGroup']).agg(
    n_genes=('locusId', 'nunique'),
    n_detrimental=('direction', lambda x: (x == 'detrimental').sum()),
    n_beneficial=('direction', lambda x: (x == 'beneficial').sum()),
    mean_fit_detrimental=('fit', lambda x: x[x < 0].mean() if (x < 0).any() else np.nan),
    min_fit=('fit', 'min'),
    max_fit=('fit', 'max'),
).reset_index().sort_values('n_genes', ascending=False)
print("Significant fitness genes per metabolite:")
print(met_summary.to_string(index=False))
Significant fitness genes per metabolite:
wom_compound condition_1 expGroup n_genes n_detrimental n_beneficial mean_fit_detrimental min_fit max_fit
carnitine Carnitine Hydrochloride carbon source 168 156 12 -2.669588 -4.880531 4.075280
valine L-Valine carbon source 160 136 24 -2.455440 -4.091335 2.540146
lactate Sodium D-Lactate carbon source 153 241 21 -2.837218 -5.090104 3.310396
trehalose D-Trehalose dihydrate carbon source 152 231 31 -2.439370 -4.516183 2.677892
arginine L-Arginine nitrogen source 140 237 6 -2.369874 -4.506504 1.279048
Adenosine Adenosine nitrogen source 134 109 25 -2.525184 -4.612731 6.230414
lactate Sodium D,L-Lactate carbon source 132 215 18 -2.804058 -5.268548 2.657816
arginine L-Arginine carbon source 130 230 9 -2.373477 -4.693262 2.888054
Cytosine Cytidine carbon source 126 99 27 -2.519219 -4.938908 4.506314
tryptophan L-Tryptophan carbon source 126 121 5 -2.431773 -4.317210 2.013139
Uracil Uridine carbon source 123 113 10 -2.588311 -4.712125 2.453895
glycine Glycine nitrogen source 122 214 5 -2.385829 -4.561408 1.444166
glutamic acid L-Glutamic acid monopotassium salt monohydrate nitrogen source 122 204 9 -2.437988 -4.949864 1.742681
lysine L-Lysine nitrogen source 119 185 34 -2.486912 -4.924011 6.858823
lactate Sodium L-Lactate carbon source 118 204 7 -2.821713 -4.743132 2.203953
carnitine Carnitine Hydrochloride nitrogen source 115 115 0 -2.774157 -4.808201 -1.018449
Adenine Adenine hydrochloride hydrate nitrogen source 109 105 4 -2.722487 -4.797412 1.271302
alanine D-Alanine nitrogen source 107 107 0 -2.846057 -4.959222 -1.000259
phenylalanine L-Phenylalanine carbon source 106 95 11 -2.273625 -4.026115 1.392954
alanine D-Alanine carbon source 106 105 1 -2.605568 -4.373890 1.232153
tryptophan L-Tryptophan nitrogen source 105 102 3 -2.785935 -5.136272 1.712857
tyrosine L-tyrosine disodium salt nitrogen source 104 100 4 -2.670989 -5.036103 1.804645
Uracil Uridine nitrogen source 100 99 1 -2.929486 -5.053690 1.644816
inosine Inosine nitrogen source 99 95 4 -2.338730 -4.125109 1.191990
thymine Thymine nitrogen source 96 96 0 -2.711310 -4.719136 -1.011980
phenylalanine L-Phenylalanine nitrogen source 95 84 11 -2.478284 -4.748429 4.194349
Malate L-Malic acid disodium salt monohydrate carbon source 92 165 4 -2.558290 -4.365882 1.494431
glutamic acid L-Glutamic acid monopotassium salt monohydrate carbon source 90 159 6 -2.495423 -4.550672 2.047235
alanine L-Alanine carbon source 82 77 5 -2.548579 -3.960274 1.452788
proline L-Proline carbon source 77 126 12 -2.523442 -4.471556 3.223972
aspartate L-Aspartic Acid carbon source 72 113 17 -2.214458 -3.285422 3.071843
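In several rows above, n_detrimental + n_beneficial exceeds n_genes (for example 241 + 21 against 153 for Sodium D-Lactate). That is consistent with the direction columns counting gene-experiment records while n_genes counts unique loci. The toy sketch below separates the two kinds of count; the column names `n_hits` and `n_detrimental_hits` and all values are invented for illustration.

```python
import pandas as pd

# Toy hits: g1 and g2 are significant in two replicate experiments each
toy_sig = pd.DataFrame({
    "condition_1": ["X"] * 5,
    "expName": ["set1", "set2", "set1", "set2", "set1"],
    "locusId": ["g1", "g1", "g2", "g2", "g3"],
    "fit": [-2.0, -1.5, -3.0, -2.2, 1.4],
})

summary = toy_sig.groupby("condition_1").agg(
    n_genes=("locusId", "nunique"),                        # unique loci
    n_hits=("locusId", "size"),                            # gene-experiment records
    n_detrimental_hits=("fit", lambda x: (x < 0).sum()),   # records with fit < 0
)

# Unique genes with at least one negative-fitness record
n_detrimental_genes = (
    toy_sig.loc[toy_sig["fit"] < 0]
    .groupby("condition_1")["locusId"]
    .nunique()
)
print(summary)
print(n_detrimental_genes)
```

Reporting both counts side by side makes it clear whether a large number reflects many genes or many replicate experiments.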
2. SEED Functional Annotations
Annotate fitness-significant genes with SEED subsystem categories to identify pathway associations.
# Get SEED annotations for significant genes
# Note: seedannotation table has columns: orgId, locusId, seed_desc (no seed_subsystem)
sig_loci = sig['locusId'].unique()
# Pull all annotations for the organism and filter in pandas, avoiding a huge IN clause
seed_df = spark.sql(f"""
    SELECT locusId, seed_desc
    FROM kescience_fitnessbrowser.seedannotation
    WHERE orgId = '{FB_ORG}'
""").toPandas()
# Filter to just our significant genes
seed_df = seed_df[seed_df['locusId'].isin(sig_loci)].copy()
print(f"SEED annotations for significant genes: {len(seed_df)}")
print(f"Unique genes with SEED annotation: {seed_df['locusId'].nunique()} / {len(sig_loci)}")
# Top descriptions
if len(seed_df) > 0:
    print(f"\nTop SEED descriptions:")
    print(seed_df['seed_desc'].value_counts().head(20).to_string())
SEED annotations for significant genes: 565
Unique genes with SEED annotation: 565 / 601

Top SEED descriptions:
seed_desc
Cytochrome c oxidase subunit CcoN (EC 1.9.3.1)    3
3-ketoacyl-CoA thiolase (EC 2.3.1.16) @ Acetyl-CoA acetyltransferase (EC 2.3.1.9)    2
Cytochrome c oxidase subunit CcoP (EC 1.9.3.1)    2
Transcriptional regulator, GntR family domain / Aspartate aminotransferase (EC 2.6.1.1)    2
Phosphoserine aminotransferase (EC 2.6.1.52)    2
Sensory box histidine kinase/response regulator    2
Phosphogluconate repressor HexR, RpiR family    2
Aminomethyltransferase (glycine cleavage system T protein) (EC 2.1.2.10)    2
Glutamate Aspartate transport system permease protein GltJ (TC 3.A.1.3.4)    2
Predicted L-lactate dehydrogenase, Iron-sulfur cluster-binding subunit YkgF    2
Leucine-responsive regulatory protein, regulator for leucine (or lrp) regulon and high-affinity branched-chain amino acid transport system    2
Aromatic amino acid transport protein AroP    2
Methylmalonate-semialdehyde dehydrogenase (EC 1.2.1.27)    2
Transcriptional regulator, AsnC family    2
Serine hydroxymethyltransferase (EC 2.1.2.1)    2
Transcriptional regulator containing an amidase domain and an AraC-type DNA-binding HTH domain    2
L-proline glycine betaine binding ABC transporter protein ProX (TC 3.A.1.12.1)    2
Homoserine O-acetyltransferase (EC 2.3.1.31)    2
Phosphoserine phosphatase (EC 3.1.3.3)    2
Threonine dehydratase biosynthetic (EC 4.3.1.19)    2
# Merge SEED annotations into fitness hits
# A gene can have multiple SEED annotations; keep all
sig_annotated = sig.merge(seed_df, on='locusId', how='left')
# Save full gene table
sig_annotated.to_csv(f'{DATA_DIR}/wom_fb_gene_table.tsv', sep='\t', index=False)
met_summary.to_csv(f'{DATA_DIR}/wom_fb_summary.tsv', sep='\t', index=False)
print(f"Saved {len(sig_annotated)} annotated fitness hits")
print(f"\nSample rows:")
sig_annotated[['wom_compound', 'locusId', 'gene', 'desc', 'fit', 't',
'direction', 'seed_desc']].sort_values('fit').head(20)
Saved 4764 annotated fitness hits

Sample rows:
| | wom_compound | locusId | gene | desc | fit | t | direction | seed_desc |
|---|---|---|---|---|---|---|---|---|
| 4499 | lactate | AO353_20695 | NaN | O-succinylhomoserine sulfhydrylase | -5.268548 | -5.128211 | detrimental | O-acetylhomoserine sulfhydrylase (EC 2.5.1.49)... |
| 4523 | tryptophan | AO353_20695 | NaN | O-succinylhomoserine sulfhydrylase | -5.136272 | -7.020057 | detrimental | O-acetylhomoserine sulfhydrylase (EC 2.5.1.49)... |
| 1166 | lactate | AO353_05705 | NaN | oxidoreductase | -5.090104 | -8.495170 | detrimental | Predicted L-lactate dehydrogenase, Fe-S oxidor... |
| 1568 | Uracil | AO353_07220 | NaN | anthranilate synthase | -5.053690 | -6.015652 | detrimental | Anthranilate synthase, amidotransferase compon... |
| 232 | lactate | AO353_02070 | NaN | prephenate dehydratase | -5.044139 | -4.913809 | detrimental | Chorismate mutase I (EC 5.4.99.5) / Prephenate... |
| 4690 | Uracil | AO353_26580 | NaN | dihydropyrimidine dehydrogenase | -5.043094 | -10.207538 | detrimental | Dihydropyrimidine dehydrogenase [NADP+] (EC 1.... |
| 3373 | tyrosine | AO353_13070 | NaN | phosphoserine phosphatase | -5.036103 | -7.698820 | detrimental | Phosphoserine phosphatase (EC 3.1.3.3) |
| 3349 | lactate | AO353_13070 | NaN | phosphoserine phosphatase | -4.979465 | -4.853628 | detrimental | Phosphoserine phosphatase (EC 3.1.3.3) |
| 178 | alanine | AO353_01375 | NaN | phosphate acyltransferase | -4.959222 | -8.904415 | detrimental | Phosphate:acyl-ACP acyltransferase PlsX |
| 4537 | glutamic acid | AO353_20695 | NaN | O-succinylhomoserine sulfhydrylase | -4.949864 | -6.759177 | detrimental | O-acetylhomoserine sulfhydrylase (EC 2.5.1.49)... |
| 1086 | Cytosine | AO353_05115 | NaN | ATP phosphoribosyltransferase | -4.938908 | -11.035663 | detrimental | ATP phosphoribosyltransferase (EC 2.4.2.17) |
| 3374 | tryptophan | AO353_13070 | NaN | phosphoserine phosphatase | -4.936573 | -8.881211 | detrimental | Phosphoserine phosphatase (EC 3.1.3.3) |
| 4663 | lysine | AO353_24130 | NaN | hypothetical protein | -4.924011 | -4.750003 | detrimental | NaN |
| 4360 | carnitine | AO353_20620 | NaN | isopropylmalate isomerase | -4.880531 | -7.442707 | detrimental | 3-isopropylmalate dehydratase large subunit (E... |
| 4500 | lactate | AO353_20695 | NaN | O-succinylhomoserine sulfhydrylase | -4.871043 | -6.658694 | detrimental | O-acetylhomoserine sulfhydrylase (EC 2.5.1.49)... |
| 3379 | Uracil | AO353_13070 | NaN | phosphoserine phosphatase | -4.850491 | -8.100824 | detrimental | Phosphoserine phosphatase (EC 3.1.3.3) |
| 1537 | lactate | AO353_07220 | NaN | anthranilate synthase | -4.833131 | -8.067088 | detrimental | Anthranilate synthase, amidotransferase compon... |
| 1090 | alanine | AO353_05115 | NaN | ATP phosphoribosyltransferase | -4.830444 | -8.697987 | detrimental | ATP phosphoribosyltransferase (EC 2.4.2.17) |
| 1061 | lactate | AO353_05115 | NaN | ATP phosphoribosyltransferase | -4.830058 | -9.271248 | detrimental | ATP phosphoribosyltransferase (EC 2.4.2.17) |
| 1169 | lactate | AO353_05705 | NaN | oxidoreductase | -4.825996 | -8.054385 | detrimental | Predicted L-lactate dehydrogenase, Fe-S oxidor... |
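The left merge keeps hits that have no SEED annotation (such as AO353_24130 above, whose seed_desc is NaN). A sketch of how the merge could also check its own assumptions is shown here on toy frames. The `validate` and `indicator` arguments are standard `DataFrame.merge` options; the frame names and values are invented.

```python
import pandas as pd

# Toy fitness hits (several rows per gene) and toy annotations (one row per gene)
hits = pd.DataFrame({
    "locusId": ["g1", "g1", "g2", "g3"],
    "fit": [-2.0, -1.5, -3.0, 1.2],
})
seed = pd.DataFrame({
    "locusId": ["g1", "g2"],
    "seed_desc": ["enzyme A", "transporter B"],
})

# validate="many_to_one" raises MergeError if the annotation table has duplicate
# locusId values, which would otherwise silently multiply rows of hits
merged = hits.merge(seed, on="locusId", how="left",
                    validate="many_to_one", indicator=True)

n_unannotated = (merged["_merge"] == "left_only").sum()
print(len(merged), n_unannotated)
```

If a gene is expected to carry several annotations, `validate` should be dropped, and the saved table will then contain more rows than there are fitness hits.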
3. Per-Metabolite Fitness Landscape
For each WoM-produced metabolite, show the genes with strongest fitness effects.
# Show top fitness hits per metabolite (top 5 most detrimental genes each)
# Note: rows are gene-experiment hits, so len(subset) counts hits rather than unique
# genes, and the same gene can appear more than once in a top-5 list
print("Top 5 most important genes per metabolite (most negative fitness):")
print("=" * 100)
for compound in sorted(sig['wom_compound'].dropna().unique()):
    subset = sig_annotated[sig_annotated['wom_compound'] == compound].copy()
    top_det = subset.nsmallest(5, 'fit')
    if len(top_det) == 0:
        continue
    wom_action = matched[matched['wom_compound'] == compound]['wom_action'].iloc[0]
    print(f"\n{compound} (WoM: {wom_action}) — {len(subset)} significant genes")
    print("-" * 80)
    for _, g in top_det.iterrows():
        gene_name = g['gene'] if pd.notna(g['gene']) else g['sysName']
        seed = g['seed_desc'] if pd.notna(g.get('seed_desc')) else 'no SEED'
        print(f" {gene_name:15s} fit={g['fit']:+.2f} t={g['t']:+.1f} {g['desc'][:50]:50s} [{seed[:40]}]")
Top 5 most important genes per metabolite (most negative fitness):
====================================================================================================

Adenine (WoM: I) — 109 significant genes
--------------------------------------------------------------------------------
 AO353_01375 fit=-4.80 t=-9.2 phosphate acyltransferase [Phosphate:acyl-ACP acyltransferase PlsX]
 AO353_20695 fit=-4.74 t=-8.5 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_20665 fit=-4.66 t=-10.8 N-(5'-phosphoribosyl)anthranilate isomerase [Phosphoribosylanthranilate isomerase (EC]
 AO353_20635 fit=-4.58 t=-12.5 3-isopropylmalate dehydrogenase [3-isopropylmalate dehydrogenase (EC 1.1.]
 AO353_07220 fit=-4.46 t=-8.0 anthranilate synthase [Anthranilate synthase, amidotransferase ]

Adenosine (WoM: I) — 134 significant genes
--------------------------------------------------------------------------------
 AO353_20695 fit=-4.61 t=-6.3 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_07220 fit=-4.41 t=-7.4 anthranilate synthase [Anthranilate synthase, amidotransferase ]
 AO353_05115 fit=-4.37 t=-10.2 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]
 AO353_12520 fit=-4.34 t=-11.2 glutamate synthase [Glutamate synthase [NADPH] small chain (]
 AO353_13070 fit=-4.26 t=-8.2 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]

Cytosine (WoM: E) — 126 significant genes
--------------------------------------------------------------------------------
 AO353_05115 fit=-4.94 t=-11.0 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]
 AO353_12070 fit=-4.74 t=-9.1 imidazoleglycerol-phosphate dehydratase [Imidazoleglycerol-phosphate dehydratase ]
 AO353_12085 fit=-4.58 t=-10.2 1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)me [Phosphoribosylformimino-5-aminoimidazole]
 AO353_07230 fit=-4.47 t=-19.1 anthranilate synthase [Anthranilate synthase, aminase component]
 AO353_00310 fit=-4.31 t=-11.8 transaldolase [Transaldolase (EC 2.2.1.2)]

Malate (WoM: I) — 169 significant genes
--------------------------------------------------------------------------------
 AO353_20695 fit=-4.37 t=-4.2 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_20665 fit=-4.27 t=-9.1 N-(5'-phosphoribosyl)anthranilate isomerase [Phosphoribosylanthranilate isomerase (EC]
 AO353_05115 fit=-4.15 t=-7.5 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]
 AO353_13070 fit=-4.15 t=-6.9 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]
 AO353_08185 fit=-4.06 t=-6.8 methionine biosynthesis protein MetW [Homoserine O-acetyltransferase (EC 2.3.1]

Uracil (WoM: I) — 223 significant genes
--------------------------------------------------------------------------------
 AO353_07220 fit=-5.05 t=-6.0 anthranilate synthase [Anthranilate synthase, amidotransferase ]
 AO353_26580 fit=-5.04 t=-10.2 dihydropyrimidine dehydrogenase [Dihydropyrimidine dehydrogenase [NADP+] ]
 AO353_13070 fit=-4.85 t=-8.1 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]
 AO353_05115 fit=-4.71 t=-9.6 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]
 AO353_05115 fit=-4.68 t=-11.7 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]

alanine (WoM: I) — 295 significant genes
--------------------------------------------------------------------------------
 AO353_01375 fit=-4.96 t=-8.9 phosphate acyltransferase [Phosphate:acyl-ACP acyltransferase PlsX]
 AO353_05115 fit=-4.83 t=-8.7 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]
 AO353_07220 fit=-4.60 t=-7.7 anthranilate synthase [Anthranilate synthase, amidotransferase ]
 AO353_20695 fit=-4.58 t=-7.6 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_20625 fit=-4.58 t=-4.5 3-isopropylmalate dehydratase [3-isopropylmalate dehydratase small subu]

arginine (WoM: I) — 482 significant genes
--------------------------------------------------------------------------------
 AO353_02995 fit=-4.69 t=-7.8 succinylglutamate desuccinylase [Succinylglutamate desuccinylase (EC 3.5.]
 AO353_02995 fit=-4.55 t=-7.6 succinylglutamate desuccinylase [Succinylglutamate desuccinylase (EC 3.5.]
 AO353_02070 fit=-4.51 t=-4.4 prephenate dehydratase [Chorismate mutase I (EC 5.4.99.5) / Prep]
 AO353_20695 fit=-4.47 t=-6.1 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_03005 fit=-4.47 t=-6.1 succinylarginine dihydrolase [Succinylarginine dihydrolase (EC 3.5.3.2]

aspartate (WoM: I) — 130 significant genes
--------------------------------------------------------------------------------
 AO353_07230 fit=-3.29 t=-11.8 anthranilate synthase [Anthranilate synthase, aminase component]
 AO353_07230 fit=-3.24 t=-9.1 anthranilate synthase [Anthranilate synthase, aminase component]
 AO353_13070 fit=-3.21 t=-6.8 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]
 AO353_09950 fit=-3.15 t=-4.3 aspartate ammonia-lyase [Aspartate ammonia-lyase (EC 4.3.1.1)]
 AO353_20665 fit=-3.13 t=-7.3 N-(5'-phosphoribosyl)anthranilate isomerase [Phosphoribosylanthranilate isomerase (EC]

carnitine (WoM: E) — 283 significant genes
--------------------------------------------------------------------------------
 AO353_20620 fit=-4.88 t=-7.4 isopropylmalate isomerase [3-isopropylmalate dehydratase large subu]
 AO353_20620 fit=-4.81 t=-8.6 isopropylmalate isomerase [3-isopropylmalate dehydratase large subu]
 AO353_20625 fit=-4.78 t=-4.6 3-isopropylmalate dehydratase [3-isopropylmalate dehydratase small subu]
 AO353_05115 fit=-4.68 t=-9.5 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]
 AO353_20635 fit=-4.66 t=-11.3 3-isopropylmalate dehydrogenase [3-isopropylmalate dehydrogenase (EC 1.1.]

glutamic acid (WoM: I) — 378 significant genes
--------------------------------------------------------------------------------
 AO353_20695 fit=-4.95 t=-6.8 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_13070 fit=-4.55 t=-5.4 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]
 AO353_20695 fit=-4.55 t=-4.4 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_20620 fit=-4.41 t=-7.9 isopropylmalate isomerase [3-isopropylmalate dehydratase large subu]
 AO353_02070 fit=-4.33 t=-7.2 prephenate dehydratase [Chorismate mutase I (EC 5.4.99.5) / Prep]

glycine (WoM: I) — 219 significant genes
--------------------------------------------------------------------------------
 AO353_20695 fit=-4.56 t=-6.9 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_13110 fit=-4.37 t=-7.3 hypothetical protein [putative membrane protein]
 AO353_20625 fit=-4.28 t=-4.2 3-isopropylmalate dehydratase [3-isopropylmalate dehydratase small subu]
 AO353_12520 fit=-4.27 t=-10.3 glutamate synthase [Glutamate synthase [NADPH] small chain (]
 AO353_05115 fit=-4.20 t=-10.1 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]

inosine (WoM: I) — 99 significant genes
--------------------------------------------------------------------------------
 AO353_07210 fit=-4.13 t=-6.3 indole-3-glycerol-phosphate synthase [Indole-3-glycerol phosphate synthase (EC]
 AO353_20625 fit=-3.67 t=-6.1 3-isopropylmalate dehydratase [3-isopropylmalate dehydratase small subu]
 AO353_20635 fit=-3.62 t=-14.1 3-isopropylmalate dehydrogenase [3-isopropylmalate dehydrogenase (EC 1.1.]
 AO353_05115 fit=-3.58 t=-13.5 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]
 AO353_12070 fit=-3.44 t=-11.7 imidazoleglycerol-phosphate dehydratase [Imidazoleglycerol-phosphate dehydratase ]

lactate (WoM: E) — 706 significant genes
--------------------------------------------------------------------------------
 AO353_20695 fit=-5.27 t=-5.1 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_05705 fit=-5.09 t=-8.5 oxidoreductase [Predicted L-lactate dehydrogenase, Fe-S ]
 AO353_02070 fit=-5.04 t=-4.9 prephenate dehydratase [Chorismate mutase I (EC 5.4.99.5) / Prep]
 AO353_13070 fit=-4.98 t=-4.9 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]
 AO353_20695 fit=-4.87 t=-6.7 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]

lysine (WoM: E) — 219 significant genes
--------------------------------------------------------------------------------
 AO353_24130 fit=-4.92 t=-4.8 hypothetical protein [no SEED]
 AO353_20695 fit=-4.61 t=-7.0 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_20540 fit=-4.55 t=-8.1 aromatic amino acid aminotransferase [Biosynthetic Aromatic amino acid aminotr]
 AO353_05115 fit=-4.35 t=-9.7 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]
 AO353_07220 fit=-4.29 t=-7.1 anthranilate synthase [Anthranilate synthase, amidotransferase ]

phenylalanine (WoM: I) — 201 significant genes
--------------------------------------------------------------------------------
 AO353_07220 fit=-4.75 t=-8.5 anthranilate synthase [Anthranilate synthase, amidotransferase ]
 AO353_12085 fit=-4.53 t=-9.2 1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)me [Phosphoribosylformimino-5-aminoimidazole]
 AO353_13165 fit=-4.48 t=-6.1 ATP phosphoribosyltransferase regulatory subunit [ATP phosphoribosyltransferase regulatory]
 AO353_12070 fit=-4.33 t=-10.5 imidazoleglycerol-phosphate dehydratase [Imidazoleglycerol-phosphate dehydratase ]
 AO353_13070 fit=-4.27 t=-10.3 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]

proline (WoM: I) — 138 significant genes
--------------------------------------------------------------------------------
 AO353_20540 fit=-4.47 t=-5.3 aromatic amino acid aminotransferase [Biosynthetic Aromatic amino acid aminotr]
 AO353_20665 fit=-4.14 t=-7.9 N-(5'-phosphoribosyl)anthranilate isomerase [Phosphoribosylanthranilate isomerase (EC]
 AO353_08185 fit=-4.02 t=-8.1 methionine biosynthesis protein MetW [Homoserine O-acetyltransferase (EC 2.3.1]
 AO353_10670 fit=-4.02 t=-7.2 shikimate dehydrogenase [Shikimate 5-dehydrogenase I alpha (EC 1.]
 AO353_07230 fit=-3.96 t=-13.8 anthranilate synthase [Anthranilate synthase, aminase component]

thymine (WoM: E) — 96 significant genes
--------------------------------------------------------------------------------
 AO353_20665 fit=-4.72 t=-11.0 N-(5'-phosphoribosyl)anthranilate isomerase [Phosphoribosylanthranilate isomerase (EC]
 AO353_12070 fit=-4.70 t=-7.2 imidazoleglycerol-phosphate dehydratase [Imidazoleglycerol-phosphate dehydratase ]
 AO353_12085 fit=-4.62 t=-8.3 1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)me [Phosphoribosylformimino-5-aminoimidazole]
 AO353_07220 fit=-4.60 t=-8.3 anthranilate synthase [Anthranilate synthase, amidotransferase ]
 AO353_26570 fit=-4.58 t=-7.0 phenylhydantoinase [Dihydropyrimidinase (EC 3.5.2.2)]

trehalose (WoM: I) — 262 significant genes
--------------------------------------------------------------------------------
 AO353_13070 fit=-4.52 t=-4.4 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]
 AO353_20695 fit=-4.23 t=-5.0 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_15980 fit=-4.08 t=-7.8 trehalose permease IIC protein [PTS system, trehalose-specific IIB compo]
 AO353_20540 fit=-4.05 t=-5.5 aromatic amino acid aminotransferase [Biosynthetic Aromatic amino acid aminotr]
 AO353_08885 fit=-3.97 t=-13.6 polyphosphate kinase [Polyphosphate kinase (EC 2.7.4.1)]

tryptophan (WoM: I) — 231 significant genes
--------------------------------------------------------------------------------
 AO353_20695 fit=-5.14 t=-7.0 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_13070 fit=-4.94 t=-8.9 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]
 AO353_02075 fit=-4.79 t=-10.7 3-phosphoserine/phosphohydroxythreonine aminotrans [Phosphoserine aminotransferase (EC 2.6.1]
 AO353_05115 fit=-4.74 t=-11.9 ATP phosphoribosyltransferase [ATP phosphoribosyltransferase (EC 2.4.2.]
 AO353_13165 fit=-4.73 t=-5.6 ATP phosphoribosyltransferase regulatory subunit [ATP phosphoribosyltransferase regulatory]

tyrosine (WoM: E) — 104 significant genes
--------------------------------------------------------------------------------
 AO353_13070 fit=-5.04 t=-7.7 phosphoserine phosphatase [Phosphoserine phosphatase (EC 3.1.3.3)]
 AO353_07220 fit=-4.82 t=-8.0 anthranilate synthase [Anthranilate synthase, amidotransferase ]
 AO353_20695 fit=-4.38 t=-7.8 O-succinylhomoserine sulfhydrylase [O-acetylhomoserine sulfhydrylase (EC 2.5]
 AO353_08185 fit=-4.32 t=-11.8 methionine biosynthesis protein MetW [Homoserine O-acetyltransferase (EC 2.3.1]
 AO353_12070 fit=-4.31 t=-11.1 imidazoleglycerol-phosphate dehydratase [Imidazoleglycerol-phosphate dehydratase ]

valine (WoM: E) — 160 significant genes
--------------------------------------------------------------------------------
 AO353_20620 fit=-4.09 t=-5.6 isopropylmalate isomerase [3-isopropylmalate dehydratase large subu]
 AO353_20540 fit=-4.04 t=-4.8 aromatic amino acid aminotransferase [Biosynthetic Aromatic amino acid aminotr]
 AO353_20635 fit=-4.03 t=-6.7 3-isopropylmalate dehydrogenase [3-isopropylmalate dehydrogenase (EC 1.1.]
 AO353_20665 fit=-3.99 t=-6.7 N-(5'-phosphoribosyl)anthranilate isomerase [Phosphoribosylanthranilate isomerase (EC]
 AO353_26635 fit=-3.95 t=-4.7 2-oxoisovalerate dehydrogenase [Branched-chain alpha-keto acid dehydroge]
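Some top-5 lists above repeat a gene (for example AO353_02995 twice under arginine) because each row is one gene in one experiment. If one row per gene is preferred, a possible approach is to keep each gene's most negative value per compound before taking the top N. This is sketched on toy data; the frame and its values are invented.

```python
import pandas as pd

# Toy hits: g1 is significant in two experiments for compound X
toy_hits = pd.DataFrame({
    "wom_compound": ["X", "X", "X", "X", "Y", "Y"],
    "locusId": ["g1", "g1", "g2", "g3", "g1", "g4"],
    "fit": [-4.7, -4.5, -4.0, -3.0, -2.0, -3.5],
})

# Sort ascending so each gene's most negative value comes first, then keep it
strongest = (
    toy_hits.sort_values("fit")
    .drop_duplicates(subset=["wom_compound", "locusId"], keep="first")
)

# Top 2 distinct genes per compound, most negative first
top2 = strongest.groupby("wom_compound").head(2)
print(top2.sort_values(["wom_compound", "fit"]).to_string(index=False))
```

Taking the minimum per gene is one choice among several. Averaging across replicate experiments would be more conservative, but it changes which genes rank highest.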
4. Visualization
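# Bar chart: number of significant fitness genes per metabolite
# Note: summing across conditions and experiment groups counts a gene once per
# condition/group in n_genes, and the plotted n_detrimental/n_beneficial bars are
# gene-experiment hit counts rather than unique-gene counts
plot_data = met_summary.groupby('wom_compound').agg(
    n_genes=('n_genes', 'sum'),
    n_detrimental=('n_detrimental', 'sum'),
    n_beneficial=('n_beneficial', 'sum'),
).reset_index().sort_values('n_genes', ascending=True)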
fig, ax = plt.subplots(figsize=(10, max(6, len(plot_data) * 0.35)))
y_pos = range(len(plot_data))
ax.barh(y_pos, plot_data['n_detrimental'], color='#d62728', label='Detrimental (gene needed)', alpha=0.8)
ax.barh(y_pos, plot_data['n_beneficial'], left=plot_data['n_detrimental'],
color='#2ca02c', label='Beneficial (gene inhibits)', alpha=0.8)
ax.set_yticks(y_pos)
ax.set_yticklabels(plot_data['wom_compound'])
ax.set_xlabel('Number of significant genes (|fit|>1, |t|>4)')
ax.set_title('Fitness Browser Gene Hits for WoM-Produced Metabolites\n(FW300-N2E3)')
ax.legend(loc='lower right')
ax.grid(axis='x', alpha=0.3)
plt.tight_layout()
plt.savefig(f'{FIG_DIR}/fitness_hits_per_metabolite.png', dpi=150, bbox_inches='tight')
plt.show()
print(f"Saved to {FIG_DIR}/fitness_hits_per_metabolite.png")
Saved to ../figures/fitness_hits_per_metabolite.png
# Experiment-level summary (printed as a table, not a heatmap): fitness distribution per FB condition
# Use average fitness across all genes as an indicator of growth quality
# n_sig_genes counts gene-experiment records with |fit| > 1 only (no t-score filter),
# so conditions with several experiments contribute more than one record per gene
avg_fit = fitness_df.groupby(['condition_1']).agg(
    mean_fit=('fit', 'mean'),
    median_fit=('fit', 'median'),
    n_sig_genes=('fit', lambda x: (x.abs() > 1).sum()),
    n_total_genes=('fit', 'count'),
).reset_index()
avg_fit['wom_compound'] = avg_fit['condition_1'].map(fb_to_wom)
avg_fit = avg_fit.dropna(subset=['wom_compound'])
avg_fit['pct_sig'] = (avg_fit['n_sig_genes'] / avg_fit['n_total_genes'] * 100).round(1)
print("Experiment-level fitness summary:")
print(avg_fit[['wom_compound', 'condition_1', 'mean_fit', 'median_fit',
               'n_sig_genes', 'pct_sig']].sort_values('n_sig_genes', ascending=False).to_string(index=False))
Experiment-level fitness summary:
wom_compound   condition_1                                      mean_fit  median_fit  n_sig_genes  pct_sig
arginine       L-Arginine                                      -0.104687   -0.018210          771      3.8
glutamic acid  L-Glutamic acid monopotassium salt monohydrate  -0.092822   -0.020009          607      3.0
trehalose      D-Trehalose dihydrate                           -0.109553   -0.018642          458      4.6
phenylalanine  L-Phenylalanine                                 -0.098087   -0.017988          433      4.3
tryptophan     L-Tryptophan                                    -0.102204   -0.012934          413      4.1
carnitine      Carnitine Hydrochloride                         -0.101921   -0.005891          390      3.9
alanine        D-Alanine                                       -0.115532   -0.020967          387      3.9
glycine        Glycine                                         -0.099377   -0.018285          363      3.6
valine         L-Valine                                        -0.139146   -0.022304          353      7.0
lactate        Sodium D-Lactate                                -0.099922   -0.010447          352      3.5
Uracil         Uridine                                         -0.119199   -0.038009          342      3.4
lactate        Sodium D,L-Lactate                              -0.084858   -0.006912          314      3.1
aspartate      L-Aspartic Acid                                 -0.074512   -0.021660          308      3.1
lysine         L-Lysine                                        -0.075192   -0.016797          304      3.0
alanine        L-Alanine                                       -0.150089   -0.047819          293      5.8
lactate        Sodium L-Lactate                                -0.088307   -0.010417          279      2.8
Malate         L-Malic acid disodium salt monohydrate          -0.079977   -0.013888          274      2.7
proline        L-Proline                                       -0.058721   -0.005994          252      2.5
Adenosine      Adenosine                                       -0.064738   -0.004311          195      3.9
Cytosine       Cytidine                                        -0.061758   -0.010202          168      3.3
inosine        Inosine                                         -0.080101   -0.014102          161      3.2
tyrosine       L-tyrosine disodium salt                        -0.073965    0.001027          158      3.1
Adenine        Adenine hydrochloride hydrate                   -0.101043   -0.023860          156      3.1
thymine        Thymine                                         -0.090147   -0.018033          152      3.0
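The summary above counts records by effect size alone (|fit| > 1), while the bar-chart axis label quotes a joint cutoff of |fit| > 1 and |t| > 4. The following is a minimal sketch of how the two counts could be compared per condition. The toy table, the `t` column name, and the `summarize` helper are assumptions for illustration; they are not taken from the notebook's `fitness_df`.

```python
import pandas as pd

# Toy gene-experiment records (hypothetical values, not notebook data).
toy = pd.DataFrame({
    "condition_1": ["L-Valine"] * 4 + ["Thymine"] * 4,
    "locusId": ["g1", "g2", "g3", "g4"] * 2,
    "fit": [-4.1, -1.5, 0.2, 1.3, -2.0, -1.1, 0.0, -0.4],
    "t": [-5.6, -2.0, 0.5, 4.5, -6.0, -4.2, 0.1, -1.0],
})

FIT_MIN = 1.0  # |fit| cutoff quoted in the bar-chart axis label
T_MIN = 4.0    # |t| cutoff quoted in the bar-chart axis label


def summarize(df, fit_min=FIT_MIN, t_min=T_MIN):
    """Count records per condition under the effect-size-only and the joint filter."""
    big_effect = df["fit"].abs() > fit_min
    flagged = df.assign(
        big_effect=big_effect,
        significant=big_effect & (df["t"].abs() > t_min),
    )
    out = flagged.groupby("condition_1").agg(
        n_records=("fit", "count"),
        n_big_effect=("big_effect", "sum"),
        n_significant=("significant", "sum"),
    )
    return out.reset_index()


summary = summarize(toy)
print(summary.to_string(index=False))
```

In this toy table the L-Valine condition has three records with |fit| > 1 but only two that also pass the t-score cutoff, which is the kind of gap to expect between the two definitions on real data.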
5. Production vs. Utilization Gene Overlap
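Key analysis: For each metabolite that FW300-N2E3 produces (WoM), which genes are important for fitness when the strain grows on that metabolite (FB)? Genes that appear in both biosynthetic and catabolic contexts may be bifunctional or belong to central metabolism. The cell below approximates this by counting, for each gene, the number of metabolites in which it has a significant fitness effect.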
# Find genes that are fitness-important across multiple metabolites
gene_metabolite_counts = sig.groupby('locusId').agg(
    n_metabolites=('wom_compound', 'nunique'),
    metabolites=('wom_compound', lambda x: ', '.join(sorted(x.dropna().unique()))),
    mean_fit=('fit', 'mean'),
).reset_index()
# Merge with gene descriptions
gene_info = sig[['locusId', 'sysName', 'gene', 'desc']].drop_duplicates('locusId')
gene_metabolite_counts = gene_metabolite_counts.merge(gene_info, on='locusId')
# Pleiotropic genes (important for 3+ metabolites)
pleiotropic = gene_metabolite_counts[gene_metabolite_counts['n_metabolites'] >= 3].sort_values(
    'n_metabolites', ascending=False
)
print(f"Genes with significant fitness in 3+ metabolite conditions: {len(pleiotropic)}")
if len(pleiotropic) > 0:
    print("\nTop pleiotropic genes:")
    for _, g in pleiotropic.head(20).iterrows():
        gene_name = g['gene'] if pd.notna(g['gene']) else g['sysName']
        print(f" {gene_name:15s} ({g['n_metabolites']} metabolites, mean fit={g['mean_fit']:+.2f})")
        print(f" {g['desc'][:70]}")
        print(f" Metabolites: {g['metabolites']}")
Genes with significant fitness in 3+ metabolite conditions: 231
Top pleiotropic genes:
AO353_08180 (21 metabolites, mean fit=-3.43)
homoserine O-acetyltransferase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_08345 (21 metabolites, mean fit=-3.53)
dihydroxy-acid dehydratase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_08015 (21 metabolites, mean fit=-2.71)
5,10-methylenetetrahydrofolate reductase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_08185 (21 metabolites, mean fit=-3.73)
methionine biosynthesis protein MetW
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_05115 (21 metabolites, mean fit=-4.11)
ATP phosphoribosyltransferase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_05110 (21 metabolites, mean fit=-3.41)
histidinol dehydrogenase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_10670 (21 metabolites, mean fit=-3.19)
shikimate dehydrogenase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_14495 (21 metabolites, mean fit=-3.04)
acetolactate synthase 3 catalytic subunit
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_14505 (21 metabolites, mean fit=-3.66)
ketol-acid reductoisomerase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_12075 (21 metabolites, mean fit=-3.32)
imidazole glycerol phosphate synthase subunit HisH
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_12085 (21 metabolites, mean fit=-3.85)
1-(5-phosphoribosyl)-5-[(5- phosphoribosylamino)methylideneamino] imid
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_20635 (21 metabolites, mean fit=-3.86)
3-isopropylmalate dehydrogenase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_20695 (21 metabolites, mean fit=-4.06)
O-succinylhomoserine sulfhydrylase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_15925 (21 metabolites, mean fit=-2.76)
2-isopropylmalate synthase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_20620 (21 metabolites, mean fit=-3.59)
isopropylmalate isomerase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_13165 (21 metabolites, mean fit=-3.66)
ATP phosphoribosyltransferase regulatory subunit
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_12070 (21 metabolites, mean fit=-3.89)
imidazoleglycerol-phosphate dehydratase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_12360 (21 metabolites, mean fit=-2.86)
phosphoribosyl-ATP pyrophosphatase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
AO353_07220 (20 metabolites, mean fit=-3.81)
anthranilate synthase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, arginine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tyrosine, valine
AO353_04105 (20 metabolites, mean fit=-2.88)
argininosuccinate synthase
Metabolites: Adenine, Adenosine, Cytosine, Malate, Uracil, alanine, aspartate, carnitine, glutamic acid, glycine, inosine, lactate, lysine, phenylalanine, proline, thymine, trehalose, tryptophan, tyrosine, valine
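The listing above is dominated by amino-acid biosynthesis genes (histidine, branched-chain, methionine, aromatic) that score in 20 or 21 metabolites, so they say little about any single compound. One possible follow-up is to drop broadly required genes and keep only hits confined to a few metabolites. This is a minimal sketch on toy data; the `condition_specific_hits` helper, the gene labels, and the cutoff of two metabolites are hypothetical choices, not part of the notebook.

```python
import pandas as pd

# Toy table of significant hits (hypothetical loci and values, not notebook data).
sig_toy = pd.DataFrame({
    "locusId": ["geneA", "geneA", "geneA", "geneB", "geneC", "geneC"],
    "wom_compound": ["thymine", "trehalose", "valine", "trehalose", "thymine", "Uracil"],
    "fit": [-4.0, -3.8, -3.9, -4.1, -4.6, -3.2],
})


def condition_specific_hits(sig, max_metabolites=2):
    """Keep hits for genes that are significant in at most `max_metabolites` metabolites."""
    # Number of distinct metabolites per gene, broadcast back onto every hit row.
    n_met = sig.groupby("locusId")["wom_compound"].transform("nunique")
    out = sig.assign(n_metabolites=n_met)
    out = out[out["n_metabolites"] <= max_metabolites]
    return out.sort_values(["wom_compound", "fit"]).reset_index(drop=True)


specific = condition_specific_hits(sig_toy)
print(specific.to_string(index=False))
```

Here `geneA` is removed because it scores in three metabolites, while `geneB` and `geneC` remain as candidates for metabolite-specific roles. The cutoff is arbitrary and would need tuning against the real distribution of `n_metabolites`.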
# Summary statistics
print("=" * 60)
print("NB02 SUMMARY: WoM ↔ FB Integration")
print("=" * 60)
print(f"\nWoM-produced metabolites with FB match: {len(matched)}")
print(f"Total gene-experiment records queried: {len(fitness_df)}")
print(f"Significant fitness hits: {len(sig)}")
print(f"Unique genes with significant fitness: {sig['locusId'].nunique()}")
print(f"Conditions with any significant hit: {sig['condition_1'].nunique()}")
if len(pleiotropic) > 0:
    print(f"Pleiotropic genes (3+ metabolites): {len(pleiotropic)}")
print(f"\nGenes annotated with SEED subsystem: {seed_df['locusId'].nunique()} / {len(sig_loci)}")
print(f"\nFiles saved:")
print(f" {DATA_DIR}/wom_fb_gene_table.tsv")
print(f" {DATA_DIR}/wom_fb_summary.tsv")
print(f" {FIG_DIR}/fitness_hits_per_metabolite.png")
============================================================
NB02 SUMMARY: WoM ↔ FB Integration
============================================================

WoM-produced metabolites with FB match: 28
Total gene-experiment records queried: 221056
Significant fitness hits: 4764
Unique genes with significant fitness: 601
Conditions with any significant hit: 24
Pleiotropic genes (3+ metabolites): 231

Genes annotated with SEED subsystem: 565 / 601

Files saved:
  ../data/wom_fb_gene_table.tsv
  ../data/wom_fb_summary.tsv
  ../figures/fitness_hits_per_metabolite.png
spark.stop()
print("Spark session closed.")
Spark session closed.