05 Metal Responsive Modules
Jupyter notebook from the Pan-Bacterial Metal Fitness Atlas project.
NB 05: Metal-Responsive ICA Modules¶
Identify ICA modules that are activated/repressed under metal stress conditions. Test whether metal-responsive modules are enriched in core or accessory genes.
Runs locally — reads ICA module data from fitness_modules/data/modules/.
Inputs:
data/metal_experiments.csv(from NB01)fitness_modules/data/modules/{org}_module_conditions.csvfitness_modules/data/modules/{org}_gene_membership.csvconservation_vs_fitness/data/fb_pangenome_link.tsv
Outputs:
data/metal_modules.csvfigures/metal_module_activity_heatmap.png
In [1]:
import pandas as pd
import numpy as np
from pathlib import Path
import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt
import seaborn as sns
PROJECT_DIR = Path('..').resolve()
DATA_DIR = PROJECT_DIR / 'data'
FIGURES_DIR = PROJECT_DIR / 'figures'
FM_MODULES = PROJECT_DIR.parent / 'fitness_modules' / 'data' / 'modules'
# Load metal experiments
metal_exps = pd.read_csv(DATA_DIR / 'metal_experiments.csv')
exclude = ['Platinum', 'Metal_limitation']
metal_exps = metal_exps[~metal_exps['metal_element'].isin(exclude)]
print(f'Metal experiments: {len(metal_exps)}')
# Load FB-pangenome link
fb_link = pd.read_csv(
PROJECT_DIR.parent / 'conservation_vs_fitness' / 'data' / 'fb_pangenome_link.tsv',
sep='\t'
)
fb_link['locusId'] = fb_link['locusId'].astype(str)
print(f'FB-pangenome links: {len(fb_link):,}')
Metal experiments: 483
FB-pangenome links: 177,863
1. Score Module Activity Under Metal Conditions¶
In [2]:
ZSCORE_THRESHOLD = 2.0 # |z-score| > 2 = responsive (standardized across all conditions)
all_metal_modules = []
organisms_with_modules = []
for org_id in sorted(metal_exps['orgId'].unique()):
prof_path = FM_MODULES / f'{org_id}_module_profiles.csv'
if not prof_path.exists():
continue
# Load module profiles (module × experiment matrix)
prof_df = pd.read_csv(prof_path, index_col=0)
# Get metal experiment names for this organism
org_metal_exps = metal_exps[metal_exps['orgId'] == org_id]
metal_exp_names = set(org_metal_exps['expName'].tolist())
# Find metal experiments in profiles
metal_cols = [c for c in prof_df.columns if c in metal_exp_names]
if not metal_cols:
continue
# Z-score each module's activity across ALL experiments
# Then extract metal experiment z-scores
prof_mean = prof_df.mean(axis=1)
prof_std = prof_df.std(axis=1)
prof_std = prof_std.replace(0, np.nan) # avoid division by zero
for exp in metal_cols:
metal_info = org_metal_exps[org_metal_exps['expName'] == exp].iloc[0]
metal = metal_info['metal_element']
for module_id in prof_df.index:
raw_activity = prof_df.loc[module_id, exp]
std = prof_std.loc[module_id]
if pd.isna(std) or std == 0:
z_score = 0.0
else:
z_score = (raw_activity - prof_mean.loc[module_id]) / std
all_metal_modules.append({
'orgId': org_id,
'module_id': module_id,
'expName': exp,
'metal_element': metal,
'raw_activity': float(raw_activity),
'z_score': float(z_score),
'is_responsive': abs(z_score) > ZSCORE_THRESHOLD,
})
organisms_with_modules.append(org_id)
if all_metal_modules:
metal_modules = pd.DataFrame(all_metal_modules)
print(f'Organisms with module data: {len(set(organisms_with_modules))}')
print(f'Module × metal-experiment records: {len(metal_modules):,}')
n_responsive = metal_modules['is_responsive'].sum()
print(f'Metal-responsive (|z| > {ZSCORE_THRESHOLD}): {n_responsive:,} '
f'({100*n_responsive/len(metal_modules):.1f}%)')
print(f'Z-score range: {metal_modules["z_score"].min():.2f} to {metal_modules["z_score"].max():.2f}')
else:
metal_modules = pd.DataFrame()
print('No module-condition data found matching metal experiments')
Organisms with module data: 31 Module × metal-experiment records: 19,453 Metal-responsive (|z| > 2.0): 600 (3.1%) Z-score range: -13.01 to 13.93
In [3]:
if len(metal_modules) > 0:
# Summarize responsive modules per organism × metal
responsive = metal_modules[metal_modules['is_responsive']]
# Count unique responsive modules per organism
resp_by_org = responsive.groupby('orgId').agg(
n_responsive_modules=('module_id', 'nunique'),
n_metals=('metal_element', 'nunique'),
metals=('metal_element', lambda x: ','.join(sorted(x.unique()))),
).sort_values('n_responsive_modules', ascending=False)
print('Metal-responsive modules per organism:')
print('=' * 70)
for _, row in resp_by_org.iterrows():
print(f' {row.name:25s} {int(row.n_responsive_modules):3d} modules '
f'{int(row.n_metals)} metals [{row.metals}]')
# Which metals activate the most modules?
resp_by_metal = responsive.groupby('metal_element')['module_id'].nunique()
print(f'\nResponsive modules per metal:')
for metal, count in resp_by_metal.sort_values(ascending=False).items():
print(f' {metal:12s}: {count} modules')
else:
print('No module data to summarize')
Metal-responsive modules per organism: ====================================================================== DvH 47 modules 12 metals [Aluminum,Chromium,Cobalt,Iron,Manganese,Mercury,Molybdenum,Nickel,Selenium,Tungsten,Uranium,Zinc] psRCH2 11 modules 8 metals [Aluminum,Cadmium,Chromium,Cobalt,Copper,Nickel,Uranium,Zinc] acidovorax_3H11 11 modules 5 metals [Aluminum,Cobalt,Copper,Nickel,Zinc] Btheta 10 modules 2 metals [Copper,Zinc] Korea 9 modules 5 metals [Aluminum,Cobalt,Copper,Nickel,Zinc] Cola 9 modules 4 metals [Aluminum,Cobalt,Copper,Nickel] Cup4G11 8 modules 4 metals [Cobalt,Copper,Nickel,Zinc] Methanococcus_S2 8 modules 1 metals [Iron] Marino 7 modules 5 metals [Aluminum,Cobalt,Copper,Nickel,Zinc] Caulo 7 modules 3 metals [Cobalt,Copper,Nickel] SB2B 7 modules 3 metals [Cobalt,Copper,Zinc] Dino 6 modules 3 metals [Aluminum,Cobalt,Nickel] pseudo5_N2C3_1 6 modules 4 metals [Aluminum,Cobalt,Copper,Nickel] Methanococcus_JJ 6 modules 1 metals [Iron] MR1 5 modules 3 metals [Aluminum,Cobalt,Nickel] Koxy 5 modules 3 metals [Cobalt,Copper,Nickel] pseudo1_N1B4 5 modules 3 metals [Aluminum,Cobalt,Copper] Ponti 4 modules 3 metals [Cobalt,Copper,Nickel] pseudo3_N2E3 4 modules 3 metals [Cobalt,Copper,Nickel] pseudo6_N2E2 4 modules 2 metals [Aluminum,Copper] Phaeo 4 modules 4 metals [Cobalt,Copper,Nickel,Zinc] Pedo557 4 modules 5 metals [Aluminum,Cobalt,Copper,Nickel,Zinc] PV4 3 modules 5 metals [Aluminum,Cobalt,Copper,Nickel,Zinc] Kang 3 modules 2 metals [Cobalt,Nickel] BFirm 2 modules 2 metals [Cobalt,Zinc] Keio 2 modules 3 metals [Aluminum,Cobalt,Nickel] WCS417 2 modules 2 metals [Aluminum,Cobalt] ANA3 1 modules 1 metals [Cobalt] SynE 1 modules 1 metals [Zinc] pseudo13_GW456_L13 1 modules 1 metals [Aluminum] Responsive modules per metal: Zinc : 34 modules Aluminum : 33 modules Tungsten : 32 modules Nickel : 30 modules Cobalt : 28 modules Copper : 22 modules Molybdenum : 15 modules Iron : 14 modules Selenium : 11 modules Chromium : 9 modules Uranium : 5 modules Mercury : 4 modules Cadmium : 1 modules Manganese : 1 modules
2. Conservation of Metal-Responsive Module Genes¶
In [4]:
if len(metal_modules) > 0 and len(responsive) > 0:
# Get unique responsive modules
responsive_module_ids = responsive[['orgId', 'module_id']].drop_duplicates()
# For each responsive module, get its member genes and their conservation
module_cons_records = []
for _, row in responsive_module_ids.iterrows():
org_id = row['orgId']
module_id = row['module_id']
mem_path = FM_MODULES / f'{org_id}_gene_membership.csv'
if not mem_path.exists():
continue
mem_df = pd.read_csv(mem_path)
mem_df_idx = mem_df.set_index(mem_df.columns[0])
if str(module_id) in mem_df_idx.columns:
member_genes = mem_df_idx.index[mem_df_idx[str(module_id)] == 1].tolist()
elif module_id in mem_df_idx.columns:
member_genes = mem_df_idx.index[mem_df_idx[module_id] == 1].tolist()
else:
continue
member_genes_str = [str(g) for g in member_genes]
# Look up conservation
org_link = fb_link[(fb_link['orgId'] == org_id) &
(fb_link['locusId'].isin(member_genes_str))]
if len(org_link) > 0:
pct_core = org_link['is_core'].mean()
metals = responsive[
(responsive['orgId'] == org_id) &
(responsive['module_id'] == module_id)
]['metal_element'].unique()
module_cons_records.append({
'orgId': org_id,
'module_id': module_id,
'n_genes': len(member_genes),
'n_mapped': len(org_link),
'pct_core': pct_core,
'metals': ','.join(sorted(metals)),
'n_metals': len(metals),
})
if module_cons_records:
module_cons_df = pd.DataFrame(module_cons_records)
print(f'Metal-responsive modules with conservation data: {len(module_cons_df)}')
print(f'Mean core fraction of metal-responsive module genes: '
f'{module_cons_df["pct_core"].mean():.3f}')
print(f'\nDistribution of core fraction:')
print(module_cons_df['pct_core'].describe().to_string())
else:
module_cons_df = pd.DataFrame()
print('No module conservation data computed')
else:
module_cons_df = pd.DataFrame()
print('No responsive modules to analyze')
Metal-responsive modules with conservation data: 183 Mean core fraction of metal-responsive module genes: 0.826 Distribution of core fraction: count 183.000000 mean 0.826100 std 0.233131 min 0.000000 25% 0.723636 50% 0.928571 75% 1.000000 max 1.000000
3. Figures¶
In [5]:
if len(metal_modules) > 0 and len(responsive) > 0:
# Heatmap of module activity across metals (for top organisms)
mod_metal_activity = responsive.groupby(
['orgId', 'module_id', 'metal_element']
)['z_score'].mean().reset_index()
# Pick top 3 organisms by number of responsive modules
top_orgs = resp_by_org.head(3).index.tolist()
fig, axes = plt.subplots(1, min(3, len(top_orgs)), figsize=(5*min(3, len(top_orgs)), 6))
if len(top_orgs) == 1:
axes = [axes]
for ax, org in zip(axes, top_orgs):
org_data = mod_metal_activity[mod_metal_activity['orgId'] == org]
if len(org_data) == 0:
continue
pivot = org_data.pivot(index='module_id', columns='metal_element', values='z_score')
if len(pivot) > 20:
pivot = pivot.loc[pivot.abs().max(axis=1).sort_values(ascending=False).head(20).index]
sns.heatmap(pivot, cmap='RdBu_r', center=0, ax=ax,
cbar_kws={'label': 'Z-score'}, linewidths=0.5)
ax.set_title(f'{org}', fontsize=12)
ax.set_ylabel('Module')
plt.suptitle('Metal-Responsive Module Activity (z-scored)', fontsize=14, y=1.02)
plt.tight_layout()
fig.savefig(FIGURES_DIR / 'metal_module_activity_heatmap.png', dpi=150, bbox_inches='tight')
plt.show()
print(f'Saved: figures/metal_module_activity_heatmap.png')
else:
print('No responsive module data for figure')
Saved: figures/metal_module_activity_heatmap.png
4. Save Results¶
In [6]:
if len(metal_modules) > 0:
metal_modules.to_csv(DATA_DIR / 'metal_modules.csv', index=False)
print(f'Saved: data/metal_modules.csv ({len(metal_modules):,} records)')
if len(module_cons_df) > 0:
module_cons_df.to_csv(DATA_DIR / 'metal_module_conservation.csv', index=False)
print(f'Saved: data/metal_module_conservation.csv ({len(module_cons_df)} modules)')
n_responsive = metal_modules['is_responsive'].sum() if len(metal_modules) > 0 else 0
n_orgs = len(set(organisms_with_modules))
print('\n' + '=' * 80)
print('NB05 SUMMARY: Metal-Responsive ICA Modules')
print('=' * 80)
print(f'Organisms with module data: {n_orgs}')
print(f'Module x metal-experiment records: {len(metal_modules):,}')
print(f'Metal-responsive records (|z|>{ZSCORE_THRESHOLD}): {n_responsive:,}')
if len(responsive) > 0:
print(f'Unique responsive modules: {responsive["module_id"].nunique()}')
print(f'Organisms with responsive modules: {responsive["orgId"].nunique()}')
if len(module_cons_df) > 0:
print(f'Mean core fraction of metal module genes: {module_cons_df["pct_core"].mean():.3f}')
print('=' * 80)
Saved: data/metal_modules.csv (19,453 records) Saved: data/metal_module_conservation.csv (183 modules) ================================================================================ NB05 SUMMARY: Metal-Responsive ICA Modules ================================================================================ Organisms with module data: 31 Module x metal-experiment records: 19,453 Metal-responsive records (|z|>2.0): 600 Unique responsive modules: 52 Organisms with responsive modules: 30 Mean core fraction of metal module genes: 0.826 ================================================================================