Paramvir S. Dehal ORCID

Research Question

Among the 12,838 metal-important genes identified by the Metal Fitness Atlas, which are specifically required for metal tolerance vs general stress survival — and do the metal-specific genes show the expected accessory-genome enrichment?

Research Plan

Hypothesis

  • H0: Metal-important genes are no more condition-specific than expected by chance; the 87.4% core enrichment reflects the true genetic architecture of metal tolerance.
  • H1: The 87.4% core enrichment is driven by general stress genes (cell envelope, DNA repair, central metabolism) that are needed for many stresses. After removing genes also important under non-metal conditions, the remaining metal-specific genes are enriched in the accessory genome and represent the true metal resistance repertoire (efflux, sequestration, detoxification).

Sub-hypotheses

  • H1a: Genes important for metals AND non-metal stresses (antibiotics, osmotic, oxidative) are >90% core — these are general stress response genes.
  • H1b: Genes important for metals but NOT for any non-metal stress are <80% core — recovering the accessory enrichment expected for specialized resistance mechanisms.
  • H1c: Metal-specific genes are enriched for known metal resistance functions (efflux pumps, metal-binding proteins, CDF transporters) relative to general stress genes.
  • H1d: The 149 novel metal candidates from the atlas are disproportionately metal-specific (not general stress genes), confirming they represent genuine metal biology discoveries.
  • H1e (exploratory): Different metals differ in their ratio of metal-specific to general stress genes. Note: essential metals (Fe, Mo, W) have limited organism coverage (1-3 organisms each), so this comparison is underpowered and treated as exploratory.

Revision History

  • v1 (2026-02-27): Initial plan
  • v2 (2026-02-27): Incorporated plan review feedback: added rate-based thresholds alongside absolute counts to address experiment-count bias; clarified total experiment count (6,504 total = 559 metal + 5,945 non-metal); added specificphenotype table as validation; added ICA module specificity analysis; downgraded H1e to exploratory; added essential gene bias caveat; added counter_ion_effects cross-validation checkpoint

Overview

The Metal Fitness Atlas found that metal-important genes are 87.4% core (OR=2.08), the opposite of the expected accessory enrichment for metal resistance. This project tests whether the core enrichment is driven by general stress genes (cell envelope, DNA repair) that are needed for many stresses, not just metals. By comparing each metal-important gene's fitness across all 5,945 non-metal experiments in the Fitness Browser, we classify genes as metal-specific, shared-stress, or generally sick. If metal-specific genes are accessory-enriched, this resolves the paradox and identifies the true metal resistance determinants.

Key Findings

1. 55% of Metal-Important Genes Are Metal-Specific

Specificity breakdown by organism

Of the 7,609 metal-important gene records with fitness matrix data across 24 organisms, 4,177 (54.9%) are metal-specific — they show significant fitness defects under metal stress but a <5% sick rate across 5,945 non-metal experiments. The remaining genes split into general sick (2,888, 38.0%) and metal+stress (544, 7.2%). This classification is robust across thresholds: at 2% sick rate, ~41% are metal-specific; at 10%, ~67% are.

Coverage note: 7 of 31 metal-tested organisms (ANA3, Dino, Keio, MR1, Miya, PV4, SB2B) could not be processed because their metal-important gene locusIds did not match the fitness matrix index format. The 24 included organisms account for 7,609 of 12,838 metal-important gene records (59.3%). The excluded organisms are taxonomically diverse and their absence is not expected to introduce systematic bias.

Threshold sensitivity

Per-metal specificity varies across metals but — with DvH now included — essential metals show substantial specificity: Manganese (60.6%), Molybdenum (60.5%), Tungsten (56.5%), Selenium (46.4%). Toxic metals range from 42-56% metal-specific. Iron is lowest at 21.9%, likely reflecting its central role in core metabolism.

(Notebook: 02_gene_specificity.ipynb)

2. Metal-Specific Genes Are Core-Enriched but Less So Than General Sick Genes

Conservation by specificity

Core fractions across the 22 organisms with pangenome links, reported as both pooled (total core / total genes) and organism-mean (mean of per-organism core fractions):

Category Pooled Core Fraction Organism-Mean Core Fraction Mean Delta vs Baseline Positive/Total Significant/Total
Metal-specific 84.8% (2,969/3,500) 88.0% +6.9% 19/22 12/22
Metal+stress 94.3% (467/495) 93.6% +10.9% 13/13 1/13
General sick 90.2% (2,183/2,420) 90.2% +9.0% 21/21 8/21
Baseline 79.8% (73,957/92,650) 81.1%

The pooled metal-specific core fraction (84.8%) is lower than the organism-mean (88.0%), reflecting the influence of organisms with large gene sets and lower core fractions. Both metrics show the same pattern: metal-specific genes are core-enriched above baseline but less so than general sick genes.

All three categories are significantly core-enriched above baseline. Metal-specific genes are the least core-enriched of the three, consistent with specialized metal resistance mechanisms being slightly more likely to reside in the accessory genome than general stress functions — but the difference is modest.

CMH test: The Cochran-Mantel-Haenszel test comparing metal-specific vs general-sick core enrichment across organisms is statistically significant (p=0.011), confirming that metal-specific genes are less core-enriched than general sick genes across organisms. This is consistent with specialized metal resistance mechanisms being modestly more likely to reside in the accessory genome than general cellular functions.

Essential gene caveat: ~14% of protein-coding genes (~82% core) are putatively essential and absent from fitness data. This biases all categories toward core enrichment — the true baseline core fraction is likely lower than 81%, making the reported deltas conservative.

(Notebook: 03_conservation_analysis.ipynb)

3. Metal-Specific Genes Are Enriched for Metal Resistance Functions

Functional comparison

Metal-specific genes are 1.64x more likely to match metal-resistance keywords (efflux, transporter, metal, CDF, siderophore, etc.) than general sick genes (12.2% vs 7.8%, Fisher exact OR=1.64, p=2.4e-8). Conversely, general sick genes show slightly higher enrichment for general stress keywords (DNA repair, cell wall, chaperone, etc.) at 11.5% vs 13.7%. This confirms the specificity classification captures biologically meaningful categories.

(Notebook: 04_functional_enrichment.ipynb)

4. Top Novel Candidate Specificity

Candidate Metal-Specific / Total Fraction Mean Sick Rate
UCP030820 (OG01015, 3 orgs, 7 metals) 2/3 67% 0.021
YebC (OG01383, 11 orgs, 6 metals) 7/12 58% 0.056
DUF1043/YhcB (OG03264, 6 orgs, 5 metals) 3/6 50% 0.054
UPF0042/RapZ (OG02094, 8 orgs, 7 metals) 2/8 25% 0.130
MlaD (OG04003, 4 orgs, 4 metals) 1/4 25% 0.113
YfdZ (OG00391, 7 orgs, 9 metals) 2/13 15% 0.268
YrbC (OG02233, 8 orgs, 4 metals) 1/9 11% 0.234
DUF39 (OG08209, 2 orgs, 8 metals) 0/2 0% 0.637
YrbE (OG03534, 6 orgs, 5 metals) 0/6 0% 0.190

Three candidates show strong metal-specificity: UCP030820 (67%, oxidoreductase involved in sulfite reduction, important for 7 metals including Cd and Cr), YebC (58%, transcriptional regulator/translation factor spanning 11 organisms and 6 metals), and DUF1043/YhcB (50%, cell division/envelope coordination protein). These are primarily metal-specific rather than general stress genes.

YebC's metal-specificity is mechanistically intriguing. Ignatov et al. (2025) showed YebC functions as a translation factor for proline-rich proteins, resolving ribosome stalling at polyproline motifs. Many metal homeostasis proteins — including P-type ATPases (CopA, ZntA), CDF transporters (CzcD), and metal-binding chaperones — contain proline-rich regions in their cytoplasmic loops. A plausible hypothesis is that YebC is specifically required under metal stress because metal-induced demand for these proline-rich metal transporters creates a translation bottleneck that YebC resolves. Under non-metal conditions, these transporters are not highly expressed and YebC is dispensable — explaining its metal-specific fitness profile.

YfdZ and the Mla/Yrb system (YrbC/D/E) are more pleiotropic — sick under many non-metal conditions. YfdZ's high sick rate (0.268) reflects its known role in alanine biosynthesis. The Mla system's pleiotropic fitness defects are consistent with its established function in maintaining outer membrane integrity under diverse stresses.

DUF39 shows 0% metal-specificity (sick rate 0.637) — it is important for many conditions, not just metals. Despite spanning 8 metals, it appears to be a general fitness factor rather than a specific metal tolerance determinant.

(Notebook: 04_functional_enrichment.ipynb)

5. Novel Candidates Are Not Disproportionately Metal-Specific

Across all 149 novel metal candidate families, 45.6% have a dominant specificity of "metal-specific" — compared to 58.2% for annotated families (Fisher exact OR=0.60, p=0.003). Novel candidates are less metal-specific than annotated ones. This reflects the composition of the novel set: many novel candidates were identified in deeply-profiled organisms (DvH, Btheta, psRCH2) where the high experiment count provides more opportunities to detect pleiotropic effects, pushing genes toward "general sick."

(Notebook: 04_functional_enrichment.ipynb)

6. ICA Module Analysis: Inconclusive

The module-level specificity analysis using z-scored activity profiles found 0 metal-specific modules. The per-module z-normalization produces max |z| values < 2.0 for most metal experiments because metal experiments are a small fraction of total experiments per organism. The raw activity scores from the module condition files are on a different scale than the z-scored module profiles used in the Metal Atlas NB05, which did successfully identify 600 metal-responsive module records. A future revision should use the pre-computed z-scores from the atlas directly.

(Notebook: 04_functional_enrichment.ipynb)

7. Cross-Validation Against Counter Ion Effects

The counter_ion_effects project found 39.8% overlap between metal-important and NaCl-stress genes. This analysis finds 14.7% of metal-important genes are sick under osmotic stress — a 2.7x discrepancy. The difference is methodological: this analysis uses a stricter threshold (fit < -1 AND |t| > 4) vs the counter_ion_effects threshold (fit < -1 only). Additionally, the organism sets differ partially. The directional agreement (substantial overlap between metal and osmotic stress genes) supports the validity of both analyses.

(Notebook: 02_gene_specificity.ipynb)

Results

Gene Specificity Summary

Metric Value
Total experiments classified 6,504
Metal experiments 559 (8.6%)
Non-metal experiments 5,945 (91.4%)
Metal-important gene records analyzed 7,609 (of 12,838 atlas total, 59.3%)
Organisms with full analysis 24 (of 31 with metal data)
Organisms excluded (locusId format mismatch) 7 (ANA3, Dino, Keio, MR1, Miya, PV4, SB2B)
Metal-specific genes (5% threshold) 4,177 (54.9%)
Metal+stress genes 544 (7.2%)
General sick genes 2,888 (38.0%)

Per-Metal Specificity

Metal Category Metal-Specific / Total % Metal-Specific
Manganese essential 20/33 60.6%
Molybdenum essential 185/306 60.5%
Cadmium toxic 52/93 55.9%
Tungsten essential 173/306 56.5%
Copper toxic 1,346/2,594 51.9%
Cobalt toxic 1,167/2,324 50.2%
Chromium toxic 132/268 49.3%
Uranium toxic 88/181 48.6%
Zinc toxic 843/1,786 47.2%
Selenium essential 64/138 46.4%
Nickel toxic 993/2,271 43.7%
Aluminum toxic 752/1,772 42.4%
Mercury toxic 35/107 32.7%
Iron essential 144/659 21.9%

Conservation by Specificity (Organism-Mean)

Category Org-Mean Core Mean Delta Positive/Total Sig/Total CMH p-value
Metal-specific 88.0% +6.9% 19/22 12/22
Metal+stress 93.6% +10.9% 13/13 1/13
General sick 90.2% +9.0% 21/21 8/21
Metal-specific vs general-sick 0.011 *

Functional Enrichment

Category Metal-Resistance Keywords General-Stress Keywords N Annotated
Metal-specific 12.2% 13.7% 3,344
Metal+stress 8.9% 6.5% 495
General sick 7.8% 11.5% 2,573

H1c: Fisher exact (metal-resistance keywords: metal-specific vs general-sick): OR=1.64, p=2.4e-8
H1d: Fisher exact (novel vs annotated metal-specificity): OR=0.60, p=0.003

Interpretation

Metal-Specific Genes Exist and Are Functionally Distinct

The most important finding is that the specificity classification works — it separates biologically meaningful categories. Metal-specific genes are 1.64x enriched for metal-resistance annotations (p=2.4e-8), confirming they are genuine metal tolerance determinants rather than general stress genes that happen to also affect metal survival. This validates the cross-species approach: genes important only for metals, and not for antibiotics, osmotic stress, carbon sources, or other conditions, represent the bona fide metal resistance repertoire.

The Core Genome Robustness Model Holds

Metal-specific genes are core-enriched (88.0% vs 81.1% baseline), but significantly less so than general sick genes (90.2%; CMH p=0.011). This confirms a modest but real difference: specialized metal resistance genes are slightly more likely to be in the accessory genome than general stress genes. This means:

  1. The Metal Atlas's 87.4% core finding is not an artifact. It was not inflated by general stress genes. Even after removing all pleiotropic genes, the remaining metal-specific set is 88% core.
  2. Metal resistance is predominantly a core genome function, but significantly less so than general stress response (CMH p=0.011). Specialized metal resistance genes are still 88% core — overwhelmingly conserved — but the 2% gap from general sick genes (90.2%) represents a detectable signal of accessory genome contribution.
  3. The two-tier model from the Metal Atlas is partially supported. Tier 1 (general stress) is more core than Tier 2 (specific resistance), as hypothesized. But the effect is modest: both tiers are strongly core-enriched, and the accessory genome contributes only a small fraction of metal resistance machinery.

Candidate Prioritization

The specificity analysis reshuffles the priority of the novel candidates identified by the Metal Atlas:

  • UCP030820, YebC, and DUF1043 emerge as the strongest candidates: high metal-specificity (50-67%), low pleiotropic effects, validated across multiple species
  • YfdZ and Mla/Yrb are deprioritized: high pleiotropic effects suggest they are general fitness factors with incidental metal phenotypes
  • DUF39 despite spanning 8 metals, is a general fitness factor (sick rate 0.637), not metal-specific

Limitations

  • 40.7% gene attrition: 7 organisms and 5,229 gene records were excluded due to locusId format mismatches between the metal atlas and fitness matrices. These excluded organisms include Keio (E. coli), MR1 (Shewanella), and ANA3 — all important model organisms. The excluded genes may have different specificity profiles.
  • ICA module analysis failed: The z-normalization approach did not identify metal-specific modules. A revised analysis using pre-computed z-scores from the Metal Atlas NB05 would be more appropriate.
  • Counter-ion cross-validation shows 2.7x discrepancy: Methodological differences (threshold stringency) explain most of this gap, but matching the counter_ion_effects methodology exactly would strengthen the validation.
  • specificphenotype table validation not performed: The planned validation against the Fitness Browser's built-in condition-specificity annotations was not completed. This remains a valuable future validation step.
  • Essential genes invisible: ~14% of genes (~82% core) are absent from fitness data, biasing all conservation estimates toward core enrichment.
  • Threshold sensitivity: The 5% sick-rate threshold is arbitrary. Results are qualitatively stable across 1-20% but exact fractions vary.

Future Directions

  1. Fix locusId format for excluded organisms: Resolve the integer-vs-string mismatch for ANA3, Dino, Keio, MR1, Miya, PV4, SB2B to recover the remaining 40% of gene records.
  2. Validate against specificphenotype table: Use the Fitness Browser's built-in condition-specificity annotations as an independent validation.
  3. Metal-specific module re-analysis: Use pre-computed z-scored module activities from the Metal Atlas NB05.
  4. Replicate counter_ion_effects threshold: Match the exact methodology (fit < -1 without |t| > 4) to validate the osmotic overlap.
  5. Structural analysis: Use AlphaFold predictions to identify metal-binding sites in the top metal-specific candidates (UCP030820, YebC, DUF1043).

Data

Generated Data

File Rows Description
data/experiment_classification.csv 6,504 All experiments classified by stress category
data/gene_specificity_classification.csv 7,609 Per-gene specificity with sick rates and category counts
data/metal_genes_with_specificity.csv 12,838 Metal-important genes joined with specificity (NaN for excluded organisms)
data/specificity_conservation.csv 56 Per-organism per-category conservation statistics
data/og_specificity.csv 2,891 Per-OG family specificity (dominant category + fraction)

Figures

Figure Description
experiment_classification.png Experiment category distribution and per-organism breakdown
specificity_breakdown.png Stacked bar: metal-specific vs general sick by organism
threshold_sensitivity.png Classification stability across 1-20% sick rate thresholds
conservation_by_specificity.png Core fraction boxplots by specificity category
functional_comparison.png Keyword enrichment and module specificity

References

  • Price MN et al. (2018). "Mutant phenotypes for thousands of bacterial genes of unknown function." Nature 557:503-509. PMID: 29769716
  • Wu et al. (2019). "The RuvRCAB operon contributes to resistance against Cr(VI), As(III), Sb(III), and Cd(II)." Appl Microbiol Biotechnol 103:2489-2500. PMID: 30729256
  • Ignatov et al. (2025). "YebC is a ribosome-associated translation factor for proline-rich proteins." Nature Communications. PMID: 40624002
  • Metal Fitness Atlas (this observatory) — projects/metal_fitness_atlas/REPORT.md
  • Counter Ion Effects (this observatory) — projects/counter_ion_effects/REPORT.md

Discoveries

Of 7,609 metal-important gene records across 24 organisms, 4,177 (55%) are sick under metal stress but have a <5% sick rate across 5,945 non-metal experiments (antibiotics, osmotic, carbon sources, etc.). The remaining 38% are "general sick" (important across many conditions) and 7% are "metal+stres

Read more →

Metal-specific genes are 84.8% core (pooled) vs 90.2% for general sick genes — a statistically significant difference (Cochran-Mantel-Haenszel p=0.011). Both categories are enriched above the 79.8% baseline. This partially supports the Metal Atlas's two-tier model: general stress response (Tier 1) i

Read more →

Among the top novel candidates from the Metal Atlas, three stand out as metal-specific rather than pleiotropic: UCP030820/OG01015 (67% metal-specific, oxidoreductase, 7 metals), YebC/OG01383 (58%, transcriptional regulator/translation factor, 6 metals, 11 organisms), and DUF1043-YhcB/OG03264 (50%, c

Read more →

YebC was recently shown to be a translation factor for proline-rich proteins (Ignatov et al. 2025). Many metal homeostasis proteins (P-type ATPases CopA/ZntA, CDF transporters) contain proline-rich cytoplasmic loops. Hypothesis: YebC is specifically needed under metal stress because upregulation of

Read more →

Initial analysis showed 0% metal-specificity for essential metals (Mo, W, Se, Mn), seemingly due to DvH's 608 non-metal experiments making specificity impossible. After fixing a locusId type mismatch that had silently excluded DvH, essential metals show 47-61% metal-specificity: Manganese (61%), Mol

Read more →

Review

AI Review BERIL Automated Review 2026-04-03 Reviewed

Summary

This project classifies 7,609 metal-important genes across 24 organisms as metal-specific (55%), metal+stress (7%), or general sick (38%) based on fitness profiles across 5,945 non-metal experiments, then tests whether metal-specific genes show distinct conservation and functional enrichment patterns. All critical issues from the prior review have been resolved: the CMH p-value now correctly reads 0.011 (matching NB03 cell 9 output), the Fisher exact tests use distinct variables and report correct values, DvH is included, and the novel candidate table is complete. Only minor issues remain.

This review was generated by an AI system. It should be treated as advisory input, not a definitive assessment.

Visualizations

Conservation By Specificity

Conservation By Specificity

Experiment Classification

Experiment Classification

Functional Comparison

Functional Comparison

Specificity Breakdown

Specificity Breakdown

Threshold Sensitivity

Threshold Sensitivity

Notebooks