# ColabFold Memory Consumption Analysis for Proteome Structure Prediction
**Objective:** Assess memory and GPU requirements for bacterial ColabFold workflows to inform hardware procurement for large-scale proteome analysis.

- **Samples provided by:** Najwa T
- **Parameters provided by:** Benjamin B
- **Environment:** GenSoft HPC cluster (Slurm-managed)
## 🔧 Software Versions

```bash
module load MMseqs2/15-6f452 Kalign/2.04 cuda/11.6 cudnn/11.x-v8.7.0.84 ColabFold/1.5.3
module load blast/2.2.26 dssp/2.2.1 psipred/4.02 gcc/9.2.0 openmpi/4.0.5 hhsuite/3.3.0
```
## 🧪 Step 1: CPU-Based Search (MSA Generation)

### Command
```bash
srun -p common -q fast --mem=128G -c 8 \
  colabfold_search \
  --threads 8 \
  --db-load-mode 2 \
  --use-templates 1 \
  --db2 pdb100_230517 \
  Proteomes_for_Structure/GCA000196215_Borreliella_bavariensis.fasta \
  ${COLABFOLD_DB} \
  msas
```
### Key Observations

- The proteome is split into ~900 subsamples.
- Output: ~900 `.a3m` files and ~900 `.m8` files.
- Some `.m8` files are empty; these sequences will not proceed to the folding stage (see the triage sketch after this list).
- Performance does not scale well beyond 8–16 CPU cores due to I/O bottlenecks.
- Recommended: use 8–16 cores for optimal throughput.
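Since only entries with template hits move on to folding, it can save GPU time to flag the empty `.m8` files up front. A minimal sketch, assuming the `msas/` output layout produced by the command above:

```bash
# Flag template-hit files that came back empty; the matching .a3m
# entries will not proceed to the folding stage.
for hits in msas/*.m8; do
    if [ ! -s "$hits" ]; then            # -s: true only if the file is non-empty
        echo "no template hits: $(basename "$hits" .m8)"
    fi
done
```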
## 🚀 Step 2: GPU-Based Folding (Structure Prediction)

### Command
```bash
srun --input none -A admin -p gpu -q gpu \
  --gres=gpu:A40:1 --mem=64G \
  colabfold_batch \
  --num-recycle 12 \
  --templates \
  --pdb-hit-file msas/AAU07627.1_pdb100_230517.m8 \
  --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ \
  msas/AAU07627.1.a3m \
  xxresults7627a40
```
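The command above folds a single entry (`AAU07627.1`). To process all ~900 subsamples, one option is a sequential loop over the `msas/` directory; this is a sketch, assuming the Step 1 file naming (`<id>.a3m` plus `<id>_pdb100_230517.m8`) and a hypothetical `results/` output root:

```bash
# Fold every MSA from Step 1, skipping entries with no template hits.
for a3m in msas/*.a3m; do
    id=$(basename "$a3m" .a3m)
    hits="msas/${id}_pdb100_230517.m8"
    [ -s "$hits" ] || continue           # skip empty .m8 (no templates found)
    srun --input none -p gpu -q gpu --gres=gpu:A40:1 --mem=64G \
        colabfold_batch \
        --num-recycle 12 \
        --templates \
        --pdb-hit-file "$hits" \
        --local-pdb-path "${COLABFOLD_DB}/pdb/divided/" \
        "$a3m" \
        "results/${id}"
done
```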
### Performance on A40 / A100

- Runtime: ~10 minutes
- VRAM usage: 4–9 GB (measured via `nvtop`)
- RAM usage: ~16 GB
- ⚠️ Note: `nvtop` values differ from `reportseff`; trust `nvtop` for real-time GPU memory usage.
- ✅ All models completed successfully.
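For an unattended record of peak VRAM (rather than watching `nvtop` live), `nvidia-smi` can log memory usage alongside the run; a minimal sketch:

```bash
# Log GPU memory every 5 seconds to a CSV while the folding job runs.
nvidia-smi --query-gpu=timestamp,memory.used,memory.total \
    --format=csv -l 5 > vram_usage.csv &
monitor_pid=$!

colabfold_batch ...   # the folding command shown above

kill "$monitor_pid"   # stop sampling once folding finishes
```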
## 🧫 Case Study: *Mycobacterium* Proteome (Larger Samples)

### Dataset Overview
- 4,000+ samples post-MSA generation
- Largest sample: `2093.a3m` (~235 MB)
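Since `.a3m` size is the practical predictor of VRAM demand in these tests, ranking the inputs up front shows which samples need the larger GPU; for example:

```bash
# Rank MSAs by size (MB); anything above ~200 MB should target an 80 GB GPU.
du -m msasmyc/*.a3m | sort -rn | head
```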
### ❌ Failure on A40 (48 GB GPU)
```bash
srun --input none -p gpu -q gpu \
  --gres=gpu:A40:1 --mem=128G \
  colabfold_batch \
  --num-recycle 12 \
  --templates \
  --pdb-hit-file msasmyc/pdb100_230517.m8 \
  --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ \
  msasmyc/2093.a3m \
  results2093
```
### Error

```text
2025-04-16 09:48:32.161022: E ...cuda_driver.cc:628] failed to get PTX kernel "fusion_413" from module: CUDA_ERROR_OUT_OF_MEMORY: out of memory
...
Could not predict 2093. Not Enough GPU memory? INTERNAL: Could not find the corresponding function
```
❌ **Failed:** The A40 cannot handle this large MSA due to VRAM limits.
### ✅ Success on A100 (80 GB GPU)
```bash
srun --input none -p gpu -q gpu \
  --gres=gpu:1,gmem:50G --mem=128G \
  colabfold_batch \
  --num-recycle 12 \
  --templates \
  --pdb-hit-file msasmyc/pdb100_230517.m8 \
  --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ \
  msasmyc/2093.a3m \
  results80G-2093
```
### Output

```text
2025-04-16 12:24:41,378 reranking models by 'plddt' metric
2025-04-16 12:24:41,378 rank_001_alphafold2_ptm_model_3_seed_000 pLDDT=78.5 pTM=0.482
...
2025-04-16 12:25:17,233 Done
```
### VRAM Usage

- Model 1: ~56 GB
- Model 5 (last): ~71 GB
- All 5 models completed successfully.
✅ **Conclusion:** Large MSAs (>200 MB) require 80 GB GPUs (A100).
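One way to act on this cutoff is to pick the GPU request per sample from its MSA size at submission time. A hypothetical dispatcher sketch; the 200 MB threshold comes from the tests above, and the `--gres` strings mirror the commands in this report:

```bash
#!/bin/bash
# Route each sample to an A40 or an 80 GB A100 based on its MSA size.
threshold_mb=200
for a3m in msasmyc/*.a3m; do
    size_mb=$(( $(stat -c%s "$a3m") / 1024 / 1024 ))
    if [ "$size_mb" -gt "$threshold_mb" ]; then
        gres="gpu:1,gmem:50G"        # routed to an 80 GB A100 via the gmem constraint
    else
        gres="gpu:A40:1"             # small/medium MSAs fit on an A40
    fi
    srun --input none -p gpu -q gpu --gres="$gres" --mem=128G \
        colabfold_batch --num-recycle 12 --templates \
        --pdb-hit-file msasmyc/pdb100_230517.m8 \
        --local-pdb-path "${COLABFOLD_DB}/pdb/divided/" \
        "$a3m" "results_$(basename "$a3m" .a3m)"
done
```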
## 📌 Additional Test: `103.a3m` (Second Largest)
### ❌ Failure on L40S (likely due to outdated CUDA driver)
```text
srun: job 21269959 has been allocated resources
...
2025-04-17 00:12:23.656181: E ...cuda_driver.cc:628] failed to get PTX kernel "concatenate" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
...
Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
```
⚠️ Likely caused by an incompatible CUDA driver/toolkit: the loaded `cuda/11.6` module ships a `ptxas` that predates support for the L40S's compute capability 8.9 (Ada Lovelace).
### ✅ Success on A40

```text
... runs successfully ...
2025-04-16 21:15:27,852 reranking models by 'plddt' metric
2025-04-16 21:15:27,853 rank_001_alphafold2_ptm_model_3_seed_000 pLDDT=82.7 pTM=0.509
...
2025-04-16 21:15:41,488 Done
```
✅ The A40 handles `103.a3m` fine; the L40S may need a CUDA driver update.
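Before rerunning on the L40S, it is worth confirming what the node's driver actually reports. A minimal sketch; the `L40S` gres name is assumed by analogy with this cluster's `A40` convention, and the `compute_cap` query field requires a reasonably recent `nvidia-smi`:

```bash
# Print GPU model, driver version, and compute capability on an L40S node.
srun -p gpu -q gpu --gres=gpu:L40S:1 \
    nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
```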
## ⚙️ Multi-Instance GPU Usage

Running two ColabFold jobs simultaneously on the same GPU:
```bash
# Job 1
srun -p common -q fast --mem=128G -c 8 \
  colabfold_search ... msas

# Job 2
srun --input none -p gpu -q gpu --gres=gpu:1 --mem=128G \
  colabfold_batch ... results
```
### Result

- ✅ Jobs run concurrently
- ⚠️ Folding slows down by ~2×
- Not recommended for performance-critical workflows

💡 **Best Practice:** Avoid overlapping GPU jobs unless necessary; when ordering matters, chain the jobs instead (see the sketch below).
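A sketch of the chaining approach, using Slurm's `--dependency` so the folding job starts only after the search job succeeds (the `...` elisions stand for the arguments shown earlier):

```bash
# Chain search -> fold instead of running them concurrently.
search_id=$(sbatch --parsable -p common -q fast --mem=128G -c 8 \
    --wrap "colabfold_search ... msas")

sbatch --dependency=afterok:"$search_id" \
    -p gpu -q gpu --gres=gpu:1 --mem=128G \
    --wrap "colabfold_batch ... results"
```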
## 📊 Summary & Recommendations

| Task | Required GPU | VRAM | Notes |
|---|---|---|---|
| Small MSA (<100 MB) | A40 | 4–9 GB | Works well |
| Large MSA (>200 MB) | A100 (80 GB) | 56–71 GB | A40 fails |
| Multiple jobs on same GPU | A100 | ≥80 GB | Slower; avoid |
| L40S usage | ❌ | ❌ | Requires CUDA update |
### Hardware Procurement Recommendations

- **Primary:** A100 80 GB GPUs for large proteome analysis.
- **Secondary:** A40 for small and medium-sized MSAs.
- **Avoid:** L40S unless the CUDA driver is updated.
- **Memory:** Ensure ≥128 GB RAM per node for CPU+GPU workflows.
## 📝 Final Notes

- Use `--input none` in `srun` to prevent hanging during `kalign` calls.
- Monitor VRAM with `nvtop`; `reportseff` underreports.
- Always test large MSAs on target hardware before full-scale runs.
✉️ **Contact:** For further testing or support, reach out to the bioinformatics team.