ColabFold Memory Consumption Analysis for Proteome Structure Prediction#

Objective: Assess memory and GPU requirements for bacterial ColabFold workflows to inform hardware procurement for large-scale proteome analysis.
Samples provided by: Najwa T
Parameters provided by: Benjamin B
Environment: GenSoft HPC cluster (Slurm-managed)


🔧 Software Versions#

module load MMseqs2/15-6f452 Kalign/2.04 cuda/11.6 cudnn/11.x-v8.7.0.84 ColabFold/1.5.3
module load blast/2.2.26 dssp/2.2.1 psipred/4.02 gcc/9.2.0 openmpi/4.0.5 hhsuite/3.3.0

🧪 Step 1: CPU-Based Search (MSA Generation)#

Command:#

srun -p common -q fast --mem=128G -c 8 \
  colabfold_search \
    --threads 8 \
    --db-load-mode 2 \
    --use-templates 1 \
    --db2 pdb100_230517 \
    Proteomes_for_Structure/GCA000196215_Borreliella_bavariensis.fasta \
    ${COLABFOLD_DB} \
    msas

Key Observations:#

  • The proteome is split into ~900 subsamples.
  • Output: ~900 .a3m files and ~900 .m8 files.
  • Some .m8 files are empty; these sequences have no template hits and will not proceed to the folding stage.
  • Performance does not scale well beyond 8–16 CPU cores due to I/O bottlenecks.
  • Recommended: use 8–16 cores for optimal throughput.
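The empty-.m8 triage above can be scripted before any folding jobs are launched. A minimal sketch in Python, assuming the output layout described above (one `.m8` per subsample in the `msas` directory); the function name is ours, not part of ColabFold:

```python
from pathlib import Path

def split_by_template_hits(msa_dir):
    """Partition MSA outputs by whether the paired .m8 template-hit file
    is non-empty. Samples with an empty .m8 have no template hits and
    will not proceed to the folding stage."""
    with_hits, empty = [], []
    for m8 in Path(msa_dir).glob("*.m8"):
        (with_hits if m8.stat().st_size > 0 else empty).append(m8.name)
    return sorted(with_hits), sorted(empty)
```

Running this on `msas/` gives a quick count of how many of the ~900 subsamples will actually reach the folding stage.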

🚀 Step 2: GPU-Based Folding (Structure Prediction)#

Command:#

srun --input none -A admin -p gpu -q gpu \
  --gres=gpu:A40:1 --mem=64G \
  colabfold_batch \
    --num-recycle 12 \
    --templates \
    --pdb-hit-file msas/AAU07627.1_pdb100_230517.m8 \
    --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ \
    msas/AAU07627.1.a3m \
    xxresults7627a40
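Submitting one such folding job per sample can be scripted by pairing each `.a3m` with its `.m8` hit file, following the naming convention visible above (`<id>.a3m` / `<id>_pdb100_230517.m8`). A hedged sketch; the helper name and output directory layout are our assumptions:

```python
from pathlib import Path

def build_fold_commands(msa_dir, colabfold_db, out_root, pdb_db="pdb100_230517"):
    """Build one colabfold_batch argument list per .a3m file, mirroring
    the srun invocation above. Samples whose .m8 file is missing or
    empty are folded without templates."""
    commands = []
    for a3m in sorted(Path(msa_dir).glob("*.a3m")):
        m8 = a3m.with_name(f"{a3m.stem}_{pdb_db}.m8")
        cmd = ["colabfold_batch", "--num-recycle", "12"]
        if m8.exists() and m8.stat().st_size > 0:
            cmd += ["--templates",
                    "--pdb-hit-file", str(m8),
                    "--local-pdb-path", f"{colabfold_db}/pdb/divided/"]
        cmd += [str(a3m), f"{out_root}/{a3m.stem}"]
        commands.append(cmd)
    return commands
```

Each argument list can then be wrapped in the appropriate `srun`/`sbatch` submission for the GPU partition.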

Performance on A40 / A100:#

  • Runtime: ~10 minutes
  • VRAM Usage: 4–9 GB (measured via nvtop)
  • RAM Usage: ~16 GB
  • ⚠️ Note: nvtop values differ from reportseff; trust nvtop for real-time GPU memory usage.

✅ All models completed successfully.


🧫 Case Study: Mycobacterium Proteome (Larger Samples)#

Dataset Overview:#

  • 4,000+ samples post-MSA generation
  • Largest sample: 2093.a3m (~235 MB)
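To spot which of the 4,000+ samples are at risk of exhausting VRAM, the `.a3m` files can be ranked by size before submission. A minimal sketch; the 200 MB cutoff is a heuristic derived from the failures documented in this report, not a hard limit:

```python
from pathlib import Path

def rank_msas_by_size(msa_dir, threshold_mb=200):
    """Return (name, size_mb, needs_big_gpu) for each .a3m file,
    largest first. MSAs above the threshold are flagged for an
    80 GB GPU instead of an A40."""
    entries = []
    for a3m in Path(msa_dir).glob("*.a3m"):
        size_mb = a3m.stat().st_size / 2**20
        entries.append((a3m.name, size_mb, size_mb > threshold_mb))
    return sorted(entries, key=lambda e: e[1], reverse=True)
```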

โŒ Failure on A40 (40 GB GPU)#

srun --input none -p gpu -q gpu \
  --gres=gpu:A40:1 --mem=128G \
  colabfold_batch \
    --num-recycle 12 \
    --templates \
    --pdb-hit-file msasmyc/pdb100_230517.m8 \
    --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ \
    msasmyc/2093.a3m \
    results2093

Error:#

2025-04-16 09:48:32.161022: E ...cuda_driver.cc:628] failed to get PTX kernel "fusion_413" from module: CUDA_ERROR_OUT_OF_MEMORY: out of memory
...
Could not predict 2093. Not Enough GPU memory? INTERNAL: Could not find the corresponding function

🛑 Failed: the A40 cannot handle this large MSA due to VRAM limits.


✅ Success on A100 (80 GB GPU)#

srun --input none -p gpu -q gpu \
  --gres=gpu:1,gmem:50G --mem=128G \
  colabfold_batch \
    --num-recycle 12 \
    --templates \
    --pdb-hit-file msasmyc/pdb100_230517.m8 \
    --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ \
    msasmyc/2093.a3m \
    results80G-2093

Output:#

2025-04-16 12:24:41,378 reranking models by 'plddt' metric
2025-04-16 12:24:41,378 rank_001_alphafold2_ptm_model_3_seed_000 pLDDT=78.5 pTM=0.482
...
2025-04-16 12:25:17,233 Done

VRAM Usage:#

  • Model 1: ~56 GB
  • Model 5 (last): ~71 GB
  • All 5 models completed successfully.

✅ Conclusion: Large MSAs (>200 MB) require an 80 GB GPU (A100).
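Based on these measurements, GPU routing reduces to a simple size check. A sketch of the decision; the `--gres` string for the A100 is an illustrative assumption for a Slurm setup like the one above:

```python
def pick_gpu(a3m_size_mb, threshold_mb=200):
    """Route a sample to a GPU class: MSAs over ~200 MB ran out of
    memory on the A40 but peaked at 56-71 GB VRAM on an 80 GB A100.
    The 200 MB cutoff is a heuristic, not a hard limit."""
    if a3m_size_mb > threshold_mb:
        return "A100", "--gres=gpu:A100:1"  # illustrative gres string
    return "A40", "--gres=gpu:A40:1"
```

For example, the 235 MB `2093.a3m` routes to the A100 queue, while typical Borreliella samples stay on the A40.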


๐Ÿ” Additional Test: 103.a3m (Second Largest)#

โŒ Failure on L40S (likely due to outdated CUDA driver)#

srun: job 21269959 has been allocated resources
...
2025-04-17 00:12:23.656181: E ...cuda_driver.cc:628] failed to get PTX kernel "concatenate" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
...
Execution of replica 0 failed: INTERNAL: Could not find the corresponding function

โš ๏ธ Likely caused by incompatible CUDA driver version (old ptxas, unsupported compute capability 8.9).

✅ Success on A40

... runs successfully ...
2025-04-16 21:15:27,852 reranking models by 'plddt' metric
2025-04-16 21:15:27,853 rank_001_alphafold2_ptm_model_3_seed_000 pLDDT=82.7 pTM=0.509
...
2025-04-16 21:15:41,488 Done

✅ A40 handles 103.a3m fine; the L40S likely needs a newer CUDA toolchain.


โš™๏ธ Multi-Instance GPU Usage#

Running two ColabFold jobs concurrently (here, an MSA search alongside a GPU folding job):

# Job 1
srun -p common -q fast --mem=128G -c 8 \
  colabfold_search ... msas

# Job 2
srun --input none -p gpu -q gpu --gres=gpu:1 --mem=128G \
  colabfold_batch ... results

Result:#

  • ✅ Jobs run concurrently
  • ⚠️ Folding slows down by ~2x
  • Not recommended for performance-critical workflows

💡 Best Practice: Avoid overlapping GPU jobs unless necessary.


📊 Summary & Recommendations#

| Task                      | Required GPU | VRAM     | Notes                |
|---------------------------|--------------|----------|----------------------|
| Small MSA (<100 MB)       | A40          | 4–9 GB   | Works well           |
| Large MSA (>200 MB)       | A100 (80 GB) | 56–71 GB | A40 fails (OOM)      |
| Multiple jobs on same GPU | A100         | ≥80 GB   | Slower; avoid        |
| L40S usage                | ❌           | n/a      | Requires CUDA update |

Hardware Procurement Recommendations:#

  • Primary: A100 80 GB GPUs for large proteome analysis.
  • Secondary: A40 for smaller/medium-sized MSAs.
  • Avoid: L40S unless CUDA driver is updated.
  • Memory: Ensure ≥128 GB RAM per node for CPU+GPU workflows.

📌 Final Notes#

  • Use --input none in srun to prevent hanging during kalign calls.
  • Monitor VRAM with nvtop; reportseff underreports GPU memory usage.
  • Always test large MSAs on target hardware before full-scale runs.

โœ‰๏ธ Contact: For further testing or support, reach out to the bioinformatics team.