# ColabFold Memory Consumption Analysis for Proteome Structure Prediction
**Objective:** Assess memory and GPU requirements for bacterial ColabFold workflows to inform hardware procurement for large-scale proteome analysis.

- **Samples provided by:** Najwa T
- **Parameters provided by:** Benjamin B
- **Environment:** GenSoft HPC cluster (Slurm-managed)
## 🔧 Software Versions

```bash
module load MMseqs2/15-6f452 Kalign/2.04 cuda/11.6 cudnn/11.x-v8.7.0.84 ColabFold/1.5.3
module load blast/2.2.26 dssp/2.2.1 psipred/4.02 gcc/9.2.0 openmpi/4.0.5 hhsuite/3.3.0
```
## 🧪 Step 1: CPU-Based Search (MSA Generation)

### Command
```bash
srun -p common -q fast --mem=128G -c 8 \
  colabfold_search \
  --threads 8 \
  --db-load-mode 2 \
  --use-templates 1 \
  --db2 pdb100_230517 \
  Proteomes_for_Structure/GCA000196215_Borreliella_bavariensis.fasta \
  ${COLABFOLD_DB} \
  msas
```
### Key Observations

- The proteome is split into ~900 subsamples.
- Output: ~900 `.a3m` files and ~900 `.m8` files.
- Some `.m8` files are empty; these sequences will not proceed to the folding stage (see the triage sketch after this list).
- Performance does not scale well beyond 8–16 CPU cores due to I/O bottlenecks.
- Recommended: use 8–16 cores for optimal throughput.
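Since only entries with template hits move on to folding, it can save GPU time to flag the empty `.m8` files up front. A minimal sketch, assuming the `msas/` output layout produced by the command above:

```bash
# Flag template-hit files that came back empty; the matching .a3m
# entries will not proceed to the folding stage.
for hits in msas/*.m8; do
    if [ ! -s "$hits" ]; then            # -s: true only if the file is non-empty
        echo "no template hits: $(basename "$hits" .m8)"
    fi
done
```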
## 🚀 Step 2: GPU-Based Folding (Structure Prediction)

### Command
```bash
srun --input none -A admin -p gpu -q gpu \
  --gres=gpu:A40:1 --mem=64G \
  colabfold_batch \
  --num-recycle 12 \
  --templates \
  --pdb-hit-file msas/AAU07627.1_pdb100_230517.m8 \
  --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ \
  msas/AAU07627.1.a3m \
  xxresults7627a40
```
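The command above folds a single entry (`AAU07627.1`). To process all ~900 subsamples, one option is a sequential loop over the `msas/` directory; this is a sketch, assuming the Step 1 file naming (`<id>.a3m` plus `<id>_pdb100_230517.m8`) and a hypothetical `results/` output root:

```bash
# Fold every MSA from Step 1, skipping entries with no template hits.
for a3m in msas/*.a3m; do
    id=$(basename "$a3m" .a3m)
    hits="msas/${id}_pdb100_230517.m8"
    [ -s "$hits" ] || continue           # skip empty .m8 (no templates found)
    srun --input none -p gpu -q gpu --gres=gpu:A40:1 --mem=64G \
        colabfold_batch \
        --num-recycle 12 \
        --templates \
        --pdb-hit-file "$hits" \
        --local-pdb-path "${COLABFOLD_DB}/pdb/divided/" \
        "$a3m" \
        "results/${id}"
done
```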
### Performance on A40 / A100

- Runtime: ~10 minutes
- VRAM usage: 4–9 GB (measured via `nvtop`)
- RAM usage: ~16 GB
- ⚠️ Note: `nvtop` values differ from `reportseff`; trust `nvtop` for real-time GPU memory usage.
- ✅ All models completed successfully.
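For an unattended record of peak VRAM (rather than watching `nvtop` live), `nvidia-smi` can log memory usage alongside the run; a minimal sketch:

```bash
# Log GPU memory every 5 seconds to a CSV while the folding job runs.
nvidia-smi --query-gpu=timestamp,memory.used,memory.total \
    --format=csv -l 5 > vram_usage.csv &
monitor_pid=$!

colabfold_batch ...   # the folding command shown above

kill "$monitor_pid"   # stop sampling once folding finishes
```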
## 🧫 Case Study: *Mycobacterium* Proteome (Larger Samples)

### Dataset Overview
- 4,000+ samples post-MSA generation
- Largest sample: `2093.a3m` (~235 MB)
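Since `.a3m` size is the practical predictor of VRAM demand in these tests, ranking the inputs up front shows which samples need the larger GPU; for example:

```bash
# Rank MSAs by size (MB); anything above ~200 MB should target an 80 GB GPU.
du -m msasmyc/*.a3m | sort -rn | head
```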
### ❌ Failure on A40 (48 GB GPU)
```bash
srun --input none -p gpu -q gpu \
  --gres=gpu:A40:1 --mem=128G \
  colabfold_batch \
  --num-recycle 12 \
  --templates \
  --pdb-hit-file msasmyc/pdb100_230517.m8 \
  --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ \
  msasmyc/2093.a3m \
  results2093
```
### Error

```text
2025-04-16 09:48:32.161022: E ...cuda_driver.cc:628] failed to get PTX kernel "fusion_413" from module: CUDA_ERROR_OUT_OF_MEMORY: out of memory
...
Could not predict 2093. Not Enough GPU memory? INTERNAL: Could not find the corresponding function
```
❌ **Failed:** The A40 cannot handle this large MSA due to VRAM limits.
### ✅ Success on A100 (80 GB GPU)
```bash
srun --input none -p gpu -q gpu \
  --gres=gpu:1,gmem:50G --mem=128G \
  colabfold_batch \
  --num-recycle 12 \
  --templates \
  --pdb-hit-file msasmyc/pdb100_230517.m8 \
  --local-pdb-path ${COLABFOLD_DB}/pdb/divided/ \
  msasmyc/2093.a3m \
  results80G-2093
```
### Output

```text
2025-04-16 12:24:41,378 reranking models by 'plddt' metric
2025-04-16 12:24:41,378 rank_001_alphafold2_ptm_model_3_seed_000 pLDDT=78.5 pTM=0.482
...
2025-04-16 12:25:17,233 Done
```
### VRAM Usage

- Model 1: ~56 GB
- Model 5 (last): ~71 GB
- All 5 models completed successfully.
✅ **Conclusion:** Large MSAs (>200 MB) require 80 GB GPUs (A100).
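One way to act on this cutoff is to pick the GPU request per sample from its MSA size at submission time. A hypothetical dispatcher sketch; the 200 MB threshold comes from the tests above, and the `--gres` strings mirror the commands in this report:

```bash
#!/bin/bash
# Route each sample to an A40 or an 80 GB A100 based on its MSA size.
threshold_mb=200
for a3m in msasmyc/*.a3m; do
    size_mb=$(( $(stat -c%s "$a3m") / 1024 / 1024 ))
    if [ "$size_mb" -gt "$threshold_mb" ]; then
        gres="gpu:1,gmem:50G"        # routed to an 80 GB A100 via the gmem constraint
    else
        gres="gpu:A40:1"             # small/medium MSAs fit on an A40
    fi
    srun --input none -p gpu -q gpu --gres="$gres" --mem=128G \
        colabfold_batch --num-recycle 12 --templates \
        --pdb-hit-file msasmyc/pdb100_230517.m8 \
        --local-pdb-path "${COLABFOLD_DB}/pdb/divided/" \
        "$a3m" "results_$(basename "$a3m" .a3m)"
done
```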
## 📌 Additional Test: `103.a3m` (Second Largest)
### ❌ Failure on L40S (likely due to outdated CUDA driver)
```text
srun: job 21269959 has been allocated resources
...
2025-04-17 00:12:23.656181: E ...cuda_driver.cc:628] failed to get PTX kernel "concatenate" from module: CUDA_ERROR_NOT_FOUND: named symbol not found
...
Execution of replica 0 failed: INTERNAL: Could not find the corresponding function
```
⚠️ Likely caused by an incompatible CUDA driver/toolkit: the loaded `cuda/11.6` module ships a `ptxas` that predates support for the L40S's compute capability 8.9 (Ada Lovelace).
### ✅ Success on A40

```text
... runs successfully ...
2025-04-16 21:15:27,852 reranking models by 'plddt' metric
2025-04-16 21:15:27,853 rank_001_alphafold2_ptm_model_3_seed_000 pLDDT=82.7 pTM=0.509
...
2025-04-16 21:15:41,488 Done
```
✅ The A40 handles `103.a3m` fine; the L40S may need a CUDA driver update.
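Before rerunning on the L40S, it is worth confirming what the node's driver actually reports. A minimal sketch; the `L40S` gres name is assumed by analogy with this cluster's `A40` convention, and the `compute_cap` query field requires a reasonably recent `nvidia-smi`:

```bash
# Print GPU model, driver version, and compute capability on an L40S node.
srun -p gpu -q gpu --gres=gpu:L40S:1 \
    nvidia-smi --query-gpu=name,driver_version,compute_cap --format=csv
```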
## ⚙️ Multi-Instance GPU Usage

Running two ColabFold jobs simultaneously on the same GPU:
```bash
# Job 1
srun -p common -q fast --mem=128G -c 8 \
  colabfold_search ... msas

# Job 2
srun --input none -p gpu -q gpu --gres=gpu:1 --mem=128G \
  colabfold_batch ... results
```
### Result

- ✅ Jobs run concurrently
- ⚠️ Folding slows down by ~2×
- Not recommended for performance-critical workflows

💡 **Best Practice:** Avoid overlapping GPU jobs unless necessary; when ordering matters, chain the jobs instead (see the sketch below).
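A sketch of the chaining approach, using Slurm's `--dependency` so the folding job starts only after the search job succeeds (the `...` elisions stand for the arguments shown earlier):

```bash
# Chain search -> fold instead of running them concurrently.
search_id=$(sbatch --parsable -p common -q fast --mem=128G -c 8 \
    --wrap "colabfold_search ... msas")

sbatch --dependency=afterok:"$search_id" \
    -p gpu -q gpu --gres=gpu:1 --mem=128G \
    --wrap "colabfold_batch ... results"
```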
## 📊 Summary & Recommendations

| Task | Required GPU | VRAM | Notes |
|---|---|---|---|
| Small MSA (<100 MB) | A40 | 4–9 GB | Works well |
| Large MSA (>200 MB) | A100 (80 GB) | 56–71 GB | A40 fails |
| Multiple jobs on same GPU | A100 | ≥80 GB | Slower; avoid |
| L40S usage | ❌ | ❌ | Requires CUDA update |
### Hardware Procurement Recommendations

- **Primary:** A100 80 GB GPUs for large proteome analysis.
- **Secondary:** A40 for small and medium-sized MSAs.
- **Avoid:** L40S unless the CUDA driver is updated.
- **Memory:** Ensure ≥128 GB RAM per node for CPU+GPU workflows.
## 📝 Final Notes

- Use `--input none` in `srun` to prevent hanging during `kalign` calls.
- Monitor VRAM with `nvtop`; `reportseff` underreports.
- Always test large MSAs on target hardware before full-scale runs.
✉️ **Contact:** For further testing or support, reach out to the bioinformatics team.