GPU VRAM for LLMs
Discussions center on GPU memory (VRAM) requirements and suitable hardware, such as the RTX 3090, RTX 4090, and A100, for running large language models locally, covering inference, training, multi-GPU setups, and consumer versus enterprise options.
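As a rough rule of thumb (an illustrative sketch, not an exact requirement), inference VRAM is approximately the parameter count times the bytes per parameter at the chosen quantization, plus headroom for the KV cache, activations, and framework overhead:

```python
# Rough, illustrative VRAM estimate for LLM inference.
# The 20% overhead figure is an assumption, not a measured value.

def estimate_inference_vram_gb(params_billions: float,
                               bits_per_param: float = 16,
                               overhead_fraction: float = 0.2) -> float:
    """Weights plus ~20% headroom for KV cache, activations, and runtime overhead."""
    weight_gb = params_billions * bits_per_param / 8  # GB needed for the weights alone
    return weight_gb * (1 + overhead_fraction)

# Example: a 70B-parameter model
print(estimate_inference_vram_gb(70, bits_per_param=16))  # ~168 GB in fp16
print(estimate_inference_vram_gb(70, bits_per_param=4))   # ~42 GB at 4-bit, near 2x 24 GB cards
```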
Sample Comments
Can 3090 GPUs share their memory with one another to fit such a large model? Or is enterprise-grade hardware required?
Not sure what you mean, or if you're new to LLMs, but two RTX 3090s will work for this, and even lower-end cards (RTX 3060) will once it's GGUF'd.
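A hedged sketch of what "GGUF'd across two cards" can look like in practice, using llama-cpp-python; the model path is a placeholder, and parameter names reflect recent versions of that library:

```python
# Sketch: running a GGUF-quantized model split across two GPUs with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-70b.Q4_K_M.gguf",  # hypothetical 4-bit GGUF file
    n_gpu_layers=-1,          # offload all layers to GPU
    tensor_split=[0.5, 0.5],  # split weights evenly across two cards (e.g. 2x RTX 3090)
    n_ctx=4096,
)

out = llm("Q: Roughly how much VRAM does a 70B model need at 4-bit? A:", max_tokens=64)
print(out["choices"][0]["text"])
```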
True, but try to find a 96GB GPU.
Out of curiosity, would an A100 80GB work for this?
GPU VRAM is the bottleneck currently; check out r/localLlama for benchmarks and calculators showing approximately which models fit on which cards.
Training or inference? How's training performance compared to an 8GB NVIDIA card, if you have one?
Might be relevant if your model is large: two GTX 1080s would give you twice the GPU RAM.
Any suggestions on what GPU to use to train large models?
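For training, memory needs are much larger than for inference: full fine-tuning with Adam typically stores weights, gradients, and two optimizer states. A rough sketch of that arithmetic, assuming a common mixed-precision setup and ignoring activations (which add more, depending on batch size and sequence length):

```python
# Rough, illustrative estimate of full fine-tuning memory with Adam in mixed precision.
# Assumes fp16 weights and gradients plus fp32 master weights and two fp32 optimizer states;
# activation memory is excluded and can be substantial.

def estimate_training_vram_gb(params_billions: float) -> float:
    p = params_billions
    weights_fp16 = 2 * p   # GB
    grads_fp16 = 2 * p
    master_fp32 = 4 * p
    adam_m_fp32 = 4 * p
    adam_v_fp32 = 4 * p
    return weights_fp16 + grads_fp16 + master_fp32 + adam_m_fp32 + adam_v_fp32  # ~16 bytes/param

print(estimate_training_vram_gb(7))  # ~112 GB for a 7B model: multi-GPU or LoRA/QLoRA territory
```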
I have 2x 3090; do you know if it's feasible to use that 48GB total for running this?
Probably the memory requirements mean that you do need (multiple) V100s, though.
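For pooling VRAM across cards at inference time (the 2x 3090 and multiple-V100 scenarios above), one common approach is Hugging Face transformers with accelerate's device_map="auto", which shards layers across all visible GPUs. A minimal sketch, assuming an example model ID and fp16 weights:

```python
# Sketch: sharding a model across multiple GPUs with transformers + accelerate.
# device_map="auto" places layers across available GPUs (e.g. 2x RTX 3090 = ~48 GB total)
# and spills the remainder to CPU RAM if the model does not fit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-13b-hf"  # example model ID; ~26 GB of weights in fp16
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",          # requires the `accelerate` package
    torch_dtype=torch.float16,  # fp16 weights: ~2 bytes per parameter
)

inputs = tokenizer("The VRAM needed for a 13B model in fp16 is about", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))
```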