LLM Inference Costs

The cluster focuses on discussions about the costs of running and using large language models, including API pricing from OpenAI and other providers, comparisons between models like GPT-3, GPT-4, and cheaper alternatives, and debates on self-hosting versus cloud inference expenses.

Trend: ➡️ Stable (1.8x) · Category: AI & Machine Learning
Comments: 3,140
Years Active: 11
Top Authors: 5
Topic ID: #2744

Activity Over Time

2016: 2
2017: 7
2018: 10
2019: 15
2020: 29
2021: 24
2022: 95
2023: 750
2024: 706
2025: 1,398
2026: 106

Keywords

e.g, LLM, M1, AWS, H100, TPU, Fireworks.ai, openai.com, GPU, AI, cost, inference, gpt, models, openai, model, costs, tokens, cheaper, expensive

Sample Comments

antman Nov 12, 2021 View on HN

Also, is the gpt3 an important part of the cost? Have you tried whether e.g. gpt2 was enough for your use case?

shagie Jan 14, 2023 View on HN

Likely yes. You could even switch to a less intensive model for doing that (e.g. Curie). The Curie model is 1/10th as costly as Davinci to run. Running 8k tokens through Davinci is $0.16 while Curie would only be $0.016 - and that's likely showing up in back end compute and should be considered if someone was building their own chat bot on top of gpt3.
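The arithmetic in this comment can be sketched as a small helper. The per-1k-token prices below ($0.02 for Davinci, $0.002 for Curie) are the legacy GPT-3 completion rates implied by the comment's figures, not current pricing.

```python
# Legacy GPT-3 completion prices per 1k tokens, as implied by the comment
# above. These are historical figures, not current OpenAI rates.
PRICE_PER_1K = {"davinci": 0.02, "curie": 0.002}

def completion_cost(model: str, tokens: int) -> float:
    """Flat per-token cost for a legacy GPT-3 completion model."""
    return PRICE_PER_1K[model] / 1000 * tokens

print(completion_cost("davinci", 8000))  # ≈ $0.16
print(completion_cost("curie", 8000))    # ≈ $0.016, 1/10th the cost
```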

sacred_numbers Sep 12, 2023 View on HN

Based on my research, GPT-3.5 is likely significantly smaller than 70B parameters, so it would make sense that it's cheaper to run. My guess is that OpenAI significantly overtrained GPT-3.5 to get as small a model as possible to optimize for inference. Also, Nvidia chips are way more efficient at inference than M1 Max. OpenAI also has the advantage of batching API calls which leads to better hardware utilization. I don't have definitive proof that they're not dumping, but econo

dtquad Aug 17, 2024 View on HN

Self-hosting LLMs is expensive at scale. It's cheaper to use VC subsidized model inference like the OpenAI APIs.
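The trade-off this comment gestures at mostly comes down to GPU utilization. A toy break-even sketch, where every number is an illustrative assumption (a GPU rented at ~$2/hour sustaining ~1,000 tokens/s, versus an API charging $10 per 1M output tokens), not a quoted price:

```python
# All constants below are illustrative assumptions for the break-even
# sketch, not real quotes from any provider.
GPU_COST_PER_HOUR = 2.0     # assumed cloud rental price per GPU-hour
TOKENS_PER_SECOND = 1_000   # assumed sustained throughput per GPU
API_PRICE_PER_M = 10.0      # assumed API price per 1M output tokens

def self_host_cost_per_m_tokens(utilization: float) -> float:
    """Self-hosting cost per 1M tokens at a given GPU utilization (0..1)."""
    tokens_per_hour = TOKENS_PER_SECOND * 3600 * utilization
    return GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000

print(self_host_cost_per_m_tokens(1.0))   # ≈ $0.56 at full load
print(self_host_cost_per_m_tokens(0.02))  # ≈ $27.78 at 2% load — the API wins
```

Under these assumptions, self-hosting only beats the API once traffic is steady enough to keep the hardware busy; bursty or low-volume workloads pay for idle GPUs.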

jerpint Apr 29, 2023 View on HN

Cost to use openAI is also pretty low compared to hosting models

joenot443 Mar 30, 2023 View on HN

Wow! Is it really that cheap? GPT4 is much more expensive, I imagine?

andai Aug 15, 2023 View on HN

Cheaper than GPT-3? Can you give a comparison of the costs?

judahpaul16 May 7, 2024 View on HN

Basically nothing especially if you use cheaper models and set the max token limit in `settings.json`. GPT-4 is $30.00 / 1M tokens for input and $60.00 / 1M tokens for output. For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens. See https://openai.com/api/pricing.
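The figures in this comment translate directly into a cost estimator. The rates ($30/1M input, $60/1M output) and the ~0.75 words-per-token ratio are taken from the comment itself, as they stood at the time of posting:

```python
# GPT-4 rates and the words-per-token ratio quoted in the comment above;
# these reflect pricing at the time of the comment, not necessarily today.
WORDS_PER_TOKEN = 0.75
INPUT_PER_M, OUTPUT_PER_M = 30.00, 60.00

def gpt4_cost(input_words: float, output_words: float) -> float:
    """Estimated GPT-4 cost in dollars for a given word count of I/O."""
    in_tokens = input_words / WORDS_PER_TOKEN
    out_tokens = output_words / WORDS_PER_TOKEN
    return in_tokens / 1e6 * INPUT_PER_M + out_tokens / 1e6 * OUTPUT_PER_M

# The collected works of Shakespeare (~900k words ≈ 1.2M tokens) as input:
print(gpt4_cost(900_000, 0))  # ≈ $36
```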

yosito Jan 18, 2024 View on HN

How much is it costing you to run the LLM? Is it OpenAI?

altdataseller Sep 18, 2024 View on HN

What % of your revenue goes towards LLM costs?