LLM Inference Costs

The cluster focuses on discussions about the costs of running and using large language models, including API pricing from OpenAI and other providers, comparisons between models like GPT-3, GPT-4, and cheaper alternatives, and debates on self-hosting versus cloud inference expenses.

Trend: ➡️ Stable (1.8x) · Category: AI & Machine Learning
Comments: 3,140
Years Active: 11
Top Authors: 5
Topic ID: #2744

Activity Over Time

2016: 2
2017: 7
2018: 10
2019: 15
2020: 29
2021: 24
2022: 95
2023: 750
2024: 706
2025: 1,398
2026: 106

Keywords

e.g, LLM, M1, AWS, H100, TPU, Fireworks.ai, openai.com, GPU, AI, cost, inference, gpt, models, openai, model, costs, tokens, cheaper, expensive

Sample Comments

antman Nov 12, 2021 View on HN

Also, is the gpt3 an important part of the cost? Have you tried whether e.g. gpt2 was enough for your use case?

shagie Jan 14, 2023 View on HN

Likely yes. You could even switch to a less intensive model for doing that (e.g. Curie). The Curie model is 1/10th as costly as Davinci to run. Running 8k tokens through Davinci is $0.16 while Curie would only be $0.016 - and that's likely showing up in back end compute and should be considered if someone was building their own chat bot on top of gpt3.
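The arithmetic in this comment can be sketched as a small helper. The per-1k-token prices below ($0.02 for Davinci, $0.002 for Curie) are the legacy GPT-3 completion rates implied by the comment's figures, not current pricing.

```python
# Legacy GPT-3 completion prices per 1k tokens, as implied by the comment
# above. These are historical figures, not current OpenAI rates.
PRICE_PER_1K = {"davinci": 0.02, "curie": 0.002}

def completion_cost(model: str, tokens: int) -> float:
    """Flat per-token cost for a legacy GPT-3 completion model."""
    return PRICE_PER_1K[model] / 1000 * tokens

print(completion_cost("davinci", 8000))  # ≈ $0.16
print(completion_cost("curie", 8000))    # ≈ $0.016, 1/10th the cost
```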

sacred_numbers Sep 12, 2023 View on HN

Based on my research, GPT-3.5 is likely significantly smaller than 70B parameters, so it would make sense that it's cheaper to run. My guess is that OpenAI significantly overtrained GPT-3.5 to get as small a model as possible to optimize for inference. Also, Nvidia chips are way more efficient at inference than M1 Max. OpenAI also has the advantage of batching API calls which leads to better hardware utilization. I don't have definitive proof that they're not dumping, but econo

dtquad Aug 17, 2024 View on HN

Self-hosting LLMs is expensive at scale. It's cheaper to use VC subsidized model inference like the OpenAI APIs.
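The trade-off this comment gestures at mostly comes down to GPU utilization. A toy break-even sketch, where every number is an illustrative assumption (a GPU rented at ~$2/hour sustaining ~1,000 tokens/s, versus an API charging $10 per 1M output tokens), not a quoted price:

```python
# All constants below are illustrative assumptions for the break-even
# sketch, not real quotes from any provider.
GPU_COST_PER_HOUR = 2.0     # assumed cloud rental price per GPU-hour
TOKENS_PER_SECOND = 1_000   # assumed sustained throughput per GPU
API_PRICE_PER_M = 10.0      # assumed API price per 1M output tokens

def self_host_cost_per_m_tokens(utilization: float) -> float:
    """Self-hosting cost per 1M tokens at a given GPU utilization (0..1)."""
    tokens_per_hour = TOKENS_PER_SECOND * 3600 * utilization
    return GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000

print(self_host_cost_per_m_tokens(1.0))   # ≈ $0.56 at full load
print(self_host_cost_per_m_tokens(0.02))  # ≈ $27.78 at 2% load — the API wins
```

Under these assumptions, self-hosting only beats the API once traffic is steady enough to keep the hardware busy; bursty or low-volume workloads pay for idle GPUs.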

jerpint Apr 29, 2023 View on HN

Cost to use openAI is also pretty low compared to hosting models

joenot443 Mar 30, 2023 View on HN

Wow! Is it really that cheap? GPT4 is much more expensive, I imagine?

andai Aug 15, 2023 View on HN

Cheaper than GPT-3? Can you give a comparison of the costs?

judahpaul16 May 7, 2024 View on HN

Basically nothing especially if you use cheaper models and set the max token limit in `settings.json`. GPT-4 is $30.00 / 1M tokens for input and $60.00 / 1M tokens for output. For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens. See https://openai.com/api/pricing.
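The figures in this comment translate directly into a cost estimator. The rates ($30/1M input, $60/1M output) and the ~0.75 words-per-token ratio are taken from the comment itself, as they stood at the time of posting:

```python
# GPT-4 rates and the words-per-token ratio quoted in the comment above;
# these reflect pricing at the time of the comment, not necessarily today.
WORDS_PER_TOKEN = 0.75
INPUT_PER_M, OUTPUT_PER_M = 30.00, 60.00

def gpt4_cost(input_words: float, output_words: float) -> float:
    """Estimated GPT-4 cost in dollars for a given word count of I/O."""
    in_tokens = input_words / WORDS_PER_TOKEN
    out_tokens = output_words / WORDS_PER_TOKEN
    return in_tokens / 1e6 * INPUT_PER_M + out_tokens / 1e6 * OUTPUT_PER_M

# The collected works of Shakespeare (~900k words ≈ 1.2M tokens) as input:
print(gpt4_cost(900_000, 0))  # ≈ $36
```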

yosito Jan 18, 2024 View on HN

How much is it costing you to run the LLM? Is it OpenAI?

altdataseller Sep 18, 2024 View on HN

What % of your revenue goes towards LLM costs?