Original: Simon Willison · 31/01/2026

Summary

Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), at $8/hour per TPU v3 back then, for a total cost of approx. $43K. It achieves a 0.256525 CORE score, an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc. As of the last few improvements merged into nanochat, a higher CORE score can now be reached in 3.04 hours (~$73) on a single 8XH100 node.

Key Insights

“Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPU v3 back then, for a total cost of approx. $43K.” — Discussing the initial costs of training GPT-2.
“As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node.” — Highlighting the significant cost reduction and efficiency improvements in training models.

Topics

  • [[topics/generative-ai]]
  • [[topics/ai]]
  • [[topics/openai]]

Full Article

# Quoting Andrej Karpathy
Author: Simon Willison
Published: 2026-01-31
Source: https://simonwillison.net/2026/Jan/31/andrej-karpathy/#atom-everything

> Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPU v3 back then, for a total cost of approx. $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc.
>
> As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year.

— [Andrej Karpathy](https://twitter.com/karpathy/status/2017703360393318587)

Tags: [andrej-karpathy](https://simonwillison.net/tags/andrej-karpathy), [gpt-2](https://simonwillison.net/tags/gpt-2), [generative-ai](https://simonwillison.net/tags/generative-ai), [ai](https://simonwillison.net/tags/ai), [llms](https://simonwillison.net/tags/llms), [openai](https://simonwillison.net/tags/openai)
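A quick way to sanity-check the quoted figures is to redo the arithmetic directly. The short Python sketch below is not part of the original post; it only reuses the numbers Karpathy gives and reproduces the ~$43K, ~600X, and ~2.5X/year claims:

```python
# Back-of-the-envelope check of the figures quoted above; the inputs come
# from Karpathy's post, the arithmetic is mine.
chips = 32        # TPU v3 chips used in 2019
hours = 168       # 7 days of training
rate = 8          # USD per TPU v3 per hour (2019 pricing)
cost_2019 = chips * hours * rate        # 43,008 -> "approx. $43K"

cost_2026 = 73                          # USD: 3.04 hours on one 8XH100 node
reduction = cost_2019 / cost_2026       # ~589x, rounded to "~600X" in the post

years = 2026 - 2019
per_year = reduction ** (1 / years)     # ~2.5x cheaper every year
print(f"${cost_2019:,} -> ${cost_2026}: {reduction:.0f}x over {years} years, "
      f"{per_year:.2f}x per year")
```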

Key Takeaways

Notable Quotes

Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $8/hour/TPU v3 back then, for a total cost of approx. $43K.
Context: Discussing the initial costs of training GPT-2.
As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node.
Context: Highlighting the significant cost reduction and efficiency improvements in training models.

[AINews] Anthropic's Agent Autonomy study

Swyx · explanation · 67% similar

[AINews] OpenAI closes $110B raise from Amazon, NVIDIA, SoftBank in largest startup fundraise in history @ $840B post-money

Swyx · reference · 66% similar

We gotta talk about AI as a programming tool for the arts

Simon Willison · explanation · 65% similar