Original: Simon Willison · 31/01/2026
Summary
Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), for roughly $43K. It achieves a 0.256525 CORE score, an ensemble metric introduced in the DCLM paper over 22 evaluations such as ARC and MMLU. As of the last few improvements merged into nanochat (many of them originating in the modded-nanogpt repo), a higher CORE score is now reachable in 3.04 hours (~$73) on a single 8XH100 node.
Key Insights
“Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $43K.” — Discussing the initial costs of training GPT-2.
“As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node.” — Highlighting the significant cost reduction and efficiency improvements in training models.
Full Article
Published: 2026-01-31
Source: https://simonwillison.net/2026/Jan/31/andrej-karpathy/#atom-everything
> Originally in 2019, GPT-2 was trained by OpenAI on 32 TPU v3 chips for 168 hours (7 days), with $43K. It achieves 0.256525 CORE score, which is an ensemble metric introduced in the DCLM paper over 22 evaluations like ARC/MMLU/etc.
>
> As of the last few improvements merged into nanochat (many of them originating in modded-nanogpt repo), I can now reach a higher CORE score in 3.04 hours (~$73) on a single 8XH100 node. This is a 600X cost reduction over 7 years, i.e. the cost to train GPT-2 is falling approximately 2.5X every year.

— [Andrej Karpathy](https://twitter.com/karpathy/status/2017703360393318587)

Tags: [andrej-karpathy](https://simonwillison.net/tags/andrej-karpathy), [gpt-2](https://simonwillison.net/tags/gpt-2), [generative-ai](https://simonwillison.net/tags/generative-ai), [ai](https://simonwillison.net/tags/ai), [llms](https://simonwillison.net/tags/llms), [openai](https://simonwillison.net/tags/openai)
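For context on the metric: the DCLM paper's CORE score averages "centered" accuracy across its 22 evaluation tasks, rescaling each task so that random-chance performance maps to 0 and perfect performance to 1. A minimal sketch of that aggregation, with hypothetical task names and numbers (not the actual DCLM task list or real results):

```python
# Minimal sketch of a CORE-style aggregation: the DCLM paper averages
# "centered" accuracy over 22 tasks; the tasks and numbers below are
# illustrative placeholders, not real evaluation results.

def centered(acc: float, baseline: float) -> float:
    """Rescale raw accuracy so random chance maps to 0.0 and perfect to 1.0."""
    return (acc - baseline) / (1.0 - baseline)

# (task, raw accuracy, random-chance baseline) -- hypothetical values
results = [
    ("arc_easy", 0.62, 0.25),   # 4-way multiple choice -> 0.25 baseline
    ("hellaswag", 0.41, 0.25),
    ("boolq", 0.66, 0.50),      # yes/no task -> 0.50 baseline
]

core = sum(centered(acc, base) for _, acc, base in results) / len(results)
print(f"CORE over {len(results)} tasks: {core:.6f}")
```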
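The headline arithmetic checks out: $43K down to $73 is a ~589X reduction, which the tweet rounds to 600X, and the 7th root of that is about 2.5. A quick check using the tweet's own figures:

```python
# Sanity-check the tweet's cost-reduction arithmetic.
cost_2019 = 43_000  # estimated 2019 GPT-2 training cost in USD (the $43K)
cost_2026 = 73      # nanochat run on a single 8XH100 node, USD
years = 7

reduction = cost_2019 / cost_2026   # ~589x, rounded to "600X" in the tweet
annual = reduction ** (1 / years)   # ~2.49x/year, the "approximately 2.5X"
print(f"total: {reduction:.0f}x, per year: {annual:.2f}x")
```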
Related Topics
- [[topics/generative-ai]]
- [[topics/ai]]
- [[topics/openai]]
Related Articles
[AINews] Anthropic's Agent Autonomy study
Swyx · explanation · 67% similar
[AINews] OpenAI closes $110B raise from Amazon, NVIDIA, SoftBank in largest startup fundraise in history @ $840B post-money
Swyx · reference · 66% similar
We gotta talk about AI as a programming tool for the arts
Simon Willison · explanation · 65% similar
Originally published at https://simonwillison.net/2026/Jan/31/andrej-karpathy/#atom-everything.