Original: Swyx · 05/02/2026
Summary
*AI News for 2/3/2026-2/4/2026. We checked 12 subreddits, 544 Twitters and 24 Discords (254 channels, and 10187 messages) for you. Estimated reading time saved (at 200wpm): 795 minutes. AINews’ website lets you search all past issues.
Key Insights
“It is our policy to give the title story to AI companies that cross into decacorn status, to celebrate their rarity.” — Discussing the significance of AI companies achieving decacorn valuation.
“VS Code shipped a major update positioning itself as home for coding agents.” — Highlighting the integration of AI agents into VS Code.
“GitHub announced you can use Claude and OpenAI Codex agents within GitHub/VS Code via Copilot Pro+.” — Detailing the expansion of GitHub Copilot to include multiple AI agents.
Full Article
Published: 2026-02-05
Source: https://www.latent.space/p/ainews-elevenlabs-500m-series-d-at
AI News for 2/3/2026-2/4/2026. We checked 12 subreddits, 544 Twitters and 24 Discords (254 channels, and 10187 messages) for you. Estimated reading time saved (at 200wpm): 795 minutes. AINews’ website lets you search all past issues. As a reminder, AINews is now a section of Latent Space. You can opt in/out of email frequencies!

It is our policy to give the title story to AI companies that cross into decacorn status, to celebrate their rarity and look back at their growth, but it seems that it is less rare these days… today not only did Sequoia, a16z and ICONIQ lead the Eleven@11 round (WSJ), but it was promptly upstaged by Cerebras which, after their 750MW OpenAI deal, raised $1B at $8B. It’s also the 1 year anniversary of Vibe Coding, and Andrej has nominated Agentic Engineering as the new meta of the year, even as METR anoints GPT 5.2 High as the new 6.6 hour human task model, beating Opus 4.5, and sama announces 1m MAU of Codex.
- RPL locomotion: a unified policy for robust perceptive locomotion across terrains, multi-direction, and payload disturbances—trained in sim and validated long-horizon in real world (Yuanhang__Zhang).
- DreamZero (NVIDIA): Jim Fan describes “World Action Models” built on a world-model backbone enabling zero-shot open-world prompting for new verbs/nouns/environments, emphasizing diversity-over-repetition data recipes and cross-embodiment transfer via pixels; claims open-source release and demos (DrJimFan, DrJimFan).
/r/LocalLlama + /r/localLLM Recap
1. Qwen3-Coder-Next Model Release
- danielhanchen discusses the release of dynamic Unsloth GGUFs for Qwen3-Coder-Next, highlighting upcoming releases of Fp8-Dynamic and MXFP4 MoE GGUFs. These formats are designed to optimize model performance and efficiency, particularly in local environments. A guide is also provided for using Claude Code / Codex locally with Qwen3-Coder-Next, which could be beneficial for developers looking to integrate these models into their workflows.
- Ok_Knowledge_8259 raises skepticism about the claim that a 3 billion activated parameter model can match the quality of larger models like Sonnet 4.5. This comment reflects a common concern in the AI community about the trade-off between model size and performance, suggesting that further empirical validation is needed to substantiate such claims.
- Septerium notes that while the original Qwen3 Next performed well in benchmarks, the user experience was lacking. This highlights a critical issue in AI model deployment where high benchmark scores do not always translate to practical usability, indicating a need for improvements in user interface and interaction design.
- A user inquired about the model’s performance, questioning if it truly reaches ‘sonnet 4.5 level’ and whether it includes ‘agentic mode’, or if the model is simply optimized for specific tests. This suggests a curiosity about the model’s real-world applicability versus benchmark performance.
- Another user shared a quick performance test using LM Studio, reporting a processing speed of ‘6 tokens/sec’ on a setup with an RTX 4070 and 14700k CPU with 80GB DDR4 3200 RAM. They also noted a comparison with ‘llama.cpp’ achieving ‘21.1 tokens/sec’, indicating a significant difference in performance metrics between the two setups.
- A technical question was raised about the feasibility of running the model with ‘64GB of RAM’ and no VRAM, highlighting concerns about hardware requirements and accessibility for users without high-end GPUs.
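The 64GB-RAM question above largely reduces to arithmetic on weight sizes. A minimal sketch, assuming an 80B-total-parameter MoE (the figure commonly cited for the Qwen3-Next family), an effective ~4.5 bits/weight for a Q4_K_M-style GGUF, and a rough 1.2× overhead factor for KV cache and runtime buffers — all three numbers are illustrative assumptions, not vendor specs:

```python
def model_memory_gb(n_params_b: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Rough RAM estimate for holding quantized model weights.

    n_params_b: total parameter count in billions.
    bits_per_weight: effective bits after quantization (~4.5 for Q4_K_M).
    overhead: fudge factor covering KV cache, activations, and buffers.
    """
    bytes_total = n_params_b * 1e9 * bits_per_weight / 8
    return bytes_total * overhead / 1e9

# An 80B-parameter model at ~4.5 bits/weight lands around 54 GB,
# so a 64 GB CPU-only box is plausible but tight; bf16 would not fit.
print(round(model_memory_gb(80, 4.5)))   # 54
print(round(model_memory_gb(80, 16)))    # 192
```

By this estimate CPU-only inference in 64GB is feasible at 4-bit quantization, which is consistent with the community interest in aggressive GGUF formats.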
2. ACE-Step 1.5 Audio Model Launch
- A user compared the performance of ACE-Step-1.5 with Suno V5 using the same prompt, highlighting that while ACE-Step-1.5 is impressive for an open-source model, it does not yet match the quality of Suno V5. The user specifically noted that the cover feature of ACE-Step-1.5 is currently not very useful, indicating room for improvement in this area. They provided audio links for direct comparison: Suno V5 and ACE 1.5.
- Another user pointed out that the demo prompts for ACE-Step-1.5 seem overly detailed, yet the model appears to ignore most of the instructions. This suggests potential issues with the model’s ability to interpret and execute complex prompts accurately, which could be a limitation in its current implementation.
- TheRealMasonMac highlights that ACE-Step 1.5 shows a significant improvement over its predecessor, though it still lags behind Suno v3 in terms of instruction following and coherency. However, the audio quality is noted to be good, and the model is described as creative and different from Suno, suggesting it could serve as a solid foundation for future development.
- Different_Fix_2217 provides examples of audio generated by ACE-Step 1.5, indicating that the model performs well with long, detailed prompts and can handle negative prompts. This suggests a level of flexibility and adaptability in the model’s design, which could be beneficial for users looking to experiment with different input styles.
3. Voxtral-Mini-4B Speech-Transcription Model
- The Voxtral Realtime model is designed for live transcription with configurable latency down to sub-200ms, which is crucial for applications like voice agents and real-time processing. However, it lacks speaker diarization, which is available in the batch transcription model, Voxtral Mini Transcribe V2. This feature is particularly useful for distinguishing between different speakers in a conversation, but its absence in the open model may limit its utility for some users.
- Mistral has contributed to the open-source community by integrating the realtime processing component into vLLM, enhancing the infrastructure for live transcription applications. Despite this, the model does not include turn detection, a feature present in Moshi’s STT, which requires users to implement alternative methods such as punctuation, timing, or third-party text-based solutions for turn detection.
- Context biasing, a feature that allows the model to prioritize certain words or phrases based on context, is currently only supported through Mistral’s direct API. This feature is not available in the vLLM implementation for either the new Voxtral-Mini-4B-Realtime-2602 model or the previous 3B model, limiting its accessibility for developers using the open-source version.
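Since the open model ships without turn detection, the punctuation/timing workaround described above can be sketched as a small heuristic. This is a hypothetical illustration of the approach, not Mistral or Moshi API code, and the silence threshold is an arbitrary placeholder:

```python
def is_turn_end(text: str, silence_ms: float,
                min_silence_ms: float = 600.0) -> bool:
    """Heuristic end-of-turn detector for a live transcript stream:
    fire only when the latest text ends in terminal punctuation AND
    the speaker has been silent for at least min_silence_ms."""
    ends_sentence = text.rstrip().endswith((".", "?", "!"))
    return ends_sentence and silence_ms >= min_silence_ms

print(is_turn_end("How are you today?", silence_ms=750))  # True
print(is_turn_end("So what I was", silence_ms=750))       # False (mid-sentence)
print(is_turn_end("Okay.", silence_ms=200))               # False (pause too short)
```

In practice voice agents combine a rule like this with a VAD signal or a third-party text-based endpointing model, since punctuation alone is unreliable for fragmented speech.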
- A storage engineer emphasizes the importance of a fast NVMe over Fabrics Parallel File System (FS) as a critical requirement for a training build, highlighting that without adequate storage to feed GPUs, there will be significant idle time. They also recommend using Infiniband for compute, noting that RoCEv2 is often preferable for storage. This comment underscores the often-overlooked aspect of shared storage in training workflows.
- A user expresses surprise at the storage write speed being a bottleneck, indicating that this is an unexpected issue for many. This highlights a common misconception in building training clusters, where the focus is often on compute power rather than the supporting infrastructure like storage, which can become a critical pinch point.
- Another user proposes a theoretical solution involving milli-second distributed RAM with automatic hardware mapping of page faults, suggesting that such an innovation could simplify cluster management significantly. This comment reflects on the broader issue of addressing the right problems in system architecture.
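The storage engineer's point can be made concrete with back-of-the-envelope math: the parallel filesystem must sustain the aggregate rate at which data loaders consume samples, or the GPUs sit idle. A sketch using made-up cluster numbers:

```python
def required_read_gbps(num_gpus: int, samples_per_sec_per_gpu: float,
                       bytes_per_sample: int) -> float:
    """Aggregate read bandwidth (GB/s) the shared filesystem must
    sustain so that data loaders never starve the GPUs."""
    return num_gpus * samples_per_sec_per_gpu * bytes_per_sample / 1e9

# Hypothetical cluster: 64 GPUs, each consuming 50 preprocessed
# samples/sec at 2 MB/sample -> the PFS must sustain ~6.4 GB/s of reads.
print(required_read_gbps(64, 50, 2_000_000))  # 6.4
```

A single NVMe drive tops out well below that under sustained mixed load, which is why an NVMe-over-Fabrics parallel FS (and a dedicated storage fabric such as RoCEv2) shows up as a first-class requirement rather than an afterthought.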
Less Technical AI Subreddit Recap
/r/Singularity, /r/Oobabooga, /r/MachineLearning, /r/OpenAI, /r/ClaudeAI, /r/StableDiffusion, /r/ChatGPT, /r/ChatGPTCoding, /r/aivideo
1. Anthropic vs OpenAI Ad-Free Debate
- BuildwithVignesh highlights the effectiveness of the Claude Ad Campaign, suggesting that it has successfully captured attention despite the competitive landscape. The campaign’s impact is implied to be significant, although specific metrics or outcomes are not detailed in the comment.
- LimiDrain provides a comparative analysis, stating that ‘more Texans use ChatGPT for free than total people use Claude in the US’. This suggests a significant disparity in user base size between ChatGPT and Claude, indicating ChatGPT’s broader reach and adoption in the market.
- Eyelbee references a past statement by Sam, noting that he found AI ads disturbing a year ago. This comment suggests a potential inconsistency or evolution in Sam’s stance on AI advertising, especially in light of Anthropic’s decision to remain ad-free, which could be seen as a critique of ad-based models.
- ostroia points out that while Claude is ad-free, it has strict limitations on its free tier, making it mostly unusable for anything beyond quick questions. This raises questions about the practicality of boasting about being ad-free when the product requires payment to be truly usable.
- seraphius highlights the potential negative impact of ads on platforms, noting that ads can shift the focus of executives towards ‘advertiser friendliness,’ which can weaken the platform’s integrity. This is compared to the situation on YouTube, where ad-driven decisions have significantly influenced content and platform policies.
- AuspiciousApple highlights the competitive tension between OpenAI and Anthropic, noting that Sam Altman’s detailed response to Anthropic’s ad suggests a deeper concern about competition. This reflects the broader industry dynamics where major AI companies are closely monitoring each other’s moves, indicating a highly competitive landscape.
- owlbehome criticizes OpenAI’s approach to AI control, pointing out the perceived hypocrisy in Sam Altman’s statement about Anthropic’s control over AI. The comment references OpenAI’s own restrictions in version 5.2, suggesting that both companies impose significant limitations on AI usage, which is a common critique in the AI community regarding the balance between safety and usability.
- RentedTuxedo discusses the importance of competition in the AI industry, arguing that more players in the market benefit consumers. The comment criticizes the tribalism among users who show strong allegiance to specific companies, emphasizing that consumer choice should be based on performance rather than brand loyalty. This reflects a broader sentiment that healthy competition drives innovation and better products.
- ClankerCore highlights the technical execution of the AI in the ad, noting the use of a human model with AI overlays. The comment emphasizes the subtle adjustments made to the AI’s behavior, particularly in eye movement, which adds a layer of realism to the portrayal. This suggests a sophisticated blend of human and AI elements to enhance the advertisement’s impact.
- The comment by ClankerCore also critiques the performance of Anthropic’s Claude, pointing out its inefficiency in handling simple arithmetic operations like ‘2+2’. The user mentions that such operations consume a significant portion of the token limit for plus users, indicating potential limitations in Claude’s design or token management system.
- ClankerCore’s analysis suggests that while the marketing execution is impressive, the underlying AI technology, specifically Claude, may not be as efficient or user-friendly for non-coding tasks. This highlights a potential gap between the marketing portrayal and the actual performance of the AI product.
2. Kling 3.0 and Omni 3.0 Launch
- The ability of Kling 3.0 to switch between different camera angles while maintaining subject consistency is a significant technical achievement. This feature is particularly challenging in video models, as it requires advanced understanding of spatial and temporal coherence to ensure that the subject remains believable across different perspectives.
- A notable issue with Kling 3.0 is the audio quality, which some users describe as sounding muffled, akin to being recorded with a barrier over the microphone. This is a common problem in video models, indicating that while visual realism is advancing, audio processing still lags behind and requires further development to match the visual fidelity.
- The visual quality of Kling 3.0 has been praised for its artistic merit, particularly in scenes that evoke a nostalgic, dream-like feel through color grading and highlight transitions. This suggests that the model is not only technically proficient but also capable of producing aesthetically pleasing outputs that resonate on an emotional level, similar to late 90s art house films.
- A user expressed frustration over the lack of clear information distinguishing the differences between the ‘Omni’ and ‘3’ models, highlighting a common issue in tech marketing where specifications and improvements are not clearly communicated. This can lead to confusion among users trying to understand the value proposition of new releases.
3. GPT 5.2 and ARC-AGI Benchmarks
- nivvis highlights a common issue during model training phases, where companies like OpenAI and Anthropic face GPU/TPU limitations. This necessitates reallocating resources from inference to training, which can temporarily degrade performance. This is not unique to OpenAI; Anthropic’s Opus has also been affected, likely in preparation for upcoming releases like DeepSeek v4.
- xirzon suggests that significant performance drops in technical services, such as those experienced with GPT 5.2, are often due to partial or total service outages. This implies that the observed ‘nerfing’ might not be a deliberate downgrade but rather a temporary issue related to service availability.
- ThadeousCheeks notes a similar decline in Google’s performance, particularly in tasks like cleaning up slide decks. This suggests a broader trend of performance issues across major AI services, possibly linked to resource reallocation or other operational challenges.
- The ARC-AGI benchmark, which was introduced less than a year ago, has seen rapid progress with the latest state-of-the-art (SOTA) result reaching 72.9%. This is a significant improvement from the initial release score of 4% and the previous best of 54.2%. The benchmark’s quick evolution highlights the fast-paced advancements in AI capabilities.
- The cost of achieving high performance on the ARC-AGI benchmark is a point of discussion: the goal raised is driving cost down to roughly $1 per task while maintaining or improving performance to over 90%, which would represent a significant efficiency improvement.
- The ARC-AGI benchmark uses an exponential scale on its x-axis, indicating that moving towards the top right of the graph typically involves increasing computational resources to achieve better results. The ideal position is the top left, which would signify high performance with minimal compute, emphasizing efficiency over brute force.
- NoWheel9556 highlights that the update to version 5.2 seems to have been aimed at preventing jailbreaks, which may have inadvertently affected other functionalities. This suggests a trade-off between security measures and user experience, potentially impacting how the model processes certain tasks.
- FilthyCasualTrader points out a specific usability issue in version 5.2, where users must explicitly direct the model to look at certain data, such as ‘attachments in Projects folder or entries in Saved Memories’. This indicates a regression in intuitive data handling, requiring more explicit instructions from users.
- MangoBingshuu mentions a problem with the Gemini pro model, where it tends to ignore instructions after a few prompts. This suggests a potential issue with instruction retention or prompt management, which could affect the model’s reliability in maintaining context over extended interactions.
A summary of Summaries of Summaries by gpt-5.1
1. Cutting-Edge Models, Coders, and Routers
- Engineers are pushing VRAM optimization hard by selectively offloading FFN layers to CPU via `-ot` flags (and asking for a “significance chart” to rank layers by importance), while others confirm smooth vLLM inference on an RTX 5080, making Qwen3-Coder-Next a practical workhorse across Unsloth, Hugging Face, and LM Studio setups.
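For readers unfamiliar with the flag, `-ot` (`--override-tensor`) in llama.cpp maps tensors whose names match a regex onto a backend buffer. A hypothetical invocation (the model filename is a placeholder, and the regex is the community's common pattern for MoE expert tensors) that pins FFN expert weights to CPU while keeping attention layers on GPU:

```shell
# Keep all layers "on GPU" (-ngl 99), but override the MoE FFN expert
# tensors to live in CPU RAM, trading throughput for VRAM headroom.
llama-server -m Qwen3-Coder-Next-Q4_K_M.gguf \
  -ngl 99 \
  -ot ".ffn_.*_exps.=CPU"
```

This works well for MoE models because only a few experts are active per token, so the CPU-resident expert weights are touched sparsely.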
- Users quickly started poking at Max’s behavior, noticing it sometimes claims Claude Sonnet 3.5 is backing responses while actually routing to Grok 4, prompting jokes like “Max = sonnet 5 in disguise” and raising questions about router transparency and evaluation methodology.
- Over on the Moonshot and Unsloth servers, engineers confirmed Kimi K2.5 can run as Kimi for Coding and discussed running it from VPS/datacenter IPs after Kimi itself green‑lit such use in a shared transcript, positioning it as a more permissive alternative to Claude for remote coding agents and OpenClaw‑style setups.
- They are soliciting adversarial attack scenarios around decisions AI must/never make, with a $10,000 prize pool challenge kicking off March 21, 2026 for multimodal (text/audio/vision) jailbreaks.
- The community pitched it as “high‑value technical reasoning” training material, complementary to other open datasets, for models that need long‑horizon, domain‑specific chain‑of‑thought in enterprise‑y systems and finance scenarios rather than generic math puzzles.
- GPU MODE’s #flashinfer channel confirmed the repo now includes all kernels and target shapes so contestants can train/eval model‑written CUDA/Triton code offline, while Modal credits and team‑formation logistics dominated meta‑discussion about running those workloads at scale.
- Parallel threads dissected performance gaps where Helion autotuned kernels only hit 0.66× baseline speedup on AMD GPUs versus torch inductor’s 0.92× for M=N=K=8192, and advised diffing the emitted Triton kernels to see what the AMD team tweaked for their own backend.
- At the same time, GPU MODE highlighted that Andrej Karpathy wired torchao into his nanochat project for FP8 training, via a commit (6079f78…), signalling that lightweight FP8 + activation‑optimized caching are moving from papers into widely‑copied reference code.
- Elsewhere in Unsloth and Hugging Face channels, practitioners compared Accelerate tensor parallelism for multi‑GPU fine‑tuning, discussed quantizing post‑bf16‑finetune models with domain‑specific imatrix statistics, and noted that community quantizers like mradermacher often publish GGUFs automatically once a fine‑tuned model gains traction on Hugging Face.
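The imatrix workflow mentioned in that discussion is typically two llama.cpp steps: collect importance statistics over domain-specific calibration text, then quantize using those statistics so the most salient weights keep more precision. Filenames below are placeholders:

```shell
# 1. Gather importance-matrix statistics from domain calibration text
llama-imatrix -m model-bf16.gguf -f domain_calibration.txt -o imatrix.dat

# 2. Quantize with those statistics; salient weight columns get
#    preferential precision in the resulting Q4_K_M GGUF
llama-quantize --imatrix imatrix.dat model-bf16.gguf model-Q4_K_M.gguf Q4_K_M
```

Using domain-specific rather than generic calibration text is the point being debated above: a post-finetune model's important weights may differ from the base model's.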
- Several EU users argued this silent downgrade might violate consumer transparency norms, citing that there is “no legal contract in the EU where the text practically forces the user to accept that the service is not transparent”, and began exploring open‑source or alternative stacks like Kimi, Z.Ai, and Qwen to recreate the old “medium‑effort” research workflow.
- The outage‑driven hiccup also impacted Claude’s API and Cursor users, some of whom had to roll back to Cursor 2.4.27 due to a broken SSH binary in 2.4.28, highlighting how tightly editor workflows and router services now depend on timely, stable frontier‑model releases.
- At the same time, Sam Altman defended ad funding in ChatGPT in a reply captured in his tweet, OpenAI’s own community ranted about GPT 5.2 regressions and Sora 2 glitches, and multiple communities noted that users are increasingly stitching together open‑weight models (DeepSeek/Kimi/Qwen) plus tools like OpenClaw rather than betting on a single closed provider.
- Concurrently, BASI’s #jailbreaking and #redteaming channels traded Gemini and Claude Code jailbreaks like ENI Lime (mirrored at ijailbreakllms.vercel.app and a Reddit thread), debated Anthropic’s activation capping as effectively “lobotomising” harmful behaviors, and discussed Windows rootkit attack surfaces via COM elevation and in‑memory execution.
- Hugging Face’s #i-made-this upped the stakes by showcasing cornerstone-autonomous-agent, an autonomous AI agent published on npm at cornerstone-autonomous-agent that can open real bank accounts via an MCP backend hosted on Replit and a Clawhub skill, triggering a wave of quiet “this is how you get regulators” energy among more security‑minded engineers.
- In stark contrast, Yannick’s #ml-news tracked the Moltbook database breach where Techzine reports that 35,000 emails and 1.5 million API keys were exposed, reinforcing why several communities refuse to trust SaaS tools with credentials and why ZK verification and tighter data‑handling guarantees are becoming more than academic curiosities.
Notable Quotes
“It is our policy to give the title story to AI companies that cross into decacorn status, to celebrate their rarity.” — Context: Discussing the significance of AI companies achieving decacorn valuation.
“VS Code shipped a major update positioning itself as home for coding agents.” — Context: Highlighting the integration of AI agents into VS Code.
“GitHub announced you can use Claude and OpenAI Codex agents within GitHub/VS Code via Copilot Pro+.” — Context: Detailing the expansion of GitHub Copilot to include multiple AI agents.
Related Topics
- [[topics/github-copilot]]
- [[topics/agent-native-architecture]]
- [[topics/ai-agents]]
- [[topics/openai-api]]
Related Articles
[AINews] Context Graphs and Agent Traces
Swyx · explanation · 92% similar
[AINews] Qwen Image 2 and Seedance 2
Swyx · explanation · 89% similar
[AINews] Z.ai GLM-5: New SOTA Open Weights LLM
Swyx · reference · 83% similar
Originally published at https://www.latent.space/p/ainews-elevenlabs-500m-series-d-at.