Original: Swyx · 10/03/2026
Summary
NVIDIA’s engineers discuss advancements in AI inference at GTC, focusing on Dynamo’s capabilities and the importance of security in agent operations.
Key Insights
“Agents can do three things. They can access your files, they can access the internet, and then now they can write custom code and execute it.” — Nader discusses the capabilities and security concerns of AI agents.
“We are blessed to have a unique relationship with our first ever NVIDIA guests.” — Swyx introduces the guests at the podcast.
“It’s like everyone who sponsors a conference comes, does their booth. They’re like, we are changing the future of AI or something, some generic bullshit.” — Swyx reflects on the typical conference experience and the need for creativity.
Full Article
Join Kyle, Nader, Vibhu, and swyx live at NVIDIA GTC next week! Now that AIE Europe tix are ~sold out, our attention turns to Miami and Worlds Fair! The definitive AI Accelerator chip company has more than 10xed this AI Summer: and is now a 5,000...

...I'll do anything. Really? I think so. I need, uh...

swyx: My, uh, my borrow from Costco. Uh, but I think the best part is only the agent can book me, you know?

Kyle: Yeah.

swyx: It's very...

Kyle: Usually like...

swyx: It's just like another labor marketplace. Mechanical Turk was this.

So definitely I have a weird story with why I did it. So back to your example of just giving an agent access to compute, right? Yeah. You guys are GPU Rich at NVIDIA. Yeah, I hooked up...

Nader: He's not shy about it.

Local GPUs And Scaling Inference

I have a 24/7 agent running that I hooked up to Runpod. It doesn't shut down instances. And I'm like, I've tried prompting you, I've given the instruction: shut down when you're done. It's like, I need to keep it warm, I'll need it soon. And it's horrible on time estimates too, because it's like, yeah, I'll need it in 45 minutes; 45 minutes and I'll shut it down. But 45 minutes of human time is actually three minutes of agent time, so it's like, I'm booting it up, I'm waiting, I'll just leave it on all night. And it's not good at shutting down after inactivity either. I had it on my local server, like a little dual-GPU thing. It just stays on. I have a little space heater at home now. But careful.

[01:11:00] So basically, you know, they don't care about the concept of money. Just burn it. I need it. It's useful.

Nader: And another DGX Spark will be really nice. I'm looking at it as super useful for agents because, yeah, you buy it once, you plug it in, and it can rip. I'm gonna make an NVIDIA ad here.

Kyle: Okay. The Blackwell RTX 6000 Pro cards. I think it's $8,000. Slightly cheaper. Yeah.
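The never-shuts-down-instances problem described above is the kind of thing a simple idle-timeout guard outside the agent can handle. A minimal sketch, not any provider's actual API: `shutdown_fn` is a hypothetical hook you would wire to your cloud's real stop-instance call.

```python
import time

class IdleWatchdog:
    """Shuts a GPU instance down once no activity is seen for `timeout` seconds.

    `shutdown_fn` is a placeholder for whatever your provider exposes
    (a CLI call, an API request); `clock` is injectable for testing.
    """

    def __init__(self, timeout: float, shutdown_fn, clock=time.monotonic):
        self.timeout = timeout
        self.shutdown_fn = shutdown_fn
        self.clock = clock
        self.last_activity = clock()
        self.stopped = False

    def touch(self):
        # Call this whenever the agent actually uses the instance.
        self.last_activity = self.clock()

    def poll(self):
        # Call periodically; fires the shutdown after `timeout` idle seconds.
        if not self.stopped and self.clock() - self.last_activity >= self.timeout:
            self.shutdown_fn()
            self.stopped = True
        return self.stopped
```

The point is that the shutdown policy lives outside the model's prompt, so "I'll need it soon" can't talk its way past the timeout.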
Well, it's much, it's much cheaper than the data center cards.

Vibhu: Yeah.

Kyle: And it's got 96 gigabytes of VRAM. So if you and your crew want to go run a local agent for you, you know, in the home. It's got a significant amount of VRAM. I've thought about purchasing this and running it in my basement, except my neighbors would hate me.

It's just a single, like, two-, three-slot GPU.

Kyle: Yeah, it's PCIe. So a GPU you can go buy. I mean, the big difference against the RTX gaming GPUs is, obviously it's Blackwell Pro, it's a pro GPU, and it has a [01:12:00] lot of VRAM, which means you can run pretty large models on it. You can stack four of them, the Max-Q ones, in a system that's a beast.

Kyle: It's beefy. You can run, uh, what is that, 96 gig each.

But also, they are slow. I mean, performance will be somewhat slower compared to an API.

Kyle: Oh yeah, that's true. So again, the big learning: economy of scale allows you to do things that get you both speed and throughput. I'll give you an example. There's an optimization called Wide EP. I'm not gonna go into it fully, but it featured heavily in InferenceMAX for DeepSeek. And there's a great set of stories from NVIDIA and from SemiAnalysis about why Wide EP is important, but for MoE models it's basically essential, and the level of expert parallelism, the scale-up parallelism used for it, is like 32. So it goes beyond that eight-GPU barrier. And it really, really is important to have that NVL [01:13:00] 72 GB200 NVLink to serve at scale. And I don't remember the exact cost improvement against Hopper, but with this NVL72 system you're getting like 35 times cheaper per token for a lot of the curve. Yeah.
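For readers who haven't met Wide EP: in a mixture-of-experts model the router sends each token to a few experts, and with expert parallelism those experts are sharded across GPUs, so the parallel degree is set by how many ranks hold experts rather than by an 8-GPU tensor-parallel domain. A toy top-1 routing sketch, with made-up rank mapping and no real communication:

```python
from collections import defaultdict

def route_tokens(router_scores, num_experts, ep_degree):
    """Assign each token to its top-1 expert, then map experts to ranks.

    With wide EP the expert-parallel degree (ep_degree) can exceed a
    single node's 8 GPUs, e.g. 32 ranks spread across an NVL72 domain.
    router_scores: one list of per-expert scores per token.
    """
    experts_per_rank = num_experts // ep_degree
    dispatch = defaultdict(list)  # rank -> indices of tokens sent there
    for tok, scores in enumerate(router_scores):
        expert = max(range(num_experts), key=lambda e: scores[e])
        rank = expert // experts_per_rank  # contiguous expert sharding
        dispatch[rank].append(tok)
    return dict(dispatch)
```

In a real system the dispatch step is an all-to-all over NVLink, which is why the NVL72 interconnect Kyle mentions matters so much for serving MoE models at scale.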
Which is crazy.

swyx: Yeah.

Kyle: And normalized per GPU, obviously, because the GPU is part of the cost.

swyx: One thing I'm exploring is, this year is also the year of the subagent, where you have the main agent, but that also kicks off tools which are in themselves agents that have limited scope. Yeah, and sort of local context, whatever, right? Different prompts. So for example, one thing that Ian does is, before you kick off a search, they do like a fast context model, where you kick off a grep or just a search across the code base, plus all that. That is better than indexing a lot of the times, not all the times, and you should still index for some cases. But the idea is that agents should be able to command subagents and probably run [01:14:00] them maybe close to the inference as well. I don't know if that's architecturally possible or even...

Kyle: Yeah, we're thinking about that for Dynamo. That's like our big theme for the year.

swyx: Because if you can design that into your stuff, then a lot more people will use it. Right now it's just kind of theoretical, because you do pay a lot of back-and-forth coordination costs.

Kyle: Yes.

Vibhu: I think it'll net speed up though, right? Like even at a basic level, speculative decoding: you're running a small model, you're running two instances, but it's not...

swyx: That is one example. Yes.

Kyle: Yeah, but this is a little bit different with agents. This is not spec decoding. I think there's a summarization of that trend that I like to say to my team. There are two things. This is the year of system-as-model, right?
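On Vibhu's speculative-decoding aside, for reference: a cheap draft model proposes a few tokens ahead and the expensive target model verifies them, so you run two models but can accept several tokens per target step. A toy greedy version (real implementations verify with one batched target pass and use rejection sampling; `draft_next`/`target_next` are stand-ins for model calls):

```python
def speculative_decode(draft_next, target_next, prompt, k=4, max_new=8):
    """Toy speculative decoding with greedy models.

    `draft_next` and `target_next` map a token list to the next token;
    they stand in for a cheap draft model and the expensive target model.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. Draft model speculates k tokens ahead, cheaply.
        draft, ctx = [], list(seq)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target model checks proposals; keep the agreeing prefix.
        accepted, correction = [], None
        for t in draft:
            expected = target_next(seq + accepted)
            if t == expected:
                accepted.append(t)
            else:
                correction = expected  # take the target's token instead
                break
        seq.extend(accepted)
        if correction is not None:
            seq.append(correction)
    return seq[: len(prompt) + max_new]
```

Every round makes progress (at least the target's correction token), and when the draft agrees you get up to k tokens for one verification step.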
Kyle: Where instead of having a single model be the thing, you have a system of models and components that are working together to emulate the black-box model. So when you make an API call to something that's like a multi-agent system in the background, it still looks like an API call to a model. You're still getting back...

swyx: Tokens. But under the hood...

Kyle: Yeah, under the hood it's a [01:15:00] billion different models. And that's a lot of complexity. With Dynamo and with other libraries at NVIDIA, we're looking to help manage...

Nader: That complexity. Yeah. It's funny, because for CES we actually just released the model router for DGX Spark, where you can have a local model that's running on the Spark and then also a foundation model, and the model router decides when to send queries to which one. So it's no longer this either/or. It's: use the best of everything that's available to you. You have a good post-trained model that's running on...

swyx: It'd be nice if it also had the functionality of being able to manage the Spark.

Kyle: Oh, that'd be cool.

swyx: Live feature request. There we go.

Long Running Agents And SF Reflections

Kyle: I actually have a question I like to extend and flip over: how much longer do you guys think agents are gonna be running? Because that's one thing I've been throwing around. What happens when...

I mean, they always are.

Kyle: It even affects the prefill and the decode, right? Like, yeah. Codex, I'd say, compared to Claude Code, is much longer at tasks. That thing will run 6, 7, 8 hours. I'll run it overnight.

Kyle: Yeah.

And I'll go back, and I have a little crappy logging software I use, and there are just times where it wants to go, like, I'm gonna go deep on [01:16:00] research, and it'll eat up 80,000 tokens, go on another, go on another. Yeah.
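The DGX Spark model router Nader describes picks, per query, between an on-box model and a cloud foundation model. NVIDIA's actual routing logic isn't detailed here, so this is a purely illustrative heuristic sketch (length plus keyword hints), just to make the either/or-to-both idea concrete:

```python
LOCAL_MAX_WORDS = 200  # illustrative threshold, not a real product setting
HARD_HINTS = ("prove", "research", "multi-step")  # illustrative keywords

def route(query, local_model, cloud_model):
    """Send short/easy queries to the local model on the box,
    long or hard-looking ones to the cloud foundation model.
    `local_model` and `cloud_model` are callables standing in for real endpoints.
    """
    hard = (
        len(query.split()) > LOCAL_MAX_WORDS
        or any(h in query.lower() for h in HARD_HINTS)
    )
    chosen = cloud_model if hard else local_model
    return chosen(query)
```

Production routers typically use a small classifier or the models' own confidence rather than keywords, but the shape is the same: one entry point, many models behind it.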
Just eat through tokens, and you know, that's part of it. At the end it does hit a long task, and I think you only see that expense. Yeah.

Nader: Yeah, there's insatiable demand for tokens, and every improvement that comes kind of just makes our demand even higher. It's kind of funny, right? Like if you have a teammate and you ask them to do a task and they're like, should I save some effort and not think too hard about this task? I'm like, fuck no.

I mean, my favorite was, you can have four shots, right? Like the original Codex before the app. Why do one call? Give it four attempts. Just use all the tokens, right? Try more. Try again. Try more.

Kyle: It's like the METR index, right? The thing that tracks how long models are able to run. I expect that we'll just see log-linear, if not super-linear, growth. We will see, before the end of the year, an agent that is capable of running for longer than 24 hours with self-consistency the entire time.

I would also poke at different domains having different [01:17:00] desires, right? Like at a consumer level, I'm getting slightly frustrated at 20 minutes per basic query. Sure, you can optimize the six-, eight-hour case, but I don't see myself shooting off many one-week agents. Someone doing, okay, GPU kernel research, or medical or biological work, like, in those domains, sure, shoot off a lot. So I think it will be somewhat domain specific, because you also really need to tune that in, right?

Kyle: A funny one was doing your taxes, right? That's taxes. Yeah. Okay. Yeah.

Nader: Get it right. I wonder if, sort of like speculative decoding, your agent figures out what you might be prompting it the next day, at night, and prefetches.

swyx: Yeah, you can do that.

Nader: Yeah. Really?
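Nader's "give it four attempts" is classic best-of-n sampling: run the same prompt several independent times and keep the attempt a grader scores highest. A minimal sketch, where `generate` and `score` are stand-ins for a sampled model call and a verifier (passing tests, a reward model, a human pick):

```python
def best_of_n(generate, score, prompt, n=4):
    """Fire off n independent attempts at the same prompt and keep the
    one the grader likes best. Burns ~n times the tokens for one answer,
    which is exactly the "use all the tokens" trade being described.
    """
    attempts = [generate(prompt, seed=i) for i in range(n)]
    return max(attempts, key=score)
```

Since the n attempts are independent, they can also run in parallel, so wall-clock time stays close to one attempt while quality improves.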
Branch prediction.

swyx: Oh, well no, that's too low level, but yes. Sorry. Yeah. One question I gotta get to. So we actually did record a part with the METR folks, with Sarah, right here. Their chart is the human-equivalent hours of work, rather than how long the agents themselves are being [01:18:00] autonomous. And that's a huge difference, right? Like, human work five hours, agent works 30 minutes: it's actually 30 minutes, not five hours, right? So that chart that you see is them estimating what the human-equivalent replacement is. I think Anthropic actually released a more recent chart that showed Claude Code autonomy from their production traffic numbers, and that was 20 to 45 minutes. That's roughly where we are. So yeah, that's the sort of realistic thing. I mean, I do think there are experimental setups where we can just Ralph it, like just prompt it to keep going when it stops, and obviously that can go arbitrarily long.

Nader: I feel like, from my experience, yeah, 20 to 40 minutes seems right for when I'm using Codex or Claude Code. But if I wanna spin up a net new project, I'll often just let it rip, and it'll spin up a web browser and click around and discover new bugs and just keep churning. So I think my longest was like over an hour [01:19:00] of it churning.

I think before we see super-long-running agents, there's gonna be a bit of an efficiency hit. So sure, you can take an hour and go down paths, but you also wanna be more efficient, you wanna be smarter in your reasoning, right? So I think that'll actually go down before we go back up.
Like, you don't wanna scale non-optimized systems just for the heck of it. As much as I love saying "use all the tokens," you know, they are expensive. Like going from dense to reasoning models, that's an added cost, right? You're paying for a lot of tokens, and it doesn't make sense to just scale stuff that's not optimized. So there's always that little balance.

Nader: Yeah.

But you know, I think you'll see both sides of it.

Nader: Yeah. So 2023 was super exciting. I think if you were in SF, you were like, okay, I know this is gonna be a huge world-changing moment, but it seemed like, you know, no one had known yet. And maybe even before, was it 2022 maybe?

swyx: Yeah, yeah. I would say, yeah, Roon had this tweet where, like, everyone who was in SF from 2021 to 2023 understood what it was like to be early.

Nader: Totally. Yeah, 2021, that's when I made my first OpenAI account. Yeah, it was crazy. [01:20:00] And I remember it was so funny, because at the time SF had not been doing well. So pretty much what it felt like was the concentration of founders in the city had risen, because where my neighbors used to be doing a bunch of stuff, those people had all left. So the only people that were still in the city were people that really wanted to build.

It was cheap. Yeah, it was also way cheaper. I feel really bad for anyone who is trying to get rent now. But Solana, they had a huge office.

swyx: The blockchain one, yeah. They took over the old Casper building.

Nader: Yeah. They had the showroom, and they had, I think it was like the back warehouse. It was a huge office.

swyx: And it's right across from OpenAI's and Neuralink's.

Nader: Yeah. It was in the original arena.

swyx: I named the Arena because of it.

Nader: Yeah, yeah. And so it was really exciting, because, like, I forgot... Minify, uh, Brev was there. You guys were there.
I remember. That was actually where you bought the AI Engineer domain.

swyx: Yeah. I didn't know what I was gonna do in AI. I just wanted to do something.

Nader: But it was kind of this really fun moment where we were all in this one space, and, I don't know, it was [01:21:00] a really cool community, especially being so...

swyx: Early. Yeah. And then you got me early Cruise access.

Nader: Oh yeah. So there was a period of time when both Cruises and Waymos were just free, always, if you had access. But yeah, that building is opened again.

swyx: Yeah, now it's Zoox. Zoox, the robotaxi. Yeah.

Nader: Oh, but yeah. So it's actually really cool that you guys have this studio so close to it. And this rock climbing gym right around the corner. It's an awesome block.

swyx: Cool. Yeah. I do think one thing I try to do with the podcast is, I get to bring San Francisco to the rest of the world, and also just, maybe give, uh, yeah.

Nader: Yeah. My favorite talk was about the city, and...

swyx: Yeah, stick and stream. I know. It's very good.

Nader: Yeah. And I guess what it's like to be in San Francisco, I think, is that everyone seems to be super supportive. Sometimes I feel like the city believes in you more than you do. And I don't know if you remember, but I remember [01:22:00] posting my first blog post. I had met you on Twitter, and you gave me like an hour of your time super randomly, and you kind of coached me through writing content for developers. And I was trying really hard not to come off salesy or plug myself, so I kind of stripped all personality out of the blog post. And you brought that out. You're like, it's okay to talk about what you're doing. You don't have to be weird about it.
And I remember that really helped me figure out what our voice is and not shy away from it. So I'm always really grateful to you.

swyx: Hey, you inject your voice into everything now. It's actually a huge advantage to be very...

Kyle: Genuine about what you care about.

swyx: Yeah, yeah. You imagine someone doing infra on Dynamo, and it's like, can you give me feedback on this blog post? And it's pretty boring, and you're like, you know, this guy looks interesting, I'll just do a Zoom call. And then you meet this guy, and he's so energetic. So just be that, right? But I think people are trained to write a certain way in school, and they never totally see there's a broader, well...

Nader: Lots to unlearn.

Kyle: Writing is thinking, and everyone thinks differently. So [01:23:00] might as well just...

swyx: Yeah. Yeah.

Kyle: Write your way.

swyx: Cool. Well, thank you for indulging us in a really broad-ranging discussion. I love that you guys are sort of the young faces of NVIDIA with so much energy, but also a lot of technical depth, and I think people will learn a lot from this session. So thank you.

Nader: This was awesome. Thank you guys, and thank you for everything that you've done: the talks, the podcast, all of the above. And GTC, I really look forward to it. Yeah. Cool. Thanks. That's awesome. Thank you. Thank you.

Related Articles
Every Agent Needs a Box — Aaron Levie, Box
[AINews] NVIDIA GTC: Jensen goes hard on OpenClaw, Vera CPU, and announces $1T sales backlog in 2027
Cursor's Third Era: Cloud Agents
Originally published at https://www.latent.space/p/nvidia-brev-dynamo.