5 Comments
Peter Mernyei

For the bio predictions saying "by an LLM (i.e. AF3 predictions or literature search contribution don’t count)", would you count an LLM agent that uses AF3 or other specialized models via tool calls?

Stephen Malina

Yep, although there's some judgement involved here. If it just called AF3 once or something, then no. But if it coordinated a bunch of tools, then yes, absolutely.

Paul B.

Cool predictions

> My default workflow for writing code will involve managing >=5 agents in parallel

This feels like more of a prediction about your relationship with AI tools than something about AI progress? I think their independent time horizon would need to increase very drastically before managing five becomes even feasible (e.g., by default you’re giving them tasks that each require 1+ hour to complete, and the instructions don’t take more than 5 minutes each to give). However, that’s just the precondition. You’d also need to get used to it before you can juggle this much.

Stephen Malina

Thanks!

> This feels like more of a prediction about your relationship with AI tools than something about AI progress? I think their independent time horizon would need to increase very drastically before managing five becomes even feasible (e.g., by default you’re giving them tasks that each require 1+ hour to complete, and the instructions don’t take more than 5 minutes each to give). However, that’s just the precondition. You’d also need to get used to it before you can juggle this much.

First off, note that the creator of Claude Code tweeted (https://x.com/bcherny/status/2007179832300581177) yesterday that he uses 5 Claude Code agents in his terminal (and more on the web) in his normal workflow now. I'm not there yet, but people are already doing it. Second, I agree that it's largely a prediction about my workflow. That's partly intentional, but I also think of myself as a member of the class of engineers who are early adopters: willing to try things, but only willing to spend so much time crafting their workflow. I'm not sure how big that group is in practice, but I don't think it's literally just me either. Insofar as that's true, how quickly I get used to it is hopefully somewhat representative.

Paul B.

Yeah, I've heard about these alleged wizards who talk about running multiple agents in parallel. I've been working heavily with Claude Code for the past half year. I'm coding for research: I have it spin up experiments, and then I look at the results. I find that if I try doing 3+ things in parallel, I usually end up getting flustered, and I don't see any improvement over just running two, while the experience is also less pleasant. If they were more autonomous and took longer to get back to me, I could probably juggle one more.

Maybe the viability of many Claude Codes just heavily depends on what they're being used for (e.g., different flavors of research/SWE)?