Stop Making AI Agents Think on Every Run

I keep seeing teams build AI agents that think on every run.

Put the agent in a browser. Let it look around. Let it click. Let it decide what to do next. Repeat until the work is done.

And honestly, it works. That is the annoying part.

It is also a very expensive way to do something 500 times.

Every step goes through the same loop:

capture -> send to model -> wait -> decide -> act -> repeat

If each loop takes 3-10 seconds, the latency is not a bug. It is the architecture. The model might be brilliant, the prompts might be tidy, the browser driver might be well behaved, and you are still paying the “think again” tax on every tiny action.
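
The tax is easy to see in code. A minimal sketch of that loop in TypeScript; captureScreenshot, askModel, and performAction are hypothetical placeholders for whatever your stack provides, not a real API:

    // Hypothetical stand-ins for the capture / model / action layers.
    type Decision = { done: boolean; action?: string };

    declare function captureScreenshot(): Promise<Buffer>;
    declare function askModel(goal: string, screenshot: Buffer): Promise<Decision>;
    declare function performAction(action: string): Promise<void>;

    async function agentLoop(goal: string, maxSteps = 50): Promise<void> {
      for (let step = 0; step < maxSteps; step++) {
        const screenshot = await captureScreenshot();      // fast
        const decision = await askModel(goal, screenshot); // 3-10s, every single step
        if (decision.done) return;
        await performAction(decision.action!);             // fast
      }
    }

Everything around the model call is cheap. The model call is the loop.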

For one form, that is fine. For 500 invoices, it gets silly fast.

At 3 minutes per invoice, you are looking at 25 hours of runtime. The boring script that already knows where the fields are can do the same batch in about an hour, sometimes less. At that point the question is not whether the model is smart enough. The question is why it is being asked to rediscover the same page 500 times.

This is where I think agentic engineering gets interesting.

Not “make everything an agent.” More like:

  1. Let AI explore the messy thing once.
  2. Let it produce a deterministic plan, script, selector map, API call, or workflow.
  3. Test that artifact like normal software.
  4. Run the boring path in production.
  5. Bring the agent back when the world changes.

The agent becomes a scout, not the assembly line.
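
Concretely, the artifact can be as dull as a selector map. A sketch using Playwright as the runner, with a hypothetical ERP form; the selector object is the kind of thing the agent writes once (in practice, a versioned selector-map.json), and the function is what actually runs 500 times:

    import { chromium } from 'playwright';

    // The artifact the agent produced once after exploring the form.
    // Field names and selectors here are illustrative.
    const selectors = {
      invoiceNumber: '#inv-no',
      amount: "input[name='amount']",
      submit: "button[type='submit']",
    };

    // The boring path: no model in the loop, just the known selectors.
    async function postInvoice(invoice: { number: string; amount: string }): Promise<void> {
      const browser = await chromium.launch();
      const page = await browser.newPage();
      await page.goto('https://erp.example.com/invoices/new'); // hypothetical URL
      await page.fill(selectors.invoiceNumber, invoice.number);
      await page.fill(selectors.amount, invoice.amount);
      await page.click(selectors.submit);
      await browser.close();
    }

The model cost is paid at build time. The run cost is a browser and a loop.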

That distinction matters because browser agents are probabilistic. Same bot, same site, different day, slightly different behavior. That is fascinating when you are researching web agents. It is less charming when someone asks why invoice 417 was posted differently from invoice 416.

Production likes dull things. Logs. Inputs. Outputs. Versioned rules. Re-runs. A test case that fails the same way twice. Audit trails that do not require a seance.

AI is still useful here. Very useful. It can inspect a target system, infer field meaning, generate selectors, propose validation rules, write the first draft of the automation, explain the weird parts, and help maintain it when the UI shifts. That is real leverage.

But once the path is known, stop asking the model to walk it like it has never been there before.
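
"Bring the agent back when the world changes" can also be mechanical: run a cheap drift check before the batch, and queue a re-scout when a known selector stops resolving, instead of letting a model improvise live. A sketch, reusing the illustrative selector map from above:

    import { chromium } from 'playwright';

    // Returns the fields whose selectors no longer match anything on the page.
    // A non-empty result means the world changed: stop the boring path and
    // send the agent back to re-map the form.
    async function findDriftedSelectors(
      url: string,
      selectors: Record<string, string>,
    ): Promise<string[]> {
      const browser = await chromium.launch();
      const page = await browser.newPage();
      await page.goto(url);
      const missing: string[] = [];
      for (const [field, selector] of Object.entries(selectors)) {
        if ((await page.locator(selector).count()) === 0) missing.push(field);
      }
      await browser.close();
      return missing;
    }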

This is why the boring automation toolbox still matters. Waits, assertions, traces, queues, transaction statuses, retry history, and run logs are not glamorous. They are how you make the work inspectable after the demo glow wears off.
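
Playwright's test runner covers most of that list out of the box. Assuming the same hypothetical form, a test that auto-waits, asserts, and records a trace on retry looks like this:

    import { test, expect } from '@playwright/test';

    // playwright.config.ts would set: use: { trace: 'on-first-retry' }
    test('posting an invoice shows a confirmation', async ({ page }) => {
      await page.goto('https://erp.example.com/invoices/new'); // hypothetical URL
      await page.fill('#inv-no', 'INV-0417');
      await page.fill("input[name='amount']", '1250.00');
      // click auto-waits for the button to be visible, enabled, and stable
      await page.click("button[type='submit']");
      // web-first assertion: retries until the message appears or times out
      await expect(page.getByText('Invoice posted')).toBeVisible();
    });

A test like this fails the same way twice. That is the feature.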

There is also a cost trap hiding under the “models are getting cheaper” argument.

Yes, token prices improve. But retries are not free. Slow runs are not free. Support tickets are not free. The time you spend explaining why a probabilistic worker made a surprising choice is definitely not free.
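
A back-of-envelope sketch makes the shape of the trap visible. Every number here is an illustrative assumption, not a benchmark:

    // All numbers are illustrative assumptions; plug in your own.
    const invoices = 500;
    const stepsPerInvoice = 20;      // model decisions per invoice
    const tokensPerStep = 4_000;     // screenshot + context + response
    const usdPerMillionTokens = 3;   // assumed blended price
    const retryFactor = 1.3;         // 30% of steps repeated on flaky runs

    const batchCost =
      invoices * stepsPerInvoice * tokensPerStep * retryFactor *
      (usdPerMillionTokens / 1_000_000);

    console.log(`agent-in-the-loop batch: ~$${batchCost.toFixed(0)}`); // ~$156 here

    // The scripted path pays model cost once, at build and repair time,
    // and near-zero tokens per run. Cheaper tokens shrink both numbers,
    // but only one of them is multiplied by 500 on every batch.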

The cleaner design is often more humble:

use AI to understand, then use code to execute.

Fast, predictable, and a bit boring.

Which is usually exactly what you want in production.

This is also close to the shape of something I am building quietly: AI as the builder and maintainer of deterministic automation, not as a tiny intern forced to re-read the same screen all day.

Further reading

  • Anthropic draws a useful line between predefined workflows and agents, and notes that agentic systems often trade latency and cost for task performance: Building effective agents.
  • Anthropic’s computer-use docs are blunt about browser-control limitations, including latency, hallucinated coordinates, incorrect tool choices, and unexpected actions: Computer use.
  • OpenAI’s latency guidance and agent-loop writeups are a useful reminder that repeated generation, context handling, API work, and tool execution add up: Latency optimization and Speeding up agentic workflows.
  • Playwright’s docs show the other side of the tradeoff: deterministic browser automation with auto-waits, assertions, traces, and scalable sharding: Playwright, Trace viewer, and sharding.
  • UiPath queues are a good example of production automation thinking in transactions, statuses, retry history, and monitoring rather than fresh reasoning every time: Queues and transactions.
  • BrowserGym highlights that robust and efficient web agents remain challenging because real-world web environments are complex and current models still have limits: The BrowserGym Ecosystem for Web Agent Research.