
Stop Paying Your AI Agents to Re-Learn the Same Site

Your AI agents pay full discovery cost on every run, even when the last run already figured the site out. Autobrowse writes a SKILL.md the next one reads.

5 min read


TL;DR: Most AI agents have amnesia. Every run pays full discovery cost. Browserbase's open-source Autobrowse lets one run iterate until it converges, then writes what worked into a SKILL.md the next agent reads. Craigslist scrapes dropped from $0.22 to $0.12. Form-fill from $1.40 to $0.24. The unlock isn't a smarter model. It's a markdown file.


Why Do Your AI Agents Forget Everything They Just Learned?

If you've shipped any agent that touches the live web, you've watched this happen. Run one: load the homepage, find the search box, learn the pagination, finish the task. Run two: same site, same task, same discovery from zero. The model has a context window for one run. It has no notebook between runs.

That's the amnesia tax. It's fine when you run something ten times. It is not fine when you run it ten thousand times for the same site, every week. The fix isn't a bigger brain. It's a place to put what the brain already figured out.

What Did Browserbase Just Open-Source?

Autobrowse shipped in early May 2026. [1] Shrey Pandya first teased it on April 22 [2], and Kyle Jeong detailed it on May 6. [3]

The shape is simple. You give an agent a real task on a live site. It runs end to end, studies its own trace, refines, runs again. After three to five rounds, runs stop getting better. That's convergence. Autobrowse takes the converged approach and graduates it into a reusable SKILL.md, plus any helper scripts the workflow needs.  [1]
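Stripped to a sketch, the loop is short. Here's a toy version of the converge-then-graduate shape in Python; the cost model, the refine step, and every function name are invented stand-ins for illustration, not Browserbase's actual implementation:

```python
# Toy sketch of the converge-then-graduate loop. run_task, refine, and the
# cost numbers are invented stand-ins, not Browserbase's code.
from pathlib import Path

def run_task(approach: list[str]) -> float:
    """Stand-in for a real browser run: cost falls as the approach improves."""
    return max(0.12, 1.40 / (2 ** len(approach)))  # toy cost model

def refine(approach: list[str], round_num: int) -> list[str]:
    """Stand-in for studying the trace and editing the approach."""
    return approach + [f"shortcut learned in round {round_num}"]

def converge_and_graduate(objective: str, max_rounds: int = 5,
                          tolerance: float = 0.10) -> Path:
    approach = [objective]
    prev_cost = run_task(approach)
    for round_num in range(1, max_rounds + 1):
        candidate = refine(approach, round_num)
        cost = run_task(candidate)
        if prev_cost - cost < tolerance * prev_cost:
            break  # runs stopped getting better: that's convergence
        approach, prev_cost = candidate, cost
    skill = Path("SKILL.md")  # graduate: persist what worked
    skill.write_text("# Skill\n" + "\n".join(f"- {s}" for s in approach))
    return skill

print(converge_and_graduate("search craigslist for standing desks"))
```

The break condition is the whole idea: stop when another run wouldn't pay for itself, then write down what worked.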

Pattern borrowed from Andrej Karpathy. I wrote about Karpathy's Autoresearch when it dropped: one editable file, one metric, one time-boxed loop.  [4] Browserbase pointed that loop at websites. The interesting question isn't what the loop does. It's what it leaves behind.

How Does One Agent Leave a Note for the Next?

Not a chat log. Not a transcript. The SKILL.md is a structured how-to written by the agent for whoever shows up next, human or AI.  [1]

Autobrowse's seven-step learning loop, from Kyle Jeong's thread: objective, run, study, strategize, iterate, converge, graduate.

Source: @kylejeong

For Craigslist, that file documents an undocumented sapi.craigslist.org JSON endpoint, the mandatory Referer header, the postal-code parameter that overrides IP geolocation, the category enum, the pagination batch size, and which neighborhood lookups misbehave. None of that is in Craigslist's docs. It was in the agent's network trace.
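To make that concrete, here's roughly what acting on that file could look like. The sapi.craigslist.org host, the mandatory Referer header, and the postal-code override come from the skill described above; the path, parameter names, and values below are illustrative guesses, not documented Craigslist API surface:

```python
import requests

# Illustrative only. The host, the Referer requirement, and the postal-code
# override are described in the SKILL.md above; the path and parameter names
# here are guesses for the sake of the example.
resp = requests.get(
    "https://sapi.craigslist.org/web/v8/postings/search/full",  # hypothetical path
    params={
        "query": "standing desk",
        "postal": "94103",   # the postal-code param that overrides IP geolocation
        "batchSize": 30,     # the pagination batch size recorded in the skill
    },
    headers={"Referer": "https://sfbay.craigslist.org/"},  # mandatory, per the trace
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```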

The next agent doesn't reason about Craigslist. It reads the file and runs the call. So what does that do to the cost line?

What Has Actually Gotten Cheaper?

The published benchmarks make the gap visceral.  [1]

| Task | Generic agent | After convergence | Cut |
| --- | --- | --- | --- |
| Craigslist search | ~$0.22 / 71s | ~$0.12 / 27s | 45% |
| Form-fill (4 iterations) | $1.40 | $0.24 | 83% |
| Federal grants portal | 28-page paginated scrape | one undocumented JSON call | from many to one |

The grants row is the one people share, and it deserves a beat. An Autobrowse run watched the network traffic on a federal grants portal and noticed an undocumented JSON endpoint that returned every current grant in a single call.  [1] Twenty-eight pages collapsed to one fetch. Humans had scraped that site for years and missed it.


Humans don't iterate against the same surface fifty times. We use a site once, write the scraper, move on. The agent's loop runs the same task over and over and watches its own trace until something cleaner falls out. Patience, not intelligence.

If patience is what's helping, the next obvious question writes itself.

Why Doesn't a Bigger Model Fix This?

A bigger model is faster at reasoning, not faster at remembering. The discovery loop is still a discovery loop. Better intelligence applied to a stateless agent just makes the rediscovery faster, not unnecessary.

Frontier context windows hit a million tokens this year, and that doesn't help either. A million-token context is one run's working memory. It evaporates the moment the run ends. The artifact you actually need is something an agent in a different process, on a different day, can read.

So if SKILL.md is the artifact, where does it actually fail?

Where Does Autobrowse Still Break?

Iteration helps when each run produces signal. It does not help when the work is deterministic and the first run already has the answer.

Browserbase's own writeup is honest about this. A 167-row static HTML catalog burned roughly $24 over four iterations and still didn't return all rows.  [1] Two hundred lines of Python with BeautifulSoup would have done it in one pass.
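For contrast, here's the shape of that one-pass parser, condensed. The URL and selectors are invented, but the point stands: a fixed schema plus static HTML means the first fetch already has everything.

```python
import requests
from bs4 import BeautifulSoup

# One pass over a static catalog page. The URL and selectors are invented
# for illustration; a fixed schema needs no iteration loop.
html = requests.get("https://example.com/catalog", timeout=10).text
soup = BeautifulSoup(html, "html.parser")

rows = []
for tr in soup.select("table.catalog tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.select("td")]
    rows.append(dict(zip(["name", "sku", "price"], cells)))

print(len(rows), "rows in one fetch, zero iterations")
```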

The rule: if the page is static and the schema is fixed, write the parser. If the site is messy, gated, JS-heavy, or undocumented, run the loop. Knowing which is which is the new judgment call. Once you've made that call, the install path is short.

How Do You Try Autobrowse Today?

Autobrowse is open source at github.com/browserbase/skills and ships via Browserbase's Claude Agent SDK plugin marketplace. [5] Install:

/plugin marketplace add browserbase/skills

Point the agent at a real task on a real site, let it iterate, and harvest the SKILL.md when it converges. The graduated file lands at ~/.claude/skills/[task-name]/SKILL.md, reusable across runs and across agents.
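And because it's just a file, "reusable" means nothing fancier than reading it. A minimal sketch, assuming a hypothetical task name of craigslist-search:

```python
from pathlib import Path

# "craigslist-search" is a placeholder task name; the path scheme is the one
# quoted above: ~/.claude/skills/[task-name]/SKILL.md
skill = Path.home() / ".claude" / "skills" / "craigslist-search" / "SKILL.md"
print(skill.read_text())  # the notebook between runs
```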

That's the workflow. The deeper move is what that markdown file means.

Why Is a Markdown File the Real Unlock?

Everyone in 2026 is racing in the same direction. Bigger context. More autonomy. Smarter agents. Autobrowse quietly argues the opposite.

The unlock here is the smallest possible artifact. A markdown file written by one agent, read by the next. It survives the death of the run that produced it. It can be inspected, edited, version-controlled, audited. The next agent doesn't need to be smarter; it needs to be literate.

That's the same instinct behind David Deutsch's argument that good explanations are hard to vary. Durable knowledge survives because every part is load-bearing. SKILL.md is that, for a website. The agent that wrote it can be replaced. The file outlives the model.

I've been writing for a while about how AI is replacing forgetting, not thinking. Autobrowse is that move made tangible. Forgetting was the bottleneck. Notes are the fix.


Key takeaways

  • Your AI agents pay a discovery tax on every run when nothing carries between runs.
  • Bigger models don't fix amnesia. They make rediscovery faster, not unnecessary.
  • Autobrowse converts run-time discovery into a durable, human-readable SKILL.md the next agent reads instead of rebuilding.
  • The receipts: Craigslist scrapes 45% cheaper per run, form-fill 83% cheaper after four iterations, and a 28-page federal grants scrape collapsed to one call.
  • The pattern is general. Anything you do repeatedly with high discovery cost gets cheaper if the first run leaves notes.
  • The smallest possible artifact wins. A markdown file outlives the model that wrote it.

Frequently asked questions

What is Autobrowse?

Autobrowse is an open-source workflow from Browserbase that runs an AI agent on a real website, watches it iterate until the run converges, then graduates the working approach into a reusable SKILL.md file the next agent reads instead of rediscovering the site from scratch.  [1]

Why does every AI agent run start from zero?

Most production agents are stateless across runs. The model has a context window for one run, but no shared notebook between runs. So the second time it visits the same site, it loads the homepage, learns the pagination, and figures out the pattern again. Autobrowse is the first widely shared answer to that gap.  [1]

How much does Autobrowse actually save?

Browserbase's published benchmarks show Craigslist scrapes dropping from about $0.22 per run at 71 seconds to about $0.12 at 27 seconds, and a form-fill task dropping from $1.40 to $0.24 across four iterations. The graduated agent uses the same model. What changed is that it didn't need to figure out the site again.  [1]

When does Autobrowse not work?

On problems that don't reward iteration. Browserbase's own writeup notes a 167-row static HTML catalog where Autobrowse burned roughly $24 over four iterations and still didn't return all rows. Two hundred lines of Python with BeautifulSoup would have done it in one pass. Loops only help when each run gives the agent new signal to use.  [1]

How is Autobrowse related to Karpathy's Autoresearch?

Same loop pattern, different domain. Karpathy's Autoresearch lets an AI agent run ML experiments overnight on a single GPU until the metric improves. Autobrowse takes the same try, study, refine, converge structure and points it at websites instead of training scripts. Browserbase explicitly cites Autoresearch as the inspiration.  [1]  [2]


I break down things like this on LinkedIn, X, and Instagram. Usually shorter, sometimes as carousels. If this resonated, you'd probably like those too.


Sources

  1. Autobrowse: The Mythos moment for Browser Agents is here, on the Browserbase blog
  2. Shrey Pandya announces the Autobrowse skill, on X
  3. Kyle Jeong on the Autobrowse learning loop, on X
  4. Andrej Karpathy's Autoresearch, on GitHub
  5. browserbase/skills, on GitHub
