Why Small AI Experiments Compound Over Time

Amy Westlake
Apr 7
3 min read

Situation

I didn't start using AI because I had a transformation strategy.

I started because I was tired of friction.

Rewriting weekly executive status updates.

Cleaning up messy meeting notes before sending them out.

Pressure-testing a draft slide before a stakeholder review.

None of those are glamorous tasks. They're recurring. They matter. And they're easy to get slightly wrong.

Status reporting, in particular, always felt heavier than it should. Every week I would synthesize updates from multiple tracks, decide what actually belonged in the BLUF, balance signal vs noise, and try not to understate risk — or overreact. It wasn't broken. It was just cognitively expensive.

One week, I ran a small experiment.

Not a system. Not a framework. Just: let me see if this helps.

The AI Move

I pasted my raw notes into AI. Not the polished draft. The messy version.

I asked it to tighten the structure, separate signal from supporting detail, flag risks I might be soft-pedaling, and call out logical gaps. Not to write the update. Just to pressure-test it.

The first week, it mostly cleaned up wording. The second week, I ran the same experiment. The third week, same.

No optimization. No elaborate prompt engineering. Just repetition on the same recurring task: weekly status.

The Shift

At first, the improvement was mechanical. Cleaner paragraphs. Stronger structure. Less rework before sending.

But something else started happening.

AI would consistently highlight when my "update" was actually an activity list. It would surface when I buried a risk in paragraph three. It would question whether my headline actually reflected the real outcome at risk.

It wasn't always right. But it was consistently asking sharper structural questions than I was under time pressure.

Over a few months, the shift became clear:

I stopped using AI to edit status updates. I started using it to pressure-test my judgment.

Before finalizing an update, I'd run the same check:

Is the BLUF truly outcome-focused?
Are risks clearly separated from context?
Is this yellow because something moved — or because I'm uncomfortable?

The tool didn't get smarter. The repetition did.

And my baseline standard rose. I didn't notice until it had.

The Pattern

The pattern I'm seeing is this:

Small AI experiments compound when they're attached to repeated work.

Not one-off experiments. Not novelty use. Not scattered prompts across random tasks.

Repetition creates longitudinal signal. When you run the same type of AI assist against the same category of work — every week — you start to see where your structure drifts, where you normalize inefficiency, where you consistently underweight risk, where you over-explain.

That visibility compounds.

In my case, AI became less of a writing assistant and more of an operating layer for status reporting. It enforced separation between outcome and activity, signal and noise, risk and context.

That structural clarity now shows up even when I don't use it.

That's the compounding.

The leverage isn't speed. It's calibration. Small experiments compound because they create ongoing feedback loops inside real work.

The Implication

If you want to test this, don't start with a grand AI adoption plan.

Start with one repeated task. Something you do weekly, something with real stakes, something that requires judgment. Run the same AI check against it for 30 days — not

to automate it, not to perfect it, but to observe what patterns surface.

You're not optimizing output. You're building longitudinal signal about how you think and operate.

What I’m Testing Next

I'm now experimenting with being more deliberate about where AI sits in my workflow.

Status reporting gets a structural pass. Decision memos get a risk-visibility pass. Program kickoff decks get a logic-flow pass.

I'm watching for where the feedback meaningfully sharpens judgment — and where it adds noise.

The goal isn't to use AI everywhere. It's to attach it to repeated work in ways that improve over time.

That's the operating shift I didn't expect when I ran a 10-minute experiment on a weekly status update.