Designing the language AI agents use to earn trust
Built the trust framework AI agents use to earn user confidence at first run — and designed it to scale across the entire Copilot platform.
+38% engagement
+27% completion
+31% capability retention
+19% 7-day return
Solution in production

The first-run experience (FRE) framework in production: one interaction model reused across multiple Copilot agents, without per-agent customization.
The problem
AI onboarding fails when it explains value instead of proving it
AI agents aren't fixed-feature software. Their outputs are contextual, probabilistic, dependent on user data. You can't communicate that by listing capabilities — you have to demonstrate them. Most teams weren't doing that. They were shipping prompt starters and calling it onboarding.
Users were installing agents and abandoning them after the first session at a rate that didn't match the product's capability. The gap wasn't in the agents — it was in the mental model users formed on first contact.
A second problem was invisible until I mapped it: every agent team was building onboarding independently. Without a shared interaction model, every new agent added another inconsistent pattern to an already fragmented platform.

The research confirmed what that mapping made visible. Across 1,257 sessions, three patterns repeated without exception.
Users infer capability from agent names
Users assume functionality based solely on the title, often leading to mismatched expectations.
Prompt starters fail as learning tools
Users read the limited examples as the agent's entire capability set, rather than as learning tools.
Onboarding is needed to close the understanding gap
Users require structured guidance to bridge the gap between initial confusion and deep engagement.
Prompt starters were the wrong answer — and I said so before the data confirmed it
The team's initial position: build better prompt starters. I rejected it.
Finite examples in an infinite-output system don't communicate capability — they cap it. Every example is an implicit ceiling. I initially assumed users would see past the examples and explore further. They didn't. The examples became the mental model.
Users who received a meaningful output in session one showed dramatically higher retention than any cohort who only read onboarding copy. That ended the prompt starters conversation.
Understand
A simple explanation of what the agent is and what it is capable of
Experience
One meaningful action that shows value on the same surface where work happens
Connect
Role-based examples tied to the user's workflow that show how it applies to their work.
Users learn through contextual action, not explanation.
That model became the architectural constraint. The business case followed directly from it.
The business case — made before a single screen was designed.
The framework is designed to accelerate the early stages of the agent lifecycle: helping users understand what the agent is, experience its value, and begin integrating it into their workflow. Its job is to ignite momentum.
4-step product opportunity model
Exploration and decisions
Three approaches. All rejected for the same reason.
I evaluated three existing patterns. Each solved part of the problem. None of them scaled.
Dialog introduction: Appeared before users had context. Forced a decision they weren't equipped to make. Dismissed without reading.
Prompt starters: Already in the product. Easy to ship. The wrong direction — finite examples in an infinite-output system communicate capability poorly. I chose not to iterate on something structurally broken.
Structured tour: Best comprehension in testing. Also the most brittle — hardcoded flows break when agent capabilities expand. Requiring each agent team to build and maintain one was unsustainable.
The constraint that settled it: we weren't designing for one agent. Any solution requiring per-agent customization would fail as the platform grew. I rejected all three and proposed a framework instead.

The framework proposal almost didn't happen…
Product leadership wanted a single improved onboarding flow for the Researcher agent. I made the call to push for a shared pattern instead. One agent's onboarding solves one problem. A framework solves the ecosystem. It took three conversations and a prototype showing that the same four steps hold across two agents with different capabilities before the team saw it.
What failed before the framework worked
The first version collapsed at Step 03. Real output delivered — but capability retention only moved for users who were already motivated. The ones who needed the framework most were still bouncing.
Session recordings showed the gap: Connect was surfacing prompts personalized by data signal — recent files, calendar — but not by intent. Users saw what the agent could do. They didn't understand why it mattered for their work.
The fix: lead Connect with the user's goal, not the agent's output. "You've been doing X — here's how this agent makes that faster" landed. "Here are things I can help with" didn't. That reframe added a week of engineering work. It was the right call.
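To make the reframe concrete, here is a minimal sketch of the two framings, assuming a TypeScript layer that assembles Connect copy. The names (GoalSignal, buildConnectPrompt, buildSignalOnlyPrompt) are illustrative, not the shipped implementation; the only point is the ordering: the user's inferred goal leads, the capability follows.

```typescript
// Illustrative sketch, not the shipped Copilot code: the types and functions
// below exist only to show the intent-first framing of the Connect step.

interface GoalSignal {
  // What the user has been trying to accomplish, inferred from activity,
  // e.g. "pulling competitive research into a quarterly briefing".
  description: string;
}

interface Capability {
  action: string;   // e.g. "summarize your recent sources into one brief"
  benefit: string;  // e.g. "cuts hours of reading down to a ten-minute review"
}

// Intent-first framing, the version that landed: "You've been doing X; here's
// how this agent makes that faster."
function buildConnectPrompt(goal: GoalSignal, capability: Capability): string {
  return `You've been ${goal.description}. This agent can ${capability.action}, which ${capability.benefit}.`;
}

// Signal-only framing, the version that failed: personalized by recency,
// silent on why it matters for the user's work.
function buildSignalOnlyPrompt(capability: Capability): string {
  return `Here are things I can help with: ${capability.action}.`;
}
```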
The solution
A four-stage FRE framework: built for how humans actually come to trust AI
The framework is a cognitive model implemented as an interaction pattern. The four stages mirror how users build trust with AI systems, not how they learn traditional software. Each step depends on the previous one completing its job.
Step 01 — Understand
The agent defines its own purpose before asking users to do anything. Not features. A statement of what problem it solves and for whom. Mental model set at the moment it's most malleable.
Step 02 — Connect
Prompts generated from the user's actual Microsoft 365 context — recent files, meetings, calendar, role. Not generic examples. What it can do for them, now.
Step 03 — Experience
One real output. Not a demo. An actual result from the user's own data. This is where trust is built or lost. Everything before is setup.
Step 04 — Deepen
A second capability that builds on the first output. Expands the mental model progressively. Signals that value compounds with use — the behavioral cue that drives habit.
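Read as an architecture, the four steps reduce to a contract each agent fills in while the platform owns the flow. The sketch below is an assumption about how that contract could be expressed in TypeScript; AgentFreDefinition, UserContext, and runFirstRunExperience are illustrative names, not the actual Copilot components.

```typescript
// Illustrative sketch only: an agent team declares content and data hooks for
// the four FRE stages, and a single shared function owns the flow.

type FreStage = "understand" | "connect" | "experience" | "deepen";

interface UserContext {
  recentFiles: string[];
  upcomingMeetings: string[];
  role: string;
}

interface AgentOutput {
  title: string;
  body: string;
}

interface AgentFreDefinition {
  agentId: string;
  // Step 01: one statement of what problem the agent solves and for whom.
  purposeStatement: string;
  // Step 02: prompts generated from the user's Microsoft 365 context.
  connectPrompts: (context: UserContext) => Promise<string[]>;
  // Step 03: one real output produced from the user's own data.
  firstExperience: (context: UserContext) => Promise<AgentOutput>;
  // Step 04: a second capability that builds on the first output.
  deepenCapability: (previous: AgentOutput) => Promise<AgentOutput>;
}

// The shared flow: identical ordering and interaction model for every agent.
// Agent teams never reimplement this; they only supply an AgentFreDefinition.
async function runFirstRunExperience(
  agent: AgentFreDefinition,
  context: UserContext,
  render: (stage: FreStage, content: unknown) => void
): Promise<void> {
  render("understand", agent.purposeStatement);
  render("connect", await agent.connectPrompts(context));
  const first = await agent.firstExperience(context);
  render("experience", first);
  render("deepen", await agent.deepenCapability(first));
}
```

The design point the sketch carries: per-agent variation lives in content and data hooks, never in flow logic, which is what lets the same interaction model scale without per-agent customization.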




Result: The framework was embedded into the Copilot interface as reusable components. Agent teams adopted the shared pattern instead of building their own.

The FRE framework live — Researcher agent, Microsoft 365, 2026.
Adoption required more than shipping the framework. It required engineering integration and a deliberate rollout across agent teams.
Implementation path
From framework to platform-wide adoption.
Impact
A/B tested against simpler onboarding patterns. ~5,000 users. p < 0.05.
+38% Initial engagement
Users engaged instead of dismissing
+27% Completion rate
Users completed the full onboarding flow
+31% Capability retention
Users returned to the capability introduced during FRE
+19% 7-day return
More second-session habits formed
The metric that mattered most
+31% capability retention isn't task completion. It's behavior change. Users came back and used the specific capability the framework introduced — not just completed the flow. Engagement and completion are proxies. Capability retention is the actual goal.
The framework's value wasn't in the four steps. It was in making those four steps reusable across every agent on the platform.
Reflection
The scope was right, but it took longer to land than it should have.
I scoped to the first two lifecycle stages for the initial implementation. Narrow scope → cleaner data → broader adoption. What I'd do differently: propose the narrow scope on day one rather than arriving at it through negotiation.
The Connect step is still under-built.
We used recency signals — recent files, calendar. The more powerful version uses behavioral patterns: what users have been trying to accomplish, not just what they've opened. That's the gap between a personalized first-run and a contextually intelligent one.