
What You'll Build

An AI agent that joins your meetings as an actual participant. Not a note-taking bot in the corner. An active member of the call that:

  1. Speaks with a real voice in real time, with sub-second latency.
  2. Shares its screen when it needs to show a document, a dashboard, or a deck.
  3. Reads the room via the transcript stream and decides when to interject.
  4. Remembers everything from prior calls with the same people, the same accounts, the same deals.
  5. Hands off cleanly when a human needs to take over.

If you have ever wished you could clone yourself for the 12 standing meetings on your calendar that don't need you specifically, this is what that looks like in 2026.

Why This Works

Most companies still treat AI as a layer that runs around the meeting. The bot transcribes. The summarizer summarizes. The CRM updater updates. Everyone agrees that's useful. Nobody questions that the meeting itself was still a meeting with humans on both sides.

The new wave breaks that assumption. The agent is in the meeting. Once you accept that, a lot of standing calls stop needing you:

A real human still needs to be on calls where stakes are high, trust matters, or the conversation could go anywhere. But for the 60-70% of standing meetings that are predictable, the agent is the better fit. It is on time, it is consistent, it remembers, and it never has a bad week.

How the Architecture Works

The build has four moving parts. Each one is a separate service that does one thing well.

Layer 1: The Meeting Bridge

The agent has to actually join the call. There are two paths:

For the first version, use Recall.ai or a similar bot platform. Accept the extra cost in exchange for the simplicity.

Layer 2: The Voice Loop

Speech in, speech out. Three pieces:

The trick is the response decision. Most "AI in meetings" demos fall apart because the agent talks too much or interrupts at the wrong time. The fix is a turn-taking layer: a small classifier that watches the transcript stream and only fires the response generator when the room has stopped speaking and the conversation context calls for input from the agent.
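A minimal sketch of that turn-taking gate follows. It assumes the transcript stream arrives as utterances with a speaker, text, and end timestamp. The `TurnTaker` class, the 1.2-second silence threshold, and the agent name "apollo" are hypothetical. The "classifier" here is a keyword heuristic (the agent is addressed by name, or the last line is a question), standing in for whatever trained model or LLM call a real build would use.

```python
from dataclasses import dataclass, field

SILENCE_THRESHOLD_S = 1.2  # hypothetical: how long the room must be quiet before the agent may speak
AGENT_NAME = "apollo"      # hypothetical display name, lowercased for matching


@dataclass
class Utterance:
    speaker: str
    text: str
    end_time: float  # seconds since the call started


@dataclass
class TurnTaker:
    agent_name: str = AGENT_NAME
    silence_threshold_s: float = SILENCE_THRESHOLD_S
    history: list[Utterance] = field(default_factory=list)

    def observe(self, utt: Utterance) -> None:
        """Feed each finalized utterance from the transcript stream."""
        self.history.append(utt)

    def room_is_quiet(self, now: float) -> bool:
        """True once nobody has spoken for at least the silence threshold."""
        if not self.history:
            return False
        return now - self.history[-1].end_time >= self.silence_threshold_s

    def context_calls_for_agent(self) -> bool:
        """Heuristic stand-in for the small classifier described in the text."""
        if not self.history:
            return False
        last = self.history[-1]
        if last.speaker.lower() == self.agent_name:
            return False  # never respond to our own last line
        text = last.text.lower()
        addressed = self.agent_name in text
        asked = text.rstrip().endswith("?")
        return addressed or asked

    def should_respond(self, now: float) -> bool:
        """Only fire the response generator when both conditions hold."""
        return self.room_is_quiet(now) and self.context_calls_for_agent()
```

Keeping the gate separate from the response generator means the expensive model call only happens after a cheap check passes, and the threshold can be tuned per meeting type without touching the prompt.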

Layer 3: The Screen Share

This is the part that separates the demo from the product. When the agent says "let me pull up the latest numbers," it actually shares a screen with the numbers. Two ways to do this:

For meetings where the content is the same every week (the standing client report, the team metrics review), pre-rendered is fine. For meetings where the conversation could go anywhere, on-the-fly is the only option.

Layer 4: Memory

The agent has to remember the last seven calls with this person, what was promised, what was blocked, and what the next steps are. Two stores:

The memory layer is what separates a useful agent from a parlor trick. Without it, every call starts from scratch and the human on the other side immediately notices. With it, the agent picks up exactly where the last call ended, including the things you forgot.

Step-by-Step Setup

Step 1: Pick the Meeting Type to Replace First

Don't try to send the agent into your most important call. Start with one of the standing meetings that drains time without producing much:

Whichever you pick, write down what actually has to happen on that call. Most of the time the answer is a small list: confirm status on 5 items, share next week's plan, take any new requests. That list becomes the agent's script.
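One way to make that list usable by the agent is to store it as an ordered checklist that tracks what has been covered, so the agent always knows the next item to raise. This is a minimal sketch; `CallScript`, `AgendaItem`, and the sample item titles are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AgendaItem:
    title: str
    done: bool = False


@dataclass
class CallScript:
    """Hypothetical call script: the short list of things that must happen on the call."""
    items: list[AgendaItem] = field(default_factory=list)

    @classmethod
    def from_titles(cls, titles: list[str]) -> CallScript:
        return cls([AgendaItem(t) for t in titles])

    def mark_done(self, title: str) -> None:
        for item in self.items:
            if item.title == title:
                item.done = True
                return
        raise KeyError(f"not on the script: {title}")

    def next_item(self) -> AgendaItem | None:
        """First item not yet covered, or None when the script is finished."""
        return next((i for i in self.items if not i.done), None)

    def complete(self) -> bool:
        return all(i.done for i in self.items)


script = CallScript.from_titles(
    ["Confirm status on open items", "Share next week's plan", "Take new requests"]
)
```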

Step 2: Set Up the Meeting Bridge

Sign up for Recall.ai. Their starter plan covers a few hundred bot-minutes per month. Get the API key. Test that you can spin up a bot that joins a meeting you scheduled.

The bot's identity matters. Give it a name that signals what it is. Something like "Apollo, Acme Account Manager (AI)" is better than "Notetaker." People accept an AI participant when they know it's an AI. They feel deceived when they realize halfway through that the helpful manager was a bot.
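A sketch of building the join request with a disclosed name baked in. It only constructs the request; it does not send it. The endpoint URL, the `Token` authorization format, and the `meeting_url` / `bot_name` field names are assumptions modeled on Recall.ai's create-bot API, so check them against Recall.ai's current API reference. The helper names are hypothetical.

```python
# ASSUMPTION: endpoint, header format, and body fields are modeled on Recall.ai's
# create-bot API and may differ from the current version; verify before use.
RECALL_BOT_URL = "https://us-east-1.recall.ai/api/v1/bot/"  # region-specific (assumption)

AI_SUFFIX = "(AI)"


def disclosed_name(persona: str, role: str) -> str:
    """Build a display name that tells everyone up front the participant is an AI."""
    name = f"{persona}, {role}"
    return name if name.endswith(AI_SUFFIX) else f"{name} {AI_SUFFIX}"


def build_join_request(api_key: str, meeting_url: str, persona: str, role: str) -> dict:
    """Return (but do not send) the HTTP request that spins up a meeting bot."""
    if not meeting_url.startswith("https://"):
        raise ValueError("meeting_url must be an https link")
    return {
        "method": "POST",
        "url": RECALL_BOT_URL,
        "headers": {"Authorization": f"Token {api_key}", "Content-Type": "application/json"},
        "json": {"meeting_url": meeting_url, "bot_name": disclosed_name(persona, role)},
    }
```

The keys match the keyword arguments of `requests.request`, so sending it is `requests.request(**req)`. Putting the "(AI)" suffix in code rather than in a config value makes it hard to ship an undisclosed bot by accident.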

Step 3: Build the Voice Loop

Start with Retell. It bundles speech-to-text, response generation, and text-to-speech in one platform. Wire OpenClaw in as the response engine and let Retell handle everything else.

Tune two things:

Step 4: Index the Pre-Rendered Content

Whatever the agent will need to share on screen, index it now. For a weekly account check-in, that might be:

Each one is a static URL or a deck slide. The agent's tool layer can fetch any of them by name.
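A minimal sketch of that fetch-by-name tool, with an optional fallback for the on-the-fly case from Layer 3. The `PRERENDERED` entries and example.com URLs are placeholders, and `render_on_the_fly` is a hypothetical callback for whatever generates live content.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical library of pre-rendered content: a name the agent can ask for,
# mapped to a static URL or deck slide reference.
PRERENDERED: dict[str, str] = {
    "weekly metrics": "https://example.com/dashboards/weekly-metrics",
    "open tickets": "https://example.com/reports/open-tickets",
}


def resolve_share_target(
    name: str, render_on_the_fly: Callable[[str], str] | None = None
) -> str:
    """Return what to put on screen for `name`: pre-rendered first, then the fallback."""
    key = name.strip().lower()
    if key in PRERENDERED:
        return PRERENDERED[key]
    if render_on_the_fly is not None:
        return render_on_the_fly(key)
    # No match and no renderer: fail loudly so the agent says so instead of guessing.
    raise LookupError(f"no pre-rendered content named {name!r}")
```

Raising on a miss, rather than returning something close, keeps the agent from confidently sharing the wrong dashboard.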

Step 5: Wire the Memory Layer

After every meeting the agent attends, write three things into memory for the next call:

The Fathom transcript is your input source. A small OpenClaw skill reads the transcript after every meeting and updates the per-account memory. The next time the agent joins a call with this account, it pulls those three things into its context.
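A sketch of the per-account memory. It assumes the three things are the ones named under Layer 4 (what was promised, what was blocked, and the next steps) and that the transcript-reading skill has already extracted them. It keeps the last seven calls per account. The in-memory `MemoryStore` is a hypothetical stand-in for a real database, and all class names are illustrative.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field

MAX_CALLS = 7  # the text's "last seven calls" with this account


@dataclass
class CallNotes:
    date: str
    promised: list[str]
    blocked: list[str]
    next_steps: list[str]


@dataclass
class AccountMemory:
    # deque with maxlen drops the oldest call automatically once full
    calls: deque = field(default_factory=lambda: deque(maxlen=MAX_CALLS))

    def record(self, notes: CallNotes) -> None:
        self.calls.append(notes)

    def context_for_next_call(self) -> str:
        """Render stored notes, oldest first, as text for the agent's context."""
        return "\n".join(
            f"{c.date}: promised={c.promised} blocked={c.blocked} next={c.next_steps}"
            for c in self.calls
        )


class MemoryStore:
    """Hypothetical in-memory store keyed by account name."""

    def __init__(self) -> None:
        self._accounts: dict[str, AccountMemory] = {}

    def record(self, account: str, notes: CallNotes) -> None:
        self._accounts.setdefault(account, AccountMemory()).record(notes)

    def get(self, account: str) -> AccountMemory:
        return self._accounts.setdefault(account, AccountMemory())
```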

Step 6: Set Hard Limits

Before you let the agent loose, write down the things it must never do:

These become hardcoded rules in the agent's system prompt and in the tool layer. The agent should not even have access to tools that could violate the rules.
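Enforcing this in the tool layer can be as simple as building each agent's toolset from an explicit allowlist, so a forbidden capability is never registered at all. A minimal sketch: the tool names, including the forbidden `offer_discount`, are hypothetical, and the lambdas stand in for real tool implementations.

```python
from __future__ import annotations

from typing import Callable

Tool = Callable[..., str]

# Hypothetical registry of every tool the platform could expose.
ALL_TOOLS: dict[str, Tool] = {
    "share_screen": lambda name: f"sharing {name}",
    "read_memory": lambda account: f"memory for {account}",
    "book_followup": lambda who: f"booked {who}",
    "offer_discount": lambda pct: f"offered {pct}% off",
}


def build_toolset(allowed: set[str], forbidden: set[str]) -> dict[str, Tool]:
    """Return only the allowed tools; refuse configs that contradict the hard limits."""
    overlap = allowed & forbidden
    if overlap:
        raise ValueError(f"tools both allowed and forbidden: {sorted(overlap)}")
    unknown = allowed - set(ALL_TOOLS)
    if unknown:
        raise ValueError(f"unknown tools: {sorted(unknown)}")
    return {name: ALL_TOOLS[name] for name in allowed}
```

The system-prompt rule and the missing tool back each other up: even if the model is talked into trying, the call has nowhere to go.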

Step 7: Run One Real Call, Watch the Whole Thing

On the first live call, sit in as a silent observer. Watch what the agent does well. Watch what it does poorly. Watch the human reactions on the other side. Most of the polish work happens in this step. Plan for 2-3 hours of cleanup after the first real call.

Step 8: Scale to the Calendar Slowly

Once the first meeting type runs cleanly for a few weeks, move the agent into the next meeting type. Each meeting type needs its own script, its own pre-rendered content library, and its own hard limits. Don't try to make one generic agent that does everything. Make a small fleet of specialized agents, one per meeting pattern.
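A sketch of the fleet as one frozen config per meeting pattern, each carrying its own script, content library, and hard limits. The meeting types and field values are hypothetical. A meeting type with no specialized agent raises an error instead of falling back to a generic agent.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MeetingAgentConfig:
    meeting_type: str
    script: tuple[str, ...]
    content_library: tuple[str, ...]
    forbidden_tools: frozenset[str]


# Hypothetical fleet: one specialized agent per meeting pattern.
FLEET = {
    cfg.meeting_type: cfg
    for cfg in [
        MeetingAgentConfig(
            "weekly account check-in",
            ("Confirm status on open items", "Share next week's plan", "Take new requests"),
            ("weekly metrics", "open tickets"),
            frozenset({"offer_discount"}),
        ),
        MeetingAgentConfig(
            "project status",
            ("Report on the sprint", "Ask the standard questions", "Capture requests"),
            ("sprint board",),
            frozenset({"offer_discount", "change_scope"}),
        ),
    ]
}


def agent_for(meeting_type: str) -> MeetingAgentConfig:
    """Look up the specialized agent; no match means a human takes the call."""
    try:
        return FLEET[meeting_type]
    except KeyError:
        raise LookupError(f"no specialized agent for {meeting_type!r}; send a human") from None
```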

Adapting This for Your Business

The pattern shows up wherever someone runs a high volume of similar meetings.

Customer success teams. The agent joins recurring check-ins with low-tier accounts. Surfaces upsell signals, flags churn risk, schedules the human CSM in when something needs real attention.

Agencies and consultancies. The agent joins weekly project status calls with established clients. Reports on the sprint, asks the four standard questions, captures requests. The senior consultant joins only the calls that escalate.

Sales teams. The agent runs early discovery calls where the script is well-known. Qualifies the prospect against the ICP, asks the standard questions, books the next call with a human AE if the lead is hot.

Recruiters. The agent runs initial screening calls with candidates. Walks through the role, asks the screening questions, captures the candidate's background. The human recruiter joins only the candidates who pass screening.

Property managers. The agent runs tenant check-in calls. Asks about maintenance issues, captures requests, walks through any open work orders. The property manager handles the calls that escalate.

Vendors and suppliers. The agent runs the quarterly business review with smaller customer accounts. Walks through usage stats, identifies expansion opportunities, schedules the AE for any account that's a fit.

The common thread is volume. If you have one weekly meeting of this type, the agent isn't worth building. If you have twenty, the agent saves a full FTE.

Gotchas and Tips

What This Replaces

Before this stack:

After this stack:

For an agency or CSM team running 100+ standing calls a month, the math is roughly a $40K-80K/year FTE swap for $400-800/month in stack costs. The remaining human time goes to the calls that actually need a human.
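Worked through with the ranges above, the annual stack cost is 12 × $400 = $4,800 to 12 × $800 = $9,600. Pairing the cheapest FTE with the priciest stack gives the low end of the net savings, and the reverse gives the high end. This counts only the two figures in the text, before any build or maintenance time.

```python
def annual_net_savings(fte_cost_per_year: float, stack_cost_per_month: float) -> float:
    """FTE cost avoided minus a year of stack costs."""
    return fte_cost_per_year - 12 * stack_cost_per_month


low = annual_net_savings(40_000, 800)   # cheapest FTE, priciest stack
high = annual_net_savings(80_000, 400)  # priciest FTE, cheapest stack
print(f"${low:,.0f} to ${high:,.0f} per year")  # prints "$30,400 to $75,200 per year"
```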

This is the part of the AI shift that most operators are sleeping on. Not "AI helps your team work faster." AI is one of your team members, sitting in the same meetings, doing the same work, while your humans do the work where humans are still better.

