AI tools have quickly become part of everyday work: summarising documents, drafting emails, preparing presentations, making sense of large volumes of information. But a new shift is underway: from AI that suggests to AI that acts.
Agents (sometimes referred to as agentic AI) are systems that can plan, decide and execute multi-step tasks to achieve defined goals, often coordinating across tools and workflows with minimal human intervention.
That capability is powerful. It’s also exactly why the safest agents aren’t the most autonomous ones; they’re the ones designed with intentional boundaries.
This piece is a practical guide for corporate teams building or deploying AI agents without deep technical expertise.
Who's responsible when an agent makes a mistake?
As human involvement decreases, questions arise about who is responsible when an agent makes a mistake. Is it the developer, the deployer, or the user who allowed the agent to act? From a governance perspective, each AI agent should have a clearly designated business owner responsible for its purpose, approval settings and ongoing operation, regardless of who built, configured or maintains the underlying technology.
And as insurers respond to AI risk, organisations are increasingly expected to map how AI is being used and demonstrate active oversight. That’s why “human in the loop” needs to be built into the design, not added through policy after the fact.
In practice, the safety of an agent depends far less on how intelligent it is and far more on the clarity of its operating rules. Before deploying an agent, teams should be able to answer three fundamental questions.
1. When is the agent allowed to act, and when must it stop or hand back to a human?
This question brings together triggers, stopping rules and handoffs. Teams should be clear about what initiates the agent: manual (a user clicks “run”), scheduled (e.g. a daily scan), or event-based (a new email or document); what conditions require escalation; and when the agent must halt entirely. High-risk triggers such as external communications, final decisions, policy thresholds, or irreversible actions should always require a handoff to a human.
Equally important are clear stop rules so the agent does not loop, overreach or continue operating when blocked or uncertain.
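To illustrate, these rules can be written down as configuration rather than left to the model’s judgement. The sketch below is a minimal Python example only; the trigger types, escalation conditions and stop rules are hypothetical placeholders, not a recommended standard.

```python
# Illustrative sketch: triggers, escalation conditions and stop rules as explicit configuration.
# All names and conditions here are hypothetical examples.
AGENT_OPERATING_RULES = {
    "triggers": {
        "manual": True,            # a user clicks "run"
        "scheduled": "daily",      # e.g. a daily scan
        "event_based": ["new_email", "new_document"],
    },
    "must_escalate_when": [
        "external_communication",  # anything leaving the organisation
        "final_decision",          # the output would bind the business
        "policy_threshold_exceeded",
        "irreversible_action",
    ],
    "must_stop_when": [
        "required_data_missing",
        "confidence_below_threshold",
        "step_limit_reached",      # prevents looping
        "blocked_by_permissions",
    ],
}

def next_action(condition: str) -> str:
    """Return what the agent should do when a condition is detected."""
    if condition in AGENT_OPERATING_RULES["must_stop_when"]:
        return "stop"
    if condition in AGENT_OPERATING_RULES["must_escalate_when"]:
        return "hand_back_to_human"
    return "continue"

print(next_action("external_communication"))  # hand_back_to_human
```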
2. What authority does the agent have, and what is out of scope?
The agent’s scope of authority should be defined as carefully as you would define delegation to a person. Teams should know which systems the agent can access, what actions it can perform, and what limits apply (for example, approval thresholds, data sensitivity, or external recipients).
Permission tiers are a useful way to enforce this in practice:
- Tier 1 (read-only): retrieve, summarise, draft
- Tier 2 (propose): recommend actions and prepare outputs, but cannot execute
- Tier 3 (execute with approval): can act only after an explicit human sign-off
- Tier 4 (execute autonomously): reserved for low-risk, internal, reversible tasks with documented approval and continuous monitoring
The broader the permissions, the greater the risk, especially where agents can orchestrate actions across multiple systems.
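One way to make the tiers operational is to check every proposed action against the agent’s granted tier before it runs. The sketch below is illustrative only; the action names, tier assignments and approval logic are hypothetical.

```python
# Illustrative sketch of permission tiers as an enforcement check.
# Tier names mirror the list above; the actions and mapping are hypothetical examples.
from enum import IntEnum

class Tier(IntEnum):
    READ_ONLY = 1              # retrieve, summarise, draft
    PROPOSE = 2                # recommend actions, prepare outputs
    EXECUTE_WITH_APPROVAL = 3  # act only after explicit human sign-off
    EXECUTE_AUTONOMOUSLY = 4   # low-risk, internal, reversible tasks only

# The tier this agent has been granted by its business owner.
AGENT_TIER = Tier.PROPOSE

# Hypothetical mapping of actions to the minimum tier they require.
REQUIRED_TIER = {
    "summarise_document": Tier.READ_ONLY,
    "draft_email": Tier.READ_ONLY,
    "recommend_contract_changes": Tier.PROPOSE,
    "send_email": Tier.EXECUTE_WITH_APPROVAL,
    "update_register": Tier.EXECUTE_AUTONOMOUSLY,
}

def is_permitted(action: str, human_approved: bool = False) -> bool:
    required = REQUIRED_TIER.get(action)
    if required is None:
        return False  # unknown actions are out of scope by default
    if required == Tier.EXECUTE_WITH_APPROVAL and not human_approved:
        return False  # the sign-off is part of the permission, not an afterthought
    return AGENT_TIER >= required

print(is_permitted("draft_email"))  # True for a Tier 2 agent
print(is_permitted("send_email"))   # False: requires Tier 3 and explicit approval
```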
3. What information does the agent use and retain, and can what happened be evidenced?
This question combines memory, data flow, and auditability. Teams should be clear about what the agent can “remember”, whether that memory is temporary, and what happens to prompts, retrieved documents, and logs once they are processed.
Just as importantly, organisations must be able to reconstruct what occurred: what data was used, what rules or playbooks (a guide or manual outlining processes) were applied, what actions were taken, and who approved them. If you cannot evidence those points, you cannot demonstrate effective oversight to regulators, insurers, or your own board.
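As a rough illustration, retention expectations can also be written down explicitly, so the answer to “what does the agent remember?” is a documented setting rather than an assumption. The categories and retention periods below are hypothetical examples.

```python
# Illustrative sketch: a retention policy stated explicitly, per type of data the agent touches.
# All categories and retention periods are hypothetical examples.
MEMORY_AND_RETENTION_POLICY = {
    "session_memory": {"persists_after_task": False},            # forgotten when the task ends
    "prompts": {"retain_days": 90, "purpose": "audit"},
    "retrieved_documents": {"retain_days": 0,                    # not stored by the agent itself
                            "source_of_truth": "document_management_system"},
    "action_logs": {"retain_days": 2555, "purpose": "evidence of oversight"},  # roughly 7 years
}

for item, rule in MEMORY_AND_RETENTION_POLICY.items():
    print(item, "->", rule)
```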
Together, these three questions anchor “human in the loop” in design rather than intention. They focus attention on control points, authority and traceability, which are the elements that turn agentic AI from a conceptual risk into a manageable, auditable system.
All agents need guardrails
Employing guardrails is not intended to make agents powerless; it’s to make them predictable. Here are some suggested guardrails:
Guardrail 1: Constrain tools, not just prompts
A well-written instruction won’t matter if the agent can still click “send” or write into a system with broad permissions.
- Limit what APIs/tools it can call.
- Limit what each tool can do (e.g. “draft email” allowed, “send email” not allowed).
- Gate sensitive tools behind approval.
This is the practical equivalent of ensuring an agent cannot act outside its pre-approved authority.
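In practice, this can be as simple as an explicit allowlist of tools, with sensitive tools gated behind human approval. The sketch below is illustrative; the tool names and policy are hypothetical.

```python
# Illustrative sketch: constrain the tools, not just the prompt.
# Tool names are hypothetical; "approval_required" gates sensitive tools behind a human.
TOOL_POLICY = {
    "search_documents": {"allowed": True,  "approval_required": False},
    "draft_email":      {"allowed": True,  "approval_required": False},
    "send_email":       {"allowed": False, "approval_required": True},  # drafting yes, sending no
    "write_to_crm":     {"allowed": True,  "approval_required": True},
}

def can_call(tool: str, human_approved: bool = False) -> bool:
    policy = TOOL_POLICY.get(tool)
    if policy is None or not policy["allowed"]:
        return False  # anything not listed is off limits
    if policy["approval_required"] and not human_approved:
        return False
    return True

print(can_call("draft_email"))  # True
print(can_call("send_email"))   # False
```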
Guardrail 2: Use playbooks and policy-as-configuration
For repeatable work (such as contract reviews and compliance checks), encode rules into:
- checklists,
- playbooks,
- clause libraries,
- decision trees.
This reduces “creative drift” and makes outcomes explainable.
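A minimal sketch of policy-as-configuration, assuming a hypothetical contract-review checklist; the checks and escalation rules are examples only.

```python
# Illustrative sketch: a contract-review playbook encoded as configuration.
# Clause checks and outcomes are hypothetical examples.
CONTRACT_PLAYBOOK = [
    {"check": "liability_cap_present",       "on_fail": "flag_for_review"},
    {"check": "governing_law_is_approved",   "on_fail": "flag_for_review"},
    {"check": "payment_terms_within_policy", "on_fail": "escalate_to_owner"},
]

def run_playbook(findings: dict) -> list[str]:
    """Apply each checklist rule to the agent's findings and return the outcomes."""
    outcomes = []
    for rule in CONTRACT_PLAYBOOK:
        passed = findings.get(rule["check"], False)
        outcomes.append("pass" if passed else rule["on_fail"])
    return outcomes

# Example: the agent found a liability cap but non-standard payment terms.
print(run_playbook({"liability_cap_present": True,
                    "governing_law_is_approved": True,
                    "payment_terms_within_policy": False}))
# ['pass', 'pass', 'escalate_to_owner']
```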
Guardrail 3: Build confidence gates
Require the agent to surface uncertainty:
- “I found three conflicting clauses”
- “I only have partial information”
- “This falls outside the playbook”
Then route to a human.
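A minimal sketch of a confidence gate, assuming the agent reports its own uncertainty flags and a confidence score; the flags and the threshold below are hypothetical.

```python
# Illustrative sketch: route to a human whenever the agent surfaces uncertainty.
# The uncertainty flags and confidence threshold are hypothetical examples.
UNCERTAINTY_FLAGS = {
    "conflicting_clauses",
    "partial_information",
    "outside_playbook",
}

def route(output: dict) -> str:
    """Decide whether the agent's output can proceed or must go to a human."""
    if output.get("flags") and UNCERTAINTY_FLAGS.intersection(output["flags"]):
        return "route_to_human"
    if output.get("confidence", 0.0) < 0.8:  # hypothetical threshold
        return "route_to_human"
    return "proceed"

print(route({"confidence": 0.95, "flags": []}))                       # proceed
print(route({"confidence": 0.95, "flags": ["conflicting_clauses"]}))  # route_to_human
```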
Guardrail 4: Design for least-privilege access
If a tool doesn’t integrate with your identity/access controls, people start copying and pasting, and that’s when oversight erodes. For agents, least-privilege means:
- single sign-on (SSO) where possible,
- role-based access,
- matter-level permissions,
- separation between development/test/production environments.
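As a rough sketch, matter-level and role-based access can be reduced to a single check the agent must pass before touching any record. The roles, matter identifiers and environment rule below are hypothetical.

```python
# Illustrative sketch: least-privilege access checked per role and per matter.
# Roles, matters and the access map are hypothetical examples.
ACCESS_MAP = {
    # role               matters that role may see
    "litigation_agent": {"MATTER-001", "MATTER-002"},
    "contracts_agent":  {"MATTER-010"},
}

def may_access(role: str, matter_id: str, environment: str) -> bool:
    if environment != "production":
        return False  # illustrative rule: agents in dev/test never see live client data
    return matter_id in ACCESS_MAP.get(role, set())

print(may_access("contracts_agent", "MATTER-010", "production"))  # True
print(may_access("contracts_agent", "MATTER-001", "production"))  # False
```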
Guardrail 5: Plan for change (models evolve)
AI tools change: models are updated, safety systems shift, behaviour can change over time. Safe agent design includes:
- regression testing (“did the agent behave the same way after the update?”),
- versioning (“which model was used?”),
- rollback plans.
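A minimal sketch of versioning and regression testing, assuming a hypothetical run_agent function and a small set of “golden” test cases that are re-run after every model update.

```python
# Illustrative sketch: pin the model version and re-run known cases after any update.
# `run_agent`, the model name and the golden cases are hypothetical placeholders.
PINNED_MODEL_VERSION = "model-2025-06"  # recorded so "which model was used?" is answerable

GOLDEN_CASES = [
    {"input": "Summarise clause 12", "expected_action": "draft_summary"},
    {"input": "Send this to the counterparty", "expected_action": "hand_back_to_human"},
]

def run_agent(prompt: str, model_version: str) -> str:
    """Placeholder for the real agent call; returns the action it would take."""
    return "hand_back_to_human" if "send" in prompt.lower() else "draft_summary"

def regression_test() -> bool:
    """Did the agent behave the same way after the update?"""
    return all(run_agent(case["input"], PINNED_MODEL_VERSION) == case["expected_action"]
               for case in GOLDEN_CASES)

print(regression_test())  # True means behaviour matches the golden cases; False triggers rollback
```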
Real-world use case: a regulatory scanning agent
Let's look at a real example of an AI agent. A regulatory scanning agent is designed to help organisations stay across regulatory change in a controlled and low-risk way. The agent runs on a defined schedule, monitoring pre-approved sources for updates and developments relevant to the organisation.
When changes are detected, it summarises the updates, maps them against internal obligations or frameworks, and flags items that require closer attention. Strong guardrails are built into its design: the agent operates in a read-only capacity, has clear stopping rules, and cannot publish updates or communications externally.
Human oversight is central to the workflow. Any change classified as material is automatically escalated for human review, and a designated owner is responsible for confirming and tracking follow-up actions.
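Bringing the guardrails together, the sketch below shows one hypothetical way this workflow could be expressed: a scheduled, read-only scan in which material items are escalated to a named owner and nothing is published by the agent itself. The source names, materiality test and owner are placeholders.

```python
# Illustrative sketch of the regulatory scanning workflow described above.
# Sources, the materiality test and the owner are hypothetical examples.
APPROVED_SOURCES = ["regulator_feed_a", "regulator_feed_b"]  # pre-approved, read-only
DESIGNATED_OWNER = "head_of_compliance"                       # accountable business owner

def fetch_updates(source: str) -> list[dict]:
    """Placeholder for a read-only pull from an approved source."""
    return [{"source": source, "summary": "Example update", "material": source == "regulator_feed_a"}]

def scan() -> None:
    for source in APPROVED_SOURCES:
        for update in fetch_updates(source):
            if update["material"]:
                # Material changes are escalated, never actioned by the agent itself.
                print(f"ESCALATE to {DESIGNATED_OWNER}: {update['summary']} ({update['source']})")
            else:
                print(f"LOG only: {update['summary']} ({update['source']})")
    # The agent stops here: it cannot publish, notify externally, or update obligations itself.

scan()
```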
Safe agents are a product decision, not a policy
If you want agents to be safe at scale, you need an evidence trail that answers questions such as: who approved what, what data was used, what actions were taken, and why the agent did what it did.
You may wish to maintain a per-task log capturing trigger inputs, sources, steps taken, outputs, human approvals, actions executed, exceptions, and final status.
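A minimal sketch of one per-task evidence record covering those fields; the field names and example values are illustrative, not a mandated schema.

```python
# Illustrative sketch: one per-task evidence record capturing the fields listed above.
# Field names and example values are hypothetical, not a prescribed schema.
from dataclasses import dataclass, field

@dataclass
class AgentTaskRecord:
    task_id: str
    trigger: str                 # what initiated the task
    inputs: list[str]            # prompts and parameters received
    sources: list[str]           # documents or systems consulted
    steps_taken: list[str]       # the agent's plan as executed
    outputs: list[str]           # drafts, summaries, recommendations
    human_approvals: list[str]   # who approved what, and when
    actions_executed: list[str]  # anything that changed a system of record
    exceptions: list[str] = field(default_factory=list)
    final_status: str = "completed"

record = AgentTaskRecord(
    task_id="TASK-0042",
    trigger="scheduled_daily_scan",
    inputs=["scan regulatory sources"],
    sources=["regulator_feed_a"],
    steps_taken=["fetched updates", "summarised", "mapped to obligations register"],
    outputs=["summary_of_changes.docx"],
    human_approvals=["head_of_compliance, 2025-06-01"],
    actions_executed=[],  # read-only agent: nothing executed
)
print(record.final_status)
```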
This level of operational visibility is increasingly critical for boards, risk committees and executive teams as AI governance expectations continue to evolve.
At the end of the day, if an agent’s triggers, permissions, memory, handoffs and logs are designed properly, you don’t have to rely on wishful thinking that people will use it responsibly.
Ultimately, you’ll have an agent that can move fast without overstepping. Because it can’t.
All information on this site is of a general nature only and is not intended to be relied upon as, nor to be a substitute for, specific legal professional advice. No responsibility for the loss occasioned to any person acting on or refraining from action as a result of any material published can be accepted.