Alternate Titles
- Chaos to Control: Mastering LLM Agents in ML Pipelines
- Agent Reliability Engineering: Building Trustworthy Agentic Workflows
- Autonomous but Accountable: Taming LLM Agents at Scale
Abstract
Machine learning teams are increasingly experimenting with AI “agents” powered by large language models (LLMs) to autonomously complete tasks or make decisions. These agentic AI workflows promise dynamic, adaptive behavior, but they also introduce serious challenges for MLOps in production: unpredictable decision paths, novel failure modes (like hallucinated outputs or endless loops), and difficulty in monitoring and debugging. How can we harness the power of LLM-driven agents without losing control of our pipelines?
This talk tackles that question head-on by sharing lessons learned from implementing agent-based workflows at scale and demonstrating a structured approach to keeping them reliable. Attendees will learn how to design workflows that delegate work to LLM agents in a controlled manner: defining clear task boundaries, adding guardrails (such as timeouts and retries), and capturing detailed traces of agent decisions. They will also see how an open-source framework (built on Prefect) makes it easier to orchestrate and observe these complex workflows. By the end of this session, you’ll know how to turn the “black box” of an AI agent into a transparent, trustworthy part of your ML operations.
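To make the guardrail pattern concrete, here is a minimal sketch in plain Python (deliberately not the Prefect or ControlFlow API; the function and field names are illustrative). It shows the three ideas from the abstract: a step budget so a looping agent cannot run forever, bounded retries on errors, and a trace of every agent decision for later debugging.

```python
# Illustrative guardrail wrapper -- names and structure are assumptions,
# not any framework's real API.
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class GuardrailResult:
    output: Optional[str]                      # final answer, or None on failure
    trace: list = field(default_factory=list)  # one entry per agent step

def run_agent_with_guardrails(
    step_fn: Callable[[list], Optional[str]],  # hypothetical agent step: sees the
                                               # trace so far, returns an answer,
                                               # or None to keep going
    max_steps: int = 5,                        # hard cap against endless loops
    max_retries: int = 2,                      # bounded retries on raised errors
) -> GuardrailResult:
    trace: list = []
    retries = 0
    for step in range(max_steps):
        try:
            answer = step_fn(trace)
        except Exception as exc:               # e.g. a hallucinated tool call failing
            trace.append({"step": step, "error": repr(exc)})
            retries += 1
            if retries > max_retries:
                return GuardrailResult(output=None, trace=trace)
            continue
        trace.append({"step": step, "answer": answer})
        if answer is not None:                 # agent signalled completion
            return GuardrailResult(output=answer, trace=trace)
    return GuardrailResult(output=None, trace=trace)  # step budget exhausted

# Stub "agent" for illustration: fails once, keeps thinking once, then finishes.
calls = {"n": 0}
def stub_agent(history):
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("hallucinated tool name")
    if calls["n"] == 2:
        return None        # still working
    return "done"

result = run_agent_with_guardrails(stub_agent)
```

Because every step, including the failed one, lands in `result.trace`, the agent's decision path can be inspected after the fact instead of being lost inside a black box; in practice an orchestrator such as Prefect would persist this trace as task run state and logs.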
Key Takeaways
- Recognize how LLM-based agentic workflows differ from traditional ML pipelines and the unique challenges they pose in production.
- Learn best practices for structuring AI agent tasks with clear boundaries and fallback logic to minimize unpredictable behavior.
- Discover techniques for observing and debugging AI agents in a workflow, including capturing decision traces and handling errors (e.g. hallucinations or infinite loops) gracefully.
- Understand how to leverage MLOps tools, including new open-source frameworks like ControlFlow, to orchestrate and manage autonomous AI agents alongside your existing pipelines.
- Balance AI autonomy with human control, gaining insight into how to give agents freedom to innovate while still ensuring reliable, trustworthy outcomes.
Questions
- What safeguards and monitoring would you put in place before trusting an LLM agent to make decisions in a critical pipeline?
- How can we ensure reproducibility and reliability in workflows where AI agents may choose different paths each run?
- Where do you draw the line between giving an AI agent autonomy and maintaining human or deterministic control in an ML workflow?
- How do current MLOps tools need to evolve to support the unique demands of agent-driven workflows?