Post-completion side effects for AI agents

Score model output, log token costs, and notify your team after an agent run finishes, without blocking the response or losing context.

AI agent functions do expensive, latency-sensitive work: calling models, executing tool loops, generating responses. After the agent finishes, you need to do secondary things that the user should never wait for: score the output quality, log token costs to track spend per customer, send a summary to Slack, update a CRM record.

If you add those as steps at the end of the function, the parent run stays open while secondary work completes. A failure in your analytics step marks the entire run as failed and triggers retries of a run that already did its job. The parent's success status ends up depending on work the user doesn't care about.

If you send events to separate functions, you lose the typed connection to the data the agent just produced.

Deferred functions solve this. Register side effects inline with typed payloads. The agent returns immediately. Each side effect runs as its own function after the parent finalizes, with independent retries and no impact on the parent's success status.

§How this works

Define deferred functions for each post-completion task. Each receives a typed payload containing the AI-specific data it needs.

typescript

import { createDefer } from "inngest/experimental";
import { z } from "zod";

// Assumes `inngest`, `openai`, `analytics`, and `slack` are clients initialized elsewhere.

const scoreOutput = createDefer(inngest, {
  id: "score-agent-output",
  schema: z.object({
    response: z.string(),
    model: z.string(),
    ticketId: z.string(),
  }),
}, async ({ event, step }) => {
  // Use a second model as a judge
  const evaluation = await step.run("llm-as-judge", async () => {
    return await openai.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "system",
          content: "Rate this support response 1-5 for helpfulness, accuracy, and tone. Return JSON: { helpfulness: number, accuracy: number, tone: number }",
        },
        { role: "user", content: event.data.response },
      ],
    });
  });

  await step.run("persist-scores", async () => {
    // `content` can be null; fail with a clear error before parsing.
    const content = evaluation.choices[0].message.content;
    if (!content) throw new Error("Judge returned no content");
    const scores = JSON.parse(content);
    // Normalize the 1-5 ratings before persisting them.
    await inngest.score({ name: "helpfulness", value: scores.helpfulness / 5, runId: event.data.ticketId });
    await inngest.score({ name: "accuracy", value: scores.accuracy / 5, runId: event.data.ticketId });
  });
});

const trackCosts = createDefer(inngest, {
  id: "track-ai-costs",
  schema: z.object({
    model: z.string(),
    promptTokens: z.number(),
    completionTokens: z.number(),
    customerId: z.string(),
  }),
}, async ({ event, step }) => {
  await step.run("log-usage", async () => {
    // Illustrative blended per-1K-token rates; substitute your provider's current pricing.
    const costPer1k = event.data.model === "gpt-4o" ? 0.005 : 0.00015;
    const totalTokens = event.data.promptTokens + event.data.completionTokens;
    const cost = (totalTokens / 1000) * costPer1k;

    await analytics.track("ai.cost.incurred", {
      model: event.data.model,
      tokens: totalTokens,
      cost_usd: cost,
      customer_id: event.data.customerId,
    });
  });
});

const notifyTeam = createDefer(inngest, {
  id: "notify-agent-completion",
  schema: z.object({
    channel: z.string(),
    summary: z.string(),
    ticketId: z.string(),
    model: z.string(),
  }),
}, async ({ event, step }) => {
  await step.run("post-to-slack", async () => {
    await slack.chat.postMessage({
      channel: event.data.channel,
      text: `Agent resolved ticket ${event.data.ticketId} using ${event.data.model}:\n>${event.data.summary}`,
    });
  });
});
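
The persist-scores step above trusts the judge model to return well-formed JSON with in-range ratings. A more defensive parse might look like the sketch below. The helper name `parseJudgeScores` and the `JudgeScores` type are hypothetical and not part of any Inngest or OpenAI API; the normalization (rating divided by 5) mirrors the code above.

```typescript
// Hypothetical helper: validate and normalize LLM-as-judge output.
type JudgeScores = { helpfulness: number; accuracy: number; tone: number };

const DIMENSIONS = ["helpfulness", "accuracy", "tone"] as const;

function parseJudgeScores(raw: string | null): JudgeScores | null {
  if (raw === null) return null;

  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // The judge returned something that is not JSON.
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const record = parsed as Record<string, unknown>;
  const result = {} as JudgeScores;
  for (const key of DIMENSIONS) {
    const value = record[key];
    // Reject missing, non-numeric, or out-of-range ratings.
    if (typeof value !== "number" || !Number.isFinite(value) || value < 1 || value > 5) {
      return null;
    }
    result[key] = value / 5; // Same normalization as the persist-scores step.
  }
  return result;
}
```

Returning `null` instead of throwing leaves the caller free to decide whether a malformed evaluation should fail the deferred run or simply be skipped.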

Register the deferred functions with serve() alongside the agent function:

typescript

serve({
  client: inngest,
  functions: [handleTicket, scoreOutput, trackCosts, notifyTeam],
});

In the agent function, call defer() after the work is done. The agent returns the response to the user. Scoring, cost tracking, and notifications happen in the background.

typescript

const handleTicket = inngest.createFunction(
  { id: "handle-support-ticket", triggers: { event: "support/ticket.created" } },
  async ({ event, step, defer }) => {
    const response = await step.run("generate-response", async () => {
      return await openai.chat.completions.create({
        model: "gpt-4o",
        messages: [
          { role: "system", content: "You are a support agent. Be concise and helpful." },
          { role: "user", content: event.data.content },
        ],
      });
    });

    // `content` can be null, so fall back to an empty string.
    const reply = response.choices[0].message.content ?? "";

    await step.run("send-reply", async () => {
      await supportPlatform.reply(event.data.ticketId, reply);
    });

    // Score the response with LLM-as-judge. Runs after the parent finishes.
    defer("score-quality", {
      function: scoreOutput,
      data: {
        response: reply,
        model: "gpt-4o",
        ticketId: event.data.ticketId,
      },
    });

    // Track token costs per customer. `usage` is optional on the response type.
    defer("track-spend", {
      function: trackCosts,
      data: {
        model: "gpt-4o",
        promptTokens: response.usage?.prompt_tokens ?? 0,
        completionTokens: response.usage?.completion_tokens ?? 0,
        customerId: event.data.customerId,
      },
    });

    // Notify the team.
    defer("notify-slack", {
      function: notifyTeam,
      data: {
        channel: "#support-resolved",
        summary: reply.slice(0, 200),
        ticketId: event.data.ticketId,
        model: "gpt-4o",
      },
    });

    return { ticketId: event.data.ticketId, status: "resolved" };
  }
);

The customer gets the reply as soon as the send-reply step completes; none of the side effects add to that wait. Three deferred functions fire after the parent finalizes: one scores the output with a cheaper model, one logs token costs per customer, one posts to Slack. Each has its own retries. If Slack is down, scoring and cost tracking still succeed. If the scoring model is slow, the customer already has their answer.
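
The trackCosts function above applies a single rate to prompt and completion tokens combined. Providers commonly price input and output tokens differently, so a per-customer cost log may be more accurate with separate rates. The sketch below is one way to do that; the `RATES` table, its numbers, and the `estimateCostUsd` helper are illustrative assumptions, not current pricing or part of any SDK.

```typescript
// Illustrative per-1K-token rates in USD. Placeholder values, not current pricing.
type Rate = { inputPer1k: number; outputPer1k: number };

const RATES: Record<string, Rate> = {
  "gpt-4o": { inputPer1k: 0.005, outputPer1k: 0.015 },
  "gpt-4o-mini": { inputPer1k: 0.00015, outputPer1k: 0.0006 },
};

// Hypothetical helper: estimate the cost of one model call.
function estimateCostUsd(
  model: string,
  promptTokens: number,
  completionTokens: number,
): number | null {
  const rate = RATES[model];
  if (!rate) return null; // Unknown model: report it rather than guess a price.
  return (
    (promptTokens / 1000) * rate.inputPer1k +
    (completionTokens / 1000) * rate.outputPer1k
  );
}
```

Inside the log-usage step, the result could replace the blended `cost` value, with a `null` result logged as an unpriced model instead of a zero cost.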

§Why not just add more steps?

Adding step.run("log-usage", ...) at the end of the function works, and completed steps are memoized so retries skip them. But the parent run stays open while secondary work completes. If the analytics step fails, the entire run is marked as failed even though the agent already delivered the response. The parent's success status ends up reflecting work the user doesn't care about.

Deferred functions keep the parent run clean. It succeeds or fails based on the agent work alone. The side effects run on their own timeline with their own success/failure status.
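
The difference in failure semantics can be illustrated with a toy model in plain TypeScript. This is not how Inngest is implemented and uses none of its APIs; `runInline`, `runDeferred`, and the `Task` type are hypothetical names that only mirror the status behavior described above.

```typescript
type Status = "succeeded" | "failed";
type Task = () => Promise<void>;

// Inline: side effects are steps of the parent, so any failure fails the parent.
async function runInline(agentWork: Task, sideEffects: Task[]): Promise<{ parent: Status }> {
  try {
    await agentWork();
    for (const effect of sideEffects) await effect();
    return { parent: "succeeded" };
  } catch {
    return { parent: "failed" };
  }
}

// Deferred: the parent's status is settled first; each side effect gets its own status.
async function runDeferred(
  agentWork: Task,
  sideEffects: Task[],
): Promise<{ parent: Status; effects: Status[] }> {
  try {
    await agentWork();
  } catch {
    // In this toy model, a failed parent never reaches its defer calls.
    return { parent: "failed", effects: [] };
  }

  const effects: Status[] = [];
  for (const effect of sideEffects) {
    try {
      await effect();
      effects.push("succeeded");
    } catch {
      effects.push("failed"); // Isolated: does not touch the parent's status.
    }
  }
  return { parent: "succeeded", effects };
}
```

With a working agent task and one failing side effect, `runInline` reports the parent as failed, while `runDeferred` reports the parent as succeeded and records the failure against that side effect only.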

§Alternative approaches

Send events to trigger separate functions. Works, but you lose the typed schema and the parent/child linking in traces. You serialize data into event payloads manually.
External eval platforms (Braintrust, LangSmith, Arize). Scoring and cost tracking live in a separate system, so keeping your agent code and its evaluation in step means maintaining two platforms.
Fire-and-forget HTTP calls. No retries, no observability, no connection to the run that produced the data.
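
To make the first alternative concrete: when data travels as a manually serialized payload, the receiver cannot rely on the sender's types and has to check the shape again. The sketch below shows that pattern in plain TypeScript. It does not use the Inngest event API; the event name `ai/cost.incurred`, the `CostEventData` interface, and the helper functions are hypothetical.

```typescript
// Hypothetical payload shape for a cost-tracking event.
interface CostEventData {
  model: string;
  promptTokens: number;
  completionTokens: number;
  customerId: string;
}

// Sender: the payload is serialized by hand, so the type is erased on the wire.
function serializeCostEvent(data: CostEventData): string {
  return JSON.stringify({ name: "ai/cost.incurred", data });
}

// Receiver: nothing guarantees the shape, so it has to be checked again.
function isCostEventData(value: unknown): value is CostEventData {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.model === "string" &&
    typeof v.promptTokens === "number" &&
    typeof v.completionTokens === "number" &&
    typeof v.customerId === "string"
  );
}

function parseCostEvent(raw: string): CostEventData | null {
  try {
    const envelope: unknown = JSON.parse(raw);
    if (typeof envelope !== "object" || envelope === null) return null;
    const data = (envelope as Record<string, unknown>).data;
    return isCostEventData(data) ? data : null;
  } catch {
    return null; // Not valid JSON.
  }
}
```

The guard duplicates information the sender already had, which is the overhead a shared typed schema avoids.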
