# Runs & debugging

Every call to `/v1/evaluate/{slug}` creates a **run** — a durable record of execution you can inspect in the console or query from the tenant API.

## Run statuses

| Status | Meaning |
|--------|---------|
| **running** | Actively executing nodes |
| **waiting** | Paused at a `wait_for_event` node |
| **completed** | Reached `end` with a response |
| **failed** | Unhandled error or validation failure |
| **timedOut** | Hit the wait deadline without a reply |

## The Runs dashboard

Console → **Runs** opens a live dashboard:

- **KPI tiles** — total runs, success rate, p95 latency, currently-waiting count
- **Stacked time-series** — runs per bucket, broken down by status. **Click any bar** to drill into that bucket
- **Top flows** — busiest 10 flows by volume, sortable by failure count. **Click a flow** to scope the rest of the page
- **Latency chart** — p50 / p95 / p99 across the selected range
- **Filterable table** — search by run id or correlation key, filter by status, paginate cursor-style. Each row shows the targeted version (`v#`), the provenance source (`cli` / `console` / `cli·draft` / `console·draft`) as a badge, and the short git commit when the version was published via CLI. A **`via cli | console | all`** filter pill row sits next to the status chips so you can answer "did the last CI deploy break anything?" in one click. The same filter is available on the API as `GET /v1/runs?source=cli|designer`.

Time range presets: **1h · 24h · 7d · 30d · 90d** plus drilled-in custom ranges. The system picks the bucket size automatically (minute / hour / day) so the chart always renders 24–90 points.

Live presets (1h, 24h) auto-refresh every 30 seconds. Turn auto-refresh on or off with the toggle in the top right.

## Inspecting a single run

Click any row to open the detail page:

- **Timeline** — nodes executed in order, with per-step duration and output JSON
- **Output** — final response or pause metadata
- **Per-step detail** — HTTP status codes, expression results, errors
- **Send reply** (waiting runs only) — simulate the resume event without curl

Use this to answer questions like *"Why was this user denied?"* or *"Which API call failed?"*

## Paused runs

When a flow hits **wait_for_event**:

1. Evaluate returns a response including the **run id** and pause info
2. Your UI shows a form or redirect (`uiAction` on the node)
3. When the user completes the action, call resume:

```bash
curl -X POST "https://api.fetchcatch.com/v1/runs/{runId}/resume" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "event": "user_confirmed",
    "payload": { "confirmed": true }
  }'
```

The `event` string must match the node's `eventName`.
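
For callers that are not shelling out to curl, the same request can be sketched in Python. This is a minimal illustration that mirrors the host, path, headers, and body of the curl example above; the helper name `build_resume_request` and the run id `run-123` are hypothetical.

```python
import json
import urllib.request
from urllib.parse import quote

# Host and headers mirror the curl example above.
BASE_URL = "https://api.fetchcatch.com"


def build_resume_request(run_id, event, payload, api_key):
    """Build (without sending) the POST that resumes a waiting run."""
    body = json.dumps({"event": event, "payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/runs/{quote(run_id, safe='')}/resume",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# "run-123" is a placeholder; use the run id returned by evaluate.
request = build_resume_request(
    "run-123", "user_confirmed", {"confirmed": True}, "YOUR_API_KEY"
)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it keeps the `event` name and payload easy to log or assert on before the call goes out.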

## Correlation keys

Pass `correlationKey` in evaluate requests to tie runs to your own request IDs:

```json
{
  "correlationKey": "order-9821",
  "userId": "u-123"
}
```

The dashboard search box matches on correlation keys, so you can paste an order ID from your app logs and jump straight to the run.
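
The same lookup can be scripted. The sketch below builds a list-runs URL using the `search` filter of `GET /v1/runs`; it assumes the API's `search` parameter matches correlation keys the way the dashboard search box does, and it reuses the host from the resume example. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed host, taken from the resume example in this page.
BASE_URL = "https://api.fetchcatch.com"


def runs_by_correlation_key_url(correlation_key: str, take: int = 50) -> str:
    """Build a list-runs URL that searches for one correlation key."""
    query = urlencode({"search": correlation_key, "take": take})
    return f"{BASE_URL}/v1/runs?{query}"


print(runs_by_correlation_key_url("order-9821"))
# prints https://api.fetchcatch.com/v1/runs?search=order-9821&take=50
```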

## API — querying runs programmatically

All of the following endpoints accept **both** authentication policies:

- Console JWT (`Authorization: Bearer <jwt>`)
- Tenant API key (`X-FetchCatch-Key: <key>`)

| Method | Path | Purpose |
|---|---|---|
| `GET` | `/v1/runs` | List runs in the workspace. Filters: `flowId`, `status`, `source`, `fromUtc`, `toUtc`, `search`, `cursor`, `take` (max 200) |
| `GET` | `/v1/runs/{id}` | Detail + step trace |
| `GET` | `/v1/runs/stats?fromUtc=&toUtc=` | KPIs, time-series, top flows, p50/p95/p99 |
| `POST` | `/v1/runs/{id}/resume` | Resume a waiting run (tenant API) |
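
As a rough sketch of how a client might assemble a list query, the helper below validates filter names against the ones documented on this page (including `source`, which the dashboard section mentions) and enforces the `take` maximum before building the URL. The host, the function names, and the client-side validation are assumptions for illustration; the server remains the authority on what it accepts.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://api.fetchcatch.com"  # assumed host, from the resume example
ALLOWED_FILTERS = {
    "flowId", "status", "source", "fromUtc", "toUtc", "search", "cursor", "take",
}
MAX_TAKE = 200


def to_utc_string(moment: datetime) -> str:
    """Format an aware datetime like the samples here, e.g. 2026-05-24T12:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def list_runs_url(**filters) -> str:
    """Build a GET /v1/runs URL, rejecting unknown filters and oversized pages."""
    # Drop filters the caller left unset.
    filters = {key: value for key, value in filters.items() if value is not None}
    unknown = set(filters) - ALLOWED_FILTERS
    if unknown:
        raise ValueError(f"unsupported filters: {sorted(unknown)}")
    if "take" in filters and not 1 <= int(filters["take"]) <= MAX_TAKE:
        raise ValueError(f"take must be between 1 and {MAX_TAKE}")
    query = urlencode(filters)
    return f"{BASE_URL}/v1/runs?{query}" if query else f"{BASE_URL}/v1/runs"


since = datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)
print(list_runs_url(status="failed", fromUtc=to_utc_string(since), take=100))
```

Send the resulting URL with either the console JWT or the tenant API key header listed above.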

Stats response shape:

```json
{
  "fromUtc": "2026-05-24T12:00:00Z",
  "toUtc":   "2026-05-25T12:00:00Z",
  "bucket": "1h",
  "total": 1247,
  "completed": 1203,
  "failed": 41,
  "timedOut": 3,
  "running": 0,
  "waiting": 0,
  "waitingNow": 2,
  "successRate": 0.9648,
  "p50Ms": 124,
  "p95Ms": 280,
  "p99Ms": 612,
  "series": [
    { "bucketStartUtc": "...", "completed": 50, "failed": 1, "timedOut": 0, "waiting": 0, "running": 0, "p50Ms": 120, "p95Ms": 290, "p99Ms": 510 }
  ],
  "topFlows": [
    { "flowId": "...", "flowSlug": "checkout", "flowName": "Checkout", "total": 812, "failed": 4 }
  ],
  "truncated": false
}
```

`truncated: true` means the range exceeded 100k completed runs and percentiles are approximate. Narrow the range to get exact numbers.
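
One way to consume this response is sketched below: rank `topFlows` by failure share and label the percentiles as exact or approximate based on `truncated`. The field names come from the sample response above; the helper names and the idea of ranking by `failed / total` are illustrative choices, not part of the API.

```python
def flows_by_failure_rate(stats: dict) -> list:
    """Return (flowSlug, failure_rate) pairs, worst flow first."""
    rows = []
    for flow in stats.get("topFlows", []):
        total = flow["total"]
        rate = flow["failed"] / total if total else 0.0
        rows.append((flow["flowSlug"], rate))
    return sorted(rows, key=lambda row: row[1], reverse=True)


def percentile_quality(stats: dict) -> str:
    """Percentiles are approximate when the response is truncated."""
    return "approximate" if stats.get("truncated") else "exact"


# Trimmed copy of the sample response above, plus one hypothetical flow.
sample = {
    "p95Ms": 280,
    "truncated": False,
    "topFlows": [
        {"flowSlug": "checkout", "total": 812, "failed": 4},
        {"flowSlug": "signup", "total": 100, "failed": 5},  # hypothetical
    ],
}

for slug, rate in flows_by_failure_rate(sample):
    print(f"{slug}: {rate:.2%} failed")
print(f"p95 = {sample['p95Ms']} ms ({percentile_quality(sample)})")
```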

## Pagination

The list endpoint returns at most `take` items (default 50, max 200) and an opaque `nextCursor` token. Pass it as `?cursor=...` to get the next page. The cursor encodes `(StartedAtUtc, Id)`, so ordering stays stable even when new runs are inserted between requests.
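
The cursor loop can be sketched as a small generator. The HTTP call is injected as `fetch_page` so the loop stays independent of any particular client library. Two details are assumptions: that each page carries its runs under an `items` key (the response field name is not documented here), and that `nextCursor` is missing or null on the last page.

```python
from typing import Callable, Iterator, Optional


def iter_runs(fetch_page: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every run by following nextCursor until it is absent.

    fetch_page(cursor) should call GET /v1/runs (adding ?cursor=... when
    cursor is not None) and return the decoded JSON page.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)
        # "items" is an assumed field name for the page's runs.
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor:
            break


# Stand-in for the real endpoint: two pages keyed by cursor.
fake_pages = {
    None: {"items": [{"id": "r1"}, {"id": "r2"}], "nextCursor": "abc"},
    "abc": {"items": [{"id": "r3"}], "nextCursor": None},
}

run_ids = [run["id"] for run in iter_runs(lambda cursor: fake_pages[cursor])]
print(run_ids)  # prints ['r1', 'r2', 'r3']
```

Treat the cursor as opaque: store and pass it back unchanged rather than parsing it.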

## Common failure modes

| Error | Likely cause |
|-------|--------------|
| Input validation failed | Missing required field in start `inputSchema` |
| HTTP node error | Upstream 4xx/5xx; check auth profile |
| Expression error | Invalid JSONata; test in designer |
| Response validation failed | Decision didn't set required response type fields |
| Event mismatch on resume | Wrong `event` name |

## Retention

Runs are currently stored in SQL indefinitely. The stats endpoint behind the dashboard caps in-memory percentile computation at **100,000 completed runs per request**; beyond that it returns `truncated: true` and you should narrow the range. For high-traffic workspaces, **plan retention before going to production scale**. A configurable retention worker is on the roadmap. Until then, manual deletes from `FlowRuns` are safe because the related `FlowRunSteps` rows cascade.
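
Narrowing the range can be automated. The sketch below halves any window whose stats come back with `truncated: true` until each window is exact or a minimum span is reached. `fetch_stats` is a hypothetical callable wrapping `GET /v1/runs/stats`; whether `toUtc` is inclusive is not specified here, so adjacent windows may need adjusting at their shared boundary.

```python
from datetime import datetime, timedelta, timezone


def exact_windows(fetch_stats, start, end, min_span=timedelta(minutes=1)):
    """Yield (start, end, stats) windows, splitting truncated ranges in half."""
    stats = fetch_stats(start, end)
    # Stop splitting once the window is exact or too small to split further.
    if not stats.get("truncated") or end - start <= min_span:
        yield start, end, stats
        return
    middle = start + (end - start) / 2
    yield from exact_windows(fetch_stats, start, middle, min_span)
    yield from exact_windows(fetch_stats, middle, end, min_span)


def fake_fetch_stats(start, end):
    """Stand-in for the real endpoint: truncates ranges longer than 12 hours."""
    return {"truncated": end - start > timedelta(hours=12)}


day_start = datetime(2026, 5, 24, 12, 0, tzinfo=timezone.utc)
day_end = day_start + timedelta(hours=24)
windows = list(exact_windows(fake_fetch_stats, day_start, day_end))
print(len(windows))  # prints 2
```

Counts can be summed across the resulting windows, but percentiles cannot be combined this way; read them per window.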

## Related

- [Evaluate API](evaluate-api.md)
- [Wait for event node](flow-nodes.md#wait_for_event)
- [Core concepts — Runs](concepts.md#runs)