Skip to content

feat(table): chunked dispatcher + workflow cascade + UX polish#4672

Open
TheodoreSpeaks wants to merge 14 commits into
stagingfrom
feat/table-chunked-dispatcher
Open

feat(table): chunked dispatcher + workflow cascade + UX polish#4672
TheodoreSpeaks wants to merge 14 commits into
stagingfrom
feat/table-chunked-dispatcher

Conversation

@TheodoreSpeaks
Copy link
Copy Markdown
Collaborator

@TheodoreSpeaks TheodoreSpeaks commented May 20, 2026

Summary

  • chunked dispatcher for workflow-column runs with tableRunDispatches rows + windowed walk via batchTriggerAndWait
  • loop-in-cell cascade under a Redis lock per (tableId, rowId); reactor removed; every cell-enqueue routes through runWorkflowColumn
  • unified trigger.dev + inline paths behind JobQueueBackend.batchEnqueueAndWait
  • new mode: 'new' for auto-fire — skips rows with any prior executions[gid]; SQL pushdown via NOT jsonb_exists_all
  • dep-aware retrigger: editing a column a workflow depends on clears terminal-state downstream groups and cancels + re-runs in-flight ones
  • optimistic UI: cancel distinguishes optimistic vs real claims; create-row stamps eligible groups instantly; unmet-deps cells stay Waiting
  • active-dispatches overlay preserves queued badges across refresh during long Run-all
  • backend-counted "X running" via the same endpoint, kept live by cell SSE deltas
  • sidebar: 'Run after' defaults empty, requires ≥1 when auto-run, anchors to leftmost group column on edit; reorder scrubs deps server-side
  • server returns schema.columns sorted by metadata.columnOrder
  • /api/resume/poll routes through executeResumeJob so paused cells get cell-context restoration in local dev; paused cells render Pending pill and surface View Execution
  • typewriter reveal for SSE-delivered workflow values

Type of Change

  • New feature
  • Bug fix

Testing

Tested manually — CSV import, Run all + Stop + refresh, dep edits re-run downstream workflows, cancel scoped per-row, delete-cell flicker gone, reorder scrubs deps, paused (wait-block) cells resume via cron and complete cleanly. tsc + vitest (198/198) + check:api-validation:strict pass.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

TheodoreSpeaks and others added 13 commits May 16, 2026 23:09
Replaces the all-rows-at-once runWorkflowColumn with a row-window dispatcher
backed by a new table_run_dispatches row. Each user click inserts a dispatch
row and triggers a trigger.dev task that crawls the table 20 rows at a time,
re-enqueueing itself between windows. The HTTP/Mothership entrypoints return
{ dispatchId } immediately instead of holding the request open for minutes
on multi-thousand-row dispatches.

- Per-row cancel stamps cancelledAt; the dispatcher skips cells whose
  cancelledAt > dispatch.requestedAt so a mid-cascade cancel sticks even
  under isManualRun.
- Table-wide cancel marks active dispatches cancelled atomically so the
  dispatcher bails on its next iteration.
- New 'dispatch' SSE event variant plumbed; client ignores for v1.
Run-column with run-mode 'all' wasn't visually flipping rows that already
had data — the cell renderer's "value wins" branch kept showing the prior
output behind the queued/running state. The dispatcher only cleared one
window of rows at a time, so most of the column stayed stale until the
cursor walked to it.

Now:
- Dispatcher's `pending → dispatching` transition runs a single SQL UPDATE
  that wipes targeted `data` output columns and `executions[gid]` across
  every targeted row (mode-aware: 'incomplete' skips fully-filled rows).
- Per-window clear in `dispatcherStep` is gone — rows are pre-cleared,
  the loop only filters cancel tombstones / unmet deps and enqueues.
- Optimistic patch in `useRunColumn` mirrors the bulk clear by nulling
  output values in the cached row, so the UI flips queued/running
  instantly without waiting for the SSE catch-up.
The eager bulk clear for mode: 'incomplete' only skipped rows that were
already fully filled, so two overlapping dispatches could race — dispatch B
would nuke executions[gid] on a row dispatch A had just stamped 'queued',
flickering the cell and potentially confusing the worker.

Skip any row whose targeted group is currently queued/running/pending — an
'incomplete' run shouldn't touch what another dispatch is actively working
on. The per-walk 'in-flight' eligibility skip already handles rows that
flip in-flight between the clear and the cursor reaching them.
Switch the per-window cell fan-out from fire-and-forget tasks.trigger to
tasks.batchTriggerAndWait. The dispatcher is now a single long-lived
trigger.dev task that loops dispatcherStep until the table is exhausted;
trigger.dev CRIU-checkpoints the parent during each wait so we don't pay
compute while cells execute. Queue depth is bounded at WINDOW_SIZE per
dispatch — no more flooding trigger.dev with a million queued runs.

- dispatcher.ts builds payloads via the new shared buildPendingRuns helper
  and calls tasks.batchTriggerAndWait directly. Pre-stamps each cell to
  `queued` (jobId=null) so the UI flips instantly.
- table-run-dispatcher.ts is now a plain while-true loop. No
  RUN_BUDGET_MS, no self-re-enqueue, no cold-start tax per window.

Cancel:
- New cancelCellRunsByTags(tags) paginates runs.list + runs.cancel(id).
- cancelWorkflowGroupRuns fires the tag-sweep alongside the per-jobId
  queue.cancelJob path (preserved for auto-fire cells that have real
  jobIds from single tasks.trigger calls).
- Trigger.dev acks the cancel → batchTriggerAndWait resumes → dispatcher
  observes the dispatch-row cancel flag → exits.

Side fixes:
- getAsyncBackendType returns 'trigger-dev' whenever taskContext.isInsideTask
  is true, regardless of TRIGGER_DEV_ENABLED env. The preview/dev-sim
  worker silently routing cell jobs to DatabaseJobQueue (no poller) is
  fixed without any env config change.
- runWorkflowColumn skips the dispatcher entirely when trigger.dev is
  disabled, running cells inline via DatabaseJobQueue.runInline. HTTP
  response returns dispatchId: null in that mode.
- runColumnContract response schema updated to dispatchId.nullable().
isExecInFlight required a jobId for `pending` status, gating it as "real
backend pending" vs "optimistic flag only." The row-gutter Stop button
keyed on this — so a freshly clicked Play sat as `pending` (no jobId) and
the user couldn't cancel it until the server-side `queued` stamp arrived
via SSE. With the dispatcher pre-batch stamping cells as `queued` (not
`pending`) and no per-cell jobIds under batchTriggerAndWait, the gap was
worse.

Drop the jobId requirement. `pending` now counts as in-flight everywhere.
Cancel writes `cancelled` to the cell exec authoritatively whether or not
a real trigger.dev run exists yet — cancelling an optimistic cell means
"don't run this," which is correct.

Also collapse isOptimisticInFlight into isExecInFlight since the two
helpers are now identical.
Two coupled changes:

1. Cell-task runs the row's full cascade in-process. executeWorkflowGroupCellJob
   acquires a Redis lock per (tableId, rowId) with heartbeat (10s/30s TTL),
   then loops through eligible workflow groups for the row. One cell-task =
   one row's full cascade, not N. Resume worker holds the same lock and
   continues the cascade after a HITL resume. Shared withCascadeLock helper
   in lib/table/cascade-lock.ts.

2. Every cell-enqueue goes through the dispatcher. The implicit
   scheduleRunsForRows reactor in service.ts is removed — 8 callsites
   (insertRow, batchInsertRows, upsertRow, updateRowsByFilter,
   batchUpdateRows, addWorkflowGroup, updateWorkflowGroup) now fire
   runWorkflowColumn with mode: 'incomplete', isManualRun: false. HTTP
   routes that call updateRow directly also fire runWorkflowColumn
   afterwards. scheduleRunsForTable / scheduleRunsForRowIds deleted;
   scheduleRunsForRows demoted to private (only the TRIGGER_DEV_ENABLED=false
   fallback uses it). skipScheduler flag dropped from UpdateRowData /
   BatchUpdateByIdData — no longer meaningful since there's nothing implicit
   to suppress.

Plumbed isManualRun through the dispatch row (new is_manual_run column,
default true) so auto-fire callers honor autoRun: false and don't re-run
completed cells.

Stamp 'pending' (not 'queued', executionId: null) before
batchTriggerAndWait — cell-task writes its own 'queued' on lock acquire.

Small UI polish: row gutter Play button spacing, "Delete workflow" →
"Delete column" label, optimistic-pending cells now show Stop button
(isExecInFlight no longer requires jobId).
…Id cell

The dispatcher's pre-batch `pending` stamp leaves executionId unset so any
cell-task that wins the cascade lock can claim the cell. The cancellation-
guard SQL clause was rejecting these claims because it tested
`executions->gid IS NULL` (whole exec missing) but the pre-stamp leaves
the exec present with executionId=null.

Add a third carve-out: `executions->gid->>'executionId' IS NULL`. Now the
guard reads "write allowed if no exec exists, OR no executionId is set
yet, OR the executionId matches ours."

Symptom: every cell-task's first markWorkflowGroupPickedUp call would log
"SQL guard saw cancelled" and skip, leaving cells stuck at the dispatcher's
pending stamp.
The dispatcher's row-window SELECT is `position > cursor` for exclusive
lower-bound semantics. With cursor initialized to 0, position-0 rows were
never picked up — every dispatch silently skipped the table's first row.

Start cursor at -1 instead. First window's filter `position > -1` matches
position 0; subsequent iterations advance to `lastPosition` which then
correctly excludes already-processed rows.
…el via 'new' mode

Fix 0: new `DispatchMode = 'new'` for auto-fire callsites. Eligibility skips
rows with any prior `executions[gid]` entry — cancelled / errored / completed
cells stay sticky until a manual run. Dispatcher's windowed SELECT pushes
`NOT jsonb_exists_any(...)` to SQL so CSV imports into mostly-attempted
tables don't pay a per-window load+JS-filter. `batchInsertRows` drops its
`rowIds` payload (keeps dispatch scope tiny on big imports).

Fix A/B/D: client optimistic patches now mirror the backend's actual
invariants. `useCreateTableRow.onSuccess` stamps eligible groups via
`optimisticallyScheduleNewlyEligibleGroups` so newly-inserted rows show
`Queued` instantly. `useCancelTableRuns.onMutate` distinguishes optimistic-
only pending (`executionId == null` — strip silently) from real worker
claims (stamp cancelled; SSE will reconcile). Drop `onSettled` invalidation
on `useUpdateTableRow` / `useBatchUpdateTableRows` to kill the
delete-cell flicker.

Fix C: active-dispatches overlay. New `listActiveDispatches` helper,
contract, and `GET /api/table/[tableId]/dispatches` route. `kind:'dispatch'`
SSE events carry scope+cursor+mode on every transition. New
`useActiveDispatches` hook + `resolveCellExec` synthesize a virtual
`pending` exec for cells in an active dispatch's scope ahead of cursor —
queued indicators now survive page refresh during long Run-all dispatches.
`cancelWorkflowGroupRuns` emits `kind:'dispatch',status:'cancelled'`
events so the overlay clears without a refetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`runWorkflowColumn` now always inserts a `table_run_dispatches` row and
drives the dispatcher state machine. The trigger.dev / in-process branch
narrows to a single line: trigger.dev fires `tableRunDispatcherTask` (which
calls the new `runDispatcherToCompletion`), the inline path calls the same
helper fire-and-forget. Deletes `scheduleRunsForRows` and
`stampQueuedOrCancel` — the inline-fallback no longer duplicates window
walking, SSE emission, or cancel.

The dispatcher's window-execute call goes through `JobQueueBackend`:
- New `batchEnqueueAndWait` interface method.
- Trigger.dev impl wraps `tasks.batchTriggerAndWait` behind a
  `taskContext.isInsideTask` guard (clear error if called from outside a
  task).
- Database impl skips `async_jobs` entirely — `Promise.all` over
  `options.runner(payload, signal)` per item, with per-cell AbortControllers
  tracked by `cancelKey` for cancel.

`cancelInlineRun` moves to the interface as `cancelByKey` so
`cancelWorkflowGroupRuns` no longer reaches into the database backend.

Fix `mode: 'new'` SQL filter:
- `${array}::text[]` interpolated as a tuple-cast which Postgres rejected
  ("cannot cast type record to text[]") and every inline dispatch silently
  failed. Switched to `ARRAY[${sql.join(...)}]::text[]`.
- Predicate was `jsonb_exists_any` ("any one targeted group present"),
  which excluded rows that needed at least one group re-run after a
  downstream output was deleted. Switched to `jsonb_exists_all` — per-group
  JS eligibility handles the rest.

Cascade-loop workflowId bug: `runRowCascadeLoop` was not threading the new
group's `workflowId` when advancing across groups. The cell-task ran the
previous group's workflow against the next group's cell, terminating
`completed` with empty `accumulatedData`. Fixed by tracking
`currentWorkflowId` alongside `currentGroupId` / `currentExecutionId`.

Client optimistic-patch tightening:
- `useRunColumn.onMutate` mirrors server eligibility — skip cells with
  unmet deps so unmet rows don't flash Queued and get stuck (no SSE will
  arrive for cells the server skipped).
- `resolveCellExec` overlay synthesizes a virtual `pending` only when
  `areGroupDepsSatisfied` is true. Rows with unmet deps render Waiting,
  matching the dispatcher's actual behavior.

Cleanup from /simplify pass:
- Use `generateShortId(20)` instead of
  `generateId().replace(/-/g, '').slice(0, 20)`.
- Inline `batchEnqueueAndWait` no longer allocates synthetic ids
  (returned `string[]` is unused).
- Flattened the per-cell `tracked` array — only push entries that
  registered controllers, drop the null placeholders.
- Extracted `runDispatcherToCompletion` to share the loop between the
  trigger.dev wrapper and the in-process path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lish

Counter (Fix 1): top-right "X running" + per-row badge are now
backend-bootstrapped via a count on `user_table_rows.executions ->> 'status'
= 'running'` returned alongside active dispatches. SSE `kind: 'cell'` events
compute a delta from `prev → next` status to keep the cache live; cell
events for rows outside the loaded page slice trigger a run-state refetch.
On `pruned` we invalidate the cache. Counts only worker-claimed `running`
cells — optimistic queued/pending no longer inflate the badge, and rows
outside the loaded page slice are counted too.

Sidebar (Fix 2 + 3a): `Run after` no longer ticks every column by default
for new groups (empty list). Save is disabled with an inline error when
auto-run is on with zero deps. `edit-group` mode anchors the left-of-current
filter to the group's leftmost column, so a workflow can only depend on
columns to its left.

Reorder scrub (Fix 3b): `updateTableMetadata` walks the schema's workflow
groups when `columnOrder` is in the patch and drops any dep whose new
position lands at or after the group's leftmost column (uses the existing
`stripGroupDeps` helper). Metadata + schema updates land atomically.

Server returns ordered columns (Fix 3b cont'd): `getTableById` /
`listTables` now sort `schema.columns` by `metadata.columnOrder` before
returning, via a new `applyColumnOrderToSchema` helper. Every consumer
(grid, sidebar, copilot, mothership) gets one ordered list — the sidebar's
leftmost-group-column anchor now points at the right index.

Dep-aware retrigger (Fix 4): editing a value that a downstream workflow
depends on now re-runs that workflow.
- `deriveExecClearsForDataPatch` returns
  `{ executionsPatch, inFlightDownstreamGroups }`. Walks
  `schema.workflowGroups[].dependencies.columns` for every column in the
  patch, clears terminal-state downstream entries, and reports in-flight
  entries.
- `updateRow` calls `cancelWorkflowGroupRuns` + `runWorkflowColumn`
  (`mode: 'incomplete' + isManualRun: true`) for in-flight downstream
  groups, then always fires `runWorkflowColumn({ mode: 'new' })` for the
  cleared groups. Skips both when `executionsPatch` is provided by the
  caller — those are cell-task / cancel writes that would otherwise spawn
  a recursive flood of dispatches per partial-write.
- `cancelWorkflowGroupRuns(tableId, rowId, { groupIds? })` accepts a
  per-group filter so the cancel only touches the affected groups, not
  every in-flight cell on the row.
- `pickNextEligibleGroupForRow` now treats a dispatcher pre-stamp
  (`pending` + `executionId: null`) as claimable — the cascade-loop is the
  real owner. Without this, the dispatcher's pre-stamp of downstream
  groups made the cascade-loop see them as "in-flight" and skip them,
  stranding `pending` cells forever.
- `optimisticallyScheduleNewlyEligibleGroups` extends the cache patch to
  flip dep-touched groups to `pending` regardless of their current status,
  matching the server's cancel-then-rerun behavior.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…der Pending + viewable

Three connected issues with workflows that pause mid-cell (e.g. wait blocks):

1. `/api/resume/poll` (the time-pause auto-resumer) called
   `PauseResumeManager.startResumeExecution` directly, bypassing
   `executeResumeJob` from `background/resume-execution.ts`. The wrapper is
   where the cell-context restoration + cascade-loop continuation lives —
   without it, the resumed workflow ran to completion but never wrote the
   terminal state back to the table cell. Cell stays `pending` forever
   even though the underlying execution finished.

   Fix: dynamically import `executeResumeJob` and use it for the
   `'starting'` branch. Same primitive the trigger.dev `resumeExecutionTask`
   wraps — calling it directly handles both trigger.dev-disabled local dev
   and trigger.dev-enabled prod identically.

2. The cell renderer mapped `status: 'pending'` to `kind: 'queued'` (gray
   "Queued" badge) regardless of whether the run had started. A HITL-paused
   run has `status: 'pending'` + `jobId` prefixed `paused-` + a real
   `executionId` — semantically very different from "queued, hasn't run."
   Now renders as `pending-upstream` (the existing Pending pill) for
   paused-jobId rows.

3. Right-click "View execution" was disabled for `pending` cells (gated to
   `completed | error | running`), so users couldn't open the trace for a
   paused execution. Paused runs have a viewable trace (the executionId is
   real and the log row exists). Both the per-row context menu and the
   action-bar derivation now recognize `pending` + `paused-` jobId as a
   started run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Workflow-output cells now reveal their text character-by-character when an
SSE update lands, while page reloads and virtualization remounts still paint
the value instantly. A first-render guard inside the new useTypewriter hook
distinguishes hydration from live updates with no plumbing through the cell
tree.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vercel
Copy link
Copy Markdown

vercel Bot commented May 20, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
docs Skipped Skipped May 20, 2026 2:14am

Request Review

@cursor
Copy link
Copy Markdown

cursor Bot commented May 20, 2026

PR Summary

High Risk
High risk because it rewires table workflow execution/dispatching (including resume, cancellation, and auto-run) across server, background tasks, SSE events, and client optimistic state, which can affect correctness and concurrency. Failures could strand cells in incorrect statuses, skip runs, or cause duplicate executions under contention/redis issues.

Overview
Implements a chunked table-run dispatcher backed by table_run_dispatches, including bulk-clear behavior, windowed enqueueing via batchEnqueueAndWait, and a new trigger.dev task wrapper (table-run-dispatcher) to run dispatches to completion.

Adds a run-state surface: GET /api/table/[tableId]/dispatches + new TableRunState query cache, plus SSE support for kind: 'dispatch' events and backend-counted running-cell totals; the grid now uses this to preserve queued overlays across refresh and to drive per-row/total running counters.

Reworks workflow cascade + resume semantics: introduces a Redis withCascadeLock per (tableId,rowId); cell execution now runs a per-row cascade loop under the lock, and HITL resume paths route through executeResumeJob to restore cell context and continue the cascade (with contention fallback).

Tightens auto-run/optimistic behavior and validation: updateRow now clears/retriggers downstream groups when dependency columns change (including cancel+rerun for in-flight), auto-fire uses mode: 'new', and the client mirrors server eligibility (deps checks, output clears, optimistic stamps, improved cancel semantics). Workflow sidebar now requires at least one "Run after" dep when auto-run is enabled, and table fetches return schema.columns ordered by metadata.columnOrder (with server-side dep scrubbing on reorder).

Includes UI polish: paused HITL cells render as pending (and are viewable), workflow output updates get a typewriter reveal, and some menu/label tweaks (e.g., delete label).

Reviewed by Cursor Bugbot for commit b2b1a83. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread apps/sim/app/api/table/[tableId]/rows/[rowId]/route.ts Outdated
@TheodoreSpeaks TheodoreSpeaks changed the base branch from main to staging May 20, 2026 01:38
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 20, 2026

Greptile Summary

This PR rewrites the table workflow execution engine: a chunked dispatcher (tableRunDispatches) replaces the old per-row fan-out, a per-row cascade lock (withCascadeLock) serializes group execution, and a new batchEnqueueAndWait backend abstraction unifies the trigger.dev and in-process paths. Several UX fixes ship alongside — active-dispatch overlay, dep-aware retrigger, paused-cell rendering, and sidebar validation.

  • Chunked dispatcher: a table_run_dispatches row tracks cursor position; dispatcherStep walks windows of WINDOW_SIZE rows, stamps cells pending, then blocks with batchEnqueueAndWait until the window completes before advancing.
  • Double dispatch regression: updateRow now fires runWorkflowColumn({mode:'new'}) internally for every non-internal write, but both PATCH route handlers also fire a second runWorkflowColumn({mode:'incomplete'}) after the call returns. The mode:'incomplete' path's bulkClearWorkflowGroupCells clears every targeted group's output columns on rows where any output column is empty — no per-group completed check — so a completed group's output is silently wiped when a sibling group is still empty.

Confidence Score: 3/5

Two PATCH routes each fire a second runWorkflowColumn({mode:'incomplete'}) after updateRow which already dispatches internally; the mode:'incomplete' bulk-clear can permanently delete completed workflow-group outputs on rows where any sibling group is still empty.

The chunked dispatcher, cascade lock, and backend unification are well-architected and the logic in those layers is sound. The regressions are in the routing layer: updateRow took ownership of auto-dispatch but the two PATCH route files were not updated to remove their own calls, leaving a concurrent second dispatch that brings a destructive bulk-clear with it. Additionally, runWorkflowColumn now throws for tables with no workflow groups, generating error-level log noise on every row write for the majority of tables that don't use the workflow feature.

apps/sim/app/api/table/[tableId]/rows/[rowId]/route.ts and apps/sim/app/api/v1/tables/[tableId]/rows/[rowId]/route.ts both need their extra runWorkflowColumn calls removed. apps/sim/lib/table/workflow-columns.ts needs its early throw replaced with a silent return for the no-groups case.

Important Files Changed

Filename Overview
apps/sim/lib/table/dispatcher.ts New chunked dispatcher state machine — solid windowed walk and cancel logic, but bulkClearWorkflowGroupCells for mode:'incomplete' clears ALL targeted groups' output columns on any row where at least one output is empty, which can wipe completed groups' data when sibling groups are incomplete.
apps/sim/lib/table/service.ts updateRow now internally calls runWorkflowColumn({mode:'new'}) for all non-internal writes; combined with the extra runWorkflowColumn({mode:'incomplete'}) calls added in both PATCH routes this produces a double dispatch per user edit.
apps/sim/app/api/table/[tableId]/rows/[rowId]/route.ts New runWorkflowColumn({mode:'incomplete'}) call after updateRow creates a double dispatch; combined with bulkClearWorkflowGroupCells semantics this can wipe completed workflow outputs on a row.
apps/sim/app/api/v1/tables/[tableId]/rows/[rowId]/route.ts Same double-dispatch issue as the internal PATCH route — adds runWorkflowColumn({mode:'incomplete'}) after updateRow which already fires runWorkflowColumn({mode:'new'}) internally.
apps/sim/lib/table/workflow-columns.ts runWorkflowColumn now throws when targetGroups.length === 0, causing spurious error logs for every row operation on tables that have no workflow groups at all.
apps/sim/background/workflow-column-execution.ts Cascade-loop refactor is clean — runRowCascadeLoop correctly advances through eligible groups under a single lock hold; return type contract is well-defined.
apps/sim/background/resume-execution.ts Refactored into focused helpers — logic is sound; lock-contended resume path skips cascade continuation which is an acknowledged edge case.
apps/sim/lib/table/cascade-lock.ts New heartbeat-extended Redis lock helper — compare-and-delete release, heartbeat interval well within TTL, Redis-unavailable fallback is documented.
apps/sim/lib/core/async-jobs/backends/database.ts New batchEnqueueAndWait and cancelByKey methods cleanly bypass async_jobs for in-process execution; compare-and-delete guard on controller cleanup is correct.
apps/sim/hooks/queries/tables.ts New useTableRunState and activeDispatch plumbing look correct; isManualRun is hardcoded to false for SSE-inserted dispatches that aren't in the initial fetch snapshot.

Sequence Diagram

sequenceDiagram
    participant Client
    participant PATCHRoute as PATCH /rows/[rowId]
    participant ServiceUpdateRow as service.updateRow
    participant Dispatcher as dispatcherStep
    participant CellTask as executeWorkflowGroupCellJob
    participant CascadeLock as withCascadeLock

    Client->>PATCHRoute: PATCH row data
    PATCHRoute->>ServiceUpdateRow: updateRow(data, no executionsPatch)
    ServiceUpdateRow->>ServiceUpdateRow: deriveExecClearsForDataPatch
    ServiceUpdateRow-->>ServiceUpdateRow: void runWorkflowColumn mode new fires dispatch 1
    ServiceUpdateRow-->>PATCHRoute: updatedRow
    PATCHRoute-->>PATCHRoute: void runWorkflowColumn mode incomplete fires dispatch 2

    Note over Dispatcher: Dispatch 1 mode new no bulk clear
    Dispatcher->>Dispatcher: buildPendingRuns skips rows with existing exec
    Dispatcher->>Dispatcher: stampQueuedForBatch

    Note over Dispatcher: Dispatch 2 mode incomplete bulk clear runs
    Dispatcher->>Dispatcher: bulkClearWorkflowGroupCells wipes completed outputs

    Dispatcher->>CellTask: batchEnqueueAndWait
    CellTask->>CascadeLock: withCascadeLock(tableId, rowId, executionId)
    CascadeLock-->>CellTask: acquired
    CellTask->>CellTask: runRowCascadeLoop groups A then B then C
    CellTask->>CascadeLock: releaseLock
Loading

Reviews (1): Last reviewed commit: "feat(table): typewriter reveal for SSE-d..." | Re-trigger Greptile

Comment on lines +137 to +147
// Only `null` when a `cancellationGuard` is supplied and the SQL guard
// rejects the write — this route doesn't pass one, so reaching null is a bug.
if (!updatedRow) throw new Error('updateRow returned null without a cancellationGuard')
// Auto-fire any newly-eligible workflow groups (deps just became met).
void runWorkflowColumn({
tableId,
workspaceId: validated.workspaceId,
rowIds: [updatedRow.id],
mode: 'incomplete',
isManualRun: false,
requestId,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Double dispatch + completed-output wipe

updateRow in service.ts now unconditionally fires runWorkflowColumn({mode:'new', rowIds:[rowId]}) at its own exit for all non-internal writes. Adding a second runWorkflowColumn({mode:'incomplete'}) call here produces two concurrent dispatches for every PATCH.

The dangerous path is bulkClearWorkflowGroupCells for mode:'incomplete'. It clears all targeted groups' output columns on any row where at least one output column is empty — there is no per-group completed check. If a row has group A completed (col_a = 'foo') and group B not yet run (col_b = ''), the NOT allFilled filter fires and col_a is wiped. The same issue exists in the v1 PATCH route.

Suggested change
// Only `null` when a `cancellationGuard` is supplied and the SQL guard
// rejects the write — this route doesn't pass one, so reaching null is a bug.
if (!updatedRow) throw new Error('updateRow returned null without a cancellationGuard')
// Auto-fire any newly-eligible workflow groups (deps just became met).
void runWorkflowColumn({
tableId,
workspaceId: validated.workspaceId,
rowIds: [updatedRow.id],
mode: 'incomplete',
isManualRun: false,
requestId,
// Auto-dispatch is handled inside updateRow (mode: 'new') for user edits.

Comment on lines +148 to +155
void runWorkflowColumn({
tableId,
workspaceId: validated.workspaceId,
rowIds: [updatedRow.id],
mode: 'incomplete',
isManualRun: false,
requestId,
}).catch((err) => logger.error(`[${requestId}] auto-dispatch (v1 row update) failed:`, err))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Same double-dispatch issue as the internal PATCH route — updateRow already fires runWorkflowColumn({mode:'new'}) for user edits, so this second mode:'incomplete' call creates a concurrent dispatch that can wipe completed workflow outputs.

Suggested change
void runWorkflowColumn({
tableId,
workspaceId: validated.workspaceId,
rowIds: [updatedRow.id],
mode: 'incomplete',
isManualRun: false,
requestId,
}).catch((err) => logger.error(`[${requestId}] auto-dispatch (v1 row update) failed:`, err))
// Auto-dispatch is handled inside updateRow (mode: 'new') for user edits.

Comment thread apps/sim/lib/table/workflow-columns.ts Outdated
Comment on lines +463 to +465
const allGroups = table.schema.workflowGroups ?? []
const targetGroups = groupIds ? allGroups.filter((g) => groupIds.includes(g.id)) : allGroups
if (targetGroups.length === 0) return { triggered: 0 }
if (targetGroups.length === 0) throw new Error('No matching workflow groups for run')
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Throws on tables with no workflow groups, causing error spam

runWorkflowColumn throws 'No matching workflow groups for run' when targetGroups.length === 0. Every caller wraps it in .catch(logger.error), so every row insert/update on a plain table generates an error-level log. The previous code returned { triggered: 0 } silently. The fix is if (targetGroups.length === 0) return { dispatchId: null }.

Suggested change
const allGroups = table.schema.workflowGroups ?? []
const targetGroups = groupIds ? allGroups.filter((g) => groupIds.includes(g.id)) : allGroups
if (targetGroups.length === 0) return { triggered: 0 }
if (targetGroups.length === 0) throw new Error('No matching workflow groups for run')
const allGroups = table.schema.workflowGroups ?? []
const targetGroups = groupIds ? allGroups.filter((g) => groupIds.includes(g.id)) : allGroups
if (targetGroups.length === 0) return { dispatchId: null }

Comment on lines +147 to +152
const applyDispatch = (event: Extract<TableEvent, { kind: 'dispatch' }>): void => {
const { dispatchId, status, scope, cursor, mode } = event
queryClient.setQueryData<TableRunState>(tableKeys.activeDispatches(tableId), (prev) => {
if (!prev) return prev
const list = prev.dispatches
// Terminal states drop the dispatch from the overlay; client renders
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 isManualRun hardcoded false for SSE-inserted dispatches

When a dispatch SSE event arrives for a dispatchId not already in the cache, the entry is inserted with isManualRun: false regardless of the actual run type. The preserved-from-initial-fetch path only applies to entries already present (merged[idx] = { ...next, isManualRun: list[idx].isManualRun }), so a late-arriving manual run renders with the wrong flag.

Two P1 issues + one cleanup from the bot reviewers:

1. **Double-dispatch + completed-output wipe.** Both PATCH row routes
   (`app/api/table/[tableId]/rows/[rowId]` and
   `app/api/v1/tables/[tableId]/rows/[rowId]`) were firing a second
   `runWorkflowColumn({ mode: 'incomplete' })` after `updateRow` returns.
   `updateRow` already fires `mode: 'new'` internally for user edits, so
   the second call created a concurrent dispatch. Worse, the
   `mode: 'incomplete'` path's `bulkClearWorkflowGroupCells` wipes ALL
   targeted output columns on any row where any one column is empty —
   meaning sibling-group completed outputs could be erased. Removed both
   route-level calls; auto-dispatch lives entirely in `updateRow`.

2. **`runWorkflowColumn` log-spamming on plain tables.**
   `if (targetGroups.length === 0) throw new Error(...)` fired on every
   row insert/update for tables without any workflow groups (the
   majority). Every caller wraps with `.catch(logger.error)`, so each
   PATCH produced an error-level log. Return `{ dispatchId: null }`
   silently — manual `runWorkflowColumn` callers pass `groupIds`
   explicitly so they can't reach this branch.

3. **`isManualRun` plumbed through dispatch SSE events.** Late-arriving
   `kind: 'dispatch'` events for dispatches not in the initial fetch
   were hardcoding `isManualRun: false`. Added the field to the event
   shape, emit it from `dispatcherStep` (pending → complete, dispatching
   transitions) and `markActiveDispatchesCancelled`, and consume it in
   the SSE handler with a sensible fallback for legacy emits.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit b2b1a83. Configure here.

requestId,
}).catch((err) =>
logger.error(`[${requestId}] auto-dispatch (Mothership update_row) failed:`, err)
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mothership double workflow dispatch

High Severity

After update_row, Mothership still calls runWorkflowColumn with mode: 'incomplete' even though updateRow already auto-dispatches with mode: 'new'. That second run races the first and can run bulkClearWorkflowGroupCells, wiping sibling workflow outputs on the row.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b2b1a83. Configure here.

const applyDispatch = (event: Extract<TableEvent, { kind: 'dispatch' }>): void => {
const { dispatchId, status, scope, cursor, mode, isManualRun } = event
queryClient.setQueryData<TableRunState>(tableKeys.activeDispatches(tableId), (prev) => {
if (!prev) return prev
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SSE dispatch dropped before fetch

Medium Severity

applyDispatch returns early when the activeDispatches query cache is still empty, so the first kind: 'dispatch' SSE events are discarded. The queued overlay and cursor may stay wrong until a full refetch, undermining live Run-all UX.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b2b1a83. Configure here.

// wiped AND downstream dep-changed terminal groups — same rationale as
// `updateRow`. In-flight downstream groups are dropped here (batch
// updates don't run the cancel+restart orchestration — those go through
// single-row `updateRow`).
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Batch skips in-flight rerun

Medium Severity

batchUpdateRows uses deriveExecClearsForDataPatch but ignores inFlightDownstreamGroups, so dependency edits in a batch update never cancel and restart downstream workflows still running. Single-row updateRow does perform that cancel-and-rerun path.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit b2b1a83. Configure here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant