
Wait for host page-cache flush before Ctrl-D on Linux (#229) #494

Merged
makermelissa merged 9 commits into circuitpython:main from makermelissa-piclaw:fix/issue-229-linux-poll
May 13, 2026

Conversation

@makermelissa-piclaw
Contributor

@makermelissa-piclaw makermelissa-piclaw commented May 12, 2026

Closes #229

Problem

On Linux, saving code.py via the FS Access API (USB workflow) and then sending a soft-reboot would frequently cause CircuitPython to read a partially-flushed file, raising OSError: [Errno 5] Input/output error.

Root cause: Linux mounts vfat (CIRCUITPY) without sync by default. After the host writes a file, the kernel can hold the data in the page cache for up to dirty_expire_centisecs (default 30s) before flushing the actual data sectors to disk. There is no fsync available through the FS Access API, so the editor must wait until the device confirms it can see the full file before triggering a reload.

Crucially, os.stat() is not a sufficient flush detector: the kernel can update the FAT directory entry (giving the device the correct file size) before flushing the data sectors (so the device cannot yet read the file contents). Empirical testing showed os.stat() returns the correct size ~1s after a write, but open()+read() still returns -1 (OSError) for another ~30s.
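
To check how long the writeback window is on a given Linux host, the sysctl named above can be read from procfs. The following is a minimal sketch of a host-side helper script, not part of the editor; the function names are hypothetical, and the only fact relied on is that the value is expressed in centiseconds (3000 means 30s).

```python
from pathlib import Path

# Standard procfs location of the sysctl; only present on Linux hosts.
DIRTY_EXPIRE = Path("/proc/sys/vm/dirty_expire_centisecs")


def centisecs_to_seconds(raw: str) -> float:
    """Convert a sysctl value in centiseconds (e.g. "3000\n") to seconds."""
    return int(raw.strip()) / 100


def writeback_window_seconds(path: Path = DIRTY_EXPIRE):
    """Return the dirty-page expiry in seconds, or None if it cannot be read."""
    try:
        return centisecs_to_seconds(path.read_text())
    except (OSError, ValueError):
        # Not a Linux host, or the file held something unexpected.
        return None


if __name__ == "__main__":
    print(writeback_window_seconds())
```

On a host with the default setting this reports 30.0; on non-Linux hosts it reports None.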

Approach

This PR introduces a device-side flush detector that runs before every soft-reboot the editor sends to the device. The FSAPI client records the path, byte length, and xor checksum of the last write. Before softRestart(), the workflow polls the device every 500ms with a small Python snippet that:

  1. Calls os.stat() to read the FAT directory size.
  2. Opens the file and reads all bytes.
  3. Computes an xor checksum of the read bytes.

The wait completes when all three match the host-recorded values, confirming the data sectors are flushed. The poll is wrapped in showBusy() so the user sees the existing Blinka loader instead of a frozen UI. A 40s timeout falls through to the existing 3-retry save loop if something goes wrong.
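
The three steps above can be sketched as a small device-side probe. This is an illustration under assumptions, not the snippet the editor actually sends: the names `probe` and `xor_checksum` are hypothetical, and because the PR does not state the checksum's width, the sketch assumes a one-byte XOR fold over all bytes.

```python
import os


def xor_checksum(data):
    """Fold all bytes together with XOR into a single value in 0..255."""
    value = 0
    for byte in data:
        value ^= byte
    return value


def probe(path):
    """Return (stat_size, read_length, checksum), or None if the read fails."""
    try:
        stat_size = os.stat(path)[6]  # index 6 of the stat tuple is the size
        with open(path, "rb") as handle:
            data = handle.read()
    except OSError:
        # Directory entry may exist while the data sectors are not readable yet.
        return None
    return (stat_size, len(data), xor_checksum(data))
```

For a payload the host has just written, the matching expected value would be `(len(payload), len(payload), xor_checksum(payload))`; the wait ends once the probe returns exactly that tuple.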

The wait is gated on:

  • Linux only (isLinux() excludes ChromeOS and Android, which include "Linux" in their UA strings)
  • FSAPI workflow only (BLE/Web workflows write through the device and don't have this race)
  • Pending write tracked (skips the loader entirely when nothing is queued)

So macOS, Windows, ChromeOS, and the BLE/Web workflows are unaffected.
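
The three gates can be modeled as a single predicate. The real helpers live in the editor's JavaScript; this Python sketch only mirrors the logic, and the user-agent tokens ("CrOS", "Android") and workflow labels ("fsapi", "ble", "web") are assumptions made for illustration.

```python
def is_desktop_linux(user_agent):
    """True for Linux user agents that are neither ChromeOS nor Android."""
    # Assumed tokens; the editor's isLinux() may test different strings.
    if "CrOS" in user_agent or "Android" in user_agent:
        return False
    return "Linux" in user_agent


def should_wait_for_flush(user_agent, workflow, pending_write):
    """Apply the three gates: desktop Linux, FSAPI workflow, tracked write."""
    return (
        is_desktop_linux(user_agent)
        and workflow == "fsapi"
        and pending_write is not None
    )
```

Because every gate must pass, any host or workflow outside the affected combination skips the wait, and the loader, entirely.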

Soft-reboot paths covered

The wait fires before:

  1. The editor's Run button (runCurrentCode() → softRestart()).
  2. The editor's Reboot button (restartDevice() → softRestart()).
  3. Ctrl-D typed in the terminal panel (via serialTransmitWithFlushGuard() interceptor on the terminal's onData handler).

User-side workarounds (also documented in README)

For users who want to eliminate the wait or the underlying race entirely:

  • udev rule to mount CIRCUITPY with sync,flush (recommended on Linux).
  • supervisor.runtime.autoreload = False in boot.py to suppress the device's own auto-reload-on-filesystem-change.
  • vm.dirty_expire_centisecs sysctl for host-wide flush tuning.
  • ChromeOS users can't apply mount workarounds and should rely on the editor's wait or the boot.py workaround.

What this still does NOT fix

  • CircuitPython's own auto-reload on filesystem change. When the device detects CIRCUITPY changed (via its mass-storage watcher), it triggers its own soft-reload that the editor cannot intercept. The boot.py workaround above addresses this.

Test plan

Tested on Raspberry Pi 5 (kernel 6.12, vfat mounted async) with a Feather RP2040 running CircuitPython 10.2.0, and on macOS:

  • Linux: Save + Run button → no Errno (was reproducing every time before this fix)
  • Linux: Wait completes at ~33s with a visible Blinka loader
  • Linux: No visible delay or loader on subsequent saves with no pending write
  • macOS: Save + Run works with no delay (wait short-circuits on isLinux() === false)
  • Linux: Save + Ctrl-D in terminal → no Errno (please verify)
  • Windows: Save + Run works with no delay
  • ChromeOS: Save + Run works with no delay
  • BLE workflow: Save + Run unaffected
  • Web workflow: Save + Run unaffected

Files

  • js/common/utilities.js: isLinux() helper that excludes ChromeOS/Android
  • js/common/fsapi-file-transfer.js: last-write tracker (path, byteLength, checksum) on the FSAPI client
  • js/workflows/workflow.js: _waitForHostFlush() / _waitForHostFlushImpl() gated at both softRestart() call sites, plus serialTransmitWithFlushGuard() for terminal-typed Ctrl-D
  • js/script.js: route terminal onData through the flush guard
  • README.md: Troubleshooting section with udev / boot.py / sysctl workarounds

)

On Linux, writes through the File System Access API land in the kernel
page cache and are flushed to a vfat-mounted CIRCUITPY drive on the
kernel's writeback timer (~30s by default). Sending Ctrl-D before that
flush completes makes CircuitPython try to import a half-written
code.py and fail with OSError: [Errno 5] Input/output error.

The File System Access API does not expose fsync, so we cannot force
the flush from JS. Instead, gate every soft restart on the device's
own view of the filesystem: poll os.stat(path)[6] via REPL until the
size matches the bytes we just wrote, then send Ctrl-D.

- FSAPI client now records {path, byteLength, at} on each writable.close()
  and exposes getLastWrite() / clearLastWrite().
- Workflow gains _waitForHostFlush() which is awaited before every
  softRestart() (run-current and reboot-button paths). It is a no-op
  on non-Linux, non-FSAPI, or when no write is pending.
- The wait is wrapped in showBusy() so the loader is visible during
  the wait, which can last up to 35s.
- Caps at 35s and falls through if the kernel never flushes; the
  existing 3-retry save logic recovers from a failed reboot.
- isLinux() added to utilities.js (and exported), with the same
  ChromeOS/Android exclusions used elsewhere.

Refs circuitpython#229
…uitpython#229)

On Linux vfat, the kernel can update the FAT directory entry before
flushing the actual file data sectors. os.stat() alone returns the
correct size before the device can actually read the file, so a poll
that only checks size is not a sufficient flush detector.

Instead, each poll opens the file on the device, reads all bytes, and
computes a small xor checksum. We compare it to a host-computed
checksum recorded at write time. Only when size, readable length, and
checksum all match do we proceed to softRestart.

Tested on Raspberry Pi 5 (kernel 6.12) with a Feather RP2040 running
CircuitPython 10.2.0. Polling typically resolves at ~33s (just inside
the kernel's 30s dirty_expire window); bumped timeout from 35s to 40s
for headroom.
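
The poll-until-match loop with a timeout can be modeled as follows. This is a Python sketch of the control flow only (the editor implements it in JavaScript); `wait_for_flush` and its parameters are hypothetical names, with the 500ms interval and 40s cap taken from this PR's description. The clock and sleep functions are injectable so the loop can be exercised without real delays.

```python
import time


def wait_for_flush(probe, expected, timeout_s=40.0, interval_s=0.5,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns expected; False if timeout_s elapses first."""
    deadline = clock() + timeout_s
    while True:
        if probe() == expected:
            return True  # device sees the full file; safe to soft-restart
        if clock() >= deadline:
            return False  # caller falls through to its existing retry logic
        sleep(interval_s)
```

Returning False rather than raising keeps the timeout a soft failure, which matches the fall-through to the save-retry loop described above.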
@makermelissa-piclaw makermelissa-piclaw marked this pull request as ready for review May 12, 2026 22:31
…python#229)

The host-flush wait was only wired into the editor's Run and Reboot
button paths. A Ctrl-D typed directly in the terminal panel bypassed
the wait and still raced the kernel page cache flush.

Add serialTransmitWithFlushGuard() on the workflow base class. The
terminal panel routes onData through it; when the user transmits a
Ctrl-D (\x04) and there is a tracked pending FSAPI write, we run the
same _waitForHostFlush() before passing the byte through. The fast
path (no Ctrl-D or no pending write) has no extra overhead.
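
The guard's behavior can be modeled in a few lines. This Python sketch mirrors the described logic rather than the actual JavaScript; the function name and its callback parameters are hypothetical.

```python
CTRL_D = "\x04"  # the soft-reboot byte typed in the terminal


def transmit_with_flush_guard(data, has_pending_write, wait_for_flush, transmit):
    """Forward data to the device, waiting first only for Ctrl-D with a pending write."""
    if CTRL_D in data and has_pending_write():
        wait_for_flush()
    # Fast path: ordinary keystrokes go straight through.
    transmit(data)
```

Checking for Ctrl-D before consulting the write tracker keeps ordinary typing on the fast path.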

Also add a Troubleshooting section to README documenting:
  - udev rule to mount CIRCUITPY with sync,flush
  - supervisor.runtime.autoreload = False in boot.py
  - vm.dirty_expire_centisecs sysctl tuning
  - ChromeOS limitation note

Issue circuitpython#229.

The 'Choose a different workflow' back link (issue circuitpython#373) showed a
focus rectangle after a mouse click. Use :focus / :focus-visible to
suppress the outline on pointer activation while preserving a visible
focus ring for keyboard users.

:focus-visible was still drawing the rectangle in some browsers. Match
the pattern used by other anchors in the editor: no outline on any
focus state. The hover underline is sufficient affordance.

README Option A now explicitly notes that mounting CIRCUITPY with
sync,flush makes the flush-detector poll match on its first attempt,
so save/run/reboot/Ctrl-D feel instant rather than waiting up to
~30s for the kernel page cache to flush.
@makermelissa makermelissa requested a review from dhalbert May 12, 2026 23:44
The autoreload=False suggestion was a misdirect: CircuitPython's own
filesystem-change reload is a separate path that the editor's
flush-detector already handles correctly. Removing it leaves the two
workarounds that actually address the root kernel-flush race.
@makermelissa
Collaborator

This basically fixes the issue by waiting until the file has finished being written.

Contributor

@dhalbert dhalbert left a comment

This sounds good. I am surprised about the 30 seconds, but it's true that every editor I use does the equivalent of an fsync.

The user could be encouraged to type sync in a terminal window, which would speed up the waiting process.

The 40s timeout was calibrated against the default Linux
dirty_expire_centisecs of 3000 (=30s), which covers a Pi 5 + SSD setup
comfortably. On hosts running laptop-mode tools (which push the expire
window to 60s+), on slow/contended USB buses, or when writing larger
files, the 40s window could miss and fall through to the save-retry
loop. Bump to 60s for headroom and call out the trade-off in the
Troubleshooting section so users know to apply Options A-C if they hit
the timeout regularly.
@makermelissa makermelissa merged commit 33325e4 into circuitpython:main May 13, 2026
1 check passed
@makermelissa-piclaw makermelissa-piclaw deleted the fix/issue-229-linux-poll branch May 13, 2026 19:26

Development

Successfully merging this pull request may close these issues.

writing code.py on Linux causes: OSError: [Errno 5] Input/output error

3 participants