Skip to content

Python: fix: hyperlight skips symlinks when staging sandbox input#5919

Open
eavanvalkenburg wants to merge 3 commits into
microsoft:mainfrom
eavanvalkenburg:hyperlight_symlink_hardening
Open

Python: fix: hyperlight skips symlinks when staging sandbox input#5919
eavanvalkenburg wants to merge 3 commits into
microsoft:mainfrom
eavanvalkenburg:hyperlight_symlink_hardening

Conversation

@eavanvalkenburg
Copy link
Copy Markdown
Member

Motivation and Context

The helpers in agent-framework-hyperlight that populate the sandbox input
tree from workspace_root and file_mounts followed symlinks during
staging. When the source tree may contain symlinks, the resulting input
directory could include entries that did not physically live inside the
configured input source. This PR keeps the staged tree limited to real
entries that actually reside under the configured paths.

Description

Two helpers harden their walking behaviour:

  • _copy_path detects symlinks via Path.is_symlink() before any
    is_dir() / is_file() check, skips non-regular files (sockets, FIFOs,
    devices), and uses shutil.copy2(..., follow_symlinks=False) as defense
    in depth.
  • _path_tree_signature (used for the per-config cache key) now walks
    with a new _iter_real_entries helper that performs an iterdir() +
    is_symlink() check at every directory level, mirroring what
    _copy_path actually stages. It also switches to Path.lstat() so
    size/mtime are read from the link entry itself, never from a target.

Behaviour change: symlinks inside workspace_root or a directory file_mount
are silently skipped rather than dereferenced. Explicitly configured
file_mounts whose host_path is itself a symlink are already resolved to a
real path by the existing _resolve_existing_path call at mount-registration
time, so that flow is unaffected.

Regression tests:

  • Pre-placed file symlink in workspace_root (top level).
  • Pre-placed directory symlink in workspace_root.
  • Nested file symlink inside a real subdirectory.
  • _path_tree_signature ignoring symlinks so the cache key matches the
    staged tree.

Contribution Checklist

  • The code builds clean without any errors or warnings
  • The PR follows the Contribution Guidelines
  • All unit tests pass, and I have added new tests where possible
  • Is this a breaking change? No - behaviour change only affects
    callers who configured a source tree containing symlinks (now skipped
    instead of dereferenced).

…ndbox

The helpers that populate the sandbox input tree (``_copy_path`` and the
``_path_tree_signature`` walker used for cache invalidation) relied on
``Path.is_file()``, ``Path.is_dir()`` and ``shutil.copy2`` - all of which
follow symlinks by default. When the source tree contains symlinks, that
let entries from outside the configured input source surface inside the
sandbox.

Harden both code paths to never follow symlinks:

- ``_copy_path`` now bails out via ``Path.is_symlink()`` before any
  ``is_dir()`` / ``is_file()`` check, skips non-regular files, and uses
  ``shutil.copy2(..., follow_symlinks=False)`` as defense in depth.
- New ``_iter_real_entries`` walker replaces the previous ``Path.rglob``
  call inside ``_path_tree_signature`` (rglob follows directory symlinks).
- ``_path_tree_signature`` switches to ``Path.lstat()`` so size/mtime are
  never read through a symlink target.

Added regression tests covering:

- A pre-placed file symlink in ``workspace_root`` (top level).
- A pre-placed directory symlink in ``workspace_root``.
- A nested file symlink inside a real subdirectory.
- ``_path_tree_signature`` ignoring symlinks so the cache key reflects only
  what is actually staged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 18, 2026 11:45
@moonbox3
Copy link
Copy Markdown
Contributor

moonbox3 commented May 18, 2026

Python Test Coverage

Python Test Coverage Report •
FileStmtsMissCoverMissing
packages/hyperlight/agent_framework_hyperlight
   _execute_code_tool.py5778685%67, 169, 232, 264, 267, 302–303, 318, 320, 333, 351, 361, 386, 391, 398, 404, 412, 420–422, 424–429, 469, 474, 476, 478, 503–504, 510–511, 516–517, 536–537, 546–547, 582, 609–612, 618, 620, 630–631, 663–664, 667–668, 675, 721, 777–783, 852, 879, 949, 985, 991–993, 1022–1026, 1030–1031, 1036, 1053–1057, 1061–1062, 1119–1120
TOTAL34473392588% 

Python Unit Test Overview

Tests Skipped Failures Errors Time
6922 30 💤 0 ❌ 0 🔥 1m 45s ⏱️

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR hardens the Hyperlight sandbox input staging logic to avoid following symlinks when copying workspace_root and computing cache-key signatures, preventing unintended inclusion of files/directories outside configured inputs.

Changes:

  • Update _copy_path to detect and skip symlinks and non-regular filesystem entries, and to copy with follow_symlinks=False.
  • Update _path_tree_signature to avoid traversing symlinks by walking the tree with a symlink-aware iterator and using lstat().
  • Add regression tests ensuring symlinked files/directories are not staged and do not affect cache-key signatures.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
python/packages/hyperlight/agent_framework_hyperlight/_execute_code_tool.py Implements symlink-avoiding staging and signature computation for sandbox inputs.
python/packages/hyperlight/tests/hyperlight/test_hyperlight_codeact.py Adds tests covering symlink skip behavior for staging and cache-key signatures.

Copy link
Copy Markdown

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Automated Code Review

Reviewers: 4 | Confidence: 90% | Result: All clear

Reviewed: Correctness, Security Reliability, Test Coverage, Design Approach


Automated review by eavanvalkenburg's agents

- _iter_real_entries now yields directories and regular files only,
  skipping non-regular entries (sockets/FIFOs/devices). Keeps the
  cache-key signature consistent with what _copy_path actually stages.
- The four new symlink regression tests skip when the platform does not
  support symlink creation (e.g. unprivileged Windows runners), via a
  local _symlinks_supported helper modelled on the one in
  packages/core/tests/core/test_skills.py. Prevents OSError /
  NotImplementedError from failing CI jobs that have nothing to do with
  the change under test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
eavanvalkenburg added a commit to eavanvalkenburg/agent-framework that referenced this pull request May 18, 2026
Apply the same Windows-CI safety guard as the hyperlight fix in PR microsoft#5919:
the three symlink integration tests create symlinks via Path.symlink_to(),
which fails with OSError / NotImplementedError on unprivileged Windows
runners. Add a local _symlinks_supported helper (mirroring the one in
packages/core/tests/core/test_skills.py) and pytest.skip when symlinks
aren't available, so the tests no longer fail for environment reasons.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…eedback

- _copy_path docstring: narrow the scope to "symlink entries present in
  the source tree at rest" and explicitly call out that the copy is NOT
  atomic with respect to concurrent mutation of the source tree.
  Callers who need that stronger guarantee should snapshot their
  workspace before passing it in. Avoids overpromising on a TOCTOU
  window that pathlib cannot express; closing it properly would need
  fd-based traversal (O_NOFOLLOW | O_DIRECTORY + os.scandir(fd)) with
  a separate Windows story, which is out of scope for this targeted
  fix.

- _path_tree_signature: drop the `if path.is_symlink(): return ()`
  short-circuit. Resolve a symlink root to its real target before
  walking instead. The public construction flow already resolves
  workspace_root / file_mounts[].host_path up front so this never
  affected user-facing code, but the short-circuit was misleading and
  would have produced an empty, stable signature for any direct
  caller that builds a _RunConfig without going through the public
  constructor. Defense in depth: even if a future call site forgets
  to resolve the root, the cache key still reflects real contents.

- Added regression test
  test_path_tree_signature_walks_through_symlinked_root: a symlinked
  workspace root must produce a non-empty signature, AND the signature
  must change when the real target's contents change so the cache key
  actually invalidates.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants