fix(shared-infra): record skipped files in speckit.manifest.json#2483
fix(shared-infra): record skipped files in speckit.manifest.json#2483eldar702 wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a shared-infrastructure tracking bug where install_shared_infra(..., force=False) could skip all existing .specify/scripts/ and .specify/templates/ files without recording them, resulting in speckit.manifest.json being saved with an empty files mapping (issue #2107).
Changes:
- Record skipped (already-existing) shared scripts/templates into
speckit.manifest.jsonduringinstall_shared_infrawhen they are not already tracked. - Add a regression test ensuring a “lost manifest” reinstall reconstructs a non-empty manifest that includes all previously tracked files.
Show a summary per file
| File | Description |
|---|---|
src/specify_cli/shared_infra.py |
Records skipped existing shared infra files into the shared manifest during install. |
tests/integrations/test_integration_claude.py |
Adds a regression test covering the “lost manifest + skip branch” scenario. |
Copilot's findings
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comments suppressed due to low confidence (1)
src/specify_cli/shared_infra.py:303
- Same issue as the scripts loop: calling
manifest.record_existing(rel_skip)in the skip branch will raise ifdstexists but is not a regular file (e.g., a directory). Add andst.is_file()guard or raise an explicit error so a non-file collision doesn’t fail with an opaque hashing/open error.
# ``.specify/`` tree does not silently drop it (#2107).
# Skip if already tracked — preserving the original hash
# keeps user-modification detection working downstream.
if rel_skip not in manifest.files:
manifest.record_existing(rel_skip)
- Files reviewed: 2/2 changed files
- Comments generated: 2
| # ``.specify/`` tree does not silently drop it (#2107). | ||
| # Skip if already tracked — preserving the original hash | ||
| # keeps user-modification detection working downstream. | ||
| if rel_skip not in manifest.files: | ||
| manifest.record_existing(rel_skip) |
| f"speckit.manifest.json not written at {manifest_path}" | ||
| ) | ||
| data = json.loads(manifest_path.read_text(encoding="utf-8")) | ||
| return data.get("files") or data.get("_files") or {} |
This comment was marked as outdated.
This comment was marked as outdated.
mnriem
left a comment
There was a problem hiding this comment.
Please address Copilot feedback
Address Copilot review feedback on PR github#2483. The previous fix called ``manifest.record_existing(rel_skip)`` from the skip branch of both loops in ``install_shared_infra``, which would crash with ``IsADirectoryError`` (or another ``OSError``) if a directory or other non-regular-file happened to exist at the expected destination path — since ``record_existing`` opens the file to compute its SHA-256. Three coordinated fixes: 1. ``IntegrationManifest.record_existing`` now validates its precondition: it raises ``ValueError`` if the path is a symlink or is not a regular file. The docstring already promised "an already-existing file"; this enforces it. The symlink check runs on the un-resolved path because ``_validate_rel_path`` calls ``resolve()``, which would silently follow the symlink. Mirrors the existing ``_ensure_safe_manifest_destination`` precedent in the same module. 2. In ``install_shared_infra``'s scripts and templates skip branches, guard the ``record_existing`` call with ``dst.is_file()`` and wrap it in ``try/except (OSError, ValueError)``. A directory collision, permission error, or TOCTOU race no longer aborts the whole install — the user gets a per-path warning, the path still surfaces in ``skipped_files``, and the rest of the install continues. 3. ``_read_manifest_files`` in the regression test no longer falls back to ``data.get("_files")`` (Copilot's low-confidence finding): the silent fallback could mask a schema regression where the public ``files`` key is renamed. It now asserts ``"files" in data`` and that the value is a dict. Add two regression tests in ``TestSpeckitManifestRecordsSkippedFiles`` covering the directory-at-destination edge case for both the scripts loop and the templates loop. Both verify (a) install does not crash, (b) the non-file path is not recorded in the manifest, and (c) the path still surfaces in the user-visible warning. The "shared infrastructure file(s)" warning text is changed to "path(s)" so it remains accurate when non-file entries appear in the list. Refs github#2107
|
Thanks for the review! Pushed What changed1. Hardened The function's docstring already promised "an already-existing file", but didn't enforce it. It now raises 2. Guarded both call sites in if dst_path.is_file() and rel_skip not in manifest.files:
try:
manifest.record_existing(rel_skip)
except (OSError, ValueError) as exc:
console.print(
f"[yellow]⚠[/yellow] could not record {rel_skip} in manifest: {exc}"
)Belt-and-suspenders: the 3. Tightened the test helper (Copilot's low-confidence finding)
4. New regression tests Two new methods in 5. Cosmetic The user-facing warning string Verification
Out of scope (intentionally)
|
| rel = Path(rel_path) | ||
| # Check ``is_symlink()`` on the un-resolved path because | ||
| # ``_validate_rel_path`` resolves the path (which would follow | ||
| # the symlink and silently record the target instead). | ||
| if (self.project_root / rel).is_symlink(): | ||
| raise ValueError( | ||
| f"Refusing to record symlinked manifest path: {rel}" | ||
| ) | ||
| abs_path = _validate_rel_path(rel, self.project_root) |
| def record_existing(self, rel_path: str | Path) -> None: | ||
| """Record the hash of an already-existing file at *rel_path*. | ||
|
|
||
| Raises ``ValueError`` if *rel_path* resolves outside the project root. | ||
| """Record the hash of an already-existing regular file at *rel_path*. | ||
|
|
||
| Raises: | ||
| ValueError: if *rel_path* resolves outside the project root, is | ||
| a symlink, or is not a regular file. A directory or other | ||
| non-file path cannot be silently recorded — its hash would | ||
| be meaningless and ``check_modified``/``uninstall`` would | ||
| treat the entry as permanently broken. | ||
| """ | ||
| rel = Path(rel_path) | ||
| # Check ``is_symlink()`` on the un-resolved path because | ||
| # ``_validate_rel_path`` resolves the path (which would follow | ||
| # the symlink and silently record the target instead). | ||
| if (self.project_root / rel).is_symlink(): | ||
| raise ValueError( | ||
| f"Refusing to record symlinked manifest path: {rel}" | ||
| ) | ||
| abs_path = _validate_rel_path(rel, self.project_root) | ||
| if not abs_path.is_file(): | ||
| raise ValueError( | ||
| f"Manifest path is not a regular file: {rel}" | ||
| ) |
| if dst_path.exists() and not force: | ||
| skipped_files.append(dst_path.relative_to(project_path).as_posix()) | ||
| rel_skip = dst_path.relative_to(project_path).as_posix() | ||
| skipped_files.append(rel_skip) | ||
| # Record the existing-on-disk file in the manifest so a | ||
| # fresh manifest run against an already-populated | ||
| # ``.specify/`` tree does not silently drop it (#2107). | ||
| # Skip if already tracked — preserving the original hash | ||
| # keeps user-modification detection working downstream. | ||
| # Also skip non-files (directory collision, FIFO, …): | ||
| # ``record_existing`` would refuse them, and the path | ||
| # still surfaces via the ``skipped_files`` warning below. | ||
| if dst_path.is_file() and rel_skip not in manifest.files: | ||
| try: | ||
| manifest.record_existing(rel_skip) |
|
Please address Copilot feedback and resolve conflicts |
`install_shared_infra` skipped files that already existed on disk when `force=False`, but the skip branches in both the scripts loop and the templates loop only appended to `skipped_files` without calling `manifest.record_existing`. So when the function ran with a fresh manifest against an already-populated `.specify/` tree (e.g. after the manifest was deleted, corrupted, or extracted out of band), every file went down the skip path, `planned_copies` / `planned_templates` stayed empty, and `manifest.save()` wrote an empty `files` field — leaving the integration believing nothing was installed. Record every skipped file in the manifest, but only when it is not already tracked. This preserves the original hash for files that were previously recorded so `check_modified()` (used by `integration use` to decide whether a user has customized a template) keeps working correctly. Add `TestSpeckitManifestRecordsSkippedFiles` in `tests/integrations/test_integration_claude.py` covering both the fresh-skip path and the recover-after-lost-manifest path. Fixes github#2107
Address Copilot review feedback on PR github#2483. The previous fix called ``manifest.record_existing(rel_skip)`` from the skip branch of both loops in ``install_shared_infra``, which would crash with ``IsADirectoryError`` (or another ``OSError``) if a directory or other non-regular-file happened to exist at the expected destination path — since ``record_existing`` opens the file to compute its SHA-256. Three coordinated fixes: 1. ``IntegrationManifest.record_existing`` now validates its precondition: it raises ``ValueError`` if the path is a symlink or is not a regular file. The docstring already promised "an already-existing file"; this enforces it. The symlink check runs on the un-resolved path because ``_validate_rel_path`` calls ``resolve()``, which would silently follow the symlink. Mirrors the existing ``_ensure_safe_manifest_destination`` precedent in the same module. 2. In ``install_shared_infra``'s scripts and templates skip branches, guard the ``record_existing`` call with ``dst.is_file()`` and wrap it in ``try/except (OSError, ValueError)``. A directory collision, permission error, or TOCTOU race no longer aborts the whole install — the user gets a per-path warning, the path still surfaces in ``skipped_files``, and the rest of the install continues. 3. ``_read_manifest_files`` in the regression test no longer falls back to ``data.get("_files")`` (Copilot's low-confidence finding): the silent fallback could mask a schema regression where the public ``files`` key is renamed. It now asserts ``"files" in data`` and that the value is a dict. Add two regression tests in ``TestSpeckitManifestRecordsSkippedFiles`` covering the directory-at-destination edge case for both the scripts loop and the templates loop. Both verify (a) install does not crash, (b) the non-file path is not recorded in the manifest, and (c) the path still surfaces in the user-visible warning. The "shared infrastructure file(s)" warning text is changed to "path(s)" so it remains accurate when non-file entries appear in the list. Refs github#2107
… tests
Address Copilot review (2026-05-11, review id 4266902103):
1. `record_existing` was calling `(self.project_root / rel).is_symlink()`
BEFORE validating containment. For absolute paths or paths containing
`..`, this performed a filesystem stat outside the project root before
`_validate_rel_path()` raised. Add a cheap lexical pre-check that
delegates to `_validate_rel_path()` for the canonical error messages,
so the symlink stat only ever runs on paths that are already lexically
inside the project root.
2. Add focused unit tests in `tests/integrations/test_manifest.py` for
the symlink and non-regular-file error paths, including:
- symlink target rejection
- dangling symlink rejection (caught by the symlink guard before
the is_file check)
- directory path rejection (is_file == False)
- missing-path rejection (is_file == False)
- absolute-path lexical pre-check
The Copilot reviewer noted these guards had no focused coverage in
`test_manifest.py`, only via the `test_integration_claude.py`
regression test.
3. The third Copilot finding (repeated `dict(self._files)` copies via
`manifest.files` in the skip branches) is already resolved on this
branch by using `prior_hashes` — the function-scope snapshot taken at
the top of `install_shared_infra` — for the membership check, instead
of `manifest.files`.
AI disclosure: drafted with assistance from Claude (Opus 4.7).
6790654 to
5a6e25b
Compare
|
Thanks for the bump @mnriem! Pushed 1. Rebased onto current
|
| # Record the existing-on-disk file in the manifest so a | ||
| # fresh manifest run against an already-populated | ||
| # ``.specify/`` tree does not silently drop it (#2107). | ||
| # ``prior_hashes`` is the function-scope snapshot taken | ||
| # at entry, so this membership check is O(1) and avoids | ||
| # the repeated ``dict(self._files)`` copy that | ||
| # ``manifest.files`` performs on every access. | ||
| if dst_path.is_file() and rel not in prior_hashes: | ||
| try: | ||
| manifest.record_existing(rel) |
| if dst.is_file() and rel not in prior_hashes: | ||
| try: | ||
| manifest.record_existing(rel) |
| # Check ``is_symlink()`` on the un-resolved path because | ||
| # ``_validate_rel_path`` resolves the path (which would follow | ||
| # the symlink and silently record the target instead). | ||
| if (self.project_root / rel).is_symlink(): | ||
| raise ValueError( | ||
| f"Refusing to record symlinked manifest path: {rel}" | ||
| ) |
| # Cheap lexical pre-check first so absolute / parent-traversal paths | ||
| # don't trigger a filesystem stat outside the project root before | ||
| # ``_validate_rel_path`` raises. ``_validate_rel_path`` produces the | ||
| # canonical error messages used elsewhere. | ||
| if rel.is_absolute() or ".." in rel.parts: | ||
| _validate_rel_path(rel, self.project_root) | ||
| # Defensive: _validate_rel_path always raises on these inputs, | ||
| # but make the contract explicit if it is ever loosened. | ||
| raise ValueError(f"Manifest path escapes project root: {rel}") | ||
| # Check ``is_symlink()`` on the un-resolved path because | ||
| # ``_validate_rel_path`` resolves the path (which would follow | ||
| # the symlink and silently record the target instead). | ||
| if (self.project_root / rel).is_symlink(): | ||
| raise ValueError( | ||
| f"Refusing to record symlinked manifest path: {rel}" | ||
| ) | ||
| abs_path = _validate_rel_path(rel, self.project_root) |
…canonical-path guards Address Copilot review id 4309888722 (2026-05-18) on PR github#2483: 1. Recovery semantics (shared_infra.py:371, 412) — install_shared_infra now passes ``recovered=True`` when re-recording a skipped existing file. This flag funnels into a new ``recovered_files`` array in the manifest JSON, so a future ``refresh_managed`` run can distinguish "hash I produced" from "hash I observed on a file that may be a user customization" and avoid silent overwrite without ``--refresh-shared-infra``. Schema is purely additive: ``files: dict[str, str]`` is unchanged; the new ``recovered_files: list[str]`` is omitted when empty. 2. Symlinked ancestor (manifest.py:172) — ``record_existing`` now walks every component of the rel path and rejects any symlinked ancestor, not just a symlinked leaf. Catches ``linked_dir/file.txt`` where ``linked_dir`` is a symlink, which previously slipped past the leaf-only ``is_symlink()`` check and was resolved through by ``_validate_rel_path``. Mirrors the component-walk pattern in ``_ensure_safe_manifest_directory``. 3. Misleading "escapes project root" message (manifest.py:168) — paths like ``dir/../file.txt`` normalize inside the project, so the old message lied about what was wrong. New message: "Manifest paths must be canonical; '..' segments are not allowed". Still rejects (canonical keys are required so ``check_modified``/``uninstall`` cannot key the same file under two paths). Tests: 7 new test methods across TestManifestRecoveredFiles and TestRecordExistingNewGuards covering all 4 Copilot findings. Full suite passes locally. 🤖 AI disclosure: drafted with assistance from Claude (Opus 4.7).
|
Thanks @copilot-pull-request-reviewer for the additional pass. Pushed 1 & 2. Recovery semantics (
|
Summary
In
install_shared_infra(src/specify_cli/shared_infra.py), the skip branches in both the scripts loop and the templates loop now record each skipped file inspeckit.manifest.json, so a fresh-manifest run against an already-populated.specify/tree no longer writes an emptyfilesfield.Fixes #2107
Problem
When
install_shared_infraran withforce=Falseagainst a project that already had files under.specify/scripts/and.specify/templates/— but a fresh, empty manifest — every iteration hit theif dst.exists() and not force: skipped_files.append(...); continuebranch.planned_copiesandplanned_templatesstayed empty, the post-loop record loop had nothing to record, andmanifest.save()serialisedfiles: {}. The integration then believed nothing had been installed.This bites users who delete or lose
speckit.manifest.json, who extract.specify/out-of-band, or who hit a code path where the manifest can't be loaded but the directory tree is intact.Solution
Call
manifest.record_existing(rel_skip)from inside both skip branches, but only when the path is not already tracked:The guard matters:
record_existingalways re-hashes the on-disk content, so without it a customized template would have its manifest hash overwritten with the customized hash, defeating the user-modification detection thatintegration userelies on (test_use_preserves_modified_templates_unless_forcedis the canonical regression test for that flow).Symmetric change in both the scripts loop and the templates loop.
Test plan
TestSpeckitManifestRecordsSkippedFiles::test_install_shared_infra_records_skipped_files— populates.specify/via a normal first run, deletes the manifest, runsinstall_shared_infraagain withforce=False(every file goes down the skip branch), asserts the saved manifest is non-empty and is a superset of the first run's files. Fails onmain, passes after the fix.test_use_preserves_modified_templates_unless_forcedcontinues to pass thanks to theif rel_skip not in manifest.files:guard — customized templates keep their original hash, sointegration usestill skips overwriting them.pytestfull suite: 2787 passed, 34 skipped — zero regressions.Notes
This change was AI-assisted. The fix was selected after a 3-agent solution-design debate where two designers independently identified the same root cause (skip branch in
install_shared_infrafailing to record), giving high convergent confidence. A third designer initially proposed a fix inclaude/__init__.py, but that targets the integration manifest (claude.manifest.json), not the shared-infrastructure manifest (speckit.manifest.json) the issue describes — convergent analysis correctly localized the right file.A first attempt at the fix omitted the
if rel_skip not in manifest.files:guard and broketest_use_preserves_modified_templates_unless_forced(customized templates were silently re-hashed). Adding the guard fixed the regression while still recovering from a lost-manifest scenario.