feat(manifest): --facts mode emits Socket facts JSON for Gradle projects (REA-442)#1318

Draft
Jeppe Fredsgaard Blaabjerg (jfblaa) wants to merge 11 commits into v1.x from jfblaa/rea-442-socket-manifest-gradle-facts-generate-socket-facts-json-from

Jeppe Fredsgaard Blaabjerg (jfblaa) commented May 19, 2026

Closes REA-442.

Summary

Adds a --facts mode to socket manifest gradle / kotlin / auto that emits a .socket.facts.json file per (sub)project describing the resolved compile/runtime dependency graph. Output matches the canonical SocketFacts schema consumed by depscan's SBOM_Resolve pipeline.

Highlights

  • New init script src/commands/manifest/socket-facts.init.gradle walks resolvable compile/runtime classpath configurations via LenientConfiguration (so unresolved deps are handled gracefully), dedupes Gradle's variant artifacts so a project dep isn't split across separate java-classes-directory and jar entries, and classifies prod vs. dev correctly (prod-seen-anywhere wins, so production deps aren't flagged dev just because test classpaths inherit them).
  • Schema mapping: groupId → namespace, artifactId → name, artifact.extension → qualifiers.ext, artifact.classifier → qualifiers.classifier. Qualifiers conform strictly to MavenQualifiersSchema.
  • Project itself isn't emitted as a component — the scan target shouldn't appear as a dependency of itself.
  • Output filename is .socket.facts.json to match Coana's convention and depscan's **/*.socket.facts.json glob.
  • Surface parity: --facts available on socket manifest gradle, socket manifest kotlin, and the auto-detect path (via defaults.manifest.gradle.facts in socket.json). socket manifest setup prompts for the toggle.
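A minimal TypeScript sketch of the schema mapping above, for illustration only. The real logic lives in the Groovy init script; `GradleArtifactCoord`, `FactsComponent`, and `toFactsComponent` are hypothetical names, and the shapes only mirror the field names listed in the bullet, not the actual SocketFacts types.

```typescript
// Hypothetical input: one resolved Gradle artifact's coordinates.
interface GradleArtifactCoord {
  group: string;
  module: string;
  version: string;
  extension?: string; // e.g. "jar", "aar"
  classifier?: string; // e.g. "sources"
}

// Hypothetical output shape using the field names from the mapping above.
interface FactsQualifiers {
  ext?: string;
  classifier?: string;
}

interface FactsComponent {
  namespace: string;
  name: string;
  version: string;
  qualifiers?: FactsQualifiers;
}

function toFactsComponent(a: GradleArtifactCoord): FactsComponent {
  const qualifiers: FactsQualifiers = {};
  // Only set keys that carry a value, so a strict qualifiers schema never
  // sees empty strings or explicit undefineds.
  if (a.extension) qualifiers.ext = a.extension;
  if (a.classifier) qualifiers.classifier = a.classifier;
  const component: FactsComponent = {
    namespace: a.group, // groupId    -> namespace
    name: a.module, // artifactId -> name
    version: a.version,
  };
  if (Object.keys(qualifiers).length > 0) component.qualifiers = qualifiers;
  return component;
}

const guava = toFactsComponent({
  group: "com.google.guava",
  module: "guava",
  version: "31.1-jre",
  extension: "jar",
});
console.log(JSON.stringify(guava));
// {"namespace":"com.google.guava","name":"guava","version":"31.1-jre","qualifiers":{"ext":"jar"}}
```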

Local verification

Verified end-to-end across a JDK × Gradle matrix (sdkman):

| JDK | Gradle | Result |
| --- | --- | --- |
| 8 | 5.6.4 | identical output |
| 11 | 6.9.4 | identical output |
| 17 | 7.6.4 | identical output |
| 21 | 8.10.2 | identical output |
| 21 | 9.2.1 | identical output |

All 5 fixtures produced byte-identical output across the 5 JDK/Gradle combinations after JSON normalization.
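The PR does not say how outputs were normalized before comparison. One plausible approach, assumed here, canonicalizes each file by sorting object keys recursively and treating arrays as unordered (which would also hide any ordering differences). `canonicalize` and `sameFacts` are hypothetical names.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Recursively sort object keys and, as an assumption, array elements, so
// two semantically equal facts files serialize to the same string.
function canonicalize(value: Json): Json {
  if (Array.isArray(value)) {
    return value
      .map(canonicalize)
      .map((v) => [JSON.stringify(v), v] as const)
      .sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
      .map(([, v]) => v);
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const key of Object.keys(value).sort()) {
      out[key] = canonicalize(value[key]);
    }
    return out;
  }
  return value;
}

// Compare two raw .socket.facts.json payloads after normalization.
function sameFacts(a: string, b: string): boolean {
  return (
    JSON.stringify(canonicalize(JSON.parse(a))) ===
    JSON.stringify(canonicalize(JSON.parse(b)))
  );
}

const first = '{"components":[{"name":"b"},{"name":"a"}],"directIds":["x"]}';
const second = '{"directIds":["x"],"components":[{"name":"a"},{"name":"b"}]}';
console.log(sameFacts(first, second)); // true
```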

Test fixtures (added)

  • single-module-java/: java-library plugin with api, implementation, and testImplementation dependencies. Versions are pinned (guava 31.1-jre, slf4j 1.7.36) so they resolve cleanly across all Gradle versions in the matrix.
  • multi-module-java/: root + lib + app with a project(':lib') dep; exercises variant dedup.
  • unresolved-deps/: pairs a real dep with an intentionally unresolvable one; exercises lenient unresolvedModuleDependencies.
  • android-library/: AGP 8.7.3 + compileSdk 34. Exercises per-variant classpaths (debugCompileClasspath, releaseRuntimeClasspath, …). The AGP test is skipped via `describe.skip` when neither `ANDROID_HOME` nor `ANDROID_SDK_ROOT` is set.
  • kotlin-multiplatform/: JVM + JS targets. Exercises per-target classpaths (jvmMainRuntimeClasspath, jsTestCompileClasspath, …).

Tests

  • cmd-manifest-gradle.test.mts / cmd-manifest-kotlin.test.mts: --help snapshot, --dry-run, and --facts --dry-run cases, same shape as the existing pom-path tests.
  • socket-facts-init-gradle.e2e.test.mts — 10 tests across the 5 fixtures, asserts schema shape, direct/dev attribution, edge integrity, variant dedup, AGP and KMP coverage. Auto-skips when no `gradle` binary is on PATH. Lives in the e2e suite.

CI

.github/workflows/e2e-tests.yml now installs Temurin JDK 21 + Gradle 9.2.1 via the official actions so the integration test actually runs in CI. The Android sub-test skips on CI until `android-actions/setup-android@v3` is added; non-Android coverage runs regardless.

Open follow-ups

  • Switch to principled discovery via Gradle's data model (`SourceSetContainer`, `androidComponents.onVariants`, `kotlin.targets.compilations`) instead of name-pattern matching. Name-pattern matching catches everything we test today, so this is cleanup.
  • Promote sdkman matrix sweep into a CI job for multi-Gradle coverage.
  • Wire Android SDK into CI so the AGP test actually runs there.

Test plan

  • CI passes
  • `pnpm test:unit src/commands/manifest` green locally
  • `pnpm exec vitest run --config vitest.e2e.config.mts src/commands/manifest/socket-facts-init-gradle.e2e.test.mts` green locally with JDK + Gradle on PATH

Note

Medium Risk
Introduces a new Gradle execution path (bundled init script + CLI flag/defaults) and adds networked Gradle-based e2e tests/CI tooling, which can affect build stability and output correctness across Gradle variants.

Overview
Adds --facts output for Gradle projects. socket manifest gradle and socket manifest kotlin now accept --facts (or the socket.json default defaults.manifest.gradle.facts) to generate .socket.facts.json instead of pom.xml, and generate_auto_manifest routes detected Gradle projects through this mode when enabled.

Implements facts generation via a new bundled Gradle init script. Adds convertGradleToFacts (invokes Gradle with socket-facts.init.gradle and task socketFacts) and updates the Rollup dist build to ship socket-facts.init.gradle alongside the existing init.gradle.

Adds integration coverage and CI support. Introduces Gradle e2e tests plus new Gradle/Java/Kotlin/Android fixtures, updates .gitignore for generated fixture artifacts, and updates the e2e GitHub workflow to install JDK 21 and Gradle so the new tests run.

Reviewed by Cursor Bugbot for commit 911fa30.

Generates a per-(sub)project .socket.facts.json describing the resolved
compile/runtime dependency graph, matching the canonical SocketFacts
schema consumed by depscan's SBOM_Resolve pipeline.

The new init script (socket-facts.init.gradle) registers a socketFacts
task on every project, walks resolvable compile/runtime classpaths via
LenientConfiguration so unresolved deps degrade gracefully instead of
failing the build, dedupes Gradle's variant artifacts
(java-classes-directory vs jar) into one component per logical Maven
coordinate, and classifies prod/dev correctly (prod-seen-anywhere wins;
production deps are no longer flagged dev just because test classpaths
inherit them). Output filename matches Coana's .socket.facts.json so the
depscan **/*.socket.facts.json glob picks both pre- and
post-reachability versions out of a scan tarball.
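The dedup step above can be sketched as keying every resolved artifact on its group:module:version coordinate, so the java-classes-directory and jar variants of one project dep collapse into a single entry. This TypeScript version is illustrative only; `ArtifactSighting`, `coordinateKey`, and `dedupeByCoordinate` are hypothetical, and the `example:lib:1.0` coordinate is made up.

```typescript
// Hypothetical record of one artifact Gradle returned for some configuration.
interface ArtifactSighting {
  group: string;
  module: string;
  version: string;
  artifactType: string; // e.g. "jar" or "java-classes-directory"
}

// Key on the logical Maven coordinate only, ignoring which variant was returned.
function coordinateKey(s: ArtifactSighting): string {
  return `${s.group}:${s.module}:${s.version}`;
}

// Collapse all sightings into one entry per coordinate, remembering which
// variant types were seen for it.
function dedupeByCoordinate(sightings: ArtifactSighting[]): Map<string, string[]> {
  const byKey = new Map<string, string[]>();
  for (const s of sightings) {
    const key = coordinateKey(s);
    const types = byKey.get(key) ?? [];
    if (!types.includes(s.artifactType)) types.push(s.artifactType);
    byKey.set(key, types);
  }
  return byKey;
}

const merged = dedupeByCoordinate([
  { group: "example", module: "lib", version: "1.0", artifactType: "java-classes-directory" },
  { group: "example", module: "lib", version: "1.0", artifactType: "jar" },
  { group: "com.google.guava", module: "guava", version: "31.1-jre", artifactType: "jar" },
]);
console.log(merged.size); // 2
```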

The TS wrapper (convert-gradle-to-facts.mts) routes --facts through the
project's ./gradlew by default; the init script is bundled into dist/
alongside the existing pom-generating init.gradle.
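A sketch of the command line the TS wrapper would assemble, assuming it passes the bundled script with Gradle's standard `--init-script` option and runs the `socketFacts` task. `buildFactsInvocation`, `GradleInvocation`, the `gradleBin` override, and the example paths are hypothetical; the actual convert-gradle-to-facts.mts may differ (for instance, on Windows the wrapper is gradlew.bat, which this sketch ignores).

```typescript
import * as path from "node:path";

interface GradleInvocation {
  command: string;
  args: string[];
}

// Hypothetical helper. Defaults to the project's own ./gradlew; `gradleBin`
// is an assumed override for projects without a wrapper.
function buildFactsInvocation(
  projectDir: string,
  initScriptPath: string,
  gradleBin?: string,
): GradleInvocation {
  const command = gradleBin ?? path.join(projectDir, "gradlew");
  // --init-script applies the bundled script; naming the task without a
  // project path makes Gradle run socketFacts in every project that has it.
  return { command, args: ["--init-script", initScriptPath, "socketFacts"] };
}

const inv = buildFactsInvocation("/work/app", "/opt/socket/dist/socket-facts.init.gradle");
console.log(inv.command, inv.args.join(" "));
```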

Verified end-to-end across JDK 8/Gradle 5.6.4, JDK 11/Gradle 6.9.4,
JDK 17/Gradle 7.6.4, JDK 21/Gradle 8.10.2, JDK 21/Gradle 9.2.1 —
outputs byte-identical after JSON normalization across the matrix on
the test fixtures.

Test fixtures cover single-module Java, multi-module with project deps,
and an unresolvable-dep scenario. The integration test
(socket-facts-init-gradle.e2e.test.mts) is e2e-only and auto-skips
when no gradle binary is on PATH; CI doesn't yet install JDK/Gradle.

Open follow-ups (tracked in REA-442):
- AGP-aware variant config discovery and an Android fixture
- Kotlin Multiplatform fixture to exercise kotlin.targets.compilations
- Mirror --facts onto cmd-manifest-kotlin and the auto-detect path
- Promote the sdkman matrix sweep to a CI job

…urface tests

The previous commit added an e2e test that runs the socket-facts init
script against real Gradle fixtures. That's broader coverage than the
sibling `socket manifest gradle` / `kotlin` / `auto` commands currently
have in CI — those are tested only via `--help` snapshot and a
`--dry-run` short-circuit, and never actually invoke gradle. Bring the
new `--facts` mode in line: keep the help-text snapshot (already
covers the new flag) and add a `--facts --dry-run` case that mirrors
the existing dry-run test pattern.

Removes the e2e test and the gradle-facts fixtures; drops the matching
.gitignore entries that no longer have anywhere to apply.

The matrix sweep and integration coverage stay as an open follow-up in
REA-442 — to be picked up alongside `setup-java`/`setup-gradle` in CI
if/when we want any of the gradle commands actually exercised end-to-end.

The socketFacts init script was adding the project being scanned as a
component in its own .socket.facts.json. With no parent edges this
shows up downstream as `orphaned component not reachable from any
direct dependency`. The project is the SBOM target, not one of its own
dependencies — drop the root node entirely and let `directIds` carry
the first-level edges. afterEvaluate is no longer needed since project
coordinates were only used to populate that root entry.
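To see why the root node triggered the warning, consider a sketch in which components are the graph's nodes minus the scan target, `directIds` are the target's children, and an orphan is any component not reachable from a direct dependency. `DepGraph`, `FactsGraph`, `toFactsGraph`, and `findOrphans` are hypothetical names; depscan's actual orphan check is not shown in this PR.

```typescript
type DepGraph = Map<string, string[]>; // node id -> ids of its direct deps

interface FactsGraph {
  componentIds: string[];
  directIds: string[];
  edges: DepGraph;
}

// Drop the scan target from components; its children become directIds.
function toFactsGraph(rootId: string, graph: DepGraph): FactsGraph {
  const componentIds = Array.from(graph.keys()).filter((id) => id !== rootId);
  const edges: DepGraph = new Map();
  for (const id of componentIds) edges.set(id, graph.get(id) ?? []);
  return { componentIds, directIds: graph.get(rootId) ?? [], edges };
}

// Components not reachable from any direct dependency.
function findOrphans(facts: FactsGraph): string[] {
  const seen = new Set<string>();
  const stack = [...facts.directIds];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    seen.add(id);
    stack.push(...(facts.edges.get(id) ?? []));
  }
  return facts.componentIds.filter((id) => !seen.has(id));
}

const graph = new Map<string, string[]>([
  ["app", ["guava", "slf4j-api"]],
  ["guava", ["failureaccess"]],
  ["failureaccess", []],
  ["slf4j-api", []],
]);
const facts = toFactsGraph("app", graph);
console.log(JSON.stringify(findOrphans(facts))); // []

// Old behavior: the root is emitted as a component with no parent edge.
const withRoot: FactsGraph = {
  componentIds: Array.from(graph.keys()),
  directIds: graph.get("app") ?? [],
  edges: graph,
};
console.log(JSON.stringify(findOrphans(withRoot))); // ["app"]
```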
cmd-manifest-kotlin.mts: add the --facts flag (also overridable via
defaults.manifest.gradle.facts in socket.json) and route through
convertGradleToFacts when set. Test pair (--help snapshot + --facts
--dry-run) mirrors what we did for cmd-manifest-gradle.

generate_auto_manifest.mts: when defaults.manifest.gradle.facts is
true, the auto-detect path now generates Socket facts instead of pom
files, matching what the explicit subcommands do.
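How the flag and the socket.json default combine is not spelled out here; the sketch below assumes an explicit --facts value takes precedence, then defaults.manifest.gradle.facts, then the existing pom path. `SocketJson`, `GradleOutputMode`, and `resolveGradleMode` are hypothetical names.

```typescript
// Hypothetical slice of socket.json relevant to this setting.
interface SocketJson {
  defaults?: { manifest?: { gradle?: { facts?: boolean } } };
}

type GradleOutputMode = "facts" | "pom";

// Assumed precedence: explicit CLI value, then socket.json default, then pom.
function resolveGradleMode(
  cliFacts: boolean | undefined,
  socketJson: SocketJson,
): GradleOutputMode {
  const enabled = cliFacts ?? socketJson.defaults?.manifest?.gradle?.facts ?? false;
  return enabled ? "facts" : "pom";
}

// In this sketch the auto-detect path passes no CLI value, so socket.json decides.
console.log(resolveGradleMode(undefined, { defaults: { manifest: { gradle: { facts: true } } } })); // facts
```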

Brings the new mode to feature parity with the existing pom path,
which is exposed through gradle, kotlin and auto.

`socket manifest setup` now asks whether gradle should emit Socket
facts instead of pom.xml files, writing the answer to
defaults.manifest.gradle.facts in socket.json. The prompt sits next to
the existing --verbose toggle and follows the same yes/no/leave-default
ternary shape.

…adle in e2e CI

Brings back the e2e test and gradle-facts fixtures we dropped earlier
in this branch and wires `setup-java@v4` + `gradle/actions/setup-gradle@v4`
into .github/workflows/e2e-tests.yml so the test actually exercises the
init script on every PR (it was previously auto-skipping for lack of a
gradle binary).

Fixtures keep the guava 31.1-jre / slf4j 1.7.36 pins so resolution
stays clean across Gradle 5.6.4 through 9.2.1 in the local sdkman
matrix. The e2e CI uses Gradle 9.2.1 / JDK 21 as a single baseline;
wider Gradle version coverage in CI is still tracked as a follow-up.

Also restores the .gitignore entries for the .gradle/, build/,
.socket.facts.json, and pom.xml outputs that integration runs produce.

Adds an android-library fixture (AGP 8.7.3, compileSdk 34) and the
minimum machinery the init script needs to scrape AGP-flavored
classpaths without blowing up.

Two changes to socket-facts.init.gradle:
- Skip configurations matching *AndroidTest* (instrumented tests).
  Their resolution needs device-vs-host target attributes the init
  script doesn't set, and they fail before producing useful data.
- Wrap per-configuration resolution in try/catch. AGP unit-test
  classpaths (releaseUnitTestCompileClasspath etc.) pull in the
  project's own debugApiElements, which exposes multiple variants
  (android-classes-jar, r-class-jar, android-lint, ...); without
  consumer-side build-type attributes we hit "variant ambiguity"
  errors. We log "[socket-facts] skipping <cfg>: ..." and continue
  so other classpaths still produce output. Production (release +
  debug compile/runtime) variants resolve fine.
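The two guards above amount to a skip-by-name filter plus a per-configuration try/catch. A TypeScript sketch of that control flow follows; the real code is Groovy inside the init script, `Resolver` and `collectAcrossConfigurations` are hypothetical stand-ins for Gradle's resolution API, and only the log line format is taken from the commit message.

```typescript
// Hypothetical resolver: returns resolved coordinates for one configuration,
// or throws (for example on a variant-ambiguity error).
type Resolver = (configName: string) => string[];

function collectAcrossConfigurations(
  configNames: string[],
  resolve: Resolver,
  log: (msg: string) => void = console.error,
): Map<string, string[]> {
  const results = new Map<string, string[]>();
  for (const name of configNames) {
    // Instrumented-test classpaths need device-side attributes we don't set.
    if (name.includes("AndroidTest")) continue;
    try {
      results.set(name, resolve(name));
    } catch (err) {
      // One failing configuration must not abort the others.
      const message = err instanceof Error ? err.message : String(err);
      log(`[socket-facts] skipping ${name}: ${message}`);
    }
  }
  return results;
}

const logs: string[] = [];
const resolved = collectAcrossConfigurations(
  ["releaseRuntimeClasspath", "debugAndroidTestCompileClasspath", "releaseUnitTestCompileClasspath"],
  (name) => {
    if (name.includes("UnitTest")) throw new Error("variant ambiguity");
    return ["androidx.annotation:annotation"];
  },
  (msg) => logs.push(msg),
);
console.log(Array.from(resolved.keys()), logs);
```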

The e2e test skips the Android case when neither ANDROID_HOME nor
ANDROID_SDK_ROOT is set — same auto-skip posture as the rest of the
gradle suite. Asserts that androidx.annotation:annotation is captured
as a direct dep, confirming AGP variant configs are being walked.

Still pending: principled discovery via androidComponents.onVariants
(AGP 7+) or android.libraryVariants — current name-pattern matching
catches Android variant configs by suffix and gets the job done, but
isn't AGP-aware in the strict sense.

A minimal KMP project (jvm + js targets) exercises per-target compile
and runtime classpaths (jvmMainCompileClasspath, jsTestRuntimeClasspath,
...) that aren't surfaced through Java's SourceSetContainer. Our
name-pattern selection picks them up by suffix.
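The suffix-based selection can be sketched as two name predicates. The exact patterns the init script uses are not shown in this PR; the regular expressions below are assumptions chosen to match the configuration names mentioned here (compileClasspath, jvmMainCompileClasspath, jsTestRuntimeClasspath, releaseUnitTestCompileClasspath, and so on), and `isClasspathConfig` / `isTestConfig` are hypothetical names.

```typescript
// Compile/runtime classpaths: Java's plain compileClasspath/runtimeClasspath,
// or any prefixed variant such as jvmMainCompileClasspath.
function isClasspathConfig(name: string): boolean {
  return /(?:^c|C)ompileClasspath$|(?:^r|R)untimeClasspath$/.test(name);
}

// Test-named configurations: a leading "test" (testCompileClasspath) or an
// embedded "Test" (jsTestRuntimeClasspath, releaseUnitTestCompileClasspath).
function isTestConfig(name: string): boolean {
  return /^test|Test/.test(name);
}

for (const name of ["compileClasspath", "jvmMainCompileClasspath", "jsTestRuntimeClasspath", "annotationProcessor"]) {
  console.log(name, isClasspathConfig(name), isTestConfig(name));
}
```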

The fixture pulls kotlinx-serialization-core (commonMain) so it shows
up in both jvm and js target variants of the artifact, and slf4j-api
(jvmMain-only) to confirm target-specific classpaths flow through. The
test asserts both deps are present in the resulting components array.

Jeppe Fredsgaard Blaabjerg (jfblaa) force-pushed the jfblaa/rea-442-socket-manifest-gradle-facts-generate-socket-facts-json-from branch from 911fa30 to bfd09fb on May 19, 2026 09:40
Jeppe Fredsgaard Blaabjerg (jfblaa) marked this pull request as draft May 19, 2026 09:41

Widens what `socket manifest gradle --facts` emits: instead of
whitelisting only `*CompileClasspath` / `*RuntimeClasspath`
configurations and silently dropping the rest, we now walk every
resolvable configuration and tag each artifact with two independent
boolean flags:

  - `dev: true`     ← artifact only ever appeared in test-named configs
  - `tooling: true` ← artifact only ever appeared outside compile/runtime
                      classpaths (annotation processors, linters,
                      code-gen plugins, gradle plugin internals)

"Only ever" means the inverse semantic: if a dep also appears in a
non-test classpath, `dev` is cleared; if it also appears in a
compile/runtime classpath, `tooling` is cleared. So a dep that shows
up as both an `api` and an `annotationProcessor` ends up flagged as
neither dev nor tooling — the production usage wins.
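The "only ever" rule can be sketched as two independent OR-accumulators per component: one for "seen outside a test config" and one for "seen on a compile/runtime classpath", with each flag being the negation of its accumulator. This TypeScript sketch is illustrative; `Sighting`, `ComponentFlags`, and `computeFlags` are hypothetical, and each sighting is assumed to arrive already tagged from its configuration name.

```typescript
// Hypothetical: one artifact seen in one configuration, pre-tagged.
interface Sighting {
  coordinate: string; // e.g. "org.projectlombok:lombok"
  isTestConfig: boolean; // configuration name looks test-related
  isClasspath: boolean; // configuration is a compile/runtime classpath
}

interface ComponentFlags {
  dev: boolean;
  tooling: boolean;
}

function computeFlags(sightings: Sighting[]): Map<string, ComponentFlags> {
  const seenNonTest = new Map<string, boolean>();
  const seenClasspath = new Map<string, boolean>();
  for (const s of sightings) {
    // Any non-test sighting clears dev; any classpath sighting clears tooling.
    seenNonTest.set(s.coordinate, (seenNonTest.get(s.coordinate) ?? false) || !s.isTestConfig);
    seenClasspath.set(s.coordinate, (seenClasspath.get(s.coordinate) ?? false) || s.isClasspath);
  }
  const flags = new Map<string, ComponentFlags>();
  for (const coordinate of Array.from(seenNonTest.keys())) {
    flags.set(coordinate, {
      dev: !seenNonTest.get(coordinate),
      tooling: !seenClasspath.get(coordinate),
    });
  }
  return flags;
}

const flags = computeFlags([
  { coordinate: "com.google.guava:guava", isTestConfig: false, isClasspath: true },
  { coordinate: "com.google.guava:guava", isTestConfig: true, isClasspath: true },
  { coordinate: "junit:junit", isTestConfig: true, isClasspath: true },
  { coordinate: "org.projectlombok:lombok", isTestConfig: false, isClasspath: false },
]);
for (const [coordinate, f] of Array.from(flags.entries())) {
  console.log(coordinate, f.dev, f.tooling);
}
```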

Motivation: downstream reachability scanners (depscan) want to
suppress reachability analysis for tooling artifacts, while still
including them in the SBOM for non-reachability alerts (malware,
license, supply chain). This was previously impossible because the
script dropped tooling deps entirely.

Schema: relies on a new `tooling: z.boolean().optional()` field on
SF_ArtifactSchema in depscan (separate work). The CLI side emits the
field regardless; older consumers that ignore it are unaffected.

Fixture/test: single-module-java now declares
`annotationProcessor 'org.projectlombok:lombok:1.18.30'`, exercising
the tooling path. A new test case asserts lombok is emitted with
tooling=true while guava (api) and junit (testImplementation) are
not.

Effect on existing fixtures:
  - single-module-java: 11 components total (10 non-tooling + lombok)
  - kotlin-multiplatform: 29 components (11 non-tooling + 18 Kotlin
    compiler plugin classpath deps as tooling)
  - android-library: 83 components (5 non-tooling, 78 AGP internals
    as tooling) — previously these AGP internals were dropped
  - multi-module-java, unresolved-deps: unchanged shape

Today both the pom path and the facts path print
  "(It will show no output, you can use --verbose to see its output)"
but --verbose only printed the captured stdout *after* gradle
finished. For large multi-project builds (elasticsearch-scale), that
meant the user stared at a spinner for many minutes with no signal
that anything was happening.

When --verbose is set, spawn gradle with `stdio: 'inherit'` so the
build's stdout/stderr stream live to the user's terminal. The spinner
is skipped (would conflict with inherited tty output) and the post-run
"Reported exports:" / "POM file copied to:" summary is skipped too —
those lines were already visible inline during the streamed run.

Non-verbose runs are unchanged: spinner + captured stdout + summary.
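A sketch of the verbose/non-verbose split using Node's `child_process.spawn`, whose `stdio: "inherit"` option hands the child the parent's terminal streams. `planGradleRun`, `runGradle`, and `RunPlan` are hypothetical names, and the spinner and summary are reduced to boolean switches; the real wrapper's structure may differ.

```typescript
import { spawn } from "node:child_process";
import type { SpawnOptions } from "node:child_process";

interface RunPlan {
  spawnOptions: SpawnOptions;
  showSpinner: boolean;
  printSummary: boolean;
}

// --verbose streams Gradle output live; otherwise output is captured and a
// spinner plus post-run summary are shown.
function planGradleRun(verbose: boolean, cwd: string): RunPlan {
  if (verbose) {
    return { spawnOptions: { cwd, stdio: "inherit" }, showSpinner: false, printSummary: false };
  }
  return { spawnOptions: { cwd, stdio: ["ignore", "pipe", "pipe"] }, showSpinner: true, printSummary: true };
}

// Runs Gradle per the plan. In captured mode both pipes are drained so a
// chatty build cannot stall on a full pipe buffer.
function runGradle(
  command: string,
  args: string[],
  verbose: boolean,
  cwd: string,
): Promise<{ code: number; stdout: string; stderr: string }> {
  const plan = planGradleRun(verbose, cwd);
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, plan.spawnOptions);
    let stdout = "";
    let stderr = "";
    child.stdout?.on("data", (chunk: Buffer) => { stdout += chunk.toString(); });
    child.stderr?.on("data", (chunk: Buffer) => { stderr += chunk.toString(); });
    child.on("error", reject);
    child.on("close", (code) => resolve({ code: code ?? 1, stdout, stderr }));
  });
}

// Example (not run here): runGradle("./gradlew", ["socketFacts"], true, process.cwd());
console.log(planGradleRun(true, ".").spawnOptions.stdio); // inherit
```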

Also corrects the misleading "(It will show no output, you can use
--verbose to see its output)" message to "(No live output. Pass
--verbose to stream gradle output instead.)".

`dep.moduleArtifacts` access combined with `.file.isDirectory()`
filtering was forcing Gradle to *download* every resolved artifact
file. On large multi-project builds (e.g. elasticsearch) this pulled
hundreds of MB of distribution archives — .deb / .tar.gz / .zip
packaging outputs that some configurations expose as dependencies of
the build target itself, not as library deps. User observed:
`> :qa:packaging:socketFacts > elasticsearch-7.17.22-amd64.deb >
88.3 MiB/310.4 MiB downloaded` mid-task.

Fix: read `artifact.type` / `artifact.extension` / `artifact.classifier`
from already-fetched POM/GMM metadata. Never touch `artifact.file` —
that's what triggers the actual file download. Replace the
`!file.isDirectory()` filter (which forced fetch) with a name-based
filter (`INTERNAL_ARTIFACT_TYPES`: java-classes-directory,
java-resources-directory, android-classes-directory,
android-resources-directory) that drops Gradle-internal variants we
don't want to surface as `qualifiers.ext`.
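The filter can be sketched as a name lookup that never touches a file. The four type names come from the commit message; the TypeScript shape (`ArtifactMetadata`, `extQualifier`) is hypothetical, and picking the first non-internal artifact is an assumption about how the extension is chosen.

```typescript
// TypeScript mirror of the INTERNAL_ARTIFACT_TYPES list named above.
const INTERNAL_ARTIFACT_TYPES: ReadonlySet<string> = new Set([
  "java-classes-directory",
  "java-resources-directory",
  "android-classes-directory",
  "android-resources-directory",
]);

// Name-level metadata only (type, extension, classifier); the sketch never
// looks at a file path, so nothing here could trigger a download.
interface ArtifactMetadata {
  type: string;
  extension: string;
  classifier?: string;
}

// Pick the extension to report as qualifiers.ext, skipping Gradle-internal
// directory variants by type name.
function extQualifier(artifacts: ArtifactMetadata[]): string | undefined {
  const publishable = artifacts.find((a) => !INTERNAL_ARTIFACT_TYPES.has(a.type));
  return publishable?.extension;
}

console.log(extQualifier([
  { type: "java-classes-directory", extension: "" },
  { type: "jar", extension: "jar" },
])); // jar
```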

Verified locally:
- commons-io:commons-io:2.15.1 resolves cleanly under cleared cache,
  emits qualifiers.ext='jar', no jar in
  ~/.gradle/caches/modules-2/files-2.1/commons-io after the run
- 11/11 e2e fixture tests still green, qualifiers preserved across
  single-module / multi-module / Android / KMP fixtures