Guard *scanf() return type extension by counter#5641
Conversation
|
can you come up with a test, this change fixes? |
|
Absolutely — I already have a fixture for the exact example from the commit message (mixing positional |
|
Wow |
3b1a8e8 to
c386aab
Compare
|
Tests are green – the phpbench and Infection reds look pre‑existing / unrelated to this change. Let me know if you’d like me to dig into those or if this is good to go. 😊 |
e4293ad to
e38254a
Compare
|
Created a try snippet so it's better to compare with before and now aligned with the fixture that had another blooper: https://phpstan.org/r/087c4c5f-9a00-47cc-8d36-f21fb1b27a31 |
| } | ||
|
|
||
| function sscanfInvalidFormatMixingPositionalWithSequential(string $s) { | ||
| assertType('null', sscanf($s, '%1$s %s')); |
There was a problem hiding this comment.
Shouldn't this be NEVER or Error on PHP8+ since it will throw an error ?
There was a problem hiding this comment.
But given the fact the return type extension already use TypeCombinator::addNull somewhere else, I'm unsure...
There was a problem hiding this comment.
Good point! Yes, on PHP 8+ it should be never (the call throws, 3v4l.org). I kept null for now as a safe lower bound that doesn’t require a PhpVersion dependency in the extension. If you prefer the stricter version, I can inject PhpVersion and return NeverType for ≥8.0.
/E: found the pattern
There was a problem hiding this comment.
But given the fact the return type extension already use
TypeCombinator::addNullsomewhere else, I'm unsure...
... me either but I wanted to give it a try and put a fixup on top with php version id test data files (.php scripts) that mimic the same branching as for explode(). The fixup keeps all options open if you prefer something else – let me know.
|
Initially I thought using the new count helper, we could get rid of a regex path. Since this is not the case, I think the PR as is does not provide much value. We would already report a error for the invalid format. |
|
Thanks for the honest feedback! I’d like to clarify what this small PR actually fixes, because I think there might be a subtle but important distinction. The existing rule already reports an error on an invalid format, yes. But the return type extension runs independently—and previously it would still infer This PR adds a minimal gatekeeper that short‑circuits the type inference for any format the counter finds invalid. It returns So the value isn’t “a more precise return type” in isolation—it’s eliminating a class of false‑positive type information that otherwise pollutes later analysis steps. And it does so with a tiny, proven, self‑contained check that doesn’t touch the regex at all. I’m happy to iterate or, if you’d rather, close this PR and save the gatekeeper for a later round. Just wanted to make sure the motivation is clear. 😊 |
please show a real-world like snippet on phpstan.org/try which proofs this statement |
|
Fair enough – here’s a real‑world‑like snippet: function test(string $s): void {
$result = sscanf($s, '%1$s %s');
if ($result !== null) {
echo count($result);
}
}Right now, PHPStan infers (Note: Let me know if you’d like the snippet turned into a test fixture – happy to add it. |
144d4e0 to
75e26f5
Compare
your example does not contain a false positive as in
|
|
Fair point — I should have said "false-positive type information" rather than "errors", and I stand corrected. The old return type implies the call might succeed; the gatekeeper corrects that. This PR is ready either way. If you'd like to merge it as the first step before fixing the regex, great; if not, I'll close it and revisit later. Just let me know. 😊 |
89bba3d to
6b2e819
Compare
|
I’ve been thinking about your hope to use the counter to circumvent the regex path. This PR now takes a first concrete step in that direction: the NUL‑byte fix and the builder lift make the counter the foundation for the array shape, and the regex overhaul (applied here) fixes all the known missing‑specifier cases. I’ve also recycled the test cases from the earlier bot‑generated PR to prove the fixes. I’d love to hear if this incremental approach aligns with what you had in mind – the regex is still here for precision, but the counter now guarantees correctness. Happy to adjust as you prefer. |
| } | ||
|
|
||
| if ($placeholderCount === 0) { | ||
| return TypeCombinator::addNull($arrayBuilder->getArray()); |
There was a problem hiding this comment.
Should it addNull only if !throwsValueErrorForInternalFunctions ? or null is still possible ? (same for other addNull call)
There was a problem hiding this comment.
When the counter says zero, we can't narrow array{}|null to array{} without knowing whether the format would actually match the input. The empty‑string check above already handles the one case where we can. For everything else, null is still required.
Very interesting aspects, indeed. Thanks a lot for asking.
Use the recently fixed `PrintfHelper::getScanfPlaceholdersCount()` to detect invalid format strings in the return type extension. If the format is uncountable, the call is guaranteed to fail – return `NullType` immediately. This is the same approach that eliminated the count regression. It already fixes all false‑positive type inferences for malformed formats. For example, invalid format (mixing positional %n$ with sequential %) returns `null`. Gegenprobe: the counter doesn’t guess – it asks PHP itself.
Return `NeverType` on PHP 8+ (where an invalid format throws
`ValueError`) and `NullType` on PHP < 8.0. This makes the gatekeeper
precise about the call never returning on modern PHP.
The test strategy for the version‑dependent types will be settled
in review – the CI now whispers `*NEVER*` where `null` stood before,
and `array{}|null` where `array|null` stood before.
Use the recently fixed
PrintfHelper::getScanfPlaceholdersCount()to detect invalid format strings in the return type extension. If the format is uncountable, the call is guaranteed to fail – returnNullTypeimmediately.This is the same approach that eliminated the count regression. It already fixes all false‑positive type inferences for malformed formats.
For example, invalid format (mixing positional %n$ with sequential %) returns
null.