Add msgpack-jackson3 module#983
Draft
komamitsu wants to merge 17 commits into
Draft
Conversation
Introduces a new msgpack-jackson3 subproject that provides MessagePack serialization for Jackson 3.x (tools.jackson), which requires Java 17+. Tests use JUnit 4 via junit-interface to keep the build simple.
Replace the fragile JAVA17_HOME/JAVA_HOME path resolution with a simple requirement to run sbt under Java 17+ (managed via mise). Drop redundant -encoding/-Xlint flags already provided by buildSettings.
- Remove JavaInfo.java: STRING_VALUE_FIELD_IS_CHARS is always false on Java 17+ since String.value is byte[], not char[]; simplify parser and generator accordingly - Remove dead writeCharArrayTextKey method and tempChars field - Add OsgiKeys.importPackage exclusions for android.os and sun.* packages - Add comments on (int) cast overflow risk in currentTokenLocation/currentLocation
There was a problem hiding this comment.
Pull request overview
This PR introduces a new msgpack-jackson3 submodule intended to provide MessagePack <-> Jackson integration for Jackson 3.x (tools.jackson.*) while keeping the existing Jackson 2.x module intact.
Changes:
- Adds a new
msgpack-jackson3sbt subproject targeting Java 17 and Jacksontools.jackson.core:jackson-databind:3.1.2. - Ports the MessagePack
TokenStreamFactory,JsonParser,JsonGenerator, and mapper/key-serialization glue to the Jackson 3 API surface. - Adds a full test suite (plus benchmark tests) for parser/generator/factory/mapper behavior under the new module.
Reviewed changes
Copilot reviewed 25 out of 26 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| build.sbt | Adds msgpack-jackson3 subproject definition and aggregates it under root. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/ExtensionTypeCustomDeserializers.java | Extension-type custom deserializer registry for parser embedded objects. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/JsonArrayFormat.java | Jackson annotation introspector to serialize POJOs as arrays (schema-less). |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackExtensionType.java | POJO + serializer for MessagePack extension types. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackFactory.java | Jackson TokenStreamFactory implementation for MessagePack. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackGenerator.java | Jackson JsonGenerator implementation that writes MessagePack. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackKeySerializer.java | Key serializer that routes non-String map keys through MessagePack key handling. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackMapper.java | ObjectMapper wrapper + builder helpers for BigInteger/BigDecimal formatting. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackParser.java | Jackson JsonParser implementation that reads MessagePack. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackReadContext.java | Read context implementation for tracking nesting/name/dup detection. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackSerializedString.java | SerializableString wrapper used for MessagePack key serialization. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackSerializerFactory.java | Serializer factory override to enforce MessagePack key serialization. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/TimestampExtensionModule.java | Jackson module for MessagePack timestamp extension <-> Instant. |
| msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/Tuple.java | Simple tuple used for cached unpacker bookkeeping. |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/ExampleOfTypeInformationSerDe.java | Example test for embedding type metadata for polymorphic objects. |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/MessagePackDataformatForFieldIdTest.java | Tests for integer-key support and mixed key handling. |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/MessagePackDataformatForPojoTest.java | POJO round-trip tests (normal, nested, schema-less array format, etc.). |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/MessagePackDataformatTestBase.java | Shared test fixtures and helper POJOs. |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/MessagePackFactoryTest.java | Factory copy/creation tests. |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/MessagePackGeneratorTest.java | Generator behavioral tests (objects/arrays/primitives/keys/big numbers/etc.). |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/MessagePackMapperTest.java | Mapper builder behavior tests for BigInteger/BigDecimal-as-string options. |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/MessagePackParserTest.java | Parser behavioral tests for objects/arrays/primitives/extensions/errors. |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/TimestampExtensionModuleTest.java | Tests for MessagePack timestamp extension serialization/deserialization. |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/benchmark/Benchmarker.java | Benchmark harness used by benchmark tests. |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/benchmark/MessagePackDataformatHugeDataBenchmarkTest.java | Benchmark test for huge data (JSON vs MessagePack). |
| msgpack-jackson3/src/test/java/org/msgpack/jackson/dataformat/benchmark/MessagePackDataformatPojoBenchmarkTest.java | Benchmark test for POJO ser/de (JSON vs MessagePack). |
Comments suppressed due to low confidence (2)
msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackGenerator.java:501
getBytesIfAsciiwrites intobytes[i]while iteratingifromoffsettooffset+len. Whenoffset != 0this will write past the end of the newly allocatedbytesarray (lengthlen) and can throwArrayIndexOutOfBoundsException. Index into the destination array relative tooffset(or use a separate destination index).
private byte[] getBytesIfAscii(char[] chars, int offset, int len)
{
byte[] bytes = new byte[len];
for (int i = offset; i < offset + len; i++) {
char c = chars[i];
if (c >= 0x80) {
return null;
}
bytes[i] = (byte) c;
}
return bytes;
msgpack-jackson3/src/main/java/org/msgpack/jackson/dataformat/MessagePackGenerator.java:531
writeByteArrayTextValueignores the providedoffset/lenwhen the bytes are ASCII: it wraps the entiretextarray inAsciiCharString. This can serialize extra bytes outside the intended slice. Consider copying the[offset, offset+len)range (or store the slice metadata) when creatingAsciiCharString.
private void writeByteArrayTextValue(byte[] text, int offset, int len) throws IOException
{
if (areAllAsciiBytes(text, offset, len)) {
addValueNode(new AsciiCharString(text));
return;
}
addValueNode(new String(text, offset, len, DEFAULT_CHARSET));
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+388
to
+400
| else if (v instanceof ByteBuffer) { | ||
| ByteBuffer bb = (ByteBuffer) v; | ||
| int len = bb.remaining(); | ||
| if (bb.hasArray()) { | ||
| messagePacker.packBinaryHeader(len); | ||
| messagePacker.writePayload(bb.array(), bb.arrayOffset(), len); | ||
| } | ||
| else { | ||
| byte[] data = new byte[len]; | ||
| bb.get(data); | ||
| messagePacker.packBinaryHeader(len); | ||
| messagePacker.addPayload(data); | ||
| } |
Comment on lines
+141
to
+151
| private String unpackString(MessageUnpacker messageUnpacker) throws IOException | ||
| { | ||
| int strLen = messageUnpacker.unpackRawStringHeader(); | ||
| if (strLen <= tempBytes.length) { | ||
| messageUnpacker.readPayload(tempBytes, 0, strLen); | ||
| return new String(tempBytes, 0, strLen); | ||
| } | ||
| else { | ||
| byte[] bytes = messageUnpacker.readPayload(strLen); | ||
| return new String(bytes, 0, strLen, StandardCharsets.UTF_8); | ||
| } |
Comment on lines
+18
to
+24
| import com.fasterxml.jackson.annotation.JsonFormat; | ||
|
|
||
| import tools.jackson.core.Version; | ||
| import tools.jackson.databind.ObjectMapper; | ||
| import tools.jackson.databind.cfg.MapperBuilder; | ||
| import tools.jackson.databind.cfg.MapperBuilderState; | ||
|
|
Comment on lines
+6
to
+10
|
|
||
| import com.fasterxml.jackson.annotation.JsonFormat; | ||
|
|
||
| import static com.fasterxml.jackson.annotation.JsonFormat.Shape.ARRAY; | ||
|
|
Comment on lines
+69
to
+111
| return 0; | ||
| } | ||
|
|
||
| @Override | ||
| public int appendQuoted(char[] chars, int i) | ||
| { | ||
| return 0; | ||
| } | ||
|
|
||
| @Override | ||
| public int appendUnquotedUTF8(byte[] bytes, int i) | ||
| { | ||
| return 0; | ||
| } | ||
|
|
||
| @Override | ||
| public int appendUnquoted(char[] chars, int i) | ||
| { | ||
| return 0; | ||
| } | ||
|
|
||
| @Override | ||
| public int writeQuotedUTF8(OutputStream outputStream) | ||
| { | ||
| return 0; | ||
| } | ||
|
|
||
| @Override | ||
| public int writeUnquotedUTF8(OutputStream outputStream) | ||
| { | ||
| return 0; | ||
| } | ||
|
|
||
| @Override | ||
| public int putQuotedUTF8(ByteBuffer byteBuffer) | ||
| { | ||
| return 0; | ||
| } | ||
|
|
||
| @Override | ||
| public int putUnquotedUTF8(ByteBuffer byteBuffer) | ||
| { | ||
| return 0; |
| MessagePackExtensionType ext = p.readValueAs(MessagePackExtensionType.class); | ||
| if (ext.getType() != EXT_TYPE) { | ||
| throw new RuntimeException( | ||
| String.format("Unexpected extension type (0x%X) for Instant object", ext.getType())); |
Comment on lines
+18
to
+20
| import com.fasterxml.jackson.annotation.JsonIgnore; | ||
| import com.fasterxml.jackson.annotation.JsonIgnoreProperties; | ||
| import com.fasterxml.jackson.annotation.JsonProperty; |
- MessagePackParser.unpackString: use UTF-8 for short strings (<=64 bytes) to match the long-string path; platform default charset was used before - MessagePackGenerator.writeNumber(String): try BigInteger before Double to avoid precision loss for large integer strings - MessagePackGenerator.writeString(Reader, int): handle len=-1 (unknown length) by reading until EOF instead of throwing NegativeArraySizeException
Implement MessagePackFactoryBuilder extending DecorableTSFBuilder, wire it into MessagePackFactory via a builder constructor and rebuild(), and fix snapshot() to return a copy instead of this. Add tests covering rebuild() and snapshot() behavior.
Three bugs where non-zero offset/position was ignored: - getBytesIfAscii: bytes[i] -> bytes[i - offset] - writeByteArrayTextValue: store slice instead of whole array - ByteBuffer: use arrayOffset() + position() as payload start All three bugs exist identically in msgpack-jackson; documented in plans/preexisting-issues.md with practical impact notes.
- close(): set _closed=true directly instead of calling super.close(), which would unconditionally close the underlying stream regardless of AUTO_CLOSE_TARGET - endCurrentContainer(): reset currentState to IN_ROOT when closing the root container, preventing stale state if the generator is reused Both bugs exist in msgpack-jackson; documented in plans/preexisting-issues.md
Use getChars(offset, offset+len) instead of toCharArray() which copies the entire string unnecessarily.
Replace the string-based lossy check with `decimal.compareTo(BigDecimal.valueOf(doubleValue)) != 0`, avoiding two stripTrailingZeros + toEngineeringString allocations per BigDecimal write.
…dString stubs, and Version - MessagePackGenerator: implement streamWriteContext() using SimpleStreamWriteContext; wire writeStartArray/Object, endCurrentContainer, writeName, _verifyValueWrite, currentValue, and assignCurrentValue to track write context correctly - MessagePackParser: on close, null out the byte-array source in the ThreadLocal to release large payload references; InputStream sources are preserved to keep the sequential-read-from-same-stream behavior working - MessagePackSerializedString: implement all stub append/write/put methods - Add PackageVersion (0.9.12) and use it in generator, parser, and factory - Note: messageBufferOutputHolder ThreadLocal not fixed — clearing it causes 23% serialization regression by defeating the OutputStreamBufferOutput reuse cache
- MessagePackMapper.version() now returns PackageVersion.VERSION instead of Version.unknownVersion(), consistent with factory and parser. - writeName(SerializableString): remove IllegalArgumentException for unknown implementations (fall back to writeName(getValue())); also fix missing writeContext.writeName() call for MessagePackSerializedString. - unpackString(): delegate to messageUnpacker.unpackString() instead of manual header-read + payload copy; removes tempBytes allocation. ~25% deserialization speedup on pojo benchmark (fewer copies). - Rename local variable 'scheme' -> 'schema' in test.
msgpack-jackson3 requires Java 17+, so char[]-backed strings (Java 8) are not a concern. Java 9+ compact strings make the old AsciiCharString optimization redundant for the char[] path. - Rename AsciiCharString -> RawUtf8String; extend its use to all raw UTF-8 byte[] inputs (not just ASCII), eliminating the areAllAsciiBytes scan and the wasteful decode+re-encode for non-ASCII content. - Simplify writeCharArrayTextValue: drop getBytesIfAscii scan and just use new String(char[], offset, len); packString handles encoding efficiently via compact strings. - Simplify writeRaw(String, int, int): drop getChars -> char[] detour, use substring directly. - Simplify writeString(Reader, int) len<0 path: drop StringBuilder -> char[] copy, use sb.toString() directly. - Remove unused imports and fields (Nullable, Charset, StandardCharsets, DEFAULT_CHARSET).
The previous Benchmarker/System.nanoTime() approach had no forking, no dead-code elimination (Blackhole), and no JIT-aware warmup, producing ~25ms stdev results. Replace with proper JMH benchmarks modelled on https://github.com/FasterXML/jackson-benchmarks. Changes: - Add sbt-jmh 0.4.7 plugin; enable JmhPlugin on msgpack-jackson3 - Add --add-opens JVM args for forked JMH processes (required for msgpack-core's Unsafe usage) - Remove commons-math3 test dependency (no longer needed) - Delete Benchmarker, MessagePackDataformatPojoBenchmarkTest, MessagePackDataformatHugeDataBenchmarkTest - Add src/jmh/java benchmark sources: * model/: MediaItem, MediaContent, Image, Size, Player, MediaItems (canonical jvm-serializers dataset, same as jackson-benchmarks) * BenchmarkState: pre-serialized bytes + ObjectMapper instances * MsgpackReadBenchmark: readPojoMsgpack / readPojoJson * MsgpackWriteBenchmark: writePojoMsgpack / writePojoJson * NopOutputStream: /dev/null sink for write benchmarks Run: ./sbt "msgpack-jackson3/jmh:run" Smoke-test: ./sbt 'msgpack-jackson3/jmh:run -f 1 -wi 1 -i 1'
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
msgpack-jackson3submodule (jackson-dataformat-msgpack3) that ports the existing Jackson 2.x integration to Jackson 3.x (tools.jacksonnamespace)com.github.sbt:junit-interface— no extra sbt plugins neededKey differences from
msgpack-jacksoncom.fasterxml.jackson→tools.jacksonTokenStreamLocationreplacesJsonLocation; byte offset passed ascolumnNr(truncates for inputs > 2 GB, documented with comment)JavaInfo/ char-array optimization removed — always false on Java 17+ sinceString.valueisbyte[]importPackageexcludesandroid.osandsun.*matchingmsgpack-corejavacOptionsset to--source 17 --target 17Test plan
./sbt "msgpack-jackson3/test"— 77 tests pass./sbt "msgpack-jackson3/test:compile"— no checkstyle errorsmsgpack-coreandmsgpack-jacksontests unaffected