feat(spill): optimize external sort with leveled merger to reduce write amplification#288
Open
zjw1111 wants to merge 10 commits into
Open
feat(spill): optimize external sort with leveled merger to reduce write amplification#288zjw1111 wants to merge 10 commits into
zjw1111 wants to merge 10 commits into
Conversation
…te amplification Introduce LeveledMerger (LSM-tree-like structure) to manage spill files in levels, reducing read/write amplification from O(N/K) to O(log_K(N)). Also adds dynamic estimation of spill parameters based on actual row sizes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add Doxygen comments to public methods in LeveledMerger and InMemorySortBuffer - Replace bare 'int' with 'int32_t' in leveled_merger_test.cpp - Remove unused #include <numeric> - Add integer type convention to docs/code-style.md Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- WriteBufferTest.TestMergeSpilledFilesSkipsWithSingleFile: update min file handles from 1 to 2 (now enforced minimum) - WriteBufferTest.TestSpillDiskQuotaEnforcement Case 3: use single spill_file_size as quota since leveled compaction changes disk usage - WriteInteTest: relax intermediate file count assertions to allow leveled merger's multi-level structure (files cleaned at read time) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-free When FlushWriteBuffer fails mid-operation (e.g., IO error during rolling_writer->Write()), the producer thread may leave a KeyValue cached in merge_function_wrapper_. That KeyValue holds Arrow data referencing SpillReader::arrow_pool_ via raw pointer. After the async producer/consumer is closed and SpillReaders are destroyed, the cached KeyValue becomes a dangling reference, causing SEGV during MergeTreeWriter destruction. Fix: call merge_function_wrapper_->Reset() in the write_guard ScopeGuard to release cached Arrow data before SpillReaders are destroyed. Also adapts MergeTreeWriterTest assertions to accommodate leveled compaction file count behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
lxy-9602
reviewed
May 18, 2026
…t use-after-free Move the merge_function_wrapper_->Reset() into SortMergeReader::Close() (both LoserTree and MinHeap variants) so that cached KeyValue data is released before underlying readers are destroyed. This prevents Arrow Buffer objects from calling Free() on a dangling pool pointer when SpillReader is later destructed. Also reorder the cleanup in MergeTreeCompactRewriter::RewriteCompaction to call reader->Close() before merge_file_split_read_.reset(), ensuring the Reset() triggered by Close() runs while all resource providers are still alive. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…arity - Rename class LeveledMerger → SpillFileMerger, files leveled_merger → spill_file_merger - Rename Compact/Compaction methods to Merge (RunMergeIfNeeded, RunFinalMergeIfNeeded, etc.) - Move kMinFanIn validation from CoreOptions to ExternalSortBuffer::Create - Improve variable names (l→level_idx, a/b→lhs/rhs, n→files_to_merge, f→file) - Convert 6 spill tests from TEST_P to TEST_F (no parameterization needed) - Fix 4 misused TEST_P in btree_global_index and file_system tests - Make test assertions exact instead of range-based Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add TestSpillWithIOException exercising IO error injection across all spill code paths (write, intermediate merge, final merge, flush) - Add SetMaxFanInToLargerValueSuppressesCompaction verifying dynamic fan-in - Improve inline comments explaining leveled merge behavior in spill tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…mments - Rename compact/compaction to merge in spill-related comments and test names - Refine Write/FlushMemory return value variable names for clearer semantics - Add Write 3 case to TestSpillDiskQuotaEnforcement - Add comments in sort_buffer_test.cpp for Clear and empty batch behavior - Fix ASSERT_LE to ASSERT_EQ for deterministic spill file counts in inte tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Purpose
Optimize external sort spill performance by introducing a LeveledMerger (LSM-tree-like structure) to manage spill files. This reduces read/write amplification from O(N/K) to O(log_K(N)) compared to the previous naive sequential merge approach.
Key changes:
LeveledMergerclass that organizes spill files into levels and triggers compaction when a level accumulates >= max_fan_in filesspill_batch_size_,actual_max_fan_in_) based on actual row sizes to better utilize memory budgetlocal-sort.max-num-file-handlesconfig (must be >= 2)MergeAndReplaceFilesnow only deletes input files on success, relying onSpillChannelManager::Reset()for cleanup on failureEstimateSpillParameters()Tests
LeveledMergerTest.NoCompactionBelowFanInLeveledMergerTest.CompactionTriggeredAtFanInLeveledMergerTest.MinimalFanInTwoLeveledMergerTest.MultiLevelCompactionLeveledMergerTest.ManyFilesWithFanInTwoLeveledMergerTest.FinalCleanupReducesFileCountLeveledMergerTest.FinalCleanupMergesSmallestFirstLeveledMergerTest.FinalCleanupNoOpWhenAlreadyBelowTargetLeveledMergerTest.FinalCleanupConvergesToTargetLeveledMergerTest.MergeFnFailurePreservesStateLeveledMergerTest.ClearRemovesAllFilesLeveledMergerTest.SetMaxFanInAffectsCompactionLeveledMergerTest.CompactionOnlyTakesFanInFilesFromLevelSortBufferTest.TestInMemorySortBufferEstimateMemoryUseForEachRowAPI and Format
No API or storage format changes.
Documentation
No.
Generative AI tooling
Generated-by: Claude Code (Claude Opus 4.6)