Feature: file cache: max size #945
# Conflicts: # include/vcpkg/base/messages.h # src/vcpkg/base/messages.cpp
# Conflicts: # src/vcpkg/binarycaching.cpp
# Conflicts: # src/vcpkg-test/metrics.cpp # src/vcpkg/binarycaching.cpp
# Conflicts: # include/vcpkg/base/files.h # src/vcpkg-test/files.cpp # src/vcpkg/base/files.cpp # src/vcpkg/binarycaching.cpp
This is not 100% ready to merge, but it is ready for review. Once I get design approval, I will handle all edge-case errors and do the localization.
# Conflicts: # include/vcpkg/base/message-data.inc.h
BillyONeal left a comment:
0. Every instance has a unique "sync" file in "<root>/sync/<random_number>".
1. Append a line to sync file of the format "<name_of_object>;<file_size>\n"
2. Read the new entries from other sync files
3. Check if files must be deleted to respect the limits
I do not believe this is correct. There is nothing to stop 2 instances from concurrently reading all the sync files, deciding that something needs to be deleted, writing their intent to delete to their own sync files, and now there is no way to resolve the ambiguity on who 'won' that race.
In particular, this is assuming that writes or appends within a file will be atomic, which is a feature most file systems do not provide and several network file systems extremely do not provide. The only operation which we can assume is atomic is the creation or removal of a file system entry; as in rename.
I think you need something like a read/write oplock here, where instances trying to remove entries from the cache are writers, and instances trying to do anything else are readers. Only one instance should be trying to delete out of the cache at a time, and if any instance is doing so, we need to make sure we don't delete a cache entry that other instances are potentially touching.
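The exclusive ("writer") half of such a lock could be sketched as follows, relying only on the creation and removal of a directory entry. This is a minimal illustration, not code from the PR: `EvictionLock` and the lock path are hypothetical names, the parent directory is assumed to exist, and the reader side and stale-lock recovery are not covered.

```cpp
#include <filesystem>
#include <system_error>
#include <utility>

namespace stdfs = std::filesystem;

// Hypothetical helper (not part of vcpkg): an exclusive eviction lock that is
// represented by the existence of a directory entry.
class EvictionLock
{
public:
    explicit EvictionLock(stdfs::path lock_dir) : m_lock_dir(std::move(lock_dir))
    {
        std::error_code ec;
        // create_directory returns true only if this call created the entry;
        // it returns false if the directory already exists or on error.
        m_owned = stdfs::create_directory(m_lock_dir, ec) && !ec;
    }

    ~EvictionLock()
    {
        if (m_owned)
        {
            std::error_code ec;
            stdfs::remove(m_lock_dir, ec); // release the lock, ignoring errors
        }
    }

    EvictionLock(const EvictionLock&) = delete;
    EvictionLock& operator=(const EvictionLock&) = delete;

    // True if this instance won the race and may delete cache entries.
    bool owned() const { return m_owned; }

private:
    stdfs::path m_lock_dir;
    bool m_owned = false;
};
```

An instance that does not obtain the lock would skip eviction for this run. A crashed owner leaves the directory behind, so a real design would also need some way to detect and clear stale locks.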
cache.own_sync_file = get_own_sync_file(cache.sync_root_dir);
if (cache.folder_settings.delete_policy != FolderSettings::DeletePolicy::None)
{
    std::unordered_map<std::string, uint64_t> file_sizes;
I don't believe that in general this works correctly, due to concurrent insertions into the cache. Cache entries are not meaningfully part of the cache until they have been renamed into place.
This is also a cache so that I don't have to call fs.file_size() for every cache entry. If a file is not in the cache anymore, this cache size is not used.
    }
}

size_t push_success(const BinaryPackageWriteInfo& request, MessageSink& msg_sink) override
I don't think 'during push_success' is the right time to be doing this. It probably should be a one time pass that looks at the cache(s) after all cache operations this particular vcpkg instance will do on this run.
But then the cache could be larger than its max size in the meantime?
The instances only write to the sync files what they want to add, not what they want to delete.
No, this code does not assume this. I explicitly handle this case ... I just realized that I wanted to implement this but hadn't done so yet ... 🤦 🙈 Edit: Now implemented. Start:
# Conflicts: # src/vcpkg/binarycaching.cpp
# Conflicts: # locales/messages.json
Has any progress been made in this direction? It's becoming quite an effort to manually clean up the vcpkg cache directory from time to time. Thank you in advance 🙏🏻
# Conflicts: # include/vcpkg/base/message-data.inc.h # locales/messages.json
# Conflicts: # include/vcpkg/base/message-data.inc.h # locales/messages.json
@julianxhokaxhiu You could compile your own version of vcpkg-tool 🙈 I have been using this on mac/linux/windows on a daily basis for years and have never had any problems.
# Conflicts: # src/vcpkg/binarycaching.cpp
# Conflicts: # include/vcpkg/base/files.h # include/vcpkg/base/message-data.inc.h # locales/messages.json # src/vcpkg/base/files.cpp # src/vcpkg/base/json.cpp # src/vcpkg/binarycaching.cpp
Pull request overview
This PR introduces configurable eviction for the on-disk “files” binary cache by adding a settings file (settings.json) + schema, tracking cache entries across concurrent vcpkg instances via per-process sync files, and extending the filesystem/json infrastructure needed to enforce size/age/free-space limits.
Changes:
- Add a `FilesCacheManager` to coordinate cache-size enforcement (including multi-process sync updates) for the `files` binary provider.
- Add JSON helpers (`Json::parse_file`, `PositiveNumberDeserializer`) and new localized messages for settings parsing/reporting.
- Extend filesystem APIs with access-time getters/setters and `space()` info; adjust JSON numeric stringify formatting (test updates included).
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 23 comments.
| File | Description |
|---|---|
| src/vcpkg/binarycaching.cpp | Adds file-cache settings parsing and a multi-process cache manager that evicts entries based on configured limits. |
| src/vcpkg/base/json.cpp | Adds parse_file(), introduces PositiveNumberDeserializer, and changes JSON number stringify formatting. |
| src/vcpkg/base/files.cpp | Adds filesystem support for access times, disk space queries, and overwrite control for file creation. |
| src/vcpkg/base/downloads.cpp | Updates WriteFilePointer construction to match new overwrite-aware signature. |
| src/vcpkg-test/metrics.cpp | Updates expected JSON payload formatting due to numeric stringify changes. |
| src/vcpkg-test/files.cpp | Adds tests for new filesystem time APIs and strengthens temp directory setup. |
| locales/messages.json | Adds new localized message strings used by settings parsing/reporting. |
| include/vcpkg/base/message-data.inc.h | Declares new message IDs used by settings parsing/reporting. |
| include/vcpkg/base/jsonreader.h | Declares PositiveNumberDeserializer. |
| include/vcpkg/base/json.h | Declares Json::parse_file() and adds needed forward declarations. |
| include/vcpkg/base/fwd/files.h | Introduces Overwrite enum. |
| include/vcpkg/base/files.h | Updates WriteFilePointer API; adds space_info, access-time and space APIs. |
| docs/file-cache-settings.schema.json | Adds JSON schema for settings.json controlling cache eviction behavior. |
Comments suppressed due to low confidence (8)
docs/file-cache-settings.schema.json:26
- Schema allows `0` for `max-age-in-days` (`minimum: 0`), but the current deserializer rejects 0. If `0` is intended to mean “no age limit”, the parser should accept it; otherwise, make the schema require `> 0`.
"max-age-in-days": {
"description": "The maximum age of the cache in days.",
"type": "number",
"minimum": 0
},
docs/file-cache-settings.schema.json:31
- Schema allows `0` for `keep-available-in-percentage` (`minimum: 0`), but the current parser rejects 0 even though the eviction code treats 0 as disabling the check. Align schema and parsing so `0` works as documented/intended.
"keep-available-in-percentage": {
"description": "How much space should be kept available on the disk in percentage.",
"type": "number",
"minimum": 0
}
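One way to align the parser with the schema, per the two comments above, is to accept `0` and let the eviction code treat it as "check disabled". The sketch below only illustrates that idea under stated assumptions: `parse_non_negative` and `limit_enabled` are hypothetical stand-ins, not the PR's `PositiveNumberDeserializer` or vcpkg's `Json` reader API.

```cpp
#include <cmath>
#include <optional>

// Hypothetical stand-in for a deserializer's visit_number step: accepts any
// finite value >= 0 and rejects negative numbers, NaN and infinity.
std::optional<double> parse_non_negative(double value)
{
    if (!std::isfinite(value) || value < 0)
    {
        return std::nullopt;
    }
    return value;
}

// Hypothetical helper: a limit of 0 is read as "this check is disabled".
bool limit_enabled(double limit)
{
    return limit > 0;
}
```

With this split, `"max-age-in-days": 0` would parse successfully and simply switch the age check off, which matches what the schema's `minimum: 0` suggests.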
src/vcpkg/binarycaching.cpp:471
`make_space_for()` reads sync updates and calls `folder_settings.last_time(...)` even when eviction is disabled (`delete_policy == None`). With the current `last_time()` implementation this will crash. Skip sync processing when eviction is disabled, or make `last_time()` safe for `None`.
// 2. Read changes from other instances
get_sync_updates([&](auto id, auto size) {
const auto archive_path = archives_root_dir / files_archive_subpath(id.to_string());
auto last_time = folder_settings.last_time(fs, archive_path, IgnoreErrors{});
auto size_as_int = Strings::strto<uint64_t>(size).value_or_exit(VCPKG_LINE_INFO);
src/vcpkg/binarycaching.cpp:542
`file_added_to_cache()` always calls `folder_settings.last_time(...)`, which will crash for `delete-policy: "None"`. If eviction is disabled, this should be a no-op (or should use a safe timestamp source).
void file_added_to_cache(const Path& file_path, uint64_t file_size)
{
auto last_time = folder_settings.last_time(fs, file_path, IgnoreErrors{});
file_data.push(FileData{file_path, file_size, last_time});
current_size += file_size;
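For context, the snippet above pushes each new entry into `file_data` and adds its size to `current_size`. A minimal sketch of oldest-first eviction over such a queue might look like the following. The names `CacheEntry`, `OlderFirst`, `EntryQueue` and `select_evictions` are hypothetical, and the PR's actual `FileData` ordering and deletion logic may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for the PR's FileData.
struct CacheEntry
{
    std::string path;
    std::uint64_t size;
    std::int64_t last_time; // smaller value = older entry
};

// Orders the priority queue so that top() is the entry with the smallest
// last_time, i.e. the oldest one.
struct OlderFirst
{
    bool operator()(const CacheEntry& a, const CacheEntry& b) const
    {
        return a.last_time > b.last_time;
    }
};

using EntryQueue = std::priority_queue<CacheEntry, std::vector<CacheEntry>, OlderFirst>;

// Pops the oldest entries until current_size fits within max_size and returns
// the paths that should be deleted. Assumes current_size is the sum of the
// sizes of all queued entries.
std::vector<std::string> select_evictions(EntryQueue& entries,
                                          std::uint64_t& current_size,
                                          std::uint64_t max_size)
{
    std::vector<std::string> victims;
    while (current_size > max_size && !entries.empty())
    {
        const CacheEntry& oldest = entries.top();
        current_size -= std::min(current_size, oldest.size); // never underflow
        victims.push_back(oldest.path);
        entries.pop();
    }
    return victims;
}
```

Returning the victim list instead of deleting inside the loop keeps the selection testable and lets the caller decide when, and under which lock, the files are actually removed.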
src/vcpkg/binarycaching.cpp:337
`KEEP_AVAILABLE_PERCENTAGE` is parsed twice. Remove the duplicate `optional_object_field` call to avoid confusion and make future edits less error-prone.
reader.optional_object_field(obj,
KEEP_AVAILABLE_PERCENTAGE,
folder_settings.keep_available_percentage,
Json::PositiveNumberDeserializer::instance);
src/vcpkg/binarycaching.cpp:307
- The parse error text `"Unexped DeletePolicy"` has a typo and is hard to understand. Replace it with a correctly spelled, descriptive message (ideally via the message system).
r.add_generic_error(type_name(), LocalizedString::from_raw("Unexped DeletePolicy"));
return nullopt;
src/vcpkg/base/files.cpp:4121
`filetime_to_int64()` multiplies by 100 to return nanoseconds after applying an epoch shift. If callers compare this to `file_time_now()`/`last_write_time()` on Windows (which typically use 100ns file_clock ticks), eviction and age calculations will be wrong. Prefer returning the raw FILETIME tick count (or otherwise match `file_time_type`’s convention).
static int64_t filetime_to_int64(FILETIME filetime)
{
ULARGE_INTEGER large_integer;
large_integer.HighPart = filetime.dwHighDateTime;
large_integer.LowPart = filetime.dwLowDateTime;
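The concern above is a unit and epoch mismatch. The sketch below shows a conversion with both made explicit, operating on the raw 64-bit tick count so it compiles without `<windows.h>`. It deliberately uses the Unix epoch rather than the PR's 400-year shift, and the function names are hypothetical; which convention vcpkg's `file_time_now()` follows is not established here.

```cpp
#include <cstdint>

// Number of 100 ns ticks between 1601-01-01 and 1970-01-01 (UTC):
// 11,644,473,600 seconds * 10,000,000 ticks per second.
constexpr std::uint64_t kTicksFrom1601To1970 = 116444736000000000ULL;
constexpr std::int64_t kNanosecondsPerTick = 100;

// Converts a FILETIME tick count (100 ns units since 1601) to nanoseconds
// since the Unix epoch. The result fits in int64_t for dates within roughly
// 292 years of 1970.
constexpr std::int64_t filetime_ticks_to_unix_ns(std::uint64_t ticks)
{
    return (static_cast<std::int64_t>(ticks) - static_cast<std::int64_t>(kTicksFrom1601To1970)) *
           kNanosecondsPerTick;
}

// Inverse conversion; sub-tick precision (below 100 ns) is truncated.
constexpr std::uint64_t unix_ns_to_filetime_ticks(std::int64_t unix_ns)
{
    return static_cast<std::uint64_t>(unix_ns / kNanosecondsPerTick +
                                      static_cast<std::int64_t>(kTicksFrom1601To1970));
}
```

Whichever convention is chosen, every timestamp source that feeds the age and eviction comparisons has to use the same one.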
src/vcpkg/base/files.cpp:4156
`last_access_time()` returns the converted value from `filetime_to_int64()`. Ensure the returned timestamp is in the same units/epoch as `last_write_time()`/`file_time_now()` on Windows so access-time eviction policies and comparisons work correctly.
FILETIME last_access_time;
if (!GetFileTime(fh.h_file, nullptr, &last_access_time, nullptr))
{
ec.assign(GetLastError(), std::system_category());
return {};
const auto oldest_date =
    (folder_settings.max_age.count() ? fs.file_time_now() - folder_settings.max_age.count() : 0);

if (fs.file_time_now() - fs.last_write_time(file, VCPKG_LINE_INFO) >
    duration_cast<nanoseconds>(24h).count())

auto file_handle = fs.open_for_read(file, VCPKG_LINE_INFO);
file_handle.try_seek_to(cur_size).value_or_exit(VCPKG_LINE_INFO);
std::error_code ec;
auto file_content = file_handle.read_to_end(ec);

    current_size += size_as_int;
});

// 3. Delete files if not enouph space is available

}
max_size_in_bytes -= file_size;
Debug::print(fmt::format("{:<25}{:>20}\n", "max_cache_size", max_size_in_bytes));
// 5. Delete files until the constraints are fullfilled

Optional<double> PositiveNumberDeserializer::visit_number(Reader&, double value) const
{
    if (value <= 0)
// FILETIME contains a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601
// (UTC). shift epoch by 400 years to fit into int64_t (can hold 292 years)
static constexpr uint64_t epoch_shift =
    std::chrono::duration_cast<std::chrono::duration<uint64_t, std::nano>>(std::chrono::hours{24 * 365 * 400})
        .count() /
    100;

static int64_t filetime_to_int64(FILETIME filetime)
{
    ULARGE_INTEGER large_integer;
    large_integer.HighPart = filetime.dwHighDateTime;
    large_integer.LowPart = filetime.dwLowDateTime;
    large_integer.QuadPart -= epoch_shift;
    return large_integer.QuadPart * 100;
}

static FILETIME int64_to_filetime(int64_t value)
{
    ULARGE_INTEGER large_integer;
    FILETIME filetime;
    large_integer.QuadPart = static_cast<uint64_t>(value / 100);
    large_integer.QuadPart += epoch_shift;
std::unordered_map<std::string, uint64_t> file_sizes;
get_sync_updates(
    [&](auto id, auto size) {
        auto size_as_int = Strings::strto<uint64_t>(size).value_or_exit(VCPKG_LINE_INFO);
        file_sizes.emplace(id.to_string(), size_as_int);

while (true)
{
    Path path = sync_root_dir / fmt::format("{}", rand());
    std::error_code ec;
    WriteFilePointer wp(path, Append::NO, Overwrite::NO, ec);

constexpr static StringLiteral MAX_AGE_DAYS = "max-age-in-days";
constexpr static StringLiteral KEEP_AVAILABLE_PERCENTAGE = "keep-available-in-percentage";
constexpr static StringLiteral DELETE_POLICY = "delete-policy";
constexpr static StringLiteral MODIFICATION_DATE_UPDATE_INTERVAL = "modification-date-update-interval";
Fixes microsoft/vcpkg#19452
If no settings are found, a new settings file is created. You can set the following properties:
vcpkg-tool/docs/file-cache-settings.schema.json
Lines 8 to 32 in 5ab77d3
To ensure that the limits are respected when multiple instances are running, the following is done:
0. Every instance has a unique "sync" file in "<root>/sync/<random_number>".