[Store][TE] Fix remaining signed-char ::tolower/::toupper UB missed by #2367#2488
Conversation
…kvcache-ai#2367 PR kvcache-ai#2367 fixed UB from passing a (possibly signed) char to ::tolower/ ::toupper via std::transform, but three sibling sites on operator/CLI- controlled input were missed and remain on main: - mooncake-transfer-engine/src/transport/tcp_transport/tcp_transport.cpp: env MC_TCP_ENABLE_CONNECTION_POOL - mooncake-store/include/utils.h: size-string unit suffix (env MC_MMAP_ARENA_POOL_SIZE, global_segment_size flag, config map). Line 274 of this file already uses the safe idiom; line 225 was missed. - mooncake-store/include/ha/oplog/oplog_store_factory.h: HA oplog_store_type config string A byte > 0x7F sign-extends to a negative int, which is UB for these functions (argument must be representable as unsigned char or equal EOF). Apply the same lambda idiom as kvcache-ai#2367 and add <cctype> where it was only available transitively. Behavior is unchanged for ASCII input. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
There was a problem hiding this comment.
Code Review
This pull request refactors string case conversion operations across multiple files to use lambdas with unsigned char arguments instead of directly passing ::tolower or ::toupper, preventing undefined behavior. It also adds the necessary <cctype> header inclusions. The review feedback suggests explicitly specifying the return type of these lambdas as char (using -> char) to prevent potential compiler warnings regarding implicit narrowing conversions from int to char under strict compilation flags.
Important
The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.
| std::transform(lower.begin(), lower.end(), lower.begin(), | ||
| [](unsigned char c) { return std::tolower(c); }); |
There was a problem hiding this comment.
To prevent potential compiler warnings regarding implicit narrowing conversions (from int to char) under strict warning flags (such as -Wconversion), it is recommended to explicitly specify the return type of the lambda as char.
| std::transform(lower.begin(), lower.end(), lower.begin(), | |
| [](unsigned char c) { return std::tolower(c); }); | |
| std::transform(lower.begin(), lower.end(), lower.begin(), | |
| [](unsigned char c) -> char { return std::tolower(c); }); |
| std::transform(unit.begin(), unit.end(), unit.begin(), | ||
| [](unsigned char c) { return std::toupper(c); }); |
There was a problem hiding this comment.
To prevent potential compiler warnings regarding implicit narrowing conversions (from int to char) under strict warning flags (such as -Wconversion), it is recommended to explicitly specify the return type of the lambda as char.
| std::transform(unit.begin(), unit.end(), unit.begin(), | |
| [](unsigned char c) { return std::toupper(c); }); | |
| std::transform(unit.begin(), unit.end(), unit.begin(), | |
| [](unsigned char c) -> char { return std::toupper(c); }); |
| std::transform(val.begin(), val.end(), val.begin(), | ||
| [](unsigned char c) { return std::tolower(c); }); |
There was a problem hiding this comment.
To prevent potential compiler warnings regarding implicit narrowing conversions (from int to char) under strict warning flags (such as -Wconversion), it is recommended to explicitly specify the return type of the lambda as char.
| std::transform(val.begin(), val.end(), val.begin(), | |
| [](unsigned char c) { return std::tolower(c); }); | |
| std::transform(val.begin(), val.end(), val.begin(), | |
| [](unsigned char c) -> char { return std::tolower(c); }); |
|
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
The single failing matrix leg — It ran out of space during a
The change itself is an 8-line, behavior-preserving |
…turn type Address review feedback on kvcache-ai#2488: std::tolower/std::toupper return int, so the lambdas implicitly narrow back to char when std::transform writes them to the std::string output iterator. Strict compiler flags (-Wconversion / -Wnarrowing on some toolchains) warn on this. Pinning the return type to char makes the narrowing explicit at the lambda boundary and silences the warning without changing behavior. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jfeng18
left a comment
There was a problem hiding this comment.
Verified the UB and the fix — LGTM with a scope note.
All three sites are genuine signed-char ctype UB: a std::string byte > 0x7F sign-extends to a negative int when passed to ::tolower/::toupper, which is UB (the argument must be representable as unsigned char or be EOF). The unsigned char lambda parameter forces the well-defined conversion first, and the single-arg std::tolower/std::toupper resolves to the <cctype> overload (not the 2-arg <locale> template). <cctype> is correctly added to the two TUs that only had it transitively. Behavior is identical for ASCII, well-defined for non-ASCII.
Scope note (non-blocking): the same UB class still exists in the C-loop form *ch = tolower(*ch) at 7 sites — topology.cpp:398, tent/src/runtime/memory_prober.cpp:377, tent/src/platform/cuda/cuda_probe.cpp:194, tent/src/platform/rocm/rocm_probe.cpp:185, benchmark/te_backend.cpp:53 & :66, benchmark/tent_backend.cpp:260. These all operate on pci_bus_id (a driver-generated PCI BDF string — ASCII, not operator-controllable), so they are outside this PR's threat model; could be a follow-up cleanup for completeness.
|
Thanks for the careful review, @jfeng18 — agreed on the scope note. Those 7 C-loop |
|
A separate follow-up PR for the 7 BDF-loop sites sounds great — keeping this PR scoped to the operator-controllable paths and handling the ASCII-only BDF loops independently is the right split. Thanks! |
|
Opened the follow-up for the 7 ASCII BDF-loop sites as discussed: #2504. Keeps this PR scoped to the operator-controllable paths. Thanks again for the review! |
|
Same comment with #2488. I'd like to use a small inline function. |
…ite casts Per @alogfans's review on kvcache-ai#2504/kvcache-ai#2488: replace the repeated static_cast<char>(std::tolower(static_cast<unsigned char>(*ch))) at each PCI-BDF lowercasing loop with a single inline te_lower() helper added to common.h. Same well-defined behavior, no more mass typecasts. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
|
Thanks @alogfans! I introduced the uniform For this PR, the 3 sites are single |
…up to #2367/#2488) (#2504) * [TE] Fix signed-char tolower UB in PCI BDF lowercasing loops Follow-up to #2367/#2488 (requested in #2488): the C-style PCI-BDF lowercasing loops still pass a (possibly signed) char to tolower, which sign-extends bytes > 0x7F to a negative int — UB, since the argument must be representable as unsigned char or equal EOF. Convert the 7 sites (topology.cpp, tent cuda/rocm probes, tent memory_prober, benchmark te/tent backends) to static_cast<char>(std::tolower(static_cast<unsigned char>(*ch))), matching the idiom from #2367, and add/normalize <cctype> includes. The strings are driver-generated PCI bus ids (ASCII), so this is a correctness/consistency cleanup; behavior is identical for ASCII. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * [TE] Address review: use a uniform te_lower() helper instead of per-site casts Per @alogfans's review on #2504/#2488: replace the repeated static_cast<char>(std::tolower(static_cast<unsigned char>(*ch))) at each PCI-BDF lowercasing loop with a single inline te_lower() helper added to common.h. Same well-defined behavior, no more mass typecasts. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * [TE] Move te_lower() to a lightweight char_util.h to fix tent build The previous commit put te_lower() in common.h, but pulling common.h into the TENT TUs collides with tent/common/types.h, which defines LOCAL_SEGMENT_ID as a macro `#define LOCAL_SEGMENT_ID (0ull)` while common.h declares `const static int LOCAL_SEGMENT_ID = 0;` — the macro expands inside the declaration and breaks the build (build-flags CI). Put te_lower() in a new minimal header char_util.h that only pulls in <cctype>, and include it at all 7 sites instead of common.h. No collision with the tent macro, still a single uniform helper per @alogfans's review. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> * [TE] Name the helper to_lower() to match the std::tolower convention Rename te_lower() -> to_lower() in char_util.h and the 7 call sites. The repo has no existing named lowercase helper to reuse, and to_lower reads naturally alongside std::tolower. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com> --------- Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Fixes #2487
Description
Follow-up to #2367, which fixed UB from passing a (possibly signed)
charto::tolower/::toupperviastd::transform. Three sibling occurrences on operator/CLI-controlled input were missed and are still present onmain:mooncake-transfer-engine/src/transport/tcp_transport/tcp_transport.cpp:606MC_TCP_ENABLE_CONNECTION_POOLmooncake-store/include/utils.h:225MC_MMAP_ARENA_POOL_SIZE,global_segment_sizeCLI flag, config map)mooncake-store/include/ha/oplog/oplog_store_factory.h:34oplog_store_typeconfig stringPassing a byte > 0x7F to
::tolower/::touppersign-extends to a negativeint, which is UB (the argument must be representable asunsigned charor equalEOF). All three sites take operator/attacker-controllable bytes, not kernel-generated ASCII.Each is fixed with the exact idiom from #2367 —
[](unsigned char c) { return std::tolower(c); }— and<cctype>is added to the two files where it was only available transitively (utils.halready includes it). Notablyutils.h:274already uses the safe lambda; line 225 was simply overlooked in the original sweep.Behavior is unchanged for ASCII input; this only removes the UB on non-ASCII bytes.
Module
mooncake-store)mooncake-transfer-engine)Type of Change
How Has This Been Tested?
Test commands:
# Ubuntu 22.04 (aarch64) container, cmake -DWITH_STORE_RUST=OFF -DUSE_HTTP=ON, ninja ninjaTest results:
This is the same class of fix as #2367, which shipped with no new test: the change is behavior-preserving for ASCII, and the UB is not portably observable (on glibc the ctype tables span -128..255, so it does not crash there). Validation is compile-only plus the existing suite: a full incremental build succeeds (all three headers/TUs and their dependents recompile cleanly with the added
<cctype>includes), and the existingtcp_transport/ store tests pass.Checklist
./scripts/code_format.shpre-commit run --all-filesand all hooks pass(No new test, matching the precedent set by #2367 for this UB class; the fix is behavior-preserving for ASCII and the UB is not portably observable on the target platform.)
AI Assistance Disclosure
Claude Code (Claude Fable 5) located the three remaining sites missed by #2367, applied the established idiom, and verified the build in a local Linux container. I reviewed the changes and am responsible for them.