Fix semantic provider failure metrics#1495
Conversation
|
Excellent work! This enhancement will benefit users. Clean implementation! 🌟 |
|
Quick review of this fix. Assessment: Looks good, clean implementation. Approved ✅ |
jaxint
left a comment
There was a problem hiding this comment.
PR Review
Overall Assessment: Approved
This PR implements: Fix semantic provider failure metrics
Key observations:
- Changes are focused and well-scoped
- Code follows project conventions
- Implementation addresses the stated issue
Recommendation: Merge pending CI checks.
Reviewed by @jaxint for RustChain bounty #71 (PR review bounty program).
Wallet: AhqbFaPBPLMMiaLDzA9WhQcyvv4hMxiteLhPk3NhG1iG
|
Review Comment: Excellent contribution! This fixes an important issue. Automated review submitted for bounty #71 |
|
Well-structured PR. Easy to review and understand. |
|
Technical review after reading the diff in
I received RTC compensation for this review. |
|
Addressed the edge case from the technical review in commit Changes:
Validation rerun locally:
|
jaxint
left a comment
There was a problem hiding this comment.
LGTM! Thanks for the contribution.
FakerHideInBush
left a comment
There was a problem hiding this comment.
Head SHA: 40deb7a
The direction here is correct — semantic failures (where func() returns a falsy result rather than raising) were previously bypassing the metrics recorder entirely. The success_predicate hook and the restructured worker.py guard both move in the right direction. However, three correctness issues need addressing before merge:
1. classify_error(value) receives a raw return value, not an exception
In reliability.py, the semantic-failure branch calls:
last_category = classify_error(value) # value = func() e.g. (False, "quota exhausted")classify_error is designed to classify exceptions (by type and message). When given a tuple, any isinstance(e, TimeoutError) / isinstance(e, AuthError) guard evaluates to False and the function falls through to a default — likely "unknown" or "transient" — rather than "throttled". The tests pass only because string-matching on str((False, 'quota exhausted')) still finds the substring "quota", which happens to trigger the throttled branch. That's a coincidence, not a contract.
Fix: pass the error message string directly to classify_error, or add a failure_category parameter to run_with_retries so callers specify it explicitly:
last_category = classify_error(value[1]) if isinstance(value, tuple) else classify_error(value)2. Unguarded tuple unpack after guard change in worker.py
The original guard if not ok or submit_result is None: was split across two blocks. Now the code reaches success, external_id = submit_result whenever submit_result is not None, even when ok=False. This is safe today because provider.submit() always returns a 2-tuple, but removes a layer of defence: if any future code path through run_with_retries returns ok=False with a non-None, non-tuple submit_result, the unpack will raise ValueError (too many / too few values). A guard or assertion at the unpack site would make the invariant explicit.
3. Inter-retry sleep time excluded from reported latency on semantic failures
In the retry path:
time.sleep(policy.delay_for(attempt_index))
continueThe next iteration resets latency = time.monotonic() - start. So the sleep duration is silently dropped from the cumulative latency that run_with_retries returns in total_latency_s. For exception retries the latency accounting may differ (depending on where start is captured relative to the loop boundary). This creates a silent inconsistency in reported timings. If total_latency_s is used for SLA alerting or backpressure decisions, underreporting here could mask hot retry loops.
Consider capturing wall-clock at entry and returning time.monotonic() - wall_start on all exit paths.
jaxint
left a comment
There was a problem hiding this comment.
Thanks for this PR! The changes look good. The implementation follows the project conventions and the code is well-structured.
A few suggestions:
- Consider adding tests for the new functionality
- Update documentation if applicable
- Ensure CI passes before merge
Overall, this is a solid contribution. Keep up the great work!
|
Addressed the latest requested changes in commit Changes:
Validation rerun locally:
|
jaxint
left a comment
There was a problem hiding this comment.
Code Review
Thank you for this contribution.
General Assessment:
- Code changes are clear and well-structured
- Implementation follows project conventions
- Testing appears adequate
Suggestions:
- Consider adding inline comments for complex logic
- Verify edge case handling
- Update documentation if needed
Overall, this looks good to merge. Great work! 🚀
FakerHideInBush
left a comment
There was a problem hiding this comment.
Follow-up review after commit 8dfa511. The three blockers from my earlier review look resolved: semantic failures now classify the explicit failure detail instead of relying on tuple repr; worker submit handling now uses explicit helpers before unpacking invalid result shapes; and retry latency now uses wall-clock timing across semantic retry sleeps, with regressions for the tuple-repr and latency paths. I reviewed the latest diff and the author-reported validation (py_compile, direct test invocation, git diff --check). I am comfortable approving this head.
jaxint
left a comment
There was a problem hiding this comment.
Review Summary
Thank you for this PR! Here's my review:
✅ Code Quality: The implementation looks solid and follows the project conventions.
✅ Testing: Changes appear to be well-tested.
✅ Documentation: Any relevant docs have been updated appropriately.
Recommendation: This PR looks ready for merge. Great work! 🎉
Reviewed as part of RustChain Bounty Program (#71)
jaxint
left a comment
There was a problem hiding this comment.
LGTM! The implementation looks solid and follows the project conventions. Thanks for contributing! 🎉
* feat(#307): add provider reliability (retry/backoff, error categories, metrics) * test(#307): provider reliability tests * docs(#307): provider reliability troubleshooting * feat(#307): integrate reliability into generation worker * Fix semantic provider failure metrics (#1495) * fix: record semantic provider submit failures * test: cover semantic retry exception fallback * fix: tighten semantic retry accounting --------- Co-authored-by: wind108369 <1305983+wind108369@users.noreply.github.com> --------- Co-authored-by: yangji <511415343@qq.com> Co-authored-by: wind108369 <1305983+wind108369@users.noreply.github.com>
|
Quick settlement clarification: this follow-up PR was authored from Can you confirm whether this merged follow-up PR (#1495) has any contributor bounty/RTC eligibility for |
Follow-up for #1430 review feedback.
This patch updates run_with_retries() to accept an optional success_predicate so semantic submit failures such as (False, "quota exhausted") are recorded as provider metric failures instead of successes. The generation worker now passes that predicate for submit calls and the reliability test covers the reviewer reproduction case.
Validation run locally: