Skip to content

fix: 4 concurrency & null-safety bugs from code review#531

Merged
ginccc merged 9 commits into
mainfrom
fix/code-review-bugs
Jun 18, 2026
Merged

fix: 4 concurrency & null-safety bugs from code review#531
ginccc merged 9 commits into
mainfrom
fix/code-review-bugs

Conversation

@ginccc

@ginccc ginccc commented Jun 10, 2026

Copy link
Copy Markdown
Member

Summary

Fixes 4 verified bugs found during code review, ranging from HIGH to MEDIUM severity. All fixes include regression tests and were validated by a thorough self-review.

Changes

Fix # 1 — PropertySetterTask NPE on blank input (HIGH)

The CATCH_ANY_INPUT_AS_PROPERTY handler dereferences getLatestData("input:initial") without a null check. When a client sends an empty/whitespace-only message, Conversation.storeUserInputInMemory skips storing input:initial → NPE → pipeline dies → conversation enters ERROR state.

Fix: Added null guards for both initialInputData and initialInput.

Fix # 2 — Config version race condition (HIGH PG / MEDIUM Mongo)

HistorizedResourceStore.update() performs a non-atomic read→increment→write cycle. Two concurrent edits both reading version N would both write N+1 — last write wins silently. On PostgreSQL, ON CONFLICT DO UPDATE in history also silently merges duplicates. On MongoDB, history insertOne throws unhandled MongoWriteException (HTTP 500 instead of 409).

Fix: Introduced optimistic locking via storeIfCurrentVersion() on IResourceStorage with atomic conditional writes in both backends. ResourceModifiedException now surfaces as HTTP 409.

Fix # 3 — ComponentCache HashMap race (MEDIUM)

ComponentCache is @ApplicationScoped (singleton) using plain HashMap. computeIfAbsent on HashMap is not thread-safe — concurrent reads (every conversation turn) and writes (lazy agent deployment) can corrupt the bucket structure.

Fix: Replaced HashMap with ConcurrentHashMap for both outer and inner maps.

Fix # 4 — Zombie-write snapshot clobber after timeout (MEDIUM)

When an agent times out, future.cancel(true) sets the interrupt flag but can't stop threads blocked in non-interruptible I/O (LLM HTTP calls). The completed callable's onComplete callback fires and does an unconditional replaceOne that overwrites newer conversation state.

Fix: Check Thread.currentThread().isInterrupted() before calling onComplete(). Interrupted executions route to onFailure() instead.

Test Coverage

Test Class Before After New Tests
PropertySetterTaskTest 13 16 3 (missing/null/empty input)
HistorizedResourceStoreTest 10 11 1 (concurrent modification)
ComponentCacheTest 5 6 1 (concurrent stress: 8 threads × 500 ops)
BaseRuntimeTest 9 11 2 (cancelled → onFailure, normal → onComplete)

All tests use InOrder verification where ordering matters (history archived before conditional store).

Files Changed

File Change
PropertySetterTask.java Null guards on input:initial
IResourceStorage.java New storeIfCurrentVersion() default method
MongoResourceStorage.java Override with version-conditioned updateOne + dup-key catch
PostgresResourceStorage.java Override with UPDATE WHERE version=? + ON CONFLICT DO NOTHING
HistorizedResourceStore.java store()storeIfCurrentVersion() in update()
ComponentCache.java HashMapConcurrentHashMap
BaseRuntime.java Interrupt flag check before onComplete()
docs/changelog.md Changelog entries for all 4 fixes
4 test files 7 new regression tests

Verification

  • ./mvnw compile — ✅ BUILD SUCCESS
  • ./mvnw test — ✅ BUILD SUCCESS (all targeted test classes pass, 0 failures)
  • Self-review by 3 independent code reviewers — 4 findings fixed, 9 accepted/deferred

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📝 Documentation update
  • ♻️ Refactoring (no functional changes)
  • 🔧 Chore (dependency updates, CI changes, etc.)

Checklist

  • My code follows the project's code style
  • I have added tests that prove my fix/feature works
  • Existing tests pass locally (./mvnw clean verify -DskipITs)
  • I have updated documentation if needed
  • My commit messages follow conventional commits
  • I have not committed any secrets, API keys, or tokens
  • This PR has a clear, focused scope (one concern per PR)

Summary by CodeRabbit

  • Bug Fixes
    • Improved null-safety when setting properties if initial input is missing/blank
    • Added conditional/optimistic versioned writes to prevent version-race overwrites (including safer history persistence)
    • Fixed interruption handling to avoid stale results
    • Made the component cache thread-safe to prevent concurrent corruption
  • Tests
    • Added/expanded concurrency and interrupt-handling coverage; strengthened update/history ordering checks
    • Added tests for property-setting when initial input is absent or its result is null
  • Documentation / Chores
    • Updated changelog and CI CodeQL gating so CodeQL runs on main pushes while still skipping docs-only PRs

ginccc added 6 commits June 10, 2026 15:33
CATCH_ANY_INPUT_AS_PROPERTY handler dereferences getLatestData(input:initial)
without null check. When a client sends an empty/whitespace-only message,
Conversation.storeUserInputInMemory skips storing input:initial, causing NPE
that kills the pipeline and puts the conversation in ERROR state.

Added null guards for both initialInputData and initialInput.
3 new regression tests cover the blank input scenarios.
HistorizedResourceStore.update() had a non-atomic read-increment-write cycle.
Two concurrent edits both reading version N would both write N+1, with the
last write silently winning. On PostgreSQL, ON CONFLICT DO UPDATE in history
also silently merged duplicates.

Introduced storeIfCurrentVersion() default method on IResourceStorage with
atomic conditional writes in both backends:
- MongoDB: version-conditioned updateOne, catch duplicate-key on history
- PostgreSQL: UPDATE WHERE version=?, ON CONFLICT DO NOTHING for history

ResourceModifiedException now surfaces as HTTP 409 instead of silent data loss.
ComponentCache is @ApplicationScoped (singleton) but used plain HashMap.
computeIfAbsent on HashMap is not thread-safe — concurrent reads (every
conversation turn) and writes (lazy agent deployment) can corrupt the
bucket structure. ConcurrentHashMap is a thread-safe drop-in replacement.

Added concurrent stress test (8 threads, 500 ops each).
When an agent times out, future.cancel(true) sets the interrupt flag but
cannot stop threads blocked in non-interruptible I/O (LLM HTTP calls).
The completed callable's onComplete callback would fire and do an
unconditional replaceOne that overwrites newer conversation state.

Added interrupt flag check before onComplete — cancelled executions now
route to onFailure with a log warning instead of persisting stale data.
- HistorizedResourceStoreTest: add InOrder verification to prove history
  is archived BEFORE the conditional store (ordering is the whole point
  of the race fix). Verify history archived even when concurrent update
  throws ResourceModifiedException.
- BaseRuntimeTest: fix misleading comment — interrupt simulation is
  post-completion (non-interruptible I/O), not mid-execution cancel.
  Document pair-test relationship for the no-interruption test.
@ginccc ginccc requested a review from rolandpickl as a code owner June 10, 2026 14:41
@ginccc ginccc requested review from Copilot and removed request for rolandpickl June 10, 2026 14:41
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 14162065-ffde-4515-bdf5-bdaebdd33a23

📥 Commits

Reviewing files that changed from the base of the PR and between d58b145 and e246ef6.

📒 Files selected for processing (4)
  • .github/workflows/ci.yml
  • docs/changelog.md
  • src/test/java/ai/labs/eddi/engine/runtime/BaseRuntimeTest.java
  • src/test/java/ai/labs/eddi/modules/properties/impl/PropertySetterTaskTest.java
💤 Files with no reviewable changes (2)
  • src/test/java/ai/labs/eddi/engine/runtime/BaseRuntimeTest.java
  • src/test/java/ai/labs/eddi/modules/properties/impl/PropertySetterTaskTest.java
✅ Files skipped from review due to trivial changes (1)
  • docs/changelog.md

📝 Walkthrough

Walkthrough

This PR applies four focused fixes with tests: version-checked conditional storage for resources, concurrent-safe component maps, interrupt-aware runtime task completion, and null-safe property setting; plus changelog and CI documentation describing the changes.

Changes

Concurrency & Null Safety Fixes

Layer / File(s) Summary
Changelog and CI documentation
docs/changelog.md, .github/workflows/ci.yml
Changelog entry documents the OpenSSF Scorecard mitigation and four bug fixes. CI workflow updated to run CodeQL on all push events (to catch Dependabot merges) while preserving PR gating.
Optimistic locking for resource versioning
src/main/java/ai/labs/eddi/datastore/IResourceStorage.java, src/main/java/ai/labs/eddi/datastore/mongo/MongoResourceStorage.java, src/main/java/ai/labs/eddi/datastore/postgres/PostgresResourceStorage.java, src/main/java/ai/labs/eddi/datastore/HistorizedResourceStore.java, src/test/java/ai/labs/eddi/datastore/HistorizedResourceStoreTest.java
Adds storeIfCurrentVersion default method to storage interface and implements atomic conditional updates in Mongo (matching _id and _version) and Postgres (UPDATE ... WHERE version = expectedVersion). HistorizedResourceStore uses conditional store instead of unconditional write. Mongo history insert ignores duplicate-key 11000, Postgres history upsert uses ON CONFLICT DO NOTHING. Tests assert history archival precedes conditional write and verifies archival occurs even when conditional store fails.
ComponentCache thread safety
src/main/java/ai/labs/eddi/engine/lifecycle/internal/ComponentCache.java, src/test/java/ai/labs/eddi/engine/lifecycle/internal/ComponentCacheTest.java
Switches outer componentMaps and per-type maps from HashMap to ConcurrentHashMap. Adds multithreaded test using fixed thread pool and CountDownLatch to synchronize concurrent reads and writes across multiple component types; verifies no corruption and expected maps are populated after concurrent access.
BaseRuntime interrupt handling
src/main/java/ai/labs/eddi/engine/runtime/BaseRuntime.java, src/test/java/ai/labs/eddi/engine/runtime/BaseRuntimeTest.java
submitCallable now checks Thread.currentThread().isInterrupted() after computing result; if interrupted, calls callback.onFailure(InterruptedException) and returns null, otherwise calls onComplete(result). Tests cover both interrupted path (via thread flag set in callable) and non-interrupted path with callback verification.
PropertySetterTask null safety
src/main/java/ai/labs/eddi/modules/properties/impl/PropertySetterTask.java, src/test/java/ai/labs/eddi/modules/properties/impl/PropertySetterTaskTest.java
Adds null-check before accessing input:initial data result; only writes user_input property when data exists and is non-empty. Tests verify no exception and no property write when initial input is missing or returns null.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • rolandpickl

Poem

🐰 Four fuzzy bugs hop away today,
Locks prevent the race-condition play,
Threads now safe in ConcurrentMaps' keep,
Null checks guard where dangers creep,
Interrupts don't clobber dreams—hooray! 🥕

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 3.13% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main changes: fixing 4 concurrency and null-safety bugs discovered during code review, which aligns perfectly with the changeset's fixes across PropertySetterTask, HistorizedResourceStore, ComponentCache, and BaseRuntime.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/code-review-bugs

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown

⚠️ Deprecation Warning: The deny-licenses option is deprecated for possible removal in the next major release. For more information, see issue 997.

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses four previously identified defects affecting null-safety and concurrent execution/persistence across the runtime, component lifecycle cache, and resource versioning layer (MongoDB/PostgreSQL), with accompanying regression tests and changelog documentation.

Changes:

  • Add null-guards in PropertySetterTask for input:initial when CATCH_ANY_INPUT_AS_PROPERTY is active.
  • Introduce optimistic locking in HistorizedResourceStore.update() via IResourceStorage.storeIfCurrentVersion() and implement conditional writes in MongoDB/PostgreSQL backends.
  • Make ComponentCache thread-safe by replacing HashMap with ConcurrentHashMap, and prevent “zombie-write” persistence by routing interrupted executions to onFailure() in BaseRuntime.

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/main/java/ai/labs/eddi/modules/properties/impl/PropertySetterTask.java Adds null checks around input:initial to avoid NPEs on blank/missing user input.
src/main/java/ai/labs/eddi/datastore/IResourceStorage.java Adds storeIfCurrentVersion() default method to support optimistic locking in storage backends.
src/main/java/ai/labs/eddi/datastore/mongo/MongoResourceStorage.java Implements conditional update for optimistic locking and hardens history insert against duplicate-key races.
src/main/java/ai/labs/eddi/datastore/postgres/PostgresResourceStorage.java Implements UPDATE ... WHERE version=? optimistic locking and avoids history merge on conflicts.
src/main/java/ai/labs/eddi/datastore/HistorizedResourceStore.java Switches update path to call storeIfCurrentVersion() instead of unconditional store.
src/main/java/ai/labs/eddi/engine/lifecycle/internal/ComponentCache.java Replaces non-thread-safe HashMap usage with ConcurrentHashMap.
src/main/java/ai/labs/eddi/engine/runtime/BaseRuntime.java Prevents interrupted executions from calling onComplete() (avoids stale persistence after cancel/timeout).
src/test/java/ai/labs/eddi/modules/properties/impl/PropertySetterTaskTest.java Adds regression tests for missing/null/empty input:initial cases.
src/test/java/ai/labs/eddi/datastore/HistorizedResourceStoreTest.java Updates tests to assert storeIfCurrentVersion() usage and ordering; adds concurrent-modification test.
src/test/java/ai/labs/eddi/engine/lifecycle/internal/ComponentCacheTest.java Adds concurrent access stress test to validate cache thread-safety.
src/test/java/ai/labs/eddi/engine/runtime/BaseRuntimeTest.java Adds tests ensuring interrupted executions route to onFailure() and non-interrupted executions still complete normally.
docs/changelog.md Documents all four bug fixes with context and affected files/tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/main/java/ai/labs/eddi/engine/runtime/BaseRuntime.java
Comment thread src/main/java/ai/labs/eddi/datastore/IResourceStorage.java
Comment thread src/test/java/ai/labs/eddi/datastore/HistorizedResourceStoreTest.java Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/changelog.md`:
- Around line 7-38: Add a new "Design decisions made" subsection to the
2026-06-10 changelog entry that briefly documents the rationale behind each fix
and any trade-offs or alternatives considered: mention why null guards were
chosen for PropertySetterTask (avoid NPE vs. stricter input normalization), why
optimistic locking via IResourceStorage.storeIfCurrentVersion with
MongoResourceStorage/PostgresResourceStorage overrides was used (avoid
DB-specific atomic ops vs. heavier locking), why ComponentCache was switched
from HashMap to ConcurrentHashMap (thread-safety with minimal API changes vs.
full cache redesign), and why BaseRuntime now checks
Thread.currentThread().isInterrupted() before onComplete() (prevent zombie-write
clobber vs. more complex cancellation handling); also include a short "what's
next" note listing follow-ups (e.g., monitor for contention, consider
schema-level constraints, evaluate input normalization) and reference the
affected symbols PropertySetterTask, IResourceStorage, MongoResourceStorage,
PostgresResourceStorage, ComponentCache, and BaseRuntime so reviewers can find
the code.

In `@src/main/java/ai/labs/eddi/engine/runtime/BaseRuntime.java`:
- Around line 116-118: The warning lacks conversation/agent context: update the
log.warn call in BaseRuntime where the cancellation path discards results to
include identifying context (e.g., conversationId, agentId, and any
execution/requestId available in scope) using JBoss Logger placeholders, and
also augment the InterruptedException message passed to callback.onFailure to
include the same identifiers so logs and the exception can be correlated; locate
the block around BaseRuntime where log.warn(...) and callback.onFailure(new
InterruptedException(...)) are invoked and add the context variables to both the
log message and exception text.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9f82b4e-50a0-4a4b-8b3b-ac5b163ed99d

📥 Commits

Reviewing files that changed from the base of the PR and between e35c57e and d6fa368.

📒 Files selected for processing (12)
  • docs/changelog.md
  • src/main/java/ai/labs/eddi/datastore/HistorizedResourceStore.java
  • src/main/java/ai/labs/eddi/datastore/IResourceStorage.java
  • src/main/java/ai/labs/eddi/datastore/mongo/MongoResourceStorage.java
  • src/main/java/ai/labs/eddi/datastore/postgres/PostgresResourceStorage.java
  • src/main/java/ai/labs/eddi/engine/lifecycle/internal/ComponentCache.java
  • src/main/java/ai/labs/eddi/engine/runtime/BaseRuntime.java
  • src/main/java/ai/labs/eddi/modules/properties/impl/PropertySetterTask.java
  • src/test/java/ai/labs/eddi/datastore/HistorizedResourceStoreTest.java
  • src/test/java/ai/labs/eddi/engine/lifecycle/internal/ComponentCacheTest.java
  • src/test/java/ai/labs/eddi/engine/runtime/BaseRuntimeTest.java
  • src/test/java/ai/labs/eddi/modules/properties/impl/PropertySetterTaskTest.java

Comment thread docs/changelog.md
Comment thread src/main/java/ai/labs/eddi/engine/runtime/BaseRuntime.java Outdated
ginccc added 2 commits June 10, 2026 17:24
… accuracy, log context

- BaseRuntime: return null instead of stale result when thread is
  interrupted (prevents leaking via Future.get()), add thread name
  to cancellation warning log for traceability
- IResourceStorage: fix misleading Javadoc — default storeIfCurrentVersion
  does not provide atomicity, clarify override responsibility
- HistorizedResourceStoreTest: fix inaccurate ordering comment — real risk
  is missing history entry, not orphaned new version
- changelog.md: add required Design Decisions subsection per guidelines
Strengthens submitCallable_cancelledRoutesToOnFailure test to also
assert that future.get() returns null — directly validates the
return-null fix that prevents stale result leakage via the Future.
@ginccc ginccc requested a review from rolandpickl June 18, 2026 12:28
@ginccc ginccc merged commit bd1b859 into main Jun 18, 2026
23 checks passed
@ginccc ginccc deleted the fix/code-review-bugs branch June 18, 2026 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants