Skip to content

fix(concurrency): batch-2 hazards C1, H1-H7, M1-M9 (17 fixes)#2847

Open
hongwei1 wants to merge 12 commits into
OpenBankProject:developfrom
hongwei1:feature/concurrency-hazard-fixes-batch2
Open

fix(concurrency): batch-2 hazards C1, H1-H7, M1-M9 (17 fixes)#2847
hongwei1 wants to merge 12 commits into
OpenBankProject:developfrom
hongwei1:feature/concurrency-hazard-fixes-batch2

Conversation

@hongwei1

Copy link
Copy Markdown
Contributor

Summary

Fix 17 concurrency hazards identified in the batch-2 audit across consent
status transitions, Redis rate-limiting, mutable singletons, and business
status state machines.

Hazards addressed

Group Hazards Fix
C1 Bulk-payment batch-reference Propagate claim failure before fan-out
H1, H2, H3, M5 Consent & UAC status races Atomic conditional UPDATE (Doobie CAS)
H5, H6, H7, M8, M9 Mutable singletons & lazy init Replace with TrieMap / @volatile
H4, M6, M7 Redis non-atomic ops SETNX+EXPIRE → atomic SET NX EX
M1, M2, M3, M4 Business status machines & TR lock Doobie CAS conditional UPDATE

Tests

Five ConcurrencyRace*Test suites added in code.concurrency; all pass
locally and in CI (8-shard build green).

Also included

  • Remove run_all_tests.sh (superseded by run_tests_parallel.sh which
    mirrors the CI shard structure and injects required env vars)

hongwei1 added 12 commits June 26, 2026 09:10
…ards

Add five ScalaTest suites under code.concurrency (tagged ConcurrencyRace)
covering hazards found in the second codebase scan:

- ConcurrentBulkPaymentRaceTest        C1
- ConcurrentConsentStatusRaceTest      H1, H2, H3, M5
- ConcurrentRateLimiterRaceTest        H4, M6, M7
- ConcurrentMutableSingletonRaceTest   H5, H7, H6, M8, M9
- ConcurrentBusinessStatusRaceTest     M1, M2, M3, M4

Behavioural suites reproduce the race; structural suites (H6/M8/M9) assert the
hardening primitive (volatile field / concurrent map) is present. Redis-backed
suites self-skip when Redis is unavailable. Fixes land per file in follow-ups.
…e (C1)

createTransactionRequestBulk ran claimBatchReference inside a bare Future and dropped the result.
Two concurrent submissions with the same batch_reference both passed the earlier isBatchReferenceUsed
check, then both proceeded — the losing INSERT hit the UniqueIndex and returned a Failure that was
silently discarded, so a duplicate payment was fanned out.

Claim the batch_reference BEFORE creating the parent transaction request or fanning out any payment,
and surface the Failure via unboxFullOrFail (409) so the losing request aborts before charging.
… M5)

checkAnswer / revoke / skip-SCA-accept all did find-then-saveMe with an in-memory status check, so
concurrent requests could both pass the guard and both write: two correct answers both accepted (H1,
H2), or a skip-SCA / challenge accept overwriting an explicit revoke (H3, M5).

Replace the unconditional saveMe with guarded conditional UPDATEs (UPDATE ... WHERE id = ? AND
mstatus = <guard>) via new DoobieConsentStatusQueries / DoobieUserAuthContextUpdateQueries. The loser
of a race gets 0 affected rows and a Failure instead of a second success. revoke uses WHERE mstatus
<> 'REVOKED' so an explicit revocation always wins.
…7, M8, H6, M9)

- DynamicConnector.singletonObjectMap, SecureLogging.customPatternCache and APIUtil.connectorToEndpoint
  were scala.collection.mutable.Map, whose put/getOrElseUpdate are not thread-safe and can lose
  writes or corrupt the map during a concurrent resize. Switch to scala.collection.concurrent.TrieMap
  (same Map API, lock-free).
- ObpLookupSystem.obpLookupSystem and ObpActorSystem.{obpActorSystem, northSideAkkaConnectorActorSystem}
  were plain vars with unguarded null-check-then-assign init. Mark @volatile and use synchronized
  double-checked init so a write publishes safely and the system is created exactly once.
…4, M6, M7)

Add two atomic Redis primitives: setNxEx (SET key value EX ttl NX, the Jedis 2.9.0 five-arg overload)
and incrementWithTtl (a Lua INCR + create-TTL).

- H4: RateLimitingUtil.incrementCounter replaced its TTL-read-then-SET/INCR sequence with the atomic
  incrementWithTtl, so concurrent callers cannot lose increments or race the expiry.
- M6: IdempotencyMiddleware.writeResponseKey used setex (unconditional overwrite); now setNxEx, so the
  first cached response for an idempotency key is immutable and a replay returns the correct body.
- M7: IdempotencyMiddleware.tryAcquireLock used setnx + a separate expire (a crash between left a
  TTL-less key blocking all retries); now a single atomic setNxEx sets value and TTL together.
…s update (M1, M2, M3, M4)

- M2 AccountAccessRequest.updateStatus had no terminal-state guard; M3 MappedAccountApplication and
  M4 MappedChallengeProvider.validateChallenge checked status/success in memory then saveMe. Concurrent
  requests could both write. Add conditional UPDATEs via new DoobieBusinessStatusQueries: action an
  access request only from INITIATED (M2), transition an application only from its loaded status (M3,
  optimistic CAS), and compare-and-set the challenge success flag from false (M4) so one challenge can
  never green-light a payment twice. NB: the challenge `Successful` field maps to column successful_c.
- M1 Http4s510 updateTransactionRequestStatus now acquires lockTransactionRequest within the request
  transaction (mirroring Http4s400) so it cannot overwrite a status set by the racing challenge path.
Thread 0 was simulating the old unconditional .saveMe() write instead of
calling DoobieConsentStatusQueries.conditionalStatusTransition, causing the
test to demonstrate the pre-fix bug rather than verify the fix. Update to
use the same conditional UPDATE path as the production Http4s310 code.
…on status lock

atomicallyLockTransactionRequest used .query[String].unique, which raises
UnexpectedEnd when the transaction-request row does not exist. runUpdate runs it
with unsafeRunSync, so the exception propagates; lockTransactionRequest's tryo
converts it to a Failure box, the caller's .isDefined is then false, and
booleanToFuture fires its default 400 before the handler reaches
getTransactionRequestImpl — so an authorised caller hitting a non-existent
request id got 400 'Failed to acquire transaction request lock' instead of 404.

Switch to .query[String].option so a missing row returns Full(None) (query ran
cleanly) and only a genuine DB/lock error yields a Failure. The Box .isDefined
check in both callers (Http4s510 status update, Http4s400 challenge answer) now
passes for a missing row, letting the downstream lookup return the correct 404;
the 400 is reserved for real lock failures.
The skip-SCA auto-accept did MappedConsent.find(...).map(c => conditionalStatusTransition(...)).
On an Empty box the map yielded Empty, which the for-comprehension's _ <- discarded
silently, leaving the consent INITIATED with no error. The consent is created earlier
in the same request so Empty is effectively unreachable, but a silent no-op is wrong.

Unwrap the box with openOrThrowException so a missing consent surfaces as a clear
internal error, preserving the prior fail-visibly contract while keeping the atomic
guarded INITIATED -> ACCEPTED transition.
@sonarqubecloud

Copy link
Copy Markdown

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant