
Parallel gadget scan with --jobs (#23, part 2) #28

Merged: ricardojrdez merged 1 commit into master from feature-parallel-scan on Jun 23, 2026

@ricardojrdez

Member

Second and final part of #23 (the on-disk cache landed in #27). Closes #23.

--jobs N

Distributes the gadget scan over N worker processes. Each executable section is split into chunks scanned independently, then merged and deduplicated. The output is identical to a serial run.

How correctness is kept:

  • A chunk emits only the gadgets whose termination offset falls inside its window. The slice handed to a worker extends depth bytes before the window so gadgets straddling a boundary are still complete, and the final chunk owns the closing offset of the section. No cross-chunk duplicates, none missed.
  • The duplicate tie-break is now deterministic (fewest canary bytes, then lowest address), so serial and parallel produce byte-identical output (verified on a 32-bit busybox: same (vaddr, gadget, count) set for jobs 2/4/8).
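The ownership rule above can be illustrated with a toy scanner. This is a minimal sketch under stated assumptions, not the project's code: `scan_slice`, `chunk_windows`, `scan_serial` and `scan_chunked` are hypothetical names, a "gadget" is simply any run of up to `depth` bytes followed by a `0xC3` byte (no disassembly or validity checks), and the chunks are scanned in a plain loop rather than in worker processes.

```python
from typing import List, Tuple

RET = 0xC3  # x86 `ret`; the toy scanner treats it as the only terminator
Record = Tuple[int, str]  # (vaddr, hex) pair


def scan_slice(data: bytes, slice_start: int, lo: int, hi: int,
               depth: int, base: int) -> List[Record]:
    """Scan `data`, which begins at section offset `slice_start`, and emit
    only the gadgets whose terminator offset lies in the window [lo, hi)."""
    records = []
    for end in range(lo, hi):
        if data[end - slice_start] != RET:
            continue
        for back in range(depth + 1):
            start = end - back
            if start < slice_start:
                # Only reachable at the section start: every other slice
                # already carries `depth` bytes before its window.
                break
            chunk = data[start - slice_start:end - slice_start + 1]
            records.append((base + start, chunk.hex()))
    return records


def chunk_windows(size: int, chunks: int) -> List[Tuple[int, int]]:
    """Partition [0, size) into at most `chunks` half-open windows."""
    if size == 0:
        return []
    step = -(-size // chunks)  # ceiling division
    return [(lo, min(lo + step, size)) for lo in range(0, size, step)]


def scan_serial(section: bytes, depth: int, base: int) -> List[Record]:
    return sorted(scan_slice(section, 0, 0, len(section), depth, base))


def scan_chunked(section: bytes, depth: int, base: int,
                 chunks: int) -> List[Record]:
    merged = []
    for lo, hi in chunk_windows(len(section), chunks):
        slice_start = max(0, lo - depth)  # extend `depth` bytes earlier
        piece = section[slice_start:hi]   # what a worker would receive
        merged.extend(scan_slice(piece, slice_start, lo, hi, depth, base))
    return sorted(merged)
```

Because the windows partition the section and each terminator offset belongs to exactly one window, every gadget is emitted once. In this sketch the last half-open window ends at the section size, so a terminator in the final byte is owned by the final chunk.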

Implementation:

  • _scan_worker (module level, picklable) re-initialises the architecture in its own process and reuses the existing validity checks. Capstone decode objects are not picklable, so workers return raw [vaddr, hex] records and the parent rebuilds decodes/symbols (the same path as the cache).
  • _scan_parallel builds the chunk tasks and runs a multiprocessing.Pool; results feed the existing reconstruct / dedup / cache path, so --jobs composes with --cache.
  • utils.TOOL_NAME tolerates a __main__ without __file__ (needed for spawned workers and library use).
  • Wired through GadFinder, the Rop3 API (jobs=) and the CLI (--jobs, validated >= 1).
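The last two bullets can be sketched with the standard library alone. This is an illustration under assumptions rather than the merged code: `positive_int`, `tool_name` and `build_parser` are hypothetical names, and the `"tool"` fallback string is a placeholder.

```python
import argparse
import os
import sys


def positive_int(text: str) -> int:
    """argparse type callable: accept integers >= 1, reject the rest."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError(f"expected an integer, got {text!r}")
    if value < 1:
        raise argparse.ArgumentTypeError(f"must be >= 1, got {value}")
    return value


def tool_name(default: str = "tool") -> str:
    """Derive a display name from __main__.__file__, falling back to
    `default` when __main__ has no __file__ (for example in an interactive
    session or when the package is embedded as a library)."""
    main = sys.modules.get("__main__")
    path = getattr(main, "__file__", None)
    if not path:
        return default
    return os.path.splitext(os.path.basename(path))[0]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog=tool_name())
    parser.add_argument("--jobs", type=positive_int, default=1, metavar="N",
                        help="number of worker processes (default: 1)")
    return parser
```

With this shape, `build_parser().parse_args(["--jobs", "0"])` exits with a usage error instead of passing an invalid worker count further down.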

Performance (honest numbers)

The speedup is sublinear because the merge, deduplication and sort still run in the parent, and each worker process adds start-up cost. On a 1 MB busybox:

| workload | serial | jobs=4 | jobs=8 |
|---|---|---|---|
| depth 10 | 1.46s | 1.30s (1.12x) | 1.16s (1.26x) |
| depth 20 | 3.78s | 2.75s (1.38x) | 2.21s (1.71x) |

--jobs pays off for large binaries and/or high --depth; on small inputs the default --jobs 1 is faster. The README documents this.

Testing

  • 94 tests pass on Python 3.11 and 3.13 (4 more than before).
  • test_parallel.py: parallel output equals serial (including with --badchar-bytes/--depth and a ret as the last byte to exercise the closing boundary), and parallel results are cacheable.
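Byte-identical output relies on the duplicate tie-break not depending on the order in which records arrive from the workers. A minimal sketch of such a tie-break follows; `Candidate` and `dedup` are hypothetical names, and the canary-byte count is assumed to be precomputed for each candidate, since this page does not show how it is derived.

```python
from typing import Dict, Iterable, List, NamedTuple


class Candidate(NamedTuple):
    vaddr: int
    gadget_hex: str
    canary_bytes: int  # assumed to be computed elsewhere


def dedup(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep one candidate per gadget: fewest canary bytes first, then the
    lowest address. The key is a total order, so the winner does not
    depend on arrival order."""
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        kept = best.get(cand.gadget_hex)
        if kept is None or (cand.canary_bytes, cand.vaddr) < (
                kept.canary_bytes, kept.vaddr):
            best[cand.gadget_hex] = cand
    # Sort the survivors by address so the final listing is stable too.
    return sorted(best.values(), key=lambda c: c.vaddr)
```

Feeding the same candidates in any order, as a merge of worker results would, yields the same list.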

Closes #23.

--jobs N distributes the scan over N worker processes. Each executable
section is split into chunks; a chunk emits only the gadgets whose
termination falls inside its window (the slice extends `depth` bytes earlier
so straddling gadgets stay complete, and the final chunk owns the closing
offset), so there are no cross-chunk duplicates and the merged result is
identical to a serial run.

- gadfinder: module-level _scan_worker (re-inits the architecture per
  process and reuses the existing validity checks); _scan_parallel builds
  the chunk tasks, runs a multiprocessing.Pool and returns sorted
  [vaddr, hex] records, which feed the same reconstruct/dedup/cache path.
- dedup tie-break is now deterministic (fewest canary bytes, then lowest
  address), so serial and parallel produce byte-identical output.
- utils: TOOL_NAME tolerates a __main__ without __file__ (spawned workers,
  library use).
- wired through GadFinder, the Rop3 API (jobs=) and the CLI (--jobs, with
  validation). README documents the flag and its sublinear speedup.

Closes #23.

Tests: 94 pass (3.11/3.13). New test_parallel.py checks that parallel
output equals serial (incl. with bad-char/depth options and the closing
boundary) and that parallel results are cacheable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@ricardojrdez ricardojrdez merged commit bcc195e into master Jun 23, 2026
3 checks passed
@ricardojrdez ricardojrdez deleted the feature-parallel-scan branch June 23, 2026 20:48

Development

Successfully merging this pull request may close these issues.

Scalability: parallelize gadget search and cache results
