Parallel gadget scan with --jobs (#23, part 2) #28
Merged
--jobs N distributes the scan over N worker processes. Each executable section is split into chunks; a chunk emits only the gadgets whose termination falls inside its window (the slice extends `depth` bytes earlier so straddling gadgets stay complete, and the final chunk owns the closing offset), so there are no cross-chunk duplicates and the merged result is identical to a serial run.

- gadfinder: module-level _scan_worker (re-inits the architecture per process and reuses the existing validity checks); _scan_parallel builds the chunk tasks, runs a multiprocessing.Pool and returns sorted [vaddr, hex] records, which feed the same reconstruct/dedup/cache path.
- dedup tie-break is now deterministic (fewest canary bytes, then lowest address), so serial and parallel produce byte-identical output.
- utils: TOOL_NAME tolerates a __main__ without __file__ (spawned workers, library use).
- wired through GadFinder, the Rop3 API (jobs=) and the CLI (--jobs, with validation).

README documents the flag and its sublinear speedup. Closes #23.

Tests: 94 pass (3.11/3.13). New test_parallel.py checks that parallel output equals serial (incl. with bad-char/depth options and the closing boundary) and that parallel results are cacheable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
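The chunk-ownership rule in the commit message can be illustrated with a toy scanner. This is a minimal sketch under stated assumptions, not the gadfinder code: it treats any run of up to `depth` bytes followed by `0xC3` as a gadget (no real decoding or validity checks), and `scan_window`, `chunk_windows` and `scan_chunked` are hypothetical names. Each terminator belongs to exactly one half-open window, and the look-back is allowed to cross the window's left edge.

```python
from typing import List, Tuple

RET = 0xC3  # x86 `ret`; the only terminator this toy scanner knows about


def scan_window(data: bytes, start: int, end: int, depth: int) -> List[Tuple[int, str]]:
    """Return (offset, hex) for each gadget whose terminator lies in [start, end)."""
    found = []
    for term in range(start, end):
        if data[term] != RET:
            continue
        # Look back up to `depth` bytes, even past the window's left edge,
        # so gadgets that straddle a chunk boundary stay complete.
        for back in range(1, depth + 1):
            begin = term - back
            if begin < 0:
                break
            found.append((begin, data[begin:term + 1].hex()))
    return found


def chunk_windows(size: int, chunks: int) -> List[Tuple[int, int]]:
    """Split [0, size) into contiguous half-open windows; the last one ends at size."""
    step = max(1, -(-size // chunks))  # ceiling division
    return [(lo, min(lo + step, size)) for lo in range(0, size, step)]


def scan_chunked(data: bytes, depth: int, chunks: int) -> List[Tuple[int, str]]:
    """Scan every window independently and merge the results in sorted order."""
    results = []
    for lo, hi in chunk_windows(len(data), chunks):
        results.extend(scan_window(data, lo, hi, depth))
    return sorted(results)
```

Because every offset falls in exactly one window, a terminator is reported once no matter how many chunks are used, and a `ret` in the last byte is picked up by the final window.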
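The deterministic tie-break can be sketched as a `min` over a `(canary bytes, address)` key. The commit message does not define how canary bytes are counted, so this sketch takes the count as a precomputed field; `Candidate` and `dedup` are hypothetical names, not the project's API.

```python
from typing import Dict, Iterable, List, NamedTuple


class Candidate(NamedTuple):
    vaddr: int         # address of this occurrence
    gadget: str        # gadget text used as the dedup key
    canary_bytes: int  # hypothetical precomputed count; lower is preferred


def dedup(candidates: Iterable[Candidate]) -> List[Candidate]:
    """Keep one occurrence per gadget: fewest canary bytes, then lowest address."""
    best: Dict[str, Candidate] = {}
    for cand in candidates:
        current = best.get(cand.gadget)
        new_key = (cand.canary_bytes, cand.vaddr)
        if current is None or new_key < (current.canary_bytes, current.vaddr):
            best[cand.gadget] = cand
    # Sorting the survivors makes the output independent of arrival order.
    return sorted(best.values(), key=lambda c: c.vaddr)
```

Since the winner depends only on the key and not on the order in which candidates arrive, workers may return their records in any order and the merged output stays the same.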
Second and final part of #23 (the on-disk cache landed in #27). Closes #23.
`--jobs N`

Distributes the gadget scan over `N` worker processes. Each executable section is split into chunks scanned independently, then merged and deduplicated. The output is identical to a serial run.

How correctness is kept:

- Each chunk emits only the gadgets whose termination falls inside its window. Its slice starts `depth` bytes before the window so gadgets straddling a boundary are still complete, and the final chunk owns the closing offset of the section. No cross-chunk duplicates, none missed.
- The dedup tie-break is now deterministic (fewest canary bytes, then lowest address), so serial and parallel output is byte-identical (same `(vaddr, gadget, count)` set for jobs 2/4/8).

Implementation:

- `_scan_worker` (module level, picklable) re-initialises the architecture in its own process and reuses the existing validity checks. Capstone decode objects are not picklable, so workers return raw `[vaddr, hex]` records and the parent rebuilds decodes/symbols (the same path as the cache).
- `_scan_parallel` builds the chunk tasks and runs a `multiprocessing.Pool`; results feed the existing reconstruct / dedup / cache path, so `--jobs` composes with `--cache`.
- `utils.TOOL_NAME` tolerates a `__main__` without `__file__` (needed for spawned workers and library use).
- Wired through `GadFinder`, the `Rop3` API (`jobs=`) and the CLI (`--jobs`, validated `>= 1`).

Performance (honest numbers)

The speedup is sublinear: the merge, deduplication and sort run in the parent, and each worker process adds start-up cost. On a 1 MB busybox:

It is worth it for large binaries and/or high `--depth`; on small inputs the default `--jobs 1` is faster. README documents this.

Testing

`test_parallel.py`: parallel output equals serial (including with `--badchar-bytes` / `--depth` and a ret as the last byte to exercise the closing boundary), and parallel results are cacheable.

Closes #23.
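The worker/parent split described under Implementation can be sketched as follows. This is an illustration under assumptions rather than the actual `_scan_worker` / `_scan_parallel`: the names `scan_worker`, `build_tasks` and `scan_parallel` are hypothetical, the "scan" just looks for `0xC3` instead of decoding instructions, and each task carries only plain bytes and integers so it pickles cleanly.

```python
import multiprocessing
from typing import List, Tuple

RET = 0xC3  # toy terminator; the real tool relies on its own validity checks

# (slice bytes, slice base vaddr, window start vaddr, window end vaddr, depth)
Task = Tuple[bytes, int, int, int, int]


def scan_worker(task: Task) -> List[list]:
    """Module-level, hence picklable. Returns raw [vaddr, hex] records."""
    blob, base, lo, hi, depth = task
    records = []
    for term in range(lo, hi):
        if blob[term - base] != RET:
            continue
        for begin in range(max(base, term - depth), term):
            records.append([begin, blob[begin - base:term - base + 1].hex()])
    return records


def build_tasks(section: bytes, vaddr: int, depth: int, jobs: int) -> List[Task]:
    """One task per chunk; each slice starts `depth` bytes before its window."""
    step = max(1, -(-len(section) // jobs))  # ceiling division
    tasks = []
    for off in range(0, len(section), step):
        end = min(off + step, len(section))
        head = max(0, off - depth)
        tasks.append((section[head:end], vaddr + head, vaddr + off, vaddr + end, depth))
    return tasks


def scan_parallel(section: bytes, vaddr: int, depth: int, jobs: int) -> List[list]:
    """Run the chunk tasks and return the merged records in sorted order."""
    tasks = build_tasks(section, vaddr, depth, jobs)
    if jobs == 1:
        parts = map(scan_worker, tasks)  # serial path, no extra processes
    else:
        with multiprocessing.Pool(jobs) as pool:
            parts = pool.map(scan_worker, tasks)
    return sorted(rec for part in parts for rec in part)
```

On platforms that spawn rather than fork, `scan_parallel(..., jobs=N)` with `N > 1` should be called from under an `if __name__ == "__main__":` guard, since spawned workers re-import the main module.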
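The `>= 1` validation of `--jobs` could look like the sketch below. It assumes an `argparse`-based CLI, which this description does not confirm; `positive_int` and `build_parser` are hypothetical helpers.

```python
import argparse


def positive_int(text: str) -> int:
    """argparse type callback: accept only integers >= 1."""
    value = int(text)  # a ValueError here is reported by argparse as a usage error
    if value < 1:
        raise argparse.ArgumentTypeError("must be >= 1")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--jobs",
        type=positive_int,
        default=1,  # serial scan unless the user asks for more workers
        metavar="N",
        help="number of worker processes used for the scan",
    )
    return parser
```

Validating in the `type` callback keeps the error message next to the flag and makes `argparse` exit with its usual usage error before any scanning starts.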
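The sublinear speedup noted under Performance follows the usual Amdahl-style argument. As a rough model (an assumption for illustration, not a measurement from this PR), let $T_s$ be the time spent in the parent on merge, deduplication and sort, $T_p$ the scan time that is divided among workers, and $T_{\text{start}}(N)$ the start-up cost of $N$ processes:

```latex
S(N) = \frac{T_s + T_p}{T_s + \dfrac{T_p}{N} + T_{\text{start}}(N)},
\qquad
S(N) < \frac{T_s + T_p}{T_s} \quad \text{for every } N \text{ (when } T_s > 0\text{)}.
```

When $T_p$ is small, as on small inputs, $T_{\text{start}}(N)$ can exceed the time saved by splitting the scan, which is consistent with `--jobs 1` being faster there.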