perf: use strict counters in check pass and avoid Map conversion (IMPL15) - auditor - An aarch64 constant-time memory access auditing tool.

commit b2f6ae41c37e0ef447e3f74ea3a0aa24163d581d
parent f2600a3da8ea853940d3c6ea3a0a93de29022dde
Author: Jared Tobin <jared@jtobin.io>
Date:   Wed, 11 Feb 2026 20:51:34 +0400

perf: use strict counters in check pass and avoid Map conversion (IMPL15)

Replace per-line AuditResult accumulation with strict counters and
reverse-accumulation via foldl' (flip (:)). Use IntMap lookups directly
in inter-procedural checks instead of converting to Map.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Diffstat:
M lib/Audit/AArch64/Check.hs  | 4 ++--
A plans/ARCH10.md  | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
A plans/ARCH11.md  | 44 ++++++++++++++++++++++++++++++++++++++++++++
A plans/ARCH12.md  | 47 +++++++++++++++++++++++++++++++++++++++++++++++
A plans/ARCH15.md  | 28 ++++++++++++++++++++++++++++
A plans/ARCH9.md  | 35 +++++++++++++++++++++++++++++++++++
A plans/IMPL10.md  | 41 +++++++++++++++++++++++++++++++++++++++++
A plans/IMPL11.md  | 38 ++++++++++++++++++++++++++++++++++++++
A plans/IMPL12.md  | 47 +++++++++++++++++++++++++++++++++++++++++++++++
A plans/IMPL15.md  | 34 ++++++++++++++++++++++++++++++++++
A plans/IMPL9.md  | 41 +++++++++++++++++++++++++++++++++++++++++

11 files changed, 405 insertions(+), 2 deletions(-)
diff --git a/lib/Audit/AArch64/Check.hs b/lib/Audit/AArch64/Check.hs
@@ -78,7 +78,7 @@ checkBlock sym st0 lns = go [] 0 0 st0 lns
     go !vs !lc !ma st (l:ls) =
       let (vs', ma') = checkLineStrict sym st l
           st' = analyzeLine l st
-      in  go (vs' ++ vs) (lc + 1) (ma + ma') st' ls
+      in  go (foldl' (flip (:)) vs vs') (lc + 1) (ma + ma') st' ls
 
 -- | Check a single line, returning violations and memory access count.
 checkLineStrict :: Text -> TaintState -> Line -> ([Violation], Int)
@@ -217,7 +217,7 @@ checkBlockWithSummary sym summaries st0 lns = go [] 0 0 st0 lns
                 Just summ -> applySummary summ st
                 Nothing   -> analyzeLine l st
               _ -> analyzeLine l st
-      in  go (vs' ++ vs) (lc + 1) (ma + ma') st' ls
+      in  go (foldl' (flip (:)) vs vs') (lc + 1) (ma + ma') st' ls
 
 -- | Check entire CFG with taint config.
 checkCFGWithConfig :: TaintConfig -> Text -> CFG -> AuditResult
diff --git a/plans/ARCH10.md b/plans/ARCH10.md
@@ -0,0 +1,48 @@
+# ARCH10: CFG Refactor for Indexed Blocks and Cached Metadata
+
+## Goal
+
+Make CFG construction and traversal more rigorous and efficient by
+switching to indexed blocks and precomputing successor indices,
+fallthrough, and function block ranges.
+
+## Scope
+
+- Replace list-based block storage with an indexed structure.
+- Precompute and store successor indices in each block.
+- Precompute function block ranges once per CFG.
+- Preserve external API semantics where possible.
+
+## Architecture
+
+### Block Representation
+
+- Store blocks in `Data.Primitive.Array.Array` for O(1) indexing.
+- Use `Array` (not `SmallArray`) because block counts routinely exceed
+  128; the card table improves GC performance on larger arrays.
+- Extend `BasicBlock` with:
+  - `bbLastInstr :: Maybe Instr`
+  - `bbSuccIdxs :: [Int]`
+  - `bbHasFallthrough :: Bool` (or derive once)
+
+### CFG Caches
+
+- Add `cfgFuncBlocks :: Map Text [Int]` computed once during CFG build.
+- Add `cfgCallGraph :: Map Text [Text]` if needed for inter-proc.
+
+### Function Boundaries
+
+- Compute function ranges in a single pass over block labels using
+  `isFunctionLabel`.
+- Avoid repeated scans in `functionBlocks` and call graph building.
+
+## Integration
+
+- Update traversal functions to use cached successors and indexed blocks.
+- Keep `functionBlocks` as a thin lookup into cached map.
+- Adjust inter-proc analysis to use cached maps if not already.
+
+## Risks
+
+- API changes for `cfgBlocks` (list -> vector) may ripple into analysis.
+- Function label heuristics remain; consider tightening in a follow-on.
diff --git a/plans/ARCH11.md b/plans/ARCH11.md
@@ -0,0 +1,44 @@
+# ARCH11: Configurable Public Roots for Taint Analysis
+
+## Goal
+
+Allow users to override or extend the set of public root registers via a
+JSON sidecar file, instead of hard-wiring the GHC calling convention.
+
+## Scope
+
+- JSON config file for public roots (register list).
+- CLI flag to load the config (aligned with existing taint config flow).
+- No new dependencies beyond aeson/text/containers.
+
+## Config Model
+
+Example:
+
+{
+  "public_roots": ["SP", "X29", "X19", "X20", "X21", "X28", "X18", "XZR", "WZR"]
+}
+
+Semantics:
+
+- If provided, the public root list fully replaces the default set.
+- Unknown registers remain `Unknown` unless seeded by other policies.
+- Invalid register names are a configuration error.
+
+## Integration Points
+
+- Extend CLI to accept `--public-roots <path>` (or similar).
+- Parse config at startup; pass into analysis environment.
+- Initialize `TaintState` using configured public roots when present.
+- Inter-proc summaries continue to assume caller-saved unknown unless
+  overridden by config.
+
+## Reporting
+
+- Emit a clear error on invalid register names.
+- Optional warning if config is empty (all roots unknown).
+
+## Risks
+
+- Overly permissive roots can hide issues; docs should caution users.
+- Different ABIs may require different defaults; allow opt-in override.
diff --git a/plans/ARCH12.md b/plans/ARCH12.md
@@ -0,0 +1,47 @@
+# ARCH12: Array-Backed Taint State for Register Maps
+
+## Goal
+
+Reduce allocation and lookup overhead in taint analysis by replacing
+`Map Reg Taint`/`Map Reg Provenance` with fixed-size arrays indexed by
+register number, and by making folds strict.
+
+## Scope
+
+- Replace register maps with `Data.Primitive.Array.SmallArray` for taint
+  and provenance (register count < 128).
+- Keep stack slot maps as `IntMap`.
+- Use strict folds (`foldl'`) in hot paths.
+
+## Rationale
+
+- Register set is small and fixed; maps add unnecessary overhead.
+- `SmallArray` is more efficient for small arrays (no card table).
+- Strict folds prevent buildup of thunks in long blocks.
+
+## Architecture
+
+### Register Indexing
+
+- Add `regIndex :: Reg -> Int` and `regCount :: Int`.
+- Store register taints/provenances in `SmallArray` indexed by
+  `regIndex`.
+- Provide helpers `getRegTaint`, `setRegTaint`, `getRegProv`,
+  `setRegProv`.
+
+### Initialization
+
+- `initTaintState` builds arrays with default `Unknown/ProvUnknown` and
+  writes `Public/ProvPublic` at `publicRoots` indices.
+
+### Strictness
+
+- Replace `foldl` with `foldl'` in `analyzeBlock` and
+  `analyzeBlockWithSummaries`.
+
+## Risks
+
+- Must keep regIndex mapping in sync with `Reg` constructors.
+- `SmallArray` is immutable; updates require copying. Use an update
+  helper that writes via `runST`/`MutableArray` for batched updates where
+  it pays off, or accept small copy cost for per-reg updates.
diff --git a/plans/ARCH15.md b/plans/ARCH15.md
@@ -0,0 +1,28 @@
+# ARCH15: Optimize Check Module Accumulation and Lookups
+
+## Goal
+
+Reduce allocation and overhead in the check pass by using strict folds,
+O(1) accumulation for audit results, and avoiding unnecessary Map
+conversions during inter-procedural checks.
+
+## Scope
+
+- Replace per-line `acc <> result` accumulation with strict counters and
+  list accumulation.
+- Avoid building `Map` from `IntMap` for per-block lookups.
+- (Optional) Add a per-block memory-access flag to skip blocks with no
+  loads/stores.
+
+## Rationale
+
+`AuditResult` uses list concatenation and aggregate updates that are
+expensive when done per-line. Inter-proc check currently builds an extra
+`Map` just to lookup by block index.
+
+## Changes
+
+- Introduce strict accumulator in `checkBlock`/`checkBlockWithSummary`.
+- Use `IM.findWithDefault` directly in inter-proc checks.
+- If desired, add `bbHasMemAccess :: Bool` to `BasicBlock` (or compute on
+  the fly) to skip blocks with no memory accesses.
diff --git a/plans/ARCH9.md b/plans/ARCH9.md
@@ -0,0 +1,35 @@
+# ARCH9: Performance Benchmarks (Criterion + Weigh)
+
+## Goal
+
+Establish baseline performance measurement for parsing, CFG building, and
+analysis with wall-clock timing (criterion) and allocation tracking
+(weigh), so refactors can be validated quantitatively.
+
+## Scope
+
+- Benchmarks for parser, CFG construction, and taint analysis.
+- Separate suites for timing (`bench/Main.hs`) and allocations
+  (`bench/Weight.hs`).
+- Use existing deps only (criterion, weigh already allowed in bench).
+
+## Bench Targets
+
+- Parse a representative large NCG assembly dump.
+- Build CFG from parsed lines.
+- Run dataflow (intra-proc) and inter-proc where applicable.
+
+## Inputs
+
+- Use fixtures from `etc/` (real assembly dumps).
+- If needed, add a smaller fixture for quick local runs.
+
+## Reporting
+
+- Criterion: standard reports and regression comparison support.
+- Weigh: allocation summaries per benchmark group.
+
+## Risks
+
+- Bench inputs can be large; keep at least one smaller fixture to avoid
+  slow feedback loops.
diff --git a/plans/IMPL10.md b/plans/IMPL10.md
@@ -0,0 +1,41 @@
+# IMPL10: Implement Indexed CFG and Cached Metadata
+
+## Summary
+
+Refactor CFG to use indexed storage and precompute successor indices and
+function block ranges, reducing traversal overhead and repeated scans.
+
+## Steps
+
+1) Update CFG data structures
+- Change `cfgBlocks` to `Data.Primitive.Array.Array BasicBlock`.
+- Add fields for cached successors and last-instr metadata.
+
+2) Build caches during CFG construction
+- Compute `bbSuccIdxs` and `bbHasFallthrough` once in `buildCFG`.
+- Build `cfgFuncBlocks :: Map Text [Int]` in a single pass.
+
+3) Update APIs
+- `blockSuccessors` should return cached indices.
+- `functionBlocks` should lookup in `cfgFuncBlocks`.
+- `buildCallGraph` should use cached function blocks.
+
+4) Update analysis callers
+- Replace list indexing `blocks !! idx` with vector indexing.
+- Ensure call sites in `Taint`, `Check`, and tests are updated.
+
+5) Tests and benchmarks
+- Add/adjust tests for CFG correctness (succ edges, function ranges).
+- Use new benchmarks to quantify improvements.
+
+## Files to Touch
+
+- `lib/Audit/AArch64/CFG.hs`
+- `lib/Audit/AArch64/Taint.hs`
+- `lib/Audit/AArch64/Check.hs`
+- `test/`
+
+## Validation
+
+- `cabal test`
+- `cabal bench` (criterion + weigh)
diff --git a/plans/IMPL11.md b/plans/IMPL11.md
@@ -0,0 +1,38 @@
+# IMPL11: Implement Configurable Public Roots
+
+## Summary
+
+Add a JSON-configurable public root register set and use it to seed the
+initial taint state.
+
+## Steps
+
+1) Define config types
+- Add a `PublicRootsConfig` with `public_roots :: Set Reg`.
+- Implement FromJSON with register name validation.
+
+2) CLI integration
+- Add `--public-roots PATH` flag.
+- Parse JSON and pass into analysis options.
+
+3) Taint initialization
+- Modify `initTaintState` to accept an optional public-roots set.
+- Add `initTaintStateWithRoots :: Set Reg -> TaintState`.
+- Thread config through `runDataflow`, `runInterProc`, and config-aware
+  variants.
+
+4) Tests
+- Add tests for JSON parsing and override behavior.
+- Add a small taint test with a custom root set.
+
+## Files to Touch
+
+- `lib/Audit/AArch64/Taint.hs`
+- `lib/Audit/AArch64/Types.hs` (if new config types exported)
+- `app/` CLI entrypoint
+- `test/`
+
+## Validation
+
+- `cabal test`
+- Run auditor with/without `--public-roots` to confirm behavior.
diff --git a/plans/IMPL12.md b/plans/IMPL12.md
@@ -0,0 +1,47 @@
+# IMPL12: Implement Array-Backed Register Taint/Provenance
+
+## Summary
+
+Refactor `TaintState` to use `SmallArray` for register taint and
+provenance, and make block analysis folds strict.
+
+## Steps
+
+1) Add register indexing
+- Define `regIndex :: Reg -> Int` and `regCount :: Int`.
+- Provide total mapping for all `Reg` constructors.
+
+2) Update TaintState
+- Replace `tsRegs :: Map Reg Taint` with `tsRegs :: SmallArray Taint`.
+- Replace `tsProv :: Map Reg Provenance` with `tsProv :: SmallArray Provenance`.
+- Keep `tsStack`/`tsStackProv` unchanged.
+
+3) Access helpers
+- Add `getRegTaint`, `setRegTaint`, `getRegProv`, `setRegProv`.
+- Update `getTaint`/`setTaint`/`getProvenance` and call sites.
+
+4) Initialization
+- Build arrays with defaults (Unknown/ProvUnknown).
+- Overwrite `publicRoots` indices to Public/ProvPublic.
+
+5) Strict folds
+- Use `foldl'` in `analyzeBlock` and `analyzeBlockWithSummaries`.
+
+6) Update joins
+- Implement `joinTaintState` with element-wise array zip.
+- Consider an ST-based update for fewer allocations if needed.
+
+7) Tests/Validation
+- Re-run taint tests; add a small test for register array indexing.
+- `cabal test` and (optionally) benchmark deltas.
+
+## Files to Touch
+
+- `lib/Audit/AArch64/Taint.hs`
+- `lib/Audit/AArch64/Types.hs` (if Reg mapping helpers live there)
+- `test/`
+
+## Validation
+
+- `cabal test`
+- `cabal bench`
diff --git a/plans/IMPL15.md b/plans/IMPL15.md
@@ -0,0 +1,34 @@
+# IMPL15: Implement Check Module Optimizations
+
+## Summary
+
+Make the check pass stricter and cheaper by using strict folds for
+result accumulation and avoiding Map conversions in inter-proc checks.
+
+## Steps
+
+1) Strict accumulation
+- Replace `checkBlock` loop to accumulate:
+  - `violations :: [Violation]` (reverse + final reverse if needed)
+  - `linesChecked :: Int`
+  - `memAccesses :: Int`
+- Use `foldl'` for strictness.
+
+2) Inter-proc lookup cleanup
+- Remove `inStatesMap` creation; use `IM.findWithDefault` directly when
+  iterating block indices in `checkCFGInterProc` and
+  `checkCFGInterProcWithConfig`.
+
+3) Optional block filtering
+- If `bbHasMemAccess` is available, skip `checkBlock` for blocks without
+  loads/stores.
+
+## Files to Touch
+
+- `lib/Audit/AArch64/Check.hs`
+- (Optional) `lib/Audit/AArch64/CFG.hs`
+
+## Validation
+
+- `cabal test`
+- Benchmark check path if available.
diff --git a/plans/IMPL9.md b/plans/IMPL9.md
@@ -0,0 +1,41 @@
+# IMPL9: Add Bench Suites for Parser/CFG/Analysis
+
+## Summary
+
+Implement criterion and weigh benchmark suites to measure parsing, CFG
+construction, and taint analysis performance.
+
+## Steps
+
+1) Add benchmark entrypoints
+- Create/extend `bench/Main.hs` using criterion.
+- Create/extend `bench/Weight.hs` using weigh.
+
+2) Select fixtures
+- Use one large real dump from `etc/` (e.g., `secp256k1NCG.s`).
+- Add a smaller fixture if needed for quick local iteration.
+
+3) Benchmark groups
+- `parse`: parse assembly into `[Line]`.
+- `cfg`: build CFG from parsed lines.
+- `taint`: run intra-proc dataflow and optionally inter-proc.
+
+4) NFData instances
+- Define NFData instances for core types if needed (Line, CFG, etc.) to
+  ensure full evaluation in benchmarks.
+
+5) Wire into cabal
+- Ensure benchmark stanzas exist in `ppad-auditor.cabal` for criterion
+  and weigh targets.
+
+## Files to Touch
+
+- `bench/Main.hs`
+- `bench/Weight.hs`
+- `ppad-auditor.cabal`
+- `lib/` (NFData instances if required)
+
+## Validation
+
+- `cabal bench`
+- Confirm criterion and weigh suites run and report results.

	auditor An aarch64 constant-time memory access auditing tool.
	git clone git://git.ppad.tech/auditor.git
	Log \| Files \| Refs \| README \| LICENSE

M	lib/Audit/AArch64/Check.hs	\|	4	++--
A	plans/ARCH10.md	\|	48	++++++++++++++++++++++++++++++++++++++++++++++++
A	plans/ARCH11.md	\|	44	++++++++++++++++++++++++++++++++++++++++++++
A	plans/ARCH12.md	\|	47	+++++++++++++++++++++++++++++++++++++++++++++++
A	plans/ARCH15.md	\|	28	++++++++++++++++++++++++++++
A	plans/ARCH9.md	\|	35	+++++++++++++++++++++++++++++++++++
A	plans/IMPL10.md	\|	41	+++++++++++++++++++++++++++++++++++++++++
A	plans/IMPL11.md	\|	38	++++++++++++++++++++++++++++++++++++++
A	plans/IMPL12.md	\|	47	+++++++++++++++++++++++++++++++++++++++++++++++
A	plans/IMPL15.md	\|	34	++++++++++++++++++++++++++++++++++
A	plans/IMPL9.md	\|	41	+++++++++++++++++++++++++++++++++++++++++