auditor

An aarch64 constant-time memory access auditing tool.
git clone git://git.ppad.tech/auditor.git
Log | Files | Refs | README | LICENSE

ARCH0.md (3615B)


      1 # ARCH0: AArch64 ASM Constant-Time Memory Access Audit
      2 
      3 ## Goal
      4 
      5 Prototype a static analyzer that ingests GHC+LLVM aarch64 assembly
      6 and proves a narrow property: all memory accesses are based on public
      7 registers plus constant offsets (or other whitelisted addressing
      8 modes), and no loads/stores use secret-derived addresses.
      9 
     10 This is an automated sanity check, not a full constant-time proof.
     11 
     12 ## Scope
     13 
     14 - Target: aarch64-darwin GHC output (LLVM backend), no inline asm.
     15 - Input: text assembly dumps (.s or llc output) with symbols.
     16 - Property: memory access address derivations are public and bounded.
     17 - Non-goals: micro-architectural timing proof, full CT branching proof.
     18 
     19 ## Assumptions
     20 
     21 - GHC calling convention for aarch64 is stable enough to whitelist
     22   public base registers (e.g. sp, STG stack/heap pointers, RTS regs).
     23 - LLVM does not introduce secret-dependent memory addressing without
     24   explicit arithmetic in the asm.
     25 - External calls are treated as opaque unless summarized.
     26 
     27 ## Prototype Architecture
     28 
     29 1) Parser
     30 - Tokenize instructions, labels, and directives from .s.
     31 - Extract instruction mnemonics and operands.
     32 - Normalize addressing modes: [xN, #imm], [xN, xM], [xN, xM, lsl #k].
     33 
     34 2) CFG and basic blocks
     35 - Identify basic blocks by labels and control flow instructions.
     36 - Build a simple CFG for forward dataflow; no SSA needed.
     37 
     38 3) Taint lattice (publicness)
     39 - States per register: Public, Secret, Unknown.
     40 - Initialize public roots: sp, fp, STG stack/heap pointers, TLS regs,
     41   and known constant pools; all others Unknown.
     42 - Propagate for arithmetic and moves (e.g. add, sub, mov, lsl, orr).
     43 - Conservative joins across CFG edges.
     44 
     45 4) Memory access rules
     46 - For each load/store:
     47   - Base register must be Public.
     48   - Index register (if any) must be Public and either constant or
     49     masked to a fixed range (configurable).
     50   - Offset must be constant immediate unless whitelisted pattern.
     51 - Flag any instruction that violates the above.
     52 
     53 5) Whitelisting and summaries
     54 - Config file for:
     55   - public root registers per platform and GHC version.
     56   - allowed addressing modes.
     57   - known-safe helper symbols (e.g. RTS/GC stubs).
     58 
     59 6) Reporting
     60 - Emit a report with:
     61   - first offending instruction per symbol,
     62   - reason (base not public, non-const offset, etc.),
     63   - suggested remediation.
     64 
     65 ## Prototype Rule Set (initial)
     66 
     67 - Public roots: sp, fp/x29, x28 (STG stack ptr), x27 (STG heap ptr),
     68   x19-x26 when used as GHC-prescribed callee-saves, x18 (TLS).
     69 - Public constants: literal pools, constant addresses from adrp/add.
     70 - Allowed addr modes:
     71   - [xN, #imm]
     72   - [xN, xM] only if xM is Public and range-checked via and/lsr mask
     73   - [xN, xM, lsl #k] only if xM is Public and range-checked
     74 
     75 ## Prototype Flow
     76 
     77 - Input: .s file
     78 - Output: JSON + text summary
     79 - CLI: arch-audit --ghc-aarch64 --input foo.s --config cfg.toml
     80 
     81 ## Validation Plan
     82 
     83 - Create small, controlled asm fixtures:
     84   - known-good (stack and GHC heap accesses only)
     85   - known-bad (secret-derived base or index)
     86 - Compare findings to manual inspection of ppad-fixed and ppad-secp256k1
     87   dumps; check for false positives and negatives.
     88 
     89 ## Risks and Limitations
     90 
     91 - Register reuse across calls makes publicness hard to track without
     92   interprocedural summaries.
     93 - LLVM may fold complex addressing in ways that require more semantics.
     94 - This does not prove constant-time; it only enforces a strict memory
     95   access policy.
     96 
     97 ## Next Steps
     98 
     99 - Confirm the exact GHC aarch64 register roles to seed the whitelist.
    100 - Pick a parsing strategy (simple regex vs. small grammar).
    101 - Implement a minimal dataflow engine and run on one target dump.