ARCH0.md (3615B)
1 # ARCH0: AArch64 ASM Constant-Time Memory Access Audit 2 3 ## Goal 4 5 Prototype a static analyzer that ingests GHC+LLVM aarch64 assembly 6 and proves a narrow property: all memory accesses are based on public 7 registers plus constant offsets (or other whitelisted addressing 8 modes), and no loads/stores use secret-derived addresses. 9 10 This is an automated sanity check, not a full constant-time proof. 11 12 ## Scope 13 14 - Target: aarch64-darwin GHC output (LLVM backend), no inline asm. 15 - Input: text assembly dumps (.s or llc output) with symbols. 16 - Property: memory access address derivations are public and bounded. 17 - Non-goals: micro-architectural timing proof, full CT branching proof. 18 19 ## Assumptions 20 21 - GHC calling convention for aarch64 is stable enough to whitelist 22 public base registers (e.g. sp, STG stack/heap pointers, RTS regs). 23 - LLVM does not introduce secret-dependent memory addressing without 24 explicit arithmetic in the asm. 25 - External calls are treated as opaque unless summarized. 26 27 ## Prototype Architecture 28 29 1) Parser 30 - Tokenize instructions, labels, and directives from .s. 31 - Extract instruction mnemonics and operands. 32 - Normalize addressing modes: [xN, #imm], [xN, xM], [xN, xM, lsl #k]. 33 34 2) CFG and basic blocks 35 - Identify basic blocks by labels and control flow instructions. 36 - Build a simple CFG for forward dataflow; no SSA needed. 37 38 3) Taint lattice (publicness) 39 - States per register: Public, Secret, Unknown. 40 - Initialize public roots: sp, fp, STG stack/heap pointers, TLS regs, 41 and known constant pools; all others Unknown. 42 - Propagate for arithmetic and moves (e.g. add, sub, mov, lsl, orr). 43 - Conservative joins across CFG edges. 44 45 4) Memory access rules 46 - For each load/store: 47 - Base register must be Public. 48 - Index register (if any) must be Public and either constant or 49 masked to a fixed range (configurable). 50 - Offset must be constant immediate unless whitelisted pattern. 51 - Flag any instruction that violates the above. 52 53 5) Whitelisting and summaries 54 - Config file for: 55 - public root registers per platform and GHC version. 56 - allowed addressing modes. 57 - known-safe helper symbols (e.g. RTS/GC stubs). 58 59 6) Reporting 60 - Emit a report with: 61 - first offending instruction per symbol, 62 - reason (base not public, non-const offset, etc.), 63 - suggested remediation. 64 65 ## Prototype Rule Set (initial) 66 67 - Public roots: sp, fp/x29, x28 (STG stack ptr), x27 (STG heap ptr), 68 x19-x26 when used as GHC-prescribed callee-saves, x18 (TLS). 69 - Public constants: literal pools, constant addresses from adrp/add. 70 - Allowed addr modes: 71 - [xN, #imm] 72 - [xN, xM] only if xM is Public and range-checked via and/lsr mask 73 - [xN, xM, lsl #k] only if xM is Public and range-checked 74 75 ## Prototype Flow 76 77 - Input: .s file 78 - Output: JSON + text summary 79 - CLI: arch-audit --ghc-aarch64 --input foo.s --config cfg.toml 80 81 ## Validation Plan 82 83 - Create small, controlled asm fixtures: 84 - known-good (stack and GHC heap accesses only) 85 - known-bad (secret-derived base or index) 86 - Compare findings to manual inspection of ppad-fixed and ppad-secp256k1 87 dumps; check for false positives and negatives. 88 89 ## Risks and Limitations 90 91 - Register reuse across calls makes publicness hard to track without 92 interprocedural summaries. 93 - LLVM may fold complex addressing in ways that require more semantics. 94 - This does not prove constant-time; it only enforces a strict memory 95 access policy. 96 97 ## Next Steps 98 99 - Confirm the exact GHC aarch64 register roles to seed the whitelist. 100 - Pick a parsing strategy (simple regex vs. small grammar). 101 - Implement a minimal dataflow engine and run on one target dump.