ARCH1.md (1570B)
# ARCH1: Symbol-Offset Addressing Support

## Problem

The parser rejects AArch64 address forms like:

    ldr x8, [x8, _symbol@GOTPAGEOFF]

because `pAddrModeInner` only accepts immediates or registers. This
blocks analysis of GHC aarch64 dumps that use symbol-offset addressing.

## Proposed Fix (Architecture)

Introduce a distinct address mode for symbol-offset addressing and
thread it through taint and checks.

### New address mode

- Add `BaseSymbol Reg Text` to `AddrMode` to represent:
  - `[xN, _symbol@PAGEOFF]`
  - `[xN, _symbol@GOTPAGEOFF]`
  - `[xN, _symbol@PAGE]`

### Parsing

- Extend `pAddrModeInner` to accept a symbol reference as the offset.
- Reuse `pSymbolRef` so suffixes like `@PAGEOFF` are preserved.
- `pBracketAddr` should map symbol offsets to `BaseSymbol`.

### Analysis impact

- Treat `BaseSymbol` as a *constant offset* with a symbolic name.
- `AddrMode` taint for `BaseSymbol` depends only on the base register.
- No change to dataflow rules for taint propagation.

### Reporting

- JSON encoding must include the new constructor.
- Violation checks should consider `BaseSymbol` equivalent to
  `BaseImm` for base-taint evaluation.

### Non-blocking follow-ups

- Add parsing support for `ldur`, `stur`, `adcs`, `negs`, `mneg`.
- These should be treated as standard instructions (not `Other`).

## Acceptance Criteria

- Parser accepts symbol-offset address modes in brackets.
- Address taint checks run without false parse errors.
- JSON output includes symbol-offset address modes.
- Existing fixtures continue to parse and analyze cleanly.
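
As a rough illustration of the proposed address mode, parser extension, and base-taint rule, here is a minimal Haskell sketch. Only the names `AddrMode`, `BaseImm`, `BaseSymbol`, `pAddrModeInner`, `pSymbolRef`, and `pBracketAddr` come from this note. Everything else is an assumption made for the example: the `Reg` newtype, the `BaseReg` constructor, the helpers `lexeme`, `pReg`, `pImm`, `parseAddr`, and `addrTaintRegs`, the use of `ReadP` from `base` as the parser library, `String` in place of `Text`, and the restriction that symbol names start with `_`. The real parser may be structured quite differently.

```haskell
module Main where

import Data.Char (isAlphaNum, isDigit)
import Text.ParserCombinators.ReadP

-- Hypothetical stand-in for the project's register type.
newtype Reg = Reg String deriving (Eq, Show)

data AddrMode
  = BaseImm Reg Integer     -- [xN, #imm]
  | BaseReg Reg Reg         -- [xN, xM]  (assumed constructor name)
  | BaseSymbol Reg String   -- [xN, _symbol@PAGEOFF]; the proposal uses Text
  deriving (Eq, Show)

-- Run a parser, then skip trailing whitespace.
lexeme :: ReadP a -> ReadP a
lexeme p = p <* skipSpaces

pReg :: ReadP Reg
pReg = lexeme $ do
  c  <- satisfy (`elem` "xw")
  ds <- munch1 isDigit
  pure (Reg (c : ds))

pImm :: ReadP Integer
pImm = lexeme $ do
  _    <- char '#'
  sign <- option id (negate <$ char '-')
  sign . read <$> munch1 isDigit

-- Symbol reference with an optional relocation suffix such as @PAGEOFF.
-- Assumption: names start with '_', which keeps them distinct from registers.
pSymbolRef :: ReadP String
pSymbolRef = lexeme $ do
  _    <- char '_'
  name <- munch isSymChar
  suf  <- option "" ((:) <$> char '@' <*> munch1 isAlphaNum)
  pure ('_' : name ++ suf)
  where
    isSymChar c = isAlphaNum c || c `elem` "_.$"

-- Base register, comma, then an immediate, a register, or a symbol offset.
pAddrModeInner :: ReadP AddrMode
pAddrModeInner = do
  base <- pReg
  _    <- lexeme (char ',')
  choice
    [ BaseImm base <$> pImm
    , BaseReg base <$> pReg
    , BaseSymbol base <$> pSymbolRef   -- the new case
    ]

pBracketAddr :: ReadP AddrMode
pBracketAddr =
  between (lexeme (char '[')) (lexeme (char ']')) pAddrModeInner

-- Accept only a single, complete parse of the input.
parseAddr :: String -> Maybe AddrMode
parseAddr s =
  case readP_to_S (skipSpaces *> pBracketAddr <* eof) s of
    [(mode, "")] -> Just mode
    _            -> Nothing

-- Registers whose taint decides the taint of the address.
-- BaseSymbol is handled exactly like BaseImm: only the base matters.
-- Including the index register for BaseReg is an assumption.
addrTaintRegs :: AddrMode -> [Reg]
addrTaintRegs (BaseImm b _)    = [b]
addrTaintRegs (BaseReg b i)    = [b, i]
addrTaintRegs (BaseSymbol b _) = [b]

main :: IO ()
main = do
  print (parseAddr "[x8, _symbol@GOTPAGEOFF]")
  print (parseAddr "[x0, #16]")
  print (addrTaintRegs <$> parseAddr "[x8, _symbol@PAGEOFF]")
```

Keeping the relocation suffix inside the symbol string mirrors the goal of preserving `@PAGEOFF`-style suffixes, and giving `BaseSymbol` its own constructor lets the JSON encoder and the violation checks treat it explicitly rather than folding it into `BaseImm`.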