auditor

An aarch64 constant-time memory access auditing tool.
git clone git://git.ppad.tech/auditor.git
Log | Files | Refs | README | LICENSE

commit 37685ec46605865baab92ed590bd1e97f9ba85b5
Author: Jared Tobin <jared@jtobin.io>
Date:   Tue, 10 Feb 2026 12:06:40 +0400

init: scaffold ppad-auditor project

AArch64 constant-time memory access auditor for GHC+LLVM assembly.

Implements static analysis to verify memory accesses use only public
(non-secret-derived) addresses, helping ensure CT properties for
cryptographic code.

Components:
- Parser: megaparsec-based AArch64 assembly parser
- CFG: basic block and control flow graph construction
- Taint: forward dataflow analysis for register publicness
- Check: memory access validation against taint state
- CLI: command-line interface with JSON output

GHC 9.10.3 AArch64 register map:
- X19: Base, X20: Sp, X21: Hp, X28: SpLim (public roots)
- SP, X29, X18, XZR also treated as public

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Diffstat:
AAGENTS.md | 143+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
ALICENSE | 20++++++++++++++++++++
Aapp/Main.hs | 87+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Aflake.nix | 56++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Alib/Audit/AArch64.hs | 69+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Alib/Audit/AArch64/CFG.hs | 124+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Alib/Audit/AArch64/Check.hs | 132+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Alib/Audit/AArch64/Parser.hs | 463+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Alib/Audit/AArch64/Taint.hs | 215+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Alib/Audit/AArch64/Types.hs | 271+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Aplans/ARCH0.md | 101+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Aplans/IMPL0.md | 67+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Appad-auditor.cabal | 67+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Atest/Main.hs | 168+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Atest/fixtures/bad.s | 31+++++++++++++++++++++++++++++++
Atest/fixtures/good.s | 37+++++++++++++++++++++++++++++++++++++
16 files changed, 2051 insertions(+), 0 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md @@ -0,0 +1,143 @@ +# ppad-bolt8 + +Haskell implementation of BOLT #8 (Lightning Network encrypted transport). + +Specification: https://github.com/lightning/bolts/blob/master/08-transport.md + +## Project Structure + +- `lib/` - library source (Lightning.Protocol.BOLT8) +- `test/` - tests (tasty + tasty-hunit) +- `bench/` - benchmarks (criterion for timing, weigh for allocations) +- `flake.nix` - nix flake for dependency and build management +- `ppad-bolt8.cabal` - cabal package definition +- `CLAUDE.md` / `AGENTS.md` - keep these in sync + +## Build and Test + +Enter devshell and use cabal: + +``` +nix develop +cabal build +cabal test +cabal bench +``` + +Do not use stack. All dependency and build management via nix. + +## Dependencies + +### ppad libraries (use freely) + +Use ppad libraries (github.com/ppad-tech, git.ppad.tech) liberally. +Current dependencies: ppad-aead, ppad-hkdf, ppad-secp256k1, ppad-sha256. + +### External libraries + +Use only minimal external dependencies. Prefer GHC's core/boot libraries +(base, bytestring, primitive, etc.). + +**Ask for explicit confirmation before adding any library outside of:** +- GHC boot/core libraries +- ppad-* libraries +- Test dependencies (tasty, QuickCheck, etc. for test-suite only) +- Benchmark dependencies (criterion, weigh for benchmark only) + +## Code Style + +### Performance + +- Use strictness annotations (BangPatterns) liberally +- Prefer UNPACK for strict record fields +- Use MagicHash, UnboxedTuples, GHC.Exts for hot paths +- Do not rely on UNBOX pragmas; implement primitives directly with + MagicHash and GHC.Exts when needed +- Use INLINE pragmas for small functions +- Refer to ppad-sha256 and ppad-fixed for low-level patterns + +### Type safety + +- Encode invariants into the type system +- Use newtypes liberally (e.g., Sec, Pub, Session) +- Use ADTs to make illegal states unrepresentable +- Prefer smart constructors that validate inputs + +### Safety + +- Never use partial Prelude functions (head, tail, !!, etc.) +- Avoid brittle partials in tests too (e.g., unchecked indexing). Prefer + bounds checks or total helpers even in test code. +- Avoid non-exhaustive pattern matches and unsafe behavior; use total + helpers and make all constructors explicit. +- Use Maybe/Either for fallible operations +- Validate all inputs at system boundaries + +### Formatting + +- Keep lines under 80 characters +- Use Haskell2010 +- Module header with copyright, license, maintainer +- OPTIONS_HADDOCK prune for public modules +- Haddock examples for exported functions + +## Testing + +Use tasty to wrap all tests: +- tasty-hunit for unit tests with known vectors +- tasty-quickcheck for property-based tests +- Source test vectors from specifications (RFC, BOLT spec, Wycheproof, etc.) + +Property tests should enforce invariants that can't be encoded in types. + +## Benchmarking + +Always maintain benchmark suites: +- `bench/Main.hs` - criterion for wall-time benchmarks +- `bench/Weight.hs` - weigh for allocation tracking + +Define NFData instances for types that need benchmarking. + +## Git Workflow + +- Feature branches for development; commit freely there +- Logical, atomic commits on feature branches +- Master should be mostly merge commits +- Merge to master with `--no-ff` after validation +- Always build and test before creating a merge commit +- Write detailed merge commit messages summarising changes + +### Worktree flow (for planned work) + +When starting work on an implementation plan: + +``` +git worktree add ./impl-<desc> -b impl/<desc> master +# work in that worktree +# merge to master when complete +git worktree remove ./impl-<desc> +``` + +### Commits + +- Higher-level descriptions in merge commits +- Never update git config +- Never use destructive git commands (push --force, hard reset) without + explicit request +- Never skip hooks unless explicitly requested + +## Planning + +When planning work: +- Highlight which steps can be done independently +- Consider forking subagents for concurrent work on independent steps +- Write implementation plans to `plans/IMPL<n>.md` if the project uses + this convention + +## Flake Structure + +The flake.nix follows ppad conventions: +- Uses ppad-nixpkgs as base +- Follows references to avoid duplication +- Supports LLVM backend via cabal flag +- Provides devShell with ghc, cabal, cc, llvm diff --git a/LICENSE b/LICENSE @@ -0,0 +1,20 @@ +Copyright (c) 2025 Jared Tobin + +Permission is hereby granted, free of charge, to any person obtaining +a copy of this software and associated documentation files (the +"Software"), to deal in the Software without restriction, including +without limitation the rights to use, copy, modify, merge, publish, +distribute, sublicense, and/or sell copies of the Software, and to +permit persons to whom the Software is furnished to do so, subject to +the following conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE +LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION +OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. diff --git a/app/Main.hs b/app/Main.hs @@ -0,0 +1,87 @@ +{-# LANGUAGE OverloadedStrings #-} + +module Main where + +import Audit.AArch64 +import Data.Aeson (encode) +import qualified Data.ByteString.Lazy.Char8 as BL +import Data.Text (Text) +import qualified Data.Text as T +import qualified Data.Text.IO as TIO +import Options.Applicative +import System.Exit (exitFailure, exitSuccess) + +data Options = Options + { optInput :: !FilePath + , optJson :: !Bool + , optQuiet :: !Bool + } deriving (Eq, Show) + +optParser :: Parser Options +optParser = Options + <$> strOption + ( long "input" + <> short 'i' + <> metavar "FILE" + <> help "Input assembly file (.s)" + ) + <*> switch + ( long "json" + <> short 'j' + <> help "Output JSON format" + ) + <*> switch + ( long "quiet" + <> short 'q' + <> help "Suppress summary, only show violations" + ) + +optInfo :: ParserInfo Options +optInfo = info (optParser <**> helper) + ( fullDesc + <> progDesc "Audit AArch64 assembly for constant-time memory access" + <> header "auditor - CT memory access auditor for AArch64" + ) + +main :: IO () +main = do + opts <- execParser optInfo + result <- auditFile (optInput opts) + case result of + Left err -> do + TIO.putStrLn $ "Error: " <> err + exitFailure + Right ar -> + if optJson opts + then outputJson ar + else outputText opts ar + +outputJson :: AuditResult -> IO () +outputJson ar = BL.putStrLn (encode (arViolations ar)) + +outputText :: Options -> AuditResult -> IO () +outputText opts ar = do + let vs = arViolations ar + mapM_ printViolation vs + if optQuiet opts + then pure () + else do + TIO.putStrLn "" + TIO.putStrLn $ "Lines checked: " <> T.pack (show (arLinesChecked ar)) + TIO.putStrLn $ "Memory accesses: " <> T.pack (show (arMemoryAccesses ar)) + TIO.putStrLn $ "Violations: " <> T.pack (show (length vs)) + if null vs + then exitSuccess + else exitFailure + +printViolation :: Violation -> IO () +printViolation v = TIO.putStrLn $ + vSymbol v <> ":" <> T.pack (show (vLine v)) <> ": " <> reasonText (vReason v) + +reasonText :: ViolationReason -> Text +reasonText r = case r of + SecretBase reg -> "secret base register " <> regName reg + SecretIndex reg -> "secret index register " <> regName reg + UnknownBase reg -> "unknown base register " <> regName reg + UnknownIndex reg -> "unknown index register " <> regName reg + NonConstOffset -> "non-constant offset without masking" diff --git a/flake.nix b/flake.nix @@ -0,0 +1,56 @@ +{ + description = "AArch64 constant-time memory access auditor"; + + inputs = { + ppad-nixpkgs = { + type = "git"; + url = "git://git.ppad.tech/nixpkgs.git"; + ref = "master"; + }; + flake-utils.follows = "ppad-nixpkgs/flake-utils"; + nixpkgs.follows = "ppad-nixpkgs/nixpkgs"; + }; + + outputs = { self, nixpkgs, flake-utils, ppad-nixpkgs }: + flake-utils.lib.eachDefaultSystem (system: + let + lib = "ppad-auditor"; + + pkgs = import nixpkgs { inherit system; }; + hlib = pkgs.haskell.lib; + llvm = pkgs.llvmPackages_19.llvm; + + hpkgs = pkgs.haskell.packages.ghc910.extend (new: old: { + ${lib} = old.callCabal2nixWithOptions lib ./. "--enable-profiling" {}; + }); + + cc = pkgs.stdenv.cc; + ghc = hpkgs.ghc; + cabal = hpkgs.cabal-install; + in + { + packages.default = hpkgs.${lib}; + + packages.haddock = hpkgs.${lib}.doc; + + devShells.default = hpkgs.shellFor { + packages = p: [ + p.${lib} + ]; + + buildInputs = [ + cabal + cc + llvm + ]; + + shellHook = '' + PS1="[${lib}] \w$ " + echo "entering ${system} shell, using" + echo "ghc: $(${ghc}/bin/ghc --version)" + echo "cabal: $(${cabal}/bin/cabal --version)" + ''; + }; + } + ); +} diff --git a/lib/Audit/AArch64.hs b/lib/Audit/AArch64.hs @@ -0,0 +1,69 @@ +{-# OPTIONS_HADDOCK prune #-} +{-# LANGUAGE OverloadedStrings #-} + +-- | +-- Module: Audit.AArch64 +-- Copyright: (c) 2025 Jared Tobin +-- License: MIT +-- Maintainer: jared@ppad.tech +-- +-- AArch64 constant-time memory access auditor. +-- +-- This module provides static analysis for AArch64 GHC+LLVM assembly +-- to verify that memory accesses use only public (non-secret-derived) +-- addresses. This helps ensure constant-time properties for +-- cryptographic code. +-- +-- Example usage: +-- +-- @ +-- import Audit.AArch64 +-- import qualified Data.Text.IO as TIO +-- +-- main = do +-- src <- TIO.readFile "foo.s" +-- case audit "foo.s" src of +-- Left err -> putStrLn $ "Parse error: " ++ show err +-- Right result -> print result +-- @ + +module Audit.AArch64 ( + -- * Main API + audit + , auditFile + + -- * Results + , AuditResult(..) + , Violation(..) + , ViolationReason(..) + + -- * Re-exports + , ParseError + ) where + +import Audit.AArch64.CFG +import Audit.AArch64.Check +import Audit.AArch64.Parser +import Audit.AArch64.Types +import qualified Data.ByteString as BS +import Data.Text (Text) +import qualified Data.Text as T +import Data.Text.Encoding (decodeUtf8') + +-- | Audit assembly source for memory access violations. +audit :: Text -> Text -> Either ParseError AuditResult +audit name src = do + lns <- parseAsm src + let cfg = buildCFG lns + pure (checkCFG name cfg) + +-- | Audit an assembly file. +auditFile :: FilePath -> IO (Either Text AuditResult) +auditFile path = do + bs <- BS.readFile path + case decodeUtf8' bs of + Left err -> pure (Left (T.pack (show err))) + Right src -> + case audit (T.pack path) src of + Left err -> pure (Left (T.pack (show err))) + Right result -> pure (Right result) diff --git a/lib/Audit/AArch64/CFG.hs b/lib/Audit/AArch64/CFG.hs @@ -0,0 +1,124 @@ +{-# OPTIONS_HADDOCK prune #-} +{-# LANGUAGE OverloadedStrings #-} + +-- | +-- Module: Audit.AArch64.CFG +-- Copyright: (c) 2025 Jared Tobin +-- License: MIT +-- Maintainer: jared@ppad.tech +-- +-- Control flow graph construction for AArch64 assembly. + +module Audit.AArch64.CFG ( + BasicBlock(..) + , CFG(..) + , buildCFG + , blockLabels + ) where + +import Audit.AArch64.Types +import Data.Map.Strict (Map) +import qualified Data.Map.Strict as Map +import Data.Set (Set) +import qualified Data.Set as Set +import Data.Text (Text) + +-- | A basic block: a sequence of instructions with a single entry point. +data BasicBlock = BasicBlock + { bbLabel :: !(Maybe Text) -- ^ Optional label at block start + , bbLines :: ![Line] -- ^ Instructions in the block + , bbSuccs :: ![Text] -- ^ Successor block labels + } deriving (Eq, Show) + +-- | Control flow graph. +data CFG = CFG + { cfgBlocks :: ![BasicBlock] -- ^ All basic blocks + , cfgLabelMap :: !(Map Text Int) -- ^ Label -> block index + , cfgEntry :: !Int -- ^ Entry block index + } deriving (Eq, Show) + +-- | Get all labels defined in the CFG. +blockLabels :: CFG -> Set Text +blockLabels cfg = Set.fromList + [ lbl | bb <- cfgBlocks cfg, Just lbl <- [bbLabel bb] ] + +-- | Build a CFG from parsed assembly lines. +buildCFG :: [Line] -> CFG +buildCFG lns = CFG + { cfgBlocks = blocks + , cfgLabelMap = labelMap + , cfgEntry = 0 + } + where + -- Split into basic blocks at labels and control flow instructions + blocks = buildBlocks lns + labelMap = Map.fromList + [ (lbl, idx) + | (idx, bb) <- zip [0..] blocks + , Just lbl <- [bbLabel bb] + ] + +-- | Build basic blocks from lines. +buildBlocks :: [Line] -> [BasicBlock] +buildBlocks [] = [] +buildBlocks lns = go [] Nothing lns + where + go acc mLabel [] = + if null acc + then [] + else [finishBlock mLabel (reverse acc)] + + go acc mLabel (l:ls) = + case (lineLabel l, lineInstr l) of + -- New label starts a new block + (Just lbl, _) -> + let current = if null acc then [] else [finishBlock mLabel (reverse acc)] + in current ++ go [l] (Just lbl) ls + + -- Control flow instruction ends a block + (Nothing, Just instr) | isTerminator instr -> + let block = finishBlock mLabel (reverse (l:acc)) + in block : go [] Nothing ls + + -- Regular instruction + _ -> go (l:acc) mLabel ls + + finishBlock mLabel lns' = BasicBlock + { bbLabel = mLabel + , bbLines = lns' + , bbSuccs = successorLabels (lastInstr lns') + } + + lastInstr [] = Nothing + lastInstr xs = lineInstr (last xs) + +-- | Check if an instruction is a control flow terminator. +isTerminator :: Instr -> Bool +isTerminator instr = case instr of + B _ -> True + BCond _ _ -> True + Bl _ -> True + Blr _ -> True + Br _ -> True + Ret _ -> True + Cbz _ _ -> True + Cbnz _ _ -> True + Tbz _ _ _ -> True + Tbnz _ _ _ -> True + _ -> False + +-- | Extract successor labels from a terminating instruction. +successorLabels :: Maybe Instr -> [Text] +successorLabels Nothing = [] +successorLabels (Just instr) = case instr of + B lbl -> [lbl] + BCond _ lbl -> [lbl] -- Conditional also falls through + Bl lbl -> [lbl] -- Call returns, so also falls through + Blr _ -> [] -- Indirect call + Br _ -> [] -- Indirect branch + Ret _ -> [] -- Return + Cbz _ lbl -> [lbl] -- Plus fallthrough + Cbnz _ lbl -> [lbl] -- Plus fallthrough + Tbz _ _ lbl -> [lbl] -- Plus fallthrough + Tbnz _ _ lbl -> [lbl] -- Plus fallthrough + _ -> [] diff --git a/lib/Audit/AArch64/Check.hs b/lib/Audit/AArch64/Check.hs @@ -0,0 +1,132 @@ +{-# OPTIONS_HADDOCK prune #-} +{-# LANGUAGE OverloadedStrings #-} + +-- | +-- Module: Audit.AArch64.Check +-- Copyright: (c) 2025 Jared Tobin +-- License: MIT +-- Maintainer: jared@ppad.tech +-- +-- Memory access validation for constant-time properties. +-- +-- Checks that all memory accesses use public base registers and +-- constant (or properly masked) offsets. + +module Audit.AArch64.Check ( + checkLine + , checkBlock + , checkCFG + , AuditResult(..) + ) where + +import Audit.AArch64.CFG +import Audit.AArch64.Taint +import Audit.AArch64.Types +import Data.Text (Text) +import qualified Data.Text as T + +-- | Result of auditing assembly. +data AuditResult = AuditResult + { arViolations :: ![Violation] + , arLinesChecked :: !Int + , arMemoryAccesses :: !Int + } deriving (Eq, Show) + +instance Semigroup AuditResult where + a <> b = AuditResult + { arViolations = arViolations a ++ arViolations b + , arLinesChecked = arLinesChecked a + arLinesChecked b + , arMemoryAccesses = arMemoryAccesses a + arMemoryAccesses b + } + +instance Monoid AuditResult where + mempty = AuditResult [] 0 0 + +-- | Check a single line for memory access violations. +checkLine :: Text -> TaintState -> Line -> AuditResult +checkLine sym st l = case lineInstr l of + Nothing -> AuditResult [] 1 0 + Just instr -> case getMemoryAccess instr of + Nothing -> AuditResult [] 1 0 + Just addr -> + let violations = checkAddrMode sym (lineNum l) instr addr st + in AuditResult violations 1 1 + +-- | Check a basic block, threading taint state. +checkBlock :: Text -> TaintState -> [Line] -> (AuditResult, TaintState) +checkBlock sym st0 lns = go mempty st0 lns + where + go acc st [] = (acc, st) + go acc st (l:ls) = + let result = checkLine sym st l + st' = analyzeLine l st + in go (acc <> result) st' ls + +-- | Check entire CFG. +-- Uses a simple forward pass (no fixpoint iteration for now). +checkCFG :: Text -> CFG -> AuditResult +checkCFG sym cfg = mconcat + [ fst (checkBlock blockSym initTaintState (bbLines bb)) + | bb <- cfgBlocks cfg + , let blockSym = maybe sym id (bbLabel bb) + ] + +-- | Extract memory address from a load/store instruction. +getMemoryAccess :: Instr -> Maybe AddrMode +getMemoryAccess instr = case instr of + Ldr _ addr -> Just addr + Ldrb _ addr -> Just addr + Ldrh _ addr -> Just addr + Ldrsb _ addr -> Just addr + Ldrsh _ addr -> Just addr + Ldrsw _ addr -> Just addr + Str _ addr -> Just addr + Strb _ addr -> Just addr + Strh _ addr -> Just addr + Ldp _ _ addr -> Just addr + Stp _ _ addr -> Just addr + _ -> Nothing + +-- | Check an addressing mode for violations. +checkAddrMode :: Text -> Int -> Instr -> AddrMode -> TaintState -> [Violation] +checkAddrMode sym ln instr addr st = case addr of + BaseImm base _ -> + checkBase sym ln instr base st + + BaseReg base idx -> + checkBase sym ln instr base st ++ + checkIndex sym ln instr idx st + + BaseRegShift base idx _ -> + checkBase sym ln instr base st ++ + checkIndex sym ln instr idx st + + BaseRegExtend base idx _ -> + checkBase sym ln instr base st ++ + checkIndex sym ln instr idx st + + PreIndex base _ -> + checkBase sym ln instr base st + + PostIndex base _ -> + checkBase sym ln instr base st + + Literal _ -> + -- PC-relative literal: public + [] + +-- | Check that base register is public. +checkBase :: Text -> Int -> Instr -> Reg -> TaintState -> [Violation] +checkBase sym ln instr base st = + case getTaint base st of + Public -> [] + Secret -> [Violation sym ln instr (SecretBase base)] + Unknown -> [Violation sym ln instr (UnknownBase base)] + +-- | Check that index register is public. +checkIndex :: Text -> Int -> Instr -> Reg -> TaintState -> [Violation] +checkIndex sym ln instr idx st = + case getTaint idx st of + Public -> [] + Secret -> [Violation sym ln instr (SecretIndex idx)] + Unknown -> [Violation sym ln instr (UnknownIndex idx)] diff --git a/lib/Audit/AArch64/Parser.hs b/lib/Audit/AArch64/Parser.hs @@ -0,0 +1,463 @@ +{-# OPTIONS_HADDOCK prune #-} +{-# LANGUAGE OverloadedStrings #-} + +-- | +-- Module: Audit.AArch64.Parser +-- Copyright: (c) 2025 Jared Tobin +-- License: MIT +-- Maintainer: jared@ppad.tech +-- +-- Parser for AArch64 assembly. + +module Audit.AArch64.Parser ( + parseAsm + , ParseError + ) where + +import Audit.AArch64.Types +import Control.Monad (void) +import Data.Char (isAlphaNum, isDigit, isSpace) +import Data.Text (Text) +import qualified Data.Text as T +import Data.Void (Void) +import Text.Megaparsec +import Text.Megaparsec.Char +import qualified Text.Megaparsec.Char.Lexer as L + +type Parser = Parsec Void Text +type ParseError = ParseErrorBundle Text Void + +-- | Parse assembly source into a list of lines. +parseAsm :: Text -> Either ParseError [Line] +parseAsm = parse (many pLine <* eof) "<asm>" + +-- Space consumer (skips whitespace and comments). +sc :: Parser () +sc = L.space + space1 + (L.skipLineComment "//") + (L.skipBlockComment "/*" "*/") + +-- Lexeme wrapper. +lexeme :: Parser a -> Parser a +lexeme = L.lexeme sc + +-- Symbol (alphanumeric + underscore + dot + dollar). +symbol :: Text -> Parser Text +symbol = L.symbol sc + +-- Parse a single line. +pLine :: Parser Line +pLine = do + pos <- getSourcePos + let ln = unPos (sourceLine pos) + skipMany (satisfy isSpace) + choice + [ pDirectiveLine ln + , pLabelOrInstrLine ln + , pEmptyLine ln + ] + +pEmptyLine :: Int -> Parser Line +pEmptyLine ln = do + void (optional pComment) + void (optional eol) + pure (Line ln Nothing Nothing) + +pComment :: Parser () +pComment = do + void (choice [string "//", string ";", string "@"]) + void (takeWhileP Nothing (/= '\n')) + +pDirectiveLine :: Int -> Parser Line +pDirectiveLine ln = do + void (char '.') + void (takeWhileP Nothing (\c -> c /= '\n')) + void (optional eol) + pure (Line ln Nothing Nothing) + +pLabelOrInstrLine :: Int -> Parser Line +pLabelOrInstrLine ln = do + mLabel <- optional (try pLabel) + skipMany (satisfy isSpace) + mInstr <- optional pInstr + skipMany (satisfy isSpace) + void (optional pComment) + void (optional eol) + pure (Line ln mLabel mInstr) + +pLabel :: Parser Text +pLabel = lexeme $ do + name <- pIdentifier + void (char ':') + pure name + +pIdentifier :: Parser Text +pIdentifier = do + c <- satisfy isIdentStart + cs <- takeWhileP Nothing isIdentChar + pure (T.cons c cs) + where + isIdentStart c = c == '_' || c == '.' || c == 'L' || + (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') + isIdentChar c = isAlphaNum c || c == '_' || c == '.' || c == '$' + +pInstr :: Parser Instr +pInstr = do + mnem <- lexeme pMnemonic + parseByMnemonic mnem + +pMnemonic :: Parser Text +pMnemonic = do + cs <- takeWhile1P Nothing isIdentChar + pure (T.toLower cs) + where + isIdentChar c = isAlphaNum c || c == '.' + +parseByMnemonic :: Text -> Parser Instr +parseByMnemonic m = case m of + -- Data movement + "mov" -> Mov <$> pReg <*> (pComma *> pOperand) + "movz" -> Movz <$> pReg <*> (pComma *> pImm) <*> optional (pComma *> pShift) + "movk" -> Movk <$> pReg <*> (pComma *> pImm) <*> optional (pComma *> pShift) + "movn" -> Movn <$> pReg <*> (pComma *> pImm) <*> optional (pComma *> pShift) + + -- Arithmetic + "add" -> Add <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "sub" -> Sub <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "adds" -> Adds <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "subs" -> Subs <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "adc" -> Adc <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + "sbc" -> Sbc <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + "neg" -> Neg <$> pReg <*> (pComma *> pOperand) + "mul" -> Mul <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + "madd" -> Madd <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + <*> (pComma *> pReg) + "msub" -> Msub <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + <*> (pComma *> pReg) + "umulh" -> Umulh <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + "smulh" -> Smulh <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + "udiv" -> Udiv <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + "sdiv" -> Sdiv <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + + -- Logical + "and" -> And <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "orr" -> Orr <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "eor" -> Eor <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "bic" -> Bic <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "mvn" -> Mvn <$> pReg <*> (pComma *> pOperand) + "tst" -> Tst <$> pReg <*> (pComma *> pOperand) + + -- Shifts + "lsl" -> Lsl <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "lsr" -> Lsr <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "asr" -> Asr <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + "ror" -> Ror <$> pReg <*> (pComma *> pReg) <*> (pComma *> pOperand) + + -- Bit manipulation + "ubfx" -> Ubfx <$> pReg <*> (pComma *> pReg) <*> (pComma *> pInt) + <*> (pComma *> pInt) + "sbfx" -> Sbfx <$> pReg <*> (pComma *> pReg) <*> (pComma *> pInt) + <*> (pComma *> pInt) + "bfi" -> Bfi <$> pReg <*> (pComma *> pReg) <*> (pComma *> pInt) + <*> (pComma *> pInt) + "extr" -> Extr <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + <*> (pComma *> pInt) + + -- Address generation + "adr" -> Adr <$> pReg <*> (pComma *> pSymbolRef) + "adrp" -> Adrp <$> pReg <*> (pComma *> pSymbolRef) + + -- Load/Store + "ldr" -> Ldr <$> pReg <*> (pComma *> pAddrMode) + "ldrb" -> Ldrb <$> pReg <*> (pComma *> pAddrMode) + "ldrh" -> Ldrh <$> pReg <*> (pComma *> pAddrMode) + "ldrsb" -> Ldrsb <$> pReg <*> (pComma *> pAddrMode) + "ldrsh" -> Ldrsh <$> pReg <*> (pComma *> pAddrMode) + "ldrsw" -> Ldrsw <$> pReg <*> (pComma *> pAddrMode) + "str" -> Str <$> pReg <*> (pComma *> pAddrMode) + "strb" -> Strb <$> pReg <*> (pComma *> pAddrMode) + "strh" -> Strh <$> pReg <*> (pComma *> pAddrMode) + "ldp" -> Ldp <$> pReg <*> (pComma *> pReg) <*> (pComma *> pAddrMode) + "stp" -> Stp <$> pReg <*> (pComma *> pReg) <*> (pComma *> pAddrMode) + + -- Compare and select + "cmp" -> Cmp <$> pReg <*> (pComma *> pOperand) + "cmn" -> Cmn <$> pReg <*> (pComma *> pOperand) + "csel" -> Csel <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + <*> (pComma *> pCondCode) + "csinc" -> Csinc <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + <*> (pComma *> pCondCode) + "csinv" -> Csinv <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + <*> (pComma *> pCondCode) + "csneg" -> Csneg <$> pReg <*> (pComma *> pReg) <*> (pComma *> pReg) + <*> (pComma *> pCondCode) + "cset" -> Cset <$> pReg <*> (pComma *> pCondCode) + "cinc" -> Cinc <$> pReg <*> (pComma *> pReg) <*> (pComma *> pCondCode) + + -- Branches + "b" -> B <$> pSymbolRef + "b.eq" -> BCond "eq" <$> pSymbolRef + "b.ne" -> BCond "ne" <$> pSymbolRef + "b.lt" -> BCond "lt" <$> pSymbolRef + "b.le" -> BCond "le" <$> pSymbolRef + "b.gt" -> BCond "gt" <$> pSymbolRef + "b.ge" -> BCond "ge" <$> pSymbolRef + "b.hi" -> BCond "hi" <$> pSymbolRef + "b.hs" -> BCond "hs" <$> pSymbolRef + "b.lo" -> BCond "lo" <$> pSymbolRef + "b.ls" -> BCond "ls" <$> pSymbolRef + "b.mi" -> BCond "mi" <$> pSymbolRef + "b.pl" -> BCond "pl" <$> pSymbolRef + "b.vs" -> BCond "vs" <$> pSymbolRef + "b.vc" -> BCond "vc" <$> pSymbolRef + "b.cc" -> BCond "cc" <$> pSymbolRef + "b.cs" -> BCond "cs" <$> pSymbolRef + "bl" -> Bl <$> pSymbolRef + "blr" -> Blr <$> pReg + "br" -> Br <$> pReg + "ret" -> Ret <$> optional pReg + "cbz" -> Cbz <$> pReg <*> (pComma *> pSymbolRef) + "cbnz" -> Cbnz <$> pReg <*> (pComma *> pSymbolRef) + "tbz" -> Tbz <$> pReg <*> (pComma *> pInt) <*> (pComma *> pSymbolRef) + "tbnz" -> Tbnz <$> pReg <*> (pComma *> pInt) <*> (pComma *> pSymbolRef) + + -- System + "nop" -> pure Nop + "svc" -> Svc <$> pImm + + -- Fallback: store as Other + _ -> Other m <$> pRemainingOperands + +pRemainingOperands :: Parser [Text] +pRemainingOperands = do + rest <- takeWhileP Nothing (\c -> c /= '\n' && c /= ';' && c /= '/') + let trimmed = T.strip rest + if T.null trimmed + then pure [] + else pure (map T.strip (T.splitOn "," trimmed)) + +pComma :: Parser () +pComma = void (lexeme (char ',')) + +pReg :: Parser Reg +pReg = lexeme $ choice + [ pXReg, pWReg, pDReg, pSReg, pQReg, pSpecialReg ] + +pXReg :: Parser Reg +pXReg = do + void (char' 'x') + n <- L.decimal + case n :: Int of + 0 -> pure X0; 1 -> pure X1; 2 -> pure X2; 3 -> pure X3 + 4 -> pure X4; 5 -> pure X5; 6 -> pure X6; 7 -> pure X7 + 8 -> pure X8; 9 -> pure X9; 10 -> pure X10; 11 -> pure X11 + 12 -> pure X12; 13 -> pure X13; 14 -> pure X14; 15 -> pure X15 + 16 -> pure X16; 17 -> pure X17; 18 -> pure X18; 19 -> pure X19 + 20 -> pure X20; 21 -> pure X21; 22 -> pure X22; 23 -> pure X23 + 24 -> pure X24; 25 -> pure X25; 26 -> pure X26; 27 -> pure X27 + 28 -> pure X28; 29 -> pure X29; 30 -> pure X30 + _ -> fail "invalid x register number" + +pWReg :: Parser Reg +pWReg = do + void (char' 'w') + n <- L.decimal + case n :: Int of + 0 -> pure W0; 1 -> pure W1; 2 -> pure W2; 3 -> pure W3 + 4 -> pure W4; 5 -> pure W5; 6 -> pure W6; 7 -> pure W7 + 8 -> pure W8; 9 -> pure W9; 10 -> pure W10; 11 -> pure W11 + 12 -> pure W12; 13 -> pure W13; 14 -> pure W14; 15 -> pure W15 + 16 -> pure W16; 17 -> pure W17; 18 -> pure W18; 19 -> pure W19 + 20 -> pure W20; 21 -> pure W21; 22 -> pure W22; 23 -> pure W23 + 24 -> pure W24; 25 -> pure W25; 26 -> pure W26; 27 -> pure W27 + 28 -> pure W28; 29 -> pure W29; 30 -> pure W30 + _ -> fail "invalid w register number" + +pDReg :: Parser Reg +pDReg = do + void (char' 'd') + n <- L.decimal + case n :: Int of + 0 -> pure D0; 1 -> pure D1; 2 -> pure D2; 3 -> pure D3 + 4 -> pure D4; 5 -> pure D5; 6 -> pure D6; 7 -> pure D7 + 8 -> pure D8; 9 -> pure D9; 10 -> pure D10; 11 -> pure D11 + 12 -> pure D12; 13 -> pure D13; 14 -> pure D14; 15 -> pure D15 + 16 -> pure D16; 17 -> pure D17; 18 -> pure D18; 19 -> pure D19 + 20 -> pure D20; 21 -> pure D21; 22 -> pure D22; 23 -> pure D23 + 24 -> pure D24; 25 -> pure D25; 26 -> pure D26; 27 -> pure D27 + 28 -> pure D28; 29 -> pure D29; 30 -> pure D30; 31 -> pure D31 + _ -> fail "invalid d register number" + +pSReg :: Parser Reg +pSReg = do + void (char' 's') + n <- L.decimal + case n :: Int of + 0 -> pure S0; 1 -> pure S1; 2 -> pure S2; 3 -> pure S3 + 4 -> pure S4; 5 -> pure S5; 6 -> pure S6; 7 -> pure S7 + 8 -> pure S8; 9 -> pure S9; 10 -> pure S10; 11 -> pure S11 + 12 -> pure S12; 13 -> pure S13; 14 -> pure S14; 15 -> pure S15 + 16 -> pure S16; 17 -> pure S17; 18 -> pure S18; 19 -> pure S19 + 20 -> pure S20; 21 -> pure S21; 22 -> pure S22; 23 -> pure S23 + 24 -> pure S24; 25 -> pure S25; 26 -> pure S26; 27 -> pure S27 + 28 -> pure S28; 29 -> pure S29; 30 -> pure S30; 31 -> pure S31 + _ -> fail "invalid s register number" + +pQReg :: Parser Reg +pQReg = do + void (char' 'q') + n <- L.decimal + case n :: Int of + 0 -> pure Q0; 1 -> pure Q1; 2 -> pure Q2; 3 -> pure Q3 + 4 -> pure Q4; 5 -> pure Q5; 6 -> pure Q6; 7 -> pure Q7 + 8 -> pure Q8; 9 -> pure Q9; 10 -> pure Q10; 11 -> pure Q11 + 12 -> pure Q12; 13 -> pure Q13; 14 -> pure Q14; 15 -> pure Q15 + 16 -> pure Q16; 17 -> pure Q17; 18 -> pure Q18; 19 -> pure Q19 + 20 -> pure Q20; 21 -> pure Q21; 22 -> pure Q22; 23 -> pure Q23 + 24 -> pure Q24; 25 -> pure Q25; 26 -> pure Q26; 27 -> pure Q27 + 28 -> pure Q28; 29 -> pure Q29; 30 -> pure Q30; 31 -> pure Q31 + _ -> fail "invalid q register number" + +pSpecialReg :: Parser Reg +pSpecialReg = choice + [ SP <$ string' "sp" + , XZR <$ string' "xzr" + , WZR <$ string' "wzr" + , X29 <$ string' "fp" -- fp is alias for x29 + , X30 <$ string' "lr" -- lr is alias for x30 + ] + +pOperand :: Parser Operand +pOperand = choice + [ try (OpAddr <$> pAddrMode) + , try pShiftedRegOp + , OpImm <$> pImm + , OpReg <$> pReg + , OpLabel <$> pSymbolRef + ] + +pShiftedRegOp :: Parser Operand +pShiftedRegOp = do + r <- pReg + pComma + sh <- pShift + pure (OpShiftedReg r sh) + +pImm :: Parser Integer +pImm = lexeme $ do + void (optional (char '#')) + signed L.decimal + +signed :: Parser Integer -> Parser Integer +signed p = do + neg <- optional (char '-') + n <- p + case neg of + Just _ -> pure (negate n) + Nothing -> pure n + +pInt :: Parser Int +pInt = lexeme $ do + void (optional (char '#')) + fromInteger <$> signed L.decimal + +pShift :: Parser Shift +pShift = lexeme $ choice + [ LSL <$> (string' "lsl" *> sc *> pShiftAmt) + , LSR <$> (string' "lsr" *> sc *> pShiftAmt) + , ASR <$> (string' "asr" *> sc *> pShiftAmt) + ] + +pShiftAmt :: Parser Int +pShiftAmt = do + void (optional (char '#')) + fromInteger <$> L.decimal + +pSymbolRef :: Parser Text +pSymbolRef = lexeme $ do + -- Handle =literal, @PAGE, @PAGEOFF suffixes, etc. + prefix <- optional (char '=') + name <- pIdentifier + suffix <- optional pSymbolSuffix + let base = maybe name (T.cons '=') prefix <> name + pure (maybe base (base <>) suffix) + +pSymbolSuffix :: Parser Text +pSymbolSuffix = do + void (char '@') + suf <- takeWhile1P Nothing isAlphaNum + pure (T.cons '@' suf) + +pCondCode :: Parser Text +pCondCode = lexeme $ takeWhile1P Nothing isAlphaNum + +pAddrMode :: Parser AddrMode +pAddrMode = choice + [ try pLiteralAddr + , pBracketAddr + ] + +pLiteralAddr :: Parser AddrMode +pLiteralAddr = Literal <$> (char '=' *> pIdentifier) + +pBracketAddr :: Parser AddrMode +pBracketAddr = lexeme $ do + void (char '[') + sc + base <- pReg + sc + mode <- optional (pComma *> pAddrModeRest base) + sc + void (char ']') + -- Check for pre/post index + postIdx <- optional pPostIndex + case (mode, postIdx) of + (Nothing, Nothing) -> pure (BaseImm base 0) + (Nothing, Just imm) -> pure (PostIndex base imm) + (Just (Left imm, True), Nothing) -> pure (PreIndex base imm) + (Just (Left imm, False), Nothing) -> pure (BaseImm base imm) + (Just (Right (r, Nothing), _), Nothing) -> pure (BaseReg base r) + (Just (Right (r, Just (Left sh)), _), Nothing) -> pure (BaseRegShift base r sh) + (Just (Right (r, Just (Right ext)), _), Nothing) -> pure (BaseRegExtend base r ext) + _ -> fail "invalid addressing mode" + +pAddrModeRest :: Reg -> Parser (Either Integer (Reg, Maybe (Either Shift Extend)), Bool) +pAddrModeRest _ = do + choice + [ try pImmOffset + , pRegOffset + ] + +pImmOffset :: Parser (Either Integer (Reg, Maybe (Either Shift Extend)), Bool) +pImmOffset = do + imm <- pImm + preIdx <- optional (char '!') + pure (Left imm, maybe False (const True) preIdx) + +pRegOffset :: Parser (Either Integer (Reg, Maybe (Either Shift Extend)), Bool) +pRegOffset = do + r <- pReg + modifier <- optional (pComma *> pShiftOrExtend) + pure (Right (r, modifier), False) + +pShiftOrExtend :: Parser (Either Shift Extend) +pShiftOrExtend = choice + [ Left <$> pShift + , Right <$> pExtend + ] + +pExtend :: Parser Extend +pExtend = lexeme $ choice + [ UXTW <$ string' "uxtw" + , SXTW <$ string' "sxtw" + , UXTX <$ string' "uxtx" + , SXTX <$ string' "sxtx" + ] + +pPostIndex :: Parser Integer +pPostIndex = do + sc + void (char ',') + sc + pImm diff --git a/lib/Audit/AArch64/Taint.hs b/lib/Audit/AArch64/Taint.hs @@ -0,0 +1,215 @@ +{-# OPTIONS_HADDOCK prune #-} +{-# LANGUAGE OverloadedStrings #-} + +-- | +-- Module: Audit.AArch64.Taint +-- Copyright: (c) 2025 Jared Tobin +-- License: MIT +-- Maintainer: jared@ppad.tech +-- +-- Taint analysis for AArch64 registers. +-- +-- Implements forward dataflow analysis to track register "publicness". +-- Public registers are those derived from known-safe sources (stack +-- pointers, heap pointers, constants). Registers with unknown or +-- secret-derived values are flagged when used in memory addressing. + +module Audit.AArch64.Taint ( + TaintState + , initTaintState + , analyzeLine + , analyzeBlock + , getTaint + , publicRoots + ) where + +import Audit.AArch64.Types +import Data.Map.Strict (Map) +import qualified Data.Map.Strict as Map + +-- | Taint state: maps registers to their publicness. +type TaintState = Map Reg Taint + +-- | GHC 9.10.3 AArch64 public root registers. +-- +-- These are known to be public (derived from stack/heap pointers or +-- constants): +-- +-- - SP: hardware stack pointer +-- - X29/FP: frame pointer +-- - X19: GHC Base register +-- - X20: GHC Sp (STG stack pointer) +-- - X21: GHC Hp (heap pointer) +-- - X28: GHC SpLim (stack limit) +-- - X18: TLS/platform register (Darwin) +-- - XZR/WZR: zero registers +publicRoots :: [Reg] +publicRoots = + [ SP, X29 -- Hardware stack/frame pointers + , X19 -- GHC Base + , X20 -- GHC Sp (STG stack pointer) + , X21 -- GHC Hp (heap pointer) + , X28 -- GHC SpLim + , X18 -- TLS (Darwin platform register) + , XZR, WZR -- Zero registers + ] + +-- | Initial taint state with public roots marked. +initTaintState :: TaintState +initTaintState = Map.fromList [(r, Public) | r <- publicRoots] + +-- | Get the taint of a register. +getTaint :: Reg -> TaintState -> Taint +getTaint r st = Map.findWithDefault Unknown r st + +-- | Analyze a single line, updating taint state. +analyzeLine :: Line -> TaintState -> TaintState +analyzeLine l st = case lineInstr l of + Nothing -> st + Just instr -> transfer instr st + +-- | Analyze a basic block, threading taint state through. +analyzeBlock :: [Line] -> TaintState -> TaintState +analyzeBlock lns st = foldl (flip analyzeLine) st lns + +-- | Transfer function for taint analysis. +-- +-- For each instruction, determine how it affects register taints. +transfer :: Instr -> TaintState -> TaintState +transfer instr st = case instr of + -- Move: destination gets source taint + Mov dst op -> setTaint dst (operandTaint op st) st + Movz dst _ _ -> setTaint dst Public st -- Immediate is public + Movk dst _ _ -> st -- Keeps existing value, modifies bits + Movn dst _ _ -> setTaint dst Public st -- Immediate is public + + -- Arithmetic: result is join of operand taints + Add dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Sub dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Adds dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Subs dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Adc dst r1 r2 -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Sbc dst r1 r2 -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Neg dst op -> setTaint dst (operandTaint op st) st + Mul dst r1 r2 -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Madd dst r1 r2 r3 -> setTaint dst (join3 (getTaint r1 st) (getTaint r2 st) + (getTaint r3 st)) st + Msub dst r1 r2 r3 -> setTaint dst (join3 (getTaint r1 st) (getTaint r2 st) + (getTaint r3 st)) st + Umulh dst r1 r2 -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Smulh dst r1 r2 -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Udiv dst r1 r2 -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Sdiv dst r1 r2 -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + + -- Logical: result is join of operand taints + And dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Orr dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Eor dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Bic dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Mvn dst op -> setTaint dst (operandTaint op st) st + Tst _ _ -> st -- No destination + + -- Shifts: result is join of operand taints + Lsl dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Lsr dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Asr dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + Ror dst r1 op -> setTaint dst (join2 (getTaint r1 st) (operandTaint op st)) st + + -- Bit manipulation + Ubfx dst r1 _ _ -> setTaint dst (getTaint r1 st) st + Sbfx dst r1 _ _ -> setTaint dst (getTaint r1 st) st + Bfi dst r1 _ _ -> setTaint dst (join2 (getTaint dst st) (getTaint r1 st)) st + Extr dst r1 r2 _ -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + + -- Address generation: result is public (constant pool / PC-relative) + Adr dst _ -> setTaint dst Public st + Adrp dst _ -> setTaint dst Public st + + -- Loads: destination becomes Unknown (memory content is unknown) + -- This is conservative; could refine if we track memory regions + Ldr dst _ -> setTaint dst Unknown st + Ldrb dst _ -> setTaint dst Unknown st + Ldrh dst _ -> setTaint dst Unknown st + Ldrsb dst _ -> setTaint dst Unknown st + Ldrsh dst _ -> setTaint dst Unknown st + Ldrsw dst _ -> setTaint dst Unknown st + Ldp dst1 dst2 _ -> setTaint dst1 Unknown (setTaint dst2 Unknown st) + + -- Stores: no destination register change + Str _ _ -> st + Strb _ _ -> st + Strh _ _ -> st + Stp _ _ _ -> st + + -- Conditionals: conservative join + Cmp _ _ -> st + Cmn _ _ -> st + Csel dst r1 r2 _ -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Csinc dst r1 r2 _ -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Csinv dst r1 r2 _ -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Csneg dst r1 r2 _ -> setTaint dst (join2 (getTaint r1 st) (getTaint r2 st)) st + Cset dst _ -> setTaint dst Public st -- Condition flag derived + Cinc dst r1 _ -> setTaint dst (getTaint r1 st) st + + -- Branches: no register change + B _ -> st + BCond _ _ -> st + Bl _ -> invalidateCallerSaved st -- Call may clobber + Blr _ -> invalidateCallerSaved st + Br _ -> st + Ret _ -> st + Cbz _ _ -> st + Cbnz _ _ -> st + Tbz _ _ _ -> st + Tbnz _ _ _ -> st + + -- System + Nop -> st + Svc _ -> invalidateCallerSaved st + + -- Unknown instruction: conservative + Other _ _ -> st + +-- | Set taint for a register. +setTaint :: Reg -> Taint -> TaintState -> TaintState +setTaint = Map.insert + +-- | Get taint of an operand. +operandTaint :: Operand -> TaintState -> Taint +operandTaint op st = case op of + OpReg r -> getTaint r st + OpImm _ -> Public + OpShiftedReg r _ -> getTaint r st + OpExtendedReg r _ -> getTaint r st + OpLabel _ -> Public + OpAddr addr -> addrBaseTaint addr st + +-- | Get taint of address base register. +addrBaseTaint :: AddrMode -> TaintState -> Taint +addrBaseTaint addr st = case addr of + BaseImm r _ -> getTaint r st + BaseReg r _ -> getTaint r st + BaseRegShift r _ _ -> getTaint r st + BaseRegExtend r _ _ -> getTaint r st + PreIndex r _ -> getTaint r st + PostIndex r _ -> getTaint r st + Literal _ -> Public + +-- | Join two taints. +join2 :: Taint -> Taint -> Taint +join2 = joinTaint + +-- | Join three taints. +join3 :: Taint -> Taint -> Taint -> Taint +join3 a b c = joinTaint a (joinTaint b c) + +-- | Invalidate caller-saved registers after a call. +-- Per AArch64 ABI, x0-x17 are caller-saved. +invalidateCallerSaved :: TaintState -> TaintState +invalidateCallerSaved st = foldr (\r -> Map.insert r Unknown) st callerSaved + where + callerSaved = + [ X0, X1, X2, X3, X4, X5, X6, X7 + , X8, X9, X10, X11, X12, X13, X14, X15 + , X16, X17 + ] diff --git a/lib/Audit/AArch64/Types.hs b/lib/Audit/AArch64/Types.hs @@ -0,0 +1,271 @@ +{-# OPTIONS_HADDOCK prune #-} +{-# LANGUAGE DeriveGeneric #-} +{-# LANGUAGE OverloadedStrings #-} + +-- | +-- Module: Audit.AArch64.Types +-- Copyright: (c) 2025 Jared Tobin +-- License: MIT +-- Maintainer: jared@ppad.tech +-- +-- Core types for AArch64 assembly analysis. + +module Audit.AArch64.Types ( + -- * Registers + Reg(..) + , regName + + -- * Operands and addressing + , Shift(..) + , Extend(..) + , Operand(..) + , AddrMode(..) + + -- * Instructions + , Instr(..) + , Line(..) + + -- * Taint lattice + , Taint(..) + , joinTaint + + -- * Analysis results + , Violation(..) + , ViolationReason(..) + ) where + +import Data.Aeson (ToJSON(..), (.=), object) +import Data.Text (Text) +import qualified Data.Text as T +import GHC.Generics (Generic) + +-- | AArch64 registers (64-bit, 32-bit views, and SIMD). +data Reg + -- 64-bit general purpose + = X0 | X1 | X2 | X3 | X4 | X5 | X6 | X7 + | X8 | X9 | X10 | X11 | X12 | X13 | X14 | X15 + | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 + | X24 | X25 | X26 | X27 | X28 | X29 | X30 + -- 32-bit views + | W0 | W1 | W2 | W3 | W4 | W5 | W6 | W7 + | W8 | W9 | W10 | W11 | W12 | W13 | W14 | W15 + | W16 | W17 | W18 | W19 | W20 | W21 | W22 | W23 + | W24 | W25 | W26 | W27 | W28 | W29 | W30 + -- Special + | SP -- ^ Stack pointer + | XZR -- ^ Zero register (64-bit) + | WZR -- ^ Zero register (32-bit) + -- SIMD/FP (64-bit double) + | D0 | D1 | D2 | D3 | D4 | D5 | D6 | D7 + | D8 | D9 | D10 | D11 | D12 | D13 | D14 | D15 + | D16 | D17 | D18 | D19 | D20 | D21 | D22 | D23 + | D24 | D25 | D26 | D27 | D28 | D29 | D30 | D31 + -- SIMD/FP (32-bit single) + | S0 | S1 | S2 | S3 | S4 | S5 | S6 | S7 + | S8 | S9 | S10 | S11 | S12 | S13 | S14 | S15 + | S16 | S17 | S18 | S19 | S20 | S21 | S22 | S23 + | S24 | S25 | S26 | S27 | S28 | S29 | S30 | S31 + -- SIMD/FP (128-bit quad) + | Q0 | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 + | Q8 | Q9 | Q10 | Q11 | Q12 | Q13 | Q14 | Q15 + | Q16 | Q17 | Q18 | Q19 | Q20 | Q21 | Q22 | Q23 + | Q24 | Q25 | Q26 | Q27 | Q28 | Q29 | Q30 | Q31 + deriving (Eq, Ord, Show, Generic) + +instance ToJSON Reg where + toJSON = toJSON . regName + +-- | Canonical lowercase register name. +regName :: Reg -> Text +regName r = T.toLower (T.pack (show r)) + +-- | Shift operations for indexed addressing. +data Shift + = LSL !Int -- ^ Logical shift left + | LSR !Int -- ^ Logical shift right + | ASR !Int -- ^ Arithmetic shift right + deriving (Eq, Show, Generic) + +instance ToJSON Shift + +-- | Extend operations. +data Extend + = UXTW -- ^ Unsigned extend word + | SXTW -- ^ Signed extend word + | UXTX -- ^ Unsigned extend doubleword + | SXTX -- ^ Signed extend doubleword + deriving (Eq, Show, Generic) + +instance ToJSON Extend + +-- | Instruction operand. +data Operand + = OpReg !Reg -- ^ Register operand + | OpImm !Integer -- ^ Immediate value + | OpShiftedReg !Reg !Shift -- ^ Shifted register + | OpExtendedReg !Reg !Extend -- ^ Extended register + | OpLabel !Text -- ^ Label reference + | OpAddr !AddrMode -- ^ Memory address + deriving (Eq, Show, Generic) + +instance ToJSON Operand + +-- | Memory addressing modes. +data AddrMode + = BaseImm !Reg !Integer -- ^ [xN, #imm] or [xN] + | BaseReg !Reg !Reg -- ^ [xN, xM] + | BaseRegShift !Reg !Reg !Shift -- ^ [xN, xM, lsl #k] + | BaseRegExtend !Reg !Reg !Extend -- ^ [xN, xM, sxtw] + | PreIndex !Reg !Integer -- ^ [xN, #imm]! + | PostIndex !Reg !Integer -- ^ [xN], #imm + | Literal !Text -- ^ =label or PC-relative + deriving (Eq, Show, Generic) + +instance ToJSON AddrMode + +-- | Parsed assembly instruction. +data Instr + -- Data movement + = Mov !Reg !Operand + | Movz !Reg !Integer !(Maybe Shift) + | Movk !Reg !Integer !(Maybe Shift) + | Movn !Reg !Integer !(Maybe Shift) + -- Arithmetic + | Add !Reg !Reg !Operand + | Sub !Reg !Reg !Operand + | Adds !Reg !Reg !Operand + | Subs !Reg !Reg !Operand + | Adc !Reg !Reg !Reg + | Sbc !Reg !Reg !Reg + | Neg !Reg !Operand + | Mul !Reg !Reg !Reg + | Madd !Reg !Reg !Reg !Reg + | Msub !Reg !Reg !Reg !Reg + | Umulh !Reg !Reg !Reg + | Smulh !Reg !Reg !Reg + | Udiv !Reg !Reg !Reg + | Sdiv !Reg !Reg !Reg + -- Logical + | And !Reg !Reg !Operand + | Orr !Reg !Reg !Operand + | Eor !Reg !Reg !Operand + | Bic !Reg !Reg !Operand + | Mvn !Reg !Operand + | Tst !Reg !Operand + -- Shifts + | Lsl !Reg !Reg !Operand + | Lsr !Reg !Reg !Operand + | Asr !Reg !Reg !Operand + | Ror !Reg !Reg !Operand + -- Bit manipulation + | Ubfx !Reg !Reg !Int !Int + | Sbfx !Reg !Reg !Int !Int + | Bfi !Reg !Reg !Int !Int + | Extr !Reg !Reg !Reg !Int + -- Address generation + | Adr !Reg !Text + | Adrp !Reg !Text + -- Load/Store + | Ldr !Reg !AddrMode + | Ldrb !Reg !AddrMode + | Ldrh !Reg !AddrMode + | Ldrsb !Reg !AddrMode + | Ldrsh !Reg !AddrMode + | Ldrsw !Reg !AddrMode + | Str !Reg !AddrMode + | Strb !Reg !AddrMode + | Strh !Reg !AddrMode + | Ldp !Reg !Reg !AddrMode + | Stp !Reg !Reg !AddrMode + -- Compare and select + | Cmp !Reg !Operand + | Cmn !Reg !Operand + | Csel !Reg !Reg !Reg !Text -- condition code + | Csinc !Reg !Reg !Reg !Text + | Csinv !Reg !Reg !Reg !Text + | Csneg !Reg !Reg !Reg !Text + | Cset !Reg !Text + | Cinc !Reg !Reg !Text + -- Branches + | B !Text + | BCond !Text !Text -- condition, target + | Bl !Text + | Blr !Reg + | Br !Reg + | Ret !(Maybe Reg) + | Cbz !Reg !Text + | Cbnz !Reg !Text + | Tbz !Reg !Int !Text + | Tbnz !Reg !Int !Text + -- System + | Nop + | Svc !Integer + -- Other (unparsed but recorded) + | Other !Text ![Text] + deriving (Eq, Show, Generic) + +instance ToJSON Instr + +-- | A line of assembly (with source location). +data Line = Line + { lineNum :: !Int + , lineLabel :: !(Maybe Text) + , lineInstr :: !(Maybe Instr) + } deriving (Eq, Show, Generic) + +instance ToJSON Line + +-- | Taint lattice for register publicness. +data Taint + = Public -- ^ Known to be public (derived from stack/heap pointers) + | Secret -- ^ Known or assumed to be secret + | Unknown -- ^ Not yet determined + deriving (Eq, Ord, Show, Generic) + +instance ToJSON Taint + +-- | Join operation for taint lattice (least upper bound). +-- Unknown < Public < Secret +joinTaint :: Taint -> Taint -> Taint +joinTaint Secret _ = Secret +joinTaint _ Secret = Secret +joinTaint Public _ = Public +joinTaint _ Public = Public +joinTaint Unknown Unknown = Unknown + +-- | Reason for a violation. +data ViolationReason + = SecretBase !Reg -- ^ Base register is secret + | SecretIndex !Reg -- ^ Index register is secret + | UnknownBase !Reg -- ^ Base register taint unknown + | UnknownIndex !Reg -- ^ Index register taint unknown + | NonConstOffset -- ^ Non-constant offset without masking + deriving (Eq, Show, Generic) + +instance ToJSON ViolationReason where + toJSON (SecretBase r) = + object ["type" .= ("secret_base" :: Text), "register" .= r] + toJSON (SecretIndex r) = + object ["type" .= ("secret_index" :: Text), "register" .= r] + toJSON (UnknownBase r) = + object ["type" .= ("unknown_base" :: Text), "register" .= r] + toJSON (UnknownIndex r) = + object ["type" .= ("unknown_index" :: Text), "register" .= r] + toJSON NonConstOffset = + object ["type" .= ("non_const_offset" :: Text)] + +-- | A memory access violation. +data Violation = Violation + { vSymbol :: !Text + , vLine :: !Int + , vInstr :: !Instr + , vReason :: !ViolationReason + } deriving (Eq, Show, Generic) + +instance ToJSON Violation where + toJSON v = object + [ "symbol" .= vSymbol v + , "line" .= vLine v + , "instr" .= show (vInstr v) + , "reason" .= vReason v + ] diff --git a/plans/ARCH0.md b/plans/ARCH0.md @@ -0,0 +1,101 @@ +# ARCH0: AArch64 ASM Constant-Time Memory Access Audit + +## Goal + +Prototype a static analyzer that ingests GHC+LLVM aarch64 assembly +and proves a narrow property: all memory accesses are based on public +registers plus constant offsets (or other whitelisted addressing +modes), and no loads/stores use secret-derived addresses. + +This is an automated sanity check, not a full constant-time proof. + +## Scope + +- Target: aarch64-darwin GHC output (LLVM backend), no inline asm. +- Input: text assembly dumps (.s or llc output) with symbols. +- Property: memory access address derivations are public and bounded. +- Non-goals: micro-architectural timing proof, full CT branching proof. + +## Assumptions + +- GHC calling convention for aarch64 is stable enough to whitelist + public base registers (e.g. sp, STG stack/heap pointers, RTS regs). +- LLVM does not introduce secret-dependent memory addressing without + explicit arithmetic in the asm. +- External calls are treated as opaque unless summarized. + +## Prototype Architecture + +1) Parser +- Tokenize instructions, labels, and directives from .s. +- Extract instruction mnemonics and operands. +- Normalize addressing modes: [xN, #imm], [xN, xM], [xN, xM, lsl #k]. + +2) CFG and basic blocks +- Identify basic blocks by labels and control flow instructions. +- Build a simple CFG for forward dataflow; no SSA needed. + +3) Taint lattice (publicness) +- States per register: Public, Secret, Unknown. +- Initialize public roots: sp, fp, STG stack/heap pointers, TLS regs, + and known constant pools; all others Unknown. +- Propagate for arithmetic and moves (e.g. add, sub, mov, lsl, orr). +- Conservative joins across CFG edges. + +4) Memory access rules +- For each load/store: + - Base register must be Public. + - Index register (if any) must be Public and either constant or + masked to a fixed range (configurable). + - Offset must be constant immediate unless whitelisted pattern. +- Flag any instruction that violates the above. + +5) Whitelisting and summaries +- Config file for: + - public root registers per platform and GHC version. + - allowed addressing modes. + - known-safe helper symbols (e.g. RTS/GC stubs). + +6) Reporting +- Emit a report with: + - first offending instruction per symbol, + - reason (base not public, non-const offset, etc.), + - suggested remediation. + +## Prototype Rule Set (initial) + +- Public roots: sp, fp/x29, x28 (STG stack ptr), x27 (STG heap ptr), + x19-x26 when used as GHC-prescribed callee-saves, x18 (TLS). +- Public constants: literal pools, constant addresses from adrp/add. +- Allowed addr modes: + - [xN, #imm] + - [xN, xM] only if xM is Public and range-checked via and/lsr mask + - [xN, xM, lsl #k] only if xM is Public and range-checked + +## Prototype Flow + +- Input: .s file +- Output: JSON + text summary +- CLI: arch-audit --ghc-aarch64 --input foo.s --config cfg.toml + +## Validation Plan + +- Create small, controlled asm fixtures: + - known-good (stack and GHC heap accesses only) + - known-bad (secret-derived base or index) +- Compare findings to manual inspection of ppad-fixed and ppad-secp256k1 + dumps; check for false positives and negatives. + +## Risks and Limitations + +- Register reuse across calls makes publicness hard to track without + interprocedural summaries. +- LLVM may fold complex addressing in ways that require more semantics. +- This does not prove constant-time; it only enforces a strict memory + access policy. + +## Next Steps + +- Confirm the exact GHC aarch64 register roles to seed the whitelist. +- Pick a parsing strategy (simple regex vs. small grammar). +- Implement a minimal dataflow engine and run on one target dump. diff --git a/plans/IMPL0.md b/plans/IMPL0.md @@ -0,0 +1,67 @@ +# IMPL0: AArch64 ASM Constant-Time Memory Access Audit Tool + +## Goal + +Implement a Haskell (GHC 9.10.3) prototype static analyzer for +AArch64 GHC+LLVM assembly that enforces a strict memory-access policy +(public base register + constant offset or whitelisted addressing). + +## Inputs and References + +- Model project structure, flake layout, and cabal style after + `../sha256/flake.nix` and `../sha256/ppad-sha256.cabal`. +- Figure out the register map to use for GHC 9.10.3 by locating the + public AArch64 calling-convention details in GHC headers. + +## Implementation Steps + +1) Scaffold project layout +- Mirror ppad project structure (lib/, test/, bench/, etc.). +- Draft `flake.nix` patterned after `../sha256/flake.nix`, tailored to + this tool (GHC 9.10.3, LLVM 19, cabal). +- Draft the `.cabal` file patterned after `../sha256/ppad-sha256.cabal`. + +2) Register map research (GHC 9.10.3) +- Locate the AArch64 calling convention definitions in GHC source, + likely in RTS headers or codegen headers (e.g. `rts/include/` or + `compiler/`), and confirm register roles for: + - STG stack pointer, heap pointer, TLS, base reg, etc. +- Produce a concrete register whitelist and document it in code and + README. + +3) Parser and instruction model +- Implement a minimal assembly tokenizer/parser for AArch64 `.s`. +- Normalize addressing modes into a small sum type. +- Represent labels, instructions, and operands with explicit types. + +4) CFG and dataflow engine +- Build basic blocks and a CFG. +- Implement a simple forward dataflow analysis with a 3-point lattice: + Public | Secret | Unknown. +- Initialize public roots using the GHC register map. + +5) Memory access audit rules +- Check all loads/stores for base/index publicness and constant offset. +- Add a configurable whitelist for safe symbols and addressing patterns. +- Emit structured findings (symbol, line, reason). + +6) CLI and reporting +- Provide a small CLI (input asm path, config path, output format). +- Emit both a human summary and JSON output for automation. + +7) Tests and fixtures +- Create small assembly fixtures with known-good and known-bad accesses. +- Add unit tests for parser, CFG builder, and audit rules. + +## Deliverables + +- Haskell library with core analysis logic in `lib/`. +- CLI executable in `app/`. +- Initial config file (TOML or YAML) with register map and whitelist. +- Tests under `test/` and minimal fixtures under `test/fixtures/`. + +## Open Questions + +- Exact GHC 9.10.3 AArch64 register map and public-root definitions. +- Whether to depend on a parsing library (megaparsec) or keep it minimal + with hand-rolled parsing to avoid extra deps. diff --git a/ppad-auditor.cabal b/ppad-auditor.cabal @@ -0,0 +1,67 @@ +cabal-version: 3.0 +name: ppad-auditor +version: 0.1.0 +synopsis: AArch64 constant-time memory access auditor +license: MIT +license-file: LICENSE +author: Jared Tobin +maintainer: jared@ppad.tech +category: Development +build-type: Simple +tested-with: GHC == 9.10.3 +description: + Static analyzer for AArch64 GHC+LLVM assembly that enforces a strict + memory-access policy (public base register + constant offset) to help + verify constant-time properties. + +source-repository head + type: git + location: git.ppad.tech/auditor.git + +library + default-language: Haskell2010 + hs-source-dirs: lib + ghc-options: + -Wall + exposed-modules: + Audit.AArch64 + Audit.AArch64.Types + Audit.AArch64.Parser + Audit.AArch64.CFG + Audit.AArch64.Taint + Audit.AArch64.Check + build-depends: + base >= 4.9 && < 5 + , bytestring >= 0.9 && < 0.13 + , containers >= 0.6 && < 0.8 + , megaparsec >= 9.0 && < 10 + , text >= 1.2 && < 2.2 + , aeson >= 2.0 && < 2.3 + +executable auditor + default-language: Haskell2010 + hs-source-dirs: app + main-is: Main.hs + ghc-options: + -Wall -O2 -threaded + build-depends: + base + , bytestring + , optparse-applicative >= 0.16 && < 0.19 + , ppad-auditor + , text + +test-suite auditor-tests + type: exitcode-stdio-1.0 + default-language: Haskell2010 + hs-source-dirs: test + main-is: Main.hs + ghc-options: + -rtsopts -Wall -O2 + build-depends: + base + , bytestring + , ppad-auditor + , tasty + , tasty-hunit + , text diff --git a/test/Main.hs b/test/Main.hs @@ -0,0 +1,168 @@ +{-# LANGUAGE OverloadedStrings #-} + +module Main where + +import Audit.AArch64 +import Audit.AArch64.Parser +import Audit.AArch64.Taint +import Audit.AArch64.Types +import Data.Text (Text) +import qualified Data.Text as T +import Test.Tasty +import Test.Tasty.HUnit + +main :: IO () +main = defaultMain $ testGroup "ppad-auditor" [ + parserTests + , taintTests + , auditTests + ] + +-- Parser tests + +parserTests :: TestTree +parserTests = testGroup "Parser" [ + testCase "parse label" $ do + let src = "foo:\n" + case parseAsm src of + Left _ -> assertFailure "parse failed" + Right lns -> do + assertEqual "line count" 1 (length lns) + assertEqual "label" (Just "foo") (lineLabel (head lns)) + + , testCase "parse ldr base+imm" $ do + let src = "ldr x0, [x20, #8]\n" + case parseAsm src of + Left _ -> assertFailure "parse failed" + Right lns -> case lineInstr (head lns) of + Just (Ldr X0 (BaseImm X20 8)) -> pure () + other -> assertFailure $ "unexpected: " ++ show other + + , testCase "parse str base+reg" $ do + let src = "str x1, [x21, x2]\n" + case parseAsm src of + Left _ -> assertFailure "parse failed" + Right lns -> case lineInstr (head lns) of + Just (Str X1 (BaseReg X21 X2)) -> pure () + other -> assertFailure $ "unexpected: " ++ show other + + , testCase "parse add" $ do + let src = "add x0, x1, x2\n" + case parseAsm src of + Left _ -> assertFailure "parse failed" + Right lns -> case lineInstr (head lns) of + Just (Add X0 X1 (OpReg X2)) -> pure () + other -> assertFailure $ "unexpected: " ++ show other + + , testCase "parse adrp" $ do + let src = "adrp x5, _symbol@PAGE\n" + case parseAsm src of + Left _ -> assertFailure "parse failed" + Right lns -> case lineInstr (head lns) of + Just (Adrp X5 _) -> pure () + other -> assertFailure $ "unexpected: " ++ show other + + , testCase "skip directives" $ do + let src = ".section __TEXT\n.globl _foo\n_foo:\n ret\n" + case parseAsm src of + Left e -> assertFailure $ "parse failed: " ++ show e + Right lns -> do + -- Should have 4 lines, first two are directives (no instr) + let instrs = filter ((/= Nothing) . lineInstr) lns + assertEqual "instruction count" 1 (length instrs) + ] + +-- Taint tests + +taintTests :: TestTree +taintTests = testGroup "Taint" [ + testCase "public roots" $ do + let st = initTaintState + assertEqual "X20 is public" Public (getTaint X20 st) + assertEqual "X21 is public" Public (getTaint X21 st) + assertEqual "SP is public" Public (getTaint SP st) + + , testCase "unknown register" $ do + let st = initTaintState + assertEqual "X0 is unknown" Unknown (getTaint X0 st) + + , testCase "mov propagates taint" $ do + let st = initTaintState + l = Line 1 Nothing (Just (Mov X0 (OpReg X20))) + st' = analyzeLine l st + assertEqual "X0 becomes public" Public (getTaint X0 st') + + , testCase "add joins taints" $ do + let st = initTaintState + -- X0 = X20 (public) + X1 (unknown) -> unknown + l = Line 1 Nothing (Just (Add X0 X20 (OpReg X1))) + st' = analyzeLine l st + assertEqual "result is public" Public (getTaint X0 st') + + , testCase "load makes unknown" $ do + let st = initTaintState + l = Line 1 Nothing (Just (Ldr X0 (BaseImm X20 0))) + st' = analyzeLine l st + assertEqual "loaded value is unknown" Unknown (getTaint X0 st') + ] + +-- Audit tests + +auditTests :: TestTree +auditTests = testGroup "Audit" [ + testCase "good: public base" $ do + let src = T.unlines + [ "foo:" + , " ldr x0, [x20, #8]" + , " str x0, [x21]" + , " ret" + ] + case audit "test" src of + Left e -> assertFailure $ "parse failed: " ++ show e + Right ar -> assertEqual "no violations" 0 (length (arViolations ar)) + + , testCase "bad: unknown base" $ do + let src = T.unlines + [ "foo:" + , " ldr x0, [x0]" + , " ret" + ] + case audit "test" src of + Left e -> assertFailure $ "parse failed: " ++ show e + Right ar -> do + assertEqual "one violation" 1 (length (arViolations ar)) + case arViolations ar of + [v] -> case vReason v of + UnknownBase X0 -> pure () + other -> assertFailure $ "wrong reason: " ++ show other + _ -> assertFailure "expected one violation" + + , testCase "bad: secret-derived index" $ do + let src = T.unlines + [ "foo:" + , " ldr x0, [x20]" -- x0 = unknown (loaded value) + , " ldr x1, [x21, x0]" -- x0 as index -> violation + , " ret" + ] + case audit "test" src of + Left e -> assertFailure $ "parse failed: " ++ show e + Right ar -> do + assertEqual "one violation" 1 (length (arViolations ar)) + case arViolations ar of + [v] -> case vReason v of + UnknownIndex X0 -> pure () + other -> assertFailure $ "wrong reason: " ++ show other + _ -> assertFailure "expected one violation" + + , testCase "good: adrp+add pattern" $ do + let src = T.unlines + [ "foo:" + , " adrp x8, _const@PAGE" + , " add x8, x8, _const@PAGEOFF" + , " ldr x0, [x8]" + , " ret" + ] + case audit "test" src of + Left e -> assertFailure $ "parse failed: " ++ show e + Right ar -> assertEqual "no violations" 0 (length (arViolations ar)) + ] diff --git a/test/fixtures/bad.s b/test/fixtures/bad.s @@ -0,0 +1,31 @@ +; bad.s - Known-bad assembly (secret-derived memory accesses) +; Should produce violations + +.section __TEXT,__text +.globl _unsafe_function +.p2align 2 + +_unsafe_function: + ; Stack frame setup + stp x29, x30, [sp, #-16]! + mov x29, sp + + ; Load a secret value + ldr x0, [x20] ; x0 = secret from memory + + ; BAD: Use secret as base address (cache-timing leak) + ldr x1, [x0] ; violation: unknown base x0 + + ; BAD: Use secret as index (cache-timing leak) + ldr x2, [x21, x0] ; violation: unknown index x0 + + ; BAD: Secret-derived address via arithmetic + add x3, x0, #256 ; x3 = secret + offset + ldr x4, [x3] ; violation: unknown base x3 + + ; BAD: Secret used in shifted index + ldr x5, [x21, x0, lsl #3] ; violation: unknown index x0 + + ; Frame teardown + ldp x29, x30, [sp], #16 + ret diff --git a/test/fixtures/good.s b/test/fixtures/good.s @@ -0,0 +1,37 @@ +; good.s - Known-good assembly (stack/heap accesses only) +; Should produce no violations + +.section __TEXT,__text +.globl _safe_function +.p2align 2 + +_safe_function: + ; Stack frame setup + stp x29, x30, [sp, #-16]! + mov x29, sp + + ; Load from GHC stack pointer (x20) + ldr x0, [x20, #8] + ldr x1, [x20, #16] + + ; Store to GHC heap pointer (x21) + str x0, [x21] + str x1, [x21, #8] + + ; Load from hardware stack + ldr x2, [sp, #24] + + ; PC-relative constant pool + adrp x8, _constant@PAGE + add x8, x8, _constant@PAGEOFF + ldr x3, [x8] + + ; Store pair to stack + stp x0, x1, [sp, #-32]! + + ; Load pair from stack + ldp x4, x5, [sp], #32 + + ; Frame teardown + ldp x29, x30, [sp], #16 + ret