chacha

The ChaCha20 stream cipher (docs.ppad.tech/chacha).
git clone git://git.ppad.tech/chacha.git

commit f35e58b1f912f0f37fa7f2a88635a0238e80ea7d
parent f52b5ee9a8273b95c461cc71fa278495fc48d029
Author: Jared Tobin <jared@jtobin.io>
Date:   Sat, 16 May 2026 12:51:31 -0230

lib: dispatch cipher and block to ARM NEON when available

Wire 'Crypto.Cipher.ChaCha20.cipher' and 'block' to the NEON path
added in the previous commit, with the existing scalar
implementations as the fallback.  Mirrors the dispatch pattern in
'Crypto.Hash.SHA256.hs' and 'Data.ByteString.Base16.hs':

    block key counter nonce
      | kl /= 32 = Left InvalidKey
      | nl /= 12 = Left InvalidNonce
      | Arm.chacha20_arm_available =
          Right (Arm.block key counter nonce)
      | otherwise = pure $ runST $ do ...   -- scalar

Same shape for 'cipher'.  Length validation stays in the dispatcher
so the Arm wrappers can assume valid inputs.
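
For illustration only, the dispatch shape described above can be written as
a self-contained module. This is a minimal sketch, not the library's code:
'accelAvailable', 'accelBlock' and 'scalarBlock' are hypothetical stand-ins
for 'Arm.chacha20_arm_available', 'Arm.block' and the scalar path, and the
placeholder kernels return zero bytes rather than ChaCha20 output.

```haskell
module DispatchSketch where

import qualified Data.ByteString as BS
import Data.Word (Word32)

-- Error constructors as named in the commit message.
data Error = InvalidKey | InvalidNonce
  deriving (Eq, Show)

-- Hypothetical stand-in for 'Arm.chacha20_arm_available'; hard-coded
-- here, whereas the real flag is presumably probed from the platform.
accelAvailable :: Bool
accelAvailable = False

-- Hypothetical placeholder kernels.  They only return 64 zero bytes so
-- that the sketch compiles; they do NOT compute ChaCha20.
accelBlock, scalarBlock
  :: BS.ByteString -> Word32 -> BS.ByteString -> BS.ByteString
accelBlock  _ _ _ = BS.replicate 64 0
scalarBlock _ _ _ = BS.replicate 64 0

-- Validation comes first, so both kernels may assume a 32-byte key
-- and a 12-byte nonce.
block
  :: BS.ByteString -> Word32 -> BS.ByteString
  -> Either Error BS.ByteString
block key counter nonce
  | BS.length key   /= 32 = Left InvalidKey
  | BS.length nonce /= 12 = Left InvalidNonce
  | accelAvailable        = Right (accelBlock key counter nonce)
  | otherwise             = Right (scalarBlock key counter nonce)
```

Because guards are tried top to bottom, a malformed key or nonce is
rejected before either kernel is reached, whichever path is selected.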

Performance on the existing 114-byte RFC 8439 test vector (M4
MacBook Air, GHC 9.10.3 + LLVM 19, '-fllvm'):

  cipher time:  478 ns -> 282 ns   (~1.7x)

Allocation per call (via 'weigh') drops dramatically across the
size range, because the scalar path was accumulating intermediate
per-block ByteStrings through a Builder while the NEON path writes
into one 'BI.unsafeCreate plen' buffer:

  block:                4,968 B ->   312 B  (~16x less)
  cipher  64B input:   42,584 B ->   448 B  (~95x less)
  cipher 256B input:   61,568 B ->   448 B  (~137x less)
  cipher 1024B input: 121,376 B -> 4,072 B  (~30x less)
  cipher 4096B input: 406,168 B -> 4,568 B  (~89x less)
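
The difference between the two allocation strategies can be sketched with
a toy XOR transform. This is an assumption-laden illustration, not the
code from either path: 'keystreamByte' is a hypothetical stand-in for the
real keystream, and both functions are written only to contrast
per-block intermediates with a single output buffer.

```haskell
module AllocSketch where

import Data.Bits (xor)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BSB
import qualified Data.ByteString.Internal as BI
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word8)
import Foreign.Storable (pokeByteOff)

-- Hypothetical toy keystream; NOT ChaCha20.
keystreamByte :: Int -> Word8
keystreamByte i = fromIntegral (i * 31 + 7)

-- Builder style: every 64-byte block becomes its own intermediate
-- ByteString, which is then copied again when the Builder is run.
xorViaBuilder :: BS.ByteString -> BS.ByteString
xorViaBuilder = BL.toStrict . BSB.toLazyByteString . go 0
  where
    go off bs
      | BS.null bs = mempty
      | otherwise  =
          let (chunk, rest) = BS.splitAt 64 bs
              out = BS.pack
                [ BS.index chunk i `xor` keystreamByte (off + i)
                | i <- [0 .. BS.length chunk - 1] ]
          in  BSB.byteString out <> go (off + 64) rest

-- Single-buffer style: allocate the output once and write each byte
-- in place, as with the 'BI.unsafeCreate plen' buffer mentioned above.
xorViaCreate :: BS.ByteString -> BS.ByteString
xorViaCreate bs = BI.unsafeCreate len $ \ptr ->
    mapM_ (\i -> pokeByteOff ptr i (BS.index bs i `xor` keystreamByte i))
          [0 .. len - 1]
  where
    len = BS.length bs
```

Both functions produce the same bytes; only the number and size of the
intermediate allocations differ.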

The 1.7x wall-time speedup on the 114B vector is a lower bound: that
input spans only two 64-byte blocks (one full, one partial), so FFI
overhead and per-call setup dominate.  Larger inputs amortise the FFI
call across more SIMD work and should see a proportionally larger
speedup.
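
The block counts behind that argument follow from ChaCha20's 64-byte
block size.  A small sketch of the arithmetic ('blocksFor' is a
hypothetical helper name, not part of the library):

```haskell
module BlockCount where

-- ChaCha20 produces keystream in 64-byte blocks.
blockSize :: Int
blockSize = 64

-- Number of keystream blocks needed to cover n bytes of input,
-- rounding up so that a partial final block still counts.
blocksFor :: Int -> Int
blocksFor n = (n + blockSize - 1) `div` blockSize
```

Here 'blocksFor 114' is 2, while 'blocksFor 4096' is 64, so the larger
input spreads one FFI call over 32 times as many blocks.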

All 8 tasty cases (including RFC 8439 A.2 vectors 1, 2, 3) pass
through the dispatched path, both under '-fllvm' and under
'-fllvm -fsanitize' (ASan + UBSan over the C kernel, with no
diagnostics reported).

Diffstat:
M lib/Crypto/Cipher/ChaCha20.hs | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/lib/Crypto/Cipher/ChaCha20.hs b/lib/Crypto/Cipher/ChaCha20.hs
@@ -34,6 +34,7 @@ module Crypto.Cipher.ChaCha20 (
   ) where
 
 import Control.Monad.ST
+import qualified Crypto.Cipher.ChaCha20.Arm as Arm
 import qualified Data.Bits as B
 import Data.Bits ((.|.), (.<<.), (.^.))
 import qualified Data.ByteString as BS
@@ -289,6 +290,8 @@ block
 block key@(BI.PS _ _ kl) counter nonce@(BI.PS _ _ nl)
   | kl /= 32 = Left InvalidKey
   | nl /= 12 = Left InvalidNonce
+  | Arm.chacha20_arm_available =
+      Right (Arm.block key counter nonce)
   | otherwise = pure $ runST $ do
       let k = _parse_key key
           n = _parse_nonce nonce
@@ -341,8 +344,10 @@ cipher
   -> BS.ByteString -- ^ arbitrary-length plaintext
   -> Either Error BS.ByteString -- ^ ciphertext
 cipher raw_key@(BI.PS _ _ kl) counter raw_nonce@(BI.PS _ _ nl) plaintext
-  | kl /= 32 = Left InvalidKey
-  | nl /= 12 = Left InvalidNonce
+  | kl /= 32 = Left InvalidKey
+  | nl /= 12 = Left InvalidNonce
+  | Arm.chacha20_arm_available =
+      Right (Arm.cipher raw_key counter raw_nonce plaintext)
   | otherwise = pure $ runST $ do
       let key = _parse_key raw_key
           non = _parse_nonce raw_nonce