commit f606ebee8dd5da7c25005f5c0e7c7bdefc20f52f
parent 0e83ab9538f4c79593e70ae5d338c349fdfe0e8b
Author: Jared Tobin <jared@jtobin.io>
Date: Sat, 16 May 2026 13:01:38 -0230
readme: ARM intrinsics note + bench figures

Update README tagline from "Pure" to "Fast" and rewrite the
Performance section to note hardware acceleration via ARM NEON
intrinsics. New 1 KiB criterion figures from an M4 MacBook Air,
GHC 9.10.3 + LLVM 19, -fllvm:

  encode time: 2.279 μs -> 102 ns (~22×)
  decode time: 649.2 ns -> 160 ns (~4×)
Diffstat:
 README.md | 31 +++++++++++++++----------------
1 file changed, 15 insertions(+), 16 deletions(-)
diff --git a/README.md b/README.md
@@ -4,7 +4,7 @@

[](https://docs.ppad.tech/base64)
-Pure base64 encoding & decoding on strict ByteStrings.
+Fast base64 encoding & decoding on strict ByteStrings.
## Usage
@@ -31,28 +31,27 @@ Haddocks (API documentation, etc.) are hosted at
## Performance
-The aim is best-in-class performance for pure, highly-auditable Haskell
-code. We could go slightly faster by using direct allocation and writes,
-but we get pretty close to the best impure versions with only builders.
-
-Current benchmark figures on a 1024-byte input on an Apple M4 MacBook Air,
-GHC 9.10.3 with the LLVM backend, look like (use `cabal bench` to run the
-benchmark suite):
+The aim is best-in-class performance. Current benchmark figures on 1kb
+inputs on an M4 Silicon MacBook Air, where we avail of hardware
+acceleration via ARM NEON intrinsics, look like (use `cabal bench` to
+run the benchmark suite):
```
benchmarking ppad-base64/encode
- time 2.279 μs (2.253 μs .. 2.316 μs)
- 0.999 R² (0.998 R² .. 1.000 R²)
- mean 2.284 μs (2.270 μs .. 2.308 μs)
- std dev 74.77 ns (50.21 ns .. 124.4 ns)
+ time 102.0 ns (101.9 ns .. 102.2 ns)
+ 1.000 R² (1.000 R² .. 1.000 R²)
+ mean 102.0 ns (101.9 ns .. 102.1 ns)
+ std dev 386.6 ps (313.4 ps .. 521.5 ps)
benchmarking ppad-base64/decode
- time 649.2 ns (637.2 ns .. 659.0 ns)
- 0.998 R² (0.997 R² .. 0.999 R²)
- mean 618.5 ns (611.8 ns .. 625.5 ns)
- std dev 29.46 ns (25.76 ns .. 35.06 ns)
+ time 160.3 ns (160.3 ns .. 160.4 ns)
+ 1.000 R² (1.000 R² .. 1.000 R²)
+ mean 160.3 ns (160.2 ns .. 160.4 ns)
+ std dev 242.8 ps (201.8 ps .. 301.2 ps)
```
+You should compile with the 'llvm' flag for maximum performance.
+
## Security
This library aims at the maximum security achievable in a