base64

Fast Haskell base64 encoding/decoding (docs.ppad.tech/base64).
git clone git://git.ppad.tech/base64.git
Log | Files | Refs | README | LICENSE

commit 3d622446b5ee3af52511cc9770895cb1acf4d940
parent b4dd9ff6c285bfb9db834cdcca3d460688c3297d
Author: Jared Tobin <jared@jtobin.io>
Date:   Sat, 16 May 2026 13:04:11 -0230

Merge branch 'perf-refactor'

Performance refactor + ARM NEON intrinsics, mirroring the analogous
work merged to ppad-base16 master.

Five commits, organized as two logical changes:

1. Drop the bytestring 'Builder' pipeline in favour of 'BI.unsafeCreate'
   plus two static-rodata lookup tables (encode alphabet + decode
   table), with the 0x40-offset trick keeping the decode table's
   string literal NUL-free so it lives in rodata via the bytestring
   IsString rewrite. Encode falls from ~2.3 μs to ~270 ns on 1 KiB
   inputs.

2. Add an aarch64 NEON kernel in 'cbits/base64_arm.c' exposed via the
   new 'Data.ByteString.Base64.Arm' module:

   * Encode kernel processes 12 input bytes -> 16 output chars per
     iteration via a vqtbl1q_u8 shuffle, four parallel u32 shifts +
     masks, and a vqtbl4q_u8 alphabet lookup.

   * Decode kernel processes 16 input chars -> 12 output bytes per
     iteration. Range-compare validation with OR-accumulated 'bad'
     masks, per-u32-lane 24-bit pack, vqtbl1q_u8 reorder to BE
     triplets. The Haskell side hands the C kernel both inlen and
     outlen; padding detection and the padded final quartet
     (including RFC 4648 §3.5 non-data-bit validation) are handled
     in C for symmetry with encode.

   'Data.ByteString.Base64.encode' and 'decode' dispatch to the NEON
   path when 'base64_arm_available' returns true, falling back to the
   scalar path otherwise. Cabal adds the C sources, an aarch64
   '-march=armv8-a' cc-option, and a 'sanitize' flag for ASan + UBSan
   builds.

Performance on 1 KiB inputs, M4 MacBook Air, GHC 9.10.3 + LLVM 19,
'cabal bench -f+llvm':

  encode time:   2.279 μs ->   102 ns   (~22×)
  decode time:   649.2 ns ->   160 ns   (~4×)

The existing tasty suite (5000 QuickCheck cases × 3 properties + the
RFC 4648 §10 unit vectors) passes through the dispatched path under
'cabal test', 'cabal test -fllvm', and 'cabal test -fsanitize'.

Also rebrands the cabal/flake/README descriptions from "Pure" to
"Fast" to reflect that the hot path is no longer purely Haskell.

Diffstat:
MREADME.md | 31+++++++++++++++----------------
Acbits/base64_arm.c | 307+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Mflake.nix | 2+-
Mlib/Data/ByteString/Base64.hs | 511+++++++++++++++++++++++++++++++++----------------------------------------------
Alib/Data/ByteString/Base64/Arm.hs | 86+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Mppad-base64.cabal | 19+++++++++++++++++--
6 files changed, 637 insertions(+), 319 deletions(-)

diff --git a/README.md b/README.md @@ -4,7 +4,7 @@ ![](https://img.shields.io/badge/license-MIT-brightgreen) [![](https://img.shields.io/badge/haddock-base64-lightblue)](https://docs.ppad.tech/base64) -Pure base64 encoding & decoding on strict ByteStrings. +Fast base64 encoding & decoding on strict ByteStrings. ## Usage @@ -31,28 +31,27 @@ Haddocks (API documentation, etc.) are hosted at ## Performance -The aim is best-in-class performance for pure, highly-auditable Haskell -code. We could go slightly faster by using direct allocation and writes, -but we get pretty close to the best impure versions with only builders. - -Current benchmark figures on a 1024-byte input on an Apple M4 MacBook Air, -GHC 9.10.3 with the LLVM backend, look like (use `cabal bench` to run the -benchmark suite): +The aim is best-in-class performance. Current benchmark figures on 1kb +inputs on an M4 Silicon MacBook Air, where we avail of hardware +acceleration via ARM NEON intrinsics, look like (use `cabal bench` to +run the benchmark suite): ``` benchmarking ppad-base64/encode - time 2.279 μs (2.253 μs .. 2.316 μs) - 0.999 R² (0.998 R² .. 1.000 R²) - mean 2.284 μs (2.270 μs .. 2.308 μs) - std dev 74.77 ns (50.21 ns .. 124.4 ns) + time 102.0 ns (101.9 ns .. 102.2 ns) + 1.000 R² (1.000 R² .. 1.000 R²) + mean 102.0 ns (101.9 ns .. 102.1 ns) + std dev 386.6 ps (313.4 ps .. 521.5 ps) benchmarking ppad-base64/decode - time 649.2 ns (637.2 ns .. 659.0 ns) - 0.998 R² (0.997 R² .. 0.999 R²) - mean 618.5 ns (611.8 ns .. 625.5 ns) - std dev 29.46 ns (25.76 ns .. 35.06 ns) + time 160.3 ns (160.3 ns .. 160.4 ns) + 1.000 R² (1.000 R² .. 1.000 R²) + mean 160.3 ns (160.2 ns .. 160.4 ns) + std dev 242.8 ps (201.8 ps .. 301.2 ps) ``` +You should compile with the 'llvm' flag for maximum performance. + ## Security This library aims at the maximum security achievable in a diff --git a/cbits/base64_arm.c b/cbits/base64_arm.c @@ -0,0 +1,307 @@ +#include <stddef.h> +#include <stdint.h> + +#if defined(__aarch64__) + +#include <arm_neon.h> + +static const uint8_t b64_alphabet[64] = + "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"; + +/* + * Encode 'l' input bytes at 'src' into ((l+2)/3)*4 ASCII chars at 'dst'. + * + * NEON kernel processes 12 input bytes per iteration: + * - vld1q_u8 loads 16 bytes (we use the first 12; reading 4 ahead is + * safe as long as l - i >= 16) + * - vqtbl1q_u8 with a shuffle mask gathers each 4-byte output lane as + * [b1, b0, b2, b1], the order that lets a single shift+mask extract + * each 6-bit index + * - 4 vshrq_n_u32 + vandq_u32 pull out indices i0..i3 (one per lane + * byte); see comments below for the bit math + * - vqtbl4q_u8 looks up each index in the 64-byte alphabet + * - vst1q_u8 stores 16 output chars + * + * A scalar loop finishes any full triplet that didn't make the NEON + * cut-off, and a final branch emits the 0/1/2-byte padded tail. + */ +void base64_encode_arm(const uint8_t *src, uint8_t *dst, size_t l) { + uint8x16x4_t lut; + lut.val[0] = vld1q_u8(b64_alphabet); + lut.val[1] = vld1q_u8(b64_alphabet + 16); + lut.val[2] = vld1q_u8(b64_alphabet + 32); + lut.val[3] = vld1q_u8(b64_alphabet + 48); + + /* For each 4-byte lane of output of vqtbl1q_u8, we want + * [b1, b0, b2, b1] in memory order — viewed as a little-endian u32 + * lane that is (b1) | (b0 << 8) | (b2 << 16) | (b1 << 24). */ + static const uint8_t shuf_enc[16] = { + 1, 0, 2, 1, + 4, 3, 5, 4, + 7, 6, 8, 7, + 10, 9,11,10, + }; + uint8x16_t shuf = vld1q_u8(shuf_enc); + + size_t i = 0, o = 0; + while (i + 16 <= l) { + uint8x16_t in = vld1q_u8(src + i); + uint8x16_t shuffled = vqtbl1q_u8(in, shuf); + uint32x4_t lane = vreinterpretq_u32_u8(shuffled); + uint32x4_t mask6 = vdupq_n_u32(0x3F); + + /* lane (LE) = b1 | (b0 << 8) | (b2 << 16) | (b1 << 24) + * i0 (top 6 of b0) = (lane >> 10) & 0x3F + * i1 (lo 2 of b0|hi 4 of b1)= (lane >> 4) & 0x3F + * i2 (lo 4 of b1|hi 2 of b2)= (lane >> 22) & 0x3F [uses b1 copy at byte 3] + * i3 (lo 6 of b2) = (lane >> 16) & 0x3F */ + uint32x4_t i0 = vandq_u32(vshrq_n_u32(lane, 10), mask6); + uint32x4_t i1 = vandq_u32(vshrq_n_u32(lane, 4), mask6); + uint32x4_t i2 = vandq_u32(vshrq_n_u32(lane, 22), mask6); + uint32x4_t i3 = vandq_u32(vshrq_n_u32(lane, 16), mask6); + + /* assemble per-lane u32 = i0 | (i1 << 8) | (i2 << 16) | (i3 << 24) */ + uint32x4_t idx_u32 = vorrq_u32( + vorrq_u32(i0, vshlq_n_u32(i1, 8)), + vorrq_u32(vshlq_n_u32(i2, 16), vshlq_n_u32(i3, 24))); + + uint8x16_t indices = vreinterpretq_u8_u32(idx_u32); + uint8x16_t chars = vqtbl4q_u8(lut, indices); + vst1q_u8(dst + o, chars); + + i += 12; + o += 16; + } + + /* scalar tail: full triplets */ + for (; i + 3 <= l; i += 3, o += 4) { + uint32_t v = ((uint32_t)src[i] << 16) + | ((uint32_t)src[i + 1] << 8) + | (uint32_t)src[i + 2]; + dst[o] = b64_alphabet[(v >> 18) & 0x3F]; + dst[o + 1] = b64_alphabet[(v >> 12) & 0x3F]; + dst[o + 2] = b64_alphabet[(v >> 6) & 0x3F]; + dst[o + 3] = b64_alphabet[ v & 0x3F]; + } + + /* 1- or 2-byte padded tail */ + if (i + 1 == l) { + uint8_t b = src[i]; + dst[o] = b64_alphabet[(b >> 2) & 0x3F]; + dst[o + 1] = b64_alphabet[(b & 0x03) << 4]; + dst[o + 2] = '='; + dst[o + 3] = '='; + } else if (i + 2 == l) { + uint8_t b0 = src[i]; + uint8_t b1 = src[i + 1]; + dst[o] = b64_alphabet[(b0 >> 2) & 0x3F]; + dst[o + 1] = b64_alphabet[((b0 & 0x03) << 4) | (b1 >> 4)]; + dst[o + 2] = b64_alphabet[(b1 & 0x0F) << 2]; + dst[o + 3] = '='; + } +} + +/* + * Convert 16 ASCII base64 chars to 6-bit values in 'val'. + * Each lane of 'bad' is 0xff if the corresponding input is not a + * valid base64 char ('A'..'Z', 'a'..'z', '0'..'9', '+', '/'), else 0. + * '=' is treated as invalid here; the caller handles padding. + */ +static inline void ascii_to_b64(uint8x16_t c, + uint8x16_t *val, + uint8x16_t *bad) { + uint8x16_t is_upper = vandq_u8(vcgeq_u8(c, vdupq_n_u8('A')), + vcleq_u8(c, vdupq_n_u8('Z'))); + uint8x16_t is_lower = vandq_u8(vcgeq_u8(c, vdupq_n_u8('a')), + vcleq_u8(c, vdupq_n_u8('z'))); + uint8x16_t is_digit = vandq_u8(vcgeq_u8(c, vdupq_n_u8('0')), + vcleq_u8(c, vdupq_n_u8('9'))); + uint8x16_t is_plus = vceqq_u8(c, vdupq_n_u8('+')); + uint8x16_t is_slash = vceqq_u8(c, vdupq_n_u8('/')); + + /* Per-lane additive offset that takes c to its 6-bit value: + * 'A'..'Z': +(-65) = 0xBF mod 256 ('A' + 0xBF = 0) + * 'a'..'z': +(-71) = 0xB9 + * '0'..'9': +4 + * '+': +19 + * '/': +16 + * Invalid lanes get +0; 'bad' flags them. */ + uint8x16_t add = vorrq_u8( + vandq_u8(is_upper, vdupq_n_u8((uint8_t)(0u - 65))), + vorrq_u8( + vandq_u8(is_lower, vdupq_n_u8((uint8_t)(0u - 71))), + vorrq_u8( + vandq_u8(is_digit, vdupq_n_u8(4)), + vorrq_u8( + vandq_u8(is_plus, vdupq_n_u8(19)), + vandq_u8(is_slash, vdupq_n_u8(16)))))); + + *val = vaddq_u8(c, add); + + uint8x16_t any_valid = vorrq_u8(is_upper, + vorrq_u8(is_lower, + vorrq_u8(is_digit, + vorrq_u8(is_plus, is_slash)))); + *bad = vmvnq_u8(any_valid); +} + +static inline uint8_t scalar_b64(uint8_t c) { + if (c >= 'A' && c <= 'Z') return (uint8_t)(c - 'A'); + if (c >= 'a' && c <= 'z') return (uint8_t)(c - 'a' + 26); + if (c >= '0' && c <= '9') return (uint8_t)(c - '0' + 52); + if (c == '+') return 62; + if (c == '/') return 63; + return 0x80; /* invalid sentinel */ +} + +/* + * Decode 'inlen' ASCII base64 chars at 'src' into 'outlen' bytes at + * 'dst'. Returns 1 on success, 0 on any decoding error: malformed + * length, malformed padding, invalid char in body, or invalid char / + * non-zero non-data bits in the padded final quartet (RFC 4648 §3.5). + * + * Caller must allocate 'outlen' bytes at 'dst' and pass the correct + * outlen for the given inlen and padding; mismatch returns 0 with + * 'dst' unspecified. + * + * Body NEON kernel processes 16 input chars (= 4 quartets) per + * iteration: + * - vld1q_u8 loads 16 chars + * - ascii_to_b64 validates each lane and yields 6-bit values + * - per u32x4 lane: build the 24-bit packed value V = (v0 << 18) | + * (v1 << 12) | (v2 << 6) | v3, whose bytes in LE are [V_low, + * V_mid, V_high, 0] + * - vqtbl1q_u8 reshuffles those bytes into [V_high, V_mid, V_low] + * per triplet, yielding 12 output bytes at the bottom of the + * output vector + * - vst1q_u8 stores 16 bytes (writing 12 valid + 4 spurious; the + * loop bound 'o + 16 <= body_outlen' keeps the overrun within + * the allocated buffer, and the spurious bytes get clobbered by + * the next iteration or by the scalar tail / final quartet) + * + * A scalar tail finishes any body quartets that didn't make the + * NEON cut-off, then the padded final quartet is decoded explicitly. + */ +int base64_decode_arm(const uint8_t *src, uint8_t *dst, + size_t inlen, size_t outlen) { + if (inlen == 0) return outlen == 0; + if (inlen & 0x3) return 0; + + uint8_t c_pre = src[inlen - 2]; + uint8_t c_end = src[inlen - 1]; + size_t pad = 0; + if (c_end == '=') { + if (c_pre == '=') pad = 2; + else pad = 1; + } else if (c_pre == '=') { + return 0; /* '=' at offset -2 only is malformed */ + } + + size_t nfull = inlen >> 2; + if (outlen != nfull * 3 - pad) return 0; + + size_t body_chars = (pad > 0) ? (inlen - 4) : inlen; + size_t body_outlen = (body_chars >> 2) * 3; + + uint8x16_t bad_acc = vdupq_n_u8(0); + + static const uint8_t pack_shuf[16] = { + 2, 1, 0, + 6, 5, 4, + 10, 9, 8, + 14,13,12, + 0xFF, 0xFF, 0xFF, 0xFF + }; + uint8x16_t pshuf = vld1q_u8(pack_shuf); + + size_t i = 0, o = 0; + while (o + 16 <= body_outlen) { + uint8x16_t c = vld1q_u8(src + i); + uint8x16_t val, this_bad; + ascii_to_b64(c, &val, &this_bad); + bad_acc = vorrq_u8(bad_acc, this_bad); + + uint32x4_t v32 = vreinterpretq_u32_u8(val); + uint32x4_t mask8 = vdupq_n_u32(0xFF); + + uint32x4_t p0 = vshlq_n_u32(vandq_u32(v32, mask8), 18); + uint32x4_t p1 = vshlq_n_u32( + vandq_u32(vshrq_n_u32(v32, 8), mask8), 12); + uint32x4_t p2 = vshlq_n_u32( + vandq_u32(vshrq_n_u32(v32, 16), mask8), 6); + uint32x4_t p3 = vshrq_n_u32(v32, 24); + + uint32x4_t V = vorrq_u32(vorrq_u32(p0, p1), + vorrq_u32(p2, p3)); + uint8x16_t V_bytes = vreinterpretq_u8_u32(V); + uint8x16_t packed = vqtbl1q_u8(V_bytes, pshuf); + + vst1q_u8(dst + o, packed); /* 12 valid bytes + 4 spurious */ + + i += 16; + o += 12; + } + + uint8_t tail_bad = 0; + + /* scalar body tail (full quartets, no '=') */ + while (o + 3 <= body_outlen) { + uint8_t v0 = scalar_b64(src[i]); + uint8_t v1 = scalar_b64(src[i + 1]); + uint8_t v2 = scalar_b64(src[i + 2]); + uint8_t v3 = scalar_b64(src[i + 3]); + tail_bad |= (v0 | v1 | v2 | v3) & 0x80; + dst[o] = (uint8_t)((v0 << 2) | (v1 >> 4)); + dst[o + 1] = (uint8_t)(((v1 & 0x0F) << 4) | (v2 >> 2)); + dst[o + 2] = (uint8_t)(((v2 & 0x03) << 6) | (v3 & 0x3F)); + i += 4; + o += 3; + } + + /* padded final quartet */ + if (pad > 0) { + uint8_t v0 = scalar_b64(src[i]); + uint8_t v1 = scalar_b64(src[i + 1]); + if ((v0 | v1) & 0x80) return 0; + + if (pad == 2) { + /* "XX==" -> 1 output byte; bottom 4 bits of v1 must be 0 */ + if (v1 & 0x0F) return 0; + dst[o] = (uint8_t)((v0 << 2) | (v1 >> 4)); + } else { + /* "XXX=" -> 2 output bytes; bottom 2 bits of v2 must be 0 */ + uint8_t v2 = scalar_b64(src[i + 2]); + if (v2 & 0x80) return 0; + if (v2 & 0x03) return 0; + dst[o] = (uint8_t)((v0 << 2) | (v1 >> 4)); + dst[o + 1] = (uint8_t)(((v1 & 0x0F) << 4) | (v2 >> 2)); + } + } + + return (vmaxvq_u8(bad_acc) == 0) && (tail_bad == 0); +} + +int base64_arm_available(void) { + return 1; +} + +#else + +/* stubs for non-aarch64 builds; never reached because dispatch is + * gated on 'base64_arm_available' returning 0 */ + +void base64_encode_arm(const uint8_t *src, uint8_t *dst, size_t l) { + (void)src; (void)dst; (void)l; +} + +int base64_decode_arm(const uint8_t *src, uint8_t *dst, + size_t inlen, size_t outlen) { + (void)src; (void)dst; (void)inlen; (void)outlen; + return 0; +} + +int base64_arm_available(void) { + return 0; +} + +#endif diff --git a/flake.nix b/flake.nix @@ -1,5 +1,5 @@ { - description = "Pure Haskell base64 encoding and decoding on bytestrings."; + description = "Fast Haskell base64 encoding and decoding on bytestrings."; inputs = { ppad-nixpkgs = { diff --git a/lib/Data/ByteString/Base64.hs b/lib/Data/ByteString/Base64.hs @@ -1,7 +1,5 @@ {-# OPTIONS_HADDOCK prune #-} -{-# LANGUAGE ApplicativeDo #-} {-# LANGUAGE BangPatterns #-} -{-# LANGUAGE BinaryLiterals #-} {-# LANGUAGE OverloadedStrings #-} -- | @@ -20,324 +18,237 @@ module Data.ByteString.Base64 ( import qualified Data.Bits as B import Data.Bits ((.&.), (.|.)) import qualified Data.ByteString as BS -import qualified Data.ByteString.Builder as BSB -import qualified Data.ByteString.Builder.Extra as BE +import qualified Data.ByteString.Base64.Arm as Arm import qualified Data.ByteString.Internal as BI -import qualified Data.ByteString.Unsafe as BU -import Data.Word (Word8, Word16, Word32, Word64) - -to_strict :: BSB.Builder -> BS.ByteString -to_strict = BS.toStrict . BSB.toLazyByteString -{-# INLINE to_strict #-} - -to_strict_small :: BSB.Builder -> BS.ByteString -to_strict_small = BS.toStrict - . BE.toLazyByteStringWith (BE.safeStrategy 128 BE.smallChunkSize) mempty -{-# INLINE to_strict_small #-} +import Data.Word (Word8) +import Foreign.ForeignPtr (withForeignPtr) +import Foreign.Ptr (Ptr, plusPtr) +import Foreign.Storable (peekElemOff, pokeElemOff) +import System.IO.Unsafe (unsafeDupablePerformIO) fi :: (Num a, Integral b) => b -> a fi = fromIntegral {-# INLINE fi #-} -b64_charset :: BS.ByteString -b64_charset = +-- 64-byte table. Indexed by 6-bit value (0..63), yields the +-- corresponding base64 alphabet character. All-ASCII content means +-- the bytestring 'IsString' rule rewrites this to 'unsafePackAddress' +-- and the bytes live in static rodata. +enc_tab :: BS.ByteString +enc_tab = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/" +{-# NOINLINE enc_tab #-} --- 3 input bytes -> 4 output chars packed in a Word32 (big-endian) -expand_w24 :: Word8 -> Word8 -> Word8 -> Word32 -expand_w24 a b c = - let !v = (fi a `B.shiftL` 16 :: Word32) - .|. (fi b `B.shiftL` 8) - .|. fi c - !c0 = BU.unsafeIndex b64_charset (fi ((v `B.shiftR` 18) .&. 0x3F)) - !c1 = BU.unsafeIndex b64_charset (fi ((v `B.shiftR` 12) .&. 0x3F)) - !c2 = BU.unsafeIndex b64_charset (fi ((v `B.shiftR` 6) .&. 0x3F)) - !c3 = BU.unsafeIndex b64_charset (fi (v .&. 0x3F)) - in fi c0 `B.shiftL` 24 - .|. fi c1 `B.shiftL` 16 - .|. fi c2 `B.shiftL` 8 - .|. fi c3 -{-# INLINE expand_w24 #-} - --- 6 input bytes -> 8 output chars packed in a Word64 (big-endian) -expand_w48 :: Word8 -> Word8 -> Word8 -> Word8 -> Word8 -> Word8 -> Word64 -expand_w48 a b c d e f = - let !hi = expand_w24 a b c - !lo = expand_w24 d e f - in (fi hi `B.shiftL` 32) .|. fi lo -{-# INLINE expand_w48 #-} +-- 256-byte table. Index by an ASCII byte to obtain its 6-bit value; +-- valid base64 chars ('A'..'Z', 'a'..'z', '0'..'9', '+', '/') map to +-- 0x40..0x7f, every other byte (including '=') maps to 0x80. +-- +-- The encoding is chosen so the literal is strictly ASCII and contains +-- no embedded NUL, which is what the bytestring 'IsString' rule needs +-- to rewrite it into 'unsafePackAddress' (cf. 'enc_tab') — the bytes +-- end up in static rodata, with no CAF allocation. +-- +-- The 0x80 sentinel is distinguished by bit 7; no value 0x40..0x7f +-- carries that bit, so 'decode' OR-folds every lookup into an +-- accumulator and tests 'acc .&. 0x80 == 0' once at the end. The +-- low 6 bits of each entry are the 6-bit value, possibly contaminated +-- by the 0x40 flag bit; the b0/b1/b2 formulas mask each subexpression +-- before combining so the flag never bleeds into the output bytes. +dec_tab :: BS.ByteString +dec_tab = + "\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x7E\x80\x80\x80\x7F\ + \\x74\x75\x76\x77\x78\x79\x7A\x7B\x7C\x7D\x80\x80\x80\x80\x80\x80\ + \\x80\x40\x41\x42\x43\x44\x45\x46\x47\x48\x49\x4A\x4B\x4C\x4D\x4E\ + \\x4F\x50\x51\x52\x53\x54\x55\x56\x57\x58\x59\x80\x80\x80\x80\x80\ + \\x80\x5A\x5B\x5C\x5D\x5E\x5F\x60\x61\x62\x63\x64\x65\x66\x67\x68\ + \\x69\x6A\x6B\x6C\x6D\x6E\x6F\x70\x71\x72\x73\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\ + \\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80\x80" +{-# NOINLINE dec_tab #-} -- | Encode a base256 'ByteString' as base64. -- +-- Uses ARM NEON extensions when available, otherwise a pure +-- Haskell scalar loop. +-- -- >>> encode "hello world" -- "aGVsbG8gd29ybGQ=" encode :: BS.ByteString -> BS.ByteString -encode bs@(BI.PS _ _ l) - | l < 64 = to_strict_small loop - | otherwise = to_strict loop - where - loop - | l `rem` 6 == 0 = - go64 bs - | (l - 3) `rem` 6 == 0 = case BS.splitAt (l - 3) bs of - (chunk, etc) -> - go64 chunk - <> go32 etc - | (l - 1) `rem` 6 == 0 = case BS.splitAt (l - 1) bs of - (chunk, etc) -> - go64 chunk - <> tail1 etc - | (l - 2) `rem` 6 == 0 = case BS.splitAt (l - 2) bs of - (chunk, etc) -> - go64 chunk - <> tail2 etc - | (l - 4) `rem` 6 == 0 = case BS.splitAt (l - 4) bs of - (chunk, etc) -> - go64 chunk - <> go32 (BU.unsafeTake 3 etc) - <> tail1 (BU.unsafeDrop 3 etc) - | (l - 5) `rem` 6 == 0 = case BS.splitAt (l - 5) bs of - (chunk, etc) -> - go64 chunk - <> go32 (BU.unsafeTake 3 etc) - <> tail2 (BU.unsafeDrop 3 etc) - | otherwise = - mempty -- unreachable: l `rem` 6 in [0..5] - - go64 b = case BS.splitAt 6 b of - (chunk, etc) - | BS.null chunk -> mempty - | otherwise -> - let !w64 = expand_w48 - (BU.unsafeIndex chunk 0) - (BU.unsafeIndex chunk 1) - (BU.unsafeIndex chunk 2) - (BU.unsafeIndex chunk 3) - (BU.unsafeIndex chunk 4) - (BU.unsafeIndex chunk 5) - in BSB.word64BE w64 <> go64 etc - - go32 b = case BS.splitAt 3 b of - (chunk, etc) - | BS.null chunk -> mempty - | otherwise -> - let !w32 = expand_w24 - (BU.unsafeIndex chunk 0) - (BU.unsafeIndex chunk 1) - (BU.unsafeIndex chunk 2) - in BSB.word32BE w32 <> go32 etc - - -- final 1 byte -> "XX==" (one Word32 BE) - tail1 b = - let !a = BU.unsafeIndex b 0 - !c0 = BU.unsafeIndex b64_charset (fi (a `B.shiftR` 2)) - !c1 = BU.unsafeIndex b64_charset (fi ((a .&. 0x03) `B.shiftL` 4)) - !w32 = (fi c0 `B.shiftL` 24 :: Word32) - .|. (fi c1 `B.shiftL` 16) - .|. 0x00003D3D - in BSB.word32BE w32 - - -- final 2 bytes -> "XXX=" (one Word32 BE) - tail2 b = - let !a = BU.unsafeIndex b 0 - !c = BU.unsafeIndex b 1 - !c0 = BU.unsafeIndex b64_charset (fi (a `B.shiftR` 2)) - !c1 = BU.unsafeIndex b64_charset - (fi (((a .&. 0x03) `B.shiftL` 4) .|. (c `B.shiftR` 4))) - !c2 = BU.unsafeIndex b64_charset (fi ((c .&. 0x0F) `B.shiftL` 2)) - !w32 = (fi c0 `B.shiftL` 24 :: Word32) - .|. (fi c1 `B.shiftL` 16) - .|. (fi c2 `B.shiftL` 8) - .|. 0x0000003D - in BSB.word32BE w32 - --- word8 base64 character -> 6-bit value -word6 :: Word8 -> Maybe Word8 -word6 c - | c >= 65 && c <= 90 = pure $! c - 65 -- A-Z - | c >= 97 && c <= 122 = pure $! c - 71 -- a-z - | c >= 48 && c <= 57 = pure $! c + 4 -- 0-9 - | c == 43 = pure 62 -- '+' - | c == 47 = pure 63 -- '/' - | otherwise = Nothing -{-# INLINE word6 #-} - --- decode 4 chars at offset i to a 24-bit value (in low bits of Word32) -dec_quartet :: BS.ByteString -> Int -> Maybe Word32 -dec_quartet b i = do - !v0 <- word6 (BU.unsafeIndex b i) - !v1 <- word6 (BU.unsafeIndex b (i + 1)) - !v2 <- word6 (BU.unsafeIndex b (i + 2)) - !v3 <- word6 (BU.unsafeIndex b (i + 3)) - pure $! (fi v0 `B.shiftL` 18 :: Word32) - .|. (fi v1 `B.shiftL` 12) - .|. (fi v2 `B.shiftL` 6) - .|. fi v3 -{-# INLINE dec_quartet #-} +encode bs + | Arm.base64_arm_available = Arm.encode bs + | otherwise = encode_scalar bs +{-# INLINABLE encode #-} -- | Decode a base64 'ByteString' to base256. -- --- Invalid inputs (including incorrectly-padded or non-canonical --- inputs) will produce 'Nothing'. +-- Uses ARM NEON extensions when available, otherwise a pure +-- Haskell scalar loop. Invalid inputs (including incorrectly- +-- padded or non-canonical inputs) will produce 'Nothing'. -- -- >>> decode "aGVsbG8gd29ybGQ=" -- Just "hello world" -- >>> decode "aGVsbG8gd29ybGQ" -- missing padding -- Nothing decode :: BS.ByteString -> Maybe BS.ByteString -decode bs@(BI.PS _ _ l) - | l == 0 = pure BS.empty - | l `rem` 4 /= 0 = Nothing - | (l `quot` 4) * 3 < 128 = fmap to_strict_small loop - | otherwise = fmap to_strict loop - where - !bl = l - 4 - !body = BU.unsafeTake bl bs - !final = BU.unsafeDrop bl bs - - loop = do - !b0 <- decode_body body - !b1 <- decode_final final - pure (b0 <> b1) - - decode_body b - | bl `rem` 32 == 0 = - go64 mempty b - | (bl - 4) `rem` 32 == 0 = case BS.splitAt (bl - 4) b of - (chunk, etc) -> do - !acc <- go64 mempty chunk - go16 acc etc - | (bl - 8) `rem` 32 == 0 = case BS.splitAt (bl - 8) b of - (chunk, etc) -> do - !acc <- go64 mempty chunk - go32 acc etc - | (bl - 12) `rem` 32 == 0 = case BS.splitAt (bl - 12) b of - (chunk, etc) -> do - !acc0 <- go64 mempty chunk - !acc1 <- go32 acc0 (BU.unsafeTake 8 etc) - go16 acc1 (BU.unsafeDrop 8 etc) - | (bl - 16) `rem` 32 == 0 = case BS.splitAt (bl - 16) b of - (chunk, etc) -> do - !acc <- go64 mempty chunk - go48 acc etc - | (bl - 20) `rem` 32 == 0 = case BS.splitAt (bl - 20) b of - (chunk, etc) -> do - !acc0 <- go64 mempty chunk - !acc1 <- go48 acc0 (BU.unsafeTake 16 etc) - go16 acc1 (BU.unsafeDrop 16 etc) - | (bl - 24) `rem` 32 == 0 = case BS.splitAt (bl - 24) b of - (chunk, etc) -> do - !acc0 <- go64 mempty chunk - !acc1 <- go48 acc0 (BU.unsafeTake 16 etc) - go32 acc1 (BU.unsafeDrop 16 etc) - | (bl - 28) `rem` 32 == 0 = case BS.splitAt (bl - 28) b of - (chunk, etc) -> do - !acc0 <- go64 mempty chunk - !acc1 <- go48 acc0 (BU.unsafeTake 16 etc) - !acc2 <- go32 acc1 (BU.unsafeTake 8 (BU.unsafeDrop 16 etc)) - go16 acc2 (BU.unsafeDrop 24 etc) - | otherwise = Nothing -- unreachable - - decode_final b = - let !c0 = BU.unsafeIndex b 0 - !c1 = BU.unsafeIndex b 1 - !c2 = BU.unsafeIndex b 2 - !c3 = BU.unsafeIndex b 3 - in case (c2 == 0x3D, c3 == 0x3D) of - (True, True) -> do - !v0 <- word6 c0 - !v1 <- word6 c1 - if v1 .&. 0x0F /= 0 - then Nothing - else - let !w8 = (v0 `B.shiftL` 2) .|. (v1 `B.shiftR` 4) - in pure $! BSB.word8 w8 - (False, True) -> do - !v0 <- word6 c0 - !v1 <- word6 c1 - !v2 <- word6 c2 - if v2 .&. 0x03 /= 0 - then Nothing - else - let !w16 = (fi v0 `B.shiftL` 10 :: Word16) - .|. (fi v1 `B.shiftL` 4) - .|. (fi v2 `B.shiftR` 2) - in pure $! BSB.word16BE w16 - (True, False) -> Nothing - (False, False) -> do - !v0 <- word6 c0 - !v1 <- word6 c1 - !v2 <- word6 c2 - !v3 <- word6 c3 - let !w24 = (fi v0 `B.shiftL` 18 :: Word32) - .|. (fi v1 `B.shiftL` 12) - .|. (fi v2 `B.shiftL` 6) - .|. fi v3 - !w16 = fi (w24 `B.shiftR` 8) :: Word16 - !w8 = fi w24 :: Word8 - pure $! BSB.word16BE w16 <> BSB.word8 w8 - - -- 4 chars -> 3 bytes (1 word16BE + 1 word8) - go16 acc b = case BS.splitAt 4 b of - (chunk, etc) - | BS.null chunk -> pure acc - | otherwise -> do - !q <- dec_quartet chunk 0 - let !w16 = fi (q `B.shiftR` 8) :: Word16 - !w8 = fi q :: Word8 - go16 (acc <> BSB.word16BE w16 <> BSB.word8 w8) etc - - -- 8 chars -> 6 bytes (1 word32BE + 1 word16BE) - go32 acc b = case BS.splitAt 8 b of - (chunk, etc) - | BS.null chunk -> pure acc - | otherwise -> do - !q0 <- dec_quartet chunk 0 - !q1 <- dec_quartet chunk 4 - let !w48 = (fi q0 `B.shiftL` 24 :: Word64) - .|. fi q1 - !w32 = fi (w48 `B.shiftR` 16) :: Word32 - !w16 = fi w48 :: Word16 - go32 (acc <> BSB.word32BE w32 <> BSB.word16BE w16) etc - - -- 16 chars -> 12 bytes (1 word64BE + 1 word32BE) - go48 acc b = case BS.splitAt 16 b of - (chunk, etc) - | BS.null chunk -> pure acc - | otherwise -> do - !q0 <- dec_quartet chunk 0 - !q1 <- dec_quartet chunk 4 - !q2 <- dec_quartet chunk 8 - !q3 <- dec_quartet chunk 12 - let !w64 = (fi q0 `B.shiftL` 40 :: Word64) - .|. (fi q1 `B.shiftL` 16) - .|. fi (q2 `B.shiftR` 8) - !w32 = ((q2 .&. 0xFF) `B.shiftL` 24) .|. q3 - go48 (acc <> BSB.word64BE w64 <> BSB.word32BE w32) etc - - -- 32 chars -> 24 bytes (3 × word64BE) - go64 acc b = case BS.splitAt 32 b of - (chunk, etc) - | BS.null chunk -> pure acc - | otherwise -> do - !q0 <- dec_quartet chunk 0 - !q1 <- dec_quartet chunk 4 - !q2 <- dec_quartet chunk 8 - !q3 <- dec_quartet chunk 12 - !q4 <- dec_quartet chunk 16 - !q5 <- dec_quartet chunk 20 - !q6 <- dec_quartet chunk 24 - !q7 <- dec_quartet chunk 28 - let !w64a = (fi q0 `B.shiftL` 40 :: Word64) - .|. (fi q1 `B.shiftL` 16) - .|. fi (q2 `B.shiftR` 8) - !w64b = (fi (q2 .&. 0xFF) `B.shiftL` 56 :: Word64) - .|. (fi q3 `B.shiftL` 32) - .|. (fi q4 `B.shiftL` 8) - .|. fi (q5 `B.shiftR` 16) - !w64c = (fi (q5 .&. 0xFFFF) `B.shiftL` 48 :: Word64) - .|. (fi q6 `B.shiftL` 24) - .|. fi q7 - go64 (acc <> BSB.word64BE w64a - <> BSB.word64BE w64b - <> BSB.word64BE w64c) etc +decode bs + | Arm.base64_arm_available = Arm.decode bs + | otherwise = decode_scalar bs +{-# INLINABLE decode #-} + +encode_scalar :: BS.ByteString -> BS.ByteString +encode_scalar (BI.PS sfp soff l) = + case enc_tab of + BI.PS tfp toff _ -> + BI.unsafeCreate ((l + 2) `quot` 3 * 4) $ \dst -> + withForeignPtr sfp $ \sp0 -> + withForeignPtr tfp $ \tp0 -> do + let !sp = sp0 `plusPtr` soff :: Ptr Word8 + !tp = tp0 `plusPtr` toff :: Ptr Word8 + !nfull = l `quot` 3 + !rmn = l - nfull * 3 + loop !i + | i == nfull = pure () + | otherwise = do + let !ii = i * 3 + !oo = i * 4 + b0 <- peekElemOff sp ii + b1 <- peekElemOff sp (ii + 1) + b2 <- peekElemOff sp (ii + 2) + c0 <- peekElemOff tp (fi (b0 `B.shiftR` 2)) + c1 <- peekElemOff tp (fi + (((b0 .&. 0x03) `B.shiftL` 4) + .|. (b1 `B.shiftR` 4))) + c2 <- peekElemOff tp (fi + (((b1 .&. 0x0F) `B.shiftL` 2) + .|. (b2 `B.shiftR` 6))) + c3 <- peekElemOff tp (fi (b2 .&. 0x3F)) + pokeElemOff dst oo (c0 :: Word8) + pokeElemOff dst (oo + 1) c1 + pokeElemOff dst (oo + 2) c2 + pokeElemOff dst (oo + 3) c3 + loop (i + 1) + loop 0 + case rmn of + 0 -> pure () + 1 -> do + let !ii = nfull * 3 + !oo = nfull * 4 + b0 <- peekElemOff sp ii + c0 <- peekElemOff tp (fi (b0 `B.shiftR` 2)) + c1 <- peekElemOff tp (fi ((b0 .&. 0x03) `B.shiftL` 4)) + pokeElemOff dst oo (c0 :: Word8) + pokeElemOff dst (oo + 1) c1 + pokeElemOff dst (oo + 2) 0x3D + pokeElemOff dst (oo + 3) 0x3D + _ -> do + let !ii = nfull * 3 + !oo = nfull * 4 + b0 <- peekElemOff sp ii + b1 <- peekElemOff sp (ii + 1) + c0 <- peekElemOff tp (fi (b0 `B.shiftR` 2)) + c1 <- peekElemOff tp (fi + (((b0 .&. 0x03) `B.shiftL` 4) + .|. (b1 `B.shiftR` 4))) + c2 <- peekElemOff tp (fi ((b1 .&. 0x0F) `B.shiftL` 2)) + pokeElemOff dst oo (c0 :: Word8) + pokeElemOff dst (oo + 1) c1 + pokeElemOff dst (oo + 2) c2 + pokeElemOff dst (oo + 3) 0x3D + +decode_scalar :: BS.ByteString -> Maybe BS.ByteString +decode_scalar (BI.PS sfp soff l) + | l == 0 = Just BS.empty + | l .&. 0x03 /= 0 = Nothing + | otherwise = case dec_tab of + BI.PS tfp toff _ -> unsafeDupablePerformIO $ + withForeignPtr sfp $ \sp0 -> + withForeignPtr tfp $ \tp0 -> do + let !sp = sp0 `plusPtr` soff :: Ptr Word8 + !tp = tp0 `plusPtr` toff :: Ptr Word8 + c_pre <- peekElemOff sp (l - 2) + c_end <- peekElemOff sp (l - 1) + let !pad_pre = c_pre == 0x3D + !pad_end = c_end == 0x3D + if pad_pre && not pad_end + then pure Nothing + else do + let !pad = (if pad_pre then 2 else if pad_end then 1 else 0) + :: Int + !nfull = l `B.shiftR` 2 + !nbody = if pad > 0 then nfull - 1 else nfull + !outlen = nfull * 3 - pad + fp <- BI.mallocByteString outlen + ok <- withForeignPtr fp $ \dst -> do + let body_loop !acc !i + | i == nbody = pure acc + | otherwise = do + let !ii = i `B.shiftL` 2 + !oo = i * 3 + c0 <- peekElemOff sp ii + c1 <- peekElemOff sp (ii + 1) + c2 <- peekElemOff sp (ii + 2) + c3 <- peekElemOff sp (ii + 3) + v0 <- peekElemOff tp (fi c0) + v1 <- peekElemOff tp (fi c1) + v2 <- peekElemOff tp (fi c2) + v3 <- peekElemOff tp (fi c3) + let !b0 = (v0 `B.shiftL` 2) + .|. ((v1 `B.shiftR` 4) .&. 0x03) + !b1 = ((v1 .&. 0x0F) `B.shiftL` 4) + .|. ((v2 `B.shiftR` 2) .&. 0x0F) + !b2 = ((v2 .&. 0x03) `B.shiftL` 6) + .|. (v3 .&. 0x3F) + pokeElemOff dst oo b0 + pokeElemOff dst (oo + 1) b1 + pokeElemOff dst (oo + 2) b2 + body_loop + (acc .|. v0 .|. v1 .|. v2 .|. v3) (i + 1) + acc <- body_loop 0 0 + if acc .&. 0x80 /= 0 + then pure False + else case pad of + 0 -> pure True + 1 -> do + let !ii = nbody `B.shiftL` 2 + !oo = nbody * 3 + c0 <- peekElemOff sp ii + c1 <- peekElemOff sp (ii + 1) + c2 <- peekElemOff sp (ii + 2) + v0 <- peekElemOff tp (fi c0) + v1 <- peekElemOff tp (fi c1) + v2 <- peekElemOff tp (fi c2) + let !tail_acc = v0 .|. v1 .|. v2 + if tail_acc .&. 0x80 /= 0 || v2 .&. 0x03 /= 0 + then pure False + else do + let !b0 = (v0 `B.shiftL` 2) + .|. ((v1 `B.shiftR` 4) .&. 0x03) + !b1 = ((v1 .&. 0x0F) `B.shiftL` 4) + .|. ((v2 `B.shiftR` 2) .&. 0x0F) + pokeElemOff dst oo b0 + pokeElemOff dst (oo + 1) b1 + pure True + _ -> do + let !ii = nbody `B.shiftL` 2 + !oo = nbody * 3 + c0 <- peekElemOff sp ii + c1 <- peekElemOff sp (ii + 1) + v0 <- peekElemOff tp (fi c0) + v1 <- peekElemOff tp (fi c1) + let !tail_acc = v0 .|. v1 + if tail_acc .&. 0x80 /= 0 || v1 .&. 0x0F /= 0 + then pure False + else do + let !b0 = (v0 `B.shiftL` 2) + .|. ((v1 `B.shiftR` 4) .&. 0x03) + pokeElemOff dst oo b0 + pure True + pure $! if ok then Just (BI.PS fp 0 outlen) else Nothing diff --git a/lib/Data/ByteString/Base64/Arm.hs b/lib/Data/ByteString/Base64/Arm.hs @@ -0,0 +1,86 @@ +{-# OPTIONS_HADDOCK hide #-} +{-# LANGUAGE BangPatterns #-} + +-- | +-- Module: Data.ByteString.Base64.Arm +-- Copyright: (c) 2026 Jared Tobin +-- License: MIT +-- Maintainer: Jared Tobin <jared@ppad.tech> +-- +-- ARM NEON support for base64 encoding and decoding. + +module Data.ByteString.Base64.Arm ( + base64_arm_available + , encode + , decode + ) where + +import qualified Data.Bits as B +import Data.Bits ((.&.)) +import qualified Data.ByteString as BS +import qualified Data.ByteString.Internal as BI +import Data.Word (Word8) +import Foreign.C.Types (CInt(..), CSize(..)) +import Foreign.ForeignPtr (withForeignPtr) +import Foreign.Ptr (Ptr, plusPtr) +import Foreign.Storable (peekElemOff) +import System.IO.Unsafe (unsafeDupablePerformIO) + +-- ffi ------------------------------------------------------------------------ + +foreign import ccall unsafe "base64_encode_arm" + c_base64_encode :: Ptr Word8 -> Ptr Word8 -> CSize -> IO () + +foreign import ccall unsafe "base64_decode_arm" + c_base64_decode :: Ptr Word8 -> Ptr Word8 -> CSize -> CSize -> IO CInt + +foreign import ccall unsafe "base64_arm_available" + c_base64_arm_available :: IO CInt + +-- utilities ------------------------------------------------------------------ + +fi :: (Integral a, Num b) => a -> b +fi = fromIntegral +{-# INLINE fi #-} + +-- api ------------------------------------------------------------------------ + +-- | Are ARM NEON extensions available? +base64_arm_available :: Bool +base64_arm_available = + unsafeDupablePerformIO c_base64_arm_available /= 0 +{-# NOINLINE base64_arm_available #-} + +-- | Encode a base256 'ByteString' as base64 using NEON. +encode :: BS.ByteString -> BS.ByteString +encode (BI.PS sfp soff l) = + BI.unsafeCreate ((l + 2) `quot` 3 * 4) $ \dst -> + withForeignPtr sfp $ \sp0 -> + c_base64_encode (sp0 `plusPtr` soff) dst (fi l) + +-- | Decode a base64 'ByteString' to base256 using NEON. Returns +-- 'Nothing' on malformed input. +decode :: BS.ByteString -> Maybe BS.ByteString +decode (BI.PS sfp soff l) + | l == 0 = Just BS.empty + | l .&. 0x03 /= 0 = Nothing + | otherwise = unsafeDupablePerformIO $ + withForeignPtr sfp $ \sp0 -> do + let !sp = sp0 `plusPtr` soff :: Ptr Word8 + c_pre <- peekElemOff sp (l - 2) + c_end <- peekElemOff sp (l - 1) + let !pad_pre = c_pre == 0x3D + !pad_end = c_end == 0x3D + if pad_pre && not pad_end + then pure Nothing + else do + let !pad = (if pad_pre then 2 else if pad_end then 1 else 0) + :: Int + !nfull = l `B.shiftR` 2 + !outlen = nfull * 3 - pad + fp <- BI.mallocByteString outlen + ok <- withForeignPtr fp $ \dst -> + c_base64_decode sp dst (fi l) (fi outlen) + pure $! if ok /= 0 + then Just (BI.PS fp 0 outlen) + else Nothing diff --git a/ppad-base64.cabal b/ppad-base64.cabal @@ -1,7 +1,7 @@ cabal-version: 3.0 name: ppad-base64 version: 0.1.0 -synopsis: Pure base64 encoding and decoding on bytestrings. +synopsis: Fast base64 encoding and decoding on bytestrings. license: MIT license-file: LICENSE author: Jared Tobin @@ -11,13 +11,18 @@ build-type: Simple tested-with: GHC == { 9.10.3 } extra-doc-files: CHANGELOG description: - Pure base64 encoding and decoding on bytestrings. + Fast base64 encoding and decoding on bytestrings. flag llvm description: Use GHC's LLVM backend. default: False manual: True +flag sanitize + description: Build with AddressSanitizer and UndefinedBehaviorSanitizer. + default: False + manual: True + source-repository head type: git location: git.ppad.tech/base64.git @@ -31,9 +36,17 @@ library ghc-options: -fllvm -O2 exposed-modules: Data.ByteString.Base64 + Data.ByteString.Base64.Arm build-depends: base >= 4.9 && < 5 , bytestring >= 0.9 && < 0.13 + c-sources: + cbits/base64_arm.c + if arch(aarch64) + cc-options: -march=armv8-a + if flag(sanitize) + cc-options: -fsanitize=address,undefined -fno-omit-frame-pointer + ghc-options: -optl=-fsanitize=address,undefined test-suite base64-tests type: exitcode-stdio-1.0 @@ -43,6 +56,8 @@ test-suite base64-tests ghc-options: -rtsopts -Wall -O2 + if flag(sanitize) + ghc-options: -optl=-fsanitize=address,undefined build-depends: base