<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>base64, branch HEAD</title>
<subtitle>Fast Haskell base64 encoding/decoding (docs.ppad.tech/base64).
</subtitle>
<entry>
<id>690785db542a958976fe289044f612a240a2cc9e</id>
<published>2026-05-16T15:41:15Z</published>
<updated>2026-05-16T15:41:35Z</updated>
<title type="text">release: v0.1.0</title>
<link rel="alternate" type="text/html" href="commit/690785db542a958976fe289044f612a240a2cc9e.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit 690785db542a958976fe289044f612a240a2cc9e
parent 3d622446b5ee3af52511cc9770895cb1acf4d940
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 13:11:15 -0230

release: v0.1.0

</content>
</entry>
<entry>
<id>3d622446b5ee3af52511cc9770895cb1acf4d940</id>
<published>2026-05-16T15:34:11Z</published>
<updated>2026-05-16T15:39:17Z</updated>
<title type="text">Merge branch &#39;perf-refactor&#39;</title>
<link rel="alternate" type="text/html" href="commit/3d622446b5ee3af52511cc9770895cb1acf4d940.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit 3d622446b5ee3af52511cc9770895cb1acf4d940
parent b4dd9ff6c285bfb9db834cdcca3d460688c3297d
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 13:04:11 -0230

Merge branch &#39;perf-refactor&#39;

Performance refactor + ARM NEON intrinsics, mirroring the analogous
work merged to ppad-base16 master.

Five commits, organized as two logical changes:

1. Drop the bytestring &#39;Builder&#39; pipeline in favour of &#39;BI.unsafeCreate&#39;
   plus two static-rodata lookup tables (encode alphabet + decode
   table), with the 0x40-offset trick keeping the decode table&#39;s
   string literal NUL-free so it lives in rodata via the bytestring
   IsString rewrite. Encode falls from ~2.3 μs to ~270 ns on 1 KiB
   inputs.

2. Add an aarch64 NEON kernel in &#39;cbits/base64_arm.c&#39; exposed via the
   new &#39;Data.ByteString.Base64.Arm&#39; module:

   * Encode kernel processes 12 input bytes -&gt; 16 output chars per
     iteration via a vqtbl1q_u8 shuffle, four parallel u32 shifts +
     masks, and a vqtbl4q_u8 alphabet lookup.

   * Decode kernel processes 16 input chars -&gt; 12 output bytes per
     iteration. Range-compare validation with OR-accumulated &#39;bad&#39;
     masks, per-u32-lane 24-bit pack, vqtbl1q_u8 reorder to BE
     triplets. The Haskell side hands the C kernel both inlen and
     outlen; padding detection and the padded final quartet
     (including RFC 4648 §3.5 non-data-bit validation) are handled
     in C for symmetry with encode.

   &#39;Data.ByteString.Base64.encode&#39; and &#39;decode&#39; dispatch to the NEON
   path when &#39;base64_arm_available&#39; returns true, falling back to the
   scalar path otherwise. Cabal adds the C sources, an aarch64
   &#39;-march=armv8-a&#39; cc-option, and a &#39;sanitize&#39; flag for ASan + UBSan
   builds.
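
For illustration only, a scalar C model of the decode steps named in
item 2: range compares select an additive offset, a bad mask is
OR-accumulated across the whole input, and each quartet packs into 24
bits written out as a big-endian triplet. Every name here (in_range,
classify, decode_quartets) is hypothetical; this is not the code in
cbits/base64_arm.c, which does the same work 16 lanes at a time with
NEON intrinsics. ASCII is assumed.

<![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* 0xFF if lo <= c <= hi, else 0x00: a scalar stand-in for a pair of
 * vector compares producing a lane mask. */
static uint8_t in_range(uint8_t c, uint8_t lo, uint8_t hi) {
    return (uint8_t)((c >= lo && c <= hi) ? 0xFF : 0x00);
}

/* Map one ASCII byte to its 6-bit value. For a valid byte exactly one
 * class mask is set, so ANDing each mask with its offset and ORing
 * the results selects that class's offset (mod 256). Bytes outside
 * the alphabet, including '=', set every bit of *bad. */
uint8_t classify(uint8_t c, uint8_t *bad) {
    uint8_t up = in_range(c, 'A', 'Z');   /* value = c - 65      */
    uint8_t lo = in_range(c, 'a', 'z');   /* value = c - 97 + 26 */
    uint8_t dg = in_range(c, '0', '9');   /* value = c - 48 + 52 */
    uint8_t pl = in_range(c, '+', '+');   /* value = 62          */
    uint8_t sl = in_range(c, '/', '/');   /* value = 63          */
    uint8_t off = (uint8_t)((up & 191)    /* -65 mod 256 */
                          | (lo & 185)    /* -71 mod 256 */
                          | (dg & 4)
                          | (pl & 19)
                          | (sl & 16));
    *bad |= (uint8_t)~(up | lo | dg | pl | sl);
    return (uint8_t)(c + off);
}

/* Decode n unpadded quartets (4*n chars) into 3*n bytes. The bad mask
 * is tested once, after the loop. Returns 1 on success, 0 otherwise;
 * out holds garbage when 0 is returned. */
int decode_quartets(const uint8_t *in, size_t n, uint8_t *out) {
    uint8_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t v = 0;
        for (int k = 0; k < 4; k++)
            v = (v << 6) | classify(in[4 * i + k], &bad);
        out[3 * i]     = (uint8_t)(v >> 16);
        out[3 * i + 1] = (uint8_t)(v >> 8);
        out[3 * i + 2] = (uint8_t)v;
    }
    return bad == 0;
}
```
]]>

Deferring the validity test to a single check after the loop keeps the
hot loop free of data-dependent branches, which is the point of the
OR-accumulated mask.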

Performance on 1 KiB inputs, M4 MacBook Air, GHC 9.10.3 + LLVM 19,
&#39;cabal bench -f+llvm&#39;:

  encode time:   2.279 μs -&gt;   102 ns   (~22×)
  decode time:   649.2 ns -&gt;   160 ns   (~4×)

The existing tasty suite (5000 QuickCheck cases × 3 properties + the
RFC 4648 §10 unit vectors) passes through the dispatched path under
&#39;cabal test&#39;, &#39;cabal test -fllvm&#39;, and &#39;cabal test -fsanitize&#39;.

Also rebrands the cabal/flake/README descriptions from &quot;Pure&quot; to
&quot;Fast&quot; to reflect that the hot path is no longer purely Haskell.

</content>
</entry>
<entry>
<id>f606ebee8dd5da7c25005f5c0e7c7bdefc20f52f</id>
<published>2026-05-16T15:31:38Z</published>
<updated>2026-05-16T15:31:38Z</updated>
<title type="text">readme: ARM intrinsics note + bench figures</title>
<link rel="alternate" type="text/html" href="commit/f606ebee8dd5da7c25005f5c0e7c7bdefc20f52f.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit f606ebee8dd5da7c25005f5c0e7c7bdefc20f52f
parent 0e83ab9538f4c79593e70ae5d338c349fdfe0e8b
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 13:01:38 -0230

readme: ARM intrinsics note + bench figures

Update README tagline from &quot;Pure&quot; to &quot;Fast&quot; and rewrite the
Performance section to note hardware acceleration via ARM NEON
intrinsics. New 1 KiB criterion figures from an M4 MacBook Air,
GHC 9.10.3 + LLVM 19, -fllvm:

  encode time:   2.279 μs -&gt;   102 ns   (~22×)
  decode time:   649.2 ns -&gt;   160 ns   (~4×)

</content>
</entry>
<entry>
<id>0e83ab9538f4c79593e70ae5d338c349fdfe0e8b</id>
<published>2026-05-16T15:31:30Z</published>
<updated>2026-05-16T15:31:30Z</updated>
<title type="text">meta: rebrand from Pure to Fast</title>
<link rel="alternate" type="text/html" href="commit/0e83ab9538f4c79593e70ae5d338c349fdfe0e8b.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit 0e83ab9538f4c79593e70ae5d338c349fdfe0e8b
parent e01f8d10d9bafcab783a6a3bce9ae1b31d6223b6
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 13:01:30 -0230

meta: rebrand from Pure to Fast

Now that the library uses ARM NEON intrinsics for the hot path
(when available) it&#39;s no longer purely Haskell. Update the cabal
synopsis/description and flake.nix description accordingly.

</content>
</entry>
<entry>
<id>e01f8d10d9bafcab783a6a3bce9ae1b31d6223b6</id>
<published>2026-05-16T15:30:26Z</published>
<updated>2026-05-16T15:30:26Z</updated>
<title type="text">lib: dispatch encode/decode to ARM NEON when available</title>
<link rel="alternate" type="text/html" href="commit/e01f8d10d9bafcab783a6a3bce9ae1b31d6223b6.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit e01f8d10d9bafcab783a6a3bce9ae1b31d6223b6
parent 72fa80fdb1438d0d10e0f536558afd2ddbd593c8
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 13:00:26 -0230

lib: dispatch encode/decode to ARM NEON when available

Wire &#39;Data.ByteString.Base64.encode&#39; and &#39;decode&#39; to the NEON
implementation added in the previous commit, with the pure Haskell
scalar loop kept as a fallback.

Mirrors the dispatch pattern in ppad-base16 / ppad-sha256:

    encode bs
      | Arm.base64_arm_available = Arm.encode bs
      | otherwise                = encode_scalar bs

No behavioural change beyond dispatch: on aarch64 the NEON path is
taken, on every other arch the C stubs return availability = 0 and
the scalar bodies run.

Existing tasty suite (5000 QuickCheck cases × 3 properties + the
RFC 4648 §10 unit vectors) passes through the dispatched path,
including under &#39;cabal test -fllvm -fsanitize&#39; which exercises the
C kernel under AddressSanitizer + UndefinedBehaviorSanitizer.

Performance on 1 KiB inputs, M4 MacBook Air, GHC 9.10.3 + LLVM 19,
-fllvm:

  encode time:   270 ns -&gt; 102 ns   (~2.6×)
  decode time:   273 ns -&gt; 160 ns   (~1.7×)

</content>
</entry>
<entry>
<id>72fa80fdb1438d0d10e0f536558afd2ddbd593c8</id>
<published>2026-05-16T15:28:58Z</published>
<updated>2026-05-16T15:28:58Z</updated>
<title type="text">lib: add ARM NEON implementation</title>
<link rel="alternate" type="text/html" href="commit/72fa80fdb1438d0d10e0f536558afd2ddbd593c8.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit 72fa80fdb1438d0d10e0f536558afd2ddbd593c8
parent d9c21f51a123552c70e582d98e14593860259889
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 12:58:58 -0230

lib: add ARM NEON implementation

Mirror ppad-base16&#39;s arm-neon branch. Add an aarch64 NEON kernel for
base64 encode and decode in a small C file with intrinsics gated by
&#39;#if defined(__aarch64__)&#39; + stubs in the &#39;#else&#39; branch, exposed to
Haskell via &#39;foreign import ccall unsafe&#39; in a new module
&#39;Data.ByteString.Base64.Arm&#39;.

The C kernel:

* Encode processes 12 input bytes per NEON iteration. &#39;vld1q_u8&#39; loads
  16 bytes (the 4-byte over-read is safe under the loop bound);
  &#39;vqtbl1q_u8&#39; with a fixed shuffle gathers each 4-byte output lane as
  [b1, b0, b2, b1], the order that lets four &#39;vshrq_n_u32 + vandq_u32&#39;
  pairs extract the six-bit indices i0..i3 directly into byte slots;
  &#39;vqtbl4q_u8&#39; looks each index up in the 64-byte alphabet table; one
  &#39;vst1q_u8&#39; stores all 16 output chars. A scalar tail finishes any
  full triplet that fell outside the NEON cut-off, then a final branch
  emits the 0/1/2-byte padded tail.

* Decode processes 16 input chars per NEON iteration. &#39;ascii_to_b64&#39;
  validates each lane with byte-range compares and yields its 6-bit
  value via an additive offset; the per-iter &#39;bad&#39; masks are OR-
  accumulated and reduced once at the end with &#39;vmaxvq_u8&#39;. Each u32
  lane packs four 6-bit values into a 24-bit V; &#39;vqtbl1q_u8&#39; reorders
  V&#39;s LE bytes into BE triplets, giving 12 valid output bytes in the
  low 12 lanes; &#39;vst1q_u8&#39; stores 16 with the loop bound keeping the
  4-byte overrun inside the allocated buffer. A scalar tail handles
  the remaining body quartets, then the padded final quartet (1- or
  2-byte output) is decoded explicitly with non-data-bit checks per
  RFC 4648 §3.5.
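
As a sketch of why the [b1, b0, b2, b1] lane order is convenient, the
following scalar C models a single u32 lane of the encode step. With
that byte order read as a little-endian word, each 6-bit index is a
contiguous bit field, so one shift and one mask per index drops it
into its own output byte. The function name and the exact shift
directions and masks are assumptions of this sketch and may differ
from what the kernel does.

<![CDATA[
```c
#include <stdint.h>

/* Scalar model of one u32 lane: three input bytes b[0..2] become four
 * 6-bit alphabet indices idx[0..3]. */
void lane_indices(const uint8_t b[3], uint8_t idx[4]) {
    /* Bytes [b1, b0, b2, b1] interpreted as a little-endian u32:
     * bits 0-7 = b1, 8-15 = b0, 16-23 = b2, 24-31 = b1. */
    uint32_t x = (uint32_t)b[1]
               | ((uint32_t)b[0] << 8)
               | ((uint32_t)b[2] << 16)
               | ((uint32_t)b[1] << 24);

    /* Each index occupies six adjacent bits of x:
     *   i0 = bits 10-15, i1 = bits 4-9,
     *   i2 = bits 22-27, i3 = bits 16-21. */
    uint32_t y = ((x >> 10) & 0x0000003Fu)   /* i0 into byte 0 */
               | ((x << 4)  & 0x00003F00u)   /* i1 into byte 1 */
               | ((x >> 6)  & 0x003F0000u)   /* i2 into byte 2 */
               | ((x << 8)  & 0x3F000000u);  /* i3 into byte 3 */

    idx[0] = (uint8_t)y;
    idx[1] = (uint8_t)(y >> 8);
    idx[2] = (uint8_t)(y >> 16);
    idx[3] = (uint8_t)(y >> 24);
}
```
]]>

For the input bytes of foo (0x66 0x6F 0x6F) this yields indices 25,
38, 61, 47, which the alphabet lookup turns into Zm9v.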

The Haskell wrapper:

* &#39;base64_arm_available :: Bool&#39; NOINLINE CAF queries the C-side
  availability probe once; returns &#39;True&#39; on aarch64, &#39;False&#39; on
  every other arch (where the C stubs are linked in).
* &#39;encode&#39; wraps &#39;BI.unsafeCreate&#39;; &#39;decode&#39; computes the padded
  outlen up front, allocates with &#39;BI.mallocByteString&#39;, and passes
  both inlen and outlen to the C kernel.
* &#39;OPTIONS_HADDOCK hide&#39; keeps the module out of public docs.
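
One way the padded outlen could be computed up front, sketched in C
rather than Haskell: three bytes per quartet, minus one per trailing
pad character. The function name is hypothetical and the wrapper may
compute this differently.

<![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* Compute the decoded size of a padded base64 input of inlen chars.
 * Returns 1 and stores the size in *outlen, or returns 0 when inlen
 * is not a multiple of 4. Only the length and the trailing '=' count
 * are inspected; character validity is left to the decoder proper. */
int b64_decoded_len(const uint8_t *in, size_t inlen, size_t *outlen) {
    if (inlen % 4 != 0) return 0;
    size_t pad = 0;
    /* inlen > 0 here implies inlen >= 4, so in[inlen - 2] is in bounds. */
    if (inlen > 0 && in[inlen - 1] == '=') {
        pad = 1;
        if (in[inlen - 2] == '=') pad = 2;
    }
    *outlen = inlen / 4 * 3 - pad;
    return 1;
}
```
]]>

Knowing the exact size before allocating means the output buffer never
has to be shrunk or copied after the kernel returns.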

Cabal:

* &#39;c-sources: cbits/base64_arm.c&#39; compiles the kernel into the
  library on every platform; the &#39;#if&#39;-gated body means the
  contributed code is empty on non-aarch64.
* &#39;if arch(aarch64) cc-options: -march=armv8-a&#39; pins the target to
  baseline armv8.
* New &#39;sanitize&#39; flag adds &#39;-fsanitize=address,undefined
  -fno-omit-frame-pointer&#39; to both the C source and the test-suite
  link, mirroring ppad-base16 and ppad-sha256. Built with
  &#39;cabal test -fllvm -fsanitize&#39;.
* &#39;Data.ByteString.Base64.Arm&#39; added to &#39;exposed-modules&#39; so
  consumers can call the NEON path directly if they want to bypass
  dispatch.

No call sites in &#39;Data.ByteString.Base64&#39; are wired yet: the
existing tasty + criterion suites still go through the scalar path
after this commit, and pass unchanged (verified under cabal test,
cabal test -fllvm, and cabal test -fsanitize).

</content>
</entry>
<entry>
<id>d9c21f51a123552c70e582d98e14593860259889</id>
<published>2026-05-16T15:17:33Z</published>
<updated>2026-05-16T15:17:33Z</updated>
<title type="text">lib: drop bytestring builder, use unsafeCreate + lookup tables</title>
<link rel="alternate" type="text/html" href="commit/d9c21f51a123552c70e582d98e14593860259889.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit d9c21f51a123552c70e582d98e14593860259889
parent b4dd9ff6c285bfb9db834cdcca3d460688c3297d
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 12:47:33 -0230

lib: drop bytestring builder, use unsafeCreate + lookup tables

Mirror ppad-base16&#39;s perf-refactor.

* enc_tab is the 64-byte alphabet, indexed by 6-bit value.
* dec_tab is a 256-byte table mapping each ASCII byte to its 6-bit
  value (offset by 0x40, in the range 0x40..0x7F) or 0x80 for any
  invalid byte (including &#39;=&#39;). The offset keeps the literal NUL-
  free so it lives in static rodata via the bytestring IsString
  rewrite.
* Decode OR-folds every lookup into an accumulator and tests
  &#39;acc .&amp;. 0x80 == 0&#39; once at the end, mirroring base16&#39;s bit-5
  sentinel trick.
* encode_scalar walks 3 input bytes at a time via direct pointer
  ops in BI.unsafeCreate; final 1- or 2-byte tail emits padding.
* decode_scalar peels off the padded final quartet, runs a tight
  body loop, then validates non-data bits per RFC 4648 §3.5.

Encode falls from ~2.3 μs to ~270 ns on 1 KiB inputs under -fllvm.
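
The decode-table idea can be sketched in C as follows. This is a model
of the technique, not the Haskell source: the names are hypothetical,
and the table is filled at start-up rather than written as a string
literal, so the NUL-free rodata point (which is specific to GHC and
bytestring) is not reproduced here. What it does show is the 0x40
offset, the 0x80 sentinel, and the single validity test on the
OR-folded accumulator.

<![CDATA[
```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

const char ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

uint8_t dec_tab[256];

/* Fill the table: 0x80 marks an invalid byte; a valid byte holds its
 * 6-bit value plus 0x40, so entries lie in 0x40..0x7F and none is 0. */
void dec_tab_init(void) {
    memset(dec_tab, 0x80, sizeof dec_tab);
    for (int v = 0; v < 64; v++)
        dec_tab[(uint8_t)ALPHABET[v]] = (uint8_t)(0x40 | v);
}

/* Decode n unpadded quartets (4*n chars) into 3*n bytes. Every lookup
 * is ORed into acc; bit 7 of acc is set iff some byte was invalid, so
 * one test after the loop replaces a branch per character. Returns 1
 * on success, 0 otherwise. */
int decode_body(const uint8_t *in, size_t n, uint8_t *out) {
    uint8_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t a = dec_tab[in[4 * i]];
        uint8_t b = dec_tab[in[4 * i + 1]];
        uint8_t c = dec_tab[in[4 * i + 2]];
        uint8_t d = dec_tab[in[4 * i + 3]];
        acc |= (uint8_t)(a | b | c | d);
        /* Masking with 0x3F strips the 0x40 offset. */
        uint32_t v = ((uint32_t)(a & 0x3F) << 18)
                   | ((uint32_t)(b & 0x3F) << 12)
                   | ((uint32_t)(c & 0x3F) << 6)
                   |  (uint32_t)(d & 0x3F);
        out[3 * i]     = (uint8_t)(v >> 16);
        out[3 * i + 1] = (uint8_t)(v >> 8);
        out[3 * i + 2] = (uint8_t)v;
    }
    return (acc & 0x80) == 0;
}
```
]]>

Because valid entries never have bit 7 set and invalid ones always do,
the OR of all lookups carries the validity of the whole input.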

</content>
</entry>
<entry>
<id>b4dd9ff6c285bfb9db834cdcca3d460688c3297d</id>
<published>2026-05-16T14:16:33Z</published>
<updated>2026-05-16T14:16:33Z</updated>
<title type="text">meta: benchmark figures from m4 macbook air</title>
<link rel="alternate" type="text/html" href="commit/b4dd9ff6c285bfb9db834cdcca3d460688c3297d.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit b4dd9ff6c285bfb9db834cdcca3d460688c3297d
parent 5a89ef39a87510cfb42fef8356e1efd26d2c1f2e
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 11:46:33 -0230

meta: benchmark figures from m4 macbook air

Captured with cabal bench -f+llvm on an Apple M4 MacBook Air, GHC
9.10.3 with the LLVM backend, on a 1024-byte input.

</content>
</entry>
<entry>
<id>5a89ef39a87510cfb42fef8356e1efd26d2c1f2e</id>
<published>2026-05-16T14:16:20Z</published>
<updated>2026-05-16T14:16:20Z</updated>
<title type="text">meta: align README title with package name</title>
<link rel="alternate" type="text/html" href="commit/5a89ef39a87510cfb42fef8356e1efd26d2c1f2e.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit 5a89ef39a87510cfb42fef8356e1efd26d2c1f2e
parent 011d1f446a94c0ac72eb372accdc6951b5d797ea
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 11:46:20 -0230

meta: align README title with package name

Use &quot;ppad-base64&quot; instead of &quot;base64&quot; to match the cabal package name.

</content>
</entry>
<entry>
<id>011d1f446a94c0ac72eb372accdc6951b5d797ea</id>
<published>2026-05-16T14:08:25Z</published>
<updated>2026-05-16T14:08:25Z</updated>
<title type="text">bench: criterion and weigh suites</title>
<link rel="alternate" type="text/html" href="commit/011d1f446a94c0ac72eb372accdc6951b5d797ea.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit 011d1f446a94c0ac72eb372accdc6951b5d797ea
parent d4c704d005ceedbac7cb11b3b7abec818a22bdb2
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 11:38:25 -0230

bench: criterion and weigh suites

Criterion bench for encode (1024B) and decode (1024-char input),
plus opt-in groups comparing against base64-bytestring and base64.
Weigh suite measures allocation on a ~1KB string against the same
two references.

</content>
</entry>
<entry>
<id>d4c704d005ceedbac7cb11b3b7abec818a22bdb2</id>
<published>2026-05-16T14:08:19Z</published>
<updated>2026-05-16T14:08:19Z</updated>
<title type="text">test: property tests and RFC vectors</title>
<link rel="alternate" type="text/html" href="commit/d4c704d005ceedbac7cb11b3b7abec818a22bdb2.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit d4c704d005ceedbac7cb11b3b7abec818a22bdb2
parent c84cc9b184e71f455d0cd8d6b829f20f34bf232b
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 11:38:19 -0230

test: property tests and RFC vectors

QuickCheck properties (5000 iters each) for decode-inverts-encode
and agreement with base64-bytestring on both encode and decode.
Unit test covers the seven RFC 4648 §10 vectors (&quot;&quot;, &quot;f&quot;, &quot;fo&quot;,
&quot;foo&quot;, &quot;foob&quot;, &quot;fooba&quot;, &quot;foobar&quot;), checking both directions.
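
For reference, the expected encodings of those seven vectors are
listed in the comment below, next to a minimal scalar encoder they can
be checked against. The encoder is a hypothetical C sketch, not the
library under test.

<![CDATA[
```c
#include <stddef.h>
#include <stdint.h>

/* RFC 4648 section 10 vectors:
 *   ""       -> ""
 *   "f"      -> "Zg=="
 *   "fo"     -> "Zm8="
 *   "foo"    -> "Zm9v"
 *   "foob"   -> "Zm9vYg=="
 *   "fooba"  -> "Zm9vYmE="
 *   "foobar" -> "Zm9vYmFy"
 */

const char B64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n bytes as padded base64 and NUL-terminate the result.
 * out needs room for 4 * ((n + 2) / 3) + 1 bytes. Returns the number
 * of chars written, not counting the NUL. */
size_t b64_encode(const uint8_t *in, size_t n, char *out) {
    size_t i = 0, o = 0;
    for (; i + 3 <= n; i += 3) {            /* full 3-byte groups */
        uint32_t v = ((uint32_t)in[i] << 16)
                   | ((uint32_t)in[i + 1] << 8)
                   |  (uint32_t)in[i + 2];
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = B64[(v >> 6) & 63];
        out[o++] = B64[v & 63];
    }
    if (n - i == 1) {                       /* 1-byte tail, two pads */
        uint32_t v = (uint32_t)in[i] << 16;
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = '=';
        out[o++] = '=';
    } else if (n - i == 2) {                /* 2-byte tail, one pad */
        uint32_t v = ((uint32_t)in[i] << 16)
                   | ((uint32_t)in[i + 1] << 8);
        out[o++] = B64[(v >> 18) & 63];
        out[o++] = B64[(v >> 12) & 63];
        out[o++] = B64[(v >> 6) & 63];
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}
```
]]>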

</content>
</entry>
<entry>
<id>c84cc9b184e71f455d0cd8d6b829f20f34bf232b</id>
<published>2026-05-16T14:08:09Z</published>
<updated>2026-05-16T14:08:09Z</updated>
<title type="text">lib: base64 encoding and decoding</title>
<link rel="alternate" type="text/html" href="commit/c84cc9b184e71f455d0cd8d6b829f20f34bf232b.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit c84cc9b184e71f455d0cd8d6b829f20f34bf232b
parent 634f91042b13e9512fa8db4c2191bcf3e4a3f18c
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 11:38:09 -0230

lib: base64 encoding and decoding

Standard RFC 4648 §4 base64 (charset A-Za-z0-9+/, &#39;=&#39; padding).
Strict decode: rejects unpadded inputs, non-multiple-of-4 lengths,
invalid characters, and non-canonical encodings (non-zero
non-data bits in the final quartet, per RFC 4648 §3.5).
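
To make the non-canonical case concrete: a final quartet with two pad
characters carries 8 data bits in 12, so the low 4 bits of its second
character must be zero; with one pad character it carries 16 data bits
in 18, so the low 2 bits of its third character must be zero. A
hypothetical C sketch of that check (names are illustrative, ASCII is
assumed, and this is not the library code):

<![CDATA[
```c
#include <stdint.h>

/* 6-bit value of a base64 char, or -1 if it is outside the alphabet. */
int b64_val(uint8_t c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode the final 4-char quartet q into out. Returns the number of
 * bytes produced (1, 2 or 3), or -1 if the quartet is malformed or
 * non-canonical. */
int decode_final(const uint8_t q[4], uint8_t out[3]) {
    int a = b64_val(q[0]), b = b64_val(q[1]);
    if (a < 0 || b < 0) return -1;
    out[0] = (uint8_t)((a << 2) | (b >> 4));
    if (q[2] == '=') {
        if (q[3] != '=') return -1;   /* pad followed by a non-pad */
        if (b & 0x0F) return -1;      /* 4 non-data bits must be 0 */
        return 1;
    }
    int c = b64_val(q[2]);
    if (c < 0) return -1;
    out[1] = (uint8_t)(((b & 0x0F) << 4) | (c >> 2));
    if (q[3] == '=') {
        if (c & 0x03) return -1;      /* 2 non-data bits must be 0 */
        return 2;
    }
    int d = b64_val(q[3]);
    if (d < 0) return -1;
    out[2] = (uint8_t)(((c & 0x03) << 6) | d);
    return 3;
}
```
]]>

Under this check Zg== decodes to the single byte f, while Zh== is
rejected: both would yield f if the stray bits were ignored, and
rejecting the second keeps the encoding of every byte string unique.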

Encode dispatches on the input length l modulo 6 (l rem 6) into six
arms using go64 (6 bytes → word64BE), go32 (3 bytes → word32BE), and
tail1/tail2 for the final padded quartet.

Decode peels off the final 4-char quartet, then processes the
body in chunks of 32/16/8/4 chars writing 3·word64BE,
word64BE+word32BE, word32BE+word16BE, or word16BE+word8.

</content>
</entry>
<entry>
<id>634f91042b13e9512fa8db4c2191bcf3e4a3f18c</id>
<published>2026-05-16T14:07:21Z</published>
<updated>2026-05-16T14:07:21Z</updated>
<title type="text">meta: initial scaffolding</title>
<link rel="alternate" type="text/html" href="commit/634f91042b13e9512fa8db4c2191bcf3e4a3f18c.html" />
<author>
<name>Jared Tobin</name>
<email>jared@jtobin.io</email>
</author>
<content type="text">commit 634f91042b13e9512fa8db4c2191bcf3e4a3f18c
Author: Jared Tobin &lt;jared@jtobin.io&gt;
Date:   Sat, 16 May 2026 11:37:21 -0230

meta: initial scaffolding

Mirror ppad-base16 (master, v0.2.1) project layout: LICENSE,
.ghci, .gitignore, CHANGELOG, README, flake.nix/lock, and cabal
file. Library set up to expose Data.ByteString.Base64 with the
same llvm flag and dep bounds as ppad-base16.

</content>
</entry>
</feed>
