meta: readme bump - fixed - Pure Haskell large fixed-width integers.

commit e1ae0a4292247be85048b66ff411c62ddec34ecc
parent 59753b492b5a6e64e35a93ae4cd4ceedc53e7f7c
Author: Jared Tobin <jared@jtobin.io>
Date:   Sat,  6 Dec 2025 17:04:13 +0400

meta: readme bump

Diffstat:
M README.md  | 169 +++++++++++++++++++++++++++++++++++++++++++------------------------------------

1 file changed, 92 insertions(+), 77 deletions(-)
diff --git a/README.md b/README.md
@@ -1,7 +1,10 @@
 # ppad-fixed
 
-A pure (**pre-release**, under construction, not-yet-constant-time)
-implementation of large fixed-width integers and supporting operations.
+A pure (**pre-release**, under construction,
+not-yet-guaranteed-constant-time) high-performance implementation
+of large fixed-width integers and supporting operations, including
+Montgomery-form arithmetic on domains related to the the elliptic curve
+secp256k1.
 
 ## Usage
 
@@ -43,84 +46,96 @@ A sample GHCi session:
 
 ## Performance
 
-The aim is best-in-class performance for pure, highly-auditable Haskell
-code. Most operations beat out their variable-length Integer variants,
-*but*, we have yet to reach parity with the all-important division and
-modulo operations.
+The aim is best-in-class performance for pure Haskell code.
 
-Current benchmark figures on my mid-2020 MacBook Air look like (use
-`cabal bench` to run the benchmark suite):
+Current benchmark figures on my M4 Silicon MacBook Air look like:
 
 ```
-  benchmarking addition & subtraction/add
-  time                 11.20 ns   (11.11 ns .. 11.30 ns)
-                       0.999 R²   (0.999 R² .. 1.000 R²)
-  mean                 11.47 ns   (11.32 ns .. 11.75 ns)
-  std dev              679.6 ps   (380.7 ps .. 1.038 ns)
-  variance introduced by outliers: 80% (severely inflated)
-
-  benchmarking addition & subtraction/add (baseline)
-  time                 25.09 ns   (24.36 ns .. 26.00 ns)
-                       0.991 R²   (0.981 R² .. 0.998 R²)
-  mean                 24.81 ns   (24.27 ns .. 25.85 ns)
-  std dev              2.408 ns   (1.289 ns .. 4.439 ns)
-  variance introduced by outliers: 91% (severely inflated)
-
-  benchmarking addition & subtraction/sub
-  time                 11.51 ns   (11.42 ns .. 11.63 ns)
-                       0.999 R²   (0.999 R² .. 1.000 R²)
-  mean                 11.66 ns   (11.54 ns .. 11.81 ns)
-  std dev              475.9 ps   (350.7 ps .. 642.7 ps)
-  variance introduced by outliers: 65% (severely inflated)
-
-  benchmarking addition & subtraction/sub (baseline)
-  time                 31.93 ns   (31.71 ns .. 32.13 ns)
-                       1.000 R²   (0.999 R² .. 1.000 R²)
-  mean                 31.65 ns   (31.39 ns .. 31.89 ns)
-  std dev              841.0 ps   (710.8 ps .. 1.002 ns)
-  variance introduced by outliers: 42% (moderately inflated)
-
-  benchmarking multiplication/mul
-  time                 29.41 ns   (29.06 ns .. 29.84 ns)
-                       0.995 R²   (0.988 R² .. 0.999 R²)
-  mean                 30.50 ns   (29.58 ns .. 33.10 ns)
-  std dev              4.628 ns   (1.452 ns .. 9.437 ns)
-  variance introduced by outliers: 96% (severely inflated)
-
-  benchmarking multiplication/mul (baseline)
-  time                 46.08 ns   (45.71 ns .. 46.43 ns)
-                       0.999 R²   (0.999 R² .. 1.000 R²)
-  mean                 46.12 ns   (45.65 ns .. 46.56 ns)
-  std dev              1.564 ns   (1.218 ns .. 2.063 ns)
-  variance introduced by outliers: 54% (severely inflated)
-
-  benchmarking division/div
-  time                 81.94 ns   (81.11 ns .. 82.84 ns)
-                       0.999 R²   (0.999 R² .. 1.000 R²)
-  mean                 81.97 ns   (81.43 ns .. 82.89 ns)
-  std dev              2.422 ns   (1.696 ns .. 3.578 ns)
-  variance introduced by outliers: 46% (moderately inflated)
-
-  benchmarking division/div (baseline)
-  time                 72.08 ns   (71.26 ns .. 72.89 ns)
-                       0.999 R²   (0.999 R² .. 0.999 R²)
-  mean                 72.35 ns   (71.52 ns .. 73.15 ns)
-  std dev              2.566 ns   (2.203 ns .. 3.044 ns)
-  variance introduced by outliers: 55% (severely inflated)
-
-  benchmarking division/mod
-  time                 106.6 ns   (105.7 ns .. 107.5 ns)
-                       0.999 R²   (0.999 R² .. 0.999 R²)
-  mean                 107.9 ns   (106.5 ns .. 109.2 ns)
-  std dev              4.520 ns   (3.705 ns .. 6.013 ns)
-  variance introduced by outliers: 63% (severely inflated)
-
-  benchmarking division/mod (baseline)
-  time                 77.94 ns   (76.87 ns .. 79.02 ns)
-                       0.998 R²   (0.998 R² .. 0.999 R²)
-  mean                 78.50 ns   (77.49 ns .. 79.74 ns)
-  std dev              3.694 ns   (2.729 ns .. 5.235 ns)
-  variance introduced by outliers: 68% (severely inflated)
+benchmarking add/curve:  M(1) + M(2 ^ 255 - 19)
+time                 6.890 ns   (6.882 ns .. 6.906 ns)
+                     1.000 R²   (1.000 R² .. 1.000 R²)
+mean                 6.889 ns   (6.885 ns .. 6.897 ns)
+std dev              16.73 ps   (8.753 ps .. 30.75 ps)
+
+benchmarking sub/curve:  M(2 ^ 255 - 1) - M(2 ^ 255 - 19)
+time                 6.876 ns   (6.872 ns .. 6.883 ns)
+                     1.000 R²   (1.000 R² .. 1.000 R²)
+mean                 6.877 ns   (6.874 ns .. 6.881 ns)
+std dev              11.60 ps   (7.865 ps .. 18.34 ps)
+
+benchmarking mul/curve:  M(2) * M(2 ^ 255 - 19)
+time                 14.78 ns   (14.75 ns .. 14.81 ns)
+                     1.000 R²   (1.000 R² .. 1.000 R²)
+mean                 14.82 ns   (14.79 ns .. 14.85 ns)
+std dev              99.51 ps   (84.22 ps .. 118.2 ps)
+
+benchmarking sqr/curve:  M(2 ^ 255 - 19) ^ 2
+time                 14.23 ns   (14.20 ns .. 14.27 ns)
+                     1.000 R²   (1.000 R² .. 1.000 R²)
+mean                 14.26 ns   (14.23 ns .. 14.29 ns)
+std dev              114.3 ps   (84.98 ps .. 181.1 ps)
+
+benchmarking inv/curve:  M(2 ^ 255 - 19) ^ -1
+time                 7.004 μs   (6.988 μs .. 7.030 μs)
+                     1.000 R²   (1.000 R² .. 1.000 R²)
+mean                 7.027 μs   (7.013 μs .. 7.047 μs)
+std dev              57.73 ns   (50.57 ns .. 74.05 ns)
+```
+
+The library can be used either via a boxed or unboxed API. Unboxed
+functions, suffixed by magic hashes, do not allocate. Boxed functions
+allocate only for top-level data constructors used when boxing results
+-- any internal arithmetic is entirely unboxed.
+
+For example, a boxed 'Wide' word, defined as:
+
+``` haskell
+  data Wide = Wide !(# Limb, Limb #)
+```
+
+will allocate precisely three machine words -- one for the constructor,
+and one for each 'Limb' (so, 24 bytes on a 64-bit machine). A 'Wider'
+word, defined as:
+
+``` haskell
+  data Wider = Wider !(# Limb, Limb, Limb, Limb #)
+```
+
+will allocate five machine words (40 bytes) in turn.
+
+Thus, the allocation benchmark story for 256-bit Montgomery arithmetic
+using the boxed API looks like:
+
+```
+add
+
+  Case                            Allocated  GCs
+  curve:  M(1) + M(2)                    40    0
+  curve:  M(1) + M(2 ^ 255 - 19)         40    0
+
+sub
+
+  Case                                      Allocated  GCs
+  curve:  M(2 ^ 255 - 1) - M(1)                    40    0
+  curve:  M(2 ^ 255 - 1) - M(2 ^ 255 - 19)         40    0
+
+mul
+
+  Case                            Allocated  GCs
+  curve:  M(2) * M(2)                    40    0
+  curve:  M(2) * M(2 ^ 255 - 19)         40    0
+
+sqr
+
+  Case                         Allocated  GCs
+  curve:  M(2) ^ 2                    40    0
+  curve:  M(2 ^ 255 - 19) ^ 2         40    0
+
+inv
+
+  Case                          Allocated  GCs
+  curve:  M(2) ^ -1                    40    0
+  curve:  M(2 ^ 255 - 19) ^ -1         40    0
 ```
 
 Note that you can compile with GHC's LLVM backend and filter on specific

	fixed Pure Haskell large fixed-width integers.
	git clone git://git.ppad.tech/fixed.git
	Log \| Files \| Refs \| README \| LICENSE