fixed

Pure Haskell large fixed-width integers.
git clone git://git.ppad.tech/fixed.git
Log | Files | Refs | README | LICENSE

commit e1ae0a4292247be85048b66ff411c62ddec34ecc
parent 59753b492b5a6e64e35a93ae4cd4ceedc53e7f7c
Author: Jared Tobin <jared@jtobin.io>
Date:   Sat,  6 Dec 2025 17:04:13 +0400

meta: readme bump

Diffstat:
MREADME.md | 169+++++++++++++++++++++++++++++++++++++++++++------------------------------------
1 file changed, 92 insertions(+), 77 deletions(-)

diff --git a/README.md b/README.md @@ -1,7 +1,10 @@ # ppad-fixed -A pure (**pre-release**, under construction, not-yet-constant-time) -implementation of large fixed-width integers and supporting operations. +A pure (**pre-release**, under construction, +not-yet-guaranteed-constant-time) high-performance implementation +of large fixed-width integers and supporting operations, including +Montgomery-form arithmetic on domains related to the the elliptic curve +secp256k1. ## Usage @@ -43,84 +46,96 @@ A sample GHCi session: ## Performance -The aim is best-in-class performance for pure, highly-auditable Haskell -code. Most operations beat out their variable-length Integer variants, -*but*, we have yet to reach parity with the all-important division and -modulo operations. +The aim is best-in-class performance for pure Haskell code. -Current benchmark figures on my mid-2020 MacBook Air look like (use -`cabal bench` to run the benchmark suite): +Current benchmark figures on my M4 Silicon MacBook Air look like: ``` - benchmarking addition & subtraction/add - time 11.20 ns (11.11 ns .. 11.30 ns) - 0.999 R² (0.999 R² .. 1.000 R²) - mean 11.47 ns (11.32 ns .. 11.75 ns) - std dev 679.6 ps (380.7 ps .. 1.038 ns) - variance introduced by outliers: 80% (severely inflated) - - benchmarking addition & subtraction/add (baseline) - time 25.09 ns (24.36 ns .. 26.00 ns) - 0.991 R² (0.981 R² .. 0.998 R²) - mean 24.81 ns (24.27 ns .. 25.85 ns) - std dev 2.408 ns (1.289 ns .. 4.439 ns) - variance introduced by outliers: 91% (severely inflated) - - benchmarking addition & subtraction/sub - time 11.51 ns (11.42 ns .. 11.63 ns) - 0.999 R² (0.999 R² .. 1.000 R²) - mean 11.66 ns (11.54 ns .. 11.81 ns) - std dev 475.9 ps (350.7 ps .. 642.7 ps) - variance introduced by outliers: 65% (severely inflated) - - benchmarking addition & subtraction/sub (baseline) - time 31.93 ns (31.71 ns .. 32.13 ns) - 1.000 R² (0.999 R² .. 1.000 R²) - mean 31.65 ns (31.39 ns .. 31.89 ns) - std dev 841.0 ps (710.8 ps .. 1.002 ns) - variance introduced by outliers: 42% (moderately inflated) - - benchmarking multiplication/mul - time 29.41 ns (29.06 ns .. 29.84 ns) - 0.995 R² (0.988 R² .. 0.999 R²) - mean 30.50 ns (29.58 ns .. 33.10 ns) - std dev 4.628 ns (1.452 ns .. 9.437 ns) - variance introduced by outliers: 96% (severely inflated) - - benchmarking multiplication/mul (baseline) - time 46.08 ns (45.71 ns .. 46.43 ns) - 0.999 R² (0.999 R² .. 1.000 R²) - mean 46.12 ns (45.65 ns .. 46.56 ns) - std dev 1.564 ns (1.218 ns .. 2.063 ns) - variance introduced by outliers: 54% (severely inflated) - - benchmarking division/div - time 81.94 ns (81.11 ns .. 82.84 ns) - 0.999 R² (0.999 R² .. 1.000 R²) - mean 81.97 ns (81.43 ns .. 82.89 ns) - std dev 2.422 ns (1.696 ns .. 3.578 ns) - variance introduced by outliers: 46% (moderately inflated) - - benchmarking division/div (baseline) - time 72.08 ns (71.26 ns .. 72.89 ns) - 0.999 R² (0.999 R² .. 0.999 R²) - mean 72.35 ns (71.52 ns .. 73.15 ns) - std dev 2.566 ns (2.203 ns .. 3.044 ns) - variance introduced by outliers: 55% (severely inflated) - - benchmarking division/mod - time 106.6 ns (105.7 ns .. 107.5 ns) - 0.999 R² (0.999 R² .. 0.999 R²) - mean 107.9 ns (106.5 ns .. 109.2 ns) - std dev 4.520 ns (3.705 ns .. 6.013 ns) - variance introduced by outliers: 63% (severely inflated) - - benchmarking division/mod (baseline) - time 77.94 ns (76.87 ns .. 79.02 ns) - 0.998 R² (0.998 R² .. 0.999 R²) - mean 78.50 ns (77.49 ns .. 79.74 ns) - std dev 3.694 ns (2.729 ns .. 5.235 ns) - variance introduced by outliers: 68% (severely inflated) +benchmarking add/curve: M(1) + M(2 ^ 255 - 19) +time 6.890 ns (6.882 ns .. 6.906 ns) + 1.000 R² (1.000 R² .. 1.000 R²) +mean 6.889 ns (6.885 ns .. 6.897 ns) +std dev 16.73 ps (8.753 ps .. 30.75 ps) + +benchmarking sub/curve: M(2 ^ 255 - 1) - M(2 ^ 255 - 19) +time 6.876 ns (6.872 ns .. 6.883 ns) + 1.000 R² (1.000 R² .. 1.000 R²) +mean 6.877 ns (6.874 ns .. 6.881 ns) +std dev 11.60 ps (7.865 ps .. 18.34 ps) + +benchmarking mul/curve: M(2) * M(2 ^ 255 - 19) +time 14.78 ns (14.75 ns .. 14.81 ns) + 1.000 R² (1.000 R² .. 1.000 R²) +mean 14.82 ns (14.79 ns .. 14.85 ns) +std dev 99.51 ps (84.22 ps .. 118.2 ps) + +benchmarking sqr/curve: M(2 ^ 255 - 19) ^ 2 +time 14.23 ns (14.20 ns .. 14.27 ns) + 1.000 R² (1.000 R² .. 1.000 R²) +mean 14.26 ns (14.23 ns .. 14.29 ns) +std dev 114.3 ps (84.98 ps .. 181.1 ps) + +benchmarking inv/curve: M(2 ^ 255 - 19) ^ -1 +time 7.004 μs (6.988 μs .. 7.030 μs) + 1.000 R² (1.000 R² .. 1.000 R²) +mean 7.027 μs (7.013 μs .. 7.047 μs) +std dev 57.73 ns (50.57 ns .. 74.05 ns) +``` + +The library can be used either via a boxed or unboxed API. Unboxed +functions, suffixed by magic hashes, do not allocate. Boxed functions +allocate only for top-level data constructors used when boxing results +-- any internal arithmetic is entirely unboxed. + +For example, a boxed 'Wide' word, defined as: + +``` haskell + data Wide = Wide !(# Limb, Limb #) +``` + +will allocate precisely three machine words -- one for the constructor, +and one for each 'Limb' (so, 24 bytes on a 64-bit machine). A 'Wider' +word, defined as: + +``` haskell + data Wider = Wider !(# Limb, Limb, Limb, Limb #) +``` + +will allocate five machine words (40 bytes) in turn. + +Thus, the allocation benchmark story for 256-bit Montgomery arithmetic +using the boxed API looks like: + +``` +add + + Case Allocated GCs + curve: M(1) + M(2) 40 0 + curve: M(1) + M(2 ^ 255 - 19) 40 0 + +sub + + Case Allocated GCs + curve: M(2 ^ 255 - 1) - M(1) 40 0 + curve: M(2 ^ 255 - 1) - M(2 ^ 255 - 19) 40 0 + +mul + + Case Allocated GCs + curve: M(2) * M(2) 40 0 + curve: M(2) * M(2 ^ 255 - 19) 40 0 + +sqr + + Case Allocated GCs + curve: M(2) ^ 2 40 0 + curve: M(2 ^ 255 - 19) ^ 2 40 0 + +inv + + Case Allocated GCs + curve: M(2) ^ -1 40 0 + curve: M(2 ^ 255 - 19) ^ -1 40 0 ``` Note that you can compile with GHC's LLVM backend and filter on specific