fixed

Pure Haskell large fixed-width integers.
git clone git://git.ppad.tech/fixed.git
Log | Files | Refs | README | LICENSE

commit d27e503317a3685939a5bea559126b8c86264b7b
parent 83dfead9e9cba75d7b927a61741df8bc28e20be8
Author: Jared Tobin <jared@jtobin.io>
Date:   Sun, 26 Jan 2025 20:36:48 +0400

meta: perf note

(cherry picked from commit 3627073e287e6917c9d60f96d9cb37da5bc418cf)

Diffstat:
MREADME.md | 31++++++++++++++++++++++++++-----
1 file changed, 26 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md @@ -41,11 +41,6 @@ A sample GHCi session: 65535 ``` -## Documentation - -Haddocks (API documentation, etc.) are hosted at -[docs.ppad.tech/fixed](https://docs.ppad.tech/fixed). - ## Performance The aim is best-in-class performance for pure, highly-auditable Haskell @@ -53,6 +48,32 @@ code. Most operations beat out their variable-length Integer variants, *but*, we have yet to reach parity with the all-important division and modulo operations. +There are a few ways I can imagine this being improved: + +* Extremely careful use of Data.Primitive.PrimArray can likely beat + the pure code that currently implements most of the library. A + single allocation of a maximum 21-long mutable PrimArray is not + terribly expensive. In particular finding a remainder, e.g. for + modulo, requires comparatively few expensive writes. + + We require, for cases like modular multplication, at most 8 elements + for the quotient, 9 elements for the dividend/normalized dividend + (i.e. reusing the same area for both), and 8 elements for the + divisor/normalized divisor (ditto), assuming we stick with Knuth's division + algorithm. In other situations, e.g. division or modulo, we need less. + +* Avoid unnecessarily calculating the quotient or remainder if only one + of the two is desired. + +* It's possible that manually writing internals using primitives (think + MagicHash, UnboxedTuples, GHC.Exts) could yield an improvement; I've + noticed that GHC doesn't always unbox everything automatically. This + is possibly due to register contention, though, so it could also + potentially degrade performance. + +* An alternative division algorithm might be worth looking into, if it + proves to be impossible to squeeze more out of Knuth's. + Current benchmark figures on my mid-2020 MacBook Air look like (use `cabal bench` to run the benchmark suite):