commit ff08521ffe124aa922f46fde544396af98393fa4
parent 6d44ea017f092a4dccef483d24827c8283e5b830
Author: Jared Tobin <jared@jtobin.io>
Date: Thu, 12 Sep 2024 09:51:53 +0400
meta: update readme w/perf notes
Diffstat:
M README.md | 146 ++++++++++++++++++++++++++++++++++++++++++++----------------------------------
1 file changed, 82 insertions(+), 64 deletions(-)
diff --git a/README.md b/README.md
@@ -8,40 +8,40 @@ lazy ByteStrings, as specified by RFC's [6234][r6234] and [2104][r2104].
A sample GHCi session:
```
-> :set -XOverloadedStrings
->
-> -- import qualified
-> import qualified Crypto.Hash.SHA256 as SHA256
->
-> -- 'hash' and 'hmac' operate on strict bytestrings
->
-> let hash_s = SHA256.hash "strict bytestring input"
-> let hmac_s = SHA256.hmac "strict secret" "strict bytestring input"
->
-> -- 'hash_lazy' and 'hmac_lazy' operate on lazy bytestrings
-> -- but note that the key for HMAC is always strict
->
-> let hash_l = SHA256.hash_lazy "lazy bytestring input"
-> let hmac_l = SHA256.hmac_lazy "strict secret" "lazy bytestring input"
->
-> -- results are always unformatted 256-bit (32-byte) strict bytestrings
->
-> import qualified Data.ByteString as BS
->
-> BS.take 10 hash_s
-"1\223\152Ha\USB\171V\a"
-> BS.take 10 hmac_l
-"\DELSOk\180\242\182'v\187"
->
-> -- you can use third-party libraries for rendering if necessary
-> -- e.g., using base16-bytestring:
->
-> import qualified Data.ByteString.Base16 as B16
->
-> B16.encode hash_s
-"31df9848611f42ab5607ea9e6de84b05d5259085abb30a7917d85efcda42b0e3"
-> B16.encode hmac_l
-"7f534f6bb4f2b62776bba3d6466e384505f2ff89c91f39800d7a0d4623a4711e"
+ > :set -XOverloadedStrings
+ >
+ > -- import qualified
+ > import qualified Crypto.Hash.SHA256 as SHA256
+ >
+ > -- 'hash' and 'hmac' operate on strict bytestrings
+ >
+ > let hash_s = SHA256.hash "strict bytestring input"
+ > let hmac_s = SHA256.hmac "strict secret" "strict bytestring input"
+ >
+ > -- 'hash_lazy' and 'hmac_lazy' operate on lazy bytestrings
+ > -- but note that the key for HMAC is always strict
+ >
+ > let hash_l = SHA256.hash_lazy "lazy bytestring input"
+ > let hmac_l = SHA256.hmac_lazy "strict secret" "lazy bytestring input"
+ >
+ > -- results are always unformatted 256-bit (32-byte) strict bytestrings
+ >
+ > import qualified Data.ByteString as BS
+ >
+ > BS.take 10 hash_s
+ "1\223\152Ha\USB\171V\a"
+ > BS.take 10 hmac_l
+ "\DELSOk\180\242\182'v\187"
+ >
+ > -- you can use third-party libraries for rendering if necessary
+ > -- e.g., using base16-bytestring:
+ >
+ > import qualified Data.ByteString.Base16 as B16
+ >
+ > B16.encode hash_s
+ "31df9848611f42ab5607ea9e6de84b05d5259085abb30a7917d85efcda42b0e3"
+ > B16.encode hmac_l
+ "7f534f6bb4f2b62776bba3d6466e384505f2ff89c91f39800d7a0d4623a4711e"
```
## Documentation
@@ -52,41 +52,59 @@ Haddocks (API documentation, etc.) are hosted at
## Performance
The eventual aim is best-in-class performance for pure, highly-auditable
-Haskell code.
+Haskell code. At present we're not quite there.
-Benchmark figures at present:
+Current benchmark figures look like (use `cabal bench` to run the
+benchmark suite):
```
-benchmarking ppad-sha256/SHA256 (32B input)/hash
-time 2.684 μs (2.658 μs .. 2.714 μs)
- 0.999 R² (0.999 R² .. 1.000 R²)
-mean 2.689 μs (2.674 μs .. 2.706 μs)
-std dev 55.18 ns (44.66 ns .. 66.35 ns)
-variance introduced by outliers: 22% (moderately inflated)
-
-benchmarking ppad-sha256/SHA256 (32B input)/hash_lazy
-time 2.746 μs (2.712 μs .. 2.786 μs)
- 0.999 R² (0.998 R² .. 1.000 R²)
-mean 2.747 μs (2.720 μs .. 2.784 μs)
-std dev 101.1 ns (73.17 ns .. 144.1 ns)
-variance introduced by outliers: 49% (moderately inflated)
-
-benchmarking ppad-sha256/HMAC-SHA256 (32B input)/hmac
-time 10.30 μs (10.18 μs .. 10.48 μs)
- 0.997 R² (0.996 R² .. 0.998 R²)
-mean 10.68 μs (10.48 μs .. 10.92 μs)
-std dev 720.5 ns (603.8 ns .. 874.2 ns)
-variance introduced by outliers: 74% (severely inflated)
-
-benchmarking ppad-sha256/HMAC-SHA256 (32B input)/hmac_lazy
-time 10.58 μs (10.36 μs .. 10.85 μs)
- 0.996 R² (0.991 R² .. 0.998 R²)
-mean 10.72 μs (10.56 μs .. 10.93 μs)
-std dev 634.4 ns (523.1 ns .. 868.8 ns)
-variance introduced by outliers: 68% (severely inflated)
+ benchmarking ppad-sha256/SHA256 (32B input)/hash
+ time 2.684 μs (2.658 μs .. 2.714 μs)
+ 0.999 R² (0.999 R² .. 1.000 R²)
+ mean 2.689 μs (2.674 μs .. 2.706 μs)
+ std dev 55.18 ns (44.66 ns .. 66.35 ns)
+ variance introduced by outliers: 22% (moderately inflated)
+
+ benchmarking ppad-sha256/SHA256 (32B input)/hash_lazy
+ time 2.746 μs (2.712 μs .. 2.786 μs)
+ 0.999 R² (0.998 R² .. 1.000 R²)
+ mean 2.747 μs (2.720 μs .. 2.784 μs)
+ std dev 101.1 ns (73.17 ns .. 144.1 ns)
+ variance introduced by outliers: 49% (moderately inflated)
+
+ benchmarking ppad-sha256/HMAC-SHA256 (32B input)/hmac
+ time 10.30 μs (10.18 μs .. 10.48 μs)
+ 0.997 R² (0.996 R² .. 0.998 R²)
+ mean 10.68 μs (10.48 μs .. 10.92 μs)
+ std dev 720.5 ns (603.8 ns .. 874.2 ns)
+ variance introduced by outliers: 74% (severely inflated)
+
+ benchmarking ppad-sha256/HMAC-SHA256 (32B input)/hmac_lazy
+ time 10.58 μs (10.36 μs .. 10.85 μs)
+ 0.996 R² (0.991 R² .. 0.998 R²)
+ mean 10.72 μs (10.56 μs .. 10.93 μs)
+ std dev 634.4 ns (523.1 ns .. 868.8 ns)
+ variance introduced by outliers: 68% (severely inflated)
```
-Use `cabal bench` to run the benchmark suite.
+When testing `hash_lazy` on a 1GB input, we get a profile like the
+following:
+
+```
+ COST CENTRE %time %alloc
+
+ Crypto.Hash.SHA256.block_hash 72.8 4.9
+ Crypto.Hash.SHA256.prepare_schedule 15.9 32.3
+ Crypto.Hash.SHA256.blocks_lazy 3.7 37.2
+ Crypto.Hash.SHA256.parse 3.6 14.7
+ Crypto.Hash.SHA256.hash_alg 2.1 2.9
+ hash 1.3 8.0
+```
+
+As low-hanging fruit, time and allocation can likely be reduced by
+unpacking the strict bytestrings used to represent 512-bit blocks, and
+also by replacing several internal data structures with unboxed tuples,
+extended literals, etc.
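
As a rough illustration of the first idea above, here is a minimal sketch in which a 512-bit block is parsed once into sixteen strict, unpacked `Word32` fields instead of being carried around as a strict bytestring. The names `Block`, `word32be`, `parseBlock` and `firstWord` are hypothetical and do not come from the library; whether this layout actually reduces time or allocation would need to be confirmed with the benchmark suite.

```haskell
module Main where

import Data.Bits (shiftL, (.|.))
import qualified Data.ByteString as BS
import Data.Word (Word32)

-- Hypothetical type (not part of the library): one 512-bit block held as
-- sixteen strict Word32 fields. UNPACK asks GHC to store the words inline
-- in the constructor; it generally only takes effect with optimisation on.
data Block = Block
  {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32
  {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32
  {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32
  {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32 {-# UNPACK #-} !Word32
  deriving (Eq, Show)

-- Read a big-endian 32-bit word starting at byte offset i.
-- The caller must ensure that bytes i .. i + 3 exist.
word32be :: BS.ByteString -> Int -> Word32
word32be bs i =
      (fromIntegral (BS.index bs i)       `shiftL` 24)
  .|. (fromIntegral (BS.index bs (i + 1)) `shiftL` 16)
  .|. (fromIntegral (BS.index bs (i + 2)) `shiftL` 8)
  .|.  fromIntegral (BS.index bs (i + 3))

-- Parse exactly 64 bytes into a Block; any other length is rejected.
parseBlock :: BS.ByteString -> Maybe Block
parseBlock bs
  | BS.length bs /= 64 = Nothing
  | otherwise = Just $! Block
      (w 0)  (w 4)  (w 8)  (w 12) (w 16) (w 20) (w 24) (w 28)
      (w 32) (w 36) (w 40) (w 44) (w 48) (w 52) (w 56) (w 60)
  where
    w = word32be bs

-- Project out the first word of a block (used only for the demo below).
firstWord :: Block -> Word32
firstWord (Block a _ _ _ _ _ _ _ _ _ _ _ _ _ _ _) = a

main :: IO ()
main = print (fmap firstWord (parseBlock (BS.pack [0 .. 63])))
```

Compile with `-O2` when measuring, since the unpacking is an optimisation-time transformation; the `%alloc` column of a profile like the one above is the natural place to check whether it helps.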
## Security