# IMPL1 - Core Types, Serialisation, and TxId

## Goal

Implement core transaction types, binary serialisation (legacy and segwit
formats), and txid computation.

## Scope

- `Bitcoin.Prim.Tx` module: types and serialisation
- CompactSize (varint) encoding/decoding
- Legacy and segwit tx formats
- TxId computation via double SHA256

## Types

Types are already defined in skeleton. Key points:

- `TxId`: 32-byte ByteString (stored as-is, displayed reversed per convention)
- `OutPoint`: TxId + Word32 vout
- `TxIn`: OutPoint + scriptSig + sequence
- `TxOut`: Word64 value + scriptPubKey
- `Witness`: list of stack items (ByteStrings)
- `Tx`: version + inputs + outputs + witnesses + locktime

## CompactSize Encoding

Internal helpers for Bitcoin's variable-length integer format:

```haskell
-- | Encode a Word64 as compactSize.
put_compact :: Word64 -> BS.ByteString

-- | Decode compactSize, returning (value, bytes_consumed).
get_compact :: BS.ByteString -> Maybe (Word64, Int)
```

Encoding rules:
- 0x00-0xfc: 1 byte (value itself)
- 0xfd-0xffff: 0xfd ++ 2 bytes LE
- 0x10000-0xffffffff: 0xfe ++ 4 bytes LE
- larger: 0xff ++ 8 bytes LE

## Serialisation Implementation

### Encoding (to_bytes)

Build output via `Data.ByteString.Builder` or direct unsafe writes:

```
to_bytes tx:
  if has_witnesses tx:
    put_word32_le version
    put_byte 0x00  -- marker
    put_byte 0x01  -- flag
    put_compact (length inputs)
    for each input: put_txin
    put_compact (length outputs)
    for each output: put_txout
    for each witness: put_witness
    put_word32_le locktime
  else:
    put_word32_le version
    put_compact (length inputs)
    for each input: put_txin
    put_compact (length outputs)
    for each output: put_txout
    put_word32_le locktime
```

Component encoders:
```haskell
put_txin :: TxIn -> Builder
-- outpoint (32 + 4 bytes) + scriptSig (compact + bytes) + sequence (4)

put_txout :: TxOut -> Builder
-- value (8 bytes LE) + scriptPubKey (compact + bytes)

put_witness :: Witness -> Builder
-- compact count + for each item: compact len + bytes
```

### Decoding (from_bytes)

Parse with explicit offset tracking or a simple parser state:

```
from_bytes bs:
  version <- get_word32_le
  peek next byte:
    if 0x00 and following byte is 0x01:
      skip marker/flag
      parse as segwit
    else:
      parse as legacy

  -- segwit parse:
  input_count <- get_compact
  inputs <- replicateM input_count get_txin
  output_count <- get_compact
  outputs <- replicateM output_count get_txout
  witnesses <- replicateM input_count get_witness
  locktime <- get_word32_le

  -- legacy parse:
  input_count <- get_compact
  inputs <- replicateM input_count get_txin
  output_count <- get_compact
  outputs <- replicateM output_count get_txout
  locktime <- get_word32_le
  witnesses = []
```
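
The format check at the top of `from_bytes` can be sketched with explicit
offset tracking. This is a minimal sketch under stated assumptions, not the
skeleton's implementation: `Format` and `detect_format` are hypothetical
names introduced here for illustration, and `get_word32_le` follows the
offset-based signature listed under Internal Helpers.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, (.|.))
import Data.Word (Word32)

-- | Hypothetical tag for the two wire formats (not part of the skeleton).
data Format = Legacy | Segwit
  deriving (Eq, Show)

-- | Read a little-endian Word32 at the given offset, failing on short input.
get_word32_le :: BS.ByteString -> Int -> Maybe Word32
get_word32_le bs off
  | off < 0 || BS.length bs - off < 4 = Nothing
  | otherwise =
      let chunk = BS.take 4 (BS.drop off bs)
          -- foldr visits the last byte first, so it ends up most significant
          step b acc = (acc `shiftL` 8) .|. fromIntegral b
      in  Just (BS.foldr step 0 chunk)

-- | Hypothetical helper: read the version, then classify the format from
--   the two bytes that follow it. Returns the version, the format, and the
--   offset at which the input count starts.
detect_format :: BS.ByteString -> Maybe (Word32, Format, Int)
detect_format bs = do
  version <- get_word32_le bs 0
  if BS.length bs >= 6 && BS.index bs 4 == 0x00 && BS.index bs 5 == 0x01
    then Just (version, Segwit, 6)  -- skip marker and flag
    else Just (version, Legacy, 4)
```

Note that a legacy encoding with zero inputs and exactly one output also
starts with `0x00 0x01` after the version, so this check would classify it
as segwit; the empty-inputs edge case in the tests is a good place to pin
down the intended behaviour.
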

Component decoders:
```haskell
get_txin :: Parser TxIn
get_txout :: Parser TxOut
get_witness :: Parser Witness
```

### Legacy Serialisation

```haskell
to_bytes_legacy :: Tx -> BS.ByteString
-- Always legacy format (no marker/flag/witnesses)
-- Used for txid computation
```

## TxId Computation

```haskell
txid :: Tx -> TxId
txid tx = TxId (SHA256.hash (SHA256.hash (to_bytes_legacy tx)))
```

The result is the raw 32-byte hash. Display convention (reversed hex) is
separate from storage.

## Internal Helpers

Little-endian word encoding/decoding:

```haskell
put_word32_le :: Word32 -> Builder
put_word64_le :: Word64 -> Builder
get_word32_le :: BS.ByteString -> Int -> Maybe Word32
get_word64_le :: BS.ByteString -> Int -> Maybe Word64
```

Use `Data.Bits` shifts or `Foreign.Storable` with explicit byte order.

## Work Items

### Phase 1: Encoding (independent)

1. Implement `put_compact` (compactSize encoding)
2. Implement `put_word32_le`, `put_word64_le`
3. Implement `put_txin`, `put_txout`, `put_witness`
4. Implement `to_bytes` and `to_bytes_legacy`

### Phase 2: Decoding (independent of Phase 1)

1. Implement `get_compact` (compactSize decoding)
2. Implement `get_word32_le`, `get_word64_le`
3. Implement `get_txin`, `get_txout`, `get_witness`
4. Implement `from_bytes` with format detection

### Phase 3: TxId (depends on Phase 1)

1. Implement `txid` using ppad-sha256

### Phase 4: Base16 wrappers

1. `to_base16` wraps `to_bytes` with B16.encode
2. `from_base16` decodes hex then calls `from_bytes`

## Tests

- Round-trip: `from_bytes (to_bytes tx) == Just tx`
- Known vectors: parse real Bitcoin transactions, verify txid
- Edge cases: empty inputs/outputs, max-size compactSize values
- Legacy vs segwit format detection

## Test Vectors

### Simple legacy tx (1 input, 1 output)

Use a known mainnet transaction, e.g., the pizza transaction or a
simple testnet tx with known txid.

### Segwit tx (P2WPKH)

Parse a native segwit transaction, verify witnesses preserved, verify
txid matches (should exclude witnesses).

### Sources

- BIP143 test vectors (have full tx hex + expected sighash)
- Bitcoin Core tx_valid.json
- Manually hex-dump transactions from block explorers

## Notes

- All integers are little-endian except where noted
- TxId is stored in natural byte order (not display order)
- Witnesses list length must equal inputs list length for segwit
- Empty witness list indicates legacy transaction
- CompactSize must use minimal encoding (enforced on decode)
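
The compactSize rules and the minimal-encoding note can be illustrated
together. The following is a sketch rather than the module's actual code:
it uses the `put_compact` and `get_compact` signatures given earlier, while
`le_bytes` and the local `wide` helper are hypothetical names chosen for
this example.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word8, Word64)

-- | Hypothetical helper: the low n bytes of a value, least significant first.
le_bytes :: Int -> Word64 -> [Word8]
le_bytes n w = [ fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. n - 1] ]

-- | Encode a Word64 as compactSize, always choosing the shortest form.
put_compact :: Word64 -> BS.ByteString
put_compact w
  | w <= 0xfc       = BS.singleton (fromIntegral w)
  | w <= 0xffff     = BS.pack (0xfd : le_bytes 2 w)
  | w <= 0xffffffff = BS.pack (0xfe : le_bytes 4 w)
  | otherwise       = BS.pack (0xff : le_bytes 8 w)

-- | Decode compactSize, returning (value, bytes_consumed).
--   Fails on truncated input and on non-minimal encodings.
get_compact :: BS.ByteString -> Maybe (Word64, Int)
get_compact bs = do
  (tag, rest) <- BS.uncons bs
  case tag of
    0xfd -> wide 2 0xfd rest
    0xfe -> wide 4 0x10000 rest
    0xff -> wide 8 0x100000000 rest
    _    -> Just (fromIntegral tag, 1)
  where
    -- Read n little-endian bytes; reject values below lo, since those
    -- would have fit in a shorter form.
    wide n lo rest
      | BS.length rest < n = Nothing
      | otherwise =
          let step b acc = (acc `shiftL` 8) .|. fromIntegral b
              v = BS.foldr step 0 (BS.take n rest)
          in  if v < lo then Nothing else Just (v, n + 1)
```

Under these assumptions, `get_compact (put_compact w)` returns `Just` of
`w` paired with the encoded length for every `w`, while an input such as
`0xfd 0x01 0x00` is rejected because the value 1 fits in a single byte.
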