tx

Minimal transaction primitives (docs.ppad.tech/tx).
git clone git://git.ppad.tech/tx.git
IMPL1.md (5331B)


# IMPL1 - Core Types, Serialisation, and TxId

## Goal

Implement core transaction types, binary serialisation (legacy and segwit
formats), and txid computation.

## Scope

- `Bitcoin.Prim.Tx` module: types and serialisation
- CompactSize (varint) encoding/decoding
- Legacy and segwit tx formats
- TxId computation via double SHA256

## Types

The types are already defined in the skeleton. Key points:

- `TxId`: 32-byte ByteString (stored as-is, displayed reversed per convention)
- `OutPoint`: TxId + Word32 vout
- `TxIn`: OutPoint + scriptSig + sequence
- `TxOut`: Word64 value + scriptPubKey
- `Witness`: list of stack items (ByteStrings)
- `Tx`: version + inputs + outputs + witnesses + locktime

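For orientation, the type list above could look roughly like the following. This is only a sketch: the field names are hypothetical and the skeleton's actual definitions may differ.

```haskell
import qualified Data.ByteString as BS
import Data.Word (Word32, Word64)

-- NOTE: field names are hypothetical; the skeleton's may differ.

-- | 32-byte transaction id, stored in natural (hash output) byte order.
newtype TxId = TxId BS.ByteString
  deriving (Eq, Show)

-- | Reference to a previous output: txid plus output index.
data OutPoint = OutPoint
  { op_txid :: !TxId
  , op_vout :: !Word32
  } deriving (Eq, Show)

data TxIn = TxIn
  { txin_prevout    :: !OutPoint
  , txin_script_sig :: !BS.ByteString
  , txin_sequence   :: !Word32
  } deriving (Eq, Show)

data TxOut = TxOut
  { txout_value         :: !Word64
  , txout_script_pubkey :: !BS.ByteString
  } deriving (Eq, Show)

-- | Witness stack items for a single input.
newtype Witness = Witness [BS.ByteString]
  deriving (Eq, Show)

data Tx = Tx
  { tx_version   :: !Word32
  , tx_inputs    :: ![TxIn]
  , tx_outputs   :: ![TxOut]
  , tx_witnesses :: ![Witness]  -- empty list for a legacy transaction
  , tx_locktime  :: !Word32
  } deriving (Eq, Show)
```
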
## CompactSize Encoding

Internal helpers for Bitcoin's variable-length integer format:

```haskell
-- | Encode a Word64 as compactSize.
put_compact :: Word64 -> BS.ByteString

-- | Decode compactSize, returning (value, bytes_consumed).
get_compact :: BS.ByteString -> Maybe (Word64, Int)
```

Encoding rules:
- 0x00-0xfc: 1 byte (value itself)
- 0xfd-0xffff: 0xfd ++ 2 bytes LE
- 0x10000-0xffffffff: 0xfe ++ 4 bytes LE
- larger: 0xff ++ 8 bytes LE

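The encoding rules above can be sketched as follows. This is one possible implementation rather than the module's actual code; the helpers `le_bytes` and `from_le` are hypothetical names introduced for illustration. The decoder rejects non-minimal encodings, on the assumption that minimality is enforced at decode time.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word64)

-- | Little-endian encoding of the low n bytes of a Word64 (hypothetical helper).
le_bytes :: Int -> Word64 -> BS.ByteString
le_bytes n w =
  BS.pack [ fromIntegral (w `shiftR` (8 * i)) | i <- [0 .. n - 1] ]

-- | Interpret up to 8 bytes as a little-endian unsigned integer
--   (hypothetical helper).
from_le :: BS.ByteString -> Word64
from_le = BS.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0

-- | Encode a Word64 as compactSize, always choosing the shortest form.
put_compact :: Word64 -> BS.ByteString
put_compact n
  | n <= 0xfc       = BS.singleton (fromIntegral n)
  | n <= 0xffff     = BS.cons 0xfd (le_bytes 2 n)
  | n <= 0xffffffff = BS.cons 0xfe (le_bytes 4 n)
  | otherwise       = BS.cons 0xff (le_bytes 8 n)

-- | Decode compactSize, returning (value, bytes_consumed).
--   Fails on short input and on non-minimal encodings.
get_compact :: BS.ByteString -> Maybe (Word64, Int)
get_compact bs = do
  (tag, rest) <- BS.uncons bs
  case tag of
    0xfd -> wide 2 0xfd rest
    0xfe -> wide 4 0x10000 rest
    0xff -> wide 8 0x100000000 rest
    _    -> Just (fromIntegral tag, 1)
  where
    -- Read n payload bytes; reject values that fit in a shorter form.
    wide n lo rest
      | BS.length rest < n = Nothing
      | v < lo             = Nothing
      | otherwise          = Just (v, n + 1)
      where
        v = from_le (BS.take n rest)
```

For example, `put_compact 0xfd` yields the three bytes `fd fd 00`, while decoding `fd fc 00` fails because 0xfc fits in a single byte.
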
## Serialisation Implementation

### Encoding (to_bytes)

Build output via `Data.ByteString.Builder` or direct unsafe writes:

```
to_bytes tx:
  if has_witnesses tx:
    put_word32_le version
    put_byte 0x00  -- marker
    put_byte 0x01  -- flag
    put_compact (length inputs)
    for each input: put_txin
    put_compact (length outputs)
    for each output: put_txout
    for each witness: put_witness
    put_word32_le locktime
  else:
    put_word32_le version
    put_compact (length inputs)
    for each input: put_txin
    put_compact (length outputs)
    for each output: put_txout
    put_word32_le locktime
```

Component encoders:

```haskell
-- | outpoint (32 + 4 bytes) + scriptSig (compact + bytes) + sequence (4)
put_txin :: TxIn -> Builder

-- | value (8 bytes LE) + scriptPubKey (compact + bytes)
put_txout :: TxOut -> Builder

-- | compact count + for each item: compact len + bytes
put_witness :: Witness -> Builder
```

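The component encoders could be sketched with `Data.ByteString.Builder` as below. The positional constructors are hypothetical stand-ins for the skeleton's types, and `compact`, `put_bytes`, and `run` are illustrative helper names that do not appear in the plan.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BSB
import qualified Data.ByteString.Lazy as BL
import Data.Word (Word32, Word64)

-- Hypothetical stand-ins for the skeleton's types.
-- TxIn fields: raw txid (32 bytes), vout, scriptSig, sequence.
data TxIn = TxIn !BS.ByteString !Word32 !BS.ByteString !Word32
-- TxOut fields: value, scriptPubKey.
data TxOut = TxOut !Word64 !BS.ByteString
newtype Witness = Witness [BS.ByteString]

-- | compactSize as a Builder, following the plan's encoding rules.
compact :: Word64 -> BSB.Builder
compact n
  | n <= 0xfc       = BSB.word8 (fromIntegral n)
  | n <= 0xffff     = BSB.word8 0xfd <> BSB.word16LE (fromIntegral n)
  | n <= 0xffffffff = BSB.word8 0xfe <> BSB.word32LE (fromIntegral n)
  | otherwise       = BSB.word8 0xff <> BSB.word64LE n

-- | Length-prefixed bytes: compactSize length, then the raw bytes.
put_bytes :: BS.ByteString -> BSB.Builder
put_bytes bs = compact (fromIntegral (BS.length bs)) <> BSB.byteString bs

put_txin :: TxIn -> BSB.Builder
put_txin (TxIn txid vout script_sig sequ) =
     BSB.byteString txid
  <> BSB.word32LE vout
  <> put_bytes script_sig
  <> BSB.word32LE sequ

put_txout :: TxOut -> BSB.Builder
put_txout (TxOut value script_pubkey) =
  BSB.word64LE value <> put_bytes script_pubkey

put_witness :: Witness -> BSB.Builder
put_witness (Witness items) =
  compact (fromIntegral (length items)) <> foldMap put_bytes items

-- | Run a Builder to a strict ByteString.
run :: BSB.Builder -> BS.ByteString
run = BL.toStrict . BSB.toLazyByteString
```

Builders compose with `<>`, so `to_bytes` can concatenate these pieces and materialise the result once at the end.
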
### Decoding (from_bytes)

Parse with explicit offset tracking or a simple parser state:

```
from_bytes bs:
  version <- get_word32_le
  peek next byte:
    if 0x00 and following byte is 0x01:
      skip marker/flag
      parse as segwit
    else:
      parse as legacy

  -- segwit parse:
  input_count <- get_compact
  inputs <- replicateM input_count get_txin
  output_count <- get_compact
  outputs <- replicateM output_count get_txout
  witnesses <- replicateM input_count get_witness
  locktime <- get_word32_le

  -- legacy parse:
  input_count <- get_compact
  inputs <- replicateM input_count get_txin
  output_count <- get_compact
  outputs <- replicateM output_count get_txout
  locktime <- get_word32_le
  witnesses = []
```

Component decoders:

```haskell
get_txin :: Parser TxIn
get_txout :: Parser TxOut
get_witness :: Parser Witness
```

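The plan does not define `Parser`. As a minimal sketch, assuming a hand-rolled state parser over a strict ByteString, the version read and format detection might look like this. `Parser`, `run_parser`, `take_bytes`, `peek2`, and `get_header` are all hypothetical names.

```haskell
import qualified Data.ByteString as BS
import Data.Bits (shiftL, (.|.))
import Data.Word (Word8, Word32)

-- | Hypothetical minimal parser: consume a prefix of the input and
--   return the parsed value with the unconsumed remainder.
newtype Parser a = Parser
  { run_parser :: BS.ByteString -> Maybe (a, BS.ByteString) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \bs -> case p bs of
    Nothing        -> Nothing
    Just (a, rest) -> Just (f a, rest)

instance Applicative Parser where
  pure a = Parser $ \bs -> Just (a, bs)
  Parser pf <*> Parser pa = Parser $ \bs -> case pf bs of
    Nothing        -> Nothing
    Just (f, rest) -> case pa rest of
      Nothing         -> Nothing
      Just (a, rest') -> Just (f a, rest')

instance Monad Parser where
  Parser p >>= k = Parser $ \bs -> case p bs of
    Nothing        -> Nothing
    Just (a, rest) -> run_parser (k a) rest

-- | Take exactly n bytes, failing on short input.
take_bytes :: Int -> Parser BS.ByteString
take_bytes n = Parser $ \bs ->
  if n < 0 || BS.length bs < n then Nothing else Just (BS.splitAt n bs)

-- | Read a little-endian Word32.
get_word32_le :: Parser Word32
get_word32_le =
  BS.foldr (\b acc -> (acc `shiftL` 8) .|. fromIntegral b) 0 <$> take_bytes 4

-- | Look at the next two bytes without consuming them.
peek2 :: Parser (Maybe (Word8, Word8))
peek2 = Parser $ \bs -> case BS.unpack (BS.take 2 bs) of
  [a, b] -> Just (Just (a, b), bs)
  _      -> Just (Nothing, bs)

-- | Read the version, then report whether the segwit marker/flag
--   follows, consuming those two bytes if so.
get_header :: Parser (Word32, Bool)
get_header = do
  version <- get_word32_le
  next <- peek2
  case next of
    Just (0x00, 0x01) -> take_bytes 2 >> pure (version, True)
    _                 -> pure (version, False)
```

Running `get_header` on bytes `02 00 00 00 00 01 ...` returns version 2 with the segwit flag set and leaves the input positioned at the input count.
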
### Legacy Serialisation

```haskell
-- | Always legacy format (no marker/flag/witnesses).
--   Used for txid computation.
to_bytes_legacy :: Tx -> BS.ByteString
```

## TxId Computation

```haskell
txid :: Tx -> TxId
txid tx = TxId (SHA256.hash (SHA256.hash (to_bytes_legacy tx)))
```

The result is the raw 32-byte hash. Display convention (reversed hex) is
separate from storage.

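To illustrate the storage/display split, a display helper might reverse the stored bytes and hex-encode them. `display_txid` is a hypothetical name, and this sketch uses `byteStringHex` from `bytestring` rather than whatever base16 library the module ends up using.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BSB
import qualified Data.ByteString.Lazy as BL

-- Stand-in for the skeleton's TxId (raw hash bytes, natural order).
newtype TxId = TxId BS.ByteString

-- | Lowercase hex in display order: stored bytes reversed, then encoded.
--   (Hypothetical helper; not part of the plan's API.)
display_txid :: TxId -> BS.ByteString
display_txid (TxId raw) =
  BL.toStrict (BSB.toLazyByteString (BSB.byteStringHex (BS.reverse raw)))
```

Keeping the reversal in a display-only function means comparisons and outpoint serialisation can work directly on the stored bytes.
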
## Internal Helpers

Little-endian word encoding/decoding:

```haskell
put_word32_le :: Word32 -> Builder
put_word64_le :: Word64 -> Builder
get_word32_le :: BS.ByteString -> Int -> Maybe Word32
get_word64_le :: BS.ByteString -> Int -> Maybe Word64
```

Use `Data.Bits` shifts, or `Foreign.Storable` with explicit byte-order
handling (`peek` and `poke` use the host byte order).

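A sketch of the `Data.Bits` approach for the 32-bit helpers, using the offset-based decoder signature given above. The `render` helper is hypothetical and only converts a Builder to strict bytes for inspection; the 64-bit variants would follow the same pattern with eight bytes.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BSB
import qualified Data.ByteString.Lazy as BL
import Data.Bits (shiftL, shiftR, (.|.))
import Data.Word (Word32)

-- | Emit the four bytes of a Word32, least significant first.
put_word32_le :: Word32 -> BSB.Builder
put_word32_le w =
  foldMap (\s -> BSB.word8 (fromIntegral (w `shiftR` s))) [0, 8, 16, 24]

-- | Read a little-endian Word32 at the given offset, with bounds checks.
get_word32_le :: BS.ByteString -> Int -> Maybe Word32
get_word32_le bs off
  | off < 0 || off > BS.length bs - 4 = Nothing
  | otherwise =
      Just (b 0 .|. (b 1 `shiftL` 8) .|. (b 2 `shiftL` 16) .|. (b 3 `shiftL` 24))
  where
    -- Byte at offset off + i, widened to Word32 before shifting.
    b i = fromIntegral (BS.index bs (off + i)) :: Word32

-- | Run a Builder to a strict ByteString (hypothetical helper).
render :: BSB.Builder -> BS.ByteString
render = BL.toStrict . BSB.toLazyByteString
```

Shifting makes the byte order explicit in the code, so the result does not depend on the host architecture.
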
## Work Items

### Phase 1: Encoding (independent)

1. Implement `put_compact` (compactSize encoding)
2. Implement `put_word32_le`, `put_word64_le`
3. Implement `put_txin`, `put_txout`, `put_witness`
4. Implement `to_bytes` and `to_bytes_legacy`

### Phase 2: Decoding (independent of Phase 1)

1. Implement `get_compact` (compactSize decoding)
2. Implement `get_word32_le`, `get_word64_le`
3. Implement `get_txin`, `get_txout`, `get_witness`
4. Implement `from_bytes` with format detection

### Phase 3: TxId (depends on Phase 1)

1. Implement `txid` using ppad-sha256

### Phase 4: Base16 wrappers (depends on Phases 1 and 2)

1. `to_base16` wraps `to_bytes` with B16.encode
2. `from_base16` decodes hex then calls `from_bytes`

## Tests

- Round-trip: `from_bytes (to_bytes tx) == Just tx`
- Known vectors: parse real Bitcoin transactions, verify txid
- Edge cases: empty inputs/outputs, max-size compactSize values. Note that
  a legacy encoding with zero inputs puts 0x00 right after the version,
  where the segwit marker goes, so format detection needs care here.
- Legacy vs segwit format detection

## Test Vectors

### Simple legacy tx (1 input, 1 output)

Use a known mainnet transaction, e.g., the pizza transaction or a
simple testnet tx with known txid.

### Segwit tx (P2WPKH)

Parse a native segwit transaction, verify that the witnesses are
preserved, and verify that the txid matches. The txid is computed over
the legacy serialisation, so it excludes witness data.

### Sources

- BIP143 test vectors (have full tx hex + expected sighash)
- Bitcoin Core tx_valid.json
- Manually hex-dump transactions from block explorers

## Notes

- All integers are little-endian except where noted
- TxId is stored in natural byte order (not display order)
- Witnesses list length must equal inputs list length for segwit
- Empty witness list indicates legacy transaction
- CompactSize must use minimal encoding (enforced on decode)