layout: specification
title: 2019-NOV-15 minimal push and minimal number encoding rules
date: 2019-08-11
category: spec
activation: 1573819200
version: 1.0
author: Mark B. Lundeberg
In the November 2019 upgrade, two new consensus rules are introduced to Bitcoin Cash:
Third-party malleation is when anyone, such as an uninvolved miner, is able to modify parts of a transaction while keeping it valid, yet changing the transaction identifier.
The validity of child transactions is contingent on having the correct transaction identifier for the parent, and so third-party malleability threatens to invalidate chains of transactions, whether they are held in secret, in mempool, or even already confirmed (i.e., during blockchain reorganization).
A variety of past consensus rule changes have tried to address third-party malleability vectors: BIP66 strict ECDSA encoding, the ECDSA low-S rule, the strict encoding rule for hashtype, the scriptSig push-only rule, and the cleanstack rule.
This effort is incomplete, as there remains a significant malleability vector that means that currently, all transactions on BCH are still third-party malleable:
{0x81}
can be equivalently pushed using any of the following five script phrases (in hex): 4f
, 0181
, 4c0181
, 4d010081
, 4e0100000081
.For some transactions, an additional malleability mechanism is also present:
{0x81}
, {0x01, 0x80}
, {0x01, 0x00, 0x80}
, {0x01, 0x00, 0x00, 0x80}
.For years now, the “MINIMALDATA” flag, which restricts both of the aforementioned malleability vectors, has been active at the mempool layer of most nodes but not at the consensus layer.
The upgrade converts the existing MINIMALDATA rules to consensus.
For reference, this document contains a full specification of these rules.
It is of course impossible to completely remove third-party malleability in bitcoin (not even using techniques like SegWit) since a transaction can be made that involves no signature or where the signing key is not a secret, or where permutations are permitted (e.g., SINGLE|ANYONECANPAY).
We can, however, remove it for large classes of transactions, and this has been the goal of the past upgrades.
Bringing MINIMALDATA to the consensus layer, along with the dummy element restrictions in the OP_CHECKMULTISIG upgrade, finally achieves the goal of removing third-party malleability from the vast majority of transactions performed on BCH.
Push opcodes — Bitcoin’s Script system is a stack-based language.
The stack elements are simply byte arrays of length 0 to 520.
Push opcodes append a byte array onto the stack, and there are a variety of different opcodes for pushing arbitrary data of various lengths, or pushing specific one-byte arrays:
{0x81}
.{0x01}
.{0x02}
.{0x0f}
.{0x10}
.It can be seen from the above list that any given byte array can be pushed in a variety of ways.
However, for any given byte array there is a unique shortest possible way to push the byte array.
Number representation — Although bitcoin’s stack is just a sequence of byte arrays, there are numerous Script opcodes that expect to take integers from the stack, which means they decode the byte array to an integer before logically using the integer.
The way Script represents numbers as byte arrays is using a variable-length, little-endian sign-and-magnitude representation.
This is typical for a multiprecision or ‘bignum’ arithmetic computing environment, but may be unfamiliar for programmers who are used to ‘bare-metal’ integer computing that uses fixed-width two’s complement (or rarely, ones’ complement) representation.
Currently, the only consensus restriction is that the byte arrays used during number decoding shall be at most 4 bytes in length (except for four special cases, noted in the specification below).
This restricts the range of numbers to be [-231 + 1 … 2^31 - 1] (inclusive), but does not pose any further restrictions on encoding.
So, there are various ways to encode a given number as a stack element by padding the number with excess groups of zero bits just before the sign bit.
For example, the number -6844 can be represented in three valid ways: {0xbc, 0x9a}
, {0xbc, 0x1a, 0x80}
, {0xbc, 0x1a, 0x00, 0x80}
.
The number 39612 can be represented as {0xbc, 0x9a, 0x00}
or {0xbc, 0x9a, 0x00, 0x00}
.
The number 0 has nine valid representations.
While all opcodes that output numbers will minimally encode said output, at the current time they are happy to accept any representation for a numeric input.
For any given number, there is exactly one minimal (shortest) representation.
A simple test can be applied to a byte array to see whether it is the minimal encoding of the corresponding number:
Note that bitcoin’s number system treats “negative 0” encodings such as {0x80}
, {0x00, 0x80}
, etc.
as a representation of 0, and the minimal encoding of 0 is an empty byte array: {}
.
The above rules indicate that neither {0x80}
nor {0x00}
are minimal encodings.
Though conventionally appearing under one flag “MINIMALDATA”, there are two unrelated rules that do not interact. The specifications have been accordingly split into two sections.
Upon the execution of a push opcode (be it during scriptSig, scriptPubKey, or P2SH redeemScript execution), the data pushed on stack shall be examined in order to decide if the just-executed push opcode was minimal:
{}
must be pushed using OP_0.{0x81}
must be pushed using OP_1NEGATE{0x01}
must be pushed using OP_1{0x02}
must be pushed using OP_2{0x0f}
must be pushed using OP_15{0x10}
must be pushed using OP_16In practice, PUSHDATA2 can only push lengths up to 520, but in case script is upgraded one day, the limit for PUSHDATA2 remains at 65535.
Since the above rules cover all possible stack element lengths, this means that PUSHDATA4 cannot appear in executed parts of scripts (it must still, however, be parsed correctly in an unexecuted branch).
It is worth emphasizing that the above rules only apply at the moment when push opcodes are actually executed, i.e., when data is actually being placed onto the stack.
Thus:
Most opcodes that take numbers from the stack shall require the stack element to be a minimally encoded representation.
To be specific, these operands must be minimally encoded numbers:
However, four opcodes are special in the numeric inputs they accept:
The following opcodes notably do not appear in the above lists since they do not decode their inputs as numbers, and thus they have no minimal number encoding rules: OP_IF, OP_NOTIF, OP_VERIFY, OP_IFDUP, OP_AND, OP_OR, OP_XOR.
To prevent push malleability, it is only necessary to restrict the scriptSig.
The push forms used during scriptPubKey and P2SH redeemScript execution cannot be malleated, since they are committed by hashing into the prior transaction’s identifier.
Thus it may seem like ‘overkill’ to restrict these as well.
Despite this, the MINIMALDATA standardness rule has applied these restrictions to scriptPubKey and redeemScript for quite a while now, and it does not appear to be causing an issue.
In addition, it is technically cleaner in some ways, if the same script interpretation rules can be applied to all executing scripts.
By far, the most common usage of numbers is in OP_CHECKMULTISIG where they are provided in the locking script and cannot be malleated.
Only rare smart contracts take numbers from the scriptSig, and in fact, smart contracts that require minimal number encoding could easily enforce this themselves, by using tricks such as OP_DUP OP_DUP OP_0 OP_ADD OP_EQUALVERIFY
(taking advantage of the fact that adding 0 to a number returns its minimal encoding), or more recently: OP_DUP OP_DUP OP_BIN2NUM OP_EQUALVERIFY
.
However, the number encoding rule has been standard for quite some time, and adopting it now should cause no issue.
It also makes it so that smart contract authors can save their limited opcodes for more valuable tasks, and need not use such tricks.
Four opcodes interpret their input as a boolean without any restriction: OP_IF, OP_NOTIF, OP_VERIFY, OP_IFDUP.
Any byte array of any length that is all zeros, or that is all zeros besides a final byte of 0x80, is interpreted as ‘false’, and any other byte array is interpreted as ‘true’.
The script interpreter also accepts such unrestricted boolean representations for the final stack value used to determine pass/fail of a script.
Two additional ‘boolean’ opcodes (OP_BOOLAND, OP_BOOLOR) have a semi-restricted input, as they interpret their inputs as numbers.
These must be at most 4 bytes long, and as mentioned above they will be restricted according to the number encoding rules.
However, while there will be only one valid representation for ‘false’ (the number 0, i.e., {}
), any nonzero number can be used as ‘true’.
In theory, we could restrict all of these boolean-expecting operations to accept only {}
for ‘false’, and {0x01}
for ‘true’; this would be analogous to the number encoding restrictions.
However, no such standardness rule exists at this time so it would be too sudden to impose any hard rule for this upgrade.
Also, it is easier for scripts to avoid malleable boolean inputs without having to use up additional opcodes, as demonstrated by the following example.
Among smart contracts, it is common to see a construction of a form like OP_IF pubkey_A OP_CHECKSIGVERIFY <clause 1 conditions> OP_ELSE pubkey_B OP_CHECKSIGVERIFY <clause 2 conditions> OP_ENDIF
.
Transactions spending such smart contracts will remain malleable, since the input to OP_IF comes from scriptSig.
However, it is easy for script programmers to tweak such smart contracts to a non-malleable form: pubkey_A OP_CHECKSIG OP_IF <clause 1 conditions> OP_ELSE pubkey_B OP_CHECKSIGVERIFY <clause 2 conditions> OP_ENDIF
.
This takes advantage of the fact that OP_CHECKSIG simply returns false if the provided signature is not valid.
Due to the already-adopted NULLFAIL rule, {}
is the only permitted invalid signature, and cannot be malleated.
Thanks to Antony Zegers and Amaury Sechet for valuable feedback.