What Is a Bitcoin Transaction?
In the existing banking system, a transaction inside a bank simply means editing a balance sheet: the number next to one name decreases and the number next to another name increases. Interbank transfers involve third-party organizations, SWIFT for instance, but everything works in pretty much the same way.
In a financial system based on the blockchain, transferring money looks completely different. Bitcoin has no general sheet of the <address, balance> form, and no authority that would edit such a sheet. In this article we will show what a Bitcoin transaction is and how it is built, and we will explain why Bitcoin has added a programming language of its own that everyone has heard of but nobody has seen.
Introduction
As we said above, Bitcoin has no single structure in which every address is mapped to its current balance. The blockchain is used instead: all transactions are stored. For simplicity, you can assume for now that these are messages like
<address 1> sent <amount> BTC to <address 2>
So, searching the entire blockchain, you can calculate how many coins "belong" to a particular address.
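The simplified model above can be sketched in a few lines of Python. The Transfer record and the sample names are made up for illustration; real transactions are not stored this way.

```python
from collections import namedtuple

# Hypothetical simplified record: "<sender> sent <amount> BTC to <receiver>"
Transfer = namedtuple("Transfer", ["sender", "receiver", "amount"])

def balance(ledger, address):
    """Scan every stored transfer: add what the address received, subtract what it sent."""
    total = 0
    for t in ledger:
        if t.receiver == address:
            total += t.amount
        if t.sender == address:
            total -= t.amount
    return total

ledger = [
    Transfer("alice", "bob", 5),
    Transfer("bob", "carol", 2),
    Transfer("carol", "alice", 1),
]
print(balance(ledger, "bob"))  # 3
```

Note that the whole ledger is scanned on every call, which is exactly the cost this naive model implies.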
Inputs & Outputs
The actual transaction in the Bitcoin network is, in fact, a little more complicated than the one described above. In reality, it’s a bulky structure, the main components of which are inputs and outputs.
Inputs are transactions you "refer to." Let's imagine that three transactions have been sent to your X address:
- TXN_ID — 123456, VALUE — 40 BTC
- TXN_ID — 6453795, VALUE — 10 BTC
- TXN_ID — 888888, VALUE — 100 BTC
If you need to spend, say, 45 BTC, you can refer to transaction 888888, or to two transactions at a time: 123456 and 6453795. If necessary, you can refer even to all three transactions, though it's unclear why you would need to.
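As a rough illustration, the sketch below lists every combination of the three incoming transactions that covers 45 BTC. The function name is invented for this example, and real wallets use more careful selection strategies.

```python
from itertools import combinations

# The three incoming transactions from the example: txn id -> value in BTC
incoming = {"123456": 40, "6453795": 10, "888888": 100}

def sufficient_input_sets(incoming, amount):
    """Return every combination of incoming transactions whose total covers the amount."""
    result = []
    ids = sorted(incoming)
    for size in range(1, len(ids) + 1):
        for combo in combinations(ids, size):
            if sum(incoming[i] for i in combo) >= amount:
                result.append(combo)
    return result

for combo in sufficient_input_sets(incoming, 45):
    print(combo, sum(incoming[i] for i in combo))
```

Transaction 888888 alone is enough, and so is the pair 123456 plus 6453795, which matches the two options named in the text.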
Outputs are literally “outputs.” For now, you can assume that these are addresses (although it’s not true) the funds will be "sent" to as a result of the transaction. There can also be several outputs, and each of them has its own amount.
The picture below shows the creation of a new transaction C that refers to the two outputs A and B. As a result, there are 0.008 BTC at the input of the transaction that will be further divided between two outputs. 0.003 BTC will be sent to the first address, and 0.004 BTC to the second one.
The ability to specify several outputs at once is a very important feature because a transaction (or more precisely its output) can be used as an input only once and only as a whole. That is, if you have an incoming transaction for 10 BTC and you need to spend 8 of them at Starbucks, you simply create a transaction with one input and two outputs: 8 BTC for the store and 2 BTC back to your address. If you create a transaction in which the sum of the outputs is less than the sum of the inputs (as in the picture), the difference goes to the miner who included your transaction in a block.
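The Starbucks example can be written down as a small sketch. Amounts are in satoshi (1 BTC is 100,000,000 satoshi) to avoid floating-point rounding; the function and the output labels are assumptions made for illustration.

```python
BTC = 100_000_000  # satoshi per bitcoin

def build_outputs(input_total, payment, fee):
    """Split a fully spent input into a payment output and a change output."""
    change = input_total - payment - fee
    if change < 0:
        raise ValueError("inputs do not cover payment plus fee")
    outputs = [("store", payment)]
    if change > 0:
        outputs.append(("my_address", change))
    return outputs

# 10 BTC in, 8 BTC to the store, 2 BTC back, nothing left for the miner
print(build_outputs(10 * BTC, 8 * BTC, fee=0))
# Same payment, but 0.001 BTC is left unclaimed for the miner
print(build_outputs(10 * BTC, 8 * BTC, fee=100_000))
```

Forgetting the change output would hand the whole remainder to the miner, which is why the sketch always computes it explicitly.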
Fee
It is this difference between the sum of the inputs and the sum of the outputs that is called the transaction fee. It's the second most important source of income for miners, and the time it takes for a transaction to be included in the blockchain depends on it. Each miner has a pool of unconfirmed transactions waiting to get into a block. As a rule, the miner simply sorts them in descending order of fee to maximize profit. Therefore, the larger the fee, the higher you will be in the queue and the sooner your payment will go through.
Here you can see a miner who received a transaction with a fee of $135,000.
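A minimal sketch of both ideas, the fee as a difference and the miner's queue, might look like this. The pool contents are invented, and the sketch follows the simplification in the text by ranking on the absolute fee; ranking by fee per byte of transaction size would be a natural variation.

```python
def fee(input_values, output_values):
    """The fee is whatever the outputs leave unclaimed (amounts in satoshi)."""
    return sum(input_values) - sum(output_values)

# Transaction C from the earlier example: 0.008 BTC in, 0.003 + 0.004 BTC out
print(fee([800_000], [300_000, 400_000]))  # 100000 satoshi, i.e. 0.001 BTC

# Hypothetical pool of unconfirmed transactions: (txn id, input values, output values)
pool = [
    ("tx_a", [500_000], [499_000]),
    ("tx_b", [800_000], [300_000, 400_000]),
    ("tx_c", [250_000], [240_000]),
]

# The miner's queue: largest fee first
queue = sorted(pool, key=lambda tx: fee(tx[1], tx[2]), reverse=True)
print([tx[0] for tx in queue])  # ['tx_b', 'tx_c', 'tx_a']
```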
UTXO
As soon as a new transaction is included in a block, its outputs can be used as inputs. There's a special term for such not-yet-spent outputs: UTXO (unspent transaction output). As we have said, each output can be used as an input only once, so in practice only unspent outputs are of interest. Outputs that have already been spent are kept for the sake of the system's security.
By the way, people usually consider UTXO as an entire array of unspent outputs. But educated young people should write UTXO pool or at least UTXO set.
Referring back to the beginning of the article, it should be clear now that to calculate the balance of an address, you don’t need to search the entire blockchain. It’s enough to search the UTXO pool, which is obviously faster.
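To make the idea concrete, here is a toy UTXO set as a Python dictionary. The output indices, the address Y and transaction 777777 are invented for this sketch; a real node stores locking scripts rather than owner names.

```python
# Toy UTXO set: (txn id, output index) -> (owner address, value in BTC)
utxo_set = {
    ("123456", 0): ("X", 40),
    ("6453795", 1): ("X", 10),
    ("888888", 0): ("X", 100),
    ("777777", 0): ("Y", 5),
}

def balance(utxo_set, address):
    """Sum only the unspent outputs; no scan of the full history is needed."""
    return sum(value for owner, value in utxo_set.values() if owner == address)

def spend(utxo_set, outpoint):
    """Use an output as an input: it leaves the set and cannot be used again."""
    return utxo_set.pop(outpoint)  # raises KeyError if unknown or already spent

print(balance(utxo_set, "X"))  # 150
spend(utxo_set, ("888888", 0))
print(balance(utxo_set, "X"))  # 50
```

Trying to spend the same output a second time fails with a KeyError, which mirrors the rule that an output can be used as an input only once.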
Txn Structure
The general form of the transaction is described in the official specification of the protocol. Meanwhile, I’m providing a living example taken from Ken Shirriff’s blog.
For some mysterious reason, the value and the previous output hash must be written in little-endian form. In our case, the hash of the transaction we refer to is actually 81 b4 c8 32..., although it is written as ...32 c8 b4 81 in the transaction. In exactly the same way, the amount of the transaction is 0.00091234 BTC, that is, 91234 satoshi or 0x016462 in hex, but in the protocol it is written as 62 64 01 00 00 00 00 00.
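The byte order is easy to check in Python. The sketch assumes the value field is 8 bytes wide, as the example bytes suggest, and reverses only the four hash bytes quoted in the text, since the rest of the hash is elided there.

```python
value = 0x016462  # 91234 satoshi, i.e. 0.00091234 BTC

# Serialize the amount as an 8-byte little-endian integer
encoded = value.to_bytes(8, "little")
print(encoded.hex(" "))  # 62 64 01 00 00 00 00 00

# Reading it back restores the original number
print(int.from_bytes(encoded, "little"))  # 91234

# For a hash, "little endian" simply means the byte sequence is reversed
hash_prefix = bytes.fromhex("81b4c832")
print(hash_prefix[::-1].hex(" "))  # 32 c8 b4 81
```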
By the way, it’s really simple to calculate the hash of the transaction. You simply take the entire transaction in the form of a sequence of bytes (in the example above, the result is a string of the form 010000000148 .... 00), calculate the SHA-256 hash from it twice and represent it in the little endian form.
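The recipe translates almost word for word into Python's standard hashlib module. The function name is an assumption for this sketch, and the input below is placeholder bytes, not a real serialized transaction.

```python
import hashlib

def transaction_hash(raw_tx: bytes) -> str:
    """Double SHA-256 of the serialized transaction, shown in reversed byte order."""
    first = hashlib.sha256(raw_tx).digest()
    second = hashlib.sha256(first).digest()
    return second[::-1].hex()

# Placeholder bytes only; pass the full serialized transaction here
raw = b"not a real transaction"
print(transaction_hash(raw))
```

For a real transaction, raw would be obtained with bytes.fromhex() from the complete hex dump.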
previous output index — we refer not to the transaction, the hash of which is specified in the previous output hash, but to one of its outputs. In this parameter, we indicate which output is of interest to us; the numbering starts from zero. I’m going to mention a “reference to the transaction” quite often in this article but it’s just to use a more convenient term.
The block lock time parameter is rarely used in practice. If it is not equal to 0 and is less than 500 million, it is the number of the block starting from which the transaction can be included in the blockchain. Since blocks appear roughly every 10 minutes, it's easy to estimate the time when the transaction will "open."
If the lock time is more than 500 million, it means the UNIX timestamp, starting from which the transaction will become available. There’s 0 in our case, that is, the transaction is available immediately.
sequence — this feature is no longer used but you can read about it here.
Parameters with the word script in their name are much more complicated; we'll discuss them below.
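Before moving on, the lock time rule described above can be summarized in a short sketch. The function names and the block heights in the usage lines are hypothetical; only the 500 million threshold and the 10-minute block interval come from the text.

```python
from datetime import datetime, timezone

LOCKTIME_THRESHOLD = 500_000_000  # below: block number, otherwise: UNIX timestamp

def describe_lock_time(lock_time: int) -> str:
    """Interpret the lock time field as the text describes it."""
    if lock_time == 0:
        return "available immediately"
    if lock_time < LOCKTIME_THRESHOLD:
        return f"locked until block {lock_time}"
    moment = datetime.fromtimestamp(lock_time, tz=timezone.utc)
    return f"locked until {moment.isoformat()}"

def estimated_wait_minutes(current_height, lock_height, minutes_per_block=10):
    """Rough estimate of when a block-based lock opens."""
    return max(0, lock_height - current_height) * minutes_per_block

print(describe_lock_time(0))
print(describe_lock_time(800_000))
print(describe_lock_time(1_700_000_000))
print(estimated_wait_minutes(799_856, 800_000))  # 1440 minutes, about one day
```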
Script
You must have heard that the Bitcoin network relies on cryptographically strong algorithms and a private/public key pair to create a system in which only the owner of a private key can use the coins associated with the address derived from that key. We will now show how it's implemented "under the hood."
To begin with, Bitcoin has its own programming language called Script. Here's what Bitcoin wiki writes about it:
Bitcoin uses a scripting system for transactions. Forth-like, Script is simple, stack-based, and processed from left to right. It is purposefully not Turing-complete, with no loops.
The thing is that the language is as simple as ABC. It’s stack-based and Turing-incomplete. Here is an example of a typical program:
1 OP_DUP OP_DUP 5 OP_HASH160
Each instruction is called an opcode. There are about 80 of them, so the language is definitely rather primitive. The picture below shows the process of executing the program 2 3 OP_ADD 5 OP_EQUAL:
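The same execution can be traced with a tiny interpreter. This is only a sketch of the stack mechanics for the three opcodes it names, not the real Script engine: it works on Python integers and booleans, whereas actual Script operates on byte strings.

```python
def run(script):
    """Execute a whitespace-separated script left to right and return the final stack."""
    stack = []
    for token in script.split():
        if token == "OP_ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
        elif token == "OP_DUP":
            stack.append(stack[-1])
        else:
            stack.append(int(token))  # anything else is treated as a number to push
        print(f"{token:<10} {stack}")
    return stack

run("2 3 OP_ADD 5 OP_EQUAL")
```

Each printed line shows the token just processed and the stack after it; the last line leaves a single True on the stack.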
Lock & Unlock Transaction
We’ll return to the language a bit later, but first let’s describe why we need it.
To do this, let’s recall the structure of the transaction and two parameters: scriptSig and scriptPubKey. Unlike other parameters, the purpose of these two is not obvious at all, and IMHO this is the most difficult concept in Bitcoin.
I've seen a lot of attempts to explain (usually unsuccessfully) what the Bitcoin scripts are and how to understand them on an intuitive level. Nevertheless, I’ll take the risk and try to draw another analogy. To do this, let's consider a will, something like this:
$1,000,000 will pass to Alice only after she turns 18.
In this case, the text of the will is the condition on which you can use the money (you can use the $1,000,000 transaction as an input), and a photocopy of the passport at the age of 19 is the proof that the condition is met and it's time to get the money.
In order to set a condition, under which you can spend the output, and to be able to confirm that the condition has been met, you need the SCRIPT, private / public keys and other complications.
In the case of Bitcoin, the will is the locking script, which is specified in the output of the transaction. It is also often called scriptPubKey because it is most often a program that contains a public key or an address, although in general it may have nothing to do with cryptography.
A kind of a "proof" of the fact that the condition from the locking script has been fulfilled is called the unlocking script. We write it in the field of the signature script, and it is often called scriptSig, guess why.
The mechanism for verifying a script is very simple. You concatenate the unlocking script and the locking script and run the resulting program as a whole. If TRUE is on top of the stack after execution, the transaction is valid; in any other case it is invalid.
Multiplication-based Script
Most likely, you haven't understood any of this, so let's write a simple script to finally figure it out. The idea is to lock money with the help of a number, for example 370. The locking script will look like OP_MUL 370 OP_EQUAL. To unlock the transaction, you will need to specify two numbers that give 370 when multiplied.
We’re going to use an online cheatsheet to run and debug Bitcoin scripts. In the unlocking script, we’re going to write, say, 10 37. Let’s check it:
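The same check can be reproduced offline with a toy validator that joins the unlocking and locking scripts and looks at the top of the stack. It is a sketch supporting only the two opcodes used here, and it is not the online tool mentioned above. Note also that OP_MUL is disabled on the live Bitcoin network, so this script is useful for learning rather than for real payments.

```python
def run(script):
    """Tiny stack machine supporting only OP_MUL, OP_EQUAL and number pushes."""
    stack = []
    for token in script.split():
        if token == "OP_MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "OP_EQUAL":
            b, a = stack.pop(), stack.pop()
            stack.append(a == b)
        else:
            stack.append(int(token))
    return stack

def is_valid(unlocking_script, locking_script):
    """Concatenate unlocking + locking script, run it, and require TRUE on top."""
    try:
        stack = run(unlocking_script + " " + locking_script)
    except (IndexError, ValueError):
        return False  # ran out of stack items or met an unknown token
    return bool(stack) and stack[-1] is True

locking = "OP_MUL 370 OP_EQUAL"
print(is_valid("10 37", locking))  # True
print(is_valid("2 185", locking))  # True
print(is_valid("3 100", locking))  # False
```

Any pair of factors of 370 unlocks the output, which also shows why such a lock offers no real protection.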
Pay to Public Key Hash (P2PKH)
P2PKH is probably used in 99 transactions out of 100, so it's worth understanding how it works. Here’s how it looks in general:
This script has been known since the first appearance of Bitcoin and, perhaps, was invented by Satoshi himself. It performs the task that I’ve mentioned above: make sure that only the owner of the private key can use the coins associated with the address obtained from this key.
In lay terms, it looks like this: suppose your friend B owns a private key P. From it he derives the public key K and then the address A, and tells you the address. You then send 1 BTC to address A and write something like the following in the locking script field:
"This output can be spent only by the owner of the private key for address A. As proof, the spender must put two things into the unlocking script: the public key K, and a signature of the spending transaction made with the private key P."
When B decides to use your transaction as an input, he will create one of his own, say, for 0.5 BTC. Then he will put two things into the unlocking script: the signature of his transaction made with his private key P, <sig>, and the public key K, <PubK>. The combined script is then executed as follows:
- The transaction signature is added to the stack
- The public key is added to the stack
- OP_DUP takes the top element of the stack and duplicates it. Now there are two public keys on the stack
- OP_HASH160 replaces the top element of the stack with its hash RIPEMD160(SHA256(x))
- The same hash of the public key is added to the stack, but this one was calculated in advance by the sender of the transaction. If you've carefully read the part about Cryptography, it should be obvious to you that RIPEMD160(SHA256(public_key)) and the address are basically the same thing
- OP_EQUALVERIFY removes the top two elements of the stack, and if they are not equal, the execution of the program is interrupted with an error
- OP_CHECKSIG checks that the signature matches the transaction. If all is correct, it removes the signature and the public key and adds TRUE
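The steps above can be sketched in Python to watch the stack change. Two parts are deliberately simplified and should be read as assumptions: the signature check is a caller-supplied placeholder function rather than real ECDSA verification, and if the local hashlib build lacks RIPEMD-160, a truncated SHA-256 is used as a stand-in so the example still runs.

```python
import hashlib

def hash160(data: bytes) -> bytes:
    """RIPEMD160(SHA256(x)); falls back to a stand-in if RIPEMD-160 is unavailable."""
    sha = hashlib.sha256(data).digest()
    try:
        return hashlib.new("ripemd160", sha).digest()
    except ValueError:
        return hashlib.sha256(sha).digest()[:20]  # NOT the real hash, demo only

def run_p2pkh(sig, pubkey, pubkey_hash, check_sig):
    """Walk through the unlocking + locking script of a P2PKH output."""
    stack = []
    stack.append(sig)                    # <sig>
    stack.append(pubkey)                 # <PubK>
    stack.append(stack[-1])              # OP_DUP
    stack.append(hash160(stack.pop()))   # OP_HASH160
    stack.append(pubkey_hash)            # hash stored by the sender
    if stack.pop() != stack.pop():       # OP_EQUALVERIFY
        return False
    key, signature = stack.pop(), stack.pop()
    stack.append(check_sig(signature, key))  # OP_CHECKSIG
    return stack[-1] is True

# Placeholder values, not real keys or signatures
pubkey = b"public key K"
sig = b"signature made with private key P"

def toy_check(s, k):
    return (s, k) == (sig, pubkey)  # pretend verification

locking_hash = hash160(pubkey)
print(run_p2pkh(sig, pubkey, locking_hash, toy_check))             # True
print(run_p2pkh(sig, b"another key", locking_hash, toy_check))     # False
print(run_p2pkh(b"forged", pubkey, locking_hash, toy_check))       # False
```

A wrong public key fails at OP_EQUALVERIFY, while a wrong signature passes the hash comparison and fails only at OP_CHECKSIG.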
P2P Storage
One of the most interesting features of Bitcoin, as well as the blockchain technology in general, is the invariability and hypothetical "eternity" of everything that gets there. It’s no surprise that there were people who wanted to use it for their own purposes. The first thing that came to their mind was to try to store any third-party data in the blockchain and get a P2P dropbox.
You already know how to do this. We take the string Make America Great Again and simply write it to the locking script. It will still be a perfectly valid script, but we can't come up with an unlocking script to unlock the funds. However, if you send, say, 0.0000001 BTC to an output with such a script, you can just let it go. The only limitation is the size of your transaction: assume that it cannot be more than 100 KB. Real life is a bit more complicated, and you can read about it here.
It goes without saying that we don’t really like this. We all know that Bitcoin has big problems with scalability, and now the blockchain, being already big enough, gets clogged with unnecessary data. Moreover, we all remember that such transactions cannot be spent. Thus, they will always be in the UTXO pool, which is not any better.
To reach a compromise, OP_RETURN has been added. It allows “legal” storage of up to 40 bytes of data in the blockchain.
OP_RETURN is a script opcode used to mark a transaction output as invalid. Since the data after OP_RETURN are irrelevant to Bitcoin payments, arbitrary data can be added into the output after an OP_RETURN — Bitcoin wiki
The simplest locking script with it looks like this: OP_RETURN <40 bytes data>. An output with such a script acquires provably unspendable status. It doesn't even get into the UTXO pool, and thus saves precious space. Other reasons to use OP_RETURN <data> instead of <data> are mentioned here.
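As a rough sketch, the bytes of such a locking script can be assembled by hand. OP_RETURN is opcode 0x6a, and for short data the push instruction is a single byte holding the data length; the 40-byte cap below simply follows the limit quoted in this article, and the function name is made up for the example.

```python
OP_RETURN = 0x6A
MAX_DATA = 40  # limit quoted in the article

def op_return_script(data: bytes) -> bytes:
    """Build the raw bytes of an 'OP_RETURN <data>' locking script."""
    if len(data) > MAX_DATA:
        raise ValueError(f"data is {len(data)} bytes, limit is {MAX_DATA}")
    # One-byte length prefix acts as the push opcode for data this short
    return bytes([OP_RETURN, len(data)]) + data

script = op_return_script(b"Make America Great Again")
print(script.hex())
```

The first byte of the result marks the output as unspendable, the second is the length of the message, and the rest is the message itself.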