What is a Merkle Tree?

Background on Merkle Tree data structures and their importance for proving data integrity

Definition

A Merkle Tree is a standard tree data structure but instead of each leaf representing a piece of arbitrary data, each leaf is represented by a hash. Furthermore the hash of each parent is generated by hashing together the children. This pattern goes all the way to the root of the tree which is called a Merkle Root.

Use

Merkle Trees ensure data integrity and make it easy to determine whether or not a certain piece of data exists within a tree. Due to the nature of hash functions a pre-image can vary by one character or bit and result in a vastly different hash. Since all parent leaves in a Merkle Tree are computed from the hashes of the children the tiniest change in data result in entirely different hashes over the entire height of the tree and the resultant root will be significantly different. This property makes Merkle trees ideal for data integrity because any form of tampering with the underlying values would immediately become obvious.

Trading storage for computation

Merkle Trees provide a succinct representation of a set of data without actually requiring the data be stored in the tree. Consequentially it is possible to represent a massive amount of data in a tree without needing to bear the cost storing said data. If underlying data comes into question then once need only recompute the Merkle Tree to determine whether or not a particular value is stored.

Prevalence in Blockchains

Blockchains are notorious for how inefficient they are at storing and managing large sets of data. Each new block requires confirmation that it is canonically valid when being added to the chain's state. If Merkle Trees were not present in blockchain then every single node running the chain would need a copy of every single transaction which would be enormous.

Last updated