There are some things that are crucial for making stuff happen, like blood in the body or petrol in an engine. You’re going nowhere fast without them.
Similarly, blockchain technology is going nowhere without hashing. It underpins the core function of blockchain and makes all the good things about blockchain possible. Incidentally, imo those good things are the decentralisation and democratisation of online transactions through the secure, global, digital, peer-to-peer transfer of value, which I believe will ultimately make the world a better place (I think I just blacked out temporarily coming up with that).
So what the heck am I talking about?! As per usual, we’ll take it one step at a time. Let’s get properly clear on what hashing does and why it’s important. Btw, I’m not going to explain how it works under the hood because, quite frankly, I don’t really know…
What this all boils down to is data and data management. Data is the lifeblood of everything we do, because so much of what we do is online and everything online is made of data. Data is super important to us because it helps us make better decisions: the more data you can collect about what has happened in the past, the more reliable your predictions about what will happen in the future are likely to be.
What is data? Well, it’s basically strings of 0s and 1s. The smallest piece of data is a bit, which is either a 0 or a 1. Imagine a computer as having lots of light bulbs, each either on (1) or off (0). Different pieces of data are represented by different patterns of bulbs. Large data, such as videos, use many light bulbs; a short email uses fewer. A single light bulb is a bit. Another term you may have heard is a byte, which is simply a group of 8 bulbs. A megabyte of data is 1 million bytes, which would be 8 million bulbs.
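To make the light bulb picture concrete, here’s a tiny Python sketch that turns a short bit of text into the 0s and 1s a computer stores for it. It assumes the text is encoded as UTF-8 (the most common way computers store text), where each plain English letter takes one byte, i.e. 8 bulbs. `to_bits` is just a name I’ve made up for the example.

```python
def to_bits(text: str) -> str:
    """Hypothetical helper: show the bits a computer stores for `text` (UTF-8)."""
    data = text.encode("utf-8")                        # text -> bytes
    return " ".join(f"{byte:08b}" for byte in data)    # each byte -> 8 bits

bits = to_bits("Ed")
print(bits)                                  # 01000101 01100100
print(len(bits.replace(" ", "")), "bits")    # 16 bits
```

So "Ed" is 2 bytes, or 16 bulbs: 8 for the E and 8 for the d.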
Imagine you have a grid of 256 light bulbs, so 16 x 16 (as shown above), and each one can be on or off. Every extra bulb doubles the number of possible patterns, so the number of potential patterns for that grid is 2^256, which is this number:
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
Quite a lot.
And that represents just 32 bytes (8 bits to a byte, and 256 bits in the grid). Now consider that, at the moment, we’re buying computers that come with 24 GIGAbytes of memory… So the more memory your computer has, the more light bulbs it has, basically. That’s the way you need to think about it.
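If you want to check the grid arithmetic yourself, Python handles huge whole numbers natively, so a couple of lines will do it. The names here are just labels I’ve chosen for the sketch.

```python
BULBS = 16 * 16               # a 16 x 16 grid of on/off bulbs
patterns = 2 ** BULBS         # each bulb doubles the number of possible patterns
size_in_bytes = BULBS // 8    # 8 bits (bulbs) to a byte

print(f"{patterns:,}")              # the 78-digit number above
print(len(str(patterns)), "digits") # 78 digits
print(size_in_bytes, "bytes")       # 32 bytes
```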
My point in explaining this is 1) so that you can understand what data is, and 2) to show that something with only two very simple, easy-to-determine states, such as a light bulb being on or off, leads to 2^256 potential different states once you put 256 of them into a 16 x 16 grid. So in theory, if you wanted a pattern of light bulbs for ‘car’, you could map that onto the grid, and then you could have one for ‘cars’, one for ‘vans’ and so on. You can represent the complexities of the world with different patterns of 256 light bulbs that are either on or off, so long as you have a rule for how to map concepts in the real world (e.g. car) onto the patterns of light bulbs.
Hold that concept in your head….
Figuratively speaking, hashing is an algorithm that can take any data input and create a pattern of light bulbs being on or off. In reality, a hash function such as SHA-256 (the one Bitcoin uses) takes any data input and maps it to a string of 64 letters and numbers: specifically hexadecimal characters, meaning the digits 0-9 and the letters A-F. Hashing is the rule that maps the data onto a simpler, uniform, fixed-length value. A little like taking the word ‘car’ and saying that that word will be represented by a certain pattern of light bulbs.
So you can put whatever you want into the algorithm; it will churn through the data, chop it up into uniform chunks, and then basically map that data onto a different value, which is written out as a string of numbers and letters.
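If you’re curious what ‘chop it up into uniform chunks’ looks like, here’s a rough sketch of just that chopping step, assuming the hash function is SHA-256. SHA-256 pads the message and splits it into 64-byte (512-bit) blocks before the real mixing work starts. This only does the padding and splitting, not the hashing itself, and `sha256_blocks` is a name I’ve made up.

```python
def sha256_blocks(message: bytes) -> list[bytes]:
    """Hypothetical helper: pad `message` as SHA-256 does, split into 64-byte blocks.

    Only the chunking step; the compression rounds that turn these
    blocks into the actual hash are not shown.
    """
    bit_length = len(message) * 8
    # Append a single 1 bit (0x80 = 10000000), then enough zero bytes that,
    # after the 8-byte length field, the total is a multiple of 64 bytes.
    zero_count = (55 - len(message)) % 64
    padded = (
        message
        + b"\x80"
        + b"\x00" * zero_count
        + bit_length.to_bytes(8, "big")  # original length in bits
    )
    return [padded[i:i + 64] for i in range(0, len(padded), 64)]

for text in [b"My name is Ed", b"x" * 100]:
    blocks = sha256_blocks(text)
    print(len(text), "bytes in ->", len(blocks), "block(s) of", len(blocks[0]), "bytes")
```

However long the input, it always ends up as a whole number of identical-sized blocks, which is what lets the algorithm treat a 13-byte sentence and a whole novel the same way.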
Let’s see this in real life. Check this out:
Using this hash generator, I can put in any data and it will produce a string of 64 letters and numbers that, for all practical purposes, is unique to that input. If I type in
My name is Ed
I get the hash
8B21E9D57DAE2388B5651936823D48FE296248EAF81D3DC938CACE6BAFF99D73
Now if I type in
Hi, my name is Ed
I get a different hash
22E6F5B1CDBB569FD8389ED2F309DC9AF6D5D056A1E1797EB67E688D330EAD1A
And note that even if I type in
My name is ed
I get a totally different hash
8C7E20B644C3B65735F4FF0BF92D0592C8D6C7D10F50D90ABA3AF2C897B1BF18
… just by changing the capitalisation of one letter.
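You can reproduce this experiment with Python’s built-in `hashlib` module. I’m assuming the generator above used SHA-256 on the text encoded as UTF-8, which fits the 64-character outputs but which I haven’t confirmed; if it did, the hashes printed here should match the ones above (Python prints them in lowercase, which doesn’t change their value). The sketch also counts how many of the 256 bits flip when only the capital E changes. `sha256_hex` is a made-up helper name.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Hypothetical helper: SHA-256 of `text` (UTF-8) as 64 hex characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

inputs = ["My name is Ed", "Hi, my name is Ed", "My name is ed"]
for text in inputs:
    print(f"{text!r:22} -> {sha256_hex(text)}")

# How many of the 256 bits differ between the "Ed" and "ed" versions?
a = int(sha256_hex("My name is Ed"), 16)
b = int(sha256_hex("My name is ed"), 16)
print(bin(a ^ b).count("1"), "of 256 bits differ")
```

For a good hash function you’d expect roughly half the bits to flip from a one-letter change, which is why the two hashes look completely unrelated.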
The hash for the entire text of Ulysses is
7db326b4afe8944ceedcfc438d1ccdd1eaa175b73c31559cd218a260fc06f41e
And the hash for the entire text of Ulysses, except for the last word (‘Yes’), is:
8226ba56e4843e8f50bdfb79ad68ec35945e81889ae2d3227da361884766569e
So no matter the size of the data input, you always end up with a 64-character string of letters and numbers…
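Here’s a quick way to check the fixed-length claim yourself, again assuming SHA-256. I don’t have Ulysses to hand in the example, so the big input below is just the letter x repeated ten million times, a stand-in I’ve picked purely to make the point about size.

```python
import hashlib

tiny = b"x"                   # 1 byte
huge = b"x" * 10_000_000      # 10 million bytes, roughly 10 MB

for data in (tiny, huge):
    digest = hashlib.sha256(data).hexdigest()
    print(f"{len(data):>10,} bytes in -> {len(digest)} characters out")
```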
Whilst this might boggle the mind, it’s not that surprising that it’s possible once you count the options: each of the 64 characters can be any of 16 values (0-9 or A-F), so there are 16^64 possible hashes. That’s exactly 2^256, the same as the number of patterns in our 16 x 16 grid of light bulbs, because each character stands for 4 bulbs.
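Here’s the link between the 64-character hash and the 256-bulb grid, checked in Python using the ‘My name is Ed’ hash from earlier. Each hex character stands for 4 bits, so converting the 64 characters back into raw data gives exactly 32 bytes. It also shows that upper- and lowercase hex (the earlier hashes are uppercase, the Ulysses ones lowercase) spell the same value.

```python
h = "8B21E9D57DAE2388B5651936823D48FE296248EAF81D3DC938CACE6BAFF99D73"

raw = bytes.fromhex(h)             # two hex characters -> one byte
print(len(h), "hex characters")    # 64 hex characters
print(len(raw), "bytes")           # 32 bytes
print(len(raw) * 8, "bits")        # 256 bits

# 16 choices per character, 64 characters: same count as the bulb patterns
print(16 ** 64 == 2 ** 256)        # True

# Upper- or lowercase hex spells the same number
print(bytes.fromhex(h.lower()) == raw)  # True
```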
One of the questions I’m asked a lot is ‘if blockchain is going to take over the world, and blockchain relies on data, then how is blockchain going to be big enough to manage all of the data in the world?’ It’s a reasonable question, and a big part of the answer is hashing. If what gets stored on the blockchain is the hash rather than the data itself, it doesn’t matter how big the input is, because the output is always the same size.
Hashing has another cool property, which is about security. I’ll save the explanation of public and private keys, and how they relate to hashing, for another day. But focus on this for now:
Think of hashing like a play-doh machine: when you push the play-doh in, a certain shape comes out the other side. But you can’t put that shape back into the machine and get back exactly what you originally put in; it’s a one-way street. It’s the same with hashing: you can put the data in and get a string of numbers and letters out, but you can’t feed that string back in to get the original data.
When it comes to blockchain, this is a really useful concept. For something to be ‘trustless’ (one of the core benefits of blockchain), it helps if you can transfer some data or make a record of something without the contents being known to the other person, or even to the entities (nodes) that are securing the chain.
What do I mean by that?
Well, let’s say I want to bet you that I know who will win the English Premier League this season. I’m going to make that bet with you today, and because it’s quite a tight race in the league right now, you say that if I get it right, you’ll give me £50. But I don’t want to tell you who I think will win.
So let’s hash it.
‘On the 30th January 2023, Ed predicts that Newcastle United will win the 22/23 Premier League Season’.
The hash of that is:
A85336C57FFC879919141AD9488AF2763896D15C5635F2A380547B4817C95984
I send you that hash on the 30th January 2023 in an email (this is where timestamping of blocks would come in, but that’s for another day), along with confirmation of the £50 bet if I get the winner of the league right.
It now gets to the end of the season. Newcastle have won.
I get you onto your computer and ask you to pull up the email I sent you with the hash. I tell you to put the text ‘On the 30th January 2023, Ed predicts that Newcastle United will win the 22/23 Premier League Season’ into the hash generator, and out comes the hash. You compare that to the hash I sent you in the email on the 30th January, and they match. So you know that on that day, I wrote that exact sentence. And you owe me £50.
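The bet is an example of what’s often called a commit-reveal scheme, and it fits in a few lines of Python. Assumptions: SHA-256 as the hash, and `commit`/`verify` are names I’ve made up for the sketch. One caveat the sketch also shows: there are only 20 teams in the Premier League, so if you knew the format of my sentence, you could hash it with each team’s name until one matched the hash I sent. Adding a long random string (a ‘salt’) to the sentence before hashing, and only revealing it at the end, closes that gap.

```python
import hashlib
import secrets

def commit(message: str, salt: str = "") -> str:
    """Hypothetical helper: hash the message (plus optional salt) to share up front."""
    return hashlib.sha256((salt + message).encode("utf-8")).hexdigest()

def verify(message: str, commitment: str, salt: str = "") -> bool:
    """Hypothetical helper: re-hash the revealed message and compare."""
    return commit(message, salt) == commitment

prediction = ("On the 30th January 2023, Ed predicts that Newcastle United "
              "will win the 22/23 Premier League Season")

# 30 January: send only the hash
sent_hash = commit(prediction)
# End of season: reveal the sentence; you re-hash it and compare
print(verify(prediction, sent_hash))   # True

# Caveat: with few possible answers, you can guess by trying each one
teams = ["Arsenal", "Manchester City", "Newcastle United"]  # shortened list
template = ("On the 30th January 2023, Ed predicts that {} "
            "will win the 22/23 Premier League Season")
guess = next((t for t in teams if commit(template.format(t)) == sent_hash), None)
print(guess)                           # Newcastle United

# Fix: mix in a random salt and keep it secret until the reveal
salt = secrets.token_hex(16)
salted_hash = commit(prediction, salt)
print(verify(prediction, salted_hash, salt))  # True
```

With the salt in place, guessing the team no longer works unless you also guess 32 random hex characters, which is hopeless.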
So hashing is super important. It’s a bit of a ‘black box’ algorithm that can take data of any length, about anything in the world, and turn it into a uniform string of letters and numbers. Remember that a picture, for example, is just data: 0s and 1s that tell a computer what colour each pixel should be. So the data of a picture can be hashed into just 64 characters…
Hashing has 3 key properties:
It’s one-way, so you can’t work out the input from the hash value
It’s deterministic, so you always get the same hash value for the same input
It’s fixed-length, so the length of the input doesn’t affect the length of the hash
There you go… hopefully I didn’t make a hash of that. Lol.