Wednesday, 25 July 2018

hexadecimal - Why do we use hex so much, when there are enough letters to use base 32 instead?



Two hex characters can store any value from 0-255, so it kind of compresses the data, and it's used for all sorts of things, including colours, IP addresses and MAC addresses.


My question is: why did they stop at base 16 (or why is that the most commonly used)? There are enough letters in the alphabet for base 32, which would give a range of 0-1023 in the same two characters, and about a billion colours in six characters as opposed to just 16 million. If you make the letters case sensitive and add two symbols, you could go to base 64, allowing up to 4096 values to be represented by the same two characters.



Some examples of situations I think this would work:


IPv4 addresses are running out. I know v6 is being rolled out, but it's very long and will be hard to remember. Take the 192.168.0.1 address: it can also be written as C0.A8.0.1. Using base 64 but still keeping to a maximum of 8 characters, you could have 280 trillion combinations instead of 4 billion, and we wouldn't have this problem.
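The dotted-hex form of the address mentioned above is easy to check. A quick sketch (the function name `to_hex_ip` is just illustrative, not a standard library call):

```python
def to_hex_ip(dotted: str) -> str:
    """Convert a dotted-decimal IPv4 address to dotted hex, one octet at a time."""
    return ".".join(format(int(octet), "X") for octet in dotted.split("."))

print(to_hex_ip("192.168.0.1"))  # C0.A8.0.1
```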


As mentioned above, it would also provide a much larger range of colours. The RAW photo format records at 32 bits per colour channel instead of 8, with the downside of a huge increase in file size. If the RGB values were stored as hex, there should be no change in the size of the file as you increase the range of colours, as each pixel would still be stored in the same six characters, just with a higher base number. Instead, it's recorded as numerical values at 96 bits per pixel, which seems a hugely unnecessary increase, leaving photos at over 20 MB (and according to an online calculator, 4K RAW video at 32 bits of colour could reach 2.5 GB per second).



This part isn't really to do with the question, but I wrote a script a while back which can convert numbers to different bases, ranging from binary up to base 88 (I ran out of symbols after that), which shows it's easily possible to implement something similar. As an example, here's the output for 66000.
Base 2: 10000000111010000
Base 16: 101D0
Base 32: 20EG
Base 64: G7G
The code is here if anyone is interested; it still has a few bugs, though, and I've only tried it from within Maya. A bit off topic, but I've also just noticed that normal hex needs around 20% fewer characters than the decimal form, and base 88 is almost a 50% reduction.
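That kind of base conversion can be sketched in a few lines (a minimal re-implementation, not the author's script; the digit alphabet here is an assumption that happens to reproduce the output above):

```python
# Digits 0-9, then A-Z, then a-z, then two symbols: enough for up to base 64.
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/"

def to_base(n: int, base: int) -> str:
    """Repeatedly divide by the base, collecting remainders as digits."""
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, base)
        out.append(DIGITS[r])
    return "".join(reversed(out))

for b in (2, 16, 32, 64):
    print(f"Base {b}: {to_base(66000, b)}")
```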



One final question: has anyone attempted my idea of storing photos as hex? Would it potentially work if you used base 64 and stored the photos with data like [64;1920;Bgh54D;NgDFF4;...]? If not, I might try to create something which can do that.



Answer



If I am reading the question correctly, you are saying that the data 'shrinks' when you use larger bases, when in fact it doesn't.


Take your own example: Base 2: 10000000111010000, Base 16: 101D0, Base 32: 20EG, Base 64: G7G.


We would use 101D0 for that, because hex is standard. What would happen if we used base 64 notation?


The answer is: essentially nothing, since you are still storing and processing the data as bits in your device. Even if you write G7G instead of 101D0, you are still storing and working with 10000000111010000 in your device. Imagine you have the number 5. In binary that is 101. 101 has three digits and 5 has one, but that does not mean 5 is more compressed than 101, since you would still be storing the number as the binary 0101 on your computer.
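This is easy to check: however you write the number down, parsing it back gives the same integer, which occupies the same number of bits. A small Python check:

```python
# The same value written in two notations; parsing each gives the same integer.
as_binary = int("10000000111010000", 2)
as_hex = int("101D0", 16)

assert as_binary == as_hex == 66000
# The storage requirement depends only on the value, not on the notation used:
print(as_hex.bit_length())  # 17 bits either way
```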


To stick with your examples, take the IPv6 one, or MAC addresses (for this example they are effectively the same thing: strings of two-digit groups separated by colons).


In hex we have, say, 00:00:FF:01:01; that is how you would normally write it. In binary this is 0000 0000 0000 0000 1111 1111 0000 0001 0000 0001 (you are probably starting to see why we use hex now). The conversion is easy: since 16 = 2^4, each hex digit maps to exactly 4 binary digits, and you just concatenate the results to get the full binary string. In your base-64 system, with something like GG:HH:01:02:03, each character would instead translate to 6 bits.
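The digit-by-digit expansion described above is completely mechanical; each hex digit becomes exactly one 4-bit group:

```python
addr = "00:00:FF:01:01"
# Map every hex digit to its 4-bit binary form and join the groups.
bits = " ".join(format(int(d, 16), "04b") for d in addr.replace(":", ""))
print(bits)  # 0000 0000 0000 0000 1111 1111 0000 0001 0000 0001
```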


What is the problem with this, then? The fact that computers work internally in bytes and powers of two, and they don't care about the notation you are using. Two hex digits line up exactly with one 8-bit byte, but 6-bit groups straddle byte boundaries. In CPU registers, memory and other devices, you will never see data divided into groups of 6 bits.


TL;DR: Hexadecimal is just a notation that helps us humans read binary more easily, since a byte can be expressed as exactly two characters (00-FF); what is stored and processed in the computer is the same no matter which notation you use to read it.
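The byte-to-two-characters correspondence in the TL;DR can be seen directly:

```python
# Every byte value 0-255 fits in exactly two hex characters, and round-trips.
for value in (0, 15, 16, 200, 255):
    two_chars = format(value, "02X")
    assert len(two_chars) == 2 and int(two_chars, 16) == value
    print(value, "->", two_chars)
```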

