Tuesday 1 January 2019

memory - Looking for Clarification on Binary Prefix Logic / History vs. SI Prefix


Recently I've looked into the SI and Binary prefixes used for digital storage / computing in general, and I'm still not sure I understand the logic behind the Binary prefix (actual question at the bottom).


Current Understanding (summarized):


SI:


Seems quite simple: every time the total number of bytes reaches the current prefix value * 1,000, you move up to the next larger prefix. That can be expressed as 1000 ^ (1 + the number of SI prefixes you have already used, starting at kilo). Simple enough.


Binary:


So, apparently memory works better / easier when using units of data that are powers of 2 (the 'systems architecture' explanation of this is over my head to be honest, so I'll just take their word for it). Also, in the early days of computers, before Binary prefixes were officially established, the designers of a machine with 1024 bytes of memory (the smallest power of 2 that exceeds 1000 bytes) decided to use the already established SI standards and describe this amount of memory as a kB, even though the two weren't actually equal (it was 'close enough', more or less).


This is where my understanding starts to break down. I guess that since the SI standards 'go up' to the next prefix using the formula 1000 ^ (1 + number of prefixes already used), the equivalent in Binary is 1024 ^ (1 + number of prefixes already used), since 1024 is the power of 2 closest to 1000 (closer than 512 or 2048) and therefore most closely matches the SI formula.
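The two 'next prefix' formulas described above can be checked with a short sketch. This is only an illustration: the names `SI_PREFIXES`, `BINARY_PREFIXES` and `prefix_value` are made up for this example, and `n` here is the position of the prefix (1 for kilo / kibi), i.e. 1 + the number of prefixes already used.

```python
# Prefix names in order; position n (starting at 1) is the exponent.
# These lists and the helper below are illustrative, not from any standard library.
SI_PREFIXES = ["kilo", "mega", "giga", "tera"]
BINARY_PREFIXES = ["kibi", "mebi", "gibi", "tebi"]


def prefix_value(base: int, n: int) -> int:
    """Return base ** n, the number of bytes denoted by the n-th prefix."""
    return base ** n


# Print the decimal (base 1000) and binary (base 1024) value side by side.
for n, (si, binary) in enumerate(zip(SI_PREFIXES, BINARY_PREFIXES), start=1):
    print(f"{si}byte = {prefix_value(1000, n):>17,} bytes   "
          f"{binary}byte = {prefix_value(1024, n):>17,} bytes")
```

Both systems use the same exponent for the same position; only the base differs (1000 versus 1024).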


Actual Question:


So, if that is correct, at this point in time, why use Binary prefixes at all? Is it really that 'bad' to just say '1.024 kB' instead of 1 KiB (or whatever unit you are measuring)? I suppose listing 4 GiB of RAM as 4.29497 GB is a bit goofy (is that the reason? Is it just easier to use round numbers?). Beyond that, is my general understanding of these prefix standards correct?


Any clarification is appreciated, thank you for reading.



Answer



Well, we use the SI (a.k.a. decadic or Metric) terms because they are the correct terms for weights and measures (a kilogram = 1000 grams, a kilometer = 1000 meters, etc.) and because they’ve been around for a long time (the kilo- and milli- prefixes were introduced in the 1700s, and they derive from Greek and Latin, which were used 2K years ago [har har] ).  We use the binary terms when they are appropriate (in computer contexts) because they are the correct terms in those contexts.


But the binary terms were introduced less than 20 years ago and formalized less than 10 years ago.  And they are not new terms that go with new concepts (as, for example, “laser” was).  They are new terms for established concepts (for which the wrong words were being used).  Therefore, they are slow to catch on (because many people are still using the old terms incorrectly).


Which reminds me: you seem to be confused on this.  When somebody says “4 GB” when he means 4,294,967,296 bytes, he isn’t “rounding the number”, because he doesn’t mean “4.294967296 GB”.  He means “4 GiB”, and he’s using the wrong term.  Because he hasn’t learned the new terms yet, or because he doesn’t understand why the difference is important, or because he’s afraid that the people that he’s talking to will understand “GB” but not “GiB”.  The system of binary terms isn’t being ignored; it’s still in the process of being learned, accepted, and adopted.
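To make the "4 GB" versus "4 GiB" distinction concrete, here is a minimal sketch that expresses one byte count in both units. The function name `describe` is hypothetical and chosen only for this illustration.

```python
def describe(num_bytes: int) -> str:
    """Express a byte count in decimal gigabytes (GB) and binary gibibytes (GiB)."""
    gb = num_bytes / 1000**3   # SI / decimal: 1 GB = 1,000,000,000 bytes
    gib = num_bytes / 1024**3  # IEC / binary: 1 GiB = 1,073,741,824 bytes
    return f"{num_bytes:,} bytes = {gb:.6f} GB = {gib:.6f} GiB"


print(describe(4 * 1024**3))  # what "4 GiB" actually means
print(describe(4 * 1000**3))  # what "4 GB" actually means
```

The first line shows that 4,294,967,296 bytes is exactly 4 GiB but about 4.29 GB; the second shows that a true 4 GB is only about 3.73 GiB. The two labels describe different quantities, not the same quantity rounded differently.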


This is covered exhaustively in Wikipedia.  For example, in the Binary prefix article:



The computer industry has historically used the units kilobyte, megabyte, and gigabyte, and the corresponding symbols KB, MB, and GB, in at least two slightly different measurement systems.  In citations of main memory (RAM) capacity, gigabyte customarily means 1 073 741 824 bytes.  As this is the third power of 1024, and 1024 is a power of two (2^10), this usage is referred to as a binary prefix.


In most other contexts, the industry uses the multipliers kilo, mega, giga, etc., in a manner consistent with their meaning in the International System of Units (SI), namely as powers of 1000.  For example, a 500 gigabyte hard disk holds 500 000 000 000 bytes, and a 1 Gbit/s (gigabit-per-second) Ethernet connection transfers data at 1 000 000 000 bit/s.  In contrast with the binary prefix usage, this use is described as a decimal prefix, as 1000 is a power of 10 (10^3).


The use of the same unit prefixes with two different meanings has caused confusion.  Starting around 1998, the International Electrotechnical Commission (IEC) and several other standards and trade organizations addressed the ambiguity by publishing standards and recommendations for a set of binary prefixes that refer exclusively to powers of 1024.  Accordingly, the US National Institute of Standards and Technology (NIST) requires that SI prefixes only be used in the decimal sense:[1] kilobyte and megabyte denote one thousand bytes and one million bytes respectively (consistent with SI), while new terms such as kibibyte, mebibyte and gibibyte, having the symbols KiB, MiB, and GiB, denote 1024 bytes, 1 048 576 bytes, and 1 073 741 824 bytes, respectively.[2]  In 2008, the IEC prefixes were incorporated into the IEC 80000-13 standard.



[Presumably Wikipedia is using the convention of writing large decimal numbers with groups of three digits, separated by spaces, to respect the people who use . instead of , as a “thousands separator”.]


Similar paragraphs appear on other pages.  In Metric prefix:



In some fields of information technology it has been common to designate non-decimal multiples based on powers of 1024, rather than 1000, for some SI prefixes (kilo, mega, giga), contrary to the definitions in the International System of Units (SI).  This practice has been sanctioned by some industry associations, including JEDEC.  The International Electrotechnical Commission (IEC) standardized the system of binary prefixes (kibi, mebi, gibi, etc.) for this purpose.[23]



And in Kilo- :



A second definition has been in common use in some fields of computer science and information technology, which is, however, inconsistent with the SI definition.  It uses kilo as meaning 2^10 = 1024, because of the mathematical coincidence that 2^10 is approximately 10^3.  The reason for this application is that binary values natively used in computing are base 2 and not the base 10 which is used for the SI prefixes.  The NIST comments on this confusion: “Faced with this reality, the IEEE Standards Board decided that IEEE standards will use the conventional, internationally adopted, definitions of the SI prefixes”, instead of kilo for 1024.[4]
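The "mathematical coincidence" in that quote only holds approximately, and the approximation gets worse with each larger prefix. A rough sketch of the arithmetic follows; `PREFIX_PAIRS` and `relative_gap` are names invented for this illustration.

```python
# (SI prefix, binary prefix) pairs in order; position n is the shared exponent.
PREFIX_PAIRS = [("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"), ("tera", "tebi")]


def relative_gap(n: int) -> float:
    """Return the fraction by which 1024**n exceeds 1000**n."""
    return (1024**n - 1000**n) / 1000**n


# Show how far each binary prefix drifts from its SI namesake.
for n, (si, binary) in enumerate(PREFIX_PAIRS, start=1):
    print(f"{binary} is {relative_gap(n):.1%} larger than {si}")
```

The gap is 2.4% at kilo / kibi but roughly 7.4% at giga / gibi and about 10% at tera / tebi, which is why treating the two as interchangeable becomes harder to justify as capacities grow.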



More Wikipedia resources:



This has also been addressed on Super User before:



And for laughs, see this xkcd comic:
   
(but, of course, don’t take it seriously).

