1/29/2012

Mass Storage

A friend raised a good point in a comment to a Google+ post of mine: Long books weigh more, so electronic versions of those books must do the same. The question is how much.

I think we can assume that 0s add no weight, so all of the weight must be in the 1s. It stands to reason that the more 1s, the more weight, and 1-heavy text would, on the whole, weigh significantly more than text with many 0s.

A quick glance at an ASCII table for character representation is illuminating. For one thing, in a coincidence that I find too extreme to be anything but a first-order conspiracy by the Knights Templar, *all* vowels have odd representations, meaning that they have an extra 1 in the least significant bit. Meanwhile, the space character is even (0 in the least significant bit) and lower-case letters come later in the code than upper-case letters.
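
For anyone who would rather not squint at an ASCII table, the three observations above can be checked with a few lines of Python. This is only a quick sketch: the helper name `is_odd` and the variable names are mine, not part of any standard.

```python
import string

VOWELS = "AEIOUaeiou"

def is_odd(ch: str) -> bool:
    """Return True when the character's code has a 1 in the least significant bit."""
    return (ord(ch) & 1) == 1

# Every vowel, upper- or lower-case, should have an odd code.
all_vowels_odd = all(is_odd(ch) for ch in VOWELS)

# The space character (code 32) should be even.
space_is_even = not is_odd(" ")

# Each lower-case letter should come later in the code than its capital.
lower_after_upper = all(
    ord(ch.lower()) > ord(ch) for ch in string.ascii_uppercase
)

print(all_vowels_odd, space_is_even, lower_after_upper)
```

All three checks come out true, so the conspiracy stands.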

There are some simple consequences of these facts that we can derive:
  • Vowel-heavy text will obviously have more 1s than consonant-heavy text, and will thus weigh more. Languages based on consonant usage, such as some African languages and traditional written Hebrew, fare much better in general.
  • Spaces, having no 1 in the least significant bit, cost less, so the higher the proportion of spaces to overall character count, the less weighty the text. So books with short words will be lighter than those using long words.
  • Books with a higher proportion of CAPITAL letters to lower-case letters will weigh less because they simply require fewer bits, and therefore fewer 1s, to represent those letters. As with the word length, books with shorter sentences will weigh less due to the increased frequency of the capital letters that start each sentence. Also, text by accountants and angry emails weigh less in general from the overuse of caps-lock typing.
So if you just want a little light reading, pick up a children's book on accounting written in traditional Hebrew.
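
To put a candidate book on the scale, one could simply count its 1s. The sketch below assumes, as this post does, that "weight" means the number of 1 bits in the ASCII encoding; the function names `ones` and `ones_per_char` are hypothetical helpers of my own.

```python
def ones(text: str) -> int:
    """Count the 1 bits in the ASCII encoding of text."""
    return sum(bin(byte).count("1") for byte in text.encode("ascii"))

def ones_per_char(text: str) -> float:
    """Average number of 1 bits per character (the 'density' of the text)."""
    return ones(text) / len(text)

sample = "light reading"

# Compare the same words typed normally and with caps lock on.
print("lower:", ones(sample.lower()))
print("upper:", ones(sample.upper()))

# A lone space versus a lone letter.
print("space:", ones(" "), "a:", ones("a"), "A:", ones("A"))
```

Each capital letter carries exactly one fewer 1 than its lower-case twin, so caps lock sheds one bit per letter, while spaces weigh the same either way.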

Data compression is something else to consider. There is obviously a saving in the sheer number of bits involved. But the key is that compression makes data even smaller, and even values, as per my previous calculations, are lighter than odd ones, Q.E.D. I'll leave the details of this proof as an exercise for the reader.
