
Challenge: Extraction Incomplete!



Blitzkommando
Mar 27, 2008, 03:10 PM
That's the result you'll get if you try extracting all of the files from this 3,195-byte (3 KB) 7zip. Oh, and for the record, if this had been sent uncompressed, a single download would account for more traffic than the entire internet has carried over the past decade.

The file (http://blitzkommando.com/storage/100EB.7z)

A small note about the file: some browsers treat 7zip files as raw text. If it opens in a new tab/window as garbled text, you'll have to save the file to see the structure inside it.

It's just a bit of a fun game. Try getting people to extract some of the text files that the 7zip is compressing. Each one is 1 GB, and there are roughly 100 billion of them, for a total of about 95 exabytes (or 100 EB if you count in base 1000 rather than 1024). Oh, and obviously don't try opening one in Notepad (or any text editor, for that matter): you'll run out of memory very quickly and the program will crash, since none of them are built to load 1 GB of text at once.

Randomness
Mar 27, 2008, 03:42 PM
Lol... 95 EXAbytes? If I'm correct, that's... 2^30 gigabytes right there. Over a billion. Wow.

AlexCraig
Mar 27, 2008, 04:04 PM
Get with terabytes, yo.

Wait... is a terabyte > an exabyte?

[ This Message was edited by: AlexCraig on 2008-03-27 14:12 ]

Weeaboolits
Mar 27, 2008, 04:18 PM
No.

1000^1 k kilo
1000^2 M mega
1000^3 G giga
1000^4 T tera
1000^5 P peta
1000^6 E exa
1000^7 Z zetta
1000^8 Y yotta

[ This Message was edited by: Ronin_Cooper on 2008-03-27 14:19 ]

Randomness
Mar 27, 2008, 04:19 PM
On the subject of compression... if it's possible to take a large amount of data and make it a small amount in a reversible process, why don't we have systems that never need to decompress it? Why not just build a processor that works on compressed data directly and make hard drive size limits a thing of the past, like the abacus?

Blitzkommando
Mar 27, 2008, 04:21 PM
On 2008-03-27 13:42, Randomness wrote:
Lol... 95 EXAbytes? If I'm correct, that's... 2^30 gigabytes right there. Over a billion. Wow.


Try about 100 billion gigabytes.

And no, it goes like this:

Gigabyte
Terabyte
Petabyte
Exabyte

Where each one is either 1024 times or 1000 times the previous, depending on how you are counting. (Though to be exact, it's 1024.)


On the subject of compression... if it's possible to take a large amount of data and make it a small amount in a reversible process, why don't we have systems that never need to decompress it? Why not just build a processor that works on compressed data directly and make hard drive size limits a thing of the past, like the abacus?

I used specially crafted files to do it. Compression algorithms look for identical data and store the duplicates only once. The 1 GB text file I started with was essentially 1 GB worth of 0's (or 1's, or any single repeated value). That's why I could keep duplicating it over and over and the only overhead was keeping track of the number of duplicates. So, essentially, it has no practical value.
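A rough Python sketch of the effect being described (illustrative only: it scales the file down to 10 MB and uses zlib rather than 7zip, but the principle is the same):

```python
import zlib

# Stand-in for one of the text files: 10 MiB of the single character '0'
# (scaled down from 1 GB so it runs quickly and fits comfortably in memory).
data = b"0" * (10 * 1024 * 1024)

compressed = zlib.compress(data, 9)

print(f"original:   {len(data):>12,} bytes")
print(f"compressed: {len(compressed):>12,} bytes")
print(f"ratio:      {len(data) / len(compressed):,.0f} : 1")
```

On data this uniform the compressor only has to record "repeat this byte N times," which is why piling more and more copies into the archive barely grows it.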

Also, you're thinking about it a bit backwards. Compressed data still needs its full size in storage once it is decompressed, which is why I said you can't have the whole thing extracted at once: there's more data in there than can be stored on all of the world's computers combined. Compression also inherently means you have to decompress the data before you can use it, which costs extra time and processing, making it inefficient. And many common formats rely on lossy compression, which throws data (and thus quality) away. MP3, JPEG, WMA, WMV, ATRAC, etc. all discard detail you're unlikely to notice, up to a point. But if you keep re-encoding a file it becomes so horribly mangled, and is missing so much data, that it is no longer even worth using.

I'll try to give an example that's easy to see. MP3s are encoded using various methods and various bit rates (bit rate = the number of bits (8 bits to a byte) used for each second of audio). The lower the bit rate, the higher the compression, and the lower the quality and fidelity to the original audio. Hence 320 kbps (kilobits per second) is much higher quality and much closer to the original sound than the 128 kbps you often see for download. A close listener can easily tell different compression settings apart and hear where the sound has become distorted.
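The bitrate arithmetic is easy to check. A quick sketch, assuming a hypothetical four-minute track at a constant bitrate and ignoring tags and container overhead:

```python
def mp3_size_mib(bitrate_kbps: int, seconds: int) -> float:
    """Approximate size of a constant-bitrate MP3 stream, ignoring headers/tags."""
    bits = bitrate_kbps * 1000 * seconds   # kbps -> total bits of audio
    return bits / 8 / (1024 * 1024)        # bits -> bytes -> MiB

track_seconds = 4 * 60  # hypothetical 4-minute song
for rate in (128, 192, 320):
    print(f"{rate:>3} kbps -> ~{mp3_size_mib(rate, track_seconds):.1f} MB")
```

A 320 kbps rip comes out roughly two and a half times the size of a 128 kbps one; those extra bits are what preserve the detail a close listener can hear.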

You probably don't remember it, but way back in the days of DOS and the versions of Windows before Windows 2000, you could compress an entire hard drive through the operating system. This would make a 100 MB hard drive 'appear' to have a capacity of 250 MB. The problem was that it took even longer to read the data, and it obviously required extra work from the processor to decompress it. Once drives started shipping in larger capacities (into the gigabyte range), it became extremely impractical. You can still compress folders in XP and Vista, but, again, it slows the system down because of the extra work. And, like I said, some files simply don't compress well because the data in them is too random to squeeze anything out: JPEG images are already compressed, so they sometimes INCREASE in size when they're put into an archive file like a zip.
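And the flip side of the zero-filled example: data with no redundancy (random bytes, which is roughly what an already-compressed JPEG looks like to an archiver) doesn't shrink and can even come out slightly larger. A minimal sketch:

```python
import os
import zlib

random_data = os.urandom(1024 * 1024)            # 1 MiB of random bytes: nothing repeats
compressed = zlib.compress(random_data, 9)

print(f"original:   {len(random_data):,} bytes")
print(f"compressed: {len(compressed):,} bytes")  # comes out slightly *larger* than the input
```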

tl;dr Compression works for archiving but shouldn't be used for primary storage due to limitations, loss of quality, and lots of other issues.

[ This Message was edited by: Blitzkommando on 2008-03-27 14:38 ]

AlexCraig
Mar 27, 2008, 04:23 PM
O_O My gods above! Yottabytes sound like fun in a basket!

Randomness
Mar 27, 2008, 04:23 PM
On 2008-03-27 14:21, Blitzkommando wrote:

On 2008-03-27 13:42, Randomness wrote:
Lol... 95 EXAbytes? If I'm correct, that's... 2^30 gigabytes right there. Over a billion. Wow.


Try about 100 billion gigabytes.

And no, it goes like this:

Gigabyte
Terabyte
Petabyte
Exabyte

Where each one is either 1024 times or 1000 times the previous, depending on how you are counting. (Though to be exact, it's 1024.)



Yes. Standard SI prefixes for increasing size are Kilo, Mega, Giga, Tera, Peta, Exa, Yotta (Go Wikipedia for that last one). Normally each one is another 3 powers of 10, or 1000, but due to the binary nature of computers, the tenth power of two, or 1024, is used instead. Not that using 1000 will put you too far off.
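How far the base-1000 shorthand drifts from base 1024 grows with every prefix; a quick sketch of the gap at each step:

```python
prefixes = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

for n, name in enumerate(prefixes, start=1):
    decimal = 1000 ** n                      # what the SI prefix means
    binary = 1024 ** n                       # what the OS actually counts
    gap = (binary - decimal) / decimal * 100
    print(f"{name:>5}: binary unit is {gap:4.1f}% bigger than the decimal one")
```

It's about 2.4% at kilobytes but already over 15% by the time you reach exabytes, so the shorthand gets steadily less harmless as the units grow.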

Randomness
Mar 27, 2008, 04:28 PM
On 2008-03-27 14:23, AlexCraig wrote:
O_O My gods above! Yottabytes sound like fun in a basket!



There's another one that starts with a Z, but I can't remember it.

AlexCraig
Mar 27, 2008, 04:30 PM
I think Ronin said Zetta. Imagine all the stuff you can put on just 1 of those!

Blitzkommando
Mar 27, 2008, 04:47 PM
On 2008-03-27 14:23, Randomness wrote:
Not that using 1000 will put you too far off.


By using base 1000 I could claim my archive is 100 exabytes. But in actuality it is only 95 exabytes. That's a pretty big difference.

That's the same thing that got drive manufacturers sued: claiming that 1000 is a close enough approximation to 1024. That was fine back when drives were tiny, but once "100 GB" drives came out, defining 1 GB as 1 billion bytes meant the drive reads as 'only' 93.13 GB. Up it to a 1 TB drive and it's only 931.3 GB, a 'loss' of 68.7 GB, which is certainly nothing to sneeze at. The higher the claimed capacity, the greater the difference between the two systems. That's why the boxes now say exactly what I said above, "1 GB = 1 billion bytes," when a real gigabyte is 1,073,741,824 bytes.
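The drive figures in that post are easy to verify; a minimal sketch of the marketed-versus-reported conversion:

```python
GIB = 1024 ** 3                  # bytes in the gigabyte your OS reports

marketed_100gb = 100 * 10**9     # "100 GB" as printed on the box
marketed_1tb = 10**12            # "1 TB" as printed on the box

print(f"100 GB drive shows up as {marketed_100gb / GIB:.2f} GB")       # ~93.13
print(f"  1 TB drive shows up as {marketed_1tb / GIB:.1f} GB")         # ~931.3
print(f"  'missing' space:       {1000 - marketed_1tb / GIB:.1f} GB")  # ~68.7
```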

AlexCraig
Mar 27, 2008, 04:55 PM
Can we have an actual scale on here for the ladder of Bytitude? You know,
XXXXbytes = 1 GB
XXXXbytes = 1 TB
etc.

Randomness
Mar 27, 2008, 06:05 PM
1 byte = 8 bits
1 kilobyte = 1024 bytes
1 megabyte = 1024 kilobytes
...
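For the full ladder AlexCraig asked for, a short sketch that prints every rung in the 1024-based scheme used above:

```python
units = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]

value = 1
for unit in units:
    value *= 1024
    print(f"1 {unit} = {value:,} bytes")
```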

astuarlen
Mar 27, 2008, 09:33 PM
This reminds me of a recent XKCD strip:
http://imgs.xkcd.com/comics/kilobyte.png