Friday, September 14, 2007

BARF Compression: holy grail or goose chase?

I was surfing a community on Orkut and came across a poll about BARF compression. I followed the link to a nice blog, and from there to http://cs.fit.edu/~mmahoney/compression/barf.html

It's apparently a new open-source compressor that can reduce any file by at least one byte. Applied recursively, it could in theory reduce any file, no matter how big, to 1 or even 0 bytes. I was amazed at the possibilities of this 1-byte compression (imagine 700 MB movies fitting in a byte!), so I downloaded barf.exe and got started right away. Ten minutes in, I realized the secret. I'm sorry to say this, mate, but it's a TRICK! Like any magic trick, the missing bytes have been discreetly stored somewhere else.

Let me explain how I found out:
1. Like the other guys who've tried, I made a new Notepad file, a.txt, and wrote abcdefg in it.
2. I compressed it 4 times and noticed that an extension got appended to the filename every time. When it truly compressed larger files the extension was just .x, but once the reduce-by-1-byte game begins, 3-letter extensions come in. That made me suspect there was a pattern to the so-called random extensions.
3. I renamed the compressed file (which now read only "efg" in Notepad), changing the digit in its last extension. Then I decompressed it step by step back to 7 bytes and the original a.txt filename.
4. The a.txt file that read "abcdefg" before now read "abcHefg"! How did 'd' get replaced by 'H'?
5. It's because I changed one extension midway, and THAT extension was storing the missing byte, so the original data was altered!
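The experiment above can be mimicked with a toy model in Python. This is only a sketch of the idea, not BARF's actual code: the function names are made up, and the extension encoding ('x' plus two hex digits) is an assumption chosen for readability. BARF's real byte-to-extension mapping is different.

```python
def toy_compress(name: str, data: bytes) -> tuple[str, bytes]:
    """Shrink data by one byte by hiding its first byte in a new extension."""
    if not data:
        raise ValueError("nothing left to move into the filename")
    # Hypothetical encoding: 'x' + two hex digits (not BARF's real mapping).
    return f"{name}.x{data[0]:02x}", data[1:]


def toy_decompress(name: str, data: bytes) -> tuple[str, bytes]:
    """Undo toy_compress: read the byte back out of the last extension."""
    base, ext = name.rsplit(".", 1)
    if len(ext) != 3 or ext[0] != "x":
        raise ValueError("not a toy-compressed filename")
    return base, bytes([int(ext[1:], 16)]) + data


# "Compress" a.txt four times: the data shrinks, the filename grows.
name, data = "a.txt", b"abcdefg"
for _ in range(4):
    name, data = toy_compress(name, data)
print(name, data)  # a.txt.x61.x62.x63.x64 b'efg'

# Tamper with the last extension (0x64 is 'd', 0x48 is 'H'), then decompress.
tampered, restored = name[:-2] + "48", data
for _ in range(4):
    tampered, restored = toy_decompress(tampered, restored)
print(tampered, restored)  # a.txt b'abcHefg'
```

Editing the extension changes the "decompressed" contents, which is exactly what you'd expect if the extension is where the byte lives.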

There's a quicker way to see this. If you have downloaded the BARF compressor, use these steps to expose the trick:
1. In the same folder as barf.exe, make any file a.txt and write "abcd" in it (sans quotes).
2. Now rename it to "a.txt.x9o"
3. >barf d a.txt.x9o //on the command prompt
4. Now the new "a.txt" has 1 more byte, and it reads as Japanese here on my PC.
5. Still not had enough? Do >barf c a.txt //at the prompt again
6. I'll pay you big money if the output file from this compression isn't named "a.txt.x9o" or doesn't contain your original data!

What this means: there's no bloody compression here. All it did was move the first byte of the file into a 3-letter extension (4 characters, including the dot)! This way the file itself may get smaller, but the filesystem's directory data (wherever FAT/NTFS drives keep filenames) has to carry 4 more characters. In short, you're just passing information from the file to the filename, which the OS stores somewhere else. In the end the total space occupied on the system gets bigger, and this is purely useless for transferring files over the internet. The filename will itself take more time going through than the original file!
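Here's the bookkeeping for the a.txt example as a rough Python sketch. The `footprint` helper is made up for illustration, and it simply counts name characters plus content bytes; real filesystems add their own overhead (directory entries, cluster sizes), which this ignores.

```python
def footprint(filename: str, payload_size: int) -> int:
    """Bytes for the name plus the contents, ignoring filesystem overhead."""
    return len(filename.encode("ascii")) + payload_size


before = footprint("a.txt", 5)      # 5-byte name, 5-byte file
after = footprint("a.txt.x9o", 4)   # 9-byte name, 4-byte file
print(before, after, after - before)  # 10 13 3
```

Every such pass saves 1 byte inside the file and spends 4 in the name, so the total grows by 3.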

So, sorry guys, and Taran in particular, but there's no miraculous way here to turn your 700 MB movie into a 1-byte file. If you try it, you're going to end up with a filename having (700 x 1024 x 1024 - 1) extensions in it! Your OS will commit suicide before that happens; we know it can't allow filenames to go beyond a certain length.
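To put numbers on that, here's a quick back-of-the-envelope calculation in Python. It assumes every pass adds a 4-character extension, a 255-character filename limit (the figure commonly cited for NTFS and FAT long filenames), and a made-up starting name "movie.avi".

```python
MOVIE_BYTES = 700 * 1024 * 1024   # a 700 MB movie
CHARS_PER_EXTENSION = 4           # dot + 3 characters per 1-byte pass
MAX_NAME_CHARS = 255              # assumed filename length limit
BASE_NAME = "movie.avi"           # hypothetical starting filename

passes_needed = MOVIE_BYTES - 1   # passes to get down to a 1-byte file
extension_chars = passes_needed * CHARS_PER_EXTENSION
passes_that_fit = (MAX_NAME_CHARS - len(BASE_NAME)) // CHARS_PER_EXTENSION

print(passes_needed)    # 734003199
print(extension_chars)  # 2936012796
print(passes_that_fit)  # 61
```

So under these assumptions the filename would need almost 3 billion characters, while only about 61 bytes can actually be tucked away before the name gets too long.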
BARF will initially compress a file as best it can and put a ".x" extension on it, but this is worse than other compressors. I tried it on a 9.36 KB .csv file. After 2 compressions with BARF, the file had a .csv.x.x extension and weighed 7.10 KB. But the same .csv, through WinRAR, became only 3.63 KB. Once you see the 3-letter extensions appear instead of .x, you'll know it's just fooling with you!

5 comments:

Unknown said...
This comment has been removed by the author.
Unknown said...

The guy is right. Download a simple hex editor like WinHex or HxD and create a text file. Enter complete rubbish and "compress" it over and over. When you have a random extension like .x9v, duplicate the file, then compress the original one more time for another random extension. Open both in one of the hex editors and you will see that the one with two extensions is just the one with a single extension, with the first byte removed. Through some mathematical change, the first byte is turned into two random characters which are placed in the file name.
Try checking the size on disk in the properties dialog as well. You will be surprised.
This method is pretty much as useless as taking off the first byte and putting it at the end.

Atul said...

Yes. Filesize decreases by a byte and filename increases by 4 bytes. That means you end up spending 3 extra bytes. Gain? Minus three bytes.

Simply, it stores file data in the filename!

If you want to see the fun, try renaming the compressed file and then decompress it. You'll realise it either cannot decompress or produces wrong uncompressed data. That's the trick.

Matt Mahoney said...

Yeah, I thought I explained the trick on my web page. There's no such thing as random compression or recursive compression. Did you try compressing the Calgary corpus? It's got more than one trick.

Nico Huysamen said...

You could have just looked at the source code to figure that out. He clearly states in the code (it's even commented xD) that if it reaches a point where compression is no longer possible, it takes out the first byte and replaces it with a number in the filename extension. But kudos on figuring it out!
