Useless fuss about ZIP and RAR

by Salvatore Iovene on 24 February 2007 — Posted in Articles

There has been some fuss generated by Jeff Atwood (who is a Windows developer, which is bad, and a Visual Basic one, which is worse), who seems, in my humble opinion, to be spreading partial information, closed as he appears to be in his Windows-only world. In a recent article, Jeff makes a basic comparison between the ZIP and RAR compression systems. Unfortunately, most Windows people are completely unaware that there’s something much better out there, which has been floating around the *nix world for quite a long time now. I’m talking about the powerful combination of tar and bzip2.

Let’s get to the facts right away. To experiment, I used a directory containing the source code of the Linux kernel, and then built that kernel, so that the directory would be fairly big and would contain both text files and binary files.

Here’s the size of the original directory:

$ du -sh /usr/src/linux-2.6.18.3
539M    linux-2.6.18.3

This is what happens with ZIP:

$ time zip -r ~/linux /usr/src/linux-2.6.18.3
...
real    2m35.917s
user    0m32.486s
sys     0m6.024s

$ ls -gGh ~/linux.zip
-rw-r--r-- 1 141M 2007-02-24 01:04 /home/siovene/linux.zip

Fine, 141 MB in 2 minutes and 35 seconds. Let’s try RAR:

$ time ./rar_static a ~/linux.rar /usr/src/linux-2.6.18.3
...
real    5m8.715s
user    2m14.012s
sys     0m12.473s

$ ls -gGh ~/linux.rar -lh
-rw-r--r-- 1 132M 2007-02-24 01:26 /home/siovene/linux.rar

Ouch! Twice the time for only slightly better compression! Let’s try TAR and BZIP2:

$ time tar cv linux-2.6.18.3 | bzip2 > ~/linux.tar.bz2
...
real    4m22.265s
user    2m38.134s
sys     0m5.608s

$ ls -gGh ~/linux.tar.bz2
-rw-r--r-- 1 90M 2007-02-24 01:09 /home/siovene/linux.tar.bz2

Not much faster than RAR (and that is with two programs communicating through a pipe, which adds some overhead), but much more efficient! The compressed file is only 90 MB, down from 539 MB uncompressed.
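As a side note on the pipe: the following is a minimal sketch, not part of the measurements, that packs a small throwaway directory both ways. The directory and file names are made up for illustration, and it assumes a tar that understands the -j option (GNU tar and bsdtar do) plus a bzip2 binary on the PATH.

```shell
#!/bin/sh
# Build a small throwaway tree so the sketch does not need a kernel checkout.
# (workdir, sample, a.txt and b.txt are hypothetical names.)
workdir=$(mktemp -d)
mkdir "$workdir/sample"
printf 'hello\n' > "$workdir/sample/a.txt"
printf 'world\n' > "$workdir/sample/b.txt"

# Two processes joined by a pipe, as in the measurement above.
tar -C "$workdir" -cf - sample | bzip2 > "$workdir/piped.tar.bz2"

# A single tar invocation: -j asks tar to apply bzip2 compression itself.
tar -C "$workdir" -cjf "$workdir/direct.tar.bz2" sample

# List both archives to confirm they hold the same entries.
piped_list=$(tar -tjf "$workdir/piped.tar.bz2")
direct_list=$(tar -tjf "$workdir/direct.tar.bz2")
echo "$piped_list"
```

Whether the single invocation is measurably faster than the pipe is something I have not timed here.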

Let’s summarize the data:


Method    Time    Size
zip       2m35s   141 MB
rar       5m08s   132 MB
tar.bz2   4m22s   90 MB
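To put those sizes in perspective, here is a small sketch that turns them into a percentage of the original 539 MB. The numbers are copied from the measurements above; the use of awk and the variable names are just my choice for the illustration.

```shell
#!/bin/sh
# Sizes in MB, copied from the measurements above.
original=539

# For each method, print the compressed size as a percentage of the original.
ratios=$(
  printf '%s\n' 'zip 141' 'rar 132' 'tar.bz2 90' |
  awk -v orig="$original" '{ printf "%s %.1f%%\n", $1, 100 * $2 / orig }'
)
echo "$ratios"
# prints:
# zip 26.2%
# rar 24.5%
# tar.bz2 16.7%
```

In other words, RAR shaves less than two percentage points off ZIP, while tar plus bzip2 shaves almost ten.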

In conclusion, you should use the best tools, independently of their popularity, and remember that there is so much more than what you can see from your Windows-user perspective.


Get new articles via email

Related posts:

Architecture of patching semantic versus logical content

by Salvatore Iovene on 19 February 2007 — Posted in Versioning, Articles

Inspired by a certain patch that hit a darcs repository to which I contribute, I would like to talk about one thing that developers often don’t seem to get when using revision control systems: the structure of your files in the repository should have nothing to do with the logical units that make up your patches, or with the comments on the patches themselves.

Yesterday, I saw this patch hit the repository: “Adding Cloth.h to the repo”. The patch was adding an empty file, named Cloth.h. What’s wrong with this? A couple of things:

  1. The patch adds no unit of logical value to the repository, but merely technical information, i.e. information about the content of the repository itself. That is entirely redundant, since you can retrieve it separately (and more properly), in a way that of course depends on the revision control system you are using. Furthermore, the fact that the file was added would have been recorded and obvious anyway, without dedicating a whole patch to it.
  2. The comment (“Adding Cloth.h to the repo”), once again, makes no logical sense on its own, as it adds information that was already available through the revision control system’s tools.

What is a better way to do that? A patch named “Preliminary support for clothes”, which would add the file Cloth.h with its content, even if not yet functional, makes perfect sense. It means that you’re adding some logical value to the repository, and the value that you’re adding has nothing to do with the way that value is represented (the file Cloth.h), or with the fact that it’s being added to a repository.

In other words, the form and content of patches should not only represent single units of implicit logical value, as discussed earlier, but should also have no awareness whatsoever of being part of a revision control system, of being uploaded to repositories, of containing files, or even of being patches at all!

Read more versioning tips here.



On the day I go to work for Microsoft

by Salvatore Iovene on 18 February 2007 — Posted in Quotes

“On the day I go to work for Microsoft, faint oinking sounds will be heard from far overhead, the moon will not merely turn blue but develop polkadots, and hell will freeze over so solid the brimstone will go superconductive.”

by Eric S. Raymond



Please stop talking about iPhone clones

by Salvatore Iovene on 13 February 2007 — Posted in Opinions, Articles

Basically every other day I read some website going nuts about some iPhone clone. Just a few minutes ago I read about the nVidia GoForce 6100, and googling for “iPhone clone” really confirms the fuss. Well, the truth is that the iPhone is actually the one coming late, possibly the cloner rather than the cloned.

I’m not working at Apple Inc., but, being a software developer who has also worked for a while on the development of a high-tech device, I know, as many others do, that the process leading to the birth of a complicated gadget like the iPhone takes years.

Digg.com has gone completely crazy about the subject. A Google search for the subject on Digg returns 5520 results.

Given that Apple Inc. will release the iPhone only next fall, and that many of the devices being dismissed as mere “clones” (quite disrespectfully, IMHO) are already out there, how can you people abuse the term so much? I’m assuming that nothing really serious leaked from Apple, so the competitors didn’t just rush to make their own touch screen phones. The truth is that the technology and the market became ready at the same time. Some people missed the opportunity and couldn’t even accept it; others saw the chance and went for it.

I hope that the abuse of the word “clone” will stop, and that, from now on, everybody will just talk about “another touch screen phone”.

Here’s a photo-list of alleged “iPhone clones” so far.

Samsung’s Ultra Smart F700
Asus Aura
Meizu M8
LG KE850



Benefits of a weekend offline

by Salvatore Iovene on 13 February 2007 — Posted in Personal

Last weekend I happened to be completely off the network. We somehow managed to forget to pay a couple of DSL bills (now set up to be paid automatically), and the ISP “cut the wires”. Staying off the Internet provided loads of benefits. Let’s see.

  1. More time with the loved one. Of course I managed to spend more time with my loved one, cook together, watch a movie, just sit watching the telly and talking.
  2. Health benefits. I felt overall more relaxed, and my eyes felt much better. Especially because the previous 2 weeks had been very intense, so my left eye had some red in it.
  3. More time to read. With no Internet, I spent a lot of time reading books that were waiting to be finished.
  4. More productivity. This sounds crazy, but I did some coding for a project of mine that didn’t really need network access (no need to check APIs online, no need to talk with a teammate, etc.), and without distractions I was able to do more coding than in the previous week.
  5. More time outside. Not having anything better to do, going to the shop to fetch food didn’t really feel like wasted time.
  6. More TV. Oh wait, this is not a benefit at all, is it?
  7. Less IRC. When I could finally reconnect to the GNU Screen session holding my IRC client on my office computer, I found all the old messages. And found out that I had missed the chance to buy tickets for a gig I wanted to go to.
  8. Less Usenet. Do you think I caught up with all the threads? Of course I had to go for “mark all as read” and forget it.
  9. Less RSS. My Google Reader had the infamous 100+ count of news… but I didn’t give up and managed to go through all of them.

What did I find out?
Well, nothing new, really. I already knew I was addicted to IRC, RSS and Usenet! But of course this break reminded me that I can do perfectly fine without them.

Conclusion
I will probably spend more weekends off the net. Not.



Please drop SVN

by Salvatore Iovene on 8 February 2007 — Posted in Opinions, Versioning, Articles

SVN might be stable, it might be mature, it might be successful, and it might be the winning source control system of the moment. There’s always a big risk of becoming unpopular when criticizing something that actually did find its way to success, but I have to say that SVN feels terribly antiquated sometimes.

I have already given a brief introduction to the Darcs source control system, and I would like here to talk about a very strong point it’s got against SVN.

Just yesterday, at work, I needed to commit some modifications to SVN. As I examined the diff of my local copy with:

svn diff

I realized that one of the files also contained some other modifications that I didn’t want to commit. After using Darcs for several months, I was suddenly hit by the shocking truth: SVN doesn’t let you build a patch interactively from only some of the changes in your working copy, the individual pieces that Darcs calls hunks.

What do you do in that case? Leaving aside the people who abuse the Save as… function of their editor by saving multiple copies of the same file, one for each logical patch (which I find absolutely horrible), the quickest way I could find was to:

  1. Make a diff: svn diff > logical_patch_1.diff
  2. Edit the diff manually, splitting it until I had two files (logical_patch_1.diff and logical_patch_2.diff), each representing one logical change
  3. Revert to the pristine copy: svn -R revert .
  4. Apply the first diff: patch -p0 < logical_patch_1.diff
  5. Commit: svn commit
  6. Apply the second diff: patch -p0 < logical_patch_2.diff
  7. Commit: svn commit
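The apply-one-diff-at-a-time part of that procedure can be tried without any repository at all. The following is only a sketch: the file names and contents are invented, the two diffs are generated with diff -u instead of being split by hand, and the svn commit steps are left as comments. It assumes the diff and patch utilities are installed.

```shell
#!/bin/sh
# Throwaway directory standing in for an SVN working copy (no svn needed here).
work=$(mktemp -d)
cd "$work" || exit 1

# Pristine file: 20 numbered lines.
awk 'BEGIN { for (i = 1; i <= 20; i++) print "line " i }' > pristine.txt

# Two unrelated edits, far enough apart to be independent of each other.
sed '2s/.*/fix for bug 1234/' pristine.txt > first.txt
sed '19s/.*/fix for bug 5555/' pristine.txt > second.txt

# These two diffs play the role of the hand-split logical_patch_1/2.diff.
diff -u pristine.txt first.txt > logical_patch_1.diff
diff -u pristine.txt second.txt > logical_patch_2.diff

# Start from the pristine copy and apply the logical patches one by one.
cp pristine.txt file.txt
patch file.txt < logical_patch_1.diff   # here you would run: svn commit
patch file.txt < logical_patch_2.diff   # and here again: svn commit

# Show the two lines touched by the patches.
result=$(sed -n '2p;19p' file.txt)
echo "$result"
```

The tedious part, splitting one diff into two by hand, is exactly what this sketch skips, and exactly what Darcs does for you.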

With Darcs, all you have to do is issue the darcs record command (which records your changes):

  1. Record: darcs record -m "First logical patch (fixes bug 1234)"
  2. Answer “yes” to the first hunk, and “no” to the second.
  3. Record again: darcs record -m "Second logical patch (fixes bug 5555)"
  4. Answer “yes” to the only hunk

Can you see the difference? It’s not just about the number of operations needed, but the quality of them, and the fact that Darcs is perfectly oriented to this kind of flexibility. Please consider switching to Darcs for your projects and work, as it’s a mature and better system.



ACE: The Adaptive Communication Environment

by Salvatore Iovene on 3 February 2007 — Posted in Articles

For some time now, in recent months, I have been using the ACE library for both work and leisure. ACE is a very powerful, useful and portable framework, oriented toward networking, but usable for abstracting nearly any system-dependent task. In my case, I’m using it for two client-server projects, one of which is the MMORPG I’m working on.

ACE has had, for me, a pretty nasty learning curve: at first there are some good tutorials, and plenty of examples and tests in the installation directory, but after a while you are probably going to need to purchase the books to master it. I really can’t blame the author(s) for that, as ACE is an impressive (I mean it) piece of work, and deserves some revenue.

Using ACE for my projects has turned out to be incredibly useful: I do all my development on GNU/Linux machines (Debian Sarge at work, and Debian Unstable at home), but the code I write needs to be ported to the Win32 platform as well. I’m no Windows programmer, and no Windows user, and there are other people in my company who take care of integration with Win32 systems. After about two months of work, my client/server project started to become usable, and we decided to port it to Win32. To some extent, we were expecting trouble in porting an application that had been developed for two months and was made of roughly 10 thousand lines of code. The project is a server process plus a dynamic library communicating with it, which a UI client can use. Well, the port to Win32 took no more than half an hour, and just a few changes needed to be made. Of course, during development I took care not to use any system-dependent code, but the fact that it took so little to port was simply amazing.

The ACE library has provided me with lots of platform-independent things: a way to manage sockets and TCP connections, a way to manage loading of external programs, a way to manage threads (and related mutexes or locks), a way to manage logging, a way to manage tracing, and many more.

You can check this address for some good ACE tutorials, especially regarding client/server communication. In my case, I’ve gone for an approach in which each client connection is handled in a dedicated thread. Thanks to ACE, this has been very easy and controllable. By controllable, I mean that I’m quite sure that the code I’ve produced in that regard is practically bug free. ACE helps you very well in taking care of all the errors and reacting accordingly. Due to the vast number of platforms ACE supports, it lacks exception handling, which can be considered a bad point, although a necessary one. To some extent, though, ACE can support exception handling, even if, for portability and integrity reasons, it’s advisable to leave it alone and use the classic return value checking approach instead. Nothing prevents you, anyway, from creating your own exception handling layer on top of your classes that manage ACE.

ACE is really strongly object oriented, which makes it perfectly suitable for large (but well engineered) projects. Needless to say, ACE is not advisable for very simple projects, unless you just want to take advantage of the system abstraction it provides. For larger projects, instead, you’d better be very careful and plan in advance. If you don’t know the system very well, you might end up making some wrong choices and wasting time. In this regard, I advise reading the books.

ACE is also very useful when it comes to logging, as it provides some really simple but powerful macros that can be used in debug mode, and that will produce no code at all if disabled. You can check this website for an introduction to the ACE logging facilities.

I will end this short article with a list of pros and cons of ACE, as I’ve found them in my experience.

Pros

  • Very portable.
  • Very powerful.
  • Good initial learning curve.
  • Huge list of features.
  • Many examples.
  • Great mailing list support (even though they tell you to read the books a bit too often).

Cons

  • The API is not very well documented.
  • You need to purchase the books to master it.
  • No free binary releases.

What I advise it for
Any large project that needs to manage networking and multithreading.

