Architecture of patching semantic versus logical content

by Salvatore Iovene on 19 February 2007 — Posted in Versioning, Articles

Inspired by a certain patch that hit a darcs repository to which I concur, I would like to talk about one thing that developers don’t seem to get very often, when using revision control systems: the structure of your files in the repository should have nothing to do with the logical units that make your patches, or with the comment of your patches themselves.

Yesterday, I saw this patch hit the repository: “Adding Cloth.h to the repo“. The patch was adding an empty file, named Cloth.h. What’s wrong with this? A couple of things:

  1. The patch adds no logical value unit to the repository, but merely a technical value, i.e. an information about the content of the repository itself, which is, then, absolutely redundant, as you could retrieve that information in a separate (and more proper way), which of course depends on the revision control system you are using. Indeed it was just a technical information. Furthermore, the fact that the file was added, would have been there and obvious also without having to dedicate a single patch to it.
  2. The comment (”Adding Cloth.h to the repo“), once again, doesn’t make any logical sense of its own, as adds an information that was already available using the revision control system tools.

What is a better way to do that? A patch named “Preliminary support to clothes“, which would add the file Cloth.h with its content, even if not yet functional, makes perfect sense. It means that you’re adding some logical value to the repository, and the value that you’re adding has nothing to do with the way that value is represented (the file Cloth.h), or that it’s being actually added to a repository.

In other words, the form and content of patches should not only represent single units of implicit logical value, as discussed earlier, but should have no awareness whatsoever of being part of a revision control system, or being uploaded to repositories, contains file, or even being patches at all!

Read more versioning tips here.


Get new articles via email

Related posts:

Please drop SVN

by Salvatore Iovene on 8 February 2007 — Posted in Opinions, Versioning, Articles

SVN might be stable, it might be mature, it might be successful, and it might be the winning source control system of the moment. There’s always a big risk of resulting unpopular, when criticizing something that actually did find its way to success, but I have to say that SVN sounds terribly antique sometimes.

I have already given a brief introduction to the Darcs source control system, and I would like here to talk about a very strong point it’s got against SVN.

Just yesterday, at work, I needed to commit certain modification to SVN. As I examined the diff of my local copy with:

svn diff

I realized that one of the file also contained some other modifications that I didn’t want to commit. After using Darcs for several months, I was suddenly hit by the shocking truth: SVN doesn’t allow interactive and partial patches, which Darcs names hunks.

What do you do in that case? Provided that there are people who actually abuse the Save as… function of their editor by saving multiple copies of the same file according to the logical patch they contain (which I find absolutely horrible), the quickest way I could find was to:

  1. Making a diff: svn diff > logical_patch_1.diff
  2. Edit the diff manually, until I had two files, which represented the two logical diffs
  3. Revert the pristine: svn -R revert .
  4. Apply the first diff: patch -p0 < logical_patch_1.diff
  5. Commit: svn commit
  6. Apply the second diff: patch -p0 < logical_patch_2.diff
  7. Commit: svn commit

With Darcs, all you have to do is issue the darcs record command (which records your changes):

  1. Record: darcs record -m "First logical patch (fixes bug 1234)"
  2. Answer “yes” to the first hunk, and “no” to the second.
  3. Record again: darcs record -m "Second logical patch (fixes bug 5555)"
  4. Answer “yes” to the only hunk

Can you see the difference? It’s not just about the number of operations needed, but the quality of them, and the fact that Darcs is perfectly oriented to this kind of flexibility. Please consider switching to Darcs for your projects and work, as it’s a mature and better system.


Get new articles via email

Related posts:

Darcs - The source code management system of the future?

by Salvatore Iovene on 29 December 2006 — Posted in Versioning, Articles

Having already mentioned some good practices for source code versioning and how important versioning is, in any case, I would like now to review and comment about what I find the best source code management system out there: darcs.

Darcs is a source control system written in Haskell (a functional language), and feature very solid mathematics bases, being completely engeneered on top of a “patch theory”. Not only darcs is straightforward and very easy to you, not only it’s very interactive and minimizes the chances of mistakes, but it also gives out features that the popular SVN doesn’t have. Here I’m going to show some use cases, and show how things are easier with darcs.

A brief start up

Before analyzing the key features, let’s have a brief start-up quick tutorial. The easiest way to get darcs, is to download a binary package. These packages contain a precompiled release of darcs, with everything needed statically linked inside. You only need o copy that somewhere in your $PATH, such as /usr/bin, /usr/local/bin, or whatever you have in your $PATH. Of course you can download the source code and build it yourself if you want.

Let’s now create a simple Hello World project, and use darcs to version it.

$ mkdir $HOME/projects/HelloWorld
$ cd !$
$ darcs init

darcs init will create all the files necessary to source-control the code. You will find a new directory named _darcs.

Now we can write our HelloWorld.cc main file:

#include <iostream>

using namespace std;

int main(void) {
cout << "Hello World!" << endl;
}

Time to add the file to version control.

$ darcs add HelloWorld.cc

Alright, now we really get into darcs. First of all, in case you didn’t notice, darcs doesn’t really need any server at the other end, like SVN would need an SVN server, or CVS would need a CVS server. This means no hassle in installing and configuring a server. Later we will see how darcs manages collaboration with remote users.

Now it’s time to save our changes to the repository.

$ darcs record
Darcs needs to know what name (conventionally an email
address) to use as the patch author, e.g. 'Fred Bloggs
<fred@bloggs.invalid>'.  If you provide one now it will
be stored in the file '_darcs/prefs/author' and used as a
default in the future.  To change your preferred author
address, simply delete or edit this file.

What is your email address?
Salvatore Iovene <salvatore@invalid.com>
addfile ./HelloWorld.cc
Shall I record this change?(1/?)[ynWsfqadjkc], or ? for help: y
hunk ./HelloWorld.cc 1
+#include
+
+using namespace std;
+
+int main(void) {
+	cout << "Hello World!" << endl;
+	return 0;
+}
+
Shall I record this change?(2/?)[ynWsfqadjkc], or ? for help: y
What is the patch name? First record.
Do you want to add a long comment? [yn] n
Finished recording patch 'First record.'

Some points worth inspection here:

  • Why did darcs what to know my email address? That’s because everything you commit (they are named patches) will be known as coming from you. If you’re working with several people, darcs has to know who is committing what. Furthermore, people downloading your repository can, e.g. make some changes and improvements, and then issue a darcs send which will send you the patch via email, and you can evaluate it and decide if apply it.
  • What is a hunk? A hunkis a piece of a patch, i.e. a certain modification in some source file. If you have a large file, foo.c, and modify a certain function bar() at the beginning of the file, and then a certain other function tar() at the end of the file, this will result in two hunks. What’s the advantage of all this? Since darcs is so interactive, you may decide to either apply both hunks in the same pathc, so answer ‘y’ to both, or realize that they logically belong to two different patches, so you will say ‘y’ to one of them, and ‘n’ to the other. Then, after finishing recording the first patch, you issue a darcs record again, and record the other hunk in a separate patch, with a separate name, that forms a logical unit per se.

Now let’s make a small change.

#include <iostream>

int main(void) {
std::cout << "Hello World!" << std::endl;
}

As you can see, we have removed the using namespace std; declaration, and added the std:: namespace prefix to cout and endl. A very important darcs command is whatsnew, that shows us how the code differs from the repository.

$ darcs whatsnew
{
hunk ./HelloWorld.cc 3
-using namespace std;
-
hunk ./HelloWorld.cc 4
-	cout << "Hello World!" << endl;
+	std::cout << "Hello World!" << std::endl;
}

There are two hunks, as expected. Let’s record the changes.

$ darcs record
hunk ./HelloWorld.cc 3
-using namespace std;
-
Shall I record this change?(1/?)[ynWsfqadjkc], or ? for help: y
hunk ./HelloWorld.cc 4
-	cout << "Hello World!" << endl;
+	std::cout << "Hello World!" << std::endl;
Shall I record this change?(2/?)[ynWsfqadjkc], or ? for help: y
What is the patch name? Removing the std namespace 
declaration.
Do you want to add a long comment? [yn]n
Finished recording patch 'Removing the std namespace
declaration.'

Obviously those two hunks must form one single patch, because we don’t want any patch to leave the repository in a broken state. Now we get to the cool stuff. Darcs lets you unrecord your changes, i.e. interactively rollout the patches until you are satisfied. We might change our mind about the last patch, and think that using namespace std; is not tha bad after all. No problem.

$ darcs unrecord

Fri Dec 29 12:53:32 EET 2006
Salvatore Iovene <salvatore@invalid.com>
* Removing the std namespace declaration.
Shall I unrecord this patch?(1/2)[ynWvpxqadjk], or ? for help: y

Fri Dec 29 12:37:33 EET 2006
Salvatore Iovene <salvatore@invalid.com>
* First record.
Shall I unrecord this patch?(2/2)[ynWvpxqadjk], or ? for help: n
Finished unrecording

Now there we are again, back as if nothing happened.

Imagine you want to have a copy of your repository, maybe on a different partition of your disk, or maybe on a USB storage drive:

$ cd ..
$ mkdir RepoCopy
$ cd RepoCopy/
$ darcs init
$ darcs pull ../HelloWorld/

Fri Dec 29 12:37:33 EET 2006
Salvatore Iovene <salvatore@invalid.com>
* First record.
Shall I pull this patch?(1/1)[ynWvpxqadjk], or ? for help: y
Finished pulling and applying.

Another directory is not the only way you can move your repository around, you can use SSH to copy it to another machine, and HTTP to fetch it. This is actually the way you handle collaboration. Imagine you have a server somewhere, named www.server.com, and there you want to have your central repository, with which you can collaborate with your development peers.

$ darcs push 
username@www.server.com:/var/www/htdocs/HelloWorld/repo

This will ask you which patches you want to push to that server, one by one, in the usual darcs interactive mode. I’m assuming that the directory /var/www/htdocs/HelloWorld/ on the server, hosts the http://www.server.com/HelloWorld/ website. Everybody can now get a copy of your project just by doing this:

$ darcs get http://www.server.com/HelloWorld/repo

And anybody with an account on that server, will be able to push patches, if they of course have write permission to the directory where the repository is.

Where to go from here

Here follow some must-read links if you’re interested in darcs. Probably in the future I will write more about it. Thanks for reading.


Get new articles via email

Related posts:

5 SVN best practices

by Salvatore Iovene on 15 December 2006 — Posted in Versioning, Articles

Versioning systems like CVS, SVN or Darcs are very important tools, that no serious programmers can omit to use. If you started a project without using any versioning tools, I really recommend that you start using one immediately; but I’m not discussing this right now.

I would like to point your attention to some best practices that I recommend when working in a team.

  1. Don’t use versioning like it were a backup tool.

    I’ve heard this question too often: “Have you put your code safely on SVN?”. That’s a bad question. Storing code to an SVN server is not meant for safety, i.e. for fear of losing it. You are talking about something else, and that’s called backup. Take Darcs, a not so popular versioning system. It can start without a server, and you can just run it locally on your machine without launching any daemon whatsoever. A faulty hard drive can still make you lose all your work, of course. That’s why you have to do backups, of course, but they don’t have anything to do with versioning. Hence, committing to the repository once a day, before taking off home, e.g., is not an acceptable practice, especially if you work in a team. Doing that would be like making a daily backup. An SVN commit, instead, has to have a meaning of some sort, not just “Ok, let’s store to the SVN server the work of today”. Moreover, sometimes, if the schedule is tough and the cooperation is tight, you need to commit very often so your peer will keep up with you all the time, and not just find out, at evening, that he’s got dozens conflicts after checking out your code.

  2. Commit as soon as your changes makes a logical unit.

    How often should you commit? Theres no such thing as committing too often, or too rarely. You should commit each time your changes represent a logical unit, i.e. something that makes sense. Usually that happens because you’re following a plan, when coding (because you are, aren’t you?). So, you find out a bug in the trunk, plan a strategy about how to fix it, fix it, and then commit. This makes sense because that’s a commit that fixes a bug. So that revision X is buggy, while revision X+1 is not. Don’t be shy about committing too often. Should you just find an insignificant typo in a debug string, or in a comment, don’t be afraid of committing just to fix that. Nobody will be mad at you for being precise. Consider the extreme situation in which, after months and months, you may want to remember “What was the revision where I fixed that typo in that debug string?”. If you dedicated one signle commit for the actual finite logical unit of correcting the typo, you can just scroll back your changelog and find it. But what often happens, is that people will be doing something else, and, while doing that something else, will notice the type, and correct it, and basically merge that correction with the rest of the commit, making that thing losing visibility. To make it simple: your SVN comments shouldn’t explain that you did more than one thing. If your SVN comment looks like “Fixing bugs #1234 and #1235″ or “Fixing bug #4321 and correcting typo in debug sting” then you should’ve used two commits.

  3. Be precise and exhaustive in your commit comments.

    The second most annoying thing ever is committing with blank comments. If you’re working in a team, your peer developers will be frustrated about it and possibly mad at you, or will label you in a bad way; possibly publicly humiliate you. If you’re working alone, you will experience what you’re hypothetical development companions would have: frustration in not being able to easily track down what a certain commit did. Comments in commits are important. Please be precise and explain in detail everything you did. In the optimal case, I shouldn’t need to read your code.

  4. Never ever break the trunk.

    This is probably the most annoying thing when dealing with people who can’t use versioning. Breaking the trunk is an habit that will quickly earn you the hatred of your colleagues. Think about it: if you commit a patch that breaks the trunk, and then I check it out, what am I going to do? The project won’t build so I either have to fix it, or come to your desk and complain to you. In both cases I’m wasting some time. And consider the first case again: what should I do after fixing your broken code? Commit it? Sending you a diff? If I’ll commit, chances are that you’ll have conflicts when you checkout, and you’ll have to waste time in resolving them. Maybe sending you a patch would be the best way, but still it’s a waste of time for the both of us. So the thing is: before committing, ALWAYS double check! Make a clean build and make sure that it builds. And don’t forget to add files! It’s a very common mistake: committing good code, but forgetting to add a file. You won’t realize, because the thing builds, but when I’ll checkout, I’ll have troubles, because of missing file(s). If you’re using Darcs, just make a “darcs get” in a new directory, and then build.

  5. Branch only if needed.

    There are some ways to handle branches, but here’s my favorite. The most of the work should happen in the trunk, which is always sane, as stated by the previous practice, and the patches should always be small, so that they can be reviewed very easily. If you find yourself in the situation of needing to write a large patch, then you should branch it. In that way you can have small patches that will break your branch over the time, but they can be easily reviewed. After the process is completed, i.e. you’ve achieved your goal of fixing a bug or implementing a new feature, you can test the branch thoroughly, and then merge it to the trunk.


Get new articles via email

Related posts: