Architecture of patching semantic versus logical content

Feb 19 2007 Published by Salvatore Iovene under Software

Inspired by a certain patch that hit a darcs repository I contribute to, I would like to talk about something that developers often don’t seem to get when using revision control systems: the structure of the files in your repository should have nothing to do with the logical units that make up your patches, or with the comments on the patches themselves.

Yes­ter­day, I saw this patch hit the repos­i­tory: “Adding Cloth.h to the repo”. The patch was adding an empty file, named Cloth.h. What’s wrong with this? A cou­ple of things:

  1. The patch adds no logical value to the repository, only technical information about the contents of the repository itself. That information is redundant, because you can retrieve it in a separate (and more proper) way, which depends on the revision control system you are using. Furthermore, the fact that the file was added would have been recorded and obvious anyway, without dedicating a whole patch to it.
  2. The comment (“Adding Cloth.h to the repo”), once again, makes no logical sense on its own, as it only adds information that was already available through the revision control system’s tools.

What is a better way to do that? A patch named “Preliminary support for clothes”, which would add the file Cloth.h with its content, even if not yet functional, makes perfect sense. It means that you’re adding some logical value to the repository, and the value that you’re adding has nothing to do with the way that value is represented (the file Cloth.h), or with the fact that it is being added to a repository.
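As an illustration, the content of such a preliminary Cloth.h might look like the sketch below. The intended content of the real file is not known, so everything in it (the Cloth struct, its fields, the isWearable function) is an assumption made up for this example. The point is only that the file arrives together with the first piece of logic it carries.

```cpp
// Cloth.h -- hypothetical preliminary content, invented for illustration.
#ifndef CLOTH_H
#define CLOTH_H

#include <string>

// A piece of clothing. The fields are placeholders (assumptions),
// just enough to give the patch some logical value.
struct Cloth {
    std::string name;  // e.g. "shirt"
    int size = 0;      // unit left undefined in this preliminary version

    // Not functional yet: nothing can be worn until the feature is finished.
    bool isWearable() const { return false; }
};

#endif // CLOTH_H
```

Recorded in one patch, the file and its content form a single logical unit, even though the feature is not finished.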

In other words, the form and content of patches should not only represent single units of implicit logical value, as discussed earlier, but should have no awareness whatsoever of being part of a revision control system, of being uploaded to repositories, of containing files, or even of being patches at all!



Please drop SVN

Feb 08 2007 Published by Salvatore Iovene under Software

SVN might be stable, it might be mature, it might be successful, and it might be the winning source control system of the moment. There’s always a big risk of becoming unpopular when criticizing something that has found its way to success, but I have to say that SVN sometimes feels terribly antiquated.

I have already given a brief introduction to the Darcs source control system, and here I would like to talk about a very strong advantage it has over SVN.

Just yesterday, at work, I needed to commit certain modifications to SVN. As I examined the diff of my local copy with:

svn diff

I realized that one of the files also contained some other modifications that I didn’t want to commit. After using Darcs for several months, I was suddenly hit by the shocking truth: SVN doesn’t let you interactively commit only some of the changes in a file, the pieces that Darcs calls hunks.

What do you do in that case? Setting aside the people who actually abuse the Save as… function of their editor by saving multiple copies of the same file according to the logical patch they contain (which I find absolutely horrible), the quickest way I could find was to:

  1. Make a diff: svn diff > logical_patch_1.diff
  2. Edit the diff manually until there are two files, logical_patch_1.diff and logical_patch_2.diff, one for each logical change
  3. Revert to the pristine copy: svn -R revert .
  4. Apply the first diff: patch -p0 < logical_patch_1.diff
  5. Com­mit: svn commit
  6. Apply the sec­ond diff: patch -p0 < logical_patch_2.diff
  7. Com­mit: svn commit

With Darcs, all you have to do is issue the darcs record com­mand (which records your changes):

  1. Record: darcs record -m "First logical patch (fixes bug 1234)"
  2. Answer “yes” to the first hunk, and “no” to the second.
  3. Record again: darcs record -m "Second logical patch (fixes bug 5555)"
  4. Answer “yes” to the only hunk

Can you see the difference? It’s not just about the number of operations needed, but about their quality, and the fact that Darcs is designed for this kind of flexibility. Please consider switching to Darcs for your projects and work, as it’s a mature and better system.


Darcs — The source code management system of the future?

Dec 29 2006 Published by Salvatore Iovene under Software

Having already mentioned some good practices for source code versioning, and how important versioning is in any case, I would now like to review and comment on what I find to be the best source code management system out there: darcs.

Darcs is a source control system written in Haskell (a functional language), and it features a very solid mathematical basis, being engineered entirely on top of a “patch theory”. Not only is darcs straightforward and very easy to use, not only is it very interactive, which minimizes the chance of mistakes, but it also offers features that the popular SVN doesn’t have. Here I’m going to go through some use cases and show how things are easier with darcs.

A brief start up

Before analyzing the key features, let’s go through a quick start-up tutorial. The easiest way to get darcs is to download a binary package. These packages contain a precompiled release of darcs, with everything needed statically linked inside. You only need to copy the binary somewhere in your $PATH, such as /usr/bin or /usr/local/bin. Of course you can download the source code and build it yourself if you want.

Let’s now cre­ate a sim­ple Hello World project, and use darcs to ver­sion it.

$ mkdir $HOME/projects/HelloWorld
$ cd !$
$ darcs init

darcs init will cre­ate all the files nec­es­sary to source-control the code. You will find a new direc­tory named _darcs.

Now we can write our HelloWorld.cc main file:

#include <iostream>

using namespace std;

int main(void) {
    cout << "Hello World!" << endl;
    return 0;
}

Time to add the file to ver­sion control.

$ darcs add HelloWorld.cc

Alright, now we really get into darcs. First of all, in case you didn’t notice, darcs doesn’t need any server at the other end, unlike SVN, which needs an SVN server, or CVS, which needs a CVS server. This means no hassle installing and configuring a server. Later we will see how darcs manages collaboration with remote users.

Now it’s time to save our changes to the repository.

$ darcs record
Darcs needs to know what name (conventionally an email
address) to use as the patch author, e.g. 'Fred Bloggs
<fred@bloggs.invalid>'.  If you provide one now it will
be stored in the file '_darcs/prefs/author' and used as a
default in the future.  To change your preferred author
address, simply delete or edit this file.

What is your email address?
Salvatore Iovene <salvatore@invalid.com>
addfile ./HelloWorld.cc
Shall I record this change?(1/?)[ynWsfqadjkc], or ? for help: y
hunk ./HelloWorld.cc 1
+#include <iostream>
+
+using namespace std;
+
+int main(void) {
+	cout << "Hello World!" << endl;
+	return 0;
+}
+
Shall I record this change?(2/?)[ynWsfqadjkc], or ? for help: y
What is the patch name? First record.
Do you want to add a long comment? [yn] n
Finished recording patch 'First record.'

Some points worth inspec­tion here:

  • Why did darcs want to know my email address? That’s because everything you commit (the commits are named patches) will be known as coming from you. If you’re working with several people, darcs has to know who is committing what. Furthermore, people downloading your repository can, e.g., make some changes and improvements, and then issue a darcs send, which will send you the patch via email, so that you can evaluate it and decide whether to apply it.
  • What is a hunk? A hunk is a piece of a patch, i.e. a certain modification in some source file. If you have a large file, foo.c, and modify a certain function bar() at the beginning of the file, and then a certain other function tar() at the end of the file, this will result in two hunks. What’s the advantage of all this? Since darcs is so interactive, you may decide to record both hunks in the same patch, by answering ‘y’ to both, or realize that they logically belong to two different patches, so you will say ‘y’ to one of them, and ‘n’ to the other. Then, after you finish recording the first patch, you issue a darcs record again, and record the other hunk in a separate patch, with a separate name, that forms a logical unit per se.
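To make the idea of a hunk concrete, here is a minimal C++ sketch that splits the difference between two versions of a file into hunks. It is an illustration only: the Hunk struct and the diff_hunks function are names made up for this example, and the comparison is a textbook longest-common-subsequence diff, not a description of how darcs computes its patches.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// One hunk: a contiguous run of removed and added lines.
// Hypothetical type for illustration; not darcs' internal representation.
struct Hunk {
    std::size_t old_start;             // 1-based line in the old file where the hunk begins
    std::vector<std::string> removed;  // lines present only in the old version
    std::vector<std::string> added;    // lines present only in the new version
};

// Splits the difference between two versions of a file into hunks.
std::vector<Hunk> diff_hunks(const std::vector<std::string>& a,
                             const std::vector<std::string>& b) {
    const std::size_t n = a.size(), m = b.size();

    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    std::vector<std::vector<std::size_t>> lcs(
        n + 1, std::vector<std::size_t>(m + 1, 0));
    for (std::size_t i = n; i-- > 0;)
        for (std::size_t j = m; j-- > 0;)
            lcs[i][j] = (a[i] == b[j])
                            ? lcs[i + 1][j + 1] + 1
                            : std::max(lcs[i + 1][j], lcs[i][j + 1]);

    std::vector<Hunk> hunks;
    bool open = false;  // true while we are inside a run of changed lines
    std::size_t i = 0, j = 0;
    while (i < n || j < m) {
        if (i < n && j < m && a[i] == b[j]) {
            // An unchanged line closes the current hunk, if any.
            open = false;
            ++i;
            ++j;
            continue;
        }
        if (!open) {
            hunks.push_back(Hunk{i + 1, {}, {}});
            open = true;
        }
        // Drop a line from the old file when that keeps the common part at
        // least as long; otherwise take a line from the new file.
        if (j == m || (i < n && lcs[i + 1][j] >= lcs[i][j + 1]))
            hunks.back().removed.push_back(a[i++]);
        else
            hunks.back().added.push_back(b[j++]);
    }
    return hunks;
}
```

Given a four-line file whose first and last lines change while the two middle lines stay the same, diff_hunks returns two hunks, one per modified region. That is the foo.c situation described above: each hunk can then be accepted or refused on its own.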

Now let’s make a small change.

#include <iostream>

int main(void) {
    std::cout << "Hello World!" << std::endl;
    return 0;
}

As you can see, we have removed the using namespace std; declaration, and added the std:: namespace prefix to cout and endl. A very important darcs command is whatsnew, which shows us how the code differs from the repository.

Running darcs whatsnew at this point reports two hunks, as expected. Let's record the changes.

$ darcs record
hunk ./HelloWorld.cc 3
-using namespace std;
-
Shall I record this change?(1/?)[ynWsfqadjkc], or ? for help: y
hunk ./HelloWorld.cc 4
-	cout << "Hello World!" << endl;
+	std::cout << "Hello World!" << std::endl;
Shall I record this change?(2/?)[ynWsfqadjkc], or ? for help: y
What is the patch name? Removing the std namespace \
declaration.
Do you want to add a long comment? [yn]n
Finished recording patch 'Removing the std namespace
declaration.'

Obviously those two hunks must form one single patch, because we don’t want any patch to leave the repository in a broken state. Now we get to the cool stuff. Darcs lets you unrecord your changes, i.e. interactively roll back patches until you are satisfied. We might change our mind about the last patch, and think that using namespace std; is not that bad after all. No problem.

$ darcs unrecord

Fri Dec 29 12:53:32 EET 2006
Salvatore Iovene <salvatore@invalid.com>
* Removing the std namespace declaration.
Shall I unrecord this patch?(1/2)[ynWvpxqadjk], or ? for help: y

Fri Dec 29 12:37:33 EET 2006
Salvatore Iovene <salvatore@invalid.com>
* First record.
Shall I unrecord this patch?(2/2)[ynWvpxqadjk], or ? for help: n
Finished unrecording

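And there we are again, as if the patch had never been recorded. Note that unrecord only removes the patch from the repository; the edits themselves stay in your working copy as unrecorded changes, and darcs revert would discard them too.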

Imag­ine you want to have a copy of your repos­i­tory, maybe on a dif­fer­ent par­ti­tion of your disk, or maybe on a USB stor­age drive:

$ cd ..
$ mkdir RepoCopy
$ cd RepoCopy/
$ darcs init
$ darcs pull ../HelloWorld/

Fri Dec 29 12:37:33 EET 2006
Salvatore Iovene <salvatore@invalid.com>
* First record.
Shall I pull this patch?(1/1)[ynWvpxqadjk], or ? for help: y
Finished pulling and applying.

Another directory is not the only place you can move your repository to: you can use SSH to copy it to another machine, and HTTP to fetch it. This is actually the way you handle collaboration. Imagine you have a server somewhere, named www.server.com, and there you want to have your central repository, through which you can collaborate with your development peers.

$ darcs push \
username@www.server.com:/var/www/htdocs/HelloWorld/repo

This will ask you which patches you want to push to that server, one by one, in the usual darcs interactive mode. I’m assuming that the directory /var/www/htdocs/HelloWorld/ on the server hosts the http://www.server.com/HelloWorld/ website. Everybody can now get a copy of your project just by doing this:

$ darcs get http://www.server.com/HelloWorld/repo

And anybody with an account on that server will be able to push patches, provided of course that they have write permission to the directory where the repository is.

Where to go from here

Here fol­low some must-read links if you’re inter­ested in darcs. Prob­a­bly in the future I will write more about it. Thanks for reading.


5 SVN best practices

Dec 15 2006 Published by Salvatore Iovene under Software

Versioning systems like CVS, SVN or Darcs are very important tools that no serious programmer can do without. If you started a project without using any versioning tool, I really recommend that you start using one immediately; but that’s not what I’m discussing right now.

I would like to draw your attention to some best practices that I recommend when working in a team.

  1. Don’t use versioning as if it were a backup tool.

    I’ve heard this question too often: “Have you put your code safely on SVN?”. That’s a bad question. Storing code on an SVN server is not meant for safety, i.e. for fear of losing it. That is something else, and it’s called backup. Take Darcs, a not so popular versioning system. It works without a server, and you can run it locally on your machine without launching any daemon whatsoever. A faulty hard drive can still make you lose all your work. That’s why you have to make backups, but they don’t have anything to do with versioning. Hence, committing to the repository once a day, e.g. right before heading home, is not an acceptable practice, especially if you work in a team. Doing that would be like making a daily backup. An SVN commit, instead, has to have a meaning of some sort, not just “Ok, let’s store today’s work on the SVN server”. Moreover, when the schedule is tough and the cooperation is tight, you need to commit very often so that your peers can keep up with you all the time, rather than finding out in the evening that they have dozens of conflicts after updating to your code.

  2. Commit as soon as your changes make a logical unit.

    How often should you commit? There’s no such thing as committing too often, or too rarely. You should commit each time your changes represent a logical unit, i.e. something that makes sense. Usually that happens because you’re following a plan when coding (because you are, aren’t you?). So, you find a bug in the trunk, plan a strategy for fixing it, fix it, and then commit. This makes sense because that’s a commit that fixes a bug: revision X is buggy, while revision X+1 is not. Don’t be shy about committing too often. Should you just find an insignificant typo in a debug string, or in a comment, don’t be afraid of committing just to fix that. Nobody will be mad at you for being precise. Consider the extreme situation in which, after months and months, you want to remember “What was the revision where I fixed that typo in that debug string?”. If you dedicated one single commit to the actual finite logical unit of correcting the typo, you can just scroll back through your changelog and find it. But what often happens is that people will be doing something else and, while doing that something else, will notice the typo and correct it, and basically merge that correction with the rest of the commit, so that the correction loses visibility. To make it simple: your SVN comments shouldn’t explain that you did more than one thing. If your SVN comment looks like “Fixing bugs #1234 and #1235” or “Fixing bug #4321 and correcting typo in debug string” then you should’ve used two commits.

  3. Be pre­cise and exhaus­tive in your com­mit com­ments.

    The second most annoying thing ever is committing with blank comments. If you’re working in a team, your peer developers will be frustrated about it and possibly mad at you, or will label you in a bad way, and possibly humiliate you publicly. If you’re working alone, you will experience what your hypothetical development companions would have: the frustration of not being able to easily track down what a certain commit did. Comments in commits are important. Please be precise and explain in detail everything you did. In the optimal case, I shouldn’t need to read your code.

  4. Never ever break the trunk.

    This is probably the most annoying thing when dealing with people who can’t use versioning. Breaking the trunk is a habit that will quickly earn you the hatred of your colleagues. Think about it: if you commit a patch that breaks the trunk, and then I check it out, what am I going to do? The project won’t build, so I either have to fix it, or come to your desk and complain to you. In both cases I’m wasting some time. And consider the first case again: what should I do after fixing your broken code? Commit it? Send you a diff? If I commit, chances are that you’ll have conflicts when you check out, and you’ll have to waste time resolving them. Maybe sending you a patch would be the best way, but it’s still a waste of time for both of us. So the thing is: before committing, ALWAYS double check! Make a clean build and make sure that it builds. And don’t forget to add files! It’s a very common mistake: committing good code, but forgetting to add a file. You won’t notice, because the thing builds on your machine, but when I check out, I’ll have trouble because of the missing file(s). If you’re using Darcs, just do a “darcs get” in a new directory, and then build.

  5. Branch only if needed.

    There are several ways to handle branches, but here’s my favorite. Most of the work should happen in the trunk, which is always sane, as stated by the previous practice, and the patches should always be small, so that they can be reviewed very easily. If you find yourself needing to write a large patch, then you should branch. That way you can make small patches that may break your branch along the way, but that can still be reviewed easily. After the process is completed, i.e. you’ve achieved your goal of fixing a bug or implementing a new feature, you can test the branch thoroughly, and then merge it into the trunk.
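Practice 2 can be partly checked mechanically. The following C++ sketch is a deliberately crude heuristic, assuming that bugs are referenced as # followed by digits, as in the example comments quoted in that practice. The function names are made up for this illustration and do not belong to SVN or any other tool.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Counts bug references of the form "#<digit>..." in a commit comment.
// Assumption: bugs are written as '#' immediately followed by a number.
std::size_t count_bug_references(const std::string& message) {
    std::size_t count = 0;
    for (std::size_t i = 0; i + 1 < message.size(); ++i) {
        if (message[i] == '#' &&
            std::isdigit(static_cast<unsigned char>(message[i + 1]))) {
            ++count;
        }
    }
    return count;
}

// Hypothetical heuristic: a comment naming more than one bug probably
// describes more than one logical unit and should have been two commits.
bool looks_like_multiple_units(const std::string& message) {
    return count_bug_references(message) > 1;
}
```

With this sketch, “Fixing bugs #1234 and #1235” is flagged, while “Fixing bug #4321” is not. A comment such as “Fixing bug #4321 and correcting typo in debug string” slips through, since it names only one bug, so a check like this can only complement a careful review of your own commits.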
