Leaving closed protocols behind

Oct 23 2007

To fulfil a resolution I made quite a long time ago, as of December 1st, 2007 I will no longer use any Instant Messaging service based on a closed protocol, such as MSN, ICQ, AIM or Yahoo. The only way you will be able to contact me (besides conventional methods such as phone and email) will be through my Jabber ID: salvatore.iovene at googlemail.com (replace “at” with “@”). This also works from GMail.

Reason

Proprietary IM systems have a terrible flaw: MSN users can’t chat with Yahoo users, AIM users can’t chat with ICQ users, and so on. So if I have friends who only use MSN and other friends who only use ICQ, I have to use both to keep in touch with everybody. The reason for this is corporate greed taking advantage of the network effect. Wikipedia says:

The network effect is a characteristic that causes a good or service to have a value to a potential customer dependent on the number of customers already owning that good or using that service.

This also reflects the fact that corporations value their own profit more than the end user’s satisfaction. Moreover, I don’t like the idea of using a closed protocol. “Closed protocol” means that the data (e.g. chat messages) exchanged by the two computers involved in a transaction is represented in a secret format that the user is not allowed to study. Jabber, on the other hand, uses an open protocol based on XML. Everybody is allowed to study the protocol and to write clients or servers that support it. This allows collaboration and coöperation. Greedy corporations, instead, keep their protocol secret so that they are the only ones able to write clients and servers for it, and thus force you to use their own clients (such as MSN), which might be bloated with spyware and advertisements.
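To make “an open protocol based on XML” concrete: an XMPP chat message is just a small, documented XML element (a “stanza”), so anybody can produce or read one with an ordinary XML library. The Python sketch below builds such a stanza with the standard xml.etree.ElementTree module. The addresses are made-up examples, and the XML stream, the jabber:client namespace, TLS and authentication that a real client needs are deliberately left out.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def build_chat_message(to_jid: str, text: str, from_jid: Optional[str] = None) -> str:
    """Serialize a minimal XMPP <message type="chat"> stanza.

    Sketch only: a real client sends this inside an authenticated XML
    stream whose default namespace is jabber:client; stream setup,
    namespaces and TLS/SASL negotiation are omitted here.
    """
    msg = ET.Element("message", {"to": to_jid, "type": "chat"})
    if from_jid is not None:
        msg.set("from", from_jid)
    body = ET.SubElement(msg, "body")
    body.text = text  # ElementTree escapes <, > and & when serializing
    return ET.tostring(msg, encoding="unicode")


def read_body(stanza: str) -> Optional[str]:
    """Any ordinary XML parser can read the stanza back: nothing is secret."""
    return ET.fromstring(stanza).findtext("body")


if __name__ == "__main__":
    stanza = build_chat_message("friend@example.org", "Hi <there> & welcome")
    print(stanza)
    print(read_body(stanza))
```

Because the format is public, the same few lines work for any client or server implementer, which is exactly the point being made about open protocols.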

Since I’ve decided that I don’t want to support this kind of behaviour, I will unsubscribe from the closed-protocol services that I use. You don’t have to do the same; just get yourself a Jabber account in order to keep in touch with me and, preferably, convince your friends to do the same.

What is Jabber?

When you hear someone (probably me) talking about Jabber, they are usually referring to one of the following:

  • The XMPP (Jabber) protocol
  • The Public Federated Jabber Network (PFJN)
  • The Jabber platform (which includes the previous items, as well as Jabber chat clients, devices, transports, etc.)

Jabber is, strictly speaking, the informal name of an open-standard, decentralized instant messaging protocol officially called XMPP.

NOTE: It is also the name of a company called Jabber Inc., which sells Jabber-based products. However, the Jabber platform is much larger than this single company. Don’t let this confuse you! If you want to go to the authoritative website about Jabber, that would be jabber.org, not jabber.com!

The network of independent Jabber servers on the Internet makes up the Public Federated Jabber Network. If you have an account on a server on the PFJN, then you can communicate with anyone else who has an account on a PFJN server. This means that Google Talk users can communicate seamlessly with Gizmo Project users (and vice versa), as both of these services are on the Public Federated Jabber Network.
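Federation works because every Jabber ID names its home server: an ID has the form [local@]domain[/resource], and the domain part tells other servers where to deliver a message. Below is a simplified Python sketch of that split. Real XMPP libraries also normalise and length-check each part, which is omitted here; the `_xmpp-server._tcp` DNS SRV name is the standard XMPP server-to-server lookup, while the function names are hypothetical.

```python
from typing import NamedTuple, Optional


class JID(NamedTuple):
    local: Optional[str]     # account name on the server (optional)
    domain: str              # the server that hosts the account
    resource: Optional[str]  # a particular client/session (optional)


def parse_jid(jid: str) -> JID:
    """Split a Jabber ID of the form [local@]domain[/resource].

    Simplified sketch: no Unicode normalisation (stringprep/PRECIS)
    and no length checks.
    """
    bare, slash, resource = jid.partition("/")  # resource starts at the first '/'
    local: Optional[str] = None
    domain = bare
    if "@" in bare:
        local, _, domain = bare.partition("@")
    if not domain or local == "":
        raise ValueError(f"malformed JID: {jid!r}")
    return JID(local, domain, resource if slash else None)


def s2s_srv_name(jid: str) -> str:
    """DNS SRV name a federating server looks up to reach this JID's server."""
    return "_xmpp-server._tcp." + parse_jid(jid).domain.lower()


if __name__ == "__main__":
    for example in ("login-name@chat.gizmoproject.com", "someone@gmail.com/Home"):
        print(parse_jid(example), "->", s2s_srv_name(example))
```

A Google Talk server delivering to a Gizmo Project user only needs the domain part of the address, which is why any two PFJN servers can talk to each other without prior arrangement.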

How can I use Jabber?

To use Jabber you need a Jabber client and an account on a server. Here’s a list of popular clients:

MS Windows

MacOS X

GNU/Linux

Then you will need an account. Most of the listed clients will let you create a Jabber account by choosing from a list of servers or, if you want, you can run your own server.

Google Talk

If you have a Gmail account, then you have a Jabber ID via Google Talk! Your Jabber ID is the same as your email address. You can use either the native Google Talk client or any other Jabber client.

Gizmo Project

Also, if you use the Gizmo Project, you too have a Jabber account. Your Jabber ID is login-name@chat.gizmoproject.com.

jabber.org

The Jabber Software Foundation runs probably the best-known Jabber server out there, jabber.org. They recently switched over to ejabberd for their server software, so it should be quite solid now.

Fujitsu-Siemens shame on you

Oct 17 2007

I found myself in the process of setting up a dual-boot system on a laptop, with Windows XP and Ubuntu 7.10. After wiping the disk and partitioning it appropriately, I proceeded to install Windows XP (knowing that it had to go first, since it would overwrite the MBR, completely disregarding and disrespecting the user’s freedom), so I put in the so-called Recovery Disk provided with that particular Fujitsu-Siemens laptop. After a little while, the Recovery Disk offered to install Windows XP either on the full disk or on two partitions, with a 10 GB “data” partition. I was astonished and outraged. It seems that the Fujitsu-Siemens people thought the user wasn’t smart enough to choose which partition to use for his or her Windows install. They would rather limit the user’s freedom than give him or her a choice. I suspect this was the result of some pressure from Microsoft: this “trick” basically stops the user from installing other operating systems alongside Windows, unless he or she buys a new non-branded copy.

Fujitsu-Siemens and Microsoft: shame on you!

Who wants to talk about patent infringement?

May 15 2007

After all the dust raised again by Microsoft about Linux and the Open Source community allegedly committing patent infringements (235 this time; it was 228 in 2004), I really feel the need to spend a few words on the matter, or rather, a few images.

It looks like the jealous Redmond zealots, avidly fighting to protect the uniqueness of their work (sarcasm intended), didn’t realize that they have been playing copycat for a long time, reinventing the wheel and doing a bad job of it. Do the following screenshots tell you anything?

Search page (screenshot)

Search results (screenshot)

Image search (screenshot)

News search (screenshot)

Maps search (screenshot)

TABs vs Spaces. The end of the debate.

May 14 2007

When writing source code, indenting is very important. Having a neat and clean programming style, not to mention a precise and uniform one, is probably one of the most important keys when attaching sample source code to a job application. I myself was asked to show some of my source code in my last two interviews. Nobody ever asked me to show any running program I had made, though. Wonder why? A lot can be understood about the author just by glancing quickly at some source code.

Indenting makes source code easier to read for us human beings, whereas the compiler doesn’t really care (except in some languages, such as Python, where indentation is a syntax element). Even if you’re not a programmer, you can see the differences here:

Compiler friendly (screenshot)

Badly indented (screenshot)

Properly indented (screenshot)

There is, I guess, no question that the last one, labelled “Properly indented”, is the most readable. Problems arise, though, when people start wondering what they should use as the indenting character. Some prefer TABs, others prefer blank spaces. A TAB, the key to the left of the Q on most QWERTY keyboards, is a single character that a text editor can display however it wants. This is usually customizable by the user, of course, so she can decide that a TAB will be 8 columns wide, or 4, or 2.

You hear people claiming all the time that TABs are evil, or that spaces are evil, but the truth is that neither is wrong, as long as you use them properly.

I’ll use, as an example, a piece of source code taken from the ext3 module of the Linux kernel. The Linux coding style guidelines recommend using TABs for indenting, and that they should be 8 characters wide. Let’s have a look at some code.

8-space TAB (screenshot)

4-space TAB (screenshot)

2-space TAB (screenshot)

As you can see, the original intent of the author was to have the variable names aligned. But that alignment gets screwed up as soon as a reader uses a different width for her TABs. What’s wrong there? Let’s use a very useful Vim tip: the :set list command.

:set list (screenshot)

This way, we can actually see the TABs, as “>-------”. Of course there will be fewer dashes when a TAB starts after some text, since it only fills the space up to the next tab stop. So, can you see what’s wrong with that? The author of that source code is using TABs not only for indenting, but also for aligning! That way his alignment gets messed up when somebody uses a different TAB size. The solution to this problem is simply to use whatever you want for indenting, but use spaces for aligning. Indenting must only be the left margin that you give to some lines, and it’s not to be confused with alignment. If the author of that source code had used TABs at the beginning of the lines, but only blank spaces between the type and the name of the variables, his code would look as he meant it whatever TAB width one’s editor used.

So, in the end, it doesn’t matter whether you use TABs or spaces for indenting, as long as you use only spaces for aligning.
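The rule can be checked mechanically. Python’s str.expandtabs() renders TABs at any chosen width, so the sketch below (hypothetical declarations, not the actual ext3 code) compares where the variable names land when TABs are used for both indenting and aligning, versus TABs for indenting and spaces for aligning.

```python
def name_column(line: str, tab_width: int) -> int:
    """Column where the variable name (the last word before ';') starts
    when the line is displayed with the given TAB width."""
    shown = line.expandtabs(tab_width)
    name = shown.rstrip(";").split()[-1]
    return shown.rindex(name)


def is_aligned(lines: list, tab_width: int) -> bool:
    """True if every variable name starts in the same column."""
    return len({name_column(line, tab_width) for line in lines}) == 1


# TAB for indenting AND for aligning: the problem shown in the screenshots
tabs_everywhere = [
    "\tint\t\tfoo;",
    "\tunsigned long\tbar;",
]

# TAB only for indenting; the type is padded with spaces to 14 columns
tabs_then_spaces = [
    f"\t{'int':<14}foo;",
    f"\t{'unsigned long':<14}bar;",
]

if __name__ == "__main__":
    for width in (8, 4, 2):
        print(f"TAB width {width}:")
        for line in tabs_everywhere + tabs_then_spaces:
            print("  |" + line.expandtabs(width))
        print("  TABs everywhere aligned?", is_aligned(tabs_everywhere, width))
        print("  TABs + spaces aligned?  ", is_aligned(tabs_then_spaces, width))
```

Only the first style depends on the reader’s TAB width: it lines up at 8 and falls apart at 4 and 2, while the second style stays aligned at every width.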

Useful Vim/Emacs tip

I like spaces, and add the following to the end of all of my source files:

/*
Local Variables:
mode:c++
c-basic-offset:2
c-file-offsets:((innamespace . 0)(inline-open . 0)(case-label . +))
indent-tabs-mode:nil
End:
*/
// vim: filetype=cpp:expandtab:shiftwidth=2:tabstop=8:softtabstop=2

This way, if the reader uses Vim or Emacs (and maybe also gedit), her settings will be temporarily overridden by mine, so if she’s going to change my code, there is little chance that she’ll mess up my indenting.

The :set list options I use are the following:

set listchars=tab:>-,eol:$,trail:.,extends:#

It also helps me spot trailing spaces. I recommend that everybody use :set list, as it will prevent you from accidentally messing up other people’s indentation.
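For readers without Vim at hand, here is a rough Python imitation of what :set list with listchars=tab:>-,eol:$,trail:. displays: each TAB becomes “>” followed by dashes up to the next tab stop, trailing spaces become dots, and a “$” marks the end of the line. It is only a sketch under the simplifications listed in its docstring, not Vim’s actual rendering code.

```python
def show_list(line: str, tabstop: int = 8) -> str:
    """Render a line roughly as Vim does with
    :set list listchars=tab:>-,eol:$,trail:.

    Simplifications (assumptions of this sketch): every character is one
    column wide, only spaces at the very end of the line count as
    trailing, and extends:# (relevant only with 'nowrap') is ignored.
    """
    body = line.rstrip(" ")
    trailing = len(line) - len(body)
    out = []
    col = 0
    for ch in body:
        if ch == "\t":
            width = tabstop - col % tabstop  # distance to the next tab stop
            out.append(">" + "-" * (width - 1))
            col += width
        else:
            out.append(ch)
            col += 1
    out.append("." * trailing)  # trail:.
    out.append("$")             # eol:$
    return "".join(out)


if __name__ == "__main__":
    for text in ["\tint\t\tfoo;", "\tunsigned long\tbar;", "return 0;   "]:
        print(show_list(text))
```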

How to improve the quality of programmers

Mar 09 2007

After claiming that most programmers just can’t program, and attributing most of the problem to the lack of passion of people who decide to start a career as a programmer, I would also like to express my point of view on a closely related subject: what can be done to improve the situation? The problem I was trying to bring into the spotlight is that a lot of people start (or wish to start) a career in IT for no particular reason. They are the ones who neither love nor loathe programming, and just see it as something that pays their bills. Well, maybe the first question I should address actually is: why is this bad? Sure, there are many jobs that don’t require passion at all, and people just do them because a job is just a job, and don’t really care. In my opinion, being a programmer is different.

There are many people, especially those who sit high in a company’s hierarchy, who see programmers as the last and least important rung of a ladder. They often think that programming is a rather automated and repetitive task that could basically be done by anyone with just a little training. Unsurprisingly, this seems to be the opinion of most ordinary people, who don’t know what programming really is. I wouldn’t want to discriminate among different types of programming, or different programming languages, but it’s obvious to me that programming, to some extent, actually can become an automated and repetitive task. That’s quite a minority of cases, though, so I will simply ignore them and focus on the rest.

As anybody who’s a programmer knows, programming is a highly creative task that requires good imagination and great problem-solving skills. Everybody else might just see it as “typing stuff on a computer”, and believe me, there are a whole lot of educated people who think that programming is monkey business. Hence the term “code monkey”. This term has historically been abused a lot, even by programmers themselves. A “code monkey” is said to perform a programming task so easy that even a monkey could do it, as the image suggests. There are two truths about this phenomenon: first, luckily, programming requires far more skill than is usually believed; second, and sadly, the majority of people simply ignore this.

The problem with lousy programmers is a bit like a medal: it has two faces. You could actually call it a dog chasing its own tail: as programming is believed to be an easier and easier task, more programmers are needed; as more and more programmers are needed, more people jump into the field; as more and more people try to become programmers, the average quality of programmers gets lousier. Unfortunately, what average non-programming people fail to understand is that although it doesn’t take hard training to become a lousy programmer, it takes damn hard training to excel in the art of programming. Moreover, most people simply lack the innate logic mechanisms that make you a potential programmer. Such mechanisms develop in your mind when you’re very young, and it’s quite rare to develop them after your twenties. With this, though, I’m not denying that there are a lot of people who do develop those mechanisms at a later age. I’m just thinking in terms of big numbers here.

So, getting to the point, what went wrong and how can it be fixed? I don’t think it would be wise to say that what’s wrong is that there’s too much demand for programmers, ergo the average quality was inevitably doomed to decline over time. I rather think that the problem is with education. Of course I can’t speak for all the universities and colleges in the world, but I can at least try to speak for the ones I’ve known personally, or through people who have studied there. It seems that, as more and more people apply to Computer Science or related departments, the easier it gets to get in (sorry for the pun), and to get through, i.e. to graduate.

I know this most likely happens in other faculties too, but seeing people who have been studying CS for three or more years and still can’t grasp the simplest concepts just doesn’t seem right to me. Last night, I was sitting in an IRC channel about the C programming language, when somebody joined in and asked:

“I just started studying structures in C, and I don’t get them. Can anyone explain to me what’s the use for them?”

OK, I don’t really think there’s anything wrong with not getting the point of C structures right away, but after a little chatting, it turned out that the guy was in his second year of Computer Science, and this was the second time he was taking the C class. Still, that wouldn’t be a reason for hatred, of course (not that I feel any hatred), but after another little while it turned out that the guy didn’t like programming at all: he had just got himself into it because he applied to CS since he liked to “fiddle around with computers”.

What’s really needed, in my opinion, is a harder and less tolerant educational system, one that is more selective rather than pushing everyone forward. People who find out they are really not made for it should just give up and move their focus to something else.

I’m actually very well aware that a lot of programming work nowadays is not really rocket science; still, this doesn’t mean it should be done by completely unqualified people. If what Jeff Atwood says in his post about programmers who can’t program is true, namely that 199 out of 200 applicants (not programmers, applicants) can’t write any code whatsoever, then something is obviously wrong. Looking at the numbers provided by Joel Spolsky, it looks like a lot of these basically incompetent people are going to end up in actual programming jobs, and maybe their code will end up on The Daily WTF (Paula, are you there?).

Unfortunately, education is not the only thing to blame. No matter how much education improves, there will always be unqualified people applying for jobs that require a lot of skill, and in the end the odds will help them, so they’ll manage to get a job as a programmer. Is that so bad, considering that it’s most likely not going to be a critical position, and the only ones who will be damaged are the owners of the company that hired them? Well, the point is that this is not true. There’s someone else who gets damaged in this scenario. I’m talking about the community out there, the good programmers, who find themselves competing with newbies who are happy to earn peanuts. Salaries keep going down, and customers are not able to tell a good job from a bad one.

In a comment on my previous post about this subject, Hoowie Goodell makes a really great point with this paragraph:

“There has been a great effort to industrialize programming, too. Again, there are many good features, and it’s a field I’m interested in. Building a large program requires a structured approach. Language design, libraries, programming frameworks and IDEs can and should incorporate as much existing human knowledge as possible: computer science, domain knowledge, solid pre-written code and human interface principles. (Check out Thomas Greene’s “Cognitive Dimensions of Notations” for some of the latter: I think of how programming tools fail to use them on a daily basis!)”

In a way, this suggests that the whole system is not ready yet, as it is indeed years and years behind several other engineering fields, and that’s probably a good reason why it’s so easy to fail at being a good programmer. Let’s try to identify some points worth examining, in order to build a better generation of programmers:

  1. Better education.
    The whole higher-education system should be improved in several ways. Worldwide. Nowadays, it looks to me as if in many countries graduation is simply a direct consequence of enrolling at a university. Unfortunately, this kind of problem must be addressed country by country, to properly identify the specific issues, but the options I would like to consider are still worth mentioning. It all comes down to a single point: there should be less tolerance towards people who don’t learn. The thresholds for passing a course should be raised. Current models of testing should be seriously revised, so as to ensure that students who really didn’t understand the subject are not going to make it.
  2. Better tools.
    Are we trying to turn programming into a factory assembly line or not? If we are, as it seems nowadays, then the tools are not yet ready to support our intentions. Programming is too error-prone and too time-consuming.
  3. Better process.
    Software whose development process doesn’t conform to some standard, say ISO 9000 (sorry if it’s inappropriate, I’m not an expert on this kind of standard), shouldn’t be allowed to be sold. Quality assurance committees should be taken more seriously as part of the process. This might go against all the principles of liberalism, I know, since bad software, you may say, will not sell anyway. But I know plenty of bad software that did sell well, for reasons that had nothing to do with its (lack of) quality.
  4. Better judgment when hiring.
    I’m not going to try to teach you how to run your company, nor how to hire your crew. But sometimes really crazy things happen (again, is Paula around?). A very interesting post by Joel Spolsky (I’m sorry, I can’t find it anymore: does anybody know the link?) talks about only hiring “A” people, where “A” means top class. If you ever hire a “B” person, he’s quite likely to hire a “C” person someday. After that, it’s chaos. I recommend that nobody lower their standards. Here’s another great article by Joel, about hiring good developers; I recommend it.

In conclusion, improving the quality of programmers really seems to be a tough issue, and the whole thing depends on so many factors that pinning down a precise cause is impossible. Cultural and technical difficulties arise all the time, and finding clues is hard. I’ve tried to get around the problem and give some insightful opinions: what do you people think?

Why most programmers are lousy

Mar 08 2007

I’ve been in the IT field long enough to get to know many programmers, both experienced ones and mere wannabes. During this time, I’ve realized that most of them are, simply put, bad programmers. I find myself agreeing with a brilliant post by Jeff Atwood, which claims that programmers can’t program. What are the reasons for this? There are many. Probably, IMHO, the main fault lies with the lousy education people receive. But then again, the ability to give an education is directly proportional to the ability to receive it, and where I see people complaining about the low quality of university education, I also see students with no interest in learning. Let’s look at some of the reasons why programmers can’t really program.

  1. Young people study Computer Science just because it’s a trend. It sounds almost unbelievable to me, but I must admit it’s mostly true. The vast majority of my old university mates applied to the Computer Science department just because… well: everybody was doing so. They followed the rest of the sheep.
  2. Young people study Computer Science because they wouldn’t know what else to do. That’s another strong source of applications to Computer Science. A lot of young people in their teenage years just don’t know what they want to do as grown-ups. Computer Science still seems to be a good career opportunity, so they just go for it.
  3. Young people study Computer Science because they think it’s a sure way of getting a job. Ten-odd years ago there was a big boom, and if you just knew some HTML, you were thought to be a computer guru. Beliefs like these leave a deep footprint on popular wisdom, so the wave of people applying to Computer Science just because it will get them work is still there.
  4. Many of today’s programmers were doing nothing more than surfing the net or using Word until last year. Especially in small and vertical markets, improvisation just rules. People learn something and literally throw themselves into the field. Drawbacks in the quality of their work are simply inevitable. This is not only a group of computer-illiterate people who jumped in to catch the big wave (what big wave, nowadays?), but people with no passion whatsoever. In other words, I don’t think it’s possible nowadays to become a great programmer if you didn’t start getting interested in the field when you were very young, say about 10 years old (with due exceptions, of course).
  5. Many of today’s Computer Science students have no interest whatsoever in what they’re reluctantly studying. Just put together the previous items in this list and what do you get? A bunch of people who just don’t care, who want to get their piece of paper (the degree) as soon as possible, and have absolutely no passion for what they learn. That’s the worst. I strongly believe that programming is not just a job like many others: you need passion to get good at it.
  6. A lot of programmers just don’t like programming. This goes for 100% of my former university mates! Think about that: 100%. Of course it’s not the whole world, but it makes for a small sample.
  7. A lot of programmers just don’t get it. Not even the easy things. A few weeks ago, a friend of mine who’s been studying Computer Science for 4 years now asked me what the difference is between a private and a protected method in Java. Apparently reading the books isn’t enough anymore nowadays. Another guy asked me: “I’ve studied pointers in C, and I think I understood them. Still I can’t find any use for them… are they really used at all?”.
  8. Basically all of the programmers, or wannabe programmers, mentioned above are miles away from the technical community. These people are totally unaware of the existence of:

    Slashdot and similar
    RSS
    Usenet
    IRC (“Is that like MSN?”)
    SVN and similar

As you can see, a really strong point, in my opinion, is the lack of care and passion for programming itself. Lousy programmers program to take a wage home; good ones program for the sake of programming itself. Of course you can do that and still fail to be a good programmer, but it all comes down to numbers.

The ultimate guide for UTF-8 in irssi and GNU/Screen

Mar 06 2007

I’ve been having quite a lot of trouble lately configuring irssi to work well with UTF-8. Irssi’s documentation on the matter was quite incomplete, or discouraging, and there wasn’t much on the Internet, so, now that I’ve figured out how to do it, I’ll share it here.

First of all, you’ve got to make sure that your system is configured for UTF-8 locales:

bash-3.1$ locale
LANG=en_GB.utf8
LANGUAGE=en_GB.utf8
LC_CTYPE="en_GB.utf8"
LC_NUMERIC="en_GB.utf8"
LC_TIME="en_GB.utf8"
LC_COLLATE="en_GB.utf8"
LC_MONETARY="en_GB.utf8"
LC_MESSAGES="en_GB.utf8"
LC_PAPER="en_GB.utf8"
LC_NAME="en_GB.utf8"
LC_ADDRESS="en_GB.utf8"
LC_TELEPHONE="en_GB.utf8"
LC_MEASUREMENT="en_GB.utf8"
LC_IDENTIFICATION="en_GB.utf8"
LC_ALL=en_GB.utf8

If the output of locale doesn’t look like that, you’ll want to reconfigure your locales. On Debian, what you have to do is:

sudo dpkg-reconfigure locales

Here’s what to expect. The dialogs let you select which locales to generate and which one should be the system default; after that, you should see something like this:

Generating locales (this might take a while)...
  en_GB.ISO-8859-1... done
  en_GB.ISO-8859-15... done
  en_GB.UTF-8... done
  en_US.ISO-8859-1... done
  en_US.ISO-8859-15... done
  en_US.UTF-8... done
Generation complete.
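As a quick cross-check after reconfiguring (for example from a script), the locale that governs character encoding can be worked out from the environment. The Python sketch below assumes the POSIX precedence LC_ALL, then LC_CTYPE, then LANG; it only inspects the names and does not verify that the locale is actually installed (locale -a lists the installed ones). The function names are hypothetical.

```python
import os
from typing import Mapping, Optional


def effective_ctype(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the locale name that governs character encoding (LC_CTYPE).

    POSIX precedence: a non-empty LC_ALL wins, then LC_CTYPE, then LANG.
    (LANGUAGE only affects message translations, so it is ignored here.)
    """
    for var in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(var)
        if value:
            return value
    return None


def is_utf8_locale(name: Optional[str]) -> bool:
    """True for names such as en_GB.utf8 or en_US.UTF-8 (codeset after '.')."""
    if not name:
        return False
    codeset = name.partition(".")[2].partition("@")[0]
    return codeset.replace("-", "").lower() == "utf8"


if __name__ == "__main__":
    current = effective_ctype()
    print(f"LC_CTYPE comes from: {current!r}; UTF-8: {is_utf8_locale(current)}")
```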

Perfect: now that our system is configured for UTF-8, we want to configure our terminal emulator. If you’re using xterm, you can invoke it with the -u8 switch, or just run uxterm, and that’s all that’s needed. If you’re using gnome-terminal, go to the Terminal menu, then choose Set Character Encoding and then UTF-8. If UTF-8 doesn’t appear in the list, you may want to try logging out and back in. While you’re at it, in the GDM login manager, go to the Language option and choose UTF-8 there too, so that it will be the default.

Now let’s take care of GNU/Screen. In order to enable UTF-8, all you have to do is launch it with the -U switch:

screen -U -S irc

irc is just the name I want to assign to that screen session. Note that if you want to switch a running screen session to UTF-8, you can do it for each window with the command CTRL-a : utf8 on.

Once your GNU/Screen is configured for UTF-8, you finally have to set up your irssi client. This was, for me, the tricky part, since the documentation is a bit unclear, and I didn’t realize that my irssi wasn’t built with recode support. To make sure that yours is, fire it up and give the command

/recode

If you get something like

Target                         Character set

then everything is all right; otherwise, if you get a “No such command” error, you will have to reinstall irssi with recode support.

Irssi’s UTF-8 support lets you recode to different charsets depending on the server or channel you’re chatting in. First, let’s set up some general options:

/set term_charset UTF-8
/set recode_autodetect_utf8 ON
/set recode_fallback UTF-8
/set recode ON
/set recode_out_default_charset UTF-8
/set recode_transliterate ON

These options will be the defaults, unless overridden for specific servers or channels. What do they mean?

  • term_charset: the character set of your terminal emulator
  • recode_autodetect_utf8: irssi will recognize UTF-8 input automatically and treat it accordingly
  • recode_fallback: when we get some non-UTF-8 text from a chat peer, the text should be converted to this character set
  • recode: this enables the whole recode machinery
  • recode_out_default_charset: this is very important: it is the default charset you send out, unless otherwise specified by a server/channel rule (we will see that shortly)
  • recode_transliterate: this enables transliteration to the closest match, i.e. if someone sends you a character that’s not in your charset, it will be transliterated to the closest possible one, or to a question mark if none is found
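To make the receiving side of these options concrete, the following Python sketch imitates the idea behind recode_autodetect_utf8 and recode_fallback: accept the bytes as UTF-8 if they are valid UTF-8, otherwise decode them with a fallback charset. ISO-8859-15 is used as the fallback purely for illustration, and the question-mark replacement only approximates recode_transliterate, since Python has no built-in “closest match” transliteration like iconv’s //TRANSLIT. This is not irssi’s code, and the function names are hypothetical.

```python
def decode_incoming(data: bytes, fallback: str = "iso-8859-15") -> str:
    """Use UTF-8 if the bytes are valid UTF-8, otherwise assume `fallback`
    (roughly what recode_autodetect_utf8 + recode_fallback describe)."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace")


def encode_outgoing(text: str, charset: str) -> bytes:
    """Encode for a peer using `charset`; characters it lacks become '?'.

    Only the question-mark case of transliteration is modelled here.
    """
    return text.encode(charset, errors="replace")


if __name__ == "__main__":
    print(decode_incoming("caffè".encode("utf-8")))        # sent as UTF-8
    print(decode_incoming("caffè".encode("iso-8859-15")))  # sent as Latin-9
    print(encode_outgoing("10 €", "iso-8859-1"))           # no € in Latin-1
```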

Now, you probably need different recodes on different channels, because you may speak different languages on different channels. For example, I send out UTF-8 when typing on English-speaking channels, and ISO-8859-1 or ISO-8859-15 when typing on Finnish- or Italian-speaking channels, so people on the other end will always get my characters right.

You need to add rules with the /recode command:

/recode add ircnet/#foo ISO-8859-15
/recode add ircnet/#bar ISO-8859-1
/recode add freenode/#gee ISO-8859-1

Those commands will make you “speak” ISO-8859-15 on #foo on IRCnet, ISO-8859-1 on #bar on IRCnet, and ISO-8859-1 on #gee on freenode. Everywhere else you will “speak” UTF-8.
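The per-channel rules behave like a lookup table with a default. Here is a minimal Python sketch, assuming a hypothetical RECODE_RULES dictionary that mirrors the three rules above: look up “network/channel”, fall back to UTF-8 (the recode_out_default_charset value), and encode the outgoing text with the chosen charset. The matching is a simplification (exact, case-insensitive network/channel pairs), not irssi’s actual lookup logic.

```python
DEFAULT_OUT_CHARSET = "utf-8"  # /set recode_out_default_charset UTF-8

# Hypothetical table mirroring the /recode rules: "<network>/<channel>" -> charset
RECODE_RULES = {
    "ircnet/#foo": "iso-8859-15",
    "ircnet/#bar": "iso-8859-1",
    "freenode/#gee": "iso-8859-1",
}


def outgoing_charset(network: str, channel: str) -> str:
    """Pick the charset for text sent to `channel` on `network`."""
    return RECODE_RULES.get(f"{network}/{channel}".lower(), DEFAULT_OUT_CHARSET)


def encode_for(network: str, channel: str, text: str) -> bytes:
    """Bytes actually sent; characters missing from the charset become '?'."""
    return text.encode(outgoing_charset(network, channel), errors="replace")


if __name__ == "__main__":
    for net, chan in [("IRCnet", "#foo"), ("IRCnet", "#bar"), ("freenode", "#python")]:
        print(net, chan, outgoing_charset(net, chan), encode_for(net, chan, "perché €"))
```

The same word leaves irssi as different bytes depending on the channel, which is why each channel’s audience sees it correctly in its own charset.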

And this is what we get: here I’m typing (er… I’m copy-pasting from Wikipedia) some text:

irssi displaying UTF-8 text (screenshot)

If you connect via SSH to a remote machine where you run irssi inside screen, all you have to do is set both systems to use UTF-8, as explained at the beginning of this article, and then set the terminal on the machine you SSH from to use UTF-8, as explained earlier.

Useless fuss about ZIP and RAR

Feb 24 2007

There has been some fuss generated by Jeff Atwood (who is a Windows developer, which is bad, and a Visual Basic one, which is worse), who seems, in my humble opinion, to be spreading partial information, as closed in his Windows-only world as he appears to be. In a recent article, Jeff makes a basic comparison between the ZIP and RAR compression formats. Unfortunately, most Windows people are completely unaware that there’s something much better out there, which has been around in the *nix world for quite a long time now. I’m talking about the powerful combination of tar and bzip2.

Let’s get to the facts right away. To experiment, I used a directory containing the source code of the Linux kernel, and built that kernel first, so that the directory would be pretty big and would contain both text files and binary files.

Here’s the size of the original directory:

$ du -sh /usr/src/linux-2.6.18.3
539M    linux-2.6.18.3

This is what happens with ZIP:

$ time zip -r ~/linux /usr/src/linux-2.6.18.3
...
real    2m35.917s
user    0m32.486s
sys     0m6.024s

$ ls -gGh ~/linux.zip
-rw-r--r-- 1 141M 2007-02-24 01:04 /home/siovene/linux.zip

Fine: 141 MB in 2 minutes and 35 seconds. Let’s try RAR:

$ time ./rar_static a ~/linux.rar /usr/src/linux-2.6.18.3
...
real    5m8.715s
user    2m14.012s
sys     0m12.473s

$ ls -gGh ~/linux.rar -lh
-rw-r--r-- 1 132M 2007-02-24 01:26 /home/siovene/linux.rar

Ouch! Twice the time and only slightly better compression! Let’s try tar and bzip2:

$ time tar cv linux-2.6.18.3 | bzip2 > ~/linux.tar.bz2
...
real    4m22.265s
user    2m38.134s
sys     0m5.608s

$ ls -gGh ~/linux.tar.bz2
-rw-r--r-- 1 90M 2007-02-24 01:09 /home/siovene/linux.tar.bz2

Not much faster than RAR (though it uses two programs communicating through a pipe, so there is some overhead), but much more efficient! The compressed file is only 90 MB, starting from an uncompressed original of 539 MB.

Let’s summarize the data:

Method        Time        Size
zip           2m35s       141 MB
rar           5m08s       132 MB
tar.bz2       4m22s       90 MB
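To put the table in relative terms, the sketch below computes each archive’s size as a percentage of the original 539 MB directory, using only the numbers reported above. Note that du -h and ls -h print rounded sizes in powers of 1024, so the percentages are approximate.

```python
ORIGINAL_MB = 539  # du -sh of linux-2.6.18.3

# (wall-clock seconds, archive size in MB), copied from the measurements above
RESULTS = {
    "zip": (2 * 60 + 35, 141),
    "rar": (5 * 60 + 8, 132),
    "tar.bz2": (4 * 60 + 22, 90),
}


def percent_of_original(size_mb: float, original_mb: float = ORIGINAL_MB) -> float:
    """Compressed size as a percentage of the uncompressed size."""
    return 100.0 * size_mb / original_mb


if __name__ == "__main__":
    for method, (seconds, size) in RESULTS.items():
        pct = percent_of_original(size)
        print(f"{method:8} {size:4d} MB  {pct:5.1f}% of original  "
              f"{ORIGINAL_MB / size:.1f}x smaller  {seconds} s")
```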

In conclusion, you should use the best tools, regardless of their popularity, and remember that there is so much more out there than what you can see from your Windows-user perspective.
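If you want to repeat the ZIP versus tar.bz2 comparison on your own data, Python’s standard library can write both formats. The sketch below is an illustration, not the commands used above: it uses the tarfile module in "w:bz2" mode (one bzip2 stream over the whole tar archive) and zipfile with ZIP_DEFLATED (each file compressed separately with Deflate). Both differences, the stronger bzip2 algorithm and compressing one stream instead of file by file, plausibly contribute to the size gap. RAR is proprietary and not covered; the output file names are arbitrary.

```python
import os
import sys
import tarfile
import tempfile
import zipfile


def make_tar_bz2(src_dir: str, dest: str) -> int:
    """Archive src_dir as one bzip2-compressed tar stream; return its size."""
    src_dir = os.path.abspath(src_dir)
    with tarfile.open(dest, "w:bz2") as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))
    return os.path.getsize(dest)


def make_zip(src_dir: str, dest: str) -> int:
    """Archive src_dir as a ZIP (each file deflated separately); return its size."""
    src_dir = os.path.abspath(src_dir)
    parent = os.path.dirname(src_dir)
    with zipfile.ZipFile(dest, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, arcname=os.path.relpath(path, parent))
    return os.path.getsize(dest)


def compare(src_dir: str, out_dir: str) -> dict:
    """Build both archives in out_dir and return {method: size in bytes}."""
    return {
        "zip": make_zip(src_dir, os.path.join(out_dir, "archive.zip")),
        "tar.bz2": make_tar_bz2(src_dir, os.path.join(out_dir, "archive.tar.bz2")),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out:
        if len(sys.argv) > 1:
            src = sys.argv[1]
        else:
            # No directory given: build a small sample tree of similar text files
            src = os.path.join(out, "sample")
            os.makedirs(src)
            for i in range(20):
                with open(os.path.join(src, f"file{i}.c"), "w") as fh:
                    fh.write(f"/* file {i} */\nint counter_{i} = {i};\n" * 200)
        for method, size in compare(src, out).items():
            print(f"{method:8} {size} bytes")
```

Run it as `python compare_archives.py /usr/src/linux-2.6.18.3` (the script name is hypothetical) to measure a real tree; timing can be added with the shell’s time as in the measurements above.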

Architecture of patching semantic versus logical content

Feb 19 2007

Inspired by a certain patch that hit a darcs repository I contribute to, I would like to talk about something developers often don’t seem to get when using revision control systems: the structure of the files in your repository should have nothing to do with the logical units that make up your patches, or with the descriptions of the patches themselves.

Yesterday, I saw this patch hit the repository: “Adding Cloth.h to the repo”. The patch added an empty file named Cloth.h. What’s wrong with this? A couple of things:

  1. The patch adds no logical unit of value to the repository, only technical information, i.e. information about the contents of the repository itself, which is redundant, since you could retrieve it in a separate (and more proper) way that depends on the revision control system you are using. Furthermore, the fact that the file was added would have been there, and obvious, without dedicating a whole patch to it.
  2. The description (“Adding Cloth.h to the repo”), once again, doesn’t make any logical sense on its own, as it adds information that was already available through the revision control system’s tools.

What is a better way to do it? A patch named “Preliminary support for clothes”, which adds the file Cloth.h with its content, even if not yet functional, makes perfect sense. It means that you’re adding some logical value to the repository, and the value you’re adding has nothing to do with the way that value is represented (the file Cloth.h), or with the fact that it’s being added to a repository.

In other words, patches should not only represent single units of logical value, as discussed earlier, but their form and content should have no awareness whatsoever of being part of a revision control system, of being uploaded to repositories, of containing files, or even of being patches at all!

Read more versioning tips here.

On the day I go to work for Microsoft

Feb 18 2007

“On the day I go to work for Microsoft, faint oinking sounds will be heard from far overhead, the moon will not merely turn blue but develop polkadots, and hell will freeze over so solid the brimstone will go superconductive.”

by Eric S. Raymond
