TABs vs Spaces. The end of the debate.

by Salvatore Iovene on 14 May 2007 — Posted in Coding, Articles

When writing source code, indenting is very important. Having a neat and clean programming style, let alone a precise and uniform one, is probably one of the most important keys when attaching example source code with a job application. I was myself asked to show some of my source code in my last two interviews. Nobody ever asked me to show any running program that I had made, though. Wonder why? A lot can be understood about the author just by glancing quickly at some source code.

Indenting makes the source code easier to read for us human beings, whereas the compiler doesn’t really care (except for some languages, where indentation applies as a syntax element). Even if you’re not a programmer, you can see the differences here:

[Image: “Compiler friendly”]

[Image: “Badly indented”]

[Image: “Properly indented”]

There is, I guess, no question that the last one, labelled “Properly indented”, is the most readable. Problems arise, though, when people start wondering which character they should use for indenting. Some prefer TABs, others prefer blank spaces. A TAB, the key to the left of the Q on most QWERTY keyboards, produces a single character that a text editor can render however it wants. This is usually customizable by the user, of course, so she can decide that a TAB will be shown as 8 spaces, or 4, or 2.

You will constantly hear someone claiming that TABs are evil, or that spaces are evil, but the truth is that neither is wrong, as long as you indent.

I’ll use, as an example, a piece of source code taken from the ext3 module of the Linux kernel. The Linux kernel coding style recommends using TABs for indenting, and that they be 8 characters wide. Let’s have a look at some code.

8-space TAB

8-space.jpg

4-space TAB

4-space.jpg

2-space TAB

2-space.jpg

As you can see, the author’s original intent was to have the variable names aligned. But that alignment breaks as soon as a reader uses a different width for her TABs. What’s wrong there? Let’s use a very useful Vim tip: the :set list command.

:set list
set-list.jpg

This way we can actually see the TABs, shown as “>-------”. Of course there will be fewer dashes if part of the TAB area is occupied by text. So, can you see what’s wrong? The author of that source code is using TABs not only for indenting, but also for aligning! That is why his alignment gets messed up when somebody uses a different TAB size. The solution is simple: use whatever you want for indenting, but use spaces for aligning. Indenting is only the left margin that you give to some lines, and it is not to be confused with alignment. If the author had used TABs at the beginning of the lines, but just blank spaces between the type and the name of each variable, his code would look as he meant it, whatever TAB width the reader’s editor used.

So, in the end, it doesn’t matter whether you use TABs or spaces for indenting, as long as you use only spaces for aligning.
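
The effect can be reproduced outside an editor. What follows is only a minimal sketch, not code from the kernel: expandTabs and renderedColumn are hypothetical helpers that render a line the way an editor with a given TAB width would, and the two declarations are made-up examples.

```cpp
#include <cstddef>
#include <string>

// Render a line the way an editor would: each TAB advances to the next
// multiple of tabWidth columns.
std::string expandTabs(const std::string& line, std::size_t tabWidth) {
    std::string out;
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] == '\t') {
            out.append(tabWidth - out.size() % tabWidth, ' ');
        } else {
            out.push_back(line[i]);
        }
    }
    return out;
}

// Column at which `name` starts once the line has been rendered.
std::size_t renderedColumn(const std::string& line,
                           const std::string& name,
                           std::size_t tabWidth) {
    return expandTabs(line, tabWidth).find(name);
}

// TAB for indenting, TABs for aligning: lines up only at some TAB widths.
const std::string tabAlignedA = "\tint\t\tcount;";
const std::string tabAlignedB = "\tunsigned long\tflags;";

// TAB for indenting, spaces for aligning: lines up at any TAB width.
const std::string spaceAlignedA = "\tint           count;";
const std::string spaceAlignedB = "\tunsigned long flags;";
```

Comparing renderedColumn(tabAlignedA, "count", w) with renderedColumn(tabAlignedB, "flags", w) for w equal to 8, 4 and 2 shows the two names agreeing only at width 8, whereas the space-aligned pair agrees at every width.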

Useful Vim/Emacs tip

I like spaces, and add the following to the end of all of my source files:

/*
Local Variables:
mode:c++
c-basic-offset:2
c-file-offsets:((innamespace . 0)(inline-open . 0)(case-label . +))
indent-tabs-mode:nil
End:
*/
// vim: filetype=cpp:expandtab:shiftwidth=2:tabstop=8:softtabstop=2

This way, if the reader uses Vim or Emacs (and maybe also gedit), her settings will be temporarily overridden by mine, so if she changes my code, there is little chance that she’ll mess up my indenting.

The listchars setting I use with :set list is the following:

set listchars=tab:>-,eol:$,trail:.,extends:#

It also helps me spot trailing spaces. I recommend that everybody use :set list, as it will prevent you from accidentally messing up other people’s indentation.
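
What :set list makes visible to the eye can also be checked by a program. The following is a rough sketch under my own assumptions: hasTrailingWhitespace and hasTabAfterIndent are hypothetical helper names, and the second one simply treats any TAB after the first non-blank character as a TAB used for aligning.

```cpp
#include <cstddef>
#include <string>

// True if the line ends in a space or a TAB.
bool hasTrailingWhitespace(const std::string& line) {
    if (line.empty()) {
        return false;
    }
    char last = line[line.size() - 1];
    return last == ' ' || last == '\t';
}

// True if a TAB appears after the first non-blank character, i.e. a TAB
// is being used for aligning rather than for indenting.
bool hasTabAfterIndent(const std::string& line) {
    std::size_t firstText = line.find_first_not_of(" \t");
    if (firstText == std::string::npos) {
        return false;  // blank line: nothing to align
    }
    return line.find('\t', firstText) != std::string::npos;
}
```

Running both checks over every line of a file before committing would catch the two problems discussed above.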



How to improve the quality of programmers

by Salvatore Iovene on 9 March 2007 — Posted in Opinions, Coding, Articles

After claiming that most programmers just can’t program, and attributing most of the problem to the lack of passion of people who decide to start a career as programmers, I would also like to express my point of view on a closely related subject: what can be done to improve the situation? The problem that I was trying to bring into the spotlight is that a lot of people start (or wish to start) a career in IT for no particular reason. They are the ones who neither love nor loathe programming, and just see it as something that pays their bills. Well, maybe the first question that I should address is: why is this bad? Surely there are many jobs which don’t require passion at all, and people do them because a job is just a job, and don’t really care. In my opinion, being a programmer is different.

There are many people, especially the ones who sit high in the hierarchy of a company, who see programmers as the last and least important rung of a ladder. They often think that programming is an automated and repetitive task that could basically be done by anyone with just a little training. Unsurprisingly, this seems to be the opinion of most ordinary people, who don’t know what programming really is. I wouldn’t want to discriminate among different types of programming, or different programming languages, but it’s obvious to me that programming can, to some extent, become an automated and repetitive task. Those cases are a small minority, though, so I will simply ignore them and focus on the rest.

As anybody who’s a programmer knows, programming is a highly creative task that requires good imagination and great problem-solving skills. Everybody else might just see it as “typing stuff on a computer”, and believe me, there are a whole lot of educated people who think that programming is a monkey’s job. Hence the term “code monkey”. This term has historically been abused a lot, even by programmers themselves. A “code monkey” is said to perform a programming task so easy that even a monkey could do it, as the image suggests. There are two truths about this phenomenon: first, luckily, programming requires far more skill than is usually believed; second, and sadly, the majority of people are simply unaware of it.

The problem with lousy programmers is a bit like a medal: it has two faces. You could actually call it a dog chasing its own tail: as programming is believed to be an easier and easier task, more programmers are needed; as more and more programmers are needed, more people jump into the field; as more and more people try to become programmers, the average quality of programmers gets lousier. Unfortunately, what the average non-programmer fails to understand is that although it doesn’t take hard training to become a lousy programmer, it takes damn hard training to excel in the art of programming. Moreover, most people just lack the innate logical mechanisms that make you a potential programmer. Such mechanisms are developed in your mind when you’re very young, and it’s quite rare to develop them after your twenties. With this, though, I’m not denying that there are a lot of people who do develop those mechanisms at a later age. I’m just thinking of the big numbers here.

So, getting to the point, what went wrong and how can it be fixed? I don’t think it would be wise to say that the problem is simply too much demand for programmers, and that the average quality was therefore inevitably doomed to get lower and lower over time. I rather think that the problem is with education. Of course I can’t speak for all the universities and colleges in the world, but I can at least try to speak for the ones I’ve known personally, or through people who have studied there. It seems that, as more and more people apply to Computer Science or related departments, the easier it gets to get in (sorry for the pun), and to get through, i.e. to graduate.

I know this most likely happens in other faculties too, but seeing that there are people who have been studying CS for three or more years and still can’t grasp the simplest concepts just doesn’t seem right to me. Last night I was sitting in an IRC channel about the C programming language, when somebody joined and asked:

“I just started studying structures in C, and I don’t get them. Can anyone explain to me what’s the use for them?”

OK, I don’t really think there’s anything wrong with not getting the point of C structures right away, but after a little chatting it turned out that the guy was in his second year of Computer Science, and this was the second time he was taking the C class. Still, that wouldn’t be a reason for hatred, of course (not that I have any hatred), but after another little while it turned out that the guy didn’t like programming at all: he had applied to CS only because he liked to “fiddle around with computers”.

What’s really needed, in my opinion, is a harder and less tolerant educational system that is more selective, rather than one that pushes everyone forward. People who find out that they are really not made for it should just give up and move their focus to something less demanding.

I’m very well aware that a lot of programming work nowadays is not really rocket science, but this doesn’t mean that it should be done by completely unqualified people. If what Jeff Atwood says in his post about programmers who can’t program is true, namely that 199 out of 200 applicants (not programmers, applicants) can’t write any code whatsoever, then something is obviously wrong. Looking at the numbers provided by Joel Spolsky, it looks like a lot of these basically incompetent people are going to end up in an actual programming job, and maybe their code will end up on The Daily WTF (Paula, are you there?).

Unfortunately, education is not the only thing to blame. No matter how much education improves, there will always be unqualified people who apply for jobs that require a lot of skill, and in the end the odds will help them, so they’ll manage to get a job as a programmer. Is that so bad, considering that it’s most likely not going to be a critical position, and the only ones damaged will be the owners of the company that hired them? Well, the point is that this is not true. There’s someone else who gets damaged in this scenario. I’m talking about the community out there, the good programmers, who find themselves competing with newbies who are happy to earn peanuts. Salaries keep going down, and customers are not able to distinguish a good job from a bad one.

In a comment on my previous post about this subject, Hoowie Goodell makes a great point with this paragraph:

“There has been a great effort to industrialize programming, too. Again, there are many good features, and it’s a field I’m interested in. Building a large program requires a structured approach. Language design, libraries, programming frameworks and IDEs can and should incorporate as much existing human knowledge as possible — computer science, domain knowledge, solid pre-written code and human interface principles. (Check out Thomas Greene’s “Cognitive Dimensions of Notations” for some of the latter — I think of how programming tools fail to use them on a daily basis!)”

In a way, this suggests that the whole system is not ready yet, as it’s indeed years and years behind several other engineering fields, and that’s probably a good reason to explain why it’s so easy to fail at being a good programmer. Let’s try to identify some points to work on, in order to build a better generation of programmers:

  1. Better education.

    The whole higher education system should be improved in several ways. Worldwide. Nowadays, it looks to me that in many countries graduation is just a direct consequence of enrolling at a university. Unfortunately, this kind of problem must be addressed country by country, to properly identify the specific issues, but the options that I would like to consider are still worth mentioning. It all comes down to a single point: there should be less tolerance towards people who don’t learn. The threshold for passing a course should be raised. Current models of testing should be seriously revised, so as to ensure that students who really didn’t understand the subject are not going to make it.

  2. Better tools.

    Are we trying to make programming just like an assembly line or are we not? If we are, as it seems nowadays, then the tools are not yet ready to support our intentions. Programming is too error-prone and too time-consuming.

  3. Better process.

    Software whose development process doesn’t conform to some standard, say ISO 9000 (sorry if it’s inappropriate, I’m not an expert on this kind of standard), shouldn’t be allowed to be sold. Quality assurance committees should be taken more seriously as part of the process. This might be against all principles of liberalism, I know, as bad software, you may say, will not sell anyway. But I know plenty of bad software that did sell well, for reasons that had nothing to do with its quality.

  4. Better judgment when hiring.

    I’m not going to try to teach you how to run your company, nor how to hire your crew. But sometimes really crazy things happen (again, is Paula around?). A very interesting post by Joel Spolsky (I’m sorry, I can’t find it anymore: does anybody know the link?) talks about only hiring “A” people, where “A” means top class. If you ever hire a “B” person, he’s quite likely to hire a “C” person someday. After that, it’s chaos. I recommend that nobody lower their standards of perfection. Here’s another great article by Joel about hiring good developers; I recommend it.

In conclusion, improving the quality of programmers seems to be a really tough issue, and the whole thing depends on so many factors that pinpointing a single problem is impossible. Cultural and technical difficulties arise all the time, and getting clues is hard. I’ve tried to look at the problem from several sides and give some opinions: what do you people think?



Why most programmers are lousy

by Salvatore Iovene on 8 March 2007 — Posted in Opinions, Coding, Articles

I’ve been in the IT field long enough to get to know many programmers, both experienced ones and mere wannabes. During this time, I’ve realized that most of them are, simply put, bad programmers. I find myself agreeing with a brilliant post by Jeff Atwood, which alleges that programmers can’t program. What are the reasons for this? Many. Probably, IMHO, the main fault lies with the lousy education that people receive. But then again, the ability to give education is directly proportional to the ability to receive it, and where I see people complaining about the low quality of university education, I also see students with no interest in learning. Let’s see some of the reasons why programmers can’t really program.

  1. Young people study Computer Science just because it’s a trend. It sounds almost unbelievable to me, but I must admit it’s mostly true. The vast majority of my old University mates just applied to the Computer Science department because… well: everybody was doing so. They followed the rest of the sheep.
  2. Young people study Computer Science because they wouldn’t know what else to do. That’s really another strong source of applications to Computer Science. A lot of young people in their teenage years just don’t know what they want to do as grownups. Computer Science still seems to be a good career opportunity, so they just go for it.
  3. Young people study Computer Science because they think it’s a sure way of getting a job. Ten-something years ago there was a big boom, and if you just knew some HTML, you were thought to be a computer guru. Beliefs like this leave a deep mark on popular wisdom, hence the wave of people applying to Computer Science just because it will get them work is still there.
  4. Many of today’s programmers were doing nothing but surfing the net or using Word until last year. Especially in small, vertical markets, improvisation rules. People learn something, and literally throw themselves into the field. The drawbacks for the quality of their work are simply inevitable. These are not only unskilled people who jumped in to catch the big wave (what big wave, nowadays?), but people with no passion whatsoever. In other words, I don’t think it’s possible, nowadays, to become a great programmer if you didn’t start taking an interest in the field when you were very young, say about 10 years old (with due exceptions, of course).
  5. Many of today’s Computer Science students have no interest whatsoever in what they’re forcing themselves to study. Just put together the previous items in this list and what do you get? A bunch of people who just don’t care, who want to get their piece of paper (the degree) as soon as possible, and have absolutely no passion for what they learn. That’s the worst. I strongly believe that programming is not just a job like many others: you need passion to become really good at it.
  6. A lot of programmers just don’t like to program. This goes for 100% of my former university mates! Think of that: 100%. Of course it’s not the whole world, but it is a small statistic.
  7. A lot of programmers just don’t get it. Not even the easy things. A few weeks ago I was asked, by a friend of mine who has been studying Computer Science for 4 years now, what the difference is between a private and a protected method in Java. Apparently reading the books isn’t enough anymore. Another guy asked me: “I’ve studied pointers in C, and I think I understood them. Still I can’t find any use for them… are they really used at all?”.
  8. Basically all of the programmers, or wannabe programmers, mentioned above are miles away from the technical community. These people are totally unaware of the existence of:
    • Slashdot and similar
    • RSS
    • Usenet
    • IRC (“Is that like MSN?”)
    • SVN and similar

As you can see, a really strong point, in my opinion, is the lack of care and passion for programming itself. Lousy programmers program to take a wage home; good ones program for the sake of programming itself. Of course you can do that and still fail to be a good programmer, but it all comes down to numbers.



How to write robust code

by Salvatore Iovene on 13 January 2007 — Posted in Coding, Design, Articles

As software is one of the most important issues in our era, writing good robust programs is essential. This article is an in-depth essay focused on Object Oriented software and large projects. Everything said here, though, applies just as well to small projects.

Our time is dominated by software. There is software basically everywhere around us; most of the objects you can see around you right now have something to do with software, if only because they were made using some sort of machine. Given the importance of software nowadays, I just have to find bugs unacceptable. Of course you might argue that a small and rare bug in a minor piece of software won’t harm anyone, and is not nearly as important as a bug that could affect the software of an airplane, and I’m going to agree with that. But as time goes by, everything should be moving towards perfection, and software seems to be going nowhere: there were bugs in software 30 years ago, and there are today. There was a time, in the beginning, when scientists thought that it would be relatively easy to write bug-free programs right away, but they realized pretty soon that it wasn’t so. After all, software is written by us human beings, and we are doomed to make mistakes or omissions. The point of this article is not that software should always be bug free, but that we coders should always keep bugs to a minimum, and here I’m going to present some ways to approach programming in general with that goal.

One huge problem, which I’ve faced quite often, is that as a program grows in size and dependencies, its developers start losing track of its components and get further away from the big picture, which makes it easier to introduce bugs. Note that I’m not talking here about bugs caused by a single human error, which anyone looking at the code would label a cheap mistake. I’m talking about the sort of nasty bugs that nobody can spot right away with a glance at the code. I’m talking about system-wide bugs, which usually emerge from the interaction between loosely related subsystems of the program, and typically in the connections between dependencies and libraries.

Anyway, the path to bug-free code is the one you take when you write robust code. What do I mean by that? Robust code has some features:

  • Well designed
  • Neat and tidy
  • Well named
  • Well commented
  • Well tested
  • It never segfaults

As a result of some of these, robust code is also:

  • Extensible
  • Reusable
  • Lasting in time

Well designed.

Having already talked about this elsewhere, I’ll be brief in this section. Writing a complex program, a program made of hundreds of thousands of lines of code, is a damn complicated thing: it takes many people and a lot of time. Usually, the more people you involve in the project, the less robust the code you’ll get in the end. People will use different conventions and different styles. For this reason, not only is it crucial to hire the right developers, but it’s essential to have a very strict and detailed specification of the project. Programming is creative work, no doubt, and coders need freedom so they can breathe. A constrained coder is a chained coder, hence a dead coder and a threat to the quality of the end product. But however much we care for the developers’ freedom and initiative, we have to be aware that losing control means lowering the quality. A large project must be designed thoroughly and carefully, in every single detail. Even though programmers love freedom, most of them also love exhaustive documentation. If you want to make a good coder happy, and get the best out of him, flood him with docs and specs. Nothing pisses off a good coder like the lack of documentation: it tears his motivation apart. “Why should I read their minds and work by guesses,” he thinks, “when they didn’t even take the time to write good specs?” Furthermore, a project without good specs looks superficial, destined to fail and without a future. A very good coder is hardly going to stay in a company that doesn’t design its projects well. He will think that it’s a loser company, and start looking around.

But what does good design mean? A good design is:

  • Exhaustive
  • Non redundant
  • Non contradictory
  • Easy to understand
  • Related 1:1 to the implementation

We want to cover every possible outcome in our specification: let it be exhaustive, so that nothing is left to chance. We don’t want to repeat the same information more than once, for several reasons: information should be retrievable in exactly one place, and redundancy invites contradictions. Documentation should be for the developers, i.e. written in the most straightforward way for the right audience: simple language and straightforward tables and diagrams will spare the developers some curses. Furthermore, as a specification is just a way to put a program into words before it’s written, developers should easily be able to translate what they see on paper into code. Think about a shopping list: when I get one, I just go to the shop and translate each item on the list into a physical item in my shopping cart. Direct and easy.

Neat and tidy

A good definition of neat is: in a pleasingly orderly and clean condition. How does that apply to software? What is neat software? One word that I like in that definition is “pleasingly”. Neat software pleases the eye and the mind. I don’t want to be cocky here, but neat software is something written by a good programmer, and it will be appreciated by another good programmer. If somebody known as a good programmer points at some software and says “That’s neat” and you find yourself looking at it and replying “Huh? That’s just code”, I’m sorry, but chances are that you are not a good programmer. A good programmer appreciates the beauty of code, both on a small scale and on a large scale.

Neatness on a small scale means that you’re able to look at one function and appreciate its simplicity. Neat pieces of code are easily readable and use good naming conventions. Please read this article if you want to know more about good code on a small scale. Neat code on a larger scale, on the other hand, means neat integration between the components and subsystems of a project. Bad integration would be, e.g., having a project-wide global variable that points to a certain subsystem, and using it everywhere in the project. Or having two subsystems that, in a messy and intertwined way, call each other’s methods, violating several layers of abstraction.

Proving that code is neat turns out to be very difficult. It’s the opposite of what happens with common logic: if I want to prove to you that, say, lions exist, I can just go to Africa, pick one and show it to you, then say “That’s a lion, ergo lions exist”. But how can I prove that unicorns or dragons don’t exist? You probably agree that it’s much more difficult. With neat code it’s the other way around. I can show you bad code, and you will easily agree that it’s bad. But looking at neat code doesn’t prove it neat right away. It probably takes years and years of experience, writing a lot of code and reading a lot.
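
To make the large-scale point a little more concrete, here is a small sketch of the global-variable problem and one common way around it. Logger and FileSaver are made-up names for illustration only, and passing the subsystem in through the constructor is just one possible remedy, not the only one.

```cpp
#include <string>
#include <vector>

// Hypothetical subsystem used by the rest of the project.
class Logger {
    public:
        void log(const std::string & message) { m_lines.push_back(message); }
        const std::vector<std::string> & lines() const { return m_lines; }

    private:
        std::vector<std::string> m_lines;
};

// Tangled version (shown as a comment only): a project-wide global that
// any code can reach into, so the dependency is invisible in interfaces.
//     Logger * g_logger;
//     void saveFile() { g_logger->log("saved"); }

// Tidier version: the dependency is handed in explicitly, so it shows up
// in the interface and can be swapped for another Logger in a test.
class FileSaver {
    public:
        explicit FileSaver(Logger & logger) : m_logger(logger) {}
        void save() { m_logger.log("saved"); }

    private:
        Logger & m_logger;
};
```

With this shape, anyone reading FileSaver's constructor knows at once which subsystem it talks to.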

Well named

This topic has already been discussed here, but repetita iuvant. As code is managed by possibly dozens of people or more, being understood is key to increasing the robustness of the code. Writing robust code also means writing code that will easily stay robust when other people modify or expand it (unless they have no clue, of course). The better your code is understood by others, the less likely they are to break your ideas, and the more likely the code is to stay robust. There are several ways of making your code easily understood, and having a good, consistent and solid naming convention is one of them. Of course, as discussed later, code also needs to be well documented.

Well commented

I know, I know. Everybody says that you should comment your code. That’s what I say and that’s what I’ve been told. Still, I don’t comment my own code as much as I should. Before you ask me “Who are you, then, to tell me to comment my code, if you don’t do it enough with yours?”, let me remind you that we learn from mistakes. What they don’t tell you about the importance of commenting code is a subtle, psychological little thing. If you are a bad programmer, you’ll never produce good code. But if you are a good programmer, sometimes being in a hurry will make you produce really bad code. There are two reasons why this can happen: 1) you are in a hurry because you’re late for your deadlines. About this, there’s nothing to be done. 2) you are in a hurry because you’re just coding fast, in the rush of some ideas that just struck you. In this case, commenting your code a lot will drastically improve its quality. Always write your comments before writing the actual code. This will make you notice if your function is not really going to do what it’s supposed to do. Writing the comment will also help you think more about what you’re doing, and be more conscious of it. It will keep your state of mind clear and precise. I strongly recommend using Doxygen to generate a browsable HTML version of your comments, especially if you’re writing a library. Even if you’re not, it will keep you on a professional track, which is always a good thing.
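
As an illustration of writing the comment first, here is a minimal sketch in Doxygen style. The function mean is a made-up example: the comment block was written as the contract, and the body was then filled in to match it.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

/**
 * @brief Computes the arithmetic mean of a list of samples.
 *
 * @param samples The values to average; must not be empty.
 * @return The sum of all samples divided by their count.
 * @throws std::invalid_argument if @p samples is empty.
 */
double mean(const std::vector<double>& samples) {
    if (samples.empty()) {
        throw std::invalid_argument("mean: samples must not be empty");
    }
    double sum = 0.0;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        sum += samples[i];
    }
    return sum / static_cast<double>(samples.size());
}
```

Writing the @throws line before the body is what prompts the question "what happens with an empty list?", which is easy to forget when coding in a rush.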

Well tested

Write and use unit tests. If your code is well designed, there are good chances that each function or class in your code performs a specific task in a certain way, and nothing more. Given a certain input, it will reliably return the same output. Right? You have to make sure of that by writing test cases. Testing the smallest units of your program doesn’t ensure that the whole is working perfectly, but it helps. If possible, add a hook to your version control system (SVN? Darcs?) so that the test suite runs automatically on the server that hosts your repository before it accepts your patch. This is quite easy with Darcs.
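
A unit test does not need a framework to be useful. The sketch below uses nothing but assert from the standard library; clampPercent and testClampPercent are hypothetical names chosen for this example.

```cpp
#include <cassert>

// Unit under test: restricts a value to the range 0..100.
int clampPercent(int value) {
    if (value < 0) return 0;
    if (value > 100) return 100;
    return value;
}

// One check per behaviour, including both boundaries.
void testClampPercent() {
    assert(clampPercent(-5) == 0);     // below the range
    assert(clampPercent(0) == 0);      // lower boundary
    assert(clampPercent(42) == 42);    // inside the range
    assert(clampPercent(100) == 100);  // upper boundary
    assert(clampPercent(250) == 100);  // above the range
}
```

Calling testClampPercent() from the main() of a small test program makes that program abort on the first failing check, which is exactly what a repository hook needs in order to reject a patch.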

It never segfaults

Of course this point applies to languages that allow segmentation faults, or a NullPointerException (in Java). It’s easy to get: if your code segfaults, there are no excuses. No matter how stupid the provided input was, your program should not segfault. A good practice is for each and every function or method to check its arguments before doing anything. A solid exception-handling structure is required. Again, you can object that I’m not really saying anything useful here: “Of course programs shouldn’t segfault, I knew it!”, but think about it: it’s a matter of attitude. You want to write a perfect program, and there are some things you have to keep in mind. Being paranoid about segfaults will implicitly and quietly improve the general quality of your code, without you even noticing.
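
Here is a minimal sketch of the "check your arguments first" habit. safeLength and tryLength are made-up names, and throwing std::invalid_argument is just one reasonable way of reporting the bad input.

```cpp
#include <cstddef>
#include <stdexcept>

// Returns the length of a C string. A null pointer is rejected up front
// instead of being dereferenced, which would typically end in a segfault.
std::size_t safeLength(const char* text) {
    if (text == NULL) {
        throw std::invalid_argument("safeLength: text must not be null");
    }
    std::size_t length = 0;
    while (text[length] != '\0') {
        ++length;
    }
    return length;
}

// Caller side: bad input becomes a reported failure, not a crash.
bool tryLength(const char* text, std::size_t& length) {
    try {
        length = safeLength(text);
        return true;
    } catch (const std::invalid_argument&) {
        return false;
    }
}
```

The check costs one comparison per call, and it turns the worst kind of failure into one that the caller can handle.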

Conclusion

Writing perfect code is impossible, especially as the code grows in size and in number of programmers. Achieving the impossible, then, is beyond even the most well-intentioned coder. What we can do, though, is try to have the right attitude, which is about precision, care and, sometimes, paranoia. Writing complex programs is not an easy thing and, as such, should be handled with extreme care.



Common mistakes when approaching OO design - Class dependencies

by Salvatore Iovene on 21 December 2006 — Posted in Coding, Design, Articles

Here we continue explaining some of the mistakes commonly made in Object Oriented design, and the good practices that are often ignored. This article is focused on code maintainability and on improving cooperation with people working on the same project.

Encouraging class dependencies.

Having a lot of (mutual) dependencies in the code is quite typical of Spaghetti Code, and it’s definitely something we want to avoid, in order to keep our design neat, improve maintainability and ensure ease of collaboration with colleagues. What do I mean by “class dependencies”? Let’s continue the example from the last article, and suppose we have a certain class GuiManager which, at some point, wants to generate some reports. Let’s now introduce a certain ReportManager, which is a class responsible for generating reports. We have two types of report: TableReport and ChartReport. They look like this:

class TableReport {
    public:
        void report()  {
            // do something
        }
};

class ChartReport {
    public:
        void report()  {
            // do something
        }
};

This means that the ReportManager will have to look something like this:

class ReportManager {
    public:
        void reportAll() {
            m_tableReport.report();
            m_chartReport.report();
        }

    private:
        TableReport m_tableReport;
        ChartReport m_chartReport;
};

There are several problems in this implementation. First of all, If the guy responsible for the TableReport one day wakes up, and decides that the method report() should rather be named generate(), he will not only be allowed to just change that and commit to the repository, but this will break the ReportManager! So after a few hours, the guy responsible for the ReportManager checks out from the repository, builds, and finds out that all the times he has used the TableReport need to be changed. Of course this is something we don’t want to happen.

The usual approach to this is using an Abstract Base Class (ABC), which is a very robust way to sort out problems like this. Let’s see some code:

class Report {
    public:
        // Virtual destructor, so reports can be destroyed through a Report pointer.
        virtual ~Report() {}
        virtual void report() = 0;
};

class TableReport : public Report {
    public:
        void report()  {
            // do something
        }
};

class ChartReport : public Report {
    public:
        void report()  {
            // do something
        }
};

Report is our ABC, and with it we are literally forcing the people who write TableReport and ChartReport to write a method named report(). This way we have broken one dependency: the ReportManager doesn’t need to worry about how every single report names the method, because it can be sure that a method named report() will exist.

There is, though, another dependency. If somebody writes a new report, say XmlReport, this will require modifications to the ReportManager, because our logic so far implies that the ReportManager knows about all the reports. So, if we’re not the maintainers of the ReportManager (because maybe it’s in some different library, written by someone else, and we don’t have access to the code), we will have to go ahead and ask the rightful maintainer to modify the code. Hence, there’s an extra dependency, not structural this time, but logical. What if the maintainer of the ReportManager gave us tools (read: APIs) so that we can register our particular report with the ReportManager? Consider the following code:

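#include <list>

class ReportManager {
    public:
        // Report is abstract, so we store pointers rather than copies.
        // The manager does not own the reports: the caller keeps them alive.
        void registerReport(Report & r) {
            m_reports.push_back(&r);
        }

        void reportAll() {
            std::list<Report *>::const_iterator iter;
            for(iter = m_reports.begin();
                 iter != m_reports.end();
                 ++iter)
            {
                (*iter)->report();
            }
        }

    private:
        std::list<Report *> m_reports;
};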

This way, the ReportManager doesn’t have to know anything about any Report.



Common mistakes when approaching OO design

by Salvatore Iovene on 19 December 2006 — Posted in Coding, Design, Articles

Today I want to talk about Object Oriented practices, and 3 commonly made mistakes. Very often, when reviewing code written by other people, I find violations of common OO practices, that make the code a lot less maintainable. Here follows a list of the most common ones, and, of course, some explanations about them.

Layer violation.

While not the most common, this appears to me as the most dangerous. What is layer violation? Let’s show it with an example. Assume we have a GUI driven application that reads data from a database and shows the results on the display. We might consider having some upper level Controller class, and managers for each component, e.g. GuiManager, DbManager, ReportManager. Assume that the Controller class runs a loop, and in that loop we take care of refreshing the GUI. What I’m going to write now is really wrong:

this->guiManager().reportTable().update(this->reportManager().populate(
    this->dbManager().query(someSqlString)));

Well, there are many horrible things here, but the layer violation happens in this->guiManager().reportTable().update(…). Imagine the various components of this scheme as layers on top of each other. We have the Controller, the GuiManager and a certain ReportTable.

What we’re doing is accessing the ReportTable layer from the Controller one. Why is this bad? Layer violations fill your code with noise. You will rapidly lose track of what does what (e.g., who is updating the ReportTable? The Controller or the GuiManager?), and this will end up as an intertwined mess commonly known as Spaghetti Code.

Doing that, you are performing actions from parts of the code to which those actions do not belong. Classes shouldn’t care about what other classes are, but only about what other classes do. Think about it: do you really want to let the Controller know that the GuiManager has a ReportTable inside? Shouldn’t the Controller just tell the GuiManager what to do, rather than how to do it? Having classes access the inner workings of other classes will lead to messy code, especially when more than one person is working on a project, as discussed later in this article.

Keeping all communication between adjacent layers helps us keep the project consistent even when components change. Imagine that one day I decide that the GuiManager doesn’t need a ReportTable, but a ReportChart. My ideal scenario is the one where all the changes I need to make are within the GuiManager. But if there were a layer violation such as the one mentioned, I would have to modify the Controller as well. When people in a group work on different components of a system, they don’t want to make a change that breaks everything else. To avoid broken code, it is good practice to have layers communicate only with the adjacent ones, through well-known interfaces.

Information hiding.

This brings us to our next point. What does the Controller need to know about the members of the GuiManager? Ideally, nothing. Ideally, there would be no getters or setters, since the Controller doesn’t need to know anything about the GuiManager’s inner workings. What needs to be done, in fact, is designing a well-known interface for the GuiManager that the Controller can use. Once designed, such an interface should never be changed, in order to ensure maximum compatibility between the components. Imagine you have just a certain GuiManager::update() method: the Controller would just need to call this->guiManager().update(), and whatever the GuiManager does is none of the Controller’s business. Inside, the GuiManager might do something like this->reportTable().update(), but if this changed to a ReportChart, it wouldn’t break the Controller, and it would keep the people who work with it happy.

Abusing singleton pattern.

Singletons are not a way to get yourself some global variables. Think thoroughly about the reasons why you really need a Singleton in your program. Is it just a way to access some variables from everywhere in the code? If the answer is yes, you should consider refactoring your code to get rid of the Singleton class. Keep also in mind that Singletons are enemies of unit testing. Have a Singleton class do something, rather than contain something. A typical example of a class suitable to be a Singleton is a Logger class. You need to access it from everywhere in the code; the class doesn’t need to be aware of the application it’s in; the class does (logs) and doesn’t just contain. If you write a Singleton class like the following, you’re doing something wrong:

class AccessData : public Singleton<AccessData> {
    friend class Singleton<AccessData>;
    public:
        std::string username;
        std::string password;
};

This class seems to have the sole purpose of easing the access to a certain username and password from everywhere in the code, without the need of passing them around. You should consider passing references and data around only when needed, or adopting some signaling framework.




10 advice to write good code

by Salvatore Iovene on 14 December 2006 — Posted in Coding, Design, Articles

Having been coding for 16 years now (I started quite young), I have seen a lot of bad code. Code is not good just because it works. So here’s a quick list of 10 pieces of advice that you’d better keep in mind while coding.

  1. Don’t sacrifice code maintainability to performance, unless it’s strictly necessary.

    This happens very often. You have to consider that your code is likely to be read by many people, and some of them will read it after you have left the company. Remember that you won’t remember what your own code does after a few weeks. So always try to put things in the most readable and obvious form, even if this requires writing more lines of code, or having slower code. Of course this does not apply if performance is your number one issue. Try, for instance, to avoid the ?: operator in C/C++. Everybody will understand it anyway, but a good old if statement will do, so why not go for it?

  2. Be precise as a Swiss clock, when it comes to naming conventions.

    Nobody wants to read class names or variable names that look like gibberish. Don’t be stingy with the keyboard: when you type, remember that somebody else will have to read it, so be descriptive.

    1. Name your variable NumberOfItems rather than items_n. Don’t use cryptic prefixes in class names. Name your class ClientMessageOperationsBasicFunctor rather than CMOpFunctor. It’s a lot more typing for you, but a lot less hassle for those who will read it after you.

    2. Don’t change your conventions. If you’re calling the iterators i, don’t call any of them n, ever. You will confuse your reader. This is more important than it seems. If you call a class ClientMessageBlockContact, then do not have ServerMessageContactBlock. Be perfect, be precise, be obsessed.

  3. Use a good and consistent indentation style.

    Never ever have more than one blank line in a row. Don’t have trailing spaces at the end of lines. Don’t have blank spaces or TAB characters in blank lines: a blank line must be blank. Be consistent: don’t use TABs to indent in one file, and spaces in another one. Possibly, use 8-character-wide TABs to indent. If you find yourself going beyond 80 columns too often, that could be an indication of design flaws in your program. Tweak your editor to show you the end-of-line character and the TABs.

  4. One class, one file.

    Don’t write files like ServerMessages.h where you put all the classes that are ServerMessages. One class goes in one file. If you find yourself thinking that you can’t do it, review your design.

  5. In C/C++, project includes use “”, dependency includes use <>.

    If you’re including a file that’s local to your project, use #include "file.h"; if it’s an external dependency, use #include <file.h>. Why? I’ve seen people include <file.h> and then just put things like /usr/includes/my_project in the include search path, so that nobody is able to compile before installing. That’s a bad assumption, and you don’t want to make that mistake.

  6. Always compile with -ansi -pedantic -Wall -Werror flags (or similar, according to your compiler).

    Let’s adhere to standards. Let’s avoid warnings. A warning might become an error in the future.

  7. Use TODOs and FIXMEs.

    If you know that you, or somebody else, will have to return to a certain piece of code to add or modify some functionality, please mark it with a TODO. If you know that a piece of code is buggy but you can’t fix it right now, add a FIXME marker. Later on, it will be easy to grep the source tree for TODOs and FIXMEs and analyze them, especially if they’re very well commented.

  8. Comment your own code.

    Seriously: you’re going to forget, sooner than you think. Just invest 5% of your time in writing commented code. Never assume that code is self-explanatory, just write a couple of lines for everything you do. Comments are not only meant to generate doxygen documentation. You have to assume that somebody else will read your code and need to modify/extend it.

  9. Use a versioning system even if you’re working alone.

    Yes, versioning is not just for working in a team. Use Darcs or SVN even if you’re working alone: you won’t regret it. Commit often and try to be professional all the time. Later on somebody else might join you, or you might find it useful to revert to a previous version of your program. And it will help you keep track of what you’re doing.

  10. Use a bug tracking system even if you’re working alone.

    Things like Bugzilla are EXTREMELY useful. Usually you will forget about a bug in less than 2 days. Every time you find a bug, either fix it immediately, or file it in your personal Bugzilla. And always fix bugs first, and then write new code.

Common errors:

  • It compiles, so it works.
  • It works here, so it works everywhere.
  • Commenting? We don’t have time to waste, we gotta ship!
  • UML diagrams are useless.
  • Planning code so that we can reuse it is useless: we’ll end up writing everything from scratch anyway.
  • Unit tests are a waste of time.




Is your stacktrace really corrupted?

by Salvatore Iovene on 17 October 2006 — Posted in Howtos, Coding, Articles

You may encounter, during your debugging sessions, the `stack corruption’ problem. Usually you will find out about it after seeing your program run into a segmentation fault. Otherwise, it must mean that some very malicious and subtle code has been injected into your program, usually through a buffer overrun. What is a buffer overrun? Let’s examine the following short C code:


#include <stdio.h>
#include <string.h>
void bar(char* str) {
    char buf[4];
    strcpy( buf, str );
}

void foo() {
    printf("Hello from foo!");
}

int main(void) {
    bar("This string definitely is too long, sorry!");
    foo();
    return 0;
}

There’s clearly something wrong with it: as you can see, we are copying `str’ to `buf’ without first checking the size of `str’. First of all there is a security issue: if `str’ didn’t come from a fixed string as in this case, but from user input (maybe on a website), then a sufficiently long, carefully crafted string could overwrite the return address that bar() saved on the stack, and make the program jump to malicious code instead of returning to main(). What we have here, anyhow, is just a segmentation fault. Let’s debug the program.


(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) run
Starting program: /home/siovene/stack

Program received signal SIGSEGV, Segmentation fault.
0x6f6c206f in ?? ()
(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7df9970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

Obviously something must have gone wrong. In order to better understand what is going on, let’s take a step back and examine a working example instead:


#include <stdio.h>
#include <string.h>
void bar(char* str) {
    char buf[4];
    strcpy( buf, str );
}

void foo() {
    printf("Hello from foo!");
}

int main(void) {
    bar("abc");
    foo();
    return 0;
}

This is the same code, but the long string that caused the segmentation fault has been replaced with a harmless 3-character string: `abc’. Let’s name the program stack.c and compile it with debug information:


$> gcc -g -o stack stack.c

Now let’s debug it:


(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) break bar
Breakpoint 1 at 0x80483ca: file stack.c, line 5.
(gdb) run
Starting program: /home/siovene/stack

Breakpoint 1, bar (str=0x8048545 "abc") at stack.c:5
5         strcpy( buf, str );

We have entered the bar() function, let’s examine the backtrace:


(gdb) backtrace
#0  bar (str=0x8048545 "abc") at stack.c:5
#1  0x0804840e in main () at stack.c:13

What is the address of the bar() function?


(gdb) print bar
$1 = {void (char *)} 0x80483c4

Let’s now be paranoid and check this by producing a dump of our executable:


$> objdump -tD stack > stack.dis

Open the file with your favorite editor and look for `80483c4’, the address of bar():


080483c4 <bar>:
 80483c4: 55                    push   %ebp
 80483c5: 89 e5                 mov    %esp,%ebp
 80483c7: 83 ec 28              sub    $0x28,%esp
 80483ca: 8b 45 08              mov    0x8(%ebp),%eax
 80483cd: 89 44 24 04           mov    %eax,0x4(%esp)
 80483d1: 8d 45 e8              lea    0xffffffe8(%ebp),%eax
 80483d4: 89 04 24              mov    %eax,(%esp)
 80483d7: e8 0c ff ff ff        call   80482e8
 80483dc: c9                    leave
 80483dd: c3                    ret

Perfect, that’s our function. But now let’s get curious. Where’s the stack pointer in the CPU registers?


(gdb) info registers
eax            0x0      0
ecx            0xb7ed11b4       -1209200204
edx            0xbff04f60       -1074770080
ebx            0xb7ecfe9c       -1209205092
esp            0xbff04f10       0xbff04f10
ebp            0xbff04f38       0xbff04f38
esi            0xbff04fd4       -1074769964
edi            0xbff04fdc       -1074769956
eip            0x80483ca        0x80483ca
eflags         0x282    642
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

The `esp’ register, on the architecture this article is written on, is the stack pointer. Its value is 0xbff04f10. Let’s examine the memory at that point:


(gdb) x/20xw 0xbff04f10
0xbff04f10:  0x00000000   0x08049638   0xbff04f28   0x080482b5
0xbff04f20:  0xb7ecfe90   0xbff04f34   0xbff04f48   0x0804843b
0xbff04f30:  0xbff04fdc   0xb7ecfe9c   0xbff04f48   0x0804840e
0xbff04f40:  0x08048545   0x08048480   0xbff04fa8   0xb7db3970
0xbff04f50:  0x00000001   0xbff04fd4   0xbff04fdc   0x00000000

With this command we have told GDB to examine 20 words in hexadecimal format starting at address 0xbff04f10. To follow the chain of stack frames, look at the frame pointer, `ebp’, which holds 0xbff04f38: the word stored at that address, 0xbff04f48, is the back-chain pointer to the previous stack frame, and the word right after it, 0x0804840e, is the return address. That is exactly the address GDB printed for main() in frame #1 of the backtrace, which agrees with the fact that we know bar() was called by main()! Following the back-chain pointers further leads towards the program entry point, where the chain conventionally ends with 0x00000000.

Everything looks OK and in place: since the program works perfectly, we weren’t expecting anything different. Let’s now do the same with the faulty program. At the moment of the segmentation fault, the backtrace looked like this:


(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7df9970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

To see exactly what goes on, it would be better to debug it more carefully:


(gdb) file stack
Reading symbols from /home/siovene/stack...done.
(gdb) break bar
Breakpoint 1 at 0x80483ca: file stack.c, line 5.
(gdb) run
Starting program: /home/siovene/stack

Breakpoint 1, bar (str=0x8048580
                    "This string definitely is too long, sorry!")
                  at stack.c:5
5         strcpy( buf, str );
(gdb) next
6       }
(gdb) next
0x6f6c206f in ?? ()
(gdb) next
Cannot find bounds of current function

Let’s then try to follow back the stacktrace, as we did previously:


(gdb) backtrace
#0  0x6f6c206f in ?? ()
#1  0x202c676e in ?? ()
#2  0x72726f73 in ?? ()
#3  0xbf002179 in ?? ()
#4  0xb7e9b970 in __libc_start_main ()
      from /lib/tls/i686/cmov/libc.so.6
Previous frame inner to this frame (corrupt stack?)

(gdb) info registers
eax            0xbfeed1e0       -1074867744
ecx            0xb7ea4c5f       -1209381793
edx            0x80485ab        134514091
ebx            0xb7fb7e9c       -1208254820
esp            0xbfeed200       0xbfeed200
ebp            0x6f742073       0x6f742073
esi            0xbfeed294       -1074867564
edi            0xbfeed29c       -1074867556
eip            0x6f6c206f       0x6f6c206f
eflags         0x246    582
cs             0x73     115
ss             0x7b     123
ds             0x7b     123
es             0x7b     123
fs             0x0      0
gs             0x33     51

(gdb) x/20xw 0xbfeed200
0xbfeed200:  0x202c676e   0x72726f73   0xbf002179   0xb7e9b970
0xbfeed210:  0x00000001   0xbfeed294   0xbfeed29c   0x00000000
0xbfeed220:  0xb7fb7e9c   0xb7fee540   0x08048480   0xbfeed268
0xbfeed230:  0xbfeed210   0xb7e9b932   0x00000000   0x00000000
0xbfeed240:  0x00000000   0xb7feeca0   0x00000001   0x08048300

(gdb) x/20xw 0x202c676e
0x202c676e:     Cannot access memory at address 0x202c676e

There’s only one explanation for that: the stack memory has been overwritten and now contains gibberish. We have been very unlucky with our example, but this gave us the tools to imagine another case. Let’s assume the stack only looks corrupted, not because it was overwritten accidentally, but because GDB failed to reconstruct it. In this case you are still able to navigate it backwards. All you need to do is keep following the chain of stack frames, starting from the stack and frame pointer registers, until you reach 0x00000000. Write all the addresses down, and then use `objdump’ to obtain the disassembly and symbol information from the binary. All that is left, now, is to check the names of the symbols matching the addresses you wrote down.

If you can actually do that, then you have successfully reconstructed your stacktrace. It wasn’t really corrupted by a bug in your program: GDB simply failed to keep up with it.

