Sunday, October 27, 2013

Coding habits

I was wandering in quora and finally found a interesting thread: what are some of your weird coding habits?. I've been meaning to talk about my style choices, specifically in programming contests where the people that think that throwing away readability in exchange of a bit typing time is a fair exchange. I really like my code to be readable (for me) and have forged some coding habits.

#defines

In c++, #defines are a meta-programming tool inherited from c++. It is great to use defines if you want to have code that is created conditionally (using #ifdef). Some defines are useful because the text-processing allows things that a constant function cannot do, like: #define x(a) cout << "#a" << a . In the past, I used a for_each #define because it was much clearer than the alternative (But now c++11 has range-based for loops and a for_each macro is not needed anymore) Any other #define use tends to be something that Satan has introduced in c++ to allow people to make horrible code.

I am unapologetic. All those people using a REP #define to replace for loops. Also you, yes, you who uses #defines for stuff that typedefs and constants can do just fine. You should be ashamed. It is as if you were throwing plastic to the ocean of code. Think of the children, there are newb programmers that read up your code in the high ranks of SRMs and and think "COOL". Yes, really, this awful thing you are doing is IMITABLE BEHAVIOR. How do you sleep at night?

I am not talking just about code purity, #defines are risky and can bring you confusing bugs.

//First of all, this causes a syntax error:  
#define MAX 1000 //Discouraging comments?

// Now think about this, if I use a constant:
const int MAX_N = 50;
// I can then use the same constant to generate other constants
const int MAX_V = MAX_N * MAX_N + 2; // maximum number of vertices in the graph
                                     // note how I can comment that code

//If I instead used defines for constants:
#define MAX_N 50
#define MAX_V MAX_N * MAX_N + 2

// cool, right? NO!

// The following code would be wrong in the #define version:
x = (MAX_V * 2);
cout << MAX_N << " " << MAX_V << " " << x << endl;
// in const version:   "50 2502 5004"
// in #define version: "50 2502 2504" 

// Using a #define in code is  a text replace. It is semantically different to
// a constant, that's why you shouldn't use them in place of constants.

// And yes, you can "fix" this by adding brackets:

#define MAX_V (MAX_N * MAX_N + 2)

// But ask yourself: how many people who use #defines instead of constants actually
// use brackets when declaring them?


Code blocks

When I started programming I really thought saving up on code lines made me look clever. I did aberrations such as this:

for (int i=0; i<n; i++) for (int j=0; j<n; i++) for (int k=0; k<n; i++)
    dist[i][j] = min( dist[i][j],  dist[i][k] + dist[k][j] ); // floyd Warshall

//Or this:
for (int i=0; i<n; i++)
    for (int j=0; j<n; i++)
        for (int k=0; k<n; i++)
            dist[i][j] = min( dist[i][j],  dist[i][k] + dist[k][j] ); // floyd Warshall

        
// or even this:
for (int i=0; i<n; i++)
    for (int j=0; j<n; i++)
        for (int k=0; k<n; i++)
        {
            dist[i][j] = min( dist[i][j],  dist[i][k] + dist[k][j] ); // floyd Warshall
            cout << i << ", " << j << "= " << dist[i][j];
        }
// (Just one curly is needed, right? )

So yeah, c++ allows you to save the hassle of using curly braces when there is only one statement. You can use if(cond) statement; or for(x;y;z;) statement; or any similar variation. You save up the number of lines!

Nowadays I have a strict rule. I try to almost always use curly braces. Just because the language allows you not to use them doesn't mead you shouldn't. I have an exception though, else statement after if:

if (x == 1) {
    return 7;
} else if ( 0 <= x && x <= 10 ) {
    return 8;
} else if ( x > 78 ) {
    return 12;
} else {
    return 0;
}

The reason for this exception is the lack of an elseif keyword.

Why use always curly braces? The more I programmed the more it seemed that I had no way to really predict how I was going to have to modify my code. Today it appears that this if statement requires to execute only one line:

if (x == 1) return 7;

But maybe tomorrow you are going to need to add something to it. Maybe a special case you were missing. Maybe you need to add debug code... In order to update this code you'll need to restructure it , add curly, add the indentation, add the new code. But what if later you decide you need to remove it again? Once again you reset the style... It is too many steps before updating something, and each time you change something in your code you risk causing more bugs. This simple if is... simple, what if it is one line right next and inside many other blocks? I don't know, it just feels better when all your code blocks are braced.

Adding braces is half the battle. Things like :

//1.
if (x == 1)
{
    return 7;
}

if (x == 1) { return 7; }

if (x == 1)
    { return 7; }

if (x == 1) {
    return 7; }

Are also problematic to my taste. I prefer the egyptian brackets style. If I am going to add curly braces to everything, I better not use up too many lines of code. I used to use the style in the first example all the time. Now it just looks quite heavy. The return 7 seems too far away from the x == 1 :).

Oddly , I don't like to use egyptian brackets outside of statement blocks. The braces for a function / method / struct / class / namespace / must be the heavy ones. Why? I am not sure. But it seems intuitive to me. Almost as if it is because the statements are shorter than the functions.

Indent

Really, why do people use tabs for indents? It breaks so many programs that it is not even funny. I prefer 4 spaces. I am not sure why, but less than 4 spaces tends to seem too tight and more requires far more of a panoramic view. I have exceptions. I know that I sometimes do something like this:

for (int x0 = 0; x0 < w; x0++) {
 for (int y0 = 0; y0 < h; y0++) {
    for (int x1 = 0; x1 < w; x1++) {
     for (int y1 = 0; y1 < h; y1++) {
        if ( something ) {
            doWhatever();
        }
     }
    }
 }     
}

Some times the indentation depth becomes too wide, even though the logic of the algorithm is not really nesting four loops. When I look at the code above, I really think I am nesting two loops and not four. One loop for point `(x0,y0)` and another for point `(x1,y1)`. That is why the indentation is so peculiar. I wish languages had better facilities to do "2d loops". In fact, I think they do:

for (int x0 = 0, y0 = 0; y0 < h; x0 = (x0 + 1) % w, y0 += !x0) {
    for (int x1 = 0, y1 = 0; y1 < h; x1 = (x1 + 1) % w, y1 += !x1) {
        if ( something ) {
            doWhatever();
        }
    }
}

But that would be unorthodox, wouldn't it? :)

Memoization functions

This is one I learned the hard way over the years. I always try to have exactly one return statement in a memoized recursive function.

int f(int x, int y)
{
    int & res = dp[x][y];
    if (res == -1) {
        res = 0;
        if (x == 0) {
            res = y;
        } else {
            res = f(x-1, y);
            if (y > 0) {
                res = f(x-1, y-1);
            }
        }
    }
    return res;
}

Why? let us check out the alternative:

int f(int x, int y)
{
    if (x == 0) {
        return y;
    }
    int & res = dp[x][y];
    if (res != -1) {
        return res;
    }
    res = f(x-1, y);
    if (y > 0) {
        res = f(x-1, y-1);
    }
    return res;
}

Seems inicuous, but it is bug-prone to have this habit. In this case y is accessible in `O(1)` time, but maybe another recurrence requires you to return some f(y) function that is `O(y)`. If we did that, in this function, the f(y) call wouldn't get memoized. There are tons of similar ways in which you can forget to memoize something if you are liberal with return statements. My favorite is when coding in a language that doesn't have by-ref variables, or when you use that code style in c++:

int f(int x, int y)
{
    if (x == 0) {
        return y;
    }
    int res = dp[x][y];
    if (res != -1) {
        return res;
    }
    if (x >= y/2) {
        res = f(x, y - x);
        return res; // oops , forgot to set dp[x]y] = res
    }
    res = f(x-1, y);
    if (y > 0) {
        res = f(x-1, y-1);
    }
    dp[x][y] = res;
    return res;
}

No comments :