vexorian's blog: implementation

Showing posts with label implementation. Show all posts

Saturday, March 08, 2014

SRM 611 Python review

SRM 611 was the first of 10 topcoder SRMs in which I am going to exclusively use python during the contest. During the match, I had a bit of a bad day. Now I finished the first part of the editorial so I had to solve all the problems (except the hardest one, for now). I have some things to say.

Div2 250: The one with digits

link to problem statement

The problem is just an implementation thing: Return if a number (provided as a string) is interesting. A number is interesting if for each digit d from 0 to 9, either it doesn't appear in the number or it appears in exactly two positions and there are exactly d other digits between those positions.

My first reaction was:

class InterestingNumber:
 def isInteresting(self, x):
    for d in range(10):
        ch = chr(d + ord('0') )
        if ch in x:
            indexes = [ i for i in range( len(x) ) if x[i] == ch ]
            if len(indexes) != 2 or indexes[1] - indexes[0] - 1 != d:
                return "Not interesting"
    return "Interesting"

It is okay, although not particularly interly or showcasing of what's different in python. What is interesting is extracting the indexes that contain ch in one line.

It can be simplified a bit, although it is unclear whether this makes the code clearer or not:

class InterestingNumber:
 def isInteresting(self, x):
    for d in range(10):
        ind = [ i for i in range( len(x) ) if x[i] == chr(d + ord('0') ) ]
        if (len(ind) > 0) and ( len(ind) != 2 or ind[1] - ind[0] - 1 != d ):
            return "Not interesting"
    return "Interesting"

Too imperative, maybe?

class InterestingNumber:
 def isInteresting(self, x):
    def validIndexes(d, ind):
        return (len(ind) == 0) or (len(ind) == 2 and ind[1] - ind[0] - 1 == d)
    
    def extractEqual(d):
        return [i for i in range(len(x)) if int(x[i]) == d]
    
    def validDigit(d):
        return validIndexes(d, extractEqual(d) )
     
    return ["Not i","I"][ all(validDigit(d) for d in range(10) ) ] + "nteresting"

The all() function is a useful builtin, returns True if all the stuff you provided it are True too.

I like this last one because it is almost pseudocode (to me). But I don't know, really. At least in general the code in this problem is quick to write in python.

Div2 500: The one with LCM

problem statement

You are given a set `S` of up to 50 elements and a number `x`. All elements and `x` are between 1 and `10^9`. Return "Possible" if there exists a subset `S'` such that the LCM (least common multiple) of all the numbers in `S'` is equal to `x`.

The solution is basically to just take the LCM of all elements in `S` that are divisors of `x`. Answer is Possible if and only if that LCM is equal to `x`.

The first part of all of this is to find the LCM of two numbers. We can use the formula `lcm(a,b) = (ab)/gcd(a,b)`. Then we need to find the GCD... We use Euclid's algorithm. This takes me to how to implement Euclid's algorithm in python...

def gcd(a,b):
    while b != 0:
        (a,b) = (b, a%b)
    return a
    
def lcm(a,b):
    return a*b / gcd(a,b)

This is a good approximation. There is a slight problem and it is, what if `0 = a < b`? The correct result in that case is `b`. This is not an issue in this problem specifically because no element is 0. But we can solve it by making sure `b` is always the smallest of the two.

In c++ you'd do :

if (a < b) {
    swap(b,a);
}

But in python, there are some other ideas. For example: `(b,a) = (min(a,b), max(a,b))` . But my favorite is:

def gcd(a,b):
    (b,a) = sorted( (a,b) )
    while b != 0:
        (a,b) = (b, a%b)
    return a

The sorted function will sort the tuple `(a,b)`, so its result will be `(min(a,b), max(a,b))`, then we do the assignment in such a way that `b` will hold the minimum and `a` the maximum.

Now we need to extract the elements of `S` that are divisors of `x` (easy), and then take the LCM of them all... Our lcm function takes a pair of numbers, but this is what reduce was made for! reduce is another useful builtin with a functional origin story. Basically `"reduce"(lcm, (a,b,c,...)) = "reduce"( lcm, (lcm(a,b),c,...) )`... So calling reduce with lcm as argument will apply the lcm of all the numbers. A bit of a corner case is when the set of divisors of `x` in `S` is empty... Then reduce would get an empty set. For those cases, reduce would throw an exception unless you give it a default value. A good default value is 1.

def div2solution(S, x):
    return x == reduce(lcm, [z for z in S if (x % z == 0) ], 1)

Reduce is one of my favorite things after starting to use python. It makes code so much simpler once you "get" it. The imperative alternative would be to set a variable to 1, do a loop and repeat lcm( variable, x[i] ) and things like that.

Div2 hard: The one with elephants

This was a long implementation problem with a long problem statement that it is difficult for me to abridge and an even longer explanation for a solution that is quite obvious. So here is the problem statment and here's the editorial (Too bad topcoder is still hiding editorials to guest users, I guess those are *great* at SEO...).

The crux of the matter is, you need a brute force to pick all combinations of starting positions in the top and bottom rows. There are at most 5 columns, so there are at most `2^10` such combinations. So really, we need to iterate through all `2^(2n)` subsets of the numbers from `0` to `2n-1`. In the contest world, we usually use bitmasks, we are so used to them that we forget how horrific of an incomprehensible low level hack they are. The other option is to use backtracking, but that's a lot of code... In python, though, we have something better: itertools.combinations. It allows you to easily iterate through all the subsets of size `r` of some list.

The second issue with this problem is that you need to simulate a search that can start in the top,right,left or bottom side of the map, progress towards the opposite direction until it finds a "pond" or a square that is already blocked, and block all the cells in its path. This sort of simulation is best if we engineer it to use a single function. Nesting the function makes things just easier here:

class ElephantDrinkingEasy:
 def maxElephants(self, theMap):
    n = len(theMap)
    mx = 0
    for r in xrange(2*n + 1):                                #number of elements of subset
        for prows in itertools.combinations(xrange(2*n), r): #subsets of r elements
            
            # copy the contents of the map so we can overwrite them
            sim = [ [ theMap[i][j] for j in xrange(n) ] for i in xrange(n) ]
            res = 0

            #the simulation in a cnvenient nested function
            def simulate(x,y, dx,dy):
                while (0 <= x < n) and (0 <= y < n):
                    if sim[x][y] == 'Y':
                        sim[x][y] = '#'
                        return 1
                    elif sim[x][y] == '#':
                        break
                    else:
                        sim[x][y] = '#'
                        (x,y) = (x + dx, y + dy)
                return 0

            for i in prows:
                x = i % n            
                (y,dy) = (0, 1) if i < n else (n-1, -1)
                res += simulate( x,y, 0,dy)
            for j in xrange(2*n):
                y = j % n
                (x,dx) = (0, 1) if j < n else (n-1, -1)
                res += simulate( x,y, dx,0 )
            mx = max(mx, res)
    return mx

Div1 easy : The one I failed

Problem statement

The LCM set of a set S is the set of all numbers that can be generated by taking LCM of a non-empty subset of S. Given sets `A` and `B`, find if the LCM sets of the two sets are equal.

I failed this problem during the match. It turns out that my logic was mostly right, the mistake, however was in checking if it is possible to build a number out of some of the other elements in the set. During the match I did an overcomplicated mess based on prime factors, that was wrong. Now we know that this is actually the solution to the division 2 problem above. So we need to do the same thing "compress" both sets getting rid of the redundant numbers and then compare the compressed sets.

def compress(X):
    return { y for y in X if not div2solution( [z for z in X if z != y] , y) }

def div1solution(A, B):
    return compress(A) == compress(B)

Another shiny python implementation. [z for z in X if z != y] returns the elements in `X` that are not equal to `y`. Note that I return a set (surrounded by `{}` instead of a list, this saves me the issue of sorting after compressing (A and B are provided as tutples, so their orders may be different even though they represent the same sets).

Div1 medium: The one with Spanning Tree

You are given 50 points. Find the Spanning Tree to connect those points through straight lines (ignoring that there might be intersections), but instead of a usual MST problem, we want the Spanning Tree with the minimum standard deviation of edge weights...

Long story short, we need to repeat the following for each dummy average `m = (w_i - w_j)/2 + \epsilon` where `w_i` and `w_j` are any two edges of the original graph. Find the minimum spanning tree using weights: `(m - w_i)^2`, where `w_i` is the original edge weight. Take the standard deviation of the `w_i` weights in this tree. One of the standard deviation of the `O(n^4)` trees you are going to try is going to be the answer. *Somehow* (This is a python post, not an editorial, read the long story short for what passes as a proof).

Anyway... This problem seems impossible in python through this `O(n^6)` approach. `O(n^6)` under two seconds in TC servers for `n<=20` is a piece of cake for c++ and Java, but apparently not for python. This is my fastest python solution:

class Egalitarianism2:
 def minStdev(self, x, y):
     points = [ (float(p), float(q)) for (p,q) in zip(x,y) ]
     
     def euclid(p, q):
         return math.sqrt( (p[0] - q[0])**2 + (p[1] - q[1])**2 ) 
     
     #Prim's MST algorithm:
     def deviationPrim(m):
         n = len(points)
         par = range(n)
         w = [ [ (m - euclid(points[i],points[j]))**2 for i in xrange(n) ] for j in xrange(n) ]
         c = 0
         reached = [ i == 0 for i in xrange(n) ]
         parent = [0] * n
         dist = [ w[0][i] for i in xrange(n) ]
         r = [0.0] * (n - 1)

         for i in xrange(n - 1):
             v = min( (dist[j],j) for j in xrange(n) if not reached[j] )[1]
             u = parent[v]
             r[c] = euclid( points[u], points[v] )
             c += 1
             reached[v] = True
             for j in xrange(n):
                 if dist[j] > w[v][j]:
                     dist[j] = w[v][j]
                     parent[j] = v

         # Take the standard deviation of weights
         m = sum(r) / (n - 1)
         return math.sqrt( sum( (x-m)**2 for x in r) / (n - 1) )

     res = 10.0**9
     edges = [ euclid(a,b) for a in points for b in points ]
     for (e1,e2) in itertools.combinations( edges, 2):
         m = ( e1 + e2 ) / 2
         res = min(res, deviationPrim(m + 1e-9) )
    
     return res

A cool thing is how it takes the unordered pairs of edge weights to find `m = (e_1 + e_2) / 2`, instead of using indexes i and j, we can just use itertools.combinations yet again :).

Unfortunately, it is too slow. It needs 30 seconds to run for `n = 20` in my computers, which means it should probably take 25 seconds in topcoder's servers. Could it be improved? Maybe, my python skills are still young and this is the first time I try Prim's in python. Either way, the c++ version of the code easily passes under 1 second. So it is at least much harder to pass in python.

To sum up

Some great python opportunities in most of the match, except for div1 medium which is extremely difficult to pass in python if not impossible.

Thursday, February 27, 2014

Python in TopCoder 2

The other day someone linked my topcoder preliminary python vs. c++ comparison in reddit (Could you please not *ever* link my blog to reddit? ) Anyway, it reminds me that now that it has been a while since Python was enabled in topcoder SRMs it is a good time to talk more about it.

Poor adoption

It appears that the number of python topcoders stayed as low or lower than when it was released. I would love to have some actual stats, but anecdotally it appears to me C# still has more users than python (And C#'s user share is very low in comparison to Java). Which is a shame, really.

The following is a screenshot of the top 50 in the last SRM. Yellow scores are c++, Green scores are Java, Orange scores are python and White scores are C#:

So yep, it is not statistics, but this is a scenario that tends to repeat itself. The python solutions stand out because they are rare. I was expecting a bit more.

So why is this?

A contest system designed around Java

Let's go to the point, the contest system is just not python-friendly. This all begins because it has been clearly designed with Java solutions in mind. The obvious indication of this is that it requires you to wrap the solution in a class. In Java, this is natural. In C++, it is slightly annoying, but useful because it allows you easy access to the heap. But in python, it is really just a bunch of verbosity that is not helpful at all.

Using classes in python is a bit obnoxious and it gets in the way unless you really need the OOP syntactic sugar. In python, the more declarative/functional approaches tend to be better. This means using functions. But if you put your functions inside the class, that means having to use self. syntax and include self in all your function. Ultimately it means having the self argument in places where it is not really needed. I just wish there was an alternative to this.

What if we wanted to include actual functional languages? You won't be able to force a class in Haskell. And it would be quite cool to have Haskell or even LISP as an option. Actually, I wish we had more languages in general.

But the real aspect of TopCoder that reveals that it is all about Java is the execution time limit. There is a 2.0 time limit for each test case. When problem setting, writers have to provide a Java solution that runs in less than 1.0 seconds. So time constraints are guaranteed for Java and work great for c++ because it is faster. But any language slower than Java is going to have issues and it shows.

The list of Topcoder problems that are impossible in python keeps growing. Even if a problem is possible, if it is not one of the easy problem categories, it will be usually be much harder to come up with a passing python solution than a passing c++ or Java solution. Problems like EllysCandyGame where you can easily pass in c++ with brute force, but need an exponential-time dynamic programming to pass in python.

Poor information

How many people out there know that python is allowed in TC? It seems to still be news to plenty of people. Maybe we need to tell more people about this?

If only

There are a couple of problems that I'd like to mention out of how frustratingly bad they are for python.

EllysPairing an egregious example. This problem has a 16MB memory limit. This means python cannot even run without failing the test case. There is no way around this issue, python simply cannot do anything about it.

What is ironic is that the solution for the problem would be very nicely implemented in python. The problem uses a list of integers that are generated randomly . The memory limit makes you unable to save the list of integers in a list/array/etc. The algorithm that solves it is linear-time majority vote which requires you to visit the sequence twice. Since you cannot save the numbers in a list, this means that you would need to generate the list twice. This makes it very easy to have messy code that repeats the random generation process in two code places. But in python, this wouldn't be a problem, because it has yield.

class EllysPairing:
 def getMax(self, M, count, first, mult, add):
    # generate all the random numbers, return as generator
    def names():
        for i in xrange( len(count) ):
            x = first[i]
            for j in xrange(count[i]):
                yield x
                x = (x * mult[i] + add[i]) & (M - 1);
    p = -1
    q = 0
    for x in names(): #use them
        if (p != x):
            q -= 1
            if (q <= 0):
                p = x
                q = 1
        else:
            q += 1
    
    n = sum(count)
    c = sum( x == p for x in names() ) #use them again
    
    if (c > n / 2):
        return n / 2 - (c - n/2 - n%2)
    else:
        return n / 2

The important part is the for x in names() , that is all that is needed to iterate through all the random numbers in the list without rewriting the code. The top c++ solution by aceofdiamonds repeats the random number generation code.

This cute solution cannot run. Even without the cruel 16 MB limit, it will be too slow in comparison to the Java solution and would time out, taking around 6 seconds in TC's servers. That's too bad.

I also want to talk about a more recent problem: MiningGoldEasy. This problem has a simple dynamic programming / memoization solution. Just notice that if , in each move you limit yourself to rows and columns that contain at least one "event", it should be optimal, so we have an `O(|event|^3)` solution if we remember the current number of step, and the current position. In python, I can code this:

class MiningGoldEasy:
 def GetMaximumGold(self, N, M, event_i, event_j):
    @MemoizedFunction
    def f(x,y,s):
        if s == len(event_i) :
            return 0
        else:
            bx = max( f(nx,y,s+1) for nx in event_i )
            by = max( f(x,ny,s+1) for ny in event_j )
            return max(bx,by) + N + M - abs(event_i[s] - x) - abs(event_j[s] - y)
     
    return max( max( f(x,y,0) for x in event_i) for y in event_j )

The MemoizedFunction is a decorator. A syntactic sugar that means I do f = MemoizedFunction(f) after declaring the function. MemoizedFunction() is a function that takes a function and instead returns a function that behaves exactly the same but uses memoization. It is clever:

def MemoizedFunction(f):
    mem = dict()
    def g(*args):
        if args not in mem:
            mem[args] = f(*args)
        return mem[args]
    return g

Since 90% of Topcoder problems are dynamic programming ones, this idea to automatically turn a function into memoization is very cool. Just take a look to the GetMaximumGold() code up there, it is almost pseudocode. A c++ equivalent looks like this:

int GetMaximumGold(int N, int M, vector<int> event_i, vector<int> event_j)
{
    int t = event_i.size();
    
    int mem[50][50][51];
    memset(mem, -1, sizeof(mem));        
    function<int(int,int,int)> f = [&](int a, int b, int s) {
        int & res = mem[a][b][s];
        if (res == -1) {
            if (s == t) {
                res = 0;
            } else {
                int x = N + M 
                      - abs( event_i[a] - event_i[s])
                      - abs( event_j[b] - event_j[s]);
                res = 0;
                // change row:
                for (int na = 0; na < event_i.size(); na++) {
                    res = std::max(res, x + f(na,b,s + 1) ); 
                }
                // change column:
                for (int nb = 0; nb < event_j.size(); nb++) {
                    res = std::max(res, x + f(a,nb,s + 1) ); 
                }
            }
        }
        return res;
    };
    int mx = 0;
    for (int a = 0; a < event_i.size(); a++) {
        for (int b = 0; b < event_j.size(); b++) {
            mx = std::max(mx, f(a,b,0) );
        }
    }
    return mx;
}

Unfortunately, the python solution above fails system tests whilst the c++ one passes in 0.1 seconds worst-case. Of course, it is likely that the python solution can be optimized to pass. Probably not using dictionaries or adding crops. But this means going lower level and thus losing the advantages python provides.

Google's Code Jam

Again, this all contrasts with the Google Code Jam, in which python has almost as many users as Java. Since in google jam there is no memory limit (as long as it runs in your computer) and the time limit is very generous (4~8 minutes for around 100 test cases as opposed to 2 seconds for each test case). Then python has a more healthy following in Code Jam. Shall TC make changes to become more language agnostic? It is possible for them to do that without losing what makes them unique? I have no idea.

SRM 611-620 Challenge

With all this in mind, let's do something fun! In the next 10 SRMs, I will use only python during each match. Repeat, I will be forbidden from submitting anything in c++. This will be the first time I use a language other than c++ during a match. Let's see what happens... I will try to make detailed blog posts about how it is effecting (helping or being an obstacle) my performance.

If anyone want to join the challenge, just post a comment here or make a blog post of your own.

Sunday, October 27, 2013

Coding habits

I was wandering in quora and finally found a interesting thread: what are some of your weird coding habits?. I've been meaning to talk about my style choices, specifically in programming contests where the people that think that throwing away readability in exchange of a bit typing time is a fair exchange. I really like my code to be readable (for me) and have forged some coding habits.

#defines

In c++, #defines are a meta-programming tool inherited from c++. It is great to use defines if you want to have code that is created conditionally (using #ifdef). Some defines are useful because the text-processing allows things that a constant function cannot do, like: #define x(a) cout << "#a" << a . In the past, I used a for_each #define because it was much clearer than the alternative (But now c++11 has range-based for loops and a for_each macro is not needed anymore) Any other #define use tends to be something that Satan has introduced in c++ to allow people to make horrible code.

I am unapologetic. All those people using a REP #define to replace for loops. Also you, yes, you who uses #defines for stuff that typedefs and constants can do just fine. You should be ashamed. It is as if you were throwing plastic to the ocean of code. Think of the children, there are newb programmers that read up your code in the high ranks of SRMs and and think "COOL". Yes, really, this awful thing you are doing is IMITABLE BEHAVIOR. How do you sleep at night?

I am not talking just about code purity, #defines are risky and can bring you confusing bugs.

//First of all, this causes a syntax error:  
#define MAX 1000 //Discouraging comments?

// Now think about this, if I use a constant:
const int MAX_N = 50;
// I can then use the same constant to generate other constants
const int MAX_V = MAX_N * MAX_N + 2; // maximum number of vertices in the graph
                                     // note how I can comment that code

//If I instead used defines for constants:
#define MAX_N 50
#define MAX_V MAX_N * MAX_N + 2

// cool, right? NO!

// The following code would be wrong in the #define version:
x = (MAX_V * 2);
cout << MAX_N << " " << MAX_V << " " << x << endl;
// in const version:   "50 2502 5004"
// in #define version: "50 2502 2504" 

// Using a #define in code is  a text replace. It is semantically different to
// a constant, that's why you shouldn't use them in place of constants.

// And yes, you can "fix" this by adding brackets:

#define MAX_V (MAX_N * MAX_N + 2)

// But ask yourself: how many people who use #defines instead of constants actually
// use brackets when declaring them?

Code blocks

When I started programming I really thought saving up on code lines made me look clever. I did aberrations such as this:

for (int i=0; i<n; i++) for (int j=0; j<n; i++) for (int k=0; k<n; i++)
    dist[i][j] = min( dist[i][j],  dist[i][k] + dist[k][j] ); // floyd Warshall

//Or this:
for (int i=0; i<n; i++)
    for (int j=0; j<n; i++)
        for (int k=0; k<n; i++)
            dist[i][j] = min( dist[i][j],  dist[i][k] + dist[k][j] ); // floyd Warshall

        
// or even this:
for (int i=0; i<n; i++)
    for (int j=0; j<n; i++)
        for (int k=0; k<n; i++)
        {
            dist[i][j] = min( dist[i][j],  dist[i][k] + dist[k][j] ); // floyd Warshall
            cout << i << ", " << j << "= " << dist[i][j];
        }
// (Just one curly is needed, right? )

So yeah, c++ allows you to save the hassle of using curly braces when there is only one statement. You can use if(cond) statement; or for(x;y;z;) statement; or any similar variation. You save up the number of lines!

Nowadays I have a strict rule. I try to almost always use curly braces. Just because the language allows you not to use them doesn't mead you shouldn't. I have an exception though, else statement after if:

if (x == 1) {
    return 7;
} else if ( 0 <= x && x <= 10 ) {
    return 8;
} else if ( x > 78 ) {
    return 12;
} else {
    return 0;
}

The reason for this exception is the lack of an elseif keyword.

Why use always curly braces? The more I programmed the more it seemed that I had no way to really predict how I was going to have to modify my code. Today it appears that this if statement requires to execute only one line:

if (x == 1) return 7;

But maybe tomorrow you are going to need to add something to it. Maybe a special case you were missing. Maybe you need to add debug code... In order to update this code you'll need to restructure it , add curly, add the indentation, add the new code. But what if later you decide you need to remove it again? Once again you reset the style... It is too many steps before updating something, and each time you change something in your code you risk causing more bugs. This simple if is... simple, what if it is one line right next and inside many other blocks? I don't know, it just feels better when all your code blocks are braced.

Adding braces is half the battle. Things like :

//1.
if (x == 1)
{
    return 7;
}

if (x == 1) { return 7; }

if (x == 1)
    { return 7; }

if (x == 1) {
    return 7; }

Are also problematic to my taste. I prefer the egyptian brackets style. If I am going to add curly braces to everything, I better not use up too many lines of code. I used to use the style in the first example all the time. Now it just looks quite heavy. The return 7 seems too far away from the x == 1 :).

Oddly , I don't like to use egyptian brackets outside of statement blocks. The braces for a function / method / struct / class / namespace / must be the heavy ones. Why? I am not sure. But it seems intuitive to me. Almost as if it is because the statements are shorter than the functions.

Indent

Really, why do people use tabs for indents? It breaks so many programs that it is not even funny. I prefer 4 spaces. I am not sure why, but less than 4 spaces tends to seem too tight and more requires far more of a panoramic view. I have exceptions. I know that I sometimes do something like this:

for (int x0 = 0; x0 < w; x0++) {
 for (int y0 = 0; y0 < h; y0++) {
    for (int x1 = 0; x1 < w; x1++) {
     for (int y1 = 0; y1 < h; y1++) {
        if ( something ) {
            doWhatever();
        }
     }
    }
 }     
}

Some times the indentation depth becomes too wide, even though the logic of the algorithm is not really nesting four loops. When I look at the code above, I really think I am nesting two loops and not four. One loop for point `(x0,y0)` and another for point `(x1,y1)`. That is why the indentation is so peculiar. I wish languages had better facilities to do "2d loops". In fact, I think they do:

for (int x0 = 0, y0 = 0; y0 < h; x0 = (x0 + 1) % w, y0 += !x0) {
    for (int x1 = 0, y1 = 0; y1 < h; x1 = (x1 + 1) % w, y1 += !x1) {
        if ( something ) {
            doWhatever();
        }
    }
}

But that would be unorthodox, wouldn't it? :)

Memoization functions

This is one I learned the hard way over the years. I always try to have exactly one return statement in a memoized recursive function.

int f(int x, int y)
{
    int & res = dp[x][y];
    if (res == -1) {
        res = 0;
        if (x == 0) {
            res = y;
        } else {
            res = f(x-1, y);
            if (y > 0) {
                res = f(x-1, y-1);
            }
        }
    }
    return res;
}

Why? let us check out the alternative:

int f(int x, int y)
{
    if (x == 0) {
        return y;
    }
    int & res = dp[x][y];
    if (res != -1) {
        return res;
    }
    res = f(x-1, y);
    if (y > 0) {
        res = f(x-1, y-1);
    }
    return res;
}

Seems inicuous, but it is bug-prone to have this habit. In this case y is accessible in `O(1)` time, but maybe another recurrence requires you to return some f(y) function that is `O(y)`. If we did that, in this function, the f(y) call wouldn't get memoized. There are tons of similar ways in which you can forget to memoize something if you are liberal with return statements. My favorite is when coding in a language that doesn't have by-ref variables, or when you use that code style in c++:

int f(int x, int y)
{
    if (x == 0) {
        return y;
    }
    int res = dp[x][y];
    if (res != -1) {
        return res;
    }
    if (x >= y/2) {
        res = f(x, y - x);
        return res; // oops , forgot to set dp[x]y] = res
    }
    res = f(x-1, y);
    if (y > 0) {
        res = f(x-1, y-1);
    }
    dp[x][y] = res;
    return res;
}

Friday, August 30, 2013

Regarding c++ vs. python (in algorithm contests)

Reddit...

Dear people from reddit. Someone linked this blog to you. And it was wrong for it to happen. Because the intended audience of this blog is *not* software developers/ programmers at /r/programming. The intended audience is people who participate in programming contests. Specifically TopCoder. This blog post was made for those people in mind. For example, chances are that you have no idea what an editorial is in this context. The blog is also extremely tongue-in-cheek as I write it using my vexorian persona.

We already know very well about performance. Python runs around 10x slower in comparison to c++ in these contests. Since that's a known fact about the whole thing that's why I didn't even mention it or include "benchmarks".

One thing that you must understand is that the contests are basically designed around C++. This comparison has nothing, absolutely nothing, to do with what language is better than the other for general usage. This is no ammo in some language supremacy flame war. In other words: There is absolutely no need to come to insult me. If you just want to know which language is better for general usage: Just pick the one you are more experienced with and that fits your needs better; If you are undecided, go look for more professional comparisons or something...

--------------------------

So, yesterday I was writing an editorial and things were going ok. I was solving a problem when I decided to include both c++ and python code, like I've been doing plenty of times lately. In this occasion, I wanted the c++ not to rely on libraries or built-in functions, so that someone who just started reading on the syntax understands the code. For the python one, I wanted to make a one-liner, because it was totally possible.

I was aware that this choice would make python 'look good". What I didn't expect was to wake up today to the news that TopCoder's facebook account decided to use those codes as the preview image when talking about my editorial... So now we have this image circulating around that seems to make the point that python allows much shorter code than c++ and it is in part because of me. This was the final trigger that made me write this post, I was planning to write something like this since a while ago.

Python - good?

My relationship with python started on rocky ground. Once upon a time, I was a lame programmer who was obsessed with speed and performance. Python, along with Java and many others was a language that was very slow. So there was no way I would ever use it. Worse, python specifically had syntactically-relevant indentation, which also meant it didn't have brackets or begin/end or something similar. Block depth was implicit. For a good while, this feature made python code make me feel dizzy. It is like the code is floating without something that holds it!. It actually still annoys me, but not as much as before.

Eventually, I did some maturing as a programmer. I learned many languages. Learned about good practices. I even made my own languages as an attempt to fix a very bad language (And discovered many things about what works and what doesn't). So I eventually gave python a chance. It helps to have an editor with code-folding or indentation guides. It removes that feeling of things floating without being held :).

My learning process was slow because I never had any project that I had to write in python. But it did become my main calculator application. I also use it for small scripts and to generate test cases for programming contest problems.

There are many good things about python. There are also some things I am not a fan off. What I like about python is that instead of trying to be a better C++ (*cough* Java and C# *cough*), it tries to be a better language. It is also free and not tied to some evil mega-corporation (*cough* Java and C# *cough*). By the time I heard they were going to add python to the list of languages in Topcoder SRMs, I was excited because I would get to use it in editorials, and it seems like a great tool for them.

Python in editorials

And so, began the journey. I started to use python to solve SRM problems and include the code in editorials. I now have some things to say about using python in TopCoder contests.

Small code

Python really shines in problems that need little code. The stuff other languages need to do the simplest of operations, is usually very easily-implementable if not completely built-in. The code from the problem in the image I first talked about is a good example.

class GooseTattarrattatDiv2:
 def getmin(self, S):
    return len(S) - max( S.count(ch) for ch in S )

It is even self-documented. The result is equal to the length of the string, minus the maximum number of times a character appears in S.

It is actually very fun, and a major source of procrastination, to try to come up with the smallest possible python code to solve a problem.

class InsertZ:
 def canTransform(self, init, goal):
    return "Yes" if (goal.replace('z','') == init) else "No"

If the goal is equal to the init after removing all Zs from it, return Yes.

Making readable code

So the division 2 problem of SRM 588 was a typical "bit masks" problem. You had to iterate through all the subsets of songs. More so, songs had two attributes: length and tone. Typically , the code for this looks very messy in c++ or Java. You end up using bitmasks, which means that you will have beautiful things like (1<<i)& in your code, then for each i, you check if it is in the bitmask, if so, call duration[i], etc, etc... You need a formula between the maximum, minimum tones and the sum of durations and other things

The python code, however, looks like this:

import itertools, collections
 
class GUMIAndSongsDiv2:
 def maxSongs(self, duration, tone, T):
    n = len(tone)
    #making this a named tuple is not needed, but it adds readability:
    Song = collections.namedtuple('Song', ['tone', 'duration'] )
    # Make a list of song objects:
    songs = [ Song(tone[i], duration[i]) for i in range(0,n) ]
    res = 0
    for c in xrange(1, n+1):
        # c is a number from 1 to n, increasing order
        for picked in itertools.combinations(songs, c):
            # "picked" is a set of c songs:
            durationSum = sum( s.duration for s in picked )
            minTone     = min( s.tone     for s in picked )
            maxTone     = max( s.tone     for s in picked )
            if durationSum + maxTone - minTone <= T:
                res = c
    return res

itertools.combinations creates an iterator from the iterator that you pass to it, in this case the songs. The new iterator will contain all the combinations of c songs. So if you try c in increasing order, you can easily get all the subsets. In order to easily divide the songs in subsets, we actually made a list of the song objects. To create and define those objects, we only needed a single line to call collections.namedtuple...

There are many ways in which this code is amazing (to me). And it baffles me how readable it is. It is almost a description of the algorithm for the problem, instead of being code. The alternative in c++ or worse, Java , would be much messier. Check it:

int maxSongs(vector <int> duration, vector <int> tone, int T)
{
    int n = tone.size();
    int best = 0;
    // for each subset of songs represented by a bit mask:
    for (int mask=1; mask<(1<<n); mask++) {
        int maxTone = -1, minTone = 200000, durationSum = 0, c = 0;
        // find the minimum/maximum tone, the sum of durations and the
        // number of songs in the subset:
        for (int i=0; i < n; i++) {
            if (mask & (1<<i)) { //is song i in the subset?
                maxTone = max(maxTone, tone[i]);
                minTone = min(minTone, tone[i]);
                durationSum += duration[i];
                c++;
            }
        }
        // If selected songs in optimal order fit the time constraint, this is valid:
        if ( durationSum + maxTone - minTone <= T) {
            best = std::max(best, c);
        }
    }
    return best;
}

Being able to use functions

Most large algorithms will need you to use many functions and call them. You know what this means? In c++ / Java this means that you will have to copy some variables and the arguments as global variables or class members so that the functions can use them. To me, this is very annoying and has been a source of bugs because if you forget to actually make the copy or assign it correctly, you are screwed. c++11 allows lambdas, and I was able to use them to avoid this issue somehow, like in test SRM #2. But c++ lambdas have a very big issue, you cannot (easily) have recursion with them. You need to work around it somehow. The current best answer to the problem I have is to use std::function explicitly when declaring the lambda and it is very messy.

Python has nested functions instead of lambdas, so making them recursive is not hard at all. It is quite normal. Check the codes in the post about SRM 589 and you will see what I mean.

Some more obvious advantages

Sure, there are some obvious things like how overflow is not possible. That whole stuff about tuples is nice to have. Too many times you are dealing with points or coordinates so you are using tuples, so something like (x,y) = ... makes more sense than x = ... ; y = ... The native set() and dict() types. Etc.

Some bad things

After I started to apply this python knowledge to make code at topcoder, I couldn't avoid thinking. What if this is more convenient during a contest? What if I move to python?

Then I try to use this python thing for a larger problem and start to notice a darker side...

My pet peeve against dynamic languages

This is not something new that I just found, or that is specific to programming contests. I have to mention it. Languages that allow you to declare and use variables in the fly. Languages that are not compiled. They have this annoyance, and it is that your typos won't be discovered until the line with the typo is executed. You write these 100 lines of code and run your program. It is not until a few seconds of execution, that the program reaches a line of code in which you type blok instead of block. Sorry, do it again.

Multidimensional "arrays"?

Python doesn't have that array concept, it has tuples and lists. Lists are one-dimensional, but of course, you can make a list of lists and it can be indexed like a bi-dimensional array - a matrix. You can go out of your way and do the same for a list of lists of lists of lists of lists, and you have something with 5 dimensions!

In the Real World™ Programming world, this is likely not a big issue. In the algorithm/programing contests world, specifically the Topcoder world, it can be quite a pain. We have those dynamic programming problems with 4 dimensions and it is like nothing to write home about. You might even have 6 dimensions.

What in c++ looks like :

int dp[w][h][t][n][2][3];    // Just a typical day in topcoder land
memset(dp, -1, sizeof(dp));
dp[x][y][f][p][1][0] = 6;

In python looks like:

dp = [[[[[ [-1]*3 for i in xrange(2) ] for j in xrange(n)] for k in xrange(t)] for a in xrange(h)] for b in xrange(w)]
dp[x][y][f][p][1][0] = 6

Of course, that's ridiculous. Maybe the real problem is that we are approaching it the wrong way. We could use a dictionary?

dp = dict() 
dp[ (x,y,f,p,1,0) ] = 6

But then, since it is a dictionary it will have overhead in accessing the elements... Maybe a list but we will translate the indexes to a single integer? This is what C arrays do anyway.

dp = [-1] * w*h*t*n*2*3 
dp[ (x*h*t*n*2*3 + y*t*n*2*r + f*n*2*3 +  p*2*3 +  1*3 + 0 ] = 6

Yeah, a bit complicated... too, back to square one.

The petty default recursion limit

Speaking of dynamic programming and memoization in functions. I attempted to use python for FlippingBitsDiv2 and it worked... except that it was actually reaching the recursion depth limit way too quickly. In that problem, the recursion depth is as large as 2500, which doesn't really strike me as too high. Specially not for these contests... But that's the default.

Turns out it is perfectly possible to raise the depth limit with a call to a sys. function. So this problem isn't a big deal. Except that you need to be conscious of it when solving something using recursion.

Memory...

I knew of the obvious weakness. Speed. Python's dynamism comes at that price. Actually, I tried to do some tests and it seems like the time factor is 6 times or even 10 times slower. I haven't actually found a division 1 hard problem that can be solved in python yet.

What I didn't expect was the memory usage. This happened to me while solving GameInDarknessDiv2. It needed a depth-first search on a large graph. If you take a look at the c++ code, there are plenty of implementation bottlenecks. My first idea was to use python code for this solution. Because it can make easier to read code if you work for it. The result is beautiful:

def check(field, moves):
    # useful dictionary from move char to position offset:
    move = {'R':(1,0), 'L':(-1,0), 'U':(0,-1), 'D':(0,1) }
    
    # given a position and a move character, return the new position:
    def doMove( (x,y), ch):
        return (x + move[ch][0], y + move[ch][1])

    # find Alice and Bob's initial positions:
    (w,h) = ( len(field[0]), len(field) )
    for y in range(0,h):
        for x in range(0,w):
            if field[y][x] == 'A':
                (ax,ay) = (x,y)
            elif field[y][x] == 'B':
                (bx,by) = (x,y)

    # Save Alice's positions for each of Bob's turns:
    moves = string.join(moves, '')
    alice = [ doMove( (ax,ay), moves[0] ) ] 
    for ch in moves[1:]:
        alice.append( doMove(alice[-1], ch ) )
    # if in the first move, Alice moves towards Bob, it is game over for Bob:
    if (bx,by) == alice[0]:
        return "Alice wins"

    #And the DFS:
    T = len(alice)
    
    visited = [ [ [False]*(T+1) for i in xrange(0,h) ] for j in xrange(0,w) ]
    winner = ["Alice"]
    
    def dfs( (x,y), t):
        if not visited[x][y][t]:
            visited[x][y][t] = True
            if t == T:
                winner[0] = "Bob"
            else :
                for ch in "URDL":
                    (nx, ny) = doMove( (x,y), ch)
                    if (0 <= nx < w) and (0 <= ny < h) and (field[ny][nx] != '#'):
                        if (alice[t] != (nx,ny) ) and ( (t+1 == T) or (alice[t+1] != (nx,ny)) ):
                            dfs( (nx,ny), t+1 )
                            
    #oops, default recursion limit is too conservative for our DFS:
    sys.setrecursionlimit(T+10)
    #call DFS:
    dfs( (bx,by), 0 )
    #and done:
    return winner[0] + " wins"

The amount of code that is not wasted in boring activities is very relevant, at least to me. The use tuples also makes many things make more sense. It is more self-documenting. However, when I actually run system tests on that solution, I found plenty of issues. The need for a 3D "array" was the first. Then the recursion limit. But the one thing that I couldn't fix was the fact that this code was reaching the 64 MB limit.

The memory usage of this approach is `O(w*w*h*h)`, for `w,h <= 50`, such a memory complexity didn't usually seem like a problem to me when using c++ or Java. So I didn't expect this coming at all. 2500*2500 indexes in a boolean array take around 5.96 MB. In python, however, it turns out that lists holding 2500*2500 elements are too heavy (They are still too heavy even if I use the single-list approach that converts 3D indexes to a single dimension). What is going on?

It is not so surprising. Python is dynamic, a single index in a list can describe plenty of things, not just a number or a boolean. The reserved memory is probably at least 4 bytes for each index, that means 23.84 MB instead of just 5. It is more than that. I discovered a secret python function called __sizeof__, it tells me that a list of 2500*2500 False values is worth 47.68 MB (somehow). That is a good chunk of our 64MB. So apparently, not only is the available time around 10 times smaller, it seems like memory is 3 times smaller in practice too.

In codejam

In google codejam, python is the third most popular language and it is very close to Java, in fact, in round 1A it beat Java in popularity for the top 20% contestants. In TopCoder, python isn't working so well (But at least it has beaten Visual Basic and seems to be on its way to beat C# :) ). Python is quite a recent addition, we have had it for 4 official SRMs so far..., there may still not be many coders who know about its availability and the python coders that didn't give TopCoder a try because of the lack of python probably haven't heard of the change yet. It is no surprise usage is low right now. But will it ever reach a popularity level similar to what it has in google code jam?

Codeforces has had python since inception and python's numbers don't seem to match Codejam's (Although a tool like go-hero.net for codeforces language statistics would definitely be great to have, I couldn't find such thing). Codeforces and TC have an audience that is more focused on ACM-ICPC than the Codejam, though, so that skews things up. I have an alternative theory though: Hard time and memory limits like those in TopCoder and Codeforces just don't work well if you want a language agnostic contests. Making the time limits different is not a good solution either, as it would be difficult to pick good limits. What codejam does well is that, with 4 minutes/8 minutes time limits and using your computer's memory instead of google's, the constant factor is quite irrelevant. That allows plenty of more languages to work, including python.

Move to python?

I think python is a great language. But I don't think I will move to using python during contests. At least not for the time being. I am still not as good with python as I am with c++.

Also, although python has a powerful expression power. c++ has also improved a big deal with c++11. The big advantage I would have assigned to python over c++ would be the support for functions as first-class objects and closures. But now c++ has it too. So...

Remember the initial dilemma? In reality the true c++ version, using more of the available features would look like this:

int getmin(string S) 
{
    int n = S.length(), m = 0;
    for (char ch : S) {
        m = max(m, (int)count(S.begin(), S.end(), ch) );
    }
    return n - m;
}

It is still ugly, but not so bad:)

A bonus: Swapping integers

Last week, there was a image circulating around. How to swap two integers without using an additional variable. The "C version": a=a+b; b=a-b;a=a-b;. Then one of those obnoxious meme faces throws a misogynist slur and says that python has: a,b = b,a. Ahahahaha, so python is better, supposedly.

Let me introduce to you, the c++ way: swap(a,b). It has a nice advantage: It is self-commenting. It also follows [once and only once]. But the reason I like is because of what it says about c++. (a,b) = (b,a) is cool, but it is an builtin language feature that was introduced precisely to fix this problem and similar ones. The std::swap is a library function.

template <class T> void swap ( T& a, T& b )
{
  T c(a); a=b; b=c;
}

It is a result of things like templates, and also by reference arguments. Powerful features that also solve plenty of other problems and not just swap. std::swap works with any data types too - Unlike the C approach. :)

Saturday, August 10, 2013

Test SRM#2 : In which I abuse lambdas

So, today we had a new test SRM in TopCoder. This time testing the g++ 4.8.1 compiler and a python upgrade too. g++ 4.8.1 has some amazing features over 4.4. So I really wanted to use them in this test. (Disclaimer: I am actually just learning c++11)

Div1 hard

It was a dynamic programming problem. I solved this one before, and though I avoided looking at the past solution, the idea was still evident. This problem is such a standard dynamic programming problem that I don't think we would be able to use it in division 1 as a medium anymore. It was a division 1 hard back in its day...

Was sad because I couldn't find an excuse to put any of the new features.

Div1 Medium: FlightScheduler

The maximum distance a plane can travel given an initial mass of fuel `f` is: `R = K ln( (m + f - t) / m )` where:

`K` is a constant for the plane (efficiency)
`m` is the plane's mass.
`t` is the mass of fuel spent during take off, so the initial amount of fuel has to be at least this.

(Special effort was put to make the problem statement as unclear as possible. I initially didn't know that the take off amount wasn't included in the first formula that appears in the problem statement. This really delayed me).

Anyway, so knowing that, we need to minimize the amount of fuel needed to travel `D` distance units. We are allowed to land at any intermediary point, recharge fuel and take off.

Equal distances

The first thing to know is that if we make `(n-1)` stops, so that the plane takes off `n` times in total, it is always better to cover the same distance in each of the sub-trips. So we should move `D / n` units in each of them.

The independent variable that determines the final fuel cost is then just `n`. We need to find the best value of `n` for this.

Ternary search

So, let us say we make one flight. In this case, the amount of fuel needed for a single round might be too large (Note that the formula for R is logarithmic in the mass of fuel, so increasing the amount of fuel by a large amount will not increase the distance that much).

If we instead do a lot of flights, it is also not very good, because each flight needs at least `t` fuel.

There is a value of `n` that will yield a minimum value for `f(n)`, the total fuel cost. For `(x < n)` and `(y > n)`, we can guess that: `(f(x) > n)` and (f(y) > n). Also, the more we move away from `n`, the worse the result. This is all ternary search-friendly. Let us do it.

f(n)

So let us just find `f(n)`, the minimum fuel cost needed to travel `D` distance units using `n` trips. Let `(d = D / n)` , this is the distance per trip. What is the minimum fuel cost to travel `d` distance units? We need the following to be true::

`K ln( (m + f - t) / m ) >= d`

Let us solve the inequality.

`ln( (m + f - t) / m ) >= d / K`
`(m + f - t) / m >= e ^(d / K)`
`m + f - t >= me ^(d / K)`
`f >= me ^(d / K) - m + t`
`f >= m(e ^(d / K) - 1) + t`

So that is the lower bound for f.

Implementing ternary search

So, I really wanted to use the new c++ features. You know what is funny about ternary search? You need to implement the function f. You know what is lame about solving TC problems using multiple functions? TURNING ARGUMENTS INTO CLASS MEMBERS. It is so tedious to copy variables manually just so a new function can use them in their scope. OOP is truly the worst idea ever. Anyway. It occurred to me... I can use a closure! This way when I create the function f() I will do it inside the scope of the main method, so no need to copy arguments manually.

The result is beautiful:

double minimizeFuel(int distance, int K, int emptyMass, int takeoffFuel)
{
    // so we use closure syntax. The [=] means
    // that any variable like "emptyMass"
    // that is referenced inside the anonymous function (lambda) will be
    // automatically copied, so when we call f() it will use copies just fine.
    
    auto f = [=](long n)->double {
        double d = distance * (1.0 / n); 
        return n * ( takeoffFuel + emptyMass * (exp(d/K) - 1.0) );
    };
    
    long lo = 1, hi = 40000100000L;
    while (lo + 2 < hi) {
        long ha1 = (lo*2 + hi) / 3, ha2 = (lo + hi*2) / 3;
        if ( f(ha1) < f(ha2) ) {
            hi = ha2;
        } else {
            lo = ha1;
        }
    }
    // the minimum is one of f(lo), f(lo+1) or hi = f(lo+2):
    vector<double> res = { f(lo), f(lo+1), f(lo+2) };
    return *min_element(res.begin(), res.end());
}

It turns out that I had to go out in the middle of the match , so I didn't have much time to finish this problem correctly. When I came back, there were few minutes left of coding phase. I tried to debug, I found the bug (The issue with problem statement I mentioned above). But it was still wrong. Eventually, when there were only 5 minutes left, I gave up trying to find the bug and moved on to 250. It turns out that my bug was that I did: d = distance / n. Both distance and n were integers, so it was doing integer division....

Div1 Easy

An implementation problem, I didn't have much time to finish it.

And more

It turns out that lambdas and closures are awesome. There is plenty of more in regards to this new functional c++.

After the match, I figured, that I could actually make myself a TernarySearch library (template) function that does the whole TernarySearch for me and can be used in any situation.

// D: function domain 
// R: function range  , so f: D -> R
// Yep, std::function is useful:
template<class D, class R> R TernarySearch(D lo, D hi, std::function<R(D)> f) {
    while (lo + 2 < hi) {
        D ha1 = (lo*2 + hi) / 3, ha2 = (lo + hi*2) / 3;
        if ( f(ha1) < f(ha2) ) {
            hi = ha2;
        } else {
            lo = ha1;
        }
    }
    vector<R> res = { f(lo), f(lo+1), f(lo+2) };
    return *min_element(res.begin(), res.end());
}

So, if I wanted to use this function to solve today's problem:

double minimizeFuel(int distance, int K, int emptyMass, int takeoffFuel)
{
    return TernarySearch<long, double>(1, 40000100000L, [=](long n){
            double d = distance * (1.0 / n); 
            return n * ( takeoffFuel + emptyMass * (exp(d/K) - 1.0) );
        });
}

Or maybe you prefer the following version:

double minimizeFuel(int D, int K, int m, int t)
{
    std::function<double(long)> f = [=](long n){
        return n*( t + m * (exp(D / (1.0*n*K)) - 1) );
    };
    return TernarySearch(1L, 40000100000L, f);
}

Comments?

What do you think of g++ 4.8.1 so far? Worthwhile upgrade?

Thursday, August 08, 2013

Test SRM #2 : Revenge of the C++11

So, there is another test SRM scheduled for Saturday. This is great news. Whilst TC will be testing new compiler versions for c++ and python, I will be testing a new code environment.

It looks like TopCoder is getting gcc-4.8.1. This is quite serious. Because the c++11 support will be enabled. Take a look to the list of c++11 features available in gcc-4.8.1: http://gcc.gnu.org/gcc-4.8/cxx0x_status.html. Compare it with the puny gcc-4.4 list: http://gcc.gnu.org/gcc-4.4/cxx0x_status.html. Of course, some things like threads will not be supported in SRMs, but otherwise the list of syntax candy will be increased greatly. gcc-4.8.1 is so new, that ubuntu 12.04 repositories don't support it. The list of new features is humongous. If you thought auto and tuples were a big deal. Brace yourselves...

range-based for loops

string canTransform(string init, string goal)
{
    string x = "";
    //Beautiful:
    for (auto ch : goal) {            
        if ( ch != 'z') {
            x += ch;
        }
    }
    return (x == init) ? "Yes" : "No";
}

You could use char ch too if that's your taste.

Lambda expressions

Remember sorting in c++? When the ordering is not a trivial one, you often had to declare a whole struct with strange syntax inside and then use the struct in the call to sort. This is an alternative to all that:

// How to sort elements in a vector by their absolute value:
vector<int> v = {5, 0,-1,-5, -3,3,2, -5};
sort(v.begin(), v.end(), [](int a, int b) {
    return abs(a) < abs(b);
});
// shows: 0 -1 2 -3 3 5 -5 -5
for (auto &y : v ) {
    cout << y << ", ";
}
cout <<endl;

Lambdas are anonymous functions that you can create inside code blocks. In the example we use the [](int a, int b) { return boolean } syntax to create a function pointer that is then used and called by sort. The world of lambdas is a huge one. And their usefulness is understated the first time you hear about them. Here is a page that explains them throughly.

Closures

One of the good things about lambda syntax is that it enables closures. Aaaand things just got crazy...

For example, what if you want to sort some numbers `x` by the values `y[x]` in some array?

vector<int> y = {10, 7, 8, 3, 5, 1 ,1};    

vector<int> x = {0, 1, 2, 3, 4, 5 ,6};
// sort x[i]s by y[x[i]]:
sort(x.begin(), x.end(), [=](int a, int b) {
    return (y[a] < y[b]);
});
// note the [=] syntax, we are making copies of vector<int> y.

// display 5, 6, 3, 4, 1, 2, 0,
for (int z : x ) {
    cout << z << ", ";
}
cout <<endl;
// we can use int z : x
// If we did int &z: x, they would be references
//   (modifiable and maybe faster)

So...

It's hard to overstate my satisfaction.

If only the python version was python 3, everything would be 110% perfect

Tuesday, July 02, 2013

More about c++11 - tuples, tie and make_tuple

Remember that last post about new C++ features enabled in TopCoder's servers? These new features (and far more) are also available in Codeforces if you choose GNU c++0x 4 as language (instead of plain GNU c++) and in those contests in which you can compile locally by just using a recent g++ version with the -std=c++0x modifier.

But there is more about the improvements in C++. Have I ever told you about how much I like std::pair? Let me introduce pair's younger brother, std::tuple:

Why and how were tuples added?

One flaw about std::pair was how it stores only two values. Which made it of limited use. There were plenty of times during contests where I had to use nested pairs. Ugly code like queue<pair<int, pair<int,int> > >. The optimal would be to have something that behaves like std::pair, but allows more than two values. A solution like a std::triple would have been a poor solution though, because it would still be limited. But of course, templates with variable argument number would be crazy, right man?

Enter variadic templates, a c++11 killer feature, in my opinion. Templates with variable number of arguments. They allow these tuples and plenty of other things through recursion. It is amazing the sort of things c++ will be able to do at compile time with these. Imagine a strongly typed printf-like function, for example. The good news is that these templates work in TopCoder's ultra old g++ 4.4, and CodeForces' compiler is even more recent. So they are fair game. While C++, the language, adds variadic templates, the STL, the standard template library, is the one responsible of adding tuples.

Let us get to use them

Just add the include: #include <tuple>

Now we basically use std::tuple the same way we used std::pair in the past, except that we can now use more than two arguments. Let us start with an easy one, imagine that you are running a BFS (Breadth-first-search) on a graph that is described by three dimensions, so imagine that it is a 3D grid with x, y and z. (So typical).

queue<tuple<int,int,int> > Q;
bool visited[X][Y][Z];
memset(visited, 0, sizeof(visited));

visited[0][0][0] = true;
Q.push( make_tuple(0,0,0) );
while (! Q.empty()) {
    tuple<int,int,int> node = Q.front(); Q.pop();
    int x = get<0>(node), y = get<1>(node), z = get<2>(node);
    //... etc
    // augment edges from node (x,y,z)?
}

The first bad news is that things like get<index> are needed to access the tuples. They are a bit verbose for my taste. Also note that although they use integers as a matter of indexing, they have to be constants. (Although if you really need variable indexes, you probably need a vector and not a tuple). However, there was possibly no choice in this case, because in order to implement variadic templates, you need some recursion, which means that indexing is likely needed...

But let us go on. Although the get>0> seems heavy, compare with the alternatives. Coding a whole struct? Using a nested pair like I did in one of the first paragraphs? So verbosity could be worse.

The reality is that tuples (or pairs) are not meant to be a replacement of structs or classes, but just something that will help you in some peculiar cases. Like in the post about std::pair, let us deal with a comparison problem: John wants to pick between two strings. He prefers strings that have a larger number of even letters (b,d,f,...). If two strings have the same number of even letters, he prefers the one with a smaller length, if two strings have the same number of even letters and the same length, he prefers the lexicographically smallest one. As simple as:

int even(const string & x)
{
    // count the number of even characters
    int c = 0;
    for (int i=0; i<x.size(); i++) {
        c += ( x[i] % 2 == 0);
    }
    return c;
}

// Given a and b, picks the string that John likes the most:
string johnLikesString(string a, string b)
{
    auto q = std::min( make_tuple(-even(a), a.length(), a), 
                       make_tuple(-even(b), b.length(), b) );
    
    return get<2>(q);
}

Like with std::pair, tuples can be compared using the common relational operators (They implement lexicographical comparison for the values in the order in which they are in the tuple, so element 0 is compared first, if both are equal, element 1 is compared and so and so). Which means that we can use std::min() to pick the smaller of two tuples. Since we wanted the string with the larger number of even characters to be picked, we call -even(...) instead of just even(). At the end, the tuple that is picked by std::min will contain John's preferred string as element #2.

Thanks to relational operators like `<` , `>` , `<=` , `>=`, being implemented automatically for our tuples, we can sort tuples using std::sort. We can also use tuples in std::set, or as keys in std::map.

Also note the cameo by the auto keyword. A great feature. In this case, we don't even need to know the type of the q variable in order to use it. After some inspection, we can see that its type is: tuple<int,int,string>.

std::tie

std::tie is an interesting thing, it does something similar to std::make_tuple, but it uses "lvalue references" instead of creating copies of the values. For simple types, like ints it is not important. Big deal, you are creating a copy of number 5 in memory. But what if you want to compare data structures that could be sort of big? Like in the previous example, what if the strings could be 10000000 characters long? Creating new copies of the strings could be too expensive, so expensive that you reach a memory limit in a problem. But it is not only memory, creating copies of objects takes time.

string johnLikesString(string a, string b) //using by value-arguments sorts of makes
                                           // the following optimization a bit worthless,
                                           // I know.
{
    // tie uses references, but we can't use a reference to a temporary
    // object. So we couldn't use a.length() directly inside of std::tie
  
    int aeven = -even(a), alen = a.length();
    int beven = -even(b), blen = b.length();
    
    // Comparing without creating a copy of the strings:
    if (std::tie(aeven,alen, a) < std::tie(beven, blen, b) ) {
        return a;
    } else {
        return b;
    }
}

Pitfalls and a bonus

Nothing is perfect, specially not c++, not even c++11. Actually, c++11 follows the long tradition that coding in c++ is a lot like operating riding a rocket. A rocket is a fast method of transportation and it is the product of plenty of work and research. But god, if you ride it, it will explode in your face eventually.

The usual std::pair issues stand, although they seem to behave a bit better. But for example, the following line of code will make your compiler throw hundreds of syntax errors at you:

tuple<int,int,string> a = make_tuple(4,5,"ssss");

Well, of course, the issue is that "sss" is a const char*, did you expect the tuple to know that it can cast const char* to std::string? hahaha. And since the error occurs in the third step of the tuple recursion, your compiler will be very confused. There are ways to fix this.

// make the typeo f the argument explicit:
tuple<int,int,string> a = make_tuple(4,5, string("sss") );

// use make_tuple with explicit types:
tuple<int,int,string> a = make_tuple<int,int,string>(4,5, "sss" );

// use make_tuple with explicit types and abuse auto:
auto a = make_tuple<int,int,string>(4,5, "sss" );

// Also abuse the new { } initializer:
auto a = tuple<int,int,string>{4,5, "sss" };        

// Now that I think about it, the whole thing was very silly:
tuple<int,int,string> a(4,5,"sss");

But don't be sad. Here is an eastern egg for you. Swapping the values of two variables, python style:

int x = 5, y = 7;
//swap x and y:
tie(x,y) = make_tuple(y,x);

// All possible because tie uses references and make_tuple creates copies

int z = 6;
//cycle x,y,z = (y,z,x)
tie(x,y,z) = make_tuple(y,z,x);


//Greatest common divisor, the way Euclid intended:
int gcd(int a, int b)
{
    while (b != 0) {
        tie(a,b) = make_tuple(b , a%b);
    }
    return a;
}

Subscribe to: Posts ( Atom )