Saturday, March 08, 2014

SRM 611 Python review

SRM 611 was the first of 10 topcoder SRMs in which I am going to exclusively use python during the contest. During the match, I had a bit of a bad day. Now I finished the first part of the editorial so I had to solve all the problems (except the hardest one, for now). I have some things to say.

Div2 250: The one with digits

link to problem statement

The problem is just an implementation thing: Return if a number (provided as a string) is interesting. A number is interesting if for each digit d from 0 to 9, either it doesn't appear in the number or it appears in exactly two positions and there are exactly d other digits between those positions.

My first reaction was:

class InterestingNumber:
 def isInteresting(self, x):
    for d in range(10):
        ch = chr(d + ord('0') )
        if ch in x:
            indexes = [ i for i in range( len(x) ) if x[i] == ch ]
            if len(indexes) != 2 or indexes[1] - indexes[0] - 1 != d:
                return "Not interesting"
    return "Interesting"

It is okay, although not particularly interly or showcasing of what's different in python. What is interesting is extracting the indexes that contain ch in one line.

It can be simplified a bit, although it is unclear whether this makes the code clearer or not:

class InterestingNumber:
 def isInteresting(self, x):
    for d in range(10):
        ind = [ i for i in range( len(x) ) if x[i] == chr(d + ord('0') ) ]
        if (len(ind) > 0) and ( len(ind) != 2 or ind[1] - ind[0] - 1 != d ):
            return "Not interesting"
    return "Interesting"

Too imperative, maybe?

class InterestingNumber:
 def isInteresting(self, x):
    def validIndexes(d, ind):
        return (len(ind) == 0) or (len(ind) == 2 and ind[1] - ind[0] - 1 == d)
    
    def extractEqual(d):
        return [i for i in range(len(x)) if int(x[i]) == d]
    
    def validDigit(d):
        return validIndexes(d, extractEqual(d) )
     
    return ["Not i","I"][ all(validDigit(d) for d in range(10) ) ] + "nteresting"

The all() function is a useful builtin, returns True if all the stuff you provided it are True too.

I like this last one because it is almost pseudocode (to me). But I don't know, really. At least in general the code in this problem is quick to write in python.

Div2 500: The one with LCM

problem statement

You are given a set `S` of up to 50 elements and a number `x`. All elements and `x` are between 1 and `10^9`. Return "Possible" if there exists a subset `S'` such that the LCM (least common multiple) of all the numbers in `S'` is equal to `x`.

The solution is basically to just take the LCM of all elements in `S` that are divisors of `x`. Answer is Possible if and only if that LCM is equal to `x`.

The first part of all of this is to find the LCM of two numbers. We can use the formula `lcm(a,b) = (ab)/gcd(a,b)`. Then we need to find the GCD... We use Euclid's algorithm. This takes me to how to implement Euclid's algorithm in python...

def gcd(a,b):
    while b != 0:
        (a,b) = (b, a%b)
    return a
    
def lcm(a,b):
    return a*b / gcd(a,b)

This is a good approximation. There is a slight problem and it is, what if `0 = a < b`? The correct result in that case is `b`. This is not an issue in this problem specifically because no element is 0. But we can solve it by making sure `b` is always the smallest of the two.

In c++ you'd do :

if (a < b) {
    swap(b,a);
}

But in python, there are some other ideas. For example: `(b,a) = (min(a,b), max(a,b))` . But my favorite is:

def gcd(a,b):
    (b,a) = sorted( (a,b) )
    while b != 0:
        (a,b) = (b, a%b)
    return a

The sorted function will sort the tuple `(a,b)`, so its result will be `(min(a,b), max(a,b))`, then we do the assignment in such a way that `b` will hold the minimum and `a` the maximum.

Now we need to extract the elements of `S` that are divisors of `x` (easy), and then take the LCM of them all... Our lcm function takes a pair of numbers, but this is what reduce was made for! reduce is another useful builtin with a functional origin story. Basically `"reduce"(lcm, (a,b,c,...)) = "reduce"( lcm, (lcm(a,b),c,...) )`... So calling reduce with lcm as argument will apply the lcm of all the numbers. A bit of a corner case is when the set of divisors of `x` in `S` is empty... Then reduce would get an empty set. For those cases, reduce would throw an exception unless you give it a default value. A good default value is 1.

def div2solution(S, x):
    return x == reduce(lcm, [z for z in S if (x % z == 0) ], 1)

Reduce is one of my favorite things after starting to use python. It makes code so much simpler once you "get" it. The imperative alternative would be to set a variable to 1, do a loop and repeat lcm( variable, x[i] ) and things like that.

Div2 hard: The one with elephants

This was a long implementation problem with a long problem statement that it is difficult for me to abridge and an even longer explanation for a solution that is quite obvious. So here is the problem statment and here's the editorial (Too bad topcoder is still hiding editorials to guest users, I guess those are *great* at SEO...).

The crux of the matter is, you need a brute force to pick all combinations of starting positions in the top and bottom rows. There are at most 5 columns, so there are at most `2^10` such combinations. So really, we need to iterate through all `2^(2n)` subsets of the numbers from `0` to `2n-1`. In the contest world, we usually use bitmasks, we are so used to them that we forget how horrific of an incomprehensible low level hack they are. The other option is to use backtracking, but that's a lot of code... In python, though, we have something better: itertools.combinations. It allows you to easily iterate through all the subsets of size `r` of some list.

The second issue with this problem is that you need to simulate a search that can start in the top,right,left or bottom side of the map, progress towards the opposite direction until it finds a "pond" or a square that is already blocked, and block all the cells in its path. This sort of simulation is best if we engineer it to use a single function. Nesting the function makes things just easier here:

class ElephantDrinkingEasy:
 def maxElephants(self, theMap):
    n = len(theMap)
    mx = 0
    for r in xrange(2*n + 1):                                #number of elements of subset
        for prows in itertools.combinations(xrange(2*n), r): #subsets of r elements
            
            # copy the contents of the map so we can overwrite them
            sim = [ [ theMap[i][j] for j in xrange(n) ] for i in xrange(n) ]
            res = 0

            #the simulation in a cnvenient nested function
            def simulate(x,y, dx,dy):
                while (0 <= x < n) and (0 <= y < n):
                    if sim[x][y] == 'Y':
                        sim[x][y] = '#'
                        return 1
                    elif sim[x][y] == '#':
                        break
                    else:
                        sim[x][y] = '#'
                        (x,y) = (x + dx, y + dy)
                return 0

            for i in prows:
                x = i % n            
                (y,dy) = (0, 1) if i < n else (n-1, -1)
                res += simulate( x,y, 0,dy)
            for j in xrange(2*n):
                y = j % n
                (x,dx) = (0, 1) if j < n else (n-1, -1)
                res += simulate( x,y, dx,0 )
            mx = max(mx, res)
    return mx

Div1 easy : The one I failed

Problem statement

The LCM set of a set S is the set of all numbers that can be generated by taking LCM of a non-empty subset of S. Given sets `A` and `B`, find if the LCM sets of the two sets are equal.

I failed this problem during the match. It turns out that my logic was mostly right, the mistake, however was in checking if it is possible to build a number out of some of the other elements in the set. During the match I did an overcomplicated mess based on prime factors, that was wrong. Now we know that this is actually the solution to the division 2 problem above. So we need to do the same thing "compress" both sets getting rid of the redundant numbers and then compare the compressed sets.

def compress(X):
    return { y for y in X if not div2solution( [z for z in X if z != y] , y) }

def div1solution(A, B):
    return compress(A) == compress(B)

Another shiny python implementation. [z for z in X if z != y] returns the elements in `X` that are not equal to `y`. Note that I return a set (surrounded by `{}` instead of a list, this saves me the issue of sorting after compressing (A and B are provided as tutples, so their orders may be different even though they represent the same sets).

Div1 medium: The one with Spanning Tree

You are given 50 points. Find the Spanning Tree to connect those points through straight lines (ignoring that there might be intersections), but instead of a usual MST problem, we want the Spanning Tree with the minimum standard deviation of edge weights...

Long story short, we need to repeat the following for each dummy average `m = (w_i - w_j)/2 + \epsilon` where `w_i` and `w_j` are any two edges of the original graph. Find the minimum spanning tree using weights: `(m - w_i)^2`, where `w_i` is the original edge weight. Take the standard deviation of the `w_i` weights in this tree. One of the standard deviation of the `O(n^4)` trees you are going to try is going to be the answer. *Somehow* (This is a python post, not an editorial, read the long story short for what passes as a proof).

Anyway... This problem seems impossible in python through this `O(n^6)` approach. `O(n^6)` under two seconds in TC servers for `n<=20` is a piece of cake for c++ and Java, but apparently not for python. This is my fastest python solution:

class Egalitarianism2:
 def minStdev(self, x, y):
     points = [ (float(p), float(q)) for (p,q) in zip(x,y) ]
     
     def euclid(p, q):
         return math.sqrt( (p[0] - q[0])**2 + (p[1] - q[1])**2 ) 
     
     #Prim's MST algorithm:
     def deviationPrim(m):
         n = len(points)
         par = range(n)
         w = [ [ (m - euclid(points[i],points[j]))**2 for i in xrange(n) ] for j in xrange(n) ]
         c = 0
         reached = [ i == 0 for i in xrange(n) ]
         parent = [0] * n
         dist = [ w[0][i] for i in xrange(n) ]
         r = [0.0] * (n - 1)

         for i in xrange(n - 1):
             v = min( (dist[j],j) for j in xrange(n) if not reached[j] )[1]
             u = parent[v]
             r[c] = euclid( points[u], points[v] )
             c += 1
             reached[v] = True
             for j in xrange(n):
                 if dist[j] > w[v][j]:
                     dist[j] = w[v][j]
                     parent[j] = v

         # Take the standard deviation of weights
         m = sum(r) / (n - 1)
         return math.sqrt( sum( (x-m)**2 for x in r) / (n - 1) )

     res = 10.0**9
     edges = [ euclid(a,b) for a in points for b in points ]
     for (e1,e2) in itertools.combinations( edges, 2):
         m = ( e1 + e2 ) / 2
         res = min(res, deviationPrim(m + 1e-9) )
    
     return res

A cool thing is how it takes the unordered pairs of edge weights to find `m = (e_1 + e_2) / 2`, instead of using indexes i and j, we can just use itertools.combinations yet again :).

Unfortunately, it is too slow. It needs 30 seconds to run for `n = 20` in my computers, which means it should probably take 25 seconds in topcoder's servers. Could it be improved? Maybe, my python skills are still young and this is the first time I try Prim's in python. Either way, the c++ version of the code easily passes under 1 second. So it is at least much harder to pass in python.

To sum up

Some great python opportunities in most of the match, except for div1 medium which is extremely difficult to pass in python if not impossible.

1 comment :

Ivan Kazmenko said...

["Not i","I"][ all(validDigit(d) for d in range(10) ) ] + "nteresting"

That's not very readable, I'd propose:

"Interesting" if all(validDigit(d) for d in range(10)) else "Not interesting"