## Wednesday, April 16, 2014

### SRM 616 recap and python review

It was a fine match. Actually the first SRM problem set by tourist. Well, it is also the sixth match of 10 matches in my python challenge. Having finished 60% of it, I have a new perspective. At this moment, I think that when the challenge ends, in SRM 621, I might actually still use python. At least in problems where I am sure it can shine.

## Div1 250 - The one with alarm clocks

Match starts, open the easy problem in division 1. Meet a simulation problem. problem statement. In this problem, you start with a value S at time 0. Every second, S is increased by D, then the alarms scheduled to run at that second ring. Each alarm that rings will decrease S by some special value specific to the alarm. We have the alarms periods (Between 1 and 10), first ring time (between 1 and its period) and volume (the value by which S is decreased). Each alarm rings at times: start, start + period, start + 2*period, etc. There are up to 50 alarms. When S becomes 0 or less , you wake up immediately. Find the maximum starting value of S such that you will eventually wake up. If no matter what value of S you pick, you wake up eventually, return -1.

We can combine the cycles together by taking the LCM of all their periods. Because the periods are not more than 10, the combined cycle length is not more than 2520 = 2*2*2*3*3*5*7. You can answer the question by just simulating the first LCM seconds. For more info, read the editorial.

This is a problem that does okay in python. Python can really help keep the code small in comparison to the other languages. I took some advantage of it during the match and got a good score. But in reality, my original code could be improved in many ways. For example, did you know there is a builtin GCD function?.

from fractions import gcd
def lcm(x,y):
return x * (y / gcd(x,y))

class WakingUp:
def maxSleepiness(self, period, start, volume, D):
sleep = 0
n = 0
for i in xrange(1, reduce(lcm, period) +1 ):           # LCM is at most 2520
sleep += D
for (p,s,v) in zip(period, start, volume):
if (i - s) % p == 0:
sleep -= v
n = min(n, sleep)
return -1 if (sleep < 0) else -n


Many details, for example using reduce to get the lcm of all the periods. The zip function to iterate through the alarms without having to do much indexing...

A subtle one: To check if an alarm should ring at time i, we use (i - s) % p == 0. In c++ you would need : i % p == s % p. The reason is that python's % operator is not broken for negative numbers. Very handy.

For a comparison, try the fastest c++ submission. The algorithm is exactly the same, but you can see that it has to "weigh" more, it has many sources of complications. Having to use a loop to calculate the LCM of all periods, or using the for j = 0; j < n; j++ to get the special alarms.

## Div1 500

After finishing div1 250, I moved to the next problem. It just seemed ... impossible. It turns out that it is just a mathy problem. I am not very good with them.

I skipped this problem during the match. Later I , of course, had to learn it to solve the editorial. Once I understood I implemented a python solution. It actually took a while to code, and I am not sure if it is really using all of python's potential. You can read the code at the editorial.

## Div1 1000

I explained this problem in detail before: here

Just like there are topcoder problems that are not my style at all. There are some which are really my style. This was the rare div1 hard I actually solved the instant after reading it. Because I just like grid-bitmasks dp problems a lot.

There is a world of difference between having a correct idea for a problem and actually implementing it. I noticed the O(n^5) idea , while correct and possibly able to pass in c++, would fail utterly to run in time in python.

So let's dissect why the following python solution (closest I got to good performance) have much more issues passing than the c++ equivalent you can see in the editorial I linked.

sys.setrecursionlimit(10**6)

class ThreeLLogo:
def countWays(self, grid):
n = len(grid)
m = len(grid[0])

def allValidMasks():
yield 0
for i in xrange(m-1):
yield (1 << i)
for j in xrange(i+1, m-1):
yield (1 << i) | (1 << j)
for k in xrange(j+1, m-1):
yield (1 << i) | (1 << j) | (1 << k)

MAX_MASKS = 4090
validMasks = [0] * MAX_MASKS
getMaskId = dict()
t = 0
for mask in allValidMasks():
validMasks[t] = mask
getMaskId[mask] = t
t += 1

@MemoizedFunction( [n+1,m+1,4,MAX_MASKS,2] )
def f(x,y, need, maskid, horz):
mask = validMasks[maskid]
if x == n:
# base case
return 1 if ( (mask == 0) and (need == 0) ) else 0
elif y == m:
return 0 if (horz == 1) else f(x + 1,0, need, maskid, 0)
elif (1<<y) & mask:
if (horz == 1) or (grid[x][y] == '#'):
#must continue the left "L" shape, but also must continue the above one
return 0
else:
# 1) Finish the L shape above.
nmask = mask ^ (1 << y)
res  = f(x,y+1, need, getMaskId[nmask], 1)
# 2) Continue the L shape vertically
res += f(x,y+1, need, maskid, 0)
return res
elif horz == 1:
return 0 if (grid[x][y] == '#') else sum(f(x,y+1, need, maskid, s) for s in (0,1))
else:
# 1) Do nothing
res = f(x,y+1, need, maskid, 0)
# 2) Do something, Create a new L shape
if (grid[x][y] == '.') and (need > 0) and (y < m - 1) :
nmask = mask | (1 << y)
res += f(x,y+1, need - 1, getMaskId[nmask], 0)
return res

return f(0,0, 3, 0, 0)

# decorator that adds memoization to a function using a list of bounds
# ej: @MemoizedFunction( [10, 10, 10] )
def MemoizedFunction(bounds):
def deco(f):
mem = [-1] * reduce( lambda a,b: a*b, bounds )
def g(*args):
p = 1
k = 0
for i in xrange(len(args)):
(a,b) = (args[i], bounds[i])
k = a * p + k
p = p * b
if mem[k] == -1:
mem[k] = f(*args)
return mem[k]
return g
return deco


Well, things that were already mentioned, the default recursion limit is a bit large. Python is just slower than c++, mostly because its code does much more than c++. It has this dynamism that makes it so powerful but also slower. In an i5 computer, it takes around 10 seconds, while the goal is 3 seconds in TC's servers... Another large issue was the memory usage. Even with the clever decorator that makes all of the dp table contents exist within a single list, we would need around 768 MB to make all fit in memory.

During the match, I began to code a solution to the problem anyway. Because of the python challenge, I couldn't use c++. I still wanted to solve it because it was fun and also, if the python solution works in small inputs, it would mean I at least understood the problem, which would be helpful later and it would be a moral victory :). So I tried to do it, but I had bugs. Story of my life, I couldn't really finish the code until after I spent a couple of days working in the rest of the editorial..

My first code (once fixed) was actually more compact and slow than the one above:

import sys
sys.setrecursionlimit(10**6)
class ThreeLLogo:
def countWays(self, grid):
n = len(grid)
m = len(grid[0])
@MemoizedFunction
def f(x,y, need, arms, horz):
if x == n:
# base case
return 1 if ( (arms == () ) and (need == 0) ) else 0
elif y == m:
return 0 if (horz == 2) else f(x + 1,0, need, arms, 0)
elif y in arms:
if (horz == 2) or (grid[x][y] == '#'):
#must continue the left "L" shape, but also must continue the above one
return 0
else:
# 1) Finish the L shape above.
narms = tuple( z for z in arms if z != y )
res  = f(x,y+1, need, narms, 2)
# 2) Continue the L shape vertically
res += f(x,y+1, need, arms, 0)
return res
elif horz == 2:
return 0 if (grid[x][y] == '#') else f(x,y+1, need, arms, 1)
else:
# 1) Do nothing
res = f(x,y+1, need, arms, 0)
# 2) Do something
if (grid[x][y] == '.'):
# 2.1) Create a new L shape
if need > 0:
narms = tuple( sorted(arms + (y,)) )
res += f(x,y+1, need - 1, narms, 0)
# 2.2) Continue the left L shape:
if horz == 1:
res += f(x,y+1, need, arms, 1)
return res

return f(0,0, 3, (), 0)



However, this helped me discover something cool. After writing this code, it was easy to optimize it and later translate to c++. Behaving as a human python-c++ compiler. When the python challenge ends, I might try this more often. Code a prototype in python making sure the logic works, then translate to c++ :).

## The other ones

Then I wrote the editorial. Besides of the div1 hard that appears to be impossible in python, the remaining problems work just fine.

A notable example: This 3-liner solution for div2 500 which could be made into a one-liner.

class ColorfulCoinsEasy:
def isPossible(self, values):
limits = sorted([ (values[i+1]/values[i]) for i in range(len(values) -1) ])   ## get list of values[i+1] / values[i], sort it
condition = all(i+1 < limi for (i,limi) in enumerate(limits))                 ## Required: for each i: i + 1 < limits[i].
return "Possible" if condition else "Impossible"