Saturday, April 07, 2018

Google Code Jam 2018 Qualification Round

That was certainly an interesting codejam. One that was marked by the question "Do I even want to do this?".

The new system

Turns out Codejam is using a different system and rules than before. I don't really like it Click that link if you want to see my live reaction after finding out at 7:00 PM last night. My plan for the weekend was to possibly spend the whole Friday night and Saturday working on the code jam and prepare a cool blog post about it with explanations and all for old time's sake. That didn't work out.

I already said too much on those tweets, but what I just want to be clear about is: The Codejam Qualifaction was not just a contest. It was an event. It was always really great to see what happened. It always has the most massive number of participants ever. A big part of this is the idea that some coders might try some bizarre things. Solve the contest in ... Assembler? Or that year I used a programming language I designed myself. Looking at the score board and filtering it so it only shows you people you know (this option is gone). And sometimes there would be an epic problem that keeps you busy the whole day.

But the worst of all is the fact that (one of) the biggest programming competitions just got way less accessible than it was. Telling people that they can just use any programming language was great, but we lost that.

Do I even want to do this?

So I am not even sure if I want to participate in the contest. I know how to solve problem A, but I can't even find a vital bit of info about the new rules: What is the CPU of the servers where our code runs? So this sounds suspiciously like one of those badly made ACM contests:/ do I even want to participate? I no longer have so much free time as before. I barely have time in the weekends to rest. So a Saturday is not something I can just spend in some contest if it doesn' sound at least a bit fun. And the prizes are as low as ever. I decide that at least for Friday I am not going to bother.

The last few hours - Problem B

So okay, so I am a bit curious about how the codejam is going and I accidentally read problem B and figure out that it is an extremely easy problem.

There is a badly made sorting algorithm. Taking an array `V`, it finds two indexes `i`, and `(i+2)` such that `V[i+2] > V[i]`. In that case, it takes the part of the array: `(V[i],V[i+1],V[i+2])` and reverses it `(V[i+2],V[i+1],V[i])`. It repeats this until it cant find any more such `i`, and `(i+2)` pairs. This algorithm can't always sort the array correctly, and given an input `V ` we are supposed to predict the part where the algorithm is going to fail. If it is going to fail, we have to find the first index that is not sorted correctly. For B large the number of elements can be up to 100000.

The algorithm is `O(n^2)` so for the large version of the problem we can't just simulate it. The key of the problem is to notice that after reversing `(V[i],V[i+1],V[i+2])` into `(V[i+2],V[i+1],V[i])`, the position of `V[i+1]` is not going to change. So the operation is the same as swapping `V[i]` and `V[i+2]`, which are the elements we compared. This means that there are actually two independent partitions of the array. The elements with even indexes will never interact with the elements with odd indexes. So imagine the array `[6,5,4,3,2,1]` : It has two independent sub-arrays: `[6,..,4,..,2,..]` and `[5,..,3,..,1]` and in these sub problems, all you can do is swap consecutive elements. This means that the two sub arrays are being sorted with normal bubble sort. And eventually the bubble sorts will end running and we will get: `[2,..,4,..,6,..]` and `[1,..,3,..,5]` and when we put the two sub-arrays back into the big array we get: `[2,1,4,3,6,5]` and it is clearly not sorted. So to solve this problem, we just split the array into two sub-arrays, one with the even indexes and one with the odd indexes. Sort the two arrays and put them back together. If the result is sorted, then all is fine, else we return the first index where it breaks.

// get the sub array with parity (0 or 1)
vector<int> getv(const vector<int> &V, int parity) {
    vector<int> r;
    for (int i = parity; i < V.size(); i += 2) {
        r.push_back( V[i] );
    }
    return r;
}

// mix two sub-arrays back into one large one:
vector<int> mix(const vector<int> &v1, const vector<int> &v2) {
    vector<int> V;
    for (int i = 0; i < v1.size() + v2.size(); i++) {
        V.push_back( (i%2 == 0)? v1[i/2] : v2[i/2] );
    }
    return V;
}

string solve(const vector<int> &V) {
    // split
    vector<int> v1 = getv(V,0);
    vector<int> v2 = getv(V,1);
    // sort
    sort(v1.begin(), v1.end());
    sort(v2.begin(), v2.end());
    // merge again
    vector<int> v3 = mix(v1,v2);
    // check
    for (int i = 0; i+1 < V.size(); i++) {
        if (v3[i] > v3[i+1]) {
            return to_string(i);
        }
    }
    return "OK";
}

Problem C is interactive

Okay ... so apparently I am participating in this. I am too bored to bother with A, so I open C. This is an interactive problem. You are given a 1000 x 1000 grid and want to paint at least `A` cells (in the easy version, `A` is 20 and in the hard version, 200). The objective is that the bounding box of all the painted cells should not contain any unpainted cell. And the catch is that when you pick a single cell to paint, the system is actually going to pick one of the 9 squares in the neighborhood of the cell you picked and paint that one. Even if the cell is already painted, if the system picks it, it will paint it again and waste a turn. You have only 1000 turns. Of course it picks any of the 9 squares in the neighborhood with 1/9 probability.

So okay, this is an interesting thing that we wouldnapos;t have been able to have in the old system (or maybe we could, by making the system expose a public API that interacts with your program?). At least it is a bit interesting. The solution is to come up with a strategy and prove that the probability that it needs more than 1000 turns is very low.

My solution works by making it into a linear problem. Instead of worry about 2D rectangles, you go in a straight line from bottom to top. Imagine we keep giving the system the order to paint at point `(x,y)`. Eventually, all 9 points around it will be painted. Once this happens, you can start returning `(x,y+3)` and paint another `3 x 3` square. And keep repeating. Until you/apos;ve painted enough squares. The result will always be a perfect rectangle.

The only catch is to calculate that this will tend to require less than 1000 steps. Actually, due to the random nature of the problem, there is always a probability that for some reason it could pick the same point 1000 times, but that is a very improbable thing.

So let's calculate the expected value for the number of turns we need before filling a complete 3 x 3 square. The number is between 25 and 26. To paint at least 20 squares, you would need 3 squares of `3 x 3` in total. That means that the expected number of turns you'll need is 228, which is pretty reasonable. Of course, the expected value is not the same as the probability it will work in less than 1000 steps. But a) It is easier to calculate and b) The expected value is designed to give us a good estimate. it's the average number of turns we will need, and because everything is so random, it would be a bit crazy to expect that the number of turns will get much larger than 228. Definitely not up until 1000...

But that's not the solution I tried. Mine is a bit smarter, no need to wait for the whole 3x3 square, you only need to wait for the bottom row of 3 square cells. Once it is full, we can start trying a square higher. So basically, we start with `(x,y)`, once the row of 3 cells bellow `(x,y)` is full, we start trying with `(x,y+1)` and so and so. We only need to figure out a good ending condition for this. If filling the current `3x3` square surrounding the current `(x,y)` is enough to complete the requirement for `A`, then we can stop

My calculations told me that the expected number of turns will be around 106 for `A=20` and more than 1200 for `A=200`. So it would pass the easy but not the hard problem. But in fact my calculations where over pessimistic and it passes the hard version of the problem as well.

You may be wondering how the hell did I calculate all of this? Well, here is an example: Imagine that we will keep printing `(x,y)` until the bottom row of `(x-1,y-1)`, `(x,y-1)` and `(x+1,y-1)` are all painted. What is the expected number of steps we need to perform?

There are 9 cells in total that can be painted by our `(x,y)` move but our objective is to paint 3 of them. The function `f(t)` will tell us the expected number of turns before we paint `t` specific cells (the other ones don't matter.

  • For `f(0)` : There are 0 cells that matter to us, so we don't need to paint anything, all is ready.
  • For `f(1)` : We need at least one turn. there are two possibilities:
    • With probability `1/9`, we painted the correct cell and there are now zero more cells to paint. The expected number of steps will be `1 + f(0)`
    • With probability `8/9`, we painted a cell that doesn't matter so there is still one special cell. The expected number of steps will be `1 + f(1)`
    • The total is: `f(1) = (1/9)(1 + f(0)) + (8/9)(1 + f(1)), note the recursion, but the recursion is not a problem, just consider `f(1)` as a variable, like `z`:
    • Solve the equation `z = (1/9)(1 + f(0)) + (8/9)(1 + z), the result of `z` will give us `f(1)`.
  • Then we can use that result to calculate `f(2)` (it's the same logic), and so and so. By filling the values of `f(t)`, I could find that it takes around 16 turns before we fill the bottom row and 25.something steps before filling the whole square. The reason this estimate is too pessimistic is that I don't consider that after filling the bottom row, there will be many squares in the `(x,y)` neighborhood that are painted, which increases the chance that future iterations of `y` are going to receive less work to do.

Turns that the problem came with a tester tool to interact with your program locally. But I didn't notice until after solving the whole problem and testing manually multiple times :/

Problem D and A

A is very easy but greedy and I am not feeling like explaining it right now. D was all about geometry, and I honestly was feeling a bit lazy. Do I really want to spend all that much time solving and specially debugging a geomtry problem? Not really. So I skipped it.

Turns out Stack Overflow has a nice explanation: https://twitter.com/fakevexorian/status/982800480886775808 , but the question was posted during the contest.... So ... why bother even If I went through the problem of solving this, no one can tell if anyone solved the problem on their own or because they found the stack overflow thread?

That's it

Thursday, April 05, 2018

I just wanted to ask, what's up with the 2010-quality prizes in algorithm competitions?

So hey, yes, I am going to be participating in "Google Code Jam 2018". I've been thinking for long to make a huge blogpost explaining my situation and why I stopped participating in algorithm contests and (more painfully) making editorials in TopCoder. (Just to be clear, it's not because I don't want to participate). I am unsure if I am going to make that blogpost, but one of the many tangents I wanted to include in that post is just a rant about the prize pools we've had in these contents for years now.
So as a bit of context, I used to be very active in these competitions. I started in 2008, I think. Back then the Google Code Jam was basically a Google-sponsored contest at Topcoder. And the prize pool in those times was just ridiculous in comparison to what we have now. First of all, there were both regional and global contests. And that's how I started, I was a finalist in the Latin American regional. And I really think that if not for the encouragement received from qualifying to that regional, I probably wouldn't have had such a large and extended "career" in these contests. Why? Because it would not have been worth it.
Another example: The TCO was also a thing back then - We had qualification rounds and there were t-shirts for the 3000 coders who qualified. The latter rounds would have other prizes. Well, probably the best thing you could get besides the finals' money prizes was a Topcoder-branded USB drive. But it was still pretty good.
And the money prizes were also quite there. Every month there were 2 SRMs where you could earn money prizes by acquiring the top positions in your room and this included division 2 coders. And I am not talking about those 5 dollars you can get for registering in those Harvard experiment matches. A talented coder could get 40 USD per match in average.
But let's forget about the money. Even those other prizes or the much higher chance to get a t-shirt were very important as encouragement.

Back to reality

The year is 2018, and although the Google Code Jam admins want us to believe it is a big deal that it is its 15-th year. It doesn't feel like such a special year. For some reason we are still following the prize pool from 2010. There was an economic crisis in 2010, (ironically, this crisis was caused in part by the irrational worship of algorithms as infallible) and this crisis brought extreme deflation in the availability of budgets for algorithm competitions. Topcoder eventually dropped their money SRMs (sans very extreme rare cases). The number of finalists dropped to the incredibly small value it is now. And there are at most 1000 t-shirts in code jam. (For TCO I think it is 450)? Although all of this was understandable during the recession, it stayed that way. And it has been 8 years since. I guess that the powers that be are content with seeing the same 25 people in all finals and not having to care about the remaining tens of thousands that participate in these contests. But maybe they should? You know, if there was more encouragement for those 10K coders, maybe more of them would be willing to commit the ridiculous amount of time it takes to become good at this stuff. Which would mean we would have more people who reach finalist level and there would be more competition there and we would all improve for it (maybe).
And although the number of t-shirts hasn't increased, the competition has only gotten tougher. Problems are harder and there's far more coders participating than before. I really think 1000 t-shirts is too little. Because I think that t-shirts should exist as a way to encourage even beginners to participate and continue participating. I don't think there's really so much benefit in having so few t-shirt winners. If you so desire make the top 1000 tshirts a different version so that people who really want to feel special would be able to remain feeling special.
In the last two years I've been able to look at these contests from more of an outsider perspective and it is really amazing how much of a niche we are. That there's so few benefits for complete new comers to start working hard to participate in these contests does not help.

Saturday, April 09, 2016

Google Code Jam 2016

I completely forgot about the qualification round. This morning I woke up and was planning to have a normal day when I accidentally noticed in my calendar applet that "Qualification Round" was on. What qualification round!? Oh no, the google one...



So I decide to register to the contest and it's 15 hours late. What are even the rules? Is it the same as usual? What's the cut-off? It turns out everything is mostly the same. I need 30 points and just the small inputs are enough to give you 37 points. So it should be easy to qualify.

Perhaps it's too similar to previous editions? I think 1000 t-shirts is pretty low, considering how much the attendance has grown since this number was established. Actually when I look at the past I think I would have never really put as much work on programming contest if, ages ago, when I was getting started, there weren't as many small rewards for simple things as they were. In the past, in codejam or topcoder open, qualifying to the elimination rounds was worth a t-shirt and there were extra little rewards for advancing in other rounds. In fact, the first codejam done using Google's own site had regional semifinals so if you were in the top 500 you'd already win a paid trip to a google office.

Back to the present. Qualifying, even 15 hours late should be pretty easy for a decrepitveteran contest participant such as I. But the real contest is to get a perfect 100/100. So what are my chances there? Looking at the score board it seemed like this qualification round had easier problems than previous editions, as a really considerable amount of competitors already had the perfect score. So I gave it a shot. Note that I am starting to write this before the end of the round, so I don't know if I scored properly.


Problem A (Large)


Problem Statement

This is a straight-forward problem, you just need to simulate the steps until all digits are found. A pressing question is what's up with impossible cases? Well, I actually just coded right away and tested all 1000001 possible inputs from 0 to 1000000, with a maximum number of 1000 steps to find all digits. Even then, the simulation is very fast and the only case that doesn't reach all digits is `N = 0`, which will always be 0 no matter how far you go. So with that knowledge I just submitted an easy code:



    const int MAX_ITERATIONS = 1000;
   
    long solve(long x)
    {
        long y = x;
        set<char> s;
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            ostringstream st;
            st << y;
            for (char ch: st.str()) {
                s.insert(ch);
            }
            if (s.size() == 10) {
                return y;
            }
           
           
            y += x;
        }
        return -1;
        // not pictured: I/O stuff including translating -1 to "impossible"
    }



Problem B (Small and Large)


Problem Statement

Well, the small version can be solved using a Breadth-First Search. Just consider a string of +/- as a vertex in some graph and each possible step is an edge between its initial setup and its resulting setup. Then you want to find the shortest path between the input and +++++...+ .  With `N <= 10`, there are at most `2^10 = 1024` vertices in the graph, so this is quite doable.


But I wanted to solve the hard version. Clueless as I often are, I just did the small version hoping that it would help me think of the hard solution.  I concluded some greedy approach was needed because all permutations of cookies are reachable from any initial setup so cutting the search space wouldn't work (for me).

After coding the small version and being able to see some of the solutions it generates, I noticed a very obvious pattern:



+++-++---+--+
----++---+--+
++++++---+--+
---------+--+
++++++++++--+ 
------------+ 
+++++++++++++  



Basically keep flipping the largest sequence of cookies at the top that have the same sign (+ or -) as the top-most cookie. You will eventually have a bunch of happy cookies. And this is the optimal solution. Because this is the optimal strategy in a different problem where a step consists only about changing the sign of a prefix of a string, and the other options in the real problem are not better.




    int solve(string s)
    {
        int n = s.size();
        int step_count = 0;
        while (s != string(n, '+') ) {
            int t = 1;
            while ( (t < n) && (s[t] == s[t-1]) ) {
                t++;
            }
            for (int i = 0; i < t; i++) {
                if (s[i] == '+') {
                    s[i] = '-';
                } else {
                    s[i] = '+';
                }
            }
            step_count++;
        }
        return step_count;
       
       
    }
   
   
    /* _Used this to think of the real strategy which was pretty obvious in
        hindsight ... */
    int solve_slow(string s)
    {
        int n = s.size();
        map<string, int> dist;
        queue<string> Q;
        dist[s] = 0;
        Q.push(s);
        map<string, string> parent;
        while (! Q.empty()) {
            string s = Q.front();
            Q.pop();
            for (int t = 1; t <= n; t++) {
                string ss = s;
                reverse(ss.begin(), ss.begin() + t);
                for (int i = 1; i <= t; i++) {
                    char & ch = ss[i - 1];
                    ch = ( (ch == '+') ? '-' : '+' );
                }
                if ( dist.count(ss) == 0) {
                    dist[ss] = dist[s] + 1;
                    Q.push(ss);
                    parent[ss] = s;
                }
            }
        }
       
        string w = string(n, '+');
        assert( dist.count(w) == 1 );
       
        {
            stack<string> S;
            string x = w;
            while (true) {
                S.push(x);
                if (parent.count(x) == 1) {
                    x = parent[x];
                } else {
                    break;
                }
            }
            while (! S.empty()) {
                cout << S.top() << endl;
                S.pop();
            }
        }
       
        return dist[w];
    }



Problem C (Small and Large)


Problem Statement

My initial hunch was that a huge majority of the possible binary strings are valid coins. Because really, primes tend to be hard to find and we want strings that aren't primes. Even though they can't be primes in many bases. But I still thought it would take a while to find 500 coins. So I decided to do something with threads. I find threads easier in python so I used python. When I finished coding it and tested, it turned out that the code was very fast, so the threads weren't needed :/, but it's still a cool looking solution:



N = 32
J = 500

NUM_THREADS = 4

EASY_PRIMES = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71]

import random
import threading
import sys

mutex = threading.Lock()

seen_coins = dict()

def add_if_new(coin, divisors):
    with mutex:
        if coin not in seen_coins and len(seen_coins) < J:
            seen_coins[coin] = divisors
            sys.stderr.write( str(len(seen_coins)) + ' already!\n' )
            sys.stderr.write( str(attempt_count[0]) + ' attempts\n' )
            sys.stderr.flush()
           
           
def work_done():
    with mutex:
        if len(seen_coins) >= J:
            return True
        else:
            return False

def random_coin():
    x = [ random.randint(0,1) for i in range(N) ]
    x[0] = 1
    x[-1] = 1
    return tuple(x)


def is_divisor(coin, b, d):
    m = 0
    p = 1
    for x in reversed(coin):
        if x == 1:
            m = (m + p) % d
        p = (p * b) % d
    if m == 0:
        return True
    else:
        return False

def get_divisors(coin):
    divisors = []
    for b in xrange(2,10 + 1):
        for d in EASY_PRIMES:
            if is_divisor(coin, b, d):
                 divisors.append(d)
                 break
        else:
            return None
    return divisors

attempt_count = [0]

def worker():
    while not work_done():
        coin = random_coin()
        divisors = get_divisors(coin)
        if divisors != None:
            add_if_new(coin, divisors)
        with mutex:
            attempt_count[0] = attempt_count[0] + 1
            #sys.stderr.write( str(attempt_count[0]) + ' attempts\n' )
   
threads = [ threading.Thread(name='worker%d'%i, target=worker) for i in range(NUM_THREADS) ]

for t in threads:
    t.start()

for t in threads:
    t.join()

sys.stdout.write( 'Case #1:\n')
for x in seen_coins:
    for y in x:
        sys.stdout.write( str(y) )
    for div in seen_coins[x]:
        sys.stdout.write( ' ' + str(div) )
    sys.stdout.write( '\n')
    sys.stdout.flush()
    



I think one thing that improves the execution is I only test if the coin is a multiple of a small list of primes. Instead of looking for coins that are any composite number , I add the condition that they are a multiple of small primes. Most composite numbers are multiples of small prime numbers, so this shouldn't be an issue in theory, and in practice, it isn't.



Problem D (Small and Large)


Problem Statement

First the small problem is super trivial. Because `S = K`, so you can just test positions 1,2,3,...`S` and it will be enough. Because if there is any G in the initial string, then the final string will definitely have a G in that same position.


The larger solution is more complicated but not that much. Even though it took me a couple of hours to think of it .

If you want to be able to find any case that contains a G, then we should focus on the hardest cases, the ones that have only one G in the input: GLLL, LGLL, LLGL, LLLG. It turns out that if you include a G in position 1, it spreads to many positions in the later iterations. So imagine we had four positions: ABCD in the first string.  We want to find which positions of the second string would be effected by positions A, B, C and D (E.g: If there is a G in position A, then the second position of the second string will contain a G. If there is a G in position B, then the second position of the second string will also contain a G, so position 2 of the second string can be classified as AB. If you repeat this you will find: A, AB, AC, AD, AB, B, BC, BD, AC, BC, C, CD, AD, BD, CD, D. Perhaps it will be easier if you read it as: AA, AB, AC, AD, AB, BB, BC, BD, AC, BC, CC, CD, AD, BD, CD, DD. In fact, with 3 steps we have: AAA, AAB, AAC, AAD, ... DDD. And so and so, it's the same as thinking of numbers in base K. So for C = 3, if we want positions that cover as many initial Gs as possible then we want the positions that have ABC and DAA (or DAB or ADC, anything containing a D). You can get the position index by converting from base k to base 10. This explanation is probably very bad. Sorry, I can't do better without drawings.




    long translate(vector<int> digits, int K)
    {
        long res = 0;
        long p = 1;
        for (int i = 0; i < digits.size(); i++) {
            res += p * digits[i];
            p *= K;
        }
        /*for(auto x: digits) cout << " " << x;
        cout << " -> "<<res<<endl;*/
        assert(res >= 0);
        return res;
    }
    list<long> solve(int K, int C, int S)
    {
        list<long> res;
        int p = 0;
        while (p < K) {
            vector<int> digits;
            for (int i = 0; i < C; i++) {
                if (p < K) {
                    digits.push_back(p);
                    p++;
                } else {
                    digits.push_back(0);
                }
               
            }
            res.push_back( 1 + translate(digits, K) );
        }
       
        if (res.size() > S) {
            //assert(false);
            return {};
        } else {
            return res;
        }
    }




Final results


As I was typing, the round ended, let's check the final results...

All my large solutions were okay. But tons of people had them okay as well. I am barely in position 1246. That's pretty low, but I did start the round 15 hours late... (excuses)