Saturday, April 09, 2016

Google Code Jam 2016

I completely forgot about the qualification round. This morning I woke up and was planning to have a normal day when I accidentally noticed in my calendar applet that "Qualification Round" was on. What qualification round!? Oh no, the google one...



So I decide to register to the contest and it's 15 hours late. What are even the rules? Is it the same as usual? What's the cut-off? It turns out everything is mostly the same. I need 30 points and just the small inputs are enough to give you 37 points. So it should be easy to qualify.

Perhaps it's too similar to previous editions? I think 1000 t-shirts is pretty low, considering how much the attendance has grown since this number was established. Actually when I look at the past I think I would have never really put as much work on programming contest if, ages ago, when I was getting started, there weren't as many small rewards for simple things as they were. In the past, in codejam or topcoder open, qualifying to the elimination rounds was worth a t-shirt and there were extra little rewards for advancing in other rounds. In fact, the first codejam done using Google's own site had regional semifinals so if you were in the top 500 you'd already win a paid trip to a google office.

Back to the present. Qualifying, even 15 hours late should be pretty easy for a decrepitveteran contest participant such as I. But the real contest is to get a perfect 100/100. So what are my chances there? Looking at the score board it seemed like this qualification round had easier problems than previous editions, as a really considerable amount of competitors already had the perfect score. So I gave it a shot. Note that I am starting to write this before the end of the round, so I don't know if I scored properly.


Problem A (Large)


Problem Statement

This is a straight-forward problem, you just need to simulate the steps until all digits are found. A pressing question is what's up with impossible cases? Well, I actually just coded right away and tested all 1000001 possible inputs from 0 to 1000000, with a maximum number of 1000 steps to find all digits. Even then, the simulation is very fast and the only case that doesn't reach all digits is `N = 0`, which will always be 0 no matter how far you go. So with that knowledge I just submitted an easy code:



    const int MAX_ITERATIONS = 1000;
   
    long solve(long x)
    {
        long y = x;
        set<char> s;
        for (int i = 0; i < MAX_ITERATIONS; i++) {
            ostringstream st;
            st << y;
            for (char ch: st.str()) {
                s.insert(ch);
            }
            if (s.size() == 10) {
                return y;
            }
           
           
            y += x;
        }
        return -1;
        // not pictured: I/O stuff including translating -1 to "impossible"
    }



Problem B (Small and Large)


Problem Statement

Well, the small version can be solved using a Breadth-First Search. Just consider a string of +/- as a vertex in some graph and each possible step is an edge between its initial setup and its resulting setup. Then you want to find the shortest path between the input and +++++...+ .  With `N <= 10`, there are at most `2^10 = 1024` vertices in the graph, so this is quite doable.


But I wanted to solve the hard version. Clueless as I often are, I just did the small version hoping that it would help me think of the hard solution.  I concluded some greedy approach was needed because all permutations of cookies are reachable from any initial setup so cutting the search space wouldn't work (for me).

After coding the small version and being able to see some of the solutions it generates, I noticed a very obvious pattern:



+++-++---+--+
----++---+--+
++++++---+--+
---------+--+
++++++++++--+ 
------------+ 
+++++++++++++  



Basically keep flipping the largest sequence of cookies at the top that have the same sign (+ or -) as the top-most cookie. You will eventually have a bunch of happy cookies. And this is the optimal solution. Because this is the optimal strategy in a different problem where a step consists only about changing the sign of a prefix of a string, and the other options in the real problem are not better.




    int solve(string s)
    {
        int n = s.size();
        int step_count = 0;
        while (s != string(n, '+') ) {
            int t = 1;
            while ( (t < n) && (s[t] == s[t-1]) ) {
                t++;
            }
            for (int i = 0; i < t; i++) {
                if (s[i] == '+') {
                    s[i] = '-';
                } else {
                    s[i] = '+';
                }
            }
            step_count++;
        }
        return step_count;
       
       
    }
   
   
    /* _Used this to think of the real strategy which was pretty obvious in
        hindsight ... */
    int solve_slow(string s)
    {
        int n = s.size();
        map<string, int> dist;
        queue<string> Q;
        dist[s] = 0;
        Q.push(s);
        map<string, string> parent;
        while (! Q.empty()) {
            string s = Q.front();
            Q.pop();
            for (int t = 1; t <= n; t++) {
                string ss = s;
                reverse(ss.begin(), ss.begin() + t);
                for (int i = 1; i <= t; i++) {
                    char & ch = ss[i - 1];
                    ch = ( (ch == '+') ? '-' : '+' );
                }
                if ( dist.count(ss) == 0) {
                    dist[ss] = dist[s] + 1;
                    Q.push(ss);
                    parent[ss] = s;
                }
            }
        }
       
        string w = string(n, '+');
        assert( dist.count(w) == 1 );
       
        {
            stack<string> S;
            string x = w;
            while (true) {
                S.push(x);
                if (parent.count(x) == 1) {
                    x = parent[x];
                } else {
                    break;
                }
            }
            while (! S.empty()) {
                cout << S.top() << endl;
                S.pop();
            }
        }
       
        return dist[w];
    }



Problem C (Small and Large)


Problem Statement

My initial hunch was that a huge majority of the possible binary strings are valid coins. Because really, primes tend to be hard to find and we want strings that aren't primes. Even though they can't be primes in many bases. But I still thought it would take a while to find 500 coins. So I decided to do something with threads. I find threads easier in python so I used python. When I finished coding it and tested, it turned out that the code was very fast, so the threads weren't needed :/, but it's still a cool looking solution:



N = 32
J = 500

NUM_THREADS = 4

EASY_PRIMES = [2,3,5,7,11,13,17,19,23,29,31,37,41,43,47,53,59,61,67,71]

import random
import threading
import sys

mutex = threading.Lock()

seen_coins = dict()

def add_if_new(coin, divisors):
    with mutex:
        if coin not in seen_coins and len(seen_coins) < J:
            seen_coins[coin] = divisors
            sys.stderr.write( str(len(seen_coins)) + ' already!\n' )
            sys.stderr.write( str(attempt_count[0]) + ' attempts\n' )
            sys.stderr.flush()
           
           
def work_done():
    with mutex:
        if len(seen_coins) >= J:
            return True
        else:
            return False

def random_coin():
    x = [ random.randint(0,1) for i in range(N) ]
    x[0] = 1
    x[-1] = 1
    return tuple(x)


def is_divisor(coin, b, d):
    m = 0
    p = 1
    for x in reversed(coin):
        if x == 1:
            m = (m + p) % d
        p = (p * b) % d
    if m == 0:
        return True
    else:
        return False

def get_divisors(coin):
    divisors = []
    for b in xrange(2,10 + 1):
        for d in EASY_PRIMES:
            if is_divisor(coin, b, d):
                 divisors.append(d)
                 break
        else:
            return None
    return divisors

attempt_count = [0]

def worker():
    while not work_done():
        coin = random_coin()
        divisors = get_divisors(coin)
        if divisors != None:
            add_if_new(coin, divisors)
        with mutex:
            attempt_count[0] = attempt_count[0] + 1
            #sys.stderr.write( str(attempt_count[0]) + ' attempts\n' )
   
threads = [ threading.Thread(name='worker%d'%i, target=worker) for i in range(NUM_THREADS) ]

for t in threads:
    t.start()

for t in threads:
    t.join()

sys.stdout.write( 'Case #1:\n')
for x in seen_coins:
    for y in x:
        sys.stdout.write( str(y) )
    for div in seen_coins[x]:
        sys.stdout.write( ' ' + str(div) )
    sys.stdout.write( '\n')
    sys.stdout.flush()
    



I think one thing that improves the execution is I only test if the coin is a multiple of a small list of primes. Instead of looking for coins that are any composite number , I add the condition that they are a multiple of small primes. Most composite numbers are multiples of small prime numbers, so this shouldn't be an issue in theory, and in practice, it isn't.



Problem D (Small and Large)


Problem Statement

First the small problem is super trivial. Because `S = K`, so you can just test positions 1,2,3,...`S` and it will be enough. Because if there is any G in the initial string, then the final string will definitely have a G in that same position.


The larger solution is more complicated but not that much. Even though it took me a couple of hours to think of it .

If you want to be able to find any case that contains a G, then we should focus on the hardest cases, the ones that have only one G in the input: GLLL, LGLL, LLGL, LLLG. It turns out that if you include a G in position 1, it spreads to many positions in the later iterations. So imagine we had four positions: ABCD in the first string.  We want to find which positions of the second string would be effected by positions A, B, C and D (E.g: If there is a G in position A, then the second position of the second string will contain a G. If there is a G in position B, then the second position of the second string will also contain a G, so position 2 of the second string can be classified as AB. If you repeat this you will find: A, AB, AC, AD, AB, B, BC, BD, AC, BC, C, CD, AD, BD, CD, D. Perhaps it will be easier if you read it as: AA, AB, AC, AD, AB, BB, BC, BD, AC, BC, CC, CD, AD, BD, CD, DD. In fact, with 3 steps we have: AAA, AAB, AAC, AAD, ... DDD. And so and so, it's the same as thinking of numbers in base K. So for C = 3, if we want positions that cover as many initial Gs as possible then we want the positions that have ABC and DAA (or DAB or ADC, anything containing a D). You can get the position index by converting from base k to base 10. This explanation is probably very bad. Sorry, I can't do better without drawings.




    long translate(vector<int> digits, int K)
    {
        long res = 0;
        long p = 1;
        for (int i = 0; i < digits.size(); i++) {
            res += p * digits[i];
            p *= K;
        }
        /*for(auto x: digits) cout << " " << x;
        cout << " -> "<<res<<endl;*/
        assert(res >= 0);
        return res;
    }
    list<long> solve(int K, int C, int S)
    {
        list<long> res;
        int p = 0;
        while (p < K) {
            vector<int> digits;
            for (int i = 0; i < C; i++) {
                if (p < K) {
                    digits.push_back(p);
                    p++;
                } else {
                    digits.push_back(0);
                }
               
            }
            res.push_back( 1 + translate(digits, K) );
        }
       
        if (res.size() > S) {
            //assert(false);
            return {};
        } else {
            return res;
        }
    }




Final results


As I was typing, the round ended, let's check the final results...

All my large solutions were okay. But tons of people had them okay as well. I am barely in position 1246. That's pretty low, but I did start the round 15 hours late... (excuses)





Thursday, September 10, 2015

One handed programming and other things you didn't expect to read about in this blog

Hello all,

It's really been a while since I updated this blog. I've been dealing with difficulties balancing the work demands of writing editorials in topcoder and etc.

Tomorrow we have a new SRM, 667. It's going to be a very special SRM for me although not for happy reasons. You see; last September 1st a car hit me and I got a shoulder fracture. It was described as a lucky outcome and it seems like conservative therapy that consists of immobilization and waiting for the bone to heal on its own is the best approach. As a result, tomorrow I will only use one hand, my left hand (I am right-handed) in tomorrow's SRM. Let's see what happens.

Of course my real concern is about how it will affect my performance writing the editorial. As of late and after finally dealing with the old delayed problems that stacked up since last year. I've been really trying to get the editorials back to their former quick publishing glory. It kinda worked with 666's editorial. But of course the issue at hand might slow me down. I am optimistic, however. I was able to solve and write the editorial for the TopCoder Open round 2D after the accident. I just needed longer breaks after each problem.

In happier news: I have recently started a Patreon . It's focusing on my computer art and twitter anti spam work. I wonder if I can expand it for explanations of algorithmic problems? Maybe.

Saturday, May 30, 2015

Out of codejam : The curse of the dp sense

That was it for me in the code jam. I really think the whole outcome centers around how badly I've dealt with problem D. The 11 extra points from D small would have made me advance and get a t-shirt. Honestly what I really don't like is that you needed to be in top 500 for round 2 for the distributed codejam, which is a completely different contest format. I don't see the point for this restriction.

Bad omen

The first demoralization of the day was learning that the codejam-commandline tool no longer works. Google finally deprecated the authentification used by this tool, which was not updated in a looooong while. I really wonder if anyone other than me actually used this tool...

Problem statements

Problem A: pegs

The trick is to notice that peg man can pick any cell. So if there is any cell that takes you out of the board, pegman will pick that cell. Why is this important? Consider the arrows that point directly to the outside of the map. You NEED to change them. Because pegman can always start at that arrow and exit the map. So it is necessary that these arrows point elsewhere.

Once all arrows that point to the outside of the map point somewhere else. They will each point to another arrow. And this is sufficient. Because no arrow in the map points to the outside of the map. So it really doesn't matter if we point to an arrow, this other arrow will never take us to theo utside of the map.

So the solution (for both A-small and A-large) is to just take each arrow that points to the outside of the map. If you can pick any new direction to it that makes it point to another arrow, increase cost by 1. Else it is impossible.

Problem B: pool

I read the first paragraphs and seemed a bit too mathy, I felt that it was likely the other small problems were more approachable so I decided to skip it to read the other problems.

Problem C-small: sentences

There are at most 18 sentences with unknown language, we can try each of the `2^18` combinations of English/French assignments and just simulate it. Or so you'd think...

A small blunder: I didn't notice that the number of words was a bit large so just naively doing the simulation was a bad idea. Even with my trick to use 4 cores so that each core runs each case separately, my first solution still needed around 10 minutes to clear a A-small input.

You don't need to parse and compare the words every time, you only care about which words are common to the two languages, so for example just replace each word with an integer, you need at most 2180 integers (although a case with 2180 distinct words is trivially 0). An extra improvement is to use bitsets from STL or something similar. Basically, you represent each sentence by 2180 / 64.0 bit masks, if the i-th bit is 1 then the i-th word is included in the sentence. Doing union / intersection operations with bit operations so that you can do the stuff rather fast (2180 / 64.0) steps.

Here is the code for the sub-problem, once you assign English or French to each sentence:

int N;
vector<string> sentences;
vector<bitset<2180>> word_masks;

int sub_problem(vector<bool> &eng)
{
    bitset<2180> english_words, french_words, inter;
    for (int i = 0; i < N; i++) {
        if (eng[i]) {
            english_words |= word_masks[i];
        } else {
            french_words |= word_masks[i];
        }
    }
    inter = english_words & french_words;
    return inter.count();
}

You'd need to initialize the bitsets before simulating all assignments:

   int word_ids = 0;
   map<string, int> word_id;
   
    
   word_masks.resize(N);
   for (int i = 0; i < N; i++) {
       word_masks[i].reset();
   }
   for (int i = 0; i < N; i++) {
       istringstream st(sentences[i]);
       string x;
       while (st >> x) {
           int j = 0;
           if (word_id.count(x)) {
               j = word_id[x];
           } else {
               j = word_ids++;
               word_id[x] = j;
           }
           word_masks[i][j] = true;
       }
   }

The rest is the usual bruteforce for the assignments.

When I got correct in this problem I felt confident. There was quite a lot of time for the match but I was 400-th -ish. I thought that just solving D-small would put me in top 500. I was right...

Problem D-small

For some reason I instantly thought dynamic programming was the way. You can certainly fill the cells using dynamic programming. Remembering the previous row and some other stuff. Well, seems hard at first except when you notice:

  • Numbers larger than 4 are clearly never usable. A cell can't have 5 neighbors.
  • Turns out 4 is also never usable. You cannot have a 4 in the top or bottom rows. If you place a 4 elsewhere, all 4 cells adjacent to that cell must be 4 too. If you repeat this logic, you'll eventually need the top or bottom row to contain a 4, which would be wrong.
  • So you only need numbers 1-3 for the cells.

There is more, much more. But I really wish these 3 facts where the ONLY ones I noticed, because then I would have been discouraged to think of the bad dynamic programming solution I thought of...

  • If you place a 3 in a cell in the top row, then all cells in the row must be 3, which also means that the cells in the top-second row must be 3. (Same happens with the last two bottom rows). In fact, after this filling the rest is similar to filling the same problem but with R-2 rows. With the exception that you cannot place 3 in the third row anymore. So yeah this hints a dynamic programming solution.
  • The only time you are allowed to place a 3 elsewhere is when all the cells in the row above the row are matched to cells above the row. So really, whenever a 3 appears, it means that you need to fill 2 consecutive rows with 3s and the rows surrounding them cannot have any 3.
  • So usually you only need to decide between 1 and 2.

This led me to a dynamic programming solution and rushed to start to code. It was a mistake. For starters just dealing with bitmasks (and you needed three sets of bit masks in this solution) introduces so much debugging time to coding something. What's worse is that I completely underestimated the difficulty in dealing with the rotations. My initial strategy was to just code without caring about duplicate rotations and then just do something like dividing the result by C or something? I didn't really have enough hindsight for what to do there. Once I noticed how non-trivial the rotations were in this dynamic programming solution it was too late. I put some (wrong) patch work to make the rotations work. It passed examples, but not the small tests.

That was the blunder: A much finer strategy was to try and solve the problem using bruteforce. In hindsight it should have been obvious to me, so this hurts. Here's the logic you (me) need to remember in the future to know not to miss the chance of using bruteforce in the key-to-solve bruteforce problem in your next significant match:

  • The requirements for the values of the cells are VERY restrictive. Just a single decision removes plenty of later decisions. It's very likely that `6 times 6` grids and smaller the results should be very small. There would be very few valid ways. A big hint is that the results in the examples are both very small : 1 and 2. And the examples are very weak, so it shows the problem setter didn't want to make things too noticeable...
  • Even if after coding the brute force, it turns out that doing all the brute forces is too slow for 4 minutes. It's very unlikely it would be too slow for 20 minutes (again, the cases are quite small and the restrictions very restrictive). "But vexorian, why would I want to solve them in 20 minutes when there is a 4 minutes limit?" You ask. There are only 20, TWENTY different inputs for this problem (D-small), so you can just precalculate them all off-line and just submit the precalculated solutions after you donwload the input.
  • Even in the remote case somehow there are way too many valid assignments for some of the cases. This might be very helpful to learn more about the problem. Being able to visualize the solutions for `4 times 4` might help you notice some patterns. I am fairly sure that when you learn the patterns, you can solve D-large.
  • Fancy dynamic programming is way riskier than brute force. It's far more likely you'll have bugs in the mega complicated bit mask stuff than your brute force would be too slow to be of ANY use. It is at least far easier to deal with rotations if you use the brute force.

So yeah, you (me) really should have used brute force in this problem. It is very important to detect wrong assignments as quickly as possible so there are few invalid branchings. This is what my code would have looked like if I did the right choice. It passes the D-small input and is incredibly fast:

    int R, C;
    vector<string> board;
    
    set<vector<string>> seenBefore;

    bool cellCheck(int x, int y)
    {
        // check if cell (x,y) matches requirements 
        int dx = 0, dy = 1;
        bool unknown = false;
        int c = 0;
        int t = 0;
        for (int i = 0; i < 4; i++) {
            int nx = x + dx, ny = (y + dy + C) % C;
            if (0 <= nx && nx < R) {
                t++;
                if (board[nx][ny] == '?') {
                    // an unknown cell, if there are still unknown cells then
                    // the number of adjacent equal cells doesn't HAVE to be equal
                    // but if it is larger it is still invalid.
                    unknown = true;
                } else if (board[nx][ny] == board[x][y]) {
                    c++;
                }
            }
            tie(dx,dy) = make_tuple(dy, -dx);
        }
        int m = board[x][y] - '0';
        if ( (c == m) || ( (c < m) && unknown) ) {
            return true;
        } else {
            return false;
        }
    }
    
    bool check_rotation()
    {
        // rotate the board C-1 times , if any of those was found before, return false
        vector<string> board2 = board;
        for (int k = 1; k < C; k++) {
            for (int i = 0; i < R; i++) {
                for (int j = 0; j < C; j++) {
                    board2[i][j] = board[i][ (j + k) % C ];
                }
            }
            if (seenBefore.count(board2)) {
                return false;
            }
        }
        // else add it to the seenBefore set
        seenBefore.insert(board);
        return true;
    }
    
    int count_valid = 0;
    void backtrack(int x, int y)
    {
        if (x == R) {
            if (cellCheck(R-1,C-1) && cellCheck(R-1,0)) {
                // valid
                if (check_rotation()) {
                    count_valid++;
                }
            }
        } else if (y == C) {
            // finished fillign the row
            // don't forget to verify also the beginning of the row
            if (cellCheck(x,C-1) && cellCheck(x,0)) {
                backtrack(x + 1, 0);
            }
        } else {
            for (char ch = '1'; ch <= '3'; ch++) {
                board[x][y] = ch;
                bool valid = true;
                if (x > 0) {
                    valid = valid && cellCheck(x-1,y);
                }
                if (y > 0) {
                    valid = valid && cellCheck(x,y-1);
                }
                if (valid) {
                    backtrack(x,y+1);
                }
                board[x][y] = '?';
            }
        }
    }
    
    int solve()
    {
        board.resize(R);
        for (string &x : board) {
            x = string(C,'?');
        }
        seenBefore.clear();
        count_valid = 0;
        backtrack(0,0);
        return count_valid;
    }