Thursday, September 10, 2015

One handed programming and other things you didn't expect to read about in this blog

Hello all,

It's really been a while since I updated this blog. I've been dealing with difficulties balancing the work demands of writing editorials in topcoder and etc.

Tomorrow we have a new SRM, 667. It's going to be a very special SRM for me although not for happy reasons. You see; last September 1st a car hit me and I got a shoulder fracture. It was described as a lucky outcome and it seems like conservative therapy that consists of immobilization and waiting for the bone to heal on its own is the best approach. As a result, tomorrow I will only use one hand, my left hand (I am right-handed) in tomorrow's SRM. Let's see what happens.

Of course my real concern is about how it will affect my performance writing the editorial. As of late and after finally dealing with the old delayed problems that stacked up since last year. I've been really trying to get the editorials back to their former quick publishing glory. It kinda worked with 666's editorial. But of course the issue at hand might slow me down. I am optimistic, however. I was able to solve and write the editorial for the TopCoder Open round 2D after the accident. I just needed longer breaks after each problem.

In happier news: I have recently started a Patreon . It's focusing on my computer art and twitter anti spam work. I wonder if I can expand it for explanations of algorithmic problems? Maybe.

Saturday, May 30, 2015

Out of codejam : The curse of the dp sense

That was it for me in the code jam. I really think the whole outcome centers around how badly I've dealt with problem D. The 11 extra points from D small would have made me advance and get a t-shirt. Honestly what I really don't like is that you needed to be in top 500 for round 2 for the distributed codejam, which is a completely different contest format. I don't see the point for this restriction.

Bad omen

The first demoralization of the day was learning that the codejam-commandline tool no longer works. Google finally deprecated the authentification used by this tool, which was not updated in a looooong while. I really wonder if anyone other than me actually used this tool...

Problem statements

Problem A: pegs

The trick is to notice that peg man can pick any cell. So if there is any cell that takes you out of the board, pegman will pick that cell. Why is this important? Consider the arrows that point directly to the outside of the map. You NEED to change them. Because pegman can always start at that arrow and exit the map. So it is necessary that these arrows point elsewhere.

Once all arrows that point to the outside of the map point somewhere else. They will each point to another arrow. And this is sufficient. Because no arrow in the map points to the outside of the map. So it really doesn't matter if we point to an arrow, this other arrow will never take us to theo utside of the map.

So the solution (for both A-small and A-large) is to just take each arrow that points to the outside of the map. If you can pick any new direction to it that makes it point to another arrow, increase cost by 1. Else it is impossible.

Problem B: pool

I read the first paragraphs and seemed a bit too mathy, I felt that it was likely the other small problems were more approachable so I decided to skip it to read the other problems.

Problem C-small: sentences

There are at most 18 sentences with unknown language, we can try each of the `2^18` combinations of English/French assignments and just simulate it. Or so you'd think...

A small blunder: I didn't notice that the number of words was a bit large so just naively doing the simulation was a bad idea. Even with my trick to use 4 cores so that each core runs each case separately, my first solution still needed around 10 minutes to clear a A-small input.

You don't need to parse and compare the words every time, you only care about which words are common to the two languages, so for example just replace each word with an integer, you need at most 2180 integers (although a case with 2180 distinct words is trivially 0). An extra improvement is to use bitsets from STL or something similar. Basically, you represent each sentence by 2180 / 64.0 bit masks, if the i-th bit is 1 then the i-th word is included in the sentence. Doing union / intersection operations with bit operations so that you can do the stuff rather fast (2180 / 64.0) steps.

Here is the code for the sub-problem, once you assign English or French to each sentence:

int N;
vector<string> sentences;
vector<bitset<2180>> word_masks;

int sub_problem(vector<bool> &eng)
{
    bitset<2180> english_words, french_words, inter;
    for (int i = 0; i < N; i++) {
        if (eng[i]) {
            english_words |= word_masks[i];
        } else {
            french_words |= word_masks[i];
        }
    }
    inter = english_words & french_words;
    return inter.count();
}

You'd need to initialize the bitsets before simulating all assignments:

   int word_ids = 0;
   map<string, int> word_id;
   
    
   word_masks.resize(N);
   for (int i = 0; i < N; i++) {
       word_masks[i].reset();
   }
   for (int i = 0; i < N; i++) {
       istringstream st(sentences[i]);
       string x;
       while (st >> x) {
           int j = 0;
           if (word_id.count(x)) {
               j = word_id[x];
           } else {
               j = word_ids++;
               word_id[x] = j;
           }
           word_masks[i][j] = true;
       }
   }

The rest is the usual bruteforce for the assignments.

When I got correct in this problem I felt confident. There was quite a lot of time for the match but I was 400-th -ish. I thought that just solving D-small would put me in top 500. I was right...

Problem D-small

For some reason I instantly thought dynamic programming was the way. You can certainly fill the cells using dynamic programming. Remembering the previous row and some other stuff. Well, seems hard at first except when you notice:

  • Numbers larger than 4 are clearly never usable. A cell can't have 5 neighbors.
  • Turns out 4 is also never usable. You cannot have a 4 in the top or bottom rows. If you place a 4 elsewhere, all 4 cells adjacent to that cell must be 4 too. If you repeat this logic, you'll eventually need the top or bottom row to contain a 4, which would be wrong.
  • So you only need numbers 1-3 for the cells.

There is more, much more. But I really wish these 3 facts where the ONLY ones I noticed, because then I would have been discouraged to think of the bad dynamic programming solution I thought of...

  • If you place a 3 in a cell in the top row, then all cells in the row must be 3, which also means that the cells in the top-second row must be 3. (Same happens with the last two bottom rows). In fact, after this filling the rest is similar to filling the same problem but with R-2 rows. With the exception that you cannot place 3 in the third row anymore. So yeah this hints a dynamic programming solution.
  • The only time you are allowed to place a 3 elsewhere is when all the cells in the row above the row are matched to cells above the row. So really, whenever a 3 appears, it means that you need to fill 2 consecutive rows with 3s and the rows surrounding them cannot have any 3.
  • So usually you only need to decide between 1 and 2.

This led me to a dynamic programming solution and rushed to start to code. It was a mistake. For starters just dealing with bitmasks (and you needed three sets of bit masks in this solution) introduces so much debugging time to coding something. What's worse is that I completely underestimated the difficulty in dealing with the rotations. My initial strategy was to just code without caring about duplicate rotations and then just do something like dividing the result by C or something? I didn't really have enough hindsight for what to do there. Once I noticed how non-trivial the rotations were in this dynamic programming solution it was too late. I put some (wrong) patch work to make the rotations work. It passed examples, but not the small tests.

That was the blunder: A much finer strategy was to try and solve the problem using bruteforce. In hindsight it should have been obvious to me, so this hurts. Here's the logic you (me) need to remember in the future to know not to miss the chance of using bruteforce in the key-to-solve bruteforce problem in your next significant match:

  • The requirements for the values of the cells are VERY restrictive. Just a single decision removes plenty of later decisions. It's very likely that `6 times 6` grids and smaller the results should be very small. There would be very few valid ways. A big hint is that the results in the examples are both very small : 1 and 2. And the examples are very weak, so it shows the problem setter didn't want to make things too noticeable...
  • Even if after coding the brute force, it turns out that doing all the brute forces is too slow for 4 minutes. It's very unlikely it would be too slow for 20 minutes (again, the cases are quite small and the restrictions very restrictive). "But vexorian, why would I want to solve them in 20 minutes when there is a 4 minutes limit?" You ask. There are only 20, TWENTY different inputs for this problem (D-small), so you can just precalculate them all off-line and just submit the precalculated solutions after you donwload the input.
  • Even in the remote case somehow there are way too many valid assignments for some of the cases. This might be very helpful to learn more about the problem. Being able to visualize the solutions for `4 times 4` might help you notice some patterns. I am fairly sure that when you learn the patterns, you can solve D-large.
  • Fancy dynamic programming is way riskier than brute force. It's far more likely you'll have bugs in the mega complicated bit mask stuff than your brute force would be too slow to be of ANY use. It is at least far easier to deal with rotations if you use the brute force.

So yeah, you (me) really should have used brute force in this problem. It is very important to detect wrong assignments as quickly as possible so there are few invalid branchings. This is what my code would have looked like if I did the right choice. It passes the D-small input and is incredibly fast:

    int R, C;
    vector<string> board;
    
    set<vector<string>> seenBefore;

    bool cellCheck(int x, int y)
    {
        // check if cell (x,y) matches requirements 
        int dx = 0, dy = 1;
        bool unknown = false;
        int c = 0;
        int t = 0;
        for (int i = 0; i < 4; i++) {
            int nx = x + dx, ny = (y + dy + C) % C;
            if (0 <= nx && nx < R) {
                t++;
                if (board[nx][ny] == '?') {
                    // an unknown cell, if there are still unknown cells then
                    // the number of adjacent equal cells doesn't HAVE to be equal
                    // but if it is larger it is still invalid.
                    unknown = true;
                } else if (board[nx][ny] == board[x][y]) {
                    c++;
                }
            }
            tie(dx,dy) = make_tuple(dy, -dx);
        }
        int m = board[x][y] - '0';
        if ( (c == m) || ( (c < m) && unknown) ) {
            return true;
        } else {
            return false;
        }
    }
    
    bool check_rotation()
    {
        // rotate the board C-1 times , if any of those was found before, return false
        vector<string> board2 = board;
        for (int k = 1; k < C; k++) {
            for (int i = 0; i < R; i++) {
                for (int j = 0; j < C; j++) {
                    board2[i][j] = board[i][ (j + k) % C ];
                }
            }
            if (seenBefore.count(board2)) {
                return false;
            }
        }
        // else add it to the seenBefore set
        seenBefore.insert(board);
        return true;
    }
    
    int count_valid = 0;
    void backtrack(int x, int y)
    {
        if (x == R) {
            if (cellCheck(R-1,C-1) && cellCheck(R-1,0)) {
                // valid
                if (check_rotation()) {
                    count_valid++;
                }
            }
        } else if (y == C) {
            // finished fillign the row
            // don't forget to verify also the beginning of the row
            if (cellCheck(x,C-1) && cellCheck(x,0)) {
                backtrack(x + 1, 0);
            }
        } else {
            for (char ch = '1'; ch <= '3'; ch++) {
                board[x][y] = ch;
                bool valid = true;
                if (x > 0) {
                    valid = valid && cellCheck(x-1,y);
                }
                if (y > 0) {
                    valid = valid && cellCheck(x,y-1);
                }
                if (valid) {
                    backtrack(x,y+1);
                }
                board[x][y] = '?';
            }
        }
    }
    
    int solve()
    {
        board.resize(R);
        for (string &x : board) {
            x = string(C,'?');
        }
        seenBefore.clear();
        count_valid = 0;
        backtrack(0,0);
        return count_valid;
    }

Saturday, May 16, 2015

Cloudy Conway

I like twitter bots. Did you know that there's a sizable amount of twitter bots generating pictures algorithmically?

There's @fractweet , for example, keeps posting random mandelbrot pictures. Some of them can be pretty nice.

Another of my favorites is @greatartbot, it plays an art game by itself and posts the results. This generates a ton of colorful random pixel art.

A good thing about these twitter bots is they just work without your supervision and without you even remembering to run them. They keep making random pictures and posting them in twitter. When they have a following, there's even humans that can curate the resulting pictures for you.

I wanted to make some random art bot of my own. The big problem with these things is, if the method to generate them is too random and not interesting, the results will be a mess. If it is not random enough , the results might be very pretty but usually the same. You need some good balance.

Something that seems to generate hard-to-predict randomness that looks nice is Conway's game of life. So I wondered how to use its randomness for my silly at bot objective. Okay, most of the times you put a bunch of random live cells in a Conway simulation the eventual result looks like this:

But the interesting part is all the movement that results in this rather stable state. There's a good example in this page: http://knusper.net/knusperblog/2013/conway-s-game-of-life-in-python-numpy-matplotlib/

So I made a program, basically besides of the state of the cells, it also remembers the number of iterations that passed since each cell died. So we know the last iteration number in which each cell alive. Live cells have 0 in this value. So imagine that the total number of iterations was T. And the number of iterations a cell has been dead is X, if we divide X by T, we will get a number from 0 to 1. Now we will assign a different shade of gray to each cell. If the cell's value was 0, it would be black, if the value was 1, it would be white, but if the value was 0.5, it would be neutral gray...

It seems like a bunch of clouds. Apparently not very impressive. But to me this was perfect because it generates strange shapes and shades. We just need to add color. To add color, we turn this gray shade from 0.0 to 1.0 into a color spectrum. For example, let's make 0.0 Red, 0.5 Green and 1.0 Blue. This way 0.25 is Yellow and 0.75 is... a bluer green, I guess. This spectrum trick is what the Mandelbrot fractal from fractweet up there uses. Mandelbrot fractals are basically a fraud: All the beauty comes not from math but from picking colors for the values...


And this is really all of it. Just making different random choices for the number of colors in the spectrum and the initial live cells. Maybe also experiment with different sizes of Conway grids. Some additional parameters too. The result is a program that can generate fairly interesting pictures... A twitter bot that goes by the name @CloudConway.

So my first objective of making a twitter bot seems accomplished. But I think there's more to this whole method of generating interesting shapes from Conway's game of life. In the meantime, I plan to release source code for this bot when the code stops being embarrassing...

Monday, April 27, 2015

SRM 657

I guess I just solved the div1 easy. Tried the div1 medium, modulos, modulos , modulos. But I finished the div1 easy editorial, you can find it at http://apps.topcoder.com/wiki/display/tc/SRM+657. Or here, because really, why not:

The problem consists of 6 variables: E, EM, M, MH, H, they are the number of problems you can use in problem sets. Each problem set needs an Easy, A Medium and a Hard problem to be used. E,M and H problems MUST be used as easy, medium or hard, respectively. There are EM problems that can be either easy or medium and MH problems that can be either Medium or Hard. What is the maximum number of problem sets you can make without repeating any problem? Oh, and each variable is a integer as long as `10^18`.

Two decisions are involved in this problem:

  • Of the EM problems that can be easy or medium, how many will be assigned as medium? Let that number be `a`
  • Of the MH problems that can be medium or hard, how many will be assigned as medium? Let that number be `b`

When `a` and `b` are known, we can calculate the number of problems of each difficulty we have available:

  • `E + ("EM" - a) ` easy problems.
  • `M + a + b` medium problems.
  • `H + ("MH" - b) ` hard problems.

To have a problem set we need one of each kind of problems, so the maximum problem sets is `min(E + "EM" - a, M + a + b, H + "MH" - b)`.

Imagine we wanted to see if it is possible to have `x` problems in total:

  • If `E + "EM" < x`, no matter what value of `a` we pick, the number of easy problems would be too small.
  • If `E + "EM" >= x`, then there are normally enough easy problems unless `a` becomes too large. We can find the maximum allowed value of `a`: `M_a = x - E + "EM"`. Also `a` cannot be larger than `"EM"`
  • Similarly, `H + "MH"` must be at least `x` and `M_b = x - H - "MH"` is the maximum allowed value for `b`.
  • We just need `M + a + b` to be at least `x`, imagine we take the maximum allowed values: if `M + M_a + M_b >= x` then we can know `x` is a valid number of problem sets

Now note that if a value of `x` is allowed, then so will all smaller values and if a value of `x` is not allowed , no value larger than `x` will be allowed. So we can use Binary Search to find the maximum allowed value for `x`, this is the maximum possible number of problem sets.

    
long allowedX(long E, long EM, long M, long MH, long H, long x)
{
    if (E + EM < x) {
        return false;
    }
    long aa = std::min(EM, E + EM - x); //maximum a
    if (H + MH < x) {
        return false;
    }
    long bb = std::min(MH, H + MH - x); //maximum b
    return (M + aa + bb >= x); 

}

long maxSets(long E, long EM, long M, long MH, long H)
{
    // We have to maximize:
    // min(E + EM - a, M + a + b, H + MH - b)
    // where 0 <= a <= EM , 0 <= b <= MH
    long lo = 0; // we know 0 is allowed
    long hi = M + MH + 1; // we know this will never be allowed
    while (lo + 1 < hi) {
        long x = (lo + hi) / 2;
        if (allowedX(E,EM,M,MH,H, x)) {
            lo = x;
        } else {
            hi = x;
        }
    }
    return lo;
}

That was the solution in the editorial, but I prefer my lambda solution:

long maxSets(long E, long EM, long M, long MH, long H)
{
    // We have to maximize:
    // min(E + EM - a, M + a + b, H + MH - b)
    // where 0 <= a <= EM , 0 <= b <= MH
    return BinarySearch<long>( 0LL, H + MH + 1,  [&](long x) {
            if (E + EM < x) {
                return false;
            }
            long aa = std::min(EM, E + EM - x); //maximum a
            if (H + MH < x) {
                return false;
            }
            long bb = std::min(MH, H + MH - x); //maximum b
            return (M + aa + bb >= x); 
    });
}

It uses this Binary Search library function because it is funny to use functions as arguments.

// Library code
// takes lo, hi, P() such that:
// P(lo) is true
// P(hi) is false
// If P(x) is true and (x > y) then P(y) is true
// If P(x) is false and (x < y) then P(y) is false
// returns the maximum x such that P(x) is true
template<class D> D BinarySearch(D lo, D hi, std::function<bool(D)> P) {
    while (lo + 1 < hi) {
        D ha = lo + (hi - lo) / 2;
        if (P(ha)) {
            lo = ha;
        } else {
            hi = ha;
        }
    }
    return lo;
}

Friday, April 17, 2015

Codejam round 1A, problem B : Hair Cut

Okay, I participated in round 1A and utterly failed. A ton of factors got involved that made me waste time.

  • I thought of the solution for problem A, but when I started to code it I figured out I didn't actually set up my codejam test environment. I had a ton of setup to do to make my (worthless) solution multithreading support to work AND for it to work in gcc 4.8.2 with c++11 support.
  • I got really confused by problem A, apparently eating 1.5 (or any other fraction) mushrooms per second is valid... but I don't think that was really well explained... Also what's up with "constant ratio" when the ratio is definitely not constant? It becomes 0 at some times? Huh? And how about giving you an input example before describing what the input is?
  • I tried to solve B, my first solution was too slow, then I tried to find cycles in the simulation and it worked but I kept failing the small solution. There was a small bug - After doing the cycle stuff I forgot to return 1-indexed results... Overall what happened in problem B is I completely missed there's a very simple (but with caveats) solution until the last 2 minutes of the match, it was too late.
  • C small is just bruteforce and convex hull. Thanks to allowing colinear points (artificial difficulty at its finest :/) I actually had to do a ton of fixes to my convex hull algorithm which I haven't really used in years. My code was pretty awful and there were a couple of bugs in that old code that I had to fix.

Anyway, since B-small / B-large are the only problems I solved that are actually interesting. Here is an explanation for problem B large:

The time

I think the most important thing is to notice the problem is much easier when we know the time `t ` at which customer `N` will be served. Imagine we knew this time in advance. Then we would only need a single loop to find all the barbers that finish serving a customer at time `t`. Each barber `i` is a cycle, every `M_i` the barber serves a new customer. So barbers with `M_i` such that: `(t mod M_i = 0)` will serve a new customer at time `t`.

You might be thinking that the solution is the barber that serves a new customer at time `t` and has the smallest index. This is not entirely true. Multiple customers may be served at time `t`, and `N` is not necessarily the first customer to be served at that time. How can we approach this?

Imagine a function `b(t)` that told you how many customers are served with starting time less than or equal than `t`. Then `b(t-1)` will be the amount of customers that were served right until time `t`. And `b(t) - b(t-1)` is the amount of customers that start getting served exactly at time `t`. Now `N - b(t-1) - 1` customers will be served at time `t` before `N`. Which means that we are looking for the barber with the (N - b(t-1))-th smallest index among those barbers with `(t % M_i = 0)`.

The last piece of the puzzle is how to find `t`. It all comes down to `b(t)`, with this function we can do a binary search for the smallest `t` such that `b(t) >= N`, this is the time you want.

// Library code
// takes lo, hi, P() such that:
// P(lo) is true
// P(hi) is false
// If P(x) is true and (x > y) then P(y) is true
// If P(x) is false and (x < y) then P(y) is false
// returns the minimum x such that P(x) is true
template<class D> D BinarySearch(D lo, D hi, std::function<bool(D)> P) {
    while (lo + 1 < hi) {
        D ha = lo + (hi - lo) / 2;
        if (P(ha)) {
            hi = ha;
        } else {
            lo = ha;
        }
    }
    return hi;
}

struct solver
{
    int N, B;
    vector<int> M;
    
    int solve()
    {
                
        auto begunByTime = [&](long t) -> long {
            if (t < 0) {
                return 0;
            }
            long begun = 0;
            for (int i = 0; i < B; i++) {
                // how many times did it start?
                begun += t / M[i] + 1;
            }
            return begun;
        };
        
        // Is the start time for at least N users <= t ?
        auto enoughBegun = [&](long t) -> bool {
            return (begunByTime(t) >= N);
        };
        
        long t = BinarySearch<long>(-1, 10000000000000LL, enoughBegun);
        long old = begunByTime(t-1);
        long rem = N - old;
        
        
        //cout << "time = "<<t<<endl;
        // find smallest index of a server that starts at time t:
        for (int i = 0; i < B; i++) {
            if (t % M[i] == 0) {
                if (rem == 1) {
                    return i + 1;
                } else {
                    rem--;
                }
            }
        }
        assert(false);
        return -1;
        
    }
    void init() { }
    void read() {
        cin >> B >> N;
        //N = 1000000000;
        //B = 1000;
        M.resize(B);
        for (int & x : M) {
            cin >> x;
            //x = 1 + rand() % 100000;
        }
        
    }

Edit: Fixed bugs, oh and also `b(t)` is just the sum of a bunch of divisions. For each barber, you can find the number of times the barber has started serving customers before or during `t` by just doing a division, because for every `M_i` seconds, the barber tries a new user.

Saturday, April 11, 2015

Google Codejam 2015 Qualification Round : Part 4: C small

Part 1: Problem D in Emily

Part 2: Problem A in unnamed language

Part 3: Problem B in Julia

C small

Problem statement

Unlike the other Small problems I didn't really think of a solution of this right away. I had to take a day to do it.

The trick to the problem is that we can find in `O(n)` the product of the `x` first left-most numbers (for all `x`). And the same for the right-most numbers. So imagine if we had a substring `[a,b]`. If left[a-1] is equal to `i` and right[b+1] is `k` then all that we need to learn is if the substring is equal to `j`. If we can answer that question in `O(n^2)`, we have the whole problem solved. And we totally can answer that question! We just need to start with an `a`, then `b = a` and calculate the product for `b = a`, then it is easy to calculate the product when `b = a + 1`, just multiply one more number. And so and so for `b = a + 2`, etc.

The only extra complication is doing the multiplication with `i`, `j`, `k` and the signs... Well, really there are 8 distinct numbers, and we can easily index them and make a table, multiplying this index with this other index yields this other index...

So I tried free pascal. I used to code in Pascal, and I hated it. Now I don't really remember much. So I guess it counts? I had to relearn Pascal a bit. But I reached a solution... which failed. And then it failed again and again... I really thought the problem was in that big hardcoded matrix, and spent hours trying to think of a mistake there. Turns out that free Pascal's Strings are limited to 255 characters (What the hell, free pascal?). You have to use AnsiString for limitless strings.

program C;

function solution(s : AnsiString): String;
const
   CONST_I = 2;
   CONST_J = 4;
   CONST_K = 6;
var
   i, j, n, left, curr : Integer;
   right : Array[0..100000] of Integer;
   t : Array[0..7,0..7] of Integer = (
       (0,1,2,3,4,5,6,7),
       (1,0,3,2,5,4,7,6),
       (2,3,1,0,6,7,5,4),
       (3,2,0,1,7,6,4,5),
       (4,5,7,6,1,0,2,3),
       (5,4,6,7,0,1,3,2),
       (6,7,4,5,3,2,1,0),
       (7,6,5,4,2,3,0,1)
   );
   inputInts: Array[0..99999] of Integer;
   
    function ch2index(ch: Char): Integer;
    begin
        ch2index := CONST_K;
        if ch = 'i' then ch2index := CONST_I;
        if ch = 'j' then ch2index := CONST_J;        
    end;

begin
    
    n := Length(s);
    solution := 'NO';
    for i:=1 to n do begin
        inputInts[i - 1] := ch2index(s[i]);
    end;
    right[n] := 0;
    for i := n - 1 downto 0 do begin
        right[i] := t[inputInts[i]][right[i+1]];
    end;
    left := 0;
    for i := 0 to n-1 do begin
        curr := 0;
        for j := i to n-1 do begin
            curr := t[curr][inputInts[j]];
            if (left = CONST_I) and (curr = CONST_J) and (right[j+1] = CONST_K) then begin
                solution := 'YES';
            end;
        end;
        left := t[left][inputInts[i]];
    end;
end;


var
   X, L : Integer;
   w : AnsiString;
   ww : AnsiString;
   T : Integer;
   i,j : Integer;
begin
    readln(T);
    for i := 1 to T do begin
        Readln(L,X);
        Readln(w);
        ww := '';
        for j := 1 to X do begin
            ww := ww + w;
        end;
        //writeln(ww);
        writeln('Case #',i,': ', solution(ww) );
    end
end.

The end

I qualified! This was a lot more fun and far more stressful than usual.

Google Codejam 2015 Qualification Round : Part 3: B small in Julia

(Part 1: Problem D) (Part 2: Problem A)

After the disaster solving A-large, it became clear that I needed to do more than 2 problems to get an advancing score. But also both B and C seemed likely to be impossible in the languages I wanted. But I still didn't want to use my normal languages. I mean at least B-small seemed quite obvious. A typical problem we can solve with memoization. If we set the state to be equal to an array of 9 elements containing the number of eaters that have 1, 2, ... 9 pancakes. We need to notice that it never makes sense to create a plate that has more pancakes than the original plate from which we took the pancakes. Otherwise each minute just shifts the array...

Anyway, I decided to look for a new language to learn. But something a bit more usable than the other two. I tried Julia. It actually felt like cheating, because it turns out this is a pretty good language. There's a couple of things I don't like, like the 1-indexing. But it was overall a nice experience. Similar to python but with some differences. And apparently benchmarks say it is a bit faster. You can also make it less dynamically typed if needed. Anyway:

function solve(P)
    mem = Dict()

    function f(seq)
        if haskey(mem, seq)
            return mem[seq]
        end
        #println(seq)
        res = 1000000000
        if count(x -> (x != 0), seq) == 0
            # all zeros, base case
            res = 0
        else
            # let one minute go:
            a = [ seq[i] for i = [2:9] ]
            push!(a, 0)
            res = 1 + f(a)
            
            # move something:
            for i = [2:9]
                if seq[i] > 0
                    for j = [1:i-1]
                        #if j != div(i , 2)
                        #    continue
                        #end
                        for k = [1 : i - j - 1]
                            if seq[k] > 0
                                b = [x for x = seq]
                                b[k] -= 1
                                b[k + j] += 1
                                b[i] -= 1
                                b[i - j] += 1
                                res = min(res, 1 + f(b) )
                            end
                        end
                        # move to empty plate
                        if i - j < i
                            b = [x for x = seq]
                            b[j] += 1
                            b[i - j] += 1
                            b[i] -= 1
                            res = min(res, 1 + f(b) )
                        end
                    end
                end
            end
        end
        mem[seq] = res
        #print (seq)
        #print (" = ")
        #println(res)
        return res
    end
    
    seq = [ count(x -> x==i, P) for i = [1:9] ]
    
    return f(seq)
end

s = int(readline(STDIN))
for i = [1: s]
    n = int(readline(STDIN))
    q = split(readline(STDIN), " " )
    P = map( x -> int(x) , q)
    @printf "Case #%d: %d\n" i solve(P)
end

Coding this was easy except for googling how to do things. The logic I learned when learning python really helped.

Google Codejam 2015 Qualification Round : Part 2: A small/large

[Part 1]

My compiler

The other language I initially intended to use in codejam was an old compiler thing I coded in 2013. It was a school assignment that took me a couple of days to implement. It's just a C clone that doesn't even have arrays. But hey, at least it has integers. And can actually read. Although it can only read one integer, and write one integer. It doesn't have division either. But it compiles to 64 bits asm code which can be compiled to binary, so speed isn't a big issue.

Problem A - The one with shyness - Standing Ovation

I had to pick a good problem for this. But all seemed unsuitable. My first idea for problem A - Standing OvationWas to just bruteforce for the number of additional members. There's no reason to bring audience members who are shy, so their shyness should be 0. The maximum result is equal to Smax. For each candidate number of additional members, we can just simulate the process and check if everyone claps. So we can do this in an `O(S_(max) N)` loop quite easily.

For the large version, and also to solve using a language that has no arrays. We can use an on-line algorithm. We initially have `x` audience members with 0 shyness. We know those will be clapping. So we should worry about the rest. We encounter that there are `s_1` audience members with shyness 1. If `1 > x`, we need to invite one more audience member: `1 - x`. In fact, we can repeat this, now we know there are `max(1,x) + s_1` audience members that are clapping. We find that there are `s_2` members with shyness 2. And so and so.

The problem was that I needed to have a way to turn the neat input file into something my language can read. Then turn the language's output into something that looks like "Case #1: xxx". DISASTER, I actually learned that my asm code only does syscalls for reading bytes and then converts to integer and is not quite the same as a scanf or cin>>x... In fact, the whole thing really fails at reading files. Even ignored the format issues. I had to learn how to automatize having a terminal send keystrokes to my program for it to work. Apparently this is usually a thing that people need as there is a family of programs called expect that do this. Fortunately, after learning how to use expect I found that python has a pexpect module. This allowed me to create this nifty glue python program:

import pexpect
import sys

# call the solver program, return the result
def call_solver(smax, shy):
    try:
        child = pexpect.spawn ('./A')
        child.sendline('%d' % smax)
        child.expect( '.*' )
        for ch in shy:
            child.sendline ('%s' % ch)
            child.expect( '.*' )
        child.expect(pexpect.EOF)
        return int(child.before.split('\r\n')[-2])
    except:
        print "ERROR"
        return -1


T = int(raw_input() )
for i in range(1, T+1):
    s = raw_input()
    (smax, shy) = s.split(' ')
    smax = int(smax)
    print "Case #%d: %d" % (i, call_solver(smax, shy) )  
    sys.stdout.flush()

So now I am free to make my program using my language...

program Codejam2015A
{
    int n, a, s, i, cost;

    read n;
    read s;
    
    i = 1;
    cost = 0;
    //initial clappers (shyness = 0)
    while (i <= n) {
        read a;
        if (a >= 1) {
            if (i > s) {
                // must invite i - s
                cost = cost + (i - s);
                s = i;
            }
        }
        s += a;
        i = i + 1;
    }
    print cost;
    
    
}

Anyway, you can find the complete source code in the codejam score boards...

I submitted A-small, had a failure because it was slower than I thought, I tried to check if there was something wrong. It seems expect is slow in sending things.

I should have known that the reason it was slow would only get worse in A-Large, in fact, when I tried A.large, it turned out that it could barely solve one case per minute, even tho the program was, the glue stuff using expect was very slow, specially when there are 100000 line enters to send to the fake terminal... Failed A-large because of a time out.

Google Codejam 2015 Qualification Round : Part 1: D small in Emily

It's another google codejam qualification round and honestly, these rounds are getting old. We are supposed to spend a whole day solving problems, between the too easy Small inputs and the impossible Large inputs. And what for? Once you get enough points for the cut-off it's not worth it. Unless you planned to be connected at 19:00 and try to solve the problems as fast as possible, then your rank might actually mean something...

One time I had position 29 at a qualification round. Impressive! Except it's all fake. I didn't even qualify to the third round that year.

I actually might have some rants to make about the formats used in our competitions. Did you know that google found winning programming competitions is correlated with bad work performance? It's SCIENCE. (Supposedly). Although us fans of these contests might like to think there is some sort of mistake in this result of Google's data mining. We cannot just really say it's not true, without doing any actual analysis of the data used and how the conclusion was made.

Anyway, so I want to qualify to round 1, but I was bored of qualifications rounds. So I decided to make them interesting by doing something different to my routine of using c++ and trying to get a perfect score. Instead, I wanted to make special language choices. Use only languages that I designed myself or that I didn't actually learn before the day of the round.

Emily

Once I decided to use special languages. I knew that one of the languages to use today was Emily. It's a relatively new languages, so that's a bonus. Also because o> f luck I've been seeing many previews of its development and finding them interesting. Honestly tho I kind of made this decision without thinking a lot...

The day of the round I noticed I should start learning this language. I figured out a couple of issues. Emily doesn't seem to have a way to read input. Fortunately, it does have the ability to write things. Unfortunately, only strings that you cannot process or floats. For some reason, Emily in its 0.1 version only has floating point numbers. That was going to be a bit of a problem.

My initial plan was to only use Emily and the language I will mention next; and hope to advance using only them. *After* getting an advancing score, I could actually try to use other languages just for fun. When I opened the problems, this seemed difficult. All problems seemed to need arrays, or actual string manipulation or 2D arrays, which are worse to do in Emily. Eventually I noticed Problem D.

Problem D: Ominus Omino - (The one with n-ominos)

statement

When I read this problem, it seemed that the small input can be solved with a bunch of conditionals. And the results are just one of two strings. Conditionals are good news because at least they can be implemented in Emily. And we can print fixed strings. I would only need to use my text editor's regexes to turn input files into code that calls my Emily solution.

To solve this problem (The small version) you can just try all variations of R and C, and just try by yourself, is it possible to pick an n-omino that stops the other player from winning?

This is how my notebook looked after this:

The rest was just implementing everything on Emily. It was HARD. So many conditions. The interesting thing about Emily is everything is an unary function. And I mean EVERYTHING. This includes objects and numbers. Unfortunately, I couldn't think of a good way to think advantage of this in this problem.

\version 0.1

# Hi, I am using a language called Emily, it is in version 0.1 :/
# https://bitbucket.org/runhello/emily/wiki/Home


ALWAYSAWAY = "GABRIEL"
CANMAKELOSE  = "RICHARD"

min = ^a ^b (
    (a < b) ? a : b
)
max = ^a ^b (
    (a > b) ? a : b
)

case4 = ^R ^C (
    ( (max R C) == 4 && ( (min R C) >= 3 ) ) ? ALWAYSAWAY : CANMAKELOSE
)

case3 = ^R ^C (
    a = max R C
    b = min R C
    
    maxIs4 = ( (b == 3) ? ALWAYSAWAY : CANMAKELOSE )
    maxIs3 = ( ((b == 3) || (b == 2) ) ? ALWAYSAWAY : CANMAKELOSE )
    
    (a == 4) ? maxIs4 : (  (a == 3) ? maxIs3 : CANMAKELOSE )
    
)

even = ^X (
    ( (X == 0) || (X == 2) || (X == 4) )
)

case2 = ^R ^C (
    ( (even R) || (even C) ) ? ALWAYSAWAY : CANMAKELOSE
)

# woops, here was my mistake
case1 = ^R ^C (
    ALWAYSAWAY
)


mode = ^X(
   ( X==1? case1: X==2? case2: X==3? case3: X==4? case4: null ) 
]

solution = ^X ^R ^C (
    ( mode(X) ) R C
)


# Examples
println: solution 2 2 2
println: solution 2 1 3
println: solution 4 4 1
println: solution 3 2 3

I couldn't help but get carried away with the functional programming here. Case4 is a function that solves the case for tetraominoes, Case3 for tetraominoes, etc. mode is a function that picks which function to use according to X. I wanted this to just be an array, but apparently that's currently a bit difficult to do.

TCO 2015 Round 1A

I guess I advanced? Unless there is an implementation bug in my algorithms, which is quite possible. I know they are theoretically correct.

While solving the first two problems I kept having the feeling I was doing things the overcomplicated way.

250 points: The one with digits

You are given `1 <= L < R <= 100000`, find a pair `L <= a < b <= R` such that `a` and `b` have as many digits in common (ignore repeated digits).

So for a 250 in an "easy" round, this is not like the div2 easies. Can't just try all pairs `(a,b)`. In fact, the solution I used is quite complex (not hard, complex) and sounds like a division 2 hard thing?

The numbers will have at most 5 digits. So each number `L <= i <= R` will have at most `2^5 = 32` subsets of digits. The idea is to try them all and check if any number in the past had that subset of digits. Hey, it works.

// extract digits from x, call f(d) for each digit d we find.
void forEachDigit(int x, std::function<void(int)> f)
{
    while (x > 0) {
        f(x % 10);
        x /= 10;
    }
}

int maxsim(int L, int R)
{
    vector<bool> exists(1024, false);
    int best = 0;
    for (int i = L; i <= R; i++) {
        // create a bit mask of the digits in i:
        int mask = 0;
        forEachDigit(i, [&](int d) {
                mask |= (1 << d);
        });
        // for each subset of i's digits:
        for (int smask = mask; smask != 0; smask = (smask - 1) & mask) {
            // did we see this subset before?
            if (exists[smask]) {
                // if so, consider for maximum
                best = std::max(best, __builtin_popcount(smask) );
            }
            exists[smask] = true;  //whoops
        }
    }
    return best;
}

I did a wrong thing in my first submission, instead of marking exists[smask] = true for each subset I was marking exists[mask] = true for just the set...

500: The one with simulation

There is a graph in which all vertices have exactly one out-edge. You initially place tokens, at most one per vertex. In each step of the game, the tokens move to the vertex pointed by their out-edge. If at any time there are two tokens in the same vertex, you lose. Count the number of ways to place tokens without losing. The steps are repeated `K`<= 100000000` times.

So once two tokens end up in the same vertex, they will never separate again. So for example we can try simulating with a graph that has a token each. At the end of the simulation, some vertices will have multiple tokens. The trick is to notice that each of these tokens represent an initial vertex. If there are 4 tokens in a single vertex, it means there are 4 vertices that must all contain at most one token in total initially. So for each of the vertices at the end of the game, multiply (Number of tokens in vertex + 1).

The problem is that `K` can be pretty large. I used something similar to matrix exponentiation to simulate. But I think maybe it is possible to prove that you don't need to do too many simulations?

const int MOD = 1000000007;


// merge two simulation vectors
vector<long> merge(const vector<long> &A, const vector<long> &B)
{
    int N = A.size();
    vector<long> R(N);
    
    for (int i = 0; i < N; i++) {
        R[i] = 0;
        for (int j = 0; j < N; j++) {
            if ( (1LL << j) & B[i] ) {
                R[i] |= A[j];
            }
        }
    }
    
    return R;
}

int wayscnt(vector<int> a, int K)
{
    int N = a.size();
    // The "identity" vector each token in its initial vertex
    vector<long> I(N);
    for (int i = 0; i < N; i++) {
        I[i] = (1LL << i);
    }
    // B represents what happens after a single step:
    vector<long> B(N);
    for (int i = 0; i < N; i++) {
        B[i] = 0;
        for (int j = 0; j < N; j++) {
            if (a[j] - 1 == i) {
                B[i] |= (1LL << j);
            }
        }
    }
    // P[0] is what happens after one step
    // P[1] is what happens after two steps
    // P[2] is what happens after four steps
    //...
    // P[i] is what happens after 2^i steps 
    const int MAX = 30;
    vector<vector<long>> P(MAX, vector<long>(N) );
    P[0] = B;
    for (int i = 1; i < MAX; i++) {
        P[i] = merge(P[i-1], P[i-1]);
    }
    // with that in mind, we can find what happens after any number of steps
    vector<long> x = I;
    for (int j = 0; j < MAX; j++) {
        if ( (1<<j) & K ) {
            x = merge(x, P[j]);
        }
    }
    
    long res = 1;
    for (long y: x) {
        res = (res * (__builtin_popcountll(y) + 1 ) ) % MOD;
    }
    
    return (int)(res % MOD);
}

1000: The one with bipartite matching

Find the minimum total weight of edges to remove from a bipartite graph so that there is no perfect bipartite matching. o_O.

According to this: ItayGonshorovitzThesis.pdf, if we reduce the bipartite matching problem with flow and try to find the minimum cost to reduce the flow, that is NP-hard. Although with such a low constraint for `n`, maybe non-polynomial IS intended. I think that paper has interesting reductions for the problem so maybe it helps?