Saturday, April 14, 2012

Google Code Jam 2012: Qualification round

What a problem D. In an attempt to honor the difficulty spike, the first qualification had very easy problems, until problem D. I spent the whole afternoon coding a solution only to find out right now it was doomed to perish anyway.

Problem A: Speaking in tongues

The algorithmic part is easy. But you have to find out the actual rules to replace the alphabet. I was feeling lucky because it was just recently I participated in codeforces April Fools contest. Because I used the very same python script to quickly translate the input/output specification into a look up table. Some letters were missing, but you could get them from the statement. One letter was still missing, but since it was only one letter, there was only one possibility for what z meant...

I guess the point of this problem was to reward general problem solving skills.

Problem B: Dancing With the Googlers

This one looks like it was meant to test your basic knowledge . It was definitely the most standard of all the problems. B-small could have been solved by mere brute force. As for B-large, I used dynamic programming.

Let us say you have a score a+b+c. Is it possible its maximum is at least X assuming the triplet (a,b,c) is not surprising? Is it possible the maximum is at least X assuming the triplet (a,b,c) is surprising?. You can actually just build a table couldBe[a+b+c][p][surp] that is true if there is a triplet (a,b,c) such that its maximum is at least p, (a+b+c) is the sum and the triplet is surprising if and only if surf=1. Since individual judge scores are at most 10, it is easy to fill this table by actually trying all the triples possible.

After you get that table. You just need to do use the following recurrence: F(n,s, P), the maximum total of googlers with a maximum sub-score that is at least P if S of them were surprising among the N first googlers. So the overall result is F( total googlers, S, P). Now, let us say we want to solve F(n, s,P). We pick googler # (n-1). We know his score, and we can decide to assume his triplet was surprising or not (Assuming the previous table says the combination of total score, maximum and surprise is possible). This will lead to two sub-problems : F(n-1, S, P) and F(n-1, S-1, P) (S could have been reduced if you picked the googler to be surprising). If you solve both smaller sub-problems then you just have to pick the best between each of the choices.

Problem C: Recycled numbers

This seemed easier than B. I guess it just tests your basic optimization skills. C-small is easy and you can just iterate through all the pairs in the range and simulate everything. C-large is still easy: For each x, count the number of ys such that (x,y) is a valid pair. You can build all possible y, since you need to take a left portion of x. Then there are at most log(|x|) (the number of digits) values of y you can pick.

I hesitated for a second in C-large, because maybe my simulation was too slow (I used simple STL code rather than extracting digits using calculations, so constant factor was large). It was not, but I still used two cores just in case. It is great to have a template that can do split input into two processes easily in cases you have doubts-

Problem D: Mirrors of Hell ... Hell of mirrors ... Hall of mirrors

Update: This explanation is incomplete and the formula is wrong. Fear not, cause I have written a long tutorial about this problem, including D-large: It's here

I stopped feeling lucky. My initial plan for this match was to finish it in around 2 hours and then enjoy my perfect score. But this problem was just evil. Evil to think about, evil to implement, evil everything.

I spent a lot of last night trying to solve D-large, because I did not really know how could the 2*W+2*H-4 restriction in D-small would help make it easier. I was baffled, because many coders were solving D-small and not D-large, so it must have been a useful thing... Somehow, I missed for all this time that there was a condition that all of the border columns and rows were going to be full of mirrors. So in fact, D-small is easier because the shape is not irregular and grid-based, but it is just a rectangle.

Unfortunately, the rectangle business does not make the problem TOO easy either. Unless you are familiar with the way rays bounce inside a rectangle. My approach was to try a lot of them until I noticed something quite excellent. For every pair (bx, by) (bx and by could be negative but at least one is non-zero) there is exactly one way to make a ray that starts at (ANY position inside the rectangle) and ends at such position again such that the ray of light bounces bx times on a horizontal mirror and by times on a vertical mirror (Consider those special corners that always return the ray of light as just bouncing once in the vertical side of the corner and once in the horizontal side of the corner). More so, the total length of this trajectory can be found using a (rather clever) formula that you will need some magic to develop. Because of this, you can just pick all relevant pairs (bx,by) (You will later see that because of the constraints and the formula for the total length, abs(bx) and abs(by) can be bounded by 100 or even a lower value). Now, something tricky to understand is that bx and by can be negative. Whatever negative bounces means...

The formula... Let us say you are located at (x0,y0), you want to bounce on horizontal mirrors bx times and on vertical mirrors by times. Now let us imagine you want your trajectory to begin in the first quadraint (important). Draw in some paper how is it like to bounce 6 times on horizontal mirrors and once on vertical mirrors and then return to (x0,y0). You will see something amazing:

The actual formula for the distance depends on (bx,by), as you can see, it is similar to multiplying something that depends on bx * H and something from by * W. There are some conditions like what to do when bx and/or by are odd. In those cases, you have to sort of subtract the original position from the ending portion. All in all, the general formula looks like this: (The idea of using negative bx and by is so that you can make it work for any quadraint as starting trajectory).

             int dx, dy;
             if (bx & 1) {
                 dx = W * (bx- sign(bx) ) + 2*( (bx >= 0)?x0:(x0-W));
             } else {
                 dx = bx*W;
             }
             if (by & 1) {
                 dy = H * (by - sign(by) ) + 2*( (by >= 0)?y0:(y0-H));
             } else {
                 dy = by*H;
             }

Where (dx,dy) is the final point. So that the distance es equal to sqrt(dx*dx + dy*dy). You just need to pick the values of (bx,by) such that the distance is less than or equal to D... Ok, we lied, this formula works great in order to find bouncing shapes that finish and start on a point, but they may visit that point more than once and continue going. The fix to this is to make sure to only consider values of (bx,by) that are not a multiple of a value you already tried (The smaller pair will have smaller distance and the same trajectory, but will finish the first time it re-visits the point).

This is difficult to explain. I recommend taking a paper or a graphics environment and drawing some cases, it becomes clear eventually.

The final code looks like this: Note that I extract the rectangle from the grid and in order to avoid using floats, I multiply all coordinates by 2.

int W, H, D; 
char grid[31][31];

int gcd(int a, int b)
{
if (a < 0) {
return gcd(-a, b);
} else if (b < 0) {
return gcd(a, -b);
} else if (a < b) {
return gcd(b,a);
}
while (b != 0) {
int c = b;
b = a % b;
a = c;
}
return a;
}


#define sign(x) (x/abs(x))
int solveFinal(int W, int H, int D, int x0, int y0)
{

/* ____________________ (W,H)
| |
| |
| |
| (x0,y0) |
| |
|____________________|
(0,0)
*/

set< pair<int,int> > st;
for (int bx = -200; bx < 200; bx++) {
for (int by = -200; by < 200; by++) {
if ( !bx && !by ) continue;


int dx, dy;
if (bx & 1) {
dx = W * (bx- sign(bx) ) + 2*( (bx >= 0)?x0:(x0-W));
} else {
dx = bx*W;
}
if (by & 1) {
dy = H * (by - sign(by) ) + 2*( (by >= 0)?y0:(y0-H));
} else {
dy = by*H;
}

//2*sqrt(dx*dx + dy*dy) <= D

if ( (dx*dx + dy*dy) <= D*D) {
int g = gcd(dx,dy);
int old = st.size();
st.insert( make_pair( dx/g, dy/g ) );
if (old != st.size()) {
cout << "("<<dx/g<<","<<dy/g<<")"<<endl;
}

}
}
}


return st.size();
}

int solve() {
int x0,y0;
for (int i=0; i<W; i++) {
for (int j=0; j<H; j++) {
if (grid[i][j] == 'X') {
x0 = i; y0 = j;
}
}
}
//cout << W <<", "<<H<<" ; "<<x0<<","<<y0<<" ;; "<<D<<endl;
//Convert
x0 = 2*x0 + 1;
y0 = 2*y0 + 1;
x0 -= 2;
y0 -= 2;
W = 2 * (W - 2);
H = 2 * (H - 2);
D *= 2;

return solveFinal(W,H,D,x0,y0);
}

Failing at D-large

So, after finishing D-small, I had a lot of other things to do. Then I tried to solve D-large later in the afternoon. My idea was that using the logic from D-small, you can prove that you only need to try starting direction vectors (dx,dy) where each of the components is between -100 and 100 and dx and dy are co-prime. Then my idea was to simulate the light for each of those starting directions.

I first tried to use fractions, because I have been very worried about precision in this problem. I tried to use gmp. And although I managed to get it to work. Even simple tests were looking quite slow. It was going to be unlikely this would work well under the 8 minutes limit when I finished coding all the logic and stuff. So I tried to do it using doubles being careful about precision.

It took me a lot of time to implement correct logic for this. I extracted line segments and saved them in arrays. Extracted special corners. Squares that can block. I then tried to find the closest intersection between current point (X,Y) and something when using (dx,dy) as a direction vector. It is all equations but it gets tiresome.

When I noticed, 4 hours flew implementing all that stuff. And once I was ready to begin testing cases. There were tons of bugs. I used the last round of the match in trying to find all those bugs and fix them. But eventually, after the end of the match I think I found out that precision issues with my code were so evil that the light was getting really deviated from its real trajectory.

Too bad about not getting a perfect score. A lot of people did. I am thinking there might be a trick like in D-small that works with irregular shapes. Or rather, a general version of the trick.

Update: looking at the contest analysis, it seems there was an even more clever way to solve D-small that truly helped D-large.

5 comments :

Fred Coughlin said...

FYI, B could be solved greedily. For a given score, its scores given by the judges were unique in both the surprising and the unsurprising cases. Analysis of those cases leads to a simple formula for surprising and unsurprising. Then you can go through the scores, and greedily assign the surprising scores.

harshadura said...

awesome work, I actually tried only the first one.. hehe most one did it though

César Puerta said...

I wrote a very simple solution for B and I'm wondering if something might be wrong with it? Every other solution I've seen is much harder.

In order to have a best result of at least p with no surprises, you need a score of 3*p-2 or above. If you factor in surprises (only for a score >=2), you can then accept 3*p-3 and 3*p-4. The final solution looks like this:

int remainingSurprises = Surprises;
int count = 0;
foreach (var score in Scores)
{
if (score >= 3 * Threshold - 2)
{
++count;
}
else if (score >= 2 && score >= 3 * Threshold - 4 && remainingSurprises > 0)
{
++count;
--remainingSurprises;
}
}
return count;

vexorian said...

I think this is exactly Fred Coughlin's algorithm. From what he explained, it is probably fine. The tops and people that have been too long in these contests like me prefer to do dp and bruteforce - even though the amount of code looks longer, it is much easier to think of those solutions and they don't need proof. But it does not necessarily mean the easier solutions are wrong.

César Puerta said...

Well, I guess they are just the mathematical or computational views of the same problem :)