Tuesday, July 26, 2011

SRM 513

I ended up testing this match. The suspense regarding my eerily close to red rating will have to be postponed.

The editorial is up: Editorial

The quick write up has been removed because , well it wasn't that good.

Saturday, July 09, 2011

TopCoder open 2011 Round 3 Part 2: RowOfColors

(Link to part 1: The 250-pointer)

1000 : RowOfColors
I was lucky to be able to get a problem accepted as a 1000 pointer. 200 bucks :). Anyway, this problem initially started as an idea for (w,h,k <= 50). Then there is as solution that is O(w*w*k*k*h), if I remember correctly. The O(w*h*k) uses the same idea, but needs a small trick.

The first row will always begin with a given color. Note that this color may appear later in the row. Imagine we decided the color of this first cell and the other cells on the top row that will have that color:

C...C.........CC.C..

All the other rows will somehow cooperate with us to try as hard as possible for that color placement to be possible. Note that the C colors will HAVE to be connected, so there is going to be some sort of bridge connecting them. Since the other cells in the top row won't be of that color, the bridge has to go through the other rows:


C...C.........CC.C..
C...C...CCCCCCCC.C..
CC.CCCCCC......CCC..
.CCC................


But the rows need not to only contribute with us to allow the distribution of C colors, but also the distribution of all the other colors. This means that we should be careful not to use space that would be useful for any other bridge between the other colors. The required trick is to notice that the following is the best way to place the color in the other rows:

C...C.........CC.C..
C...C.........CC.C..
C...C.........CC.C..
CCCCCCCCCCCCCCCCCC..

Why? There is a bridge, but we are opening a lot of space for potential bridges between the first and second C and the second and third C. This actually solves the problem. Note that there are some empty spaces:

.111.222222222..3:44
.111.222222222..3.44
.111.222222222..3.44
..................44

These rectangles (1, 2, 3 and 4). Are now places in which we can place new colors. But note that, for example, the first cell in the top row of rectangle 2 will be able to get repeated, and that such color will not be able to jump to rectangle 3 or 1, because we cannot build a bridge. So, inside rectangle 2 we may have something like this:

Y222Y22YY
Y222Y22YY
YYYYYYYYY

Y is a another color but it follows the same rules. So, we can solve this recursively. At every recursion step, we pick the positions of the cells that will share the same color as the first cell. We then assign colors out of the K available for each rectangle and so and so. But that solution is very slow. We can optimize it by using two recursions and it will become the afore mentioned O(w*w*k*k*h) solution. That is still too slow.

The solution is to keep the recursion knowledge but to solve it with another method. First, acknowledge that the problem has become similar to parenthesis expressions. Each shape of colors starting from the first cell is a very recursive structure.

C...CY...Y..YYCC.C..
C...CY...Y..YYCC.C..
C...CYYYYYYYYYCC.C..
CCCCCCCCCCCCCCCCCC..

Imagine we decide the color to use in the first cell. In our previous work we picked in which other cells to place the same color. Not anymore, we will instead only decide if we will use the color again later or not:

C................... C?..................
C................... C?..................
C................... C?..................
C................... CC..................

In the second case, we would then add C to a list of colors that we need to use letter. Else we would continue to the next cell. There are k choices for color C.

Now, imagine that we are in another cell, and there is already a list of colors that need to be used again. We can decide to place a new color OR we can decide to place one of those colors that need to be used again. However, the order of the colors in that list matters. If we first added Green and then Red to this list, then we cannot place green again without first removing Red. In fact, this list should be treated as a stack, and since there is no choice on the color but to pick the first one on top of the stack, then we only need to remember the size of the stack.

If we decide to place the color from the top of the stack, we actually have two decisions, either we will still foresee to use that color again, or we will decide to stop using that color. In the first case we should keep that color in the top of the stack. Else we can take it out of the stack (decrease the size of the stack).

Even when there are colors in the stack, we may still want to place a whole new color. Due to the recursiveness, adding a new color works the same as when the stack is empty. There are k choices and we should decide whether to leave the color in the stack or not.

Note that the stack should never grow larger than h. In fact, we should never place a new color when the stack size is h, because there would be no space for the new color.

That is enough to find a recurrence that solves the problem:

// x: horizontal position in the row
// s: size of the stack
// k: unused colors left
int F(int x, int s, int k) {

if (x == w) {
// the row is full. Base case:

if ( (s==0) && (k==0) ) {
// if the stack had elements, then we wouldn't have fulfilled the
// plan to place those elements for the last time in the row.
//
// if k is not 0, then we didn't place all the colors that we had
// to place.
//
// Else there is exactly one way to finish this base case.
return 1;
}

result = 0;

if ( s < h ) {
// place a new color (plan to use it again)
result += k * F(x+1, s+1, k-1)

// place a new color (won't use it again)
result += k * F(x+1, s, k-1)
}

if ( s > 0 ) {
// place a color we planned to use again.
result += F(x+1, s, k) // will use it again

result += F(x+1, s-1, k) // won't use it again
}


return result;
}
}



That recurrence will obviously be too slow. If we do try to memoize it, there won't be enough space in memory for all w*h*k states. The alternative is to notice that we always use (x+1) in every called sub-instance of the recurrence. So we can use dynamic programming using a trick in which we keep only two tables. One table dp[s][k] keeps the results [x][s][k] for the current value of x and the other table tdp[s][k] keeps the results [x+1][s][k]. So, when calculating dp[][] we will use the results from tdp. Then we can move dp to the space used by tdp. In total we will need only O(h*k) memory which will fit the game.

It may happen that even after this optimization, your O(w*h*k) implementation does not pass. There are many ways to avoid the time out. For example, note that whenever we multiply by k inside the recurrence, we will first do it by (K) then by (K-1) then (K-2), ... etc. Where K is the original K from the statement and not the one used in the recurrence. We can just forget about multiplying by k and instead just multiply the final result of the dp by K! (factorial) in the end. This allows us to implement the logic using only 32 bits integers in the operations without worrying about overflow. Due to architecture, 32 bit operations are faster.

But a less lower level solution is to note that certain states with some values of k and s depending of x are not going to be reachable. For example, if x is too large then there will not be a good chance to decrease k and s until x becomes w. Combining both optimizations, the following Java implementation requires only 0.5 seconds in the worst case (in TopCoder's servers):

final int MOD = 1000000007;

public int countWays(int w, int h, int K) {
int[][] dp = new int[h+2][K+1];
int[][] tdp = new int[h+2][K+1];

for (int p=w; p>=0; p--) {
for (int stack=0; stack<=h; stack++) {
for (int k=Math.min(K,w-p); k>=0; k--) {
int res = 0;
if (p == w) {
//reached the end.
// but the stack should be empty.
// and all colors should be used.
res = ( ( (stack==0) && (k==0) ) ? 1 : 0);
} else if ( stack > h) {
// too many colors in the stack, which means there is no
// vertical space for them
res = 0;
} else {
if ( (k > 0) && (stack+1<=h) ) {
// we can try a new color.
// plan to use this color again
res += tdp[stack+1][k-1];
while (res >= MOD) res-=MOD;

// don't use it again
res += tdp[stack][k-1];
while (res >= MOD) res-=MOD;
}

if ( (k <= w-p+1) && (stack>0) ) {
// we could also just use the top color.
// will use it again.
res += tdp[stack][k];
while (res >= MOD) res-=MOD;

// won't use it again
res += tdp[stack-1][k];
while (res >= MOD) res-=MOD;
}
}

dp[stack][k] = res;
}
}
// move tdp to dp and viceversa
int[][] aux = tdp; tdp = dp; dp = aux;
}
// multiply by K!
long p = tdp[0][K];
for (int k=2; k<=K; k++) {
p = (p * k) % MOD;
}

return (int)p;
}

TopCoder open 2011 Round 3 Part 1: ArtShift

A common question I get is why didn't I participate in this year's TopCoder Open. The thing is that in order to submit problems to be used in the TCO you are not able to continue competing in the algorithm track. I made the scarily pragmatic choice of picking a decent probability of getting at least one problem approved for TCO over the very small probability to advance to the on site rounds.

The plan wasn't working out great until this week in which I finally got assigned a couple of problems for an online round (not qualification). Two of my problems were assigned to be the 250 and the 1000 pointers in this match.

This morning, I opened the 500 and the worry began that maybe the 1000 is too easy. It also seemed like the other writer had issues solving the 250, so maybe it was too hard as well.

250: ArtShift
Link to statement
I didn't really expect such a low success rate in this problem (51%). It turns out there was a good hidden story about corner cases that I was not aware off.

The main idea of this problem is to notice that in the "normal" case in which you have a sequence of n different colors. Let us say {1,2,..., n} Then the solution is:

Max( lcm(x1,x2,...) where x1+x2+...<= n )

Well, that's right. For example, imagine if the colors were {1,2,3,4,5}. Then an art shift that affects all 5 elements is : {1,2,3,4,0} and in total 5 different sequences will be reachable: {1,2,3,4,5}, {2,3,4,5,1} , {3,4,5,1,2}, {4,5,1,2,3} and {5,1,2,3,4}. Note that the shift just rotates the permutation by 4. And it causes a cycle between all five elements. Now, cycle is the great idea here. Imagine an art shift that was like {1,2,0,?,?} Then the cycle is only between the first 3 elements. The first three elements will cycle like this: {1,2,3,?,?}, {2,3,1,?,?}, {3,1,2,?,?}. For the remaining two indexes, we can make them belong to a cycle: An art shift {?,?,?,4,3} will make the last two indexes cycle like this: {?,?,?,4,5}, {?,?,?,5,4}, {?,?,?,4,5}, ... . If we combine the two ideas together we get the shift: {2,3,1, 4,5}, and the nice thing is that for each of the three parts of the first cycle, there will be two ways of the second cycle, in total we have 2*3 = 6 different sequences:
{1,2,3, 4,5}, {1,2,3, 5,4}
{2,3,1, 4,5}, {2,3,1, 5,4}
{3,1,2, 4,5}, {3,1,2, 5,4}

Now note that the order doesn't matter, we just need there to be two different cycles, one with three indexes and one with two indexes: {1,0, 4,2,3} will also allow 6 cycles. But if in a different case, one cycle had length 4 and the other had length 8. Multiply 4*8 would not work, because in fact we need the LCM(4,8) and not the product 4*8. So, for a total of n elements, we need to add the element to one cycle such that the LCM of all the cycle lengths is as large as possible.

Max( lcm(x1,x2,...) where x1+x2+... <= n )

Where lcm(a,b,...) is the Least common multiple of a,b,...

That solves the case with n uniquer colors. And it is also a common problem , so most people in the match probably knew it. But how does it help us in the problem where there are repetitions and only two colors?

The trick is to notice this: Imagine a sequence of all-white colors: WWWWWWWW . Then we cannot really make a cycle of size different than 1. But if we instead had WWWWWWWB then we can actually have a cycle of length 8. It turns out that all that matters is for there to be two different colors in the part of the sequence that we want to make a cycle. Assume that we made a cycle of length 2 out of WWWWWWWB, then the cycle would have to have used the sole black color and all that is left is 6 W: WWWWWW , we cannot do any cycle anymore, so we can only do at most one cycle of size larger than 1 in that case. Now, imagine WWWWWWBB, we can now do two different cycles: WWB and WWWB. Because this time there are two B characters. Since each cycle will need at least one white color and one black color, then the maximum number of cycles is min(count of W colors, count of B colors).

In fact, once we know that we can conclude that the answer is:

Max( lcm(x1,x2,...,xt) where x1+x2+...+xt and t<= min(count W, count B) ) <= n

All that is left to do is to find a quick way to get such a maximimum lcm. It turns out that bruteforce is fast enough. For example, a recurrence that remembers the current lcm value, the minimum unused number, the remaining number of cycles and the maximum cycle number we can use so far.


It turns out the second writer (All legendary soul-net) had issues with this problem because the original problem I mentioned earlier is a very well-known one and it is also well-known that the used cycle lengths would be primes or powers of prime numbers. But in the version with a limit on the number of cycles, this is not true. It also turned out to be the reason many people failed the problem. I really, didn't know about this. I was actually aiming to have a large success count, and thus if I knew about this I would have included a case that needed a composite non-prime power cycle length.

I initially thought that this problem was going to be a 500. The constraints were n<=100. In fact, bruteforce will work with 100 as well, but it needs some cropping.

(Part 2: The 1000 pointer)

Saturday, July 02, 2011

SRM 511: And thus the good times end

I haven't blogged in a while, so here it goes: SRM 511, my rating was 2072, there was a good chance I would become red, but on the other hand it has been a good while since I lost rating, so I knew I had to lose rating eventually.

I was very expectant of this SRM, I knew all the possibilities: I could become red, I could get a small rating gain, I could lose a small amount of rating or I could drop back to 19XX. IMHO the worst possibilities would be losing a lot of rating or getting a small rating gain. One because it would put me back to 2009, and the other because I would be in the same situation as today.

It was 11:15 AM, and I was still waiting. I sent mystic_tc some messages about the SRM voting results (Now they include the rank numbers) and then... My ISP went down. Gosh , I was really mad about that, I waited and waited, by 11:55 AM it seemed that it was not coming back, so I went for plan B and decided to go to a cybercafe. I don't like solving SRM in another computer. For starters, you have to use windows. Then you don't have your plugins nor a local compiler so testing takes ages. I have also personalized my local testing quite a big deal (for example, I rely a lot on valgrind). It is also difficult to get used to programming in other screens, and then we have loud people playing DOTA (Which BTW blows). It is an awful experience.

I nevertheless had to accept my fate and went to have a SRM.

Div1 500 - FiveHundredEleven
Link to statement
"Fox Ciel and Toastman" somebody is having fun... I thought. So, we have these cards. Each card does | (binary or) and when 511 is reached you lose. I was really out of ideas on this one. Tried many things during the match.

511 is a small number, so we can have something exponential on the number of bits to remember which bit has been hit. The problem is how to remember which cards are left. There are 50 of them. For some reason I just couldn't get a working recursion because of this detail. So the number of states is something like 511*2^50 or other optimizations that turn too small.

I look at the division summary and Petr has already solved 250 and 500... (Note that this was early in the match just "some time" after thinking about 500. This tells me that Petr found both problems to be straightforward o_O. Many submissions for 500 could come, so I better solve it if I want a good position.

The other idea was to try learning a greedy approach. Then each player could use it and you would not need dynamic programming (and thus a good way to save the state would not be needed). The problem is how to think of one and prove it. (I am not sure right now if such an approach is possible.)

If Greedy was needed, they wouldn't have gone with such a low constraint (511). So I kept trying to somehow optimize the number of states.

If we take a card 1110 (binary) and there are other cards 0001, 0010, 0111, then all the other cards could become 1,0,1 after you pick the 1110 card because we can ignore those bits that we turned to 0. This was the main idea I could get but I couldn't advance from it (to either think of something greedy or to optimize the state).

It is getting late. Division summary kind of looks like a fast 250 was going to be enough for a good position, so I switch. What scares me is that many people have high scores for 250, so speed was going to be important and I was going to be slower due to the lack of a local tester.

Div1 250 - Zoo
Link to statement
Cats and rabbits... yeah having fun... I am initially confused about this. The scores made it look like a problem that can be solved by many with 242+ pòints. This thing didn't seemed so easy...

...until I start to wonder if the heights are unique, re-read and they are! Knowing that the heights are unique, then the set of rabbit answers and the set of cat answers would each be in the form: 0,1,2,...number-1. Where number is the number of animals of that species. For example, three cats would HAVE to say {0,1,2} and 5 rabbits {0,1,2,3,4}. Then we can iterate for the number of rabbits r (then the number of cats is n-r). If there are r rabbits then the elements {0,1,2,...r-1} should exist in the input. IF there are c cats then the elements {0,1,2,...c-1} should exist in the input, but an animal cant be both a rabbit and a cat. So, for each i less than min(r,c) there should be TWO elements in the input equal to i. Each of these elements could be a cat or a rabbit, so we have to multiply by two. Finally for each i from min(r,c) to max(r,c)-1 there should be EXACTLY one element equal to that i. For each valid pair (r,c) we add 2min(r,c), that's the result.

Fun fact: Second Div1 250 that does not need a 64 bit integer as the result yet it uses it anyway.

I was lucky that my first code turned out to compile right away and pass all example cases. I submitted it. I would say that not having a local tester did not affect my score much. I got 227 points, I would say tht with a local tester I would have gotten 228 at most because I wouldn't have to manually click the test button so much to try all examples. Not a big deal.

Challenge phase
By the time I ended 250, everything was looking badly. A lot of people ended up submitting 500. I kept trying but I couldn't think of anything. My rank seemed very low before the challenge phase. I tried to prepare challenge cases. I noticed that The div1 500 - as usual - had very weak examples. For starters, besides a few special cases Fox Ciel always wins in thos cases. So I thought, what about a non-trivial case in which Toastman wins? I ended up trying 510,510,511,511.

When the challenge phase started, I opened my first 500 hoping to see a trivial mistake. Then I saw this state [511][2]. And that was when the solution to 500 finally hit me, but that's another topic.

I opened someone's 500 and noticed that he tried some sort of greedy thing. It looked odd. So I tried simulating 510,510,511,511 on it mentally, and it semeed that it would return "Fox Ciel". So I was about to challenge when I hesitated - Maybe I did not read something correctly. I re-read and while looking a way someone else already challenged it.

The match ended and I lost rating, but not that much, I think this was one of the best results possible given the circumstances. I am also glad that the ISP issues didn't affect my result much (This was really my fault for not solving 500).

Back to Div1 500 - Zoo
The idea was actually very simple. When you pick 01110 and there are cards 00101 and 11101, then they can become 1 and 1 and we ignore the other bits, remember? Well. In fact, this also means that they could become 01111 and 11111. Or rather, we can just represent the state as the current total bit-wise or current. Then all cards would get affected by it. How do you know which cards you have picked yet? All those that are subsets of current could have been picked before. For example, we could just apply current to all the cards. After picking 01110 the other cards become 01111 and 11111, the cards that are equal to current after doing this could be used cards, but maybe some of them are not used and are just unused cards that turned "useless" (Cannot change the state if you use them).

In order to handle useless cards, note that once cards are made useless, their values do not matter, only the number of useless cards. The number of useless cards can be found if we remember the turn number. Turn number - number of cards that could be used = useless cards. We can use 'use' one of these useless cards by increasing the turn number. So, a recurrence (number of turns, current) that returns ture if the next player to pick a card loses can solve the problem.


Sorry if the blog is more vague than usual, I am still without an internet connection at home, so I am doing this once again from another computer :(.