Monday, September 24, 2012

Surprise test SRM the 25-th (Tomorrow!)

http://apps.topcoder.com/forums/?module=Thread&threadID=762941&start=0
After the TCO, TopCoder plans to start releasing functionality updates and bug fixes into both algorithm and marathon competition software. This will be done via sofware competitions, so in addition to algo/mm systems becoming better, those of you active at software side will get a chance to win some money for helping us make them better.

Last several months were devoted to different preparations in order to make it possible (some of you could have noticed assembly contests we've run for that purpose). People who previously developed those systems are no longer working at TopCoder and it happened so that large part of knowledge about them was lost. Even putting together a code to be used for further updates presented a certain challenge, which now is close to be solved.

At this point we think we have a codebase which resembles the current competition systems very closely (we know about a couple of minor differences, but almost everything is the same). While we've done quite a lot of testing, we still feel it would be useful to have even more extensive testing.

Therefore we would like to deploy this new code temporarily instead of the current systems and use it to conduct a test SRM. The more people compete in this, the more feedback we will be able to gather, so we invite and kindly ask all of you who can take part to do this tomorrow.

Some more details:

- We will reuse problems from some old SRMs (between 200 and 400). Exact problems to use will be chosen more-less randomly.

- The exact time of the SRM is not yet known, since we exactly don't know how much time it takes to do the deploy and some difficulties can potentially arise during this process. In fact, it's even possible that we won't be able to conduct this SRM tomorrow due to some unforeseen difficulties, though of course we'll do our best to conduct it. We will announce the exact start time in this thread at least 2 hours before the actual start. It is almost certain that start time is 2 PM or later.

- The SRM is not rated. We can't guarantee that SRM runs smoothly, since the code has not been tested very extensively and in fact, it is the purpose of this SRM to find out problems (hopefully and most likely, minor ones) with this new code.

If you have any questions, please feel free to ask here.

Codeforces #140 (Div 1)

Another attempt to finally participate in codeforces again. This was more successful than the last one because for once I have registered for the match.

Problem A : Flying Saucer Segments

Link to statement

This reminds me of Hanoi towers although it is a little different.

We got n aliens in segment 1 that we want to take to segment 3. Define the function that solves that f(n). It is better to try to move the one alien with the smallest rank to the third segment first.

We cannot move the lowest-rank Alien to segment 2 until we move all the other Aliens to segment 3. The solution for this step is f(n-1). As that is the cost to move n-1 Aliens, since the other Alien has the lowest rank, it will not affect their costs.

We can now move the lowest-rank Alien to segment 2. But now we will not be able to move it to segment 3 until we move the other (n-1) Aliens from segment 3 to segment 1. By symmetry, the cost will also be f(n-1). Then we can move the smallest Alien to segment 3 (cost 1).

We now got a very useful state, the lowest-rank alien is in segment 3, and the other aliens are in segment 1. We need to move the remaining aliens to segment 3, add f(n-1) again.

In short, the formula to solve f(n) is : 3*f(n-1) + 2. The base case is f(1) = 2 (Two steps to move a single alien).

The problem is then about calculating f(N) using that recurrence. Believe it or not I found myself having the most issues with this. A formula is be: (30+31+...+3n-1)*2. But I prefered to use a linear recurrence (and matrix multiplication) during the match.

I took way too long to solve this problem.

Problem B : Naughty Stone Piles

Link to statement

This one required a bit more thought. Anyway, imagine if K was at least n-1. Then it would be best to move the n-1 smallest piles to the largest pile. This ensures a minimum possible cost equal to the sum of the n-1 smallest piles.

With more imagination, imagine that k is exactly the half of n-1. At the end, you will move k piles containing all the stones to the largest one. This is still the sum of the n-1 smallest piles. However, in order to reach a state with k piles, you first need to move k piles to merge the piles into a single group of k piles. The piles you move in this phase will have their cost repeated in the total cost. It is best to pick the k smallest piles, move each to any of the other k piles and the total cost will be: 0*(largest pile) + 1 * (k largest piles that are not the largest ever) + 2*(k smallest piles).

In fact, the maximum number of piles you can move in the second phase is k*k. To put it simply:

  • The last phase involves verifying that the largest pile now contains all stones (Cost 0 * (size of largest pile) )
  • The second-last phase involves moving k piles to the largest pile. The total cost is: 1*(total size of the n-1 piles).
  • The third-last phase involves moving k*k piles. to other k piles. The cost is sum(of the n-1-k smallest stones).
  • The fourth-last phase involves moving k*k*k piles. to other k*k piles. The cost is sum(of the n-1-k-k*k smallest piles).

And so and so, there is our algorithm. The largest pile will cost 0. The k next largest piles will each be added only once. The next k*k piles will each be added twice. The next k*k*k will each be added thrice and so and so...

You can calculate the cost in O(log_k(n)). Note that k can be 1, which turns it into O(n). The total complexity is O(q*log(n)) as long as you make sure not to calculate the case k=1 more than once.

int n, q; 
long a[100000];
long acum[100001];
long k[100000];
long res[100001];

long solve(long K)
{
long res = 0;
long N = n;
long p = 0;
long r = 1;
while (N > 0) {
long hi = N;
long lo = std::max(0LL, N - r);

// p times the sum of the stone piles between lo and hi.
res += p*(acum[hi] - acum[lo]);

N = lo;
if (r > N / K) {
//we don't want overflows here...
r = N;
} else {
r = r*K;
}
p++;
}
return res;
}

void solve()
{
memset(res, -1, sizeof(res));
sort(a, a+n);
acum[0] = 0;
for (int i=0; i<n; i++) {
acum[i+1] = a[i] + acum[i];
}
long whenOne = solve(1);
// when k=1, solve(k) is linear in complexity, you do not want it
// to be executed many O(q) times...
for (int i=0; i<q; i++) {
if (i > 0) {
cout << " ";
}
cout << ( (k[i] == 1)? whenOne : solve(k[i]) );
}
cout << endl;
}

I went to have lunch between figuring out the solution and implementing it. Slow to solve is today's theme.

Problem C : Aniversary

Link to statement

Did not really have much time to think of a solution for this. Google Gibonacci GCD and you will find out that there is a very interesting property about the GCD of Fibonacci numbers. In effect, since the GCD of Fibonacci numbers is equal to the X-th Fibonacci number such that X is the GCD of the indexes of the Fibonacci numbers (And also because the larger the index, the larger the Fibonacci number) then the problem becomes about finding the largest GCD of a k-subset of numbers between l and r, inclusive. Then use this largest GCD as an index for the Fibonacci function (if the value is large, you can use Matrix multiplication to find out this large Fibonacci number modulo m).

Could not solve the sub-problem.

Let us see how it goes.

Opinions, corrections, complaints, etc

Feel free to post comments.

Friday, September 14, 2012

Learning to find bugs

In SRM 556, I made an algorithmic mistake in a problem that completely ruined the day (although the failed challenge was a larger factor, BTW, I found out why my challenge did not work out. It turns out that the solution I tried to challenge had two major bugs, but I only noticed one, but in the specific test case I gave it, the second bug made it work.

I keep saying this. After a match, ask yourself what should you have done to solve one more problem than the amount of problems you solved correctly? In case of failing system tests due to a bug that amounts to modifying a single line of code, then the answer is "find the bug before the end of the coding phase". But how?

Well the answer is in remembering the match. I actually submitted the first two problems quite fast yesterday. So I had like 40 minutes to do basically nothing. I tried to put some interest in div1 1000. But I kept having the feeling I had to find bugs in the other solutions just in case. I kept reviewing the codes and not finding anything. I think that was a mistake. Because we can do better than that to find bugs.

Bruteforce

In the aftermath, I kept reviewing the events. I noticed that it is not difficult to make a bruteforce solution for this problem that works in all right speed for (number of digits) <= 17.

It is not difficult. After placing the first digit card. Then you have two choices for the side at which you place the following cards. So if there are n cards, there are 2n choices of numbers you can make. A single bruteforce using bitmasks can simulate all the cases. Then you can just compare the generated strings and find the one that works better as a result.

// Brute force solution. 
string minNumberBrute(string digits, string lowerBound)
{
string best = "z"; //represents a very large number...
// in string lexicographical comparisons
// "z" is larger than any chain of digits.
int n = digits.size();
// try a mask for the 2^(n-1) choices
for (int mask=0; mask < (1<<(n-1) ); mask++) {
string x = string(1, digits[0] );
for (int i=0; i<n-1; i++) {
// depending on the choice, put the digit left or right
if ( mask & (1<<i) ) {
x += digits[i+1];
} else {
x = string(1, digits[i+1]) + x;
}
}
// compare and remember the best.
if (x >= lowerBound) {
best = std::min(best, x);
}
}

return (best == "z") ? string("") : best;
}

Generating test cases

The idea is then to make a program that generates random test cases of 17 digits runs the solution I submitted and the brute force solution and compares the results. If they are the same, repeat. Else output a message saying that the program found a bug.

In c++, use the srand() function to initialize a random number generator. It is very convenient to always use a fixed seed. This way you can repeat the experiment in case something went wrong. (It could happen that your brute force implementation had bugs).

Then use rand() function to generate each of the 17 digits in each of the strings. There are two ways to use rand() to generate integer numbers within a range: A correct one, and an "almost" correct one.

You might use %. Like this page describes: http://www.cplusplus.com/reference/clibrary/cstdlib/rand/, but keep in mind it is not really uniform. If you really want uniformly random then you will need to use something like (int)(X * (rand()/(float)RAND_MAX) ). I just used %. With rand()%10 you get a number from 0 to 9, add it to '0' and you get a random digit.

Some critical thinking please

If you find a mismatch between your solution and bruteforce , it does not always mean you found a bug in your solution. It could be a bug in bruteforce. This is the problem with this idea. You have to develop and debug twice and brute force solutions are not always very easy to implement.

There is also another problem. Random test cases might not be the best way to find bugs in your specific code. Perhaps your code needs something very specific to fail. The alternative is to, instead of using random, try making up your challenges yourself. But this requires you to already suspect that a part of your solution might be wrong. It might also happen that your bug is only found when the size of the input is large, and in that case, you cannot use brute force to test...

Write it down

If you truly found a bug. Then write the case you found down. Once you understand the bug in your solution, resubmit and keep it in mind at the time of the challenge phase. Maybe this information will be of benefit...

Results

Ok, so I tried to measure how well this would help me to find a bug in my failed 500 yesterday. For this specific bug, it took only about 50 random test case attempts before it found a mismatch. I also found out that 2 of the 3 other guys who failed system tests in my room would fail the case I found. I think next time that I have so many time after solving div1 500 I will do this again, because trying the div1 1000 out rarely pays.

Thursday, September 13, 2012

SRM 556 : #@!$%!

I am getting tired of this. Every latest SRM is the same situation. Approachable div1 easy and div1 mediums thus everyone solves them. But I take too much time to solve the easy a lame mistake stops me from solving div1 medium. Thus I end up with very low score whereas most people have two problems. Rating is killed. Over and over and over again. Can we go back to the times with very hard problems? I am starting to miss them. At least when the div1 easy is hard a low score can still allow you to keep your yellow rating.

Div1 easy

Nothing to see here. Just a BFS. I took very long because I used the wrong variable in one index.

Div2 medium

You are given a stack of digits. Initially, place the top digit on a table. Then place the top of the remaining digits either to the left or right of the digit on the table. Repeat placing each top digit to the right or left of the group of digits in the table. When you are finished, a number will be generated.

The number should be the smallest possible number you can make that is larger than or equal to lowerBound. If it is impossible to make a number larger than or equal to lowerBound return ""

So, at first I thought it was a typical [use dp to build numbers by digits] problem. Then I noticed that the top rule makes it a bit harder than usual. Then I noticed that you can still solve it like that.

It is important to make sure to make a number greater than or equal to the bound. Thus we need to somehow add numbers from left to right. But there is a catch, at some situations you might to place a digit to the right. More formally, let us do the opposite. Instead of placing the top card in the center of the table, think of the problem as first placing the bottom-most card either to the left side or the right side of the available space. Repeat until you run out of cards.

Thus we start from the largest index of the stack of digits, and we have a decision to make: We can place it left or we can place it right. There are some details to cover. We need to remember if the number we are building in the left side is already greater than lowerBound (If it is greater than, then we can place any digit on the remaining left side, else we can only post digits greater than or equal to the specific digit position in lowerBound. Thus we will call a variable greater.

The other catch is that, when placing digits to the right side, they might be smaller than the specific digit position. Note that this means that later moves should be done in such a way that the number is guaranteed to be greater before the right-most index is reached. This will be the variable mustGreater.

In the base case, we will reach a moment in which all available positions to the left and right are used - except a last one. In this case, we will have only one digit left. We must ensure that the digit follows the rules (If the digit is not greater yet, then the digit must be greater than or equal; Also make sure that if mustGreater, the number should be greater after placing this digit.

This is where my mistake was. I missed one special case (and I blame the examples for being so simple that this was not reached). What if mustGreater is true, but we place a greater digit to the right side? This means that the mustGreater condition has already been fulfilled. But after this, notice that if one of the following digits placed to the right side is again smaller, then mustGreater must be set back to true.

yadda yadda yadda, this should translate to a dp:

struct LeftRightDigitsGame2 
{
int n;
string digits, lowerBound;
string dp[50][51][2][2];

// How to fill the remaining interval [a, b[ if:
// The left side is already greater <==> (greater)
// The right side must be precceeded by a greater number <==> (mustGreater)
string rec(int a, int b, int greater, int mustGreater)
{
string & res = dp[a][b][greater][mustGreater];
// We are using memoization, at first all contents of dp are "".
// but that is invalid because the result is either z or at least one digit.
if (res == "") {
if (a+1 == b) { // base case
// the position of the next bottom-most digit we have not
// used yet.
// We have used (a) digits for the left side and n-b digits for
// the right side.
// index 0 is the top, so do it in reverse...
int p = n - (a + n - b) - 1;


// "z" is the lex greatest string, a sort of infinite.
// Lex smallest string is the same as smallest number in this case.

// If left side it is already greater, then we can place any digit
if (greater || (digits[p] >= lowerBound[a])) {
// as long as the digit follows the mustGreater condition...
int ngreater = greater || (digits[p] > lowerBound[a]);
if ( !mustGreater || ngreater) {
res = digits[p];
} else {
res = "z";
}
} else {
res = "z";
}
} else {
res = "z";
// p: same stuff as above, really...
int p = n - (a + n - b) - 1;

// put digits[p] to the left side.
// - a is incremented.
// Make sure we only place smaller digits if left side is already "greater"
// maybe upgrade greater if we found a good case for that.
//
if (greater || (digits[p] >= lowerBound[a])) {
int ngreater = greater || (digits[p] > lowerBound[a]);
string tm = rec(a+1, b, ngreater, mustGreater);
if (tm[0] != 'z') {
res = std::min(res, string(1, digits[p]) + tm );
}
}
// put digits[p] to the right side.
// - b is decremented
// - If digits[p] is smaller than bound, then something before must be greater.
// - But if digits[p] is larger, then it can cancel a previous requirement.
// - But if some other digit in the future is smaller, we will need the requirement again
//
int nmust = (mustGreater || (digits[p] < lowerBound[b-1]) ) && (digits[p] <= lowerBound[b-1]) ;
string tm = rec(a, b-1, greater, nmust);
if (tm[0] != 'z') {
res = std::min(res, tm + string(1, digits[p]) );
}

// recursion works like that, if the decision leads to a valid case
// then you have the smallest number to fill the remaining interval...
// so append the digit you placed (to the left or the right) to
// generate the real candidate for smallest number. Take the minimimum
// out of both numbers.
}
}
return res;
}

string minNumber(string digits, string lowerBound)
{
this->n = digits.size();
this->digits = digits;
this->lowerBound = lowerBound;
string tm = rec(0, n, false, false);
return (tm[0] =='z') ? string("") : tm;
}
};

Div1 1000

Seemed interesting. My first idea was terribly wrong. Like a Max flow giving an capacity between source and a1 and between a2 and sink. And bn capacity between source and b1 and between b2 and sink. This approach was obviously wrong (because a path between b1 and a2 would count as flow sometimes). But maybe there was a solution to it?

Writer mentioned that the intended solution involves making compound nodes? Like (x,y) where x is the current position starting from a1 and y is the current position starting from b1. Maybe the writer meant something else. But this does sound like a good idea. hmnn.

Challenge phase

I discovered a code with a bug. But somehow the test case I crafted did not catch the bug. I have to investigate further once statistics re-appear (must be something silly). So to the top of thing, I lost 25 points in challenge. If I failed div1 easy as well, then the negative score would have kicked me to the worst rating in years. It is rule #1: DO NOT challenge.

Any questions?

As usual, feel free to place comments, questions, rants, opinions and corrections in this post's comments section.

What an awful sequence of terrible matches for me. No, the problem sets were all fine in these recent SRMs. But this situation is killing my rating. I am really mad this time, because without the hindsight in div1 500 I would have been in top 20. Without the failed challenge in top 150. And if I actually did the challenge correctly (the code WAS wrong) in top 100.

SRM 554 div1 hard: TheBrickTowerHardDivOne

Link to problem statement

We got infinitely many bricks of dimensions 1x1x1 and C different colors. Count the number of towers of size 2x2xX such that X is between 1 and H, inclusive such that there are at most K pairs of adjacent bricks (bricks whose faces touch) that share the same color. K is at most 7. In division 2 was H<=47 and division 1 has H<= 474747474747474747. In division 2, C is at most 4 whilst in division 1, C is at most 4747.

First solve the division 2 version

There are multiple explanations for the easy version already available. In the TCO blog I already explained my variation. I also posted my code for this problem. There are other explanations in this topcoder thread.

It is necessary to understand the div2 solution. The div1 solution is basically the same idea but very optimized. It is a dynamic programming solution. Let us make a recurrence f(c0,c1,c2,c3,k, h). c0,c1,c2 and c3 represent the colors of the bricks we placed before (In the initial case, let us say these colors are "imaginary", to represent an empty tower). It is also very convenient to imagine that c0,c1,c2,c3 are ordered in clock-wise order. k is the number of remaining opportunities for new pairs of adjacent bricks that share the same color and h is the maximum height of the towers we can place on top of the c0,c1,c2,c3 bricks.

If c0,c1,c2,c3 are not imaginary, then we can choose the "empty" tower to place on top of the bricks. Else, there are not really any bricks and the empty tower does not count.

We can then just pick some new colors (c0',c1',c2',c3') after choosing these colors, there may be new pairs of adjacent bricks with the same color (for example: c0 with c0'). The new bricks may also share colors with each other. This is where imagining the indexes 0,1,2 and 3 to be in clockwise order is very useful.

So, 0 is adjacent with 1, 1 with 2, 2 with 3, 3 with 0, x is adjacent with (x+1) % 4. Once you count the number of new pairs that are adjacent, subtract them from k and you find a new k'. The recursion step is clear now, move and call f(c0',c1',c2',c3', k', h-1). Of course, the base case is when h=0, then only the empty tower is possible. You have to add up all results for each tuple (c0',c1',c2',c3') we can generate. The initial case is f(-1,-1,-1,-1, K, H).

When H is very large

In the division 1 version, H is very large. C is also quite not small. So large that we probably want to have a O(log(H)) factor into the total complexity. To be honest, I think most coders of the lower-than-medium div1 level can tell right away that this is going to be a Linear recurrence with matrix exponentiation problem. I know I did.

The trick is that when you have a linear recurrence in your dynamic programming, you can sometimes get rid of a factor and turn it logarithmic. So when one of the input variables can be very large, maybe we are going to need logarithmic time.

But to be honest, I am lately starting to think that instead of seeing these problems as a linear recurrence thing it might be better to see them as a "count paths of a certain size in a graph" thing. This time I will start by explaining the trick to make it logarithmic.

H is the one input variable that is extremely large. So let us consider it separate from c0,c1,c2,c3 and k. Let us call each tuple (c0,c1,c2,c3,k) a state. There are many states in a instance of this problem. There is a special one (-1,-1,-1,-1, K) which is the starting state (we have not placed any tower yet). There are others... You can tell that the number of total states will be very large: 4747*4747*4747*4747*8, but we will for now imagine this is not the case (In a later section, we will try to think of ways to optimize this, many states can be classified as really the same state). So, let us assume that the number of states is going to be quite small...

Let us now turn the problem into a counting paths problem... The starting state will be state 0 (which represents (-1,-1,-1,-1,K) ). From it we will move to many other states like (0,0,1,2, K-1). There is an edge connecting the starting state and those states. Now here is the deal, each of these states may represent a valid tower (height 1). What we need is a [final] state so that the problem is about counting the number of ways to reach the final state starting at state 0. So we will make every valid tower have an edge connecting them to the final state.

If we count all the paths of size 2 between state 0 and the final one, we will have the number of towers of height 1.

Each of the states other than 0, also have many other states they can spawn from (0,0,1,2, K-1) we could move to (0,1,0,1, K-2), this is another edge. Now let us consider the path that goes through 0 -> (0,0,1,2, K-1) -> (0,1,0,1, K-2) -> final. This path has size 3, but it represents a tower of height 2. (In the bottom of the tower, place colors 0,0,1,2, then on top of them, place 0,1,0,1.).

Now, there might be cycles. For example (0,1,2,3, K) can be connected with (1,2,3,0, K) as no adjacent bricks will touch colors ( (1,2,3,0) on top of (0,1,2,3) ). Then (1,2,3,0, K) might be connected to (0,1,2,3, K) and so and so.

Consider the path 0 -> (0,1,2,3,K) -> (1,2,3,0,K) -> (0,1,2,3,K) -> (1,2,3,0,K) -> (0,1,2,3,K) -> (1,2,3,0,K) -> (0,1,2,3,K) -> (1,2,3,0,K) ... > final. This is a tower of very large height. But we can count it.

In order to count the towers of height h count the number of paths of length (X = h+1) connecting state 0 and the final state (let us assume the index of this final state is 1). Note that we need logarithmic time, height can be very large. This is a known problem. Let us assign indexes to each state, state 0 is inicial, state 1 is final, then all the remaining ones have some index... Let us make an Adjacency Matrix. Effectively the adjacency matrix already tells us the number of paths of length 1 between each pair of indexes. Raise the matrix to the X-th power... and voila! The resulting matrix will give you the number of paths of length X. It is like magic. Well, it is just an application of the linear recurrence with matrix exponentiation I mentioned up there.

With Repeated squaring, we can raise a matrix of t x t elements to H+1 power in O(t3 log(H)) time. We already assumed t (the number of states) will be quite small.

But we wanted AT MOST H

Yeah, that is a problem. We are currently the number of towers with exactly a given height, but the problem wants the towers of height at most H. I have recently dealt with this in the editorial for srm 550. Simply do this other magic trick:

The final state is where all paths of the wanted height end. But what if we added a loop to the final state. So that there was an edge connecting the final state with itself? Imagine a path 0 -> 2 -> 1 of size 2, there will be a new path 0 -> 2 -> 1 -> 1 of size 3. Now the size 3 paths will include all towers of size 2, AND all towers of size 1. The paths of size (H+1) will now include all towers of size (1<=X<H). Just add that loop to the final state and raise the adjacency matrix to H+1 power.

Optimize the states

We now know that, as long as we can reduce the number of states to something small enough so that O(t3 log(H)) is reasonable complexity, we can solve the problem

You only need to remember four colors.

The first obstacle is that C can be quite large. But is that really that much of a problem? Our states contain four colors. What is important to notice is that we do not really need to specify the actual colors we picked. We just need to specify the kind of pattern. Consider the following 3 color combinations:

Instead of representing each of those states by the specific colors (e.g: (0,0,0,0), (1,1,1,1), (2,2,2,2)...) represent them by the pattern. We can find that (0,0,0,0) is as good to represent each of those.

Let us define a format to represent a pattern. Let us index the colors not by their real values from 0 to C-1 but by the order at which they appear in c0,c1,c2,c3. For example a combination of four real colors is (4, 6, 4, 6), we can assign index 0 to color 4 (it is the first to appear) and index 1 to color 6. Then the state can be represented by (0,1,0,1). How about (102, 56, 1, 0), by using the same logic it becomes (0,1,2,3). As you can see colors (5,1,5,1) and (4,5,6,7) will also be turned into (0,1,0,1) and (0,1,2,3), respectively.

This will add complications when building the graph of states. (We need to build the graph of states. I bet the whole matrix stuff sounded easy, but we need to build it). From a pattern of colors (c0,c1,c2,c3) we need to generate each pattern of colors (c0',c1',c2',c3') that can be reached. The patterns must always follow the format we specified. We also need to find the new value of k after picking the new pattern. Note that this allows loops between the normal states in the graph. (0,1,2,3,K) is now connected with (0,1,2,3,K). Imagine placing a group of four different colors on top of other four different colors, K does not change, but the patterns are the same. There are also now multiple ways to reach the same state starting from another. For example from (-1,-1,-1,-1, K) there are C ways to reach (0,0,0,0, K-3) just pick any of the available colors and repeat it 4 times. Those changes still allow our matrix product to work, just make it so instead of an adjacency matrix we just have then umber of edges between each pair of states. This number is still the same as the number of paths of length 1.

We still got to solve the problem of generating (c0',c1',c2',c3', k') from (c0,c1,c2,c3, k). Let us say we want find the edges that can be reached from (-1,-1,-1,-1, K) (We should not represent the initial state as (0,0,0,0, K) four imaginary blocks (an empty tower) are not the same as four blocks of the same color.). Instead of iterating through each possible combination of 4 colors, we need to find the reachable patterns. And remember that the indexes are assigned by the order at which the colors appear.

I would recommend backtracking. To generate the patterns. It needs some modification. At each of the four color slots, you can either reuse a color we have already used (it has an index already) or introduce a new color. Let us say that m is the number of colors currently in use. When introducing a new color there will be C-m choices of colors to make. So we need to multiply the final number of choices by this (C-m) number. But the actual index of the new color will be m in the pattern. We picked one of (C-m) but we represent the choice by index m

When generated the states that can be reached from a state that is not the initial one. There is new a problem. The past four colors are still there and we still need to consider them. When picking the colors of the new set of 4 colors, we might have to reuse the past colors (and take it into account when finding the new value of k). So we start with some available colors. (In order to find the number of colors used in the previous state, just get the maximum of the indexes and add 1). We can still use the same backtracking, but now at the end we might have indexes greater than 3. For example, if the past state was (0,1,2,3) we could generate (4,5,6,7) as new state (use completely different colors) ). What we need is to, after generating this pattern through backtracking, fix it again. e.g: (4,5,6,7) becomes (0,1,2,3) after fixing it.

You only need to remember three colors

The first color to appear is color 0. Therefore c0 will always be 0. Great. We now only need c1,c2,c3 and k to represent a state because c0 is always 0.

Let us try to estimate the number of states. k can be from 0 to 7. c1,c2 and c3 from 0 to 3. We got 8*4*4*4=512. That is much smaller than what we had before, but still quite large for cubic complexity.

That estimation is not very good. For example, state (2,0,0, K) cannot appear, because 2 cannot be the second color index we place. So we would have to count the number of ways to have c1,c2,c3 such that:

c1 <= 1 c2 <= c1 + 1 c3 <= max(c1,c2) + 1

The value of K is also not completely independent from the colors you pick. (0,0,0,0, k=K-1) is not reachable, for example, because the maximum value of k is K, and in order to have four repeated colors the original k must have dropped by 3...

It is not trivial to do it, but I recommend just making the program that generates the states and their transitions/edges. And then printing the value t that was found for the maximum case. You will find that the real value of t is 102 (Including starting and final state). Significantly lower than 512. So small that it is possible to solve the problem with it (But it is not very easy, you will need VERY optimized matrix exponentiation).

Rotations

If you do not want to depend on low level optimization, we can compress the number of states even more. Let us consider the two following combinations of colors: (0,1,1) and (1,1,0) (We are now using the (c1,c2,c3) notation, c0 is always 0).

These two combinations added to the same value of k, are really the same thing. They represent the same state, just rotated. What we could do is, before fixing the color indexes values to follow the pattern rules, we can try each of the 4 rotations of the combination of colors. After fixing the rotation, pick the fixed combination of colors that is lexicographically smallest (this is an arbitrary rule to make sure we always pick the same kind of state for two different states that are equal after rotating them).

This will reduce the number of states from 102 to 49. It is a very relevant optimization. Which allows the solution to run in 280ms in TC servers...

More?

I am not sure if 49 is the optimal number of states (including starting state and final state). There may be more optimizations. Do you know of any? I tried things like the reverse, but it seems that the combination of the pattern rule and the rotation rule already ensure that two patterns that are the reverse of each other are counted only once.

Code

It is quite an implementation task. You need to find all the states that are reachable from (-1,-1,-1,K). In order to do that, I used a search that is basically a Breadth-first-search. But in order to generate, for a given state, each reachable state, I used Backtracking. At the time you generate the adjacency matrix, remember to multiply the factors and you will need matrix exponentiation by using squaring... And of course, use long longs to avoid overflows when multiplying possibly large numbers and it is all done modulo 1234567891 so there is an implicit requirement for Modular arithmetic. Quite a problem...

const int MAX_IDS = 49; //useful constant I found by just running the code 
// and printing the total number of states I found in
// the maximum case.
const int MOD = 1234567891;

const int MATRIXMAX = MAX_IDS;
typedef long long int64;
#define long int64

// What is not included is my matrix class.
// It is just a class that allows me to instantiate matrixes
// that can be then multiplied or raised to an exponent.
//
// matrix M(n, v) makes a n x n matrix filled with v as values.
// M.assignPower(X, n) raises matrix X to the n-th power and assigns it to M.

struct TheBrickTowerHardDivOne
{
int K, C;
struct normalState
{
int k; int c1, c2, c3;
};
normalState description[MAX_IDS];
int t;
int savedId[8][4][4][4];
long transitions[MAX_IDS][MAX_IDS];

void backtrack4(int id, int k, int *c, int *nc, int m, int p, int multiplier)
{
if (k < 0) {
return;
} else if (p == 4) {
// all chosen

// Count the number of adjacent pairs.
// The indexes are presented in a circular way:
// [0][1]
// [3][2]. This allows using (i+1)%4 for adjacency
//
for (int i=0; i<4; i++) {
if (nc[i] == nc[ (i+1) % 4]) {
k--;
}
}
if (k >= 0) {
// It is a valid transition!
// But let's normalize the list of colors....
vector<int> fnc(4, 5);
// Try each value of r, the starting index for the rotation
// for each r, "fix" the color indexes so that they follow the pattern
// rules. Then remember the lexicographically-smallest of the fixed
// results of all rotations.
//
for (int r=0; r<4; r++) {
// This rotation stuff reduces the number of states from 102 to 49
// Which in turn reduces the worst execution time from
// almost 2 seconds to 279 ms....
vector<int> rot(4);
vector<int> colorId(8, -1);
int s = 0;
for (int i=0; i<4; i++) {
int x = nc[(i + r) % 4];
if (colorId[x] == -1) {
colorId[x] = s++;
}
rot[i] = colorId[x];
}
fnc = std::min(fnc, rot);
}

// Did we catch this state before? Reuse the id.
// If this is the first time, save important daya...
int & sid = savedId[k][fnc[1]][fnc[2]][fnc[3]];

if ( sid == -1) {
sid = t++;
normalState & ns = description[sid];
ns.k = k;
ns.c1 = fnc[1];
ns.c2 = fnc[2];
ns.c3 = fnc[3];
}
// The multiplier contains the number of cases that
// end in this transition...
transitions[id][sid] += multiplier;
}
} else {
// Just pick a color for the current position.
for (int i=0; (i <= m) && (i < C); i++) {
// We can either re-use a color we've seen before
// or, if (i==m) use a new color (There are C - m more
// colors available) (i < C) because we cannot use more than C colors.
nc[p] = i;
int nm = multiplier;
if ( i == m ) { //new color, C-m choices (multiply)
nm = (int)( ( (C - m ) * (long)(nm) ) % MOD );
}
// Keep backtracking...
backtrack4(id, k - (nc[p] == c[p]), c, nc, max(m,i+1), p+1, nm);
}
}
}

void bfs(int id)
{
// A BFS through the states (tuples (k, c1,c2,c3)).
int k;
int c[4];
int m = 0;
if (id==0) {
k = K;
c[0] = c[1] = c[2] = c[3] = -1;
m = 0;
} else {
normalState & d = description[id];
k = d.k;
c[0] = 0;
c[1] = d.c1; c[2] = d.c2; c[3] = d.c3;
m = max( max(c[0], c[1]), max(c[2], c[3])) + 1;
}
int nc[4];
// backtracking works great. Let us pick the colors of the 4
// positions...
backtrack4(id, k, c, nc, m, 0, 1);
if (id != 0) {
// can finish...
transitions[id][1] = 1;
}
}

int find(int C, int K, long H)
{
this->K = K;
this->C = C;
t = 2;
// There are two special states:
// id 0 : start
// id 1 : end.
memset(savedId, -1, sizeof(savedId));
memset(transitions, 0, sizeof(transitions));
int i = 0;
while( i < t ) {
if (i != 1) {
bfs(i);
}
i++;
}
transitions[1][1] = 1;
cout << t << endl;

//Build that cute matrix.
matrix A(t, 0);
for (int i=0; i<t; i++) {
for (int j=0; j<t; j++) {
A[i][j] = transitions[i][j] % MOD;
}
}
matrix B;
B.assignPower(A, H+1);
return (int)( B[0][1] );
}
};
#undef long

Comments / complaints / questions

If you got any doubt, or have corrections to make do not hesitate to post a comment.

Rant

Thanks to blogger being a mountain of fail, I could not submit this for almost an hour after I finished it. Because it kept giving me an error with no description at all of what is the issue. I slowly tracked the error as caused by my syntax highlighted code. It is generated automatically by jEdit as usual, but it seems that blogger just did not like to cooperate today. For now, I will embed the code using pastebin.com (Which does not add scrollbars, lame).

Update: fixed.