In January I made a post about SRM 451's BrickPuzzle. I think it is the hardest problem I released yet and I am also in love with it because of the knowledge its solution can transmit. Yes, I am exaggerating so you are more interested in reading this post. Can you really blame me?
That day of January, I promised that the second part of the explanation would come the next day. I obviously lied.
There are many ways to solve a hard problem. One that turns out to be very good is to first solve an easier problem. I will talk about a problem I wrote recently in an attempt to teach dp stuff. (To be honest, I am still very skeptical of the ICPC and programming camps as a whole, but that is another topic). This problem was then submitted to spoj and it is turning out to be very hard. Everybody who solved it to this point is not using the intended solution. Unlike BrickPuzzle, Max2214's is written in a way that turns out to be more permissive to heuristics and that sort of thing. But we will pretend that we do not know that, and will instead solve MAX2214 using dynamic programming. You will see that once you know the dp solution to MAX2214, you also know the solution to BrickPuzzle, but BrickPuzzle also has implementation complications...
In MAX2214, we want to know the maximum number of blocks we can place on a grid without making the blocks overlap with each other or with cells marked with X. Two kinds of blocks are allowed: 2x2 and 1x4.
Easy bitmask dp
The cynical in us, tend to view bitmask dp problems as very easy problems. Mostly because they are just dp problems (which become easy after you seen a good amount of them), yet their small constraints make it obvious that they are the intended solution.
Not in this case, if you committed to the idea of solving MAX2214 using dynamic programming and bitmasks, you would inevitably calculate the exponential complexity that is needed for time and memory and learn that it is impossible to solve the problem like that.... Except that it is. So let us ignore that part for now.
There are many ways to solve a hard problem. One that turns out to be very good is to first solve an easier problem. Before solving MAX2214, solve FourBlocks. A topcoder problem I wrote back in the dark ages of topcoder. The time in which editorials were not paid and nobody really put any effort in them in the wiki. So the editorial for SRM444 is very bad. But I have found some stuff about FourBlocks in other places:
MAX2214, again To summarize the "easy" bitmask dp
Let us define a recurrence f(x,y, mask). We have already assigned the contents of the cells in the first y rows and the first x cells in row #y (0-indexed). The mask is a bitmask the first x bits tell us about the contents of the x first cells of row (y+1). The remaining bits tell us about the contents of the remaining cells in row y. A bit is turned on if and only if the related cell contains part of a 2x2 block.
In the image, we can see what x and y are. The mask represents the blue cells. The orange cells have already been decided. We want to find out the maximum number of new blocks in the cells that are not orange.
There are three decisions we can take regarding cell (x,y):
- Place a 2x2 block in such a way that cell (x,y) is its bottom-left corner. This requires us that the cells that will be occupied by the 2x2 block are not used by a cell marked with X or by a block we already placed. The maximum number of new blocks we can get is 1+f(x+2,y, new mask). Because we can just skip the next two cells as we already know they will contain a block. The new mask will now contain two consecutive turned on bits.
- Place a 1x4 block. Not a problem, just make sure the next 4 cells are usable, then skip to f(x+4,y, mask).
- Place nothing in this cell, we skip to f(x+1, y, new mask). The new mask might have one less of a turned on bit, in case the cell at (x,y) was already busy..
The image shows the sub-case f(x+2, y, new mask).
Some base cases:
- When we reach f(x = w, y, mask), then we have already used all the cells in the row, we can just skip to the next row. The result is f(x = 0,y + 1, mask).
- When we reach f(0, y = h, mask) then we have reached the end of the whole grid. The result is 0 (That is the maximum number of blocks we can add to the 0 remaining cells.
I think we can be very sure that this approach will give us the correct answer. But is it fast enough? You already read somewhere that it is not. The mask is a bitmask and can have any combination of w bits. That means 2w different values for the mask. There are w * h pairs (x,y) so the complexity is O(w * h * 2w ). In spoj, the width is at most 22 and the height is at most... 52. This certainly sounds too heavy, even for the 15 seconds per case limit.
Is it really so slow?
The key thing to notice is that there really are not that many different ways to set the mask. If you remember, the only bits we remember are pairs of consecutive bits that come from the 2x2 blocks.
This needs us to pay attention to the way the bit mask works. If at some point we add a 2x2 block using (x,y) as the bottom-left corner of the 2x2 block, it always adds two consecutive bits to the bitmask.
Let us talk about the case when we find a turned on bit at (x,y). We can just assume that whenever we find this bit, the next bit will also be turned on, so we can skip them both and not add a block to those positions. By doing this, the mask will lose both bits.
The image shows that transition.
What we can tell is that mask will never contain a lone turned on bit. 110001101102 and 000011011112 are valid values for the mask. 101001101102 and 000010000012 are not.
Let us count the number of valid masks of length w. For w=1, there is only one valid mask: 0. For w=2, 00 and 11. For w=3: 000, 011 and 110. For w=4: 0000, 0011, 0110, 1100, 1111.
You will eventually notice that the results are all fibonacci numbers. It is not a coincidence, you can verify that a good formula is f(w) = f(w-1) + f(w-2), because you add either "11" or "0" to a previous valid mask. Let us say fib(w) gives the number of masks. The complexity of our algorithm has become O(h * w * fib(w)). You will see that for w=22, the difference between 222 and fib(22) is very large. So our recurrence turned out to be much faster than we thought. In fact, it is fast enough for the 15 seconds limit. But there is a catch.
Memoization vs. iterative dp, not the same thing
This is the key to the problem. If we implement the recurrence using memoization, then only valid values of mask will be generated and thus the speed will be as predicted O(h * w * fib(w)). The catch is that if we implement it using a mem[h][w][fib(w)] array, it will go well beyond the 256 MB limit. If we use a hash table or something like that, we would be adding enough overhead that our fast solution would not be that fast anymore.
So, in order to save memory, we can do iterative dynamic programming. At every point, when calculating f(x,y) you need only keep in memory the results for row y and row y+1. Thus we can use the memory of 2 rows instead of 52 rows. This will fix the memory problem... But when doing iterative dp, you cannot enjoy cropping invalid states. So we will not only generate only the valid values of mask if we do an iteration from 0 to 2w-1. We are back to exponential time.
Have we reached a dead end? It sure seems so. Iterative dp allows us to optimize memory, but we can't crop invalid states. Memoization allows us to crop invalid states, but we need cannot optimize more memory usage. We need the best of both worlds and it is completely possible to do it. Who says it is bad to get greedy? (not the algorithm category). Sometimes you can have your cake and eat it too. Let me explain the work around tomorrow (or maybe in 7 months, who knows?.