Saturday, June 22, 2013

SRM 583 Editorial preview: RandomPaintingOnABoard

Edit: So I just noticed a flaw with my strategy. Yep, MathJax renders my math perfectly... except for RSS, if you see giberish in your feed, look closer, it is TeX code, if you want to see the formulas you'll need to read the post in the site, sorry. Will look for work-arounds

Another week, another ultra late editorial ( Just in case, SRM 582 is being written by someone else ).

This was a interesting problem. Very math-focused and I took this chance as a chance to test my idea to start to use mathjax in editorials. I found out about mathjax, thanks to blue.boy's blog and since then I liked the idea, because frankly, my ascii formula art is very poor and I think we need TeX to explain some problems. Specially something like this problem.

I had a week-long difficulty trying to solve this problem. I actually had the main idea two days ago. Thanks to Petr and google translate. But something about the google translated explanation didn't make any sense, how was he pulling a magic trick to try `2^{12}` subsets instead of `2^{21}` ? . I could only notice last night, that there was another constraint for the number of cells...

RandomPaintingOnABoard

There is a board. In each turn a random cell is picked and painted black. Each cell `(i,j)` has probability equal to board[i][j] / (sum of all values in board[][]). The values are between 0 and 9 and each column and each row have at least one non-zero value. Return the expected number of steps before each column and each row have at least one painted cell. Note that the same cell can be picked multiple times.

A note about the constraints

Constraints indicate us that the maximum width and height will be 21. There is another constraint though: The maximum total number of cells is 150. This can help us because it means that the smaller dimension will be at most 12. `13 * 12 > 150`. Since rows and columns behave in the same way in this problem, we can just assume that the width is always the smallest dimension - In case it is not, we can just rotate the board. From now on assume that `1 <= w <= 12` and `w <= h <= 21` are the width and height of the board, respectively.

A sum of probabilities

The expected value of the necessary number of steps is:

\[ \begin{align*} E(\text{number of steps}) = \sum_{i=0}^\infty {i P(\text{i steps are needed}) } \end{align*} \]

We can think of many ways to find the probability that i steps are needed. This time it is convenient to think of the probability that the objective (at least one cell in each row/column is painted) is not fulfilled after `i` steps. Then you can tell, if the objective was not fulfilled after `i-1` steps, but it is fulfilled after `i` steps, then exactly `i` steps were needed. The following is a way to develop a formula to solve the problem after finding this conclusion:

\[ \begin{align*} P\left(\text{i steps are needed}\right) &= P\left(\text{Not done after i-1 steps}\right) - P\left(\text{Not done after i}\right) \\ E\left(\text{number of steps}\right) &= \sum_{i=0}^\infty {\left(i \left( P\left(\text{Not done after i-1 steps}\right) - P\left(\mbox{Not done after i}\right) \right) \right)} \\ E\left(\text{number of steps}\right) &= \sum_{i=0}^\infty {\left(i P\left(\text{Not done after i-1 steps}\right) - i P\left(\text{Not done after i}\right) \right)} \\ E\left(\text{number of steps}\right) &= \sum_{i=0}^\infty {i P\left(\text{Not done after i-1 steps}\right)} - \sum_{i=0}^\infty {i P\left(\text{Not done after i steps}\right)} \\ E\left(\text{number of steps}\right) &= 0 * P\left(\text{Not done after -1 steps}\right) + \sum_{i=1}^\infty {i P\left(\text{Not done after i-1 steps}\right)} \\ &- \sum_{i=1}^\infty {\left(i - 1\right) P\left(\text{Not done after i-1 steps}\right)} \\ E\left(\text{number of steps}\right) &= \sum_{i=1}^\infty {\left(i P\left(\text{Not done after i-1 steps}\right) - \left(i-1\right) P\left(\text{Not done after i-1 steps}\right) \right)} \\ E\left(\text{number of steps}\right) &= \sum_{i=1}^\infty {P\left(\text{Not done after i-1 steps}\right)} \\ E\left(\text{number of steps}\right) &= P\left(\text{Not done after 0 steps}\right) + P\left(\text{Not done after 1 step}\right) + P\left(\text{Not done after 2 steps}\right) + ... \end{align*} \]

With some changes, we have:

\[ \begin{equation*} \tag{1} E\left(\text{number of steps}\right) = \sum_{i=0}^\infty {P\left(\text{Not done after i steps}\right)} \\ \end{equation*} \]

Let us try to find probabilities that the objective is not done after `i` steps.

Inclusion-exclusion

We can see that probability as the probability that, after `i` steps, at least one row or column is not painted.

It is possible to calculate the probability each single row is painted or not after `i` steps. Let `s` be the probability we pick the row (or column) in a single step. The probability we do not ever pick it after `i` steps is ` ( 1 - s )^i `. We can add these probabilities together for each column or row. However ,we would be double-counting any situation in which there are two rows, two columns or a row and a column that are not painted after `i` steps. We can calculate, for each pair of two columns or rows, the probability `t` either of them are picked in a single step. The probabilities ` (1-t)^i ` are now the probability that neither of the two columns/rows/(column and row) will be picked after two steps, so we can subtract them from the original sum. But we would be subtracting some cases more times than necessary - The cases in which there are 3 or more columns or rows that are not painted. In short, we can use the inclusion-exclusion principle:

\[ \begin{align*} P\left(\text{Not done after } i \text{ steps}\right) &= \sum_{ x \in \left( \text{rows } \cup \text{ columns} \right) } { \left(1 - P(x) \right)^i } \\ &- \sum_{ \left\{x_0,x_1\right\} \subseteq \left( \text{rows } \cup \text{ columns} \right) } { \left(1 - P\left\{x_0,x_1\right\} \right)^i } \\ &+ \sum_{ \left\{x_0,x_1,x_2\right\} \subseteq \left( \text{rows } \cup \text{ columns} \right) } { \left(1 - P\left\{x_0,x_1,x_2\right\} \right)^i } \\ &- \sum_{ \left\{x_0,x_1,x_2,x_3\right\} \subseteq \left( \text{rows } \cup \text{ columns} \right) } { \left(1 - P\left\{x_0,x_1,x_2,x_3\right\} \right)^i } \\ &+ ... \\ \\ \tag{2} P\left(\text{Not done after } i \text{ steps}\right) &= \sum_{ s \subseteq \left( \text{rows } \cup \text{ columns} \right) , \left|s\right| \text{ is odd} } \left( 1 - P\left(s\right) \right) ^ i \\ &- \sum_{ s \subseteq \left( \text{rows } \cup \text{ columns} \right) , \left|s\right| \text{ is even} } \left( 1 - P\left(s\right) \right) ^ i \end{align*} \]

For each subset \(s\) of rows and columns, we calculate its probability \( P\left(s\right) \) that any of the rows and columns are picked in a single step. Then we subtract or add \( \left( 1 - P\left(s\right) \right)^i \) to the total, depending on the parity of the number of elements in the set.

Putting it together

Combining equations \( (1) \) and \( (2) \) together:

\[ \begin{align*} E\left(\text{number of steps}\right) &= \sum_{i=0}^\infty { \left( \sum_{\text{odd }s } {\left( 1 - P\left(s\right) \right) ^ i} - \sum_{\text{even }s}{\left( 1 - P\left(s\right) \right) ^ i} \right)} \\ E\left(\text{number of steps}\right) &= \sum_{\text{odd }s } {\left( \sum_{i=0}^\infty \left( 1 - P\left(s\right) \right) ^ i \right) } - \sum_{\text{even }s}{\left( \sum_{i=0}^\infty \left( 1 - P\left(s\right) \right) ^ i \right)} \end{align*} \]

The last trick we used was to distribute the outer summations inside the inner summations. For each set \(s\) of columns and rows, we just need to calculate \( \left( 1 - P\left(s\right) \right) ^ i \). Depending on the number of elements in the set, we add or subtract the resulting value from the total.

Let us find \( \left( 1 - P\left(s\right) \right) \), that is the probability that none of the cells that belong to a column or row in the set are picked. Let \(x\) be the sum of the values outside the set, the probability is then : \( (x / T) \). Where \(T\) is the total sum of all the cells in the board. The sum of all \( \left( x / T \right) ^ i \) is the sum of a geometric progression. Therefore:

\[ \sum_{i=0}^\infty \left( \frac{x}{T} \right)^i = \frac{T}{T - x} \]

For each set of columns and rows with a sum of cells outside it equal to \( x \), if the number of elements of the set is odd then add \( \frac{T}{T - x} \) else subtract \( \frac{T}{T - x} \). That is our result. Note that when \( x = T \), the formula is invalid. This can only happen if the set cannot be picked in any step, which is not true, because any row and column will always have at least one non-zero cell.

Count the sets

Since the value that is added/subtracted to the result depends only on \( x \), the sum of the cells outside the set, we can just count the number of even and odd sets for each value of \( x \). In fact, the result is to just multiply (number of odd sets for \(x\)) - (number of even sets for \(x\)) and \( \frac{x}{T - x} \) together. So we only need to find the total subtraction between the number of odd and even sets for each \( x \).

The sum \( (w + h) \) can be as large as 140, so the number of subsets of rows and columns can be insanely high. Remember though that \(w \leq 12\), so the number of sets of columns is not going to be very high. Assume that the set of columns included in the set \( s \)is already fixed, so we only need to decide which rows to add. If we decide to include the i-th column, then we already know the total sum of the cells outside the set - It is equal to the cells in the i-th row that do not belong to any of the picked columns. We can pre-calculate this sum after picking the set of columns.

Dynamic programming

Let mask be the fixed subset of columns. We precalculate \( b_i \), for each row \( i \), as the sum of the cells not included in any of the picked columns. Find the subtraction (number of odd sets) - (number of even sets) that is possible for every value \(x\) of the sum of cells outside sets of columns and rows that already include mask. Let \( f(i, x) \) be a function that solves this problem for the first \(i\) columns. If the \(i\)-th column is added then \(x\) remains the way it is, and the parity of the number of elements in the set changes. If \(i\)-th column is not added, \(x \) is increased by \(b_i\) and the parity does not change. From this we can make a dynamic programming solution. For each \( f(i, x) \), subtract \(f(i,x)\) from \(f(i+1,x)\) and add \( f(i,x) \) to \( f(i+1,x + b_i ) \). The base case is \( f(0,0) \) which is equal to -1 (If the number of columns in mask is even) or 1 (If the number of elements in mask is odd). After running this dynamic programming approach, we can accumulate the values we have found for each \(x\) given the mask.

For a complexity analysis, we repeat a \( O\left( h \times T \right) \) dynamic programming approach for \( O\left(2 ^ w \right) \) subsets of columns. In total, \( O\left(2^w \times h \times T \right) \). Remember that \( w \leq 12 \) and by constraints \(T \leq 1350\), so this solution can run comfortably under 2 seconds in TopCoder's servers.

Solution

// Assume we already processed the prob String[], we made an
// int[][] a, with the translated cell values and we rotated
// it in case w > h, so w is now guaranteed to be at most 12:
public double expectedScoresSmallWidth(int[][] a){
int h = a.length, w = a[0].length, T = 0;

for (int i=0; i<w; i++) {
for(int j=0; j<h; j++) {
T += a[j][i];
}
}
// cnt[x] : Subtraction between the number of odd sets and even sets that
// have a sum of x for the outside cells.
long cnt[] = new long[T+1];

for (int mask=0; mask<(1<<w); mask++) {
int sign = -1;
for (int i=0;i<w;i++) {
if ( (mask&(1<<i)) != 0) {
sign = -sign;
}
}
// base case: (if mask has an even number of elements -1, else 1)
long f[][] = new long[h+1][T+1];
f[0][0] = sign;

// precalculate b[i]: Elements in column i outside of the mask subset:
int b[] = new int[h];
for (int i=0; i < h; i++) {
for(int j=0; j < w; j++) {
if ((mask&(1<<j)) == 0) {
b[i] += a[i][j];
}
}
}

for(int i=0; i<h; i++) for(int x=0; x<=T; x++) {
f[i+1][x] -= f[i][x];
if (x + b[i] <= T) {
f[i+1][x + b[i]] += f[i][x];
}
}

for (int x=0; x < T; x++) {
cnt[x] += f[h][x];
}
}

double ans = 0.0;
for (int x=0; x<T; x++) {
ans += (double)cnt[x] * T / (T-x);
}
return ans;
}

No comments :