Sunday, July 28, 2013

SRM 586 division 2 editorial preview

Another day, another editorial preview. Here are the division 2 editorials for SRM 586: https://apps.topcoder.com/wiki/display/tc/SRM+586
SRM 586 was the first match to include Python support in TopCoder. I really wanted to give it a try as it's concise style can be helpful when writing editorials. The small issue is that their wiki system doesn't support python syntax highlighting. It actually doesn't even have c++ highlighting either, I always use the Java one for c++. The Java one is not so great either, it highlights keywords inside comments.
I decided to solve this by using a new javascript library on top of the wiki system topcoder uses. In the past, I already added mathjax support as I usually need to write formulas. I also already used Modernizr so my SVG images can also have a PNG fallback. The slippery slope continues as I decided to add a syntax highlighter library..
Anyway, here is a preview of one of the explanations for fun:

The hard problem - string weight.

The weight of a string is defined as follows: For each letter present in the string, find L and R, the minimum and maximum positions of the letter. The weight is equal to the sum of all R - L for each of the letters.
Given L `(1 <= L <= 1000)`, count the number of strings of minimal weight of that length.

What makes a minimum-weight string?

Before we get into counting these strings, we should understand their properties.
Like the first two examples show. In cases with small L, there are ways to make letters contribute a flat 0 to the weight. This is the smallest possible contribution. By definition: `(R >= L)` therefore: `(R - L >= 0)`. Since we want to minimize the total weight, we should try to make most if not all letters contribute 0.
It is possible to make all letters cost 0. Simply make all letters in the string unique: "acwh". Then for each letter: `(R = L)`. However, we only have 26 available letters. Therefore, we can only make each letter in the string unique when `(L <= 26)`. In that case, we have to use `L` different letters.
When `(L > 26)`, things will change. Let us begin with the smallest of such cases: `(L = 27)`:
  • If all letters were equal, then for this letter, `(R = 26)` and `(L = 0)`. The total weight is 26.
  • Perhaps it is better to have as many cost-0 letters as possible? As many letters that appear only once. If we use 25 unique letters, there are two remaining positions that must be used by a repeated letter. We are in a interesting situation: 25 letters contribute exactly 0, regardless of their position and we have two positions used by the same letter. We can pick L and R as the two positions used by this letter. The final weight will be `R-L`, so we want them to be as close as possible, let us make them consecutive , this means that the cost is 1. As long as 25 letters appear exactly once and the remaining letter appears twice but in consecutive positions. The minimum string weight is 1.
But how about larger cases? For L = 40. We can try the same logic, make 25 letters appear exactly once and a final letter appear exactly 15 times. The positions of the 25 letters do not matter, so we only care about L and R for the letter that appears 15 times. We should make the difference `(R - L)` as small as possible. Once again, it is most convenient to make the 15 letters appear consecutively. As long as we do this, the result will be: 14. If L is the first position, then the 15 positions of the letter will be: `(L, L+1, L+2, ... , L+13, R = L+14)`.
But we should be critical, although this strategy seems to minimize the total weight, is it the only strategy that manages to do it? In the previous idea, we had 25 letters that appear exactly once and one letter that appears 15 times. However, what about a small change? What about having 24 letters that appear once, 1 letter that appears twice and one letter that appears 14 times? The sum is still 40. Once again, we can ignore the positions of the 24 letters (0 weight). The positions of the other letters are also not important as long as the repetitions are consecutive. If we make sure they are consecutive, the cost of the letter that appears twice will be 1: `(L, R = L+1)`. For the other letter, the cost is 13: `(L, L+1, L+2, ..., L+12, R = L+13)`. The total weight is interestingly, 14: 13 + 1. This is also the weight we assumed to be minimum!
It seems that making sure 25 letters appear exactly once is not necessary to minimize the weight. However, making sure that repetitions are always consecutive still seems important. Let us put some detail.
Assume that the only thing we know about a sequence is that it has 26 different letters and repetitions are always consecutive. Let us calculate the weight of the string. Let us say the first letter appears `K_1` times, then it will fill positions `(0, 1, ... , K_1-2, K_1 - 1)`. It will contribute `(K_1 - 1)` to the weight. The second letter appears `K_2` times: `(K_1, K_1+1, ... , K_1 + K_2 - 2 , K_1 + K_2 - 1)` and therefore contributes `(K_2 - 1)` to the weight. And so and so. The total weight will be: `(K_1 - 1) + (K_2 - 1) + ... + (K_26 - 1)`. The total amount of letters is `L` , therefore: `(K_1 + K_2 + ... + K_26 = L)`. If we tweak things, you can find that the total weight will be: `(L - 26)`. This formula does not depend on the actual number of times each letter appears in the string. And it seems to be the minimum. This also shows that not using all 26 available letters will only yield a higher weight. E.g: if we use only 20 of the available letters the weight will be: `(L - 20)`.
In order to show that this is the minimum, let us prove that not keeping the repetitions consecutive will always yield a larger weight. If we wish to place `K` letters `a` in the string, making them consecutive will make it contribute `(K-1)` to the weight. If they are not consecutive, then there is at least one other letter `b` that will be between the two extremes of the letter in question. This will increase the weight contributed by letter `a`. The weight contributed by letter `b` will either stay equal (if all its instances are consecutive and between the `a` letters) or also worsen. This means that keeping all repetitions consecutive will yield the minimum weight. That this weight is `(L - 26)` and that any other strategy will increase the weight.

How to count the minimum-weight strings

Simple case

If `(L <= 26)`, then each letter in the string must be unique. So what is the cost of making a string of `L` unique letters, where there are 26 available letters in the alphabet? For the first position, we have 26 options. The second position has 25 options (we cannot use the same letter as the first position). Third position has 24 options and so and so. Therefore, the number of strings is:
`26 * 25 * 24 * ... * (26 - L + 1)`

L > 26

In this case, we need to count the number of strings that follow some properties:
  • Each letter in the alphabet (26 options) must appear at least once in the string.
  • All the repetitions in the string must be consecutive.
How can we count the number of strings?

Dynamic programming

Let us define `f(a, L)` as the number of ways to build a string of length `L` that follows those properties using an alphabet of `a` different letters.
A base case: `(L = 0)`, all the `a` letters must be inserted at least once. If `(a > 0)`, then there is a letter in the alphabet that we cannot use, because there are 0 spaces left, the result is 0. If `(a = 0)`, then there are no more letters to include and no spaces, the total numbers to do it is 1.
In another case, `(L > 0)` and `(a = 1)`, there is only one letter available, we better fill all the `L` positions with it. There is only one way to do this.
When `(L > 0)` and `(a > 1)` , then we have `a` choices for the letter to use in the first positions of the string. The number of those positions can be any `i` : `(1 <= i <= L)`. After using this letter in the first `i` positions, we still need to fill the remaining positions. There are `L-i` remaining positions and we cannot use the letter that we already used. So the partial result is: `a * f(a - 1, L - i)`. We should add together the results for each i, so:
`f(a > 1, L > 0) = sum_(i = 1)^(i <= L)( a * f(a - 1, L - i) )`
There are 27 possible values for `a` and `O(L)` values for `L`. Each time we call the function, we might need `O(L)` steps to perform the sum. If we use dynamic programming or simple memoization to ensure that each argument combination for the function is evaluated at most once, then the algorithmic complexity will be: `O(A * L * L)` where `A` is the size of the alphabet, 26 in this case. This complexity is low enough for the time limit.
static const int MOD = 1000000009;

long dp[27][1001];
long onceConsecutive(int a, int L)
{
    long & res = dp[a][L];
    if (res == -1) {
        if (L == 0) {      // base case
            res = (a == 0);
        } else if (a == 1) {
            res = 1;
        } else {
            // calculate the sum. (We need to use modular arithmetic)
            res = 0;
            for (int i=1; i<=L; i++) {
                res += (a * onceConsecutive(a-1, L-i) ) % MOD;
            }
            res %= MOD;
        }
    } 
    return res;
}

int countMinimums(int L)
{
    memset(dp, -1, sizeof(dp));
    long res = 1;
    if ( L <= 26 ) {
        // The simple case:
        //26 * 25 * 24 * ...
        for (int i=0; i < L; i++) {
            res = (res * (26 - i)) % MOD;
        }
    } else {
        // each letter must appear at least once, all instances
        // of each letter must be consecutive.
        res = onceConsecutive(26, L); //Use the recurrence relation
    }

    return res;
}

Math

An alternative solution. We will separate the decision in two parts: a) The order in which the 26 letters are used and b) The number of positions each letter (In the order picked by a) ) will take in the string.
The order of letters is a permutation. The total number of ways to pick the order is: `26!`. Where `n!` is notation for the factorial of `n`.
Now that the order of the letters is picked, we decide the number of positions each letter will take. The total number of positions is `L` . How many solutions are there to the equation: `(K_1 + K_2 + ... K_25 + K_26 = L)` where each `K_i > 0`.
As you will see later, that question is easier to solve when `K_i >= 0`. We can do the conversion though Let `r_i = K_i - 1`, each `r_i` is the additional number of times the letter is picked.
`K_1 + K_2 + ... + K_25 + K_26 = L`
`r_1 + 1 + r_2 + 1 ... + r_25 + 1 + r_26 + 1 = L`
`r_1 + r_2 + ... + r_25 + r_26 = L - 26`
Since each `r_i` can be 0, we can use the following trick: We have to place `(L - 26)` stones in 26 boxes. We are able to leave some boxes empty. We can picture this as placing separators between some of the stones. For example:
oooo|ooo|oo||oo|o
Is the same as placing 4 stones in the first box, 3 stones in the second, 2 stones in the third, 0 stones in the fourth, 2 stones in the fifth box and 1 stone in the sixth box.
For 26 boxes, there will be 25 separators. How many ways to combine the separators and stones exist? There are `(25 + L - 26)` positions of which we pick 25 for the separators. The value is: `C(L - 1, 25)`, where `C()` is the binomial coefficient
.
The final result is therefore: `26! * C(L - 1, 25)`.
# define function f() as the factorial:
from math import factorial as f

def C(n, k):
    return f(n) / f(k) / f(n - k)
    
#Calculates weight without the modulo
def Weight(L):
    if L <= 26:
        # 26 * 25 * 24 * ... 
        # the same as saying 26!  / ( (26 - L)! )
        return f(26) / f(26 - L)
    else:
        # 26! * C(L-1, 25)
        return f(26) * C(L - 1, 25)

class StringWeightDiv2:
    def countMinimums(self, L):
        return Weight(L) % 1000000009 # Takes the remainder from the result

No comments :