Another day, another editorial preview. Here are the division 2 editorials for SRM 586:
https://apps.topcoder.com/wiki/display/tc/SRM+586
SRM 586 was the first match to include Python support in TopCoder. I really wanted to give it a try as it's concise style can be helpful when writing editorials. The small issue is that their wiki system doesn't support python syntax highlighting. It actually doesn't even have c++ highlighting either, I always use the Java one for c++. The Java one is not so great either, it highlights keywords inside comments.
I decided to solve this by using a new javascript library on top of the wiki system topcoder uses. In the past, I already added
mathjax support as I usually need to write formulas. I also already used
Modernizr so my SVG images can also have a PNG fallback. The slippery slope continues as I decided to add a
syntax highlighter library..
Anyway, here is a preview of one of the explanations for fun:
The hard problem - string weight.
The weight of a string is defined as follows: For each letter present in the string, find L and R, the minimum and maximum positions of the letter. The weight is equal to the sum of all
R - L for each of the letters.
Given
L `(1 <= L <= 1000)`, count the number of strings of minimal weight of that length.
What makes a minimum-weight string?
Before we get into counting these strings, we should understand their properties.
Like the first two examples show. In cases with small
L, there are ways to make
letters contribute a flat 0 to the weight. This is the smallest possible contribution. By definition:
`(R >= L)` therefore: `(R - L >= 0)`. Since we want to minimize the total weight, we should
try to make most if not all letters contribute 0.
It is possible to make all letters cost 0. Simply make all letters in the string unique: "acwh". Then
for each letter: `(R = L)`. However, we only have 26 available letters. Therefore, we can only
make each letter in the string unique when `(L <= 26)`. In that case, we have to use `L` different letters.
When `(L > 26)`, things will change. Let us begin with the smallest of such cases: `(L = 27)`:
- If all letters were equal, then for this letter, `(R = 26)` and `(L = 0)`. The total weight is 26.
- Perhaps it is better to have as many cost-0 letters as possible? As many letters that appear only once. If we use
25 unique letters, there are two remaining positions that must be used by a repeated letter. We are in a interesting situation: 25 letters contribute exactly 0, regardless
of their position and we have two positions used by the same letter. We can pick L and R as the
two positions used by this letter. The final weight will be `R-L`, so we want them to be as close as possible,
let us make them consecutive , this means that the cost is 1. As long as 25 letters appear exactly once and
the remaining letter appears twice but in consecutive positions. The minimum string weight is 1.
But how about larger cases? For
L = 40. We can try the same logic, make 25 letters appear exactly once and a final
letter appear exactly 15 times. The positions of the 25 letters do not matter, so we only care
about
L and
R for the letter that appears 15 times. We should make the difference `(R - L)`
as small as possible. Once again, it is most convenient to make the 15 letters appear consecutively. As long
as we do this, the result will be: 14. If
L is the first position, then the 15 positions of
the letter will be: `(L, L+1, L+2, ... , L+13, R = L+14)`.
But we should be critical, although this strategy seems to minimize the total weight, is it the
only strategy that manages to do it? In the previous idea, we had 25 letters that
appear exactly once and one letter that appears 15 times. However, what about a small change? What about having
24 letters that appear once, 1 letter that appears twice and one letter that appears 14 times? The sum is still 40. Once
again, we can ignore the positions of the 24 letters (0 weight). The positions of the other letters
are also not important
as long as the repetitions are consecutive. If we make sure they
are consecutive, the cost of the letter that appears twice will be 1: `(L, R = L+1)`. For the
other letter, the cost is 13: `(L, L+1, L+2, ..., L+12, R = L+13)`. The total weight is interestingly,
14: 13 + 1. This is also the weight we assumed to be minimum!
It seems that making sure 25 letters appear exactly once is not necessary
to minimize the weight. However, making sure that repetitions are always consecutive still seems important. Let us put some detail.
Assume that the only thing we know about a sequence is that it has 26 different letters and repetitions are always consecutive. Let us calculate the weight of the string.
Let us say the first letter appears `K_1` times, then it will fill positions `(0, 1, ... , K_1-2, K_1 - 1)`. It will
contribute `(K_1 - 1)` to the weight.
The second letter appears `K_2` times: `(K_1, K_1+1, ... , K_1 + K_2 - 2 , K_1 + K_2 - 1)` and therefore
contributes `(K_2 - 1)` to the weight. And so and so.
The total weight will be: `(K_1 - 1) + (K_2 - 1) + ... + (K_26 - 1)`.
The total amount of letters is `L` , therefore: `(K_1 + K_2 + ... + K_26 = L)`. If we tweak things,
you can find that the total weight will be: `(L - 26)`. This formula does not depend on
the actual number of times each letter appears in the string. And it seems to be the minimum. This
also shows that not using all 26 available letters will only yield a higher weight. E.g: if we use
only 20 of the available letters the weight will be: `(L - 20)`.
In order to show that this is the minimum, let us prove that not keeping the repetitions consecutive will
always yield a larger weight. If we wish to place `K` letters `a` in the string, making them
consecutive will make it contribute `(K-1)` to the weight. If they are not consecutive, then there is
at least one other letter `b` that will be between the two extremes of the letter in question. This will
increase the weight contributed by letter `a`. The weight contributed by letter `b` will either stay equal (if all its
instances are consecutive and between the `a` letters) or
also worsen. This means that
keeping all repetitions consecutive will yield the minimum weight. That this weight is `(L - 26)` and
that any other strategy will increase the weight.
How to count the minimum-weight strings
Simple case
If `(L <= 26)`, then each letter in the string must be unique. So what is the cost of
making a string of `L` unique letters, where there are 26 available letters in the alphabet? For
the first position, we have 26 options. The second position has 25 options (we cannot use
the same letter as the first position). Third position has 24 options and so and so. Therefore,
the number of strings is:
`26 * 25 * 24 * ... * (26 - L + 1)`
L > 26
In this case, we need to count the number of strings that follow some properties:
- Each letter in the alphabet (26 options) must appear at least once in the string.
- All the repetitions in the string must be consecutive.
How can we count the number of strings?
Dynamic programming
Let us define `f(a, L)` as the number of ways to build a string of length `L` that follows those properties
using an alphabet of `a` different letters.
A base case: `(L = 0)`, all the `a` letters must be inserted at least once. If `(a > 0)`, then
there is a letter in the alphabet that we cannot use, because there are 0 spaces left, the result is 0. If `(a = 0)`, then
there are no more letters to include and no spaces, the total numbers to do it is 1.
In another case, `(L > 0)` and `(a = 1)`, there is only one letter available, we better fill
all the `L` positions with it. There is only one way to do this.
When `(L > 0)` and `(a > 1)` , then we have `a` choices for the letter to use in the first positions
of the string. The number of those positions can be any `i` : `(1 <= i <= L)`. After using this
letter in the first `i` positions, we still need to fill the remaining positions. There are `L-i` remaining
positions and we cannot use the letter that we already used. So the partial result is: `a * f(a - 1, L - i)`. We should add together
the results for each
i, so:
`f(a > 1, L > 0) = sum_(i = 1)^(i <= L)( a * f(a - 1, L - i) )`
There are 27 possible values for `a` and `O(L)` values for `L`. Each time we call the
function, we might need `O(L)` steps to perform the sum. If we use
dynamic programming
or simple
memoization to ensure that each argument combination for the function is evaluated
at most once, then the algorithmic complexity will be: `O(A * L * L)` where `A` is the size of the alphabet, 26 in this case. This complexity
is low enough for the time limit.
static const int MOD = 1000000009;
long dp[27][1001];
long onceConsecutive(int a, int L)
{
long & res = dp[a][L];
if (res == -1) {
if (L == 0) { // base case
res = (a == 0);
} else if (a == 1) {
res = 1;
} else {
// calculate the sum. (We need to use modular arithmetic)
res = 0;
for (int i=1; i<=L; i++) {
res += (a * onceConsecutive(a-1, L-i) ) % MOD;
}
res %= MOD;
}
}
return res;
}
int countMinimums(int L)
{
memset(dp, -1, sizeof(dp));
long res = 1;
if ( L <= 26 ) {
// The simple case:
//26 * 25 * 24 * ...
for (int i=0; i < L; i++) {
res = (res * (26 - i)) % MOD;
}
} else {
// each letter must appear at least once, all instances
// of each letter must be consecutive.
res = onceConsecutive(26, L); //Use the recurrence relation
}
return res;
}
Math
An alternative solution. We will separate the decision in two parts: a) The order
in which the 26 letters are used and b) The number of positions each letter (In the order picked by a) ) will take in the string.
The order of letters is a
permutation. The
total number of ways to pick the order is: `26!`. Where `n!` is notation for the
factorial of `n`.
Now that the order of the letters is picked, we decide the number of positions each letter will
take. The total number of positions is `L` . How many solutions are there to the equation:
`(K_1 + K_2 + ... K_25 + K_26 = L)` where each `K_i > 0`.
As you will see later, that question is easier to solve when `K_i >= 0`. We can do the conversion though
Let `r_i = K_i - 1`, each `r_i` is the
additional number of times the letter is picked.
`K_1 + K_2 + ... + K_25 + K_26 = L`
`r_1 + 1 + r_2 + 1 ... + r_25 + 1 + r_26 + 1 = L`
`r_1 + r_2 + ... + r_25 + r_26 = L - 26`
Since each `r_i` can be 0, we can use the following trick: We have to place `(L - 26)` stones in 26 boxes. We
are able to leave some boxes empty. We can picture this as placing separators between some of the stones. For example:
oooo|ooo|oo||oo|o
Is the same as placing 4 stones in the first box, 3 stones in the second, 2 stones in the third, 0 stones
in the fourth, 2 stones in the fifth box and 1 stone in the sixth box.
For 26 boxes, there will be 25 separators. How many ways to combine the separators and stones exist? There are
`(25 + L - 26)` positions of which we pick 25 for the separators. The value is: `C(L - 1, 25)`, where `C()` is the
binomial coefficient
.
The final result is therefore: `26! * C(L - 1, 25)`.
# define function f() as the factorial:
from math import factorial as f
def C(n, k):
return f(n) / f(k) / f(n - k)
#Calculates weight without the modulo
def Weight(L):
if L <= 26:
# 26 * 25 * 24 * ...
# the same as saying 26! / ( (26 - L)! )
return f(26) / f(26 - L)
else:
# 26! * C(L-1, 25)
return f(26) * C(L - 1, 25)
class StringWeightDiv2:
def countMinimums(self, L):
return Weight(L) % 1000000009 # Takes the remainder from the result