Wednesday, December 28, 2011

SRM 528: System tests delayed again

Update: Results are up. I got 104 new rating points, it is still less than I lost last match...

I think both my solutions are correct, too bad the system tests had to be delayed. This time I had a lot of spare time I tried to use to solve the 1000 points problem, with no success.

Div1 500: SPartition
Given a string of up to 40 characters that are 'o' or 'x', count the total number of ways to partition the string in two subsequences, such that the two subsequences are equal.

The low constraint sort of led me to the solution. Another important aspect is the small alphabet. Usually, that the string can only have two characters means there is something that really depends on the characters. First of all, for the two strings to be equal, they need to have an equal number of characters, so the strings will have length n/2. More so, they need to have an equal number of 'o' and 'x' characters. Thus let us say nx is the half of the total number of 'x' in the original string and no is the half of the total number of 'o's. The subsequences will have nx 'x' characters and no 'o' characters.

Let the alphabet come into play. How many strings of nx 'x's and no 'o's exist? You will see that with n=40, the maximum result would be 10 'x's and 10 'y's in each half. That gives Binomial(20, 10) different strings that can form the subsequence. That number is actually small.

Second, given the string that we want the subsequences to be equal to, let us count the total number of ways to partition the original string so that both subsequences follow the condition. Let the wanted string be want. The solution is a simple dynamic programming algorithm. Let us define a recurrence F(a,b), where we have already taken the first a+b characters of the original string. a characters were correctly assigned to the first subsequence and b characters to the second subsequence. Note that if s[a+b] is equal to want[a], then we can assign the (a+b)-th character of the original string to the first subsequence. Similarly, if s[a+b] is equal to want[b], we can assign that character for the second subsequence. Finally, if both subsequences already have n/2 characters, then there is exactly one valid way to complete them. The recurrence is thus :


F(a,b) {
res = 0
if (a==n/2 && b==n/2) {
return 1
}
if ( want[a] == s[a+b] ) {
res += F(a+1,b)
}
if ( want[b] == s[a+b] ) {
res += F(a+1,b)
}
}


The dynamic programming algorithm is n^2 and the rest is O(C(n/2, n/4) ) this solution is fast enough.

struct SPartition 
{
string s;
int n;

char want[20];

long long mem[21][21];

long long rec(int a, int b)
{
long long & res = mem[a][b];
if (res == -1) {
if ( (a==n/2) && (b==n/2) ) {
//final
res = 1;
} else {
res = 0;
if ( (a < n/2) && s[a+b]==want[a] ) {
res += rec(a+1,b);
}
if ( (b < n/2) && s[a+b]==want[b] ) {
res += rec(a,b+1);
}
}
}
return res;
}

// Iterate all strings of nx 'x' characters and no 'o' characters.
// for each of them, call the dynamic programming algorithm to
// count the number of ways to find that string.
long long backtrack(int p, int nx, int no)
{
if (nx + no == 0) {
//done
//solve the dp.
memset(mem,-1,sizeof(mem));
return rec(0,0);

} else {
long long res = 0;
if(nx) {
want[p] = 'x';
res += backtrack(p+1, nx-1, no);
}
if(no) {
want[p] = 'o';
res += backtrack(p+1, nx, no-1);
}
return res;
}
}

long long getCount(string s)
{
this->s = s;
n = s.size();
int nx = count(s.begin(), s.end(), 'x');
int no = n - nx;
if ( (nx % 2 == 1) || (no % 2 == 1) ) {
return 0;
}
nx /= 2;
no /= 2;
return backtrack(0, nx, no);
}
};


Div1 250:
We have up to 50 "eels" of length eelLength[i] each. We can make at most maxCuts of integer length. What is the maximum amount of length-10 eels we can have?

During the match, I used dynamic programming, but the following greedy approach works too.

Notice that usually each cut you make will generate one eel of length 10. There is one exception and it is when the length of the original eel is a multiple of 10. In that case, you can make (length/10 - 1) cuts to get length/10 parts. That means that for each eel with length that is a multiple of 10, you have the chance to save up one cut.

Thus the optimal strategy is to first try to cut all the eels that are multiples of 10. And then focus on the remaining eels. This way, we might save up some cuts.

After the order has been decided, for each eel that is a multiple of 10, find if it is possible to split it into exactly length/10 parts. If it is not cut min(maxCuts, length/10) parts, else add the extra eel.

Edit: Actually, do notice that when picking the multiples of 10 first, you also would like to pick the lowest values first, else you might consume your cut limit by cutting a large multiple of 10 and then become unable to gain the extra cuts in the other numbers.


PS: I have no idea what an eel is.

Sunday, December 25, 2011

It's unnamed holiday day

Merry whatever you celebrate to whoever is reading.

Don't forget what Christmas is all about: http://www.calamitiesofnature.com/archive/?c=470

Monday, December 19, 2011

The minifig problem


Did I ever mention I am an AFOL, which is french for Adult Fan of LEGO?. Well, I am. You know, I have a problem with LEGO and it is that just in 2010 they have begun selling crack inside of blind bags: Collectible mini-figures. As you can expect from LEGO, the minifigure bags are not inexpensive, they are also not evenly distributed. They come in boxes of 60 bags that each contains a specific distribution of minifigure bags. The random distribution in series 6 (Yes, already 96 different figures have been released in the last two years) is in the picture.

So, a common question is what is the expected number of random bags I would need to buy before I get a complete set. And since some minifigs are worth having many of while other are not really worth having, another more common question is what is the expected number of bags you need to get , say 5 romans.

S, how to solve it? Let us first begin with a slightly simpler problem. In some forums some people were wondering what is the probability to get a complete set after you buy a given number of bags. Like 60 or 80. This is slightly easier to understand and helps explaining the expected one.

Please note that the queries can be quite varied, 10 romans, a complete set. 2 aliens and 2 space girls. So we will need to represent it in a special way. I chose to represent the input as an array of N elements. Where N is the number of different figures. To each figure, I assigned a id and a weight (according to the random distribution). Thus the array of N elements, will contain in its i-th position the number of minifigs of kind i you want to get.

Just about 50% of the regular readers of the blog already know the answer they would use to this during a contest - dynamic programming. But let me explain it. For dynamic programming to work, we need a recurrence relation. Let us say we have a function F(wanted, bags). wanted is the array of values that I described, bags is the number of bags we will buy. F returns the probability in that setting.

The recurrence works as follows. First we will work the easiest cases, imagine if wanted had a value of 0 for each minifig kind. Then we want nothing... No matter how many bags we get, we already have "nothing". Thus the probability is 1. The second-easiest case is when we will buy 0 bags, then if we do want one figure, the probability is 0, we cannot get any figure by buying 0 bags.

In other case, we will get a new figure i. This event will have a probability p. The number of bags, will be reduced by 1. If minifigure i had a non-zero amount in wanted, then that means it is one of those figures we wanted, but we can then think that the number of minifigures of that kind that we need will be reduced by one. We can thus make a new vector wanted' that will be identical to wanted but with one less of that minifigure. This is the part where the recurrence comes into play. We can call F(wanted', bags-1) and we will know the probability to reach the object given that we got minifig i in this current step. In the case the minifig was not wanted, we do not modify the vector, thus wanted' = wanted. If we multiply p * F(wanted', bags-1), we get the probability to get that minifigure in this step and complete the wanted set. The total sum p_i * F(wanted'_i, bags-1) for all is will be the solution to the recurrence.

The problem with this is the high number of states. We only have 16 kinds of minifigs which allows us to use such a slow algorithm. But still, note that the number of states of the function is equal to (number of different ways to have the wanted vector) x (number of bags + 1). If the initial wanted vector is (1,1, ... 1) (A complete set), the number of ways is 2N - there are two choices 0,1 for each minfigure. So, if we made (10,10,...) the first vector, the result would be 11N, thus we need to be careful here.

The expected
A slightly faster way to deal with the problem is asking for the expected number of ways you need to buy before getting a certain set. We will use exactly the same wanted vector. But this time the recurrence is different and there is no bags argument. A base case is when wanted is full of zeros, again we have completed the set, but this time the result G(wanted) is the expected number of bags we need to reach the goal. If the goal has already been reached, the expected value is 0.

The recurrence is as follows: Once again, we will have a minifig i and a probability p to get it. Even more, once again the wanted vector might get modified or not. This time G(wanted') will yield the expected number of bags after the wanted vector was modified. Note that since we already bought a bag, the returned expected number shall be increased by one. Thus we will have the sum: ( p_i * (1 + G(wanted_i)) ) for each minifig i. This can be converted to 1 + (sum of p_i*G(wanted_i) ).

However, this time there is an extra complication. Remember there is a possibility wanted_i is equal to the original wanted. Always remember: the key rule of dynamic programming is that the recurrence should not have cycles. We do not want cycles in the recurrence. The solution is to think about equations. At the end, we will have a value sump as a sum of all the parts of the recurrence that are not factors of G(wanted) and a px equal to the probability that the vector stays the same. We have:

G(wanted) = sump + px * G(wanted).

Which can be converted to:

G(wanted) = sump / (1 - px)

But this recurrence will no longer contain cycles and thus it solves the problem.

Implementing This time I decided to use python, I would probably need some flexibility in the input of the problem to try different combinations of wanted minifigures. It was a little hard getting used to dict() and its shenanigans like not admitting a list as key. Oh well.

# Minifigs series 6 distribution:
FIGURES = [ ('surgeon',3), ('sleepy', 3), ('butcher', 3), ('genie', 3),
('robot', 3), ('roman', 3), ('ladyliberty',3),
('flamenco',4), ('mechanic',4), ('spacegirl',4), ('skater', 4),
('alien',4), ('leprechaum',4),
('minotaur',5), ('highlander',5), ('bandit',5) ]

#.............


#finds Expected value using dynamic programming on a very abstract argument...
mem = dict()
N = len(FIGURES)
TOTAL = 0.0
for x in FIGURES:
TOTAL += x[1]


def probabilityToGet( wanted, bags ):
key = (tuple(wanted), bags)
res = mem.get(key, None)
if ( res == None) :
res = 0.0
if (wanted == [0]*N):
#if we do not want any more bags, the probability to get them is 1
res = 1.0
elif (bags == 0) :
#if we won't buy more bags, and we still have figs left to get,
#it is impossible
res = 0.0
else:
res = 0.0
for i in range(0,N):
p = FIGURES[i][1] / TOTAL
if (wanted[i] > 0) :
#make a copy of the wanted vector, with less of this figure
vec = [x for x in wanted ]
vec[i] -= 1
res += p * probabilityToGet(vec, bags-1)
else:
#Nothing changes, just reduce the bags
res += p * probabilityToGet(wanted, bags-1)

mem[key] = res

return res

def expectedBags( wanted ):
key = tuple(wanted)
res = mem.get(key, None)
if ( res == None) :
res = 0.0
if (wanted == [0]*N):
res = 0.0
else:
px = 0
sump = 1.0
for i in range(0,N):
p = FIGURES[i][1] / TOTAL
if (wanted[i] > 0) :
#make a copy of the wanted vector, with less of this figure
vec = [x for x in wanted ]
vec[i] -= 1
sump += p * expectedBags(vec)
else:
#Nothing changes, add to px
px += p
res = sump / (1 - px)
mem[key] = res

return res

# Convert a list of wanted figures as used bellow into a vector for our functions
def formatWanted(argwant):
wanted0 = [0]*N
for w in argwant:
for i in range(0,N):
if (FIGURES[i][0] == w):
wanted0[i] += 1
if ( (len(w)==2) and (FIGURES[i][0] == w[0]) ):
wanted0[i] += w[1]

return wanted0

#--------------------------------
print "Expected number of bags before we complete a set:"
WANTED_FIGURES = [ 'surgeon', 'sleepy', 'butcher', 'genie', 'robot', 'roman',
'ladyliberty', 'flamenco', 'mechanic', 'spacegirl', 'skater',
'alien', 'leprechaum',
'minotaur', 'highlander', 'bandit' ]
wanted0 = formatWanted(WANTED_FIGURES)
print expectedBags(wanted0)

#----------------------------------
print "Expected number of bags before we get 10 Roman soldiers"
WANTED_FIGURES = [ ('roman',10) ]
wanted0 = formatWanted(WANTED_FIGURES)
print expectedBags(wanted0)

#----------------------------------
print "Expected number of bags before we get 5 aliens, 1 space girl and 3 robots:"
WANTED_FIGURES = [ ('alien',5), ('robot',3), ('spacegirl',3) ]
wanted0 = formatWanted(WANTED_FIGURES)
print expectedBags(wanted0)

#----------------------------------
print "Probability to get a complete set after buying only 16 bags"
WANTED_FIGURES = [ 'surgeon', 'sleepy', 'butcher', 'genie', 'robot', 'roman',
'ladyliberty', 'flamenco', 'mechanic', 'spacegirl', 'skater',
'alien', 'leprechaum',
'minotaur', 'highlander', 'bandit' ]
NUMBER_OF_BAGS = 16
wanted0 = formatWanted(WANTED_FIGURES)
print probabilityToGet(wanted0, NUMBER_OF_BAGS)

#----------------------------------
print "Probability to get a complete set after buying only 60 bags (wait for more than a minute)"
WANTED_FIGURES = [ 'surgeon', 'sleepy', 'butcher', 'genie', 'robot', 'roman',
'ladyliberty', 'flamenco', 'mechanic', 'spacegirl', 'skater',
'alien', 'leprechaum',
'minotaur', 'highlander', 'bandit' ]
NUMBER_OF_BAGS = 60
wanted0 = formatWanted(WANTED_FIGURES)
print probabilityToGet(wanted0, NUMBER_OF_BAGS)



Saturday, December 17, 2011

SRM 527: Slooow

This was a very slow week, with no contests, I felt I was out of shape. Anyway, it seems I did sort of well.

Div1 450: P8XMatrixRecovery
You are given the rows of a binary matrix in correct order, some cells are ?, which means they can be 0 or 1. You are also given the columns, in the same format but this time not necessarily in the correct order. Basically, you can try any permutation of columns. It is guaranteed that there will be an answer, return the lexicographically-first matrix that follows the condition.

I noticed that the constraint on the dimensions was 30x30. It sort of led me to think that there was quite some freedom in the solution. When asked to build the lexicographically-first thing and the alphabet is reduced (there are only 0 or 1 as choices), it is possible to try cell by cell from top-left to bottom-right and try to find the lowest possible value in each position given the previous values found.

For example, in this case, imagine we knew of a function that took the rows and the columns and returned true if such matrix is possible. To build the lex-first matrix, just try each ? cell, and ask the question "Is it possible to have a matrix in which this position has 0 ? ". If the answer to that question is true, then the lex-first answer should have a 0 there, else it must have a 1 there. Make sure to use the newly-found values in subsequent iterations.

We just need to know how to answer that question. It is not so difficult. We have a set of columns and we want to assign a unique position from 0 to n-1 to each of the columns. Now, notice that some columns are compatible with some positions and some are not. We can imagine this as a bipartite graph between columns and positions. There is an edge connecting a column with a position if they are compatible. Then use bipartite matching to solve this. Which in turn shall use Max flow. If a perfect matching exists, there is a valid matrix This logic solves the problem, and you can verify that it needs O(n^5) time complexity in total. That is where the 30x30 constraint comes into place...


struct P8XMatrixRecovery 
{
//The network class is just a interface to solve max flow.

bool doMatch(const vector<string> & rows, const vector<string> & columns)
{
int r = rows.size();
int c = columns.size();

network G;
for (int i = 0; i < 2*c; i++) {
G.addVertex();
}
G.sink = G.addVertex();
G.source = G.addVertex();
for (int i=0; i<c; i++) {
G.addEdge(G.source, i, 1);
G.addEdge(i+c, G.sink, 1);
}

// Use bipartite matching between columns and positions.
for (int i=0; i<c; i++) { //O(n^3)
for (int j=0; j<c; j++) { //O(n^4)
// O(n^5)
bool can = true;
for (int k=0; (k<r) && can; k++) { //O(n^5)
char a = rows[k][j];
char b = columns[i][k];
if ( a != '?' ) {
if (b != '?') {
can &= (a == b);
}
}
}
if (can ) {
G.addEdge(i, j + c, 1);
}
}
}
int f = G.maxFlow();
return (f == c); //O(n^3)
}

vector <string> solve(vector <string> rows, vector <string> columns)
{
int r = rows.size();
int c = columns.size();
vector<string> res = rows;

// For each unknown cell.
for (int i=0; i<r; i++) {
for (int j=0; j<c; j++) { //O(n^2)
if (rows[i][j] == '?') {
//Is it possible to place 0 here?
res[i][j] = '0';
if (! doMatch(res, columns)) {
// If not, place a 1.
res[i][j] = '1';
}
}
}
}

return res;
}
};



Div1 275: P8XGraphBuilder
So, the problem scores would suggest that this problem was about as hard as the medium. I found it to be harder :). Well, you are given an array of N-1 elements: scores . You have to build a connected graph with N nodes and N-1 edges such that the sum (scores[degree of (vertex i) - 1]) for all i is as large as possible.

I was headed to the right answer from the beginning: A N vertices graph with N-1 edges that is connected is just a fancy way to say "Tree". Trees are quite compatible with dynamic programming, because you can consider each node a root of its own subgraph. In an undirected tree, You can also consider any node of the tree to be a root.

At first I had some issues giving a body to the solution. I eventually came to this: Imagine the state F(current_degree, available_children). You want to return the maximum score a single 'root' will have given its current_degree and the number of children available to add as children or grand-children to this root.

If available_children is 0, then we cannot do anything, the score is score[Current_degree - 1].

If available_children is not 0, then we are forced to add a new child to the 'root'. This is the recursive part. Let us name k the number of vertices that will be children for the new node. Then the total result will be the sum of F(current_degree+1, available_children - 1 - k) + F(1, k). Why? Ok, the degree of the "root" will increase by 1. The number of available children for the remaining nodes, decreases by 1+k. But the new child will also have a score of its own, its degree is initially 1, because of its connection to the root, and it has k nodes available for him.

We can iterate all the values of k, and select the one that maximizes the result.

struct P8XGraphBuilder 
{
int N;
vector<int> scores;
int mem[100][100];
int rec(int currentdeg, int avchildren)
{
int & res = mem[currentdeg][avchildren];
if (res == -1) {
res = 0;
if (avchildren == 0) {
if( currentdeg != 0) {
res = scores[currentdeg-1];
}
} else {
//how many grandchildren for the next child?
for (int k=0; k<avchildren; k++) {
res = std::max(res, rec(currentdeg+1, avchildren-1-k)
+ rec(1, k) );
}
}
}
return res;



}

int solve(vector <int> scores)
{
this->scores = scores;
int N = scores.size() + 1;
memset(mem,-1,sizeof(mem));
return rec(0, N-1);
}
};


Div1 1050
I think this problem is all about Lucas' theorem. Mostly because of the low modulo value and some other aspects. But I have no further idea.

---
It seems I did well, the only risks would be a silly typo in some code. My performance was slower than many other coders that solved both problems though.