Sunday, November 27, 2011

SWERC 2011, D, F, G, H and almost E

I had to wake up at 5:15 AM for this, and it still started 65 minutes late. I messed up this time because I didn't know the NWERC problem set was also going to be run as a contest, but at SPOJ (which is more up to date language-wise than UVa) and one hour later (which would have been perfect for me).

In the end, I solved 4 problems and finished in 9th place.

Problem D - Distributed ballot boxes
By the time I arrived there were already stats, so I picked the problem that had been solved the most. It was D.

You have to assign at least one box to each city; if a city has a single box, all its people will have to use it. If the city has more than one box, you can decide who in the city gets which box (and it is better to split the population evenly, because you want to minimize the maximum number of people using the same box).

At first I thought the problem was just to accumulate all the populations and divide (mind you, I was still waking up). Then I figured out you cannot do that: you cannot move citizens across cities.

I thought of various things and eventually settled on this. The trick is: if the maximum for some city is already X, there is no point in spending effort trying to get a value smaller than X for any other city.

That idea can be turned into a binary search. Let us change the problem into a decision question: can the largest number of citizens that use a single box, across all cities, be less than or equal to ha? If some value of ha is not feasible, then smaller values won't be feasible either. If a value is feasible, then any larger value will be feasible too.

So, to answer that question, we just need to find the minimum number of boxes needed by each city so that no box holds more than ha citizens. If the sum of all those minimums exceeds B (the number of boxes we have to assign), then ha is not feasible.

For the minimum number of boxes needed by a city, note that the boxes should all hold roughly the same number of citizens, so for a city with population P the answer is simply the ceiling of P divided by ha: a single division.


//========================================================= 
// program:
//
// (initially, parse the input so that N, B and population[] hold it).
int N, B;
int population[500000];

// Is it possible to assign the boxes to the cities such that no box holds
// more than mx citizens?
bool possible(int mx)
{
    int minb = 0;
    for (int i=0; i<N; i++) {
        // least number of boxes needed to split population[i] into
        // parts no larger than mx (a ceiling division).
        int x = population[i]/mx + (population[i]%mx != 0);

        minb += x;
        if (minb > B) {
            return false;
        }
    }
    return true;
}

int solve()
{
    // Binary search for the minimum x such that possible(x).
    int lo = 0;       // Exclusive lower bound; possible(0) is never evaluated.
    int hi = 5000000; // We know possible(hi) is true (constraints).
    while ( hi > lo + 1 ) {
        int ha = (lo + hi)/2;
        if (possible(ha)) {
            hi = ha;
        } else {
            lo = ha;
        }
    }
    return hi;
}
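
A tiny hand-checkable run (my own hypothetical driver, not from the contest, reusing the globals above): with two cities of populations 7 and 4 and B = 3 boxes, the best assignment gives the first city 2 boxes and the second city 1, so the answer is ceil(7/2) = 4.

int exampleRun()
{
    N = 2; B = 3;
    population[0] = 7; population[1] = 4;
    return solve(); // expected: 4, since possible(3) fails and possible(4) holds
}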



Problem H - Peer review
It seems that ACM regionals always have at least 3 or 4 problems that are way too straightforward and consist mostly of annoying implementation. This is one of them.

Something I had issues with was that you can count at most one mistake per paper. So, what exactly should happen when a single scientist reviews the same paper twice? Somehow the 'per paper' part was confusing when scientists could be the ones with problems. In the end I concluded that in that case we should count the mistake once, because it is a single paper that is being reviewed twice by the same scientist.

A trick here is to convert the input. The problem is stated in terms of papers, yet the input is sorted by authors. It is easy to generate an array indexed by papers.

Then do exactly what it says on the tin. For each paper, verify that exactly K scientists review it, that nobody from the same institution as the paper's author reviews it, and that all the scientists reviewing it are distinct.

int K, N;                 // (From the input) 
string institution[1000]; // (From input, institution of the i-th author)
int review[1000][5];      // author x reviews paper #review[x][i] for (i < K)

int reviewed[1000][5];    // paper x is reviewed by scientist #reviewed[x][i]
int reviewedn[1000];      // number of reviewers (shall be K at the end).

int paperHasProblem[1000]; // Allows us to count at most one problem per paper.


int countProblems()
{
    fill(reviewedn, reviewedn + N, 0);
    fill(paperHasProblem, paperHasProblem + N, 0);

    // convert (from authors as indexes to papers as indexes).
    for (int i = 0; i < N; i++) {
        for (int j = 0; j < K; j++) {
            int p = review[i][j];
            int &t = reviewedn[p];
            if (t == K) {
                // more than K reviewers: flag it and avoid overflowing the array
                paperHasProblem[p] = 1;
            } else {
                reviewed[p][ t++ ] = i;
            }
        }
    }
    // handle it
    for (int i = 0; i < N; i++) {
        // Exactly K reviewers, no less, no more
        if ( reviewedn[i] < K ) {
            paperHasProblem[i] = 1;
        }
        if (paperHasProblem[i]) {
            continue;
        }
        // No collaborators (paper i is authored by scientist i)
        for (int j = 0; j < K; j++) {
            if ( institution[i] == institution[ reviewed[i][j] ] ) {
                paperHasProblem[i] = 1;
                break;
            }
        }
        // no repeated reviewers
        for (int j = 0; (j < K) && !paperHasProblem[i]; j++) {
            for (int s = j+1; s < K; s++) {
                if (reviewed[i][j] == reviewed[i][s]) {
                    paperHasProblem[i] = 1;
                    break;
                }
            }
        }
    }
    return accumulate(paperHasProblem, paperHasProblem + N, 0);
}
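
A tiny hand-built instance (my own hypothetical driver, reusing the globals above) to see the collaborator check fire: two authors from the same institution reviewing each other's papers means both papers have a problem.

int exampleRun()
{
    N = 2; K = 1;
    institution[0] = "A"; institution[1] = "A";
    review[0][0] = 1; // scientist 0 reviews paper 1
    review[1][0] = 0; // scientist 1 reviews paper 0
    return countProblems(); // expected: 2 (each paper is reviewed by a collaborator)
}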



Problem F - Guess the numbers
It seems that in every ACM-related contest there is a problem about evaluating a parenthesized expression. And every time, the supposedly complicated part is evaluating it. But if you know the words recursion or stack, it is not so hard...

So, really, there are n unknowns, and n is at most 5. You have n values available, and you have to assign them to the unknowns. You can just try all 5! = 120 ways to do it. Then all that is needed is to evaluate the expression and see if it equals m.

At first I was not sure whether I had to worry about overflow. Then I noticed that the largest value you can produce is 50*50*50*50*50 = 312,500,000, which comfortably fits in a 32-bit int.

Note that the statement never says the unknowns will be consecutive letters. So we cannot be sure that for n=3 the unknowns will be a, b, c. Knowing ACM problems, it is likely there were cases with n=3 and, for example, p, z, x. I do not know this for sure because I didn't try submitting without handling this corner case, but I bet it's true.

int n, m;          // (From the input) 
int values[5];     // The available values, also from the input.
string expression; // The expression (guess where it comes from).

// Evaluate the parenthesized expression starting at a.
// b will be the first index greater than a that is NOT part of the expression.
int evaluate(int a, int& b)
{
    // The expression is an atom, just return the assigned value.
    if (expression[a] != '(') {
        b = a + 1;
        return values[ expression[a] - '0' ];
    }
    // expression[a] is a '(',
    // then it is followed by another complete parenthesized expression,
    int p = evaluate(a+1, b);
    // then an operator,
    char op = expression[b];
    // and then another expression.
    int q = evaluate(b+1, b);
    assert( expression[b] == ')' );
    // The end of the main expression comes just after the end of the second
    // sub-expression.
    b++;
    int r = 0;
    switch(op) {
        case '+':
            r = p + q;
            break;
        case '-':
            r = p - q;
            break;
        case '*':
            r = p * q;
            break;
    }
    return r;
}

bool isPossible()
{
    int t = 0;
    // First of all, let us update the expression so that the unknowns
    // are represented as fixed digits instead of arbitrary letters.
    map<char, int> assign;
    for (int i=0; i < (int)expression.size(); i++) {
        char ch = expression[i];
        if (ch >= 'a' && ch <= 'z') {
            if (assign.count( ch )) {
                expression[i] = ('0' + assign[ch]);
            } else {
                expression[i] = ('0' + t);
                assign[ch] = t++;
            }
        }
    }
    // Try all permutations of values; if any evaluates to m, return true.
    sort(values, values+n);
    do {
        int endPos;
        if ( evaluate(0, endPos) == m ) {
            return true;
        }
    } while ( next_permutation(values, values+n) );
    return false;
}
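
A small hand-built check (my own hypothetical driver, reusing the globals above): with values {2, 3, 5} you can reach m = 25 via (2+3)*5, no matter which letters name the unknowns.

bool exampleRun()
{
    n = 3; m = 25;
    values[0] = 2; values[1] = 3; values[2] = 5;
    expression = "((p+z)*x)"; // note the unknowns are not a, b, c
    return isPossible(); // expected: true, via ((2+3)*5)
}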


Problem G - Non-negative partial sums
Finally, a more interesting problem. However, there was something I did not like about it, which I will explain below.

Ok... so the number of elements is 1000000. That's... high. You would certainly need an O(n) or O(n*log(n)) solution. And that does not sound easy, because there are n rotations of the sequence and each of them has n partial sums!

Due to the trauma from SRM 524 (that very special medium problem), I tried to keep things like accumulated sums in mind. It turns out they are partially helpful here.

The trick is to try your own sequence. How about 7, -3, 4, -5? Let us compute the partial sums: (7, 4, 8, 3). These are the partial sums for the first rotation (no rotation at all). As you can see, there are no negative values here. What is interesting is what happens after a single rotation: (7, -3, 4, -5) becomes (-3, 4, -5, 7), but what happens to the accumulated values? They become (-3, 1, -4, 3). We now have negatives, so this rotation does not count.
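
Before optimizing, it helps to have a naive reference to compare against. This is a brute-force sketch of my own (solveNaive is a hypothetical helper, useful only for checking the fast solution on small inputs): recompute the partial sums of every rotation and count the rotations with no negative sum.

int solveNaive(int n, int a[])
{
    int res = 0;
    for (int r = 0; r < n; r++) {
        // partial sums of the rotation that starts at position r
        long long sum = 0;
        bool valid = true;
        for (int j = 0; j < n; j++) {
            sum += a[(r + j) % n];
            if (sum < 0) { valid = false; break; }
        }
        res += valid;
    }
    return res;
}

Of course this is O(n*n), hopeless for n = 1000000, but handy for testing.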

We need to know how to convert the accumulated sums of one rotation into the next one. It is not actually very hard. From the first sums (7, 4, 8, 3), we remove the first element (a[0] = 7). If we remove that element, the sums become (4 - 7, 8 - 7, 3 - 7). However, we also need to append a[0] at the end. The new last sum will be (3 - 7 + 7), because the last accumulated sum was (3 - 7) and we add 7.

Note that we do not really need the whole sequence of accumulated sums. What we do need is to know whether any of them is less than 0. If the minimum is not negative, then none of the elements in the sequence of sums is negative. So we only need the minimum. When we remove a[i], all the values in the sequence of sums drop by a[i] units. At the next rotation we remove a[i+1], which means subtracting a[i+1] from all the sums once again. We can just keep a variable offset holding the total amount we have subtracted from all the sums so far.

Now, for each rotation, we just need a) the minimum value among all the stored sums and b) the offset variable. If (minimum + offset) is less than 0, then the rotation contains a negative partial sum.

We do need to update the minimum every time we remove and add elements, and this had better be fast. You can use a balanced tree data structure. Did you know that the STL's std::set is actually a very good tree structure (a red-black tree, to be precise)? Its .begin() method gives you the minimum element, and it can insert and erase elements in O(log(n)) time, but you need some way to find the element you want to erase. Here I propose a pair: store (accumulated_value[i], i) in the set, so you can erase the i-th sum by looking it up as (accumulated_value[i], i).

The first code I submitted looked like this:

int n;            // from the input 
int a[1000000];   // the values from the input
int acu[1000000]; // acu[i] will hold sum(a[j], for j<=i).

int solve()
{
    // Build the first sequence of sums, add to our "tree"
    set< pair<int,int> > tree;
    int res = 0;
    acu[0] = a[0];
    tree.insert( make_pair(acu[0], 0) );
    int last = acu[0];
    for (int i=1; i<n; i++) {
        acu[i] = acu[i-1] + a[i];
        tree.insert( make_pair(acu[i], i) );
        last = acu[i];
    }
    res = ( tree.begin()->first >= 0 ); // the first rotation counts iff the minimum sum is non-negative.

    int offset = 0;
    // The remaining rotations
    for (int i=1; i<n; i++) {
        // We remove a[i-1]
        tree.erase( tree.find( make_pair(acu[i-1], i-1) ) );
        offset -= a[i-1];

        // We add last + a[i-1] (a fresh index avoids colliding with old pairs)
        int id = i + n;
        tree.insert( make_pair(last + a[i-1], id) );
        // We update last:
        last += a[i-1];

        // verify the minimum
        if (tree.begin()->first + offset >= 0) {
            res++;
        }
    }
    return res;
}


Unfortunately, it timed out, which is very lame because it is clearly O(n*log(n)). It seems that, as usual, UVa wants you to care about the constant factor, which is something I don't like much (because it teaches the terrible lesson of avoiding libraries, when using libraries (a good practice) should be rewarded, not the other way around). I eventually came up with a solution that uses the same logic but without a tree: instead we just sort the first sequence of sums once. It has the same complexity but less overhead than std::set, and it passes in time.

int n;          // from the input 
int a[1000000]; // the values from the input

pair<int,int> acumvec[1000000];

int solve()
{
    int res = 0;
    // Build the first sequence of sums, then sort the vector
    int acu = 0;
    int last = 0;
    for (int i=0; i<n; i++) {
        acu += a[i];
        acumvec[i] = make_pair(acu, i);
        last = acu;
    }
    sort(acumvec, acumvec+n);
    res = ( acumvec[0].first >= 0 ); // the first rotation counts iff the minimum sum is non-negative.

    int t = 0;
    int offset = 0;
    // The remaining rotations.
    // The idea is that we never have to remove the sums appended for indexes
    // greater than (n-1), so we do not need them in a tree or anything:
    // keeping their running minimum in ultramin is enough. This also means we
    // never modify the sorted vector, so sorting once at the start suffices.
    int ultramin = 2000000000;
    for (int i=1; i<n; i++) {
        // We 'remove' a[i-1]: skip sorted entries whose index was already removed.
        // Each t is visited at most once IN TOTAL, so this while loop costs
        // O(n) overall, amortized across the whole for(i=1;i<n;i++) loop.
        while ( (t < n) && (acumvec[t].second < i) ) {
            t++;
        }
        offset -= a[i-1];

        // We "add" last + a[i-1]
        ultramin = min(ultramin, last + a[i-1]);
        // We update last:
        last += a[i-1];

        // If (minimum + offset) < 0, this rotation has a negative partial sum.
        // We need to consider both minimums: the surviving original sums and
        // the appended ones.
        bool negative = (ultramin + offset < 0)
                     || ( (t < n) && (acumvec[t].first + offset < 0) );
        if (!negative) {
            res++;
        }
    }
    return res;
}
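
Checking it against the example sequence from the beginning of this section (my own hypothetical driver, reusing the globals above):

int exampleRun()
{
    n = 4;
    int vals[4] = {7, -3, 4, -5};
    copy(vals, vals + n, a);
    return solve(); // expected: 1 (only the unrotated sequence is valid)
}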



Problem E - Game, set and match
I tried many other problems, but it was already quite late and I didn't have promising ideas. Around 20 minutes before the end, I opened E. I think this problem is really not as hard as the statistics would tell you, but it does require implementing 4 different subproblems, and that takes quite some time.

My initial idea is dynamic programming with some handling of special cases. So far I have only solved the game part. Here is an explanation of how to solve games.

Let us define f(a,b), the probability that the player will win a game when his score is a and the other player's score is b. Note that the scores in the statement are 0, 15, 30, 40, but we can treat them as 0, 1, 2, 3. We will add a new score, 4, meaning a player has won. The player with score a wins each point with probability p, and we want to calculate his probability of winning the game.

Let us find f(a,b) when they are both are 3, this is the most interesting case (equivalent to 40, 40 in the statement) as they are in a "deuce". We are interested in seeing the player win 2 consecutive times. But if the players score two alternate points, we are back to 40,40. That is important. Let us put things into an equation:

f(3,3) = p*p + p*(1-p)*f(3,3) + (1-p)*p*f(3,3) + (1-p)*(1-p)*0.

If the player scores two consecutive points, which has probability p*p, he wins. If the other player scores two consecutive points, which has probability (1-p)*(1-p), the first player loses (thus the 0). Otherwise, in the remaining two cases, each player scores once in the next two points, which takes us back to the starting point.

The equation is just that, an equation, and f(3,3) is the only unknown. Grouping the f(3,3) terms gives f(3,3) * (1 - 2*p*(1-p)) = p*p, and therefore f(3,3) = p*p / (1 - 2*p + 2*p*p).

Now, once you have f(3,3), you can solve the remaining cases of f(a,b). When a is 4, the player has won, so the probability is 1. When b is 4, the other player has won, so the first player has probability 0 of winning. Otherwise there is probability p that the game moves to f(a+1,b) and probability (1-p) that it moves to f(a,b+1). We just need some good old memoization to evaluate the f recurrence.

The other parts of the statement are solved using similar methods: memoization, plus solving equations to break the cycles. But it takes a while to derive everything and code it.

// Does not solve the whole problem; it only finds the probability of winning a game. 
bool rgame[5][5];
double game[5][5];

double gameProbability(double p, int a=0, int b=0)
{
    double & res = game[a][b];
    if (!rgame[a][b]) {
        rgame[a][b] = true;
        if (a==4) {
            res = 1;
        } else if (b==4) {
            res = 0;
        } else if ( (a==3) && (b==3) ) {
            // how to win a deuce:
            // f = p*p + (1-p)*p*f + p*(1-p)*f
            res = (p*p)/(1 - 2*p + 2*p*p);
        } else {
            res = p * gameProbability(p, a+1, b)
                + (1-p) * gameProbability(p, a, b+1);
        }
    }
    return res;
}
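
A quick sanity check for the function (my own addition, not part of any submission): with p = 0.5 the game is symmetric, so the probability of winning must be exactly 0.5.

// Hypothetical helper; assumes <cassert>, <cmath> and <cstring> are included.
void sanityCheck()
{
    memset(rgame, 0, sizeof(rgame)); // reset the memoization table first
    assert( fabs(gameProbability(0.5) - 0.5) < 1e-9 );
}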

//=========================================================
// I/O:
//
int main()
{
    double p;
    while ( (cin >> p) && (p > -0.1) ) {
        memset(rgame, 0, sizeof(rgame));
        // (the full solution would also reset its other memoization tables here)
        cout.precision(11);
        cout.setf(ios::fixed, ios::floatfield);
        cout << gameProbability(p) << endl;
    }
    return 0;
}

5 comments:

Anonymous said...

Some comments:
1. If you are looking for an ACM regional without "3 or 4 straightforward problems", check out e.g. SWERC 2010. But with 9 or 10 problems for five hours, 3 or 4 of them should be relatively straightforward even if the teams are good, as the disastrous scoreboard of SWERC 2010 shows.

Also if it's hard problems you're looking for, what about problems A, B and C this year?

2. The statement for "Peer Review" was crystal clear. I agree it's easy to make a mistake here, but there is no ambiguity if you pay attention to the statement. It's not meant to be tricky.

3. No, not every regional has a parsing problem. But the parsing in problem F was particularly easy, as there is not even parenthesis removal.

4. Of course there are cases with n = 3 and p, z, x (say). If there weren't, the statement would say so. Again, this is not tricky at all, as you seem to be implying. And how does this have anything at all to do with it being an ACM regional?

5. The O(n log n) solution would have been accepted at the real contest. It's UVA that sets too-tight time limits (in particular, one of the reference solutions for C timed out badly at the online contest). The times at SPOJ should be more lenient.

Anonymous said...

By the way, I forgot to say. None of the problems in this contest had an annoying implementation. I wonder why you would say that.

vexorian said...

1. Is there a particular reason to have 9 or 10 problems when 4 of them are really filler? You could easily go with 7 problems and keep 2 straightforward ones. I think 2 straightforward problems are enough to avoid ties at 0 (which is a good and fine objective).

Of course there are hard problems in these regionals, and I never stated otherwise, but the point was that there are always so many small problems, and the purpose of that has always evaded me, to say the least.

What I dislike about them is that they are a nice way to lose time before you get to try the more interesting problems (That the scoring system is designed to punish you for not solving the easiest problems first does not help).

2. Indeed it was clear and I eventually made the right conclusion, mentioned it mostly because it was my own confusion.

"Crystal clear" is not a compound adjective I would feel is fair to use.

3. I never said every regional has one, just that every single time I try a problem set in a UVa contest there is one of these problems. Admittedly, this is the first ACM problem set I have tried in a while, but I can go through my contest history and show you that for some reason there is always a problem like this.

And not just any kind of parsing problem, but a problem about evaluating arithmetic expressions with parentheses.

4. I didn't say it was tricky, or that it was wrong to have that corner case. Though now that you make me think about it, as a corner case it is quite worthless.

5. I never said it was ACM's fault. I explicitly said it was a UVa thing, and I know that goes on UVa's shoulders.

"It seems that as usual uva wants you to care about the constant factor."

vexorian said...

Annoying is subjective, of course. I think problems like Peer review and Guess the numbers qualify because they are mostly implementation.

I did enjoy the contest, and the problems were good practice for me because I have been slow at implementation lately, and the problems other than Guess the numbers and Peer review did actually require me to think. If I sounded overly negative, it is probably because I really think the ACM contest format has aged a bit too much and needs updating in comparison to things like CF or GCJ.

Anonymous said...

1. Sure, there are some easy problems. And yes, there is a reason: they are needed to let weaker teams solve a couple of them, and to have few teams solving none. In other words, to differentiate among the weaker teams. The contest is not all about the top contestants. On the other hand, I do not really agree that there was any "filler", even if some problems are easy or known.

Also, do you know any programming contest that does not contain some problems like that? Topcoder, for example, is much worse in this respect: there are only three problems, and the 250 is *always* filler, unless it's the final of an onsite contest or something. (And then you need to be quite fast to solve the other two even if you know how.) IOI used not to have any easy problems, but now they are including them (although in this particular case I think they went too far).

2. You did say it was confusing (not that you were confused).

3. You certainly wrote it: "It seems that in every ACM-related contest there is a problem about evaluating a parenthesized expression".

4. As I said, it is not meant to be a corner case, so I don't care if it's worthless or not. But you did imply it was a corner case: "Knowing ACM problems, it is likely there were cases with .... I bet that's true". What's the point of that sentence then? And how would that have been different in a non-ACM contest?

5. I know what you said, I was letting you know that solution was expected to be accepted. On the other hand, for a slightly harder problem it wouldn't have been such a bad idea to let Omega(n log n) solutions time out if it were possible to tell them apart, since there are several O(n) solutions.

And I repeat, *none* of the problems in this set had a complicated implementation. One can write the simplest of parsers for F; it takes 50 lines to solve the whole problem with no prior knowledge. Even if the parser were more difficult, I do not know what would be wrong with that; parsing is algorithmically interesting too.

Finally, I do find the Google Code Jam format the most enjoyable of all as a contestant, but that's another matter. ACM comes a close second to me. I never tried CodeForces though.