Friday, May 24, 2013

SRM 579 Editorial preview: RockPaperScissors

You know what? Because of a glitch I can no longer edit editorials directly in TopCoder. Also, since minuses indicate people don't like incomplete editorials in forums. I am going to use my blog to post WIP editorial previews. How about that?

RockPaperScissors problem

Problem statement. You have up to n dice with different rock, paper , scissors probabilities. Because you are a lunatic and very bored, you are playing against your n dice. Each time you decide a gesture (rock, paper, scissors) to play against the next die you pick. A die is picked uniformly at random from all the dice inside the bag, then it is thrown and removed from the set of bags. For every victory you get 3 points, for every tie 1 point. You play until you run out of dice.

I must thank ffao for telling me a much needed solution that is viable to explain. As you can see, my job is to make short explanations much longer.

The strategy

In the top level, the problem involves the following logic: At each turn, you will remember a list of the results so far: E.g: (Rock - Scissors - Scissors - Rock - Paper - ... ). Based on the past results, you need to find out the probability of the next die result being Rock, Scissors or Paper. We will call these P(Rock), P(Scissors), P(Paper)

If you pick your play to be "Paper", the expected score would be: 3*P(Rock) + 3*P(Paper). Because there is a P(Rock) probability you will earn 3 points and a P(Paper) probability for a tie. Similarly, playing scissors has an expected score of ( 3*P(Paper) + P(Scissors) ) and playing rock yields (3*P(Scissors) + P(Rock) ). The best decision is the one that has the best expected score, if we follow this decision in every turn, the expected total score will be maximized.

Past results

The strategy is currently based on the sequence of past results. The problem is that with n <= 50, there are far too many possible different such sequences. The work around involves to notice that the probabilities P(Rock), P(Scissors) and P(Paper) are based on the probability that each die would be picked in the next turn.

P(Rock) = Sum( For each die i, P( Die i is picked in the next throw) * P(Throwing Die i yields Rock) )

Initially, the probabilities to pick each die are 1/n. These probabilities will change after a sequence of moves. Depending on the results so far, some dice could be more likely than others.

A key observation is that, given a sequence of past results, the actual order of the results does not change the probabilities that each die that could still be inside the bag. For example, consider the results sequences : Rock-Rock-Scissors and Scissors-Rock-Rock. Each combination of 3 dice that can give the first result can also give the second result. Thanks to this observation, we only need to remember the number of Rock, Scissors and Paper results we have seen. We will call these numbers r, s and p, respectively.

Expected turn probability

There are O(n3) triples (r, s, p). We could make a solution based on recursion: F(r, s, p) gives us the expected score after we have seen a total of r rocks, s scissors and p papers. There is another approach though that works nicer in this problem. Note that your decision only has an effect on your score, but it does not have an effect on which dies are thrown and what face they show. The expected value is a linear map. The expected total score is equal to the expected score for each turn.

It is turn #m, there were m turns before and each turn yielded a result. This means that r + s + p = m. There is a probability P(r, s, p) that the number of results for each gesture were (r, s, p). The expected score for the m-th turn is:

E(m-th turn) = Sum( For each (r, s, p = m), P(r,s,p)*Score(r,s,p))

Score(r,s,p) is the expected score when the past results where (r,s,p). We need to find P(Rock | (r,s,p)), P(Scissors | (r,s,p)) and P(Paper | (r,s,p)), the conditional probability the next dice yields each of the gestures assuming the past results were (r,s,p). Then we pick the best result: 3*(something) + 1*(something else).

The gesture probabilities depend on the remaining dice. Each die will have a conditional probability to be picked: P(Die i | (r,s,p)). For example:

P(Rock | (r,s,p)) = Sum(for each die i, P(Die i | (r,s,p)) * P(Die i rolls Rock) )

There are currently two probabilities that we need to know how to calculate: P(r,s,p) and P(Die i | (r,s,p)).

Die probability

One method is to think: What is the probability that die i is still in the bag? P(i in bag | (r,s,p)). There should be (n - r - s - p) dies in the bag. The probability that die i will be picked is then: P(i in bag | (r,s,p)) | (n - r - s - p).

Consider the probability for the event ( (r,s,p) AND (die i is not picked ) ), we will call it P(i,r,s,p), this is the total probability that the die is in the bag AND (r,s,p). By the definition of conditional probability:

P( Die i is in the bag | (r,s,p) ) = P(Die in the bag AND (r,s,p)) / P(r,s,p)
                                   = P(i,r,s,p) / P(r,s,p)

Another method is with Bayes' theorem:

P(Die i | (r,s,p)) = ( P( (r,s,p) | Die i) * P(Die i) ) / P(r,s,p)

It does need a work in interpretation P(Dice i), is the probability Die i is picked in the m-th turn this is 1 / (n - r-s-p), because that is the number of remaining dice. P( (r,s,p) | Die i) is the probability (r,s,p) were the results before m-th turn given that Die i is picked next. This is the same as the probability to pick (r,s,p) without using Die i = P(i,r,s,p). Using both methods, the result is the same:

P(Die i | (r,s,p)) = ( P(i,r,s,p) / P(r,s,p) ) / (n - r-s-p)

Getting rid of P(r,s,p)

Instead of calculating P(r,s,p) we will show we do not need to calculate it. This is a interesting although not very significant simplification. Remember that the objective is to find the expected score for the m-th turn:

Sum( For each (r+s+p = m) : P(r,s,p) * Score(r,s,p) )

Score(r,s,p) is equal to 3*(a probability) + 1*(another probability). The probabilities are in the form:

P(die i | (r,s,p)) = (P(i,r,s,p) / P(r,s,p)) / (n-r-s-p) 
P(Rock | (r,s,p))  = Sum(For each die i : P(die i | (r,s,p)) * P(die i rolls Rock) ) 

Therefore, Score(r,s,p) is equal to a number divided by P(r,s,p), since we calculate the product P(r,s,p) * Score(r,s,p) we can just eliminate P(r,s,p).

P'(die i | (r,s,p)) = P(i,r,s,p) / (n-r-s-p)

P'(Rock | (r,s,p))     = Sum(For each die i : P'(die i | (r,s,p)) * P(die i rolls Rock) )
P'(Scissors | (r,s,p)) = Sum(For each die i : P'(die i | (r,s,p)) * P(die i rolls Scissors) )
P'(Paper | (r,s,p))    = Sum(For each die i : P'(die i | (r,s,p)) * P(die i rolls Paper) )

Score'(r,s,p) = max( 3*P'(Rock | (r,s,p)) + P'(Paper | (r,s,p))
                     3*P'(Scissors | (r,s,p)) + P'(Rock | (r,s,p))
                     3*P'(Paper | (r,s,p)) + P'(Scissors | (r,s,p)) )

Calculating P(i,r,s,p)

We can use dynamic programming. Let f(t, i,r,s,p) be the probability to pick r rocks, s scissors and p papers in (r+s+p) without picking die i, when considering only the first t dice.

  • For t = 0, there are no dice, whenever r+s+p = 0, the probability is exactly 1.0 else it is 0.0. This is the base case.
  • For t > 0. We can consider die #(t-1). We are calculating probability based on (r+s+p) picks and there are t dies available. The probability die (t-1) will be picked in any of these throws is (r+s+p) / t. If the die is picked, there is a probability it will roll a Rock, Scissors or Paper. For example, If rock is picked, then we need to calculate f(t-1, i, r-1,s,p) to find the probability the rest works out. If the die is not picked then we need to calculate f(t-1, i, r,s,p). An exceptional case is when i=t-1. In this case, we need to ignore the case where the die is picked.

In total we will need O(1) calculations for each of the O(n5) possible states for the function. An issue is that naively implementing the O(n5) states of the dynamic programming needs around 2.5 GB of memory (TopCoder has a 64MB limit). We still have work to do.

In less memory

For the main solution, we do not need to remember all O(n5) states of the function. We only need those where t = n (To consider all the dice available) . Each state is dependent on cases states with smaller (r+s+p) or with identical (r,s,p). It is possible to implement the dynamic programming using only O(n4) memory. See the code for more details.

Code

double P[50][51][51][51];// P[x][r][s][p]
// probability to get r rocks, s scissors and p papers
// without throwing die #x.

// This array is barely behind the memory limit:
// 50*51*51*51*8 bytes ~= 50.60 MB

double bestScore(vector <int> rockProb, vector <int> paperProb, vector <int> scissorsProb)
{
int n = rockProb.size();
memset(P, 0, sizeof(P));
for (int i=0; i<n; i++) {
P[i][0][0][0] = 1.0; //Base case. The other cases for t=0 are 0.0
// We are calculating f(t,i,r,s,p):
for (int t=1; t<=n; t++) {
// Initially, P[i][r][s][p] contains the results for (t-1)
// As we update P[i][r][s][p] with results for (t), we
// need to do it in an order that does not override a
// result for (t-1) that we might still need later
for (int m=t; m>=0; m--) {
// Each state depends on those with smaller m
// or identical (r,s,p). So we update in decreasing order of m:
for (int r=0; r <= m; r++) {
for (int s=0; s + r <= m; s++) {
int p = m - s - r;
// Probability die t-1 is picked:
double q = m / (double)t;
double qr = rockProb[t-1] / 300.0;
double qs = scissorsProb[t-1] / 300.0;
double qp = paperProb[t-1] / 300.0;

double sum = 0;
if (r > 0) {
// Since r-1+s+p < r+s+p, then P[i][r-1][s][p] currently
// holds the result for f(t-1, i, r-1,s,p)
sum += qr * P[i][r-1][s][p];
}
if (s > 0) {
sum += qs * P[i][r][s-1][p];
}
if (p > 0) {
sum += qp * P[i][r][s][p-1];
}
// Note that P[i][r][s][p] currently holds f(t-1,i,r,s,p):
if (i == t - 1) {
// Die t-1 is the one to ignore. Ignore its pick:
P[i][r][s][p] = (1-q) * P[i][r][s][p];
} else {
P[i][r][s][p] = q*sum + (1-q) * P[i][r][s][p];
}
}
}
}
}
}
double res = 0;

for (int m = n-1; m >= 0; m--) {
// How many points can you expect at the m-th throw?
for (int r=0; r<=m; r++) {
for (int s=0; s+r<=m; s++) {
// Score'(r,s,p) =
int p = m - s - r;
double prock = 0;
double pscissors = 0;
double ppaper = 0;
for (int i=0; i<n; i++) {
// Probability to pick die i:
double cond = P[i][r][s][p] / (n - m);
// Add to the rock, scissors and paper probabilities:
prock += cond * rockProb[i] / 300.0;
pscissors += cond * scissorsProb[i] / 300.0;
ppaper += cond * paperProb[i] / 300.0;
}
// = Best of:
double x[3] = { 3*prock + ppaper,
3*pscissors + prock,
3*ppaper + pscissors };
res += *max_element(x,x+3); // Pick the maximum value:
}
}

}
return res;
}

1 comment :

RA said...

waiting to read your story on srm 580 .. make it fast please :)