You know what? Because of a glitch I can no longer edit editorials directly in TopCoder. Also, judging by the down-votes, people don't like incomplete editorials in the forums. So I am going to use my blog to post WIP editorial previews. How about that?

## RockPaperScissors problem

Problem statement. You have **n** dice, each with its own rock, paper and scissors probabilities. Because you are a lunatic and very bored, you are playing against your **n** dice. Each turn you decide a gesture (rock, paper or scissors) to play against the next die. A die is picked uniformly at random from all the dice inside the bag, then it is thrown and removed from the bag. For every victory you get 3 points, for every tie 1 point. You play until you run out of dice.

I must thank **ffao** for telling me a much needed solution that is viable to explain. As you can see, my job is to make short explanations much longer.

## The strategy

At the top level, the problem involves the following logic: at each turn, you remember the list of results so far, e.g. (Rock - Scissors - Scissors - Rock - Paper - ...). Based on the past results, you need to find the probability of the next die result being Rock, Scissors or Paper. We will call these P(Rock), P(Scissors), P(Paper).

If you pick your play to be "Paper", the *expected* score would be: 3*P(Rock) + P(Paper), because
there is a P(Rock) probability you will earn 3 points and a P(Paper) probability of a tie, which earns 1 point. Similarly,
playing scissors has an *expected* score of ( 3*P(Paper) + P(Scissors) ) and playing rock yields
( 3*P(Scissors) + P(Rock) ). The best decision is the one with the best expected score; if we follow this
decision in every turn, the expected total score will be maximized.

## Past results

The strategy is currently based on the sequence of past results. The problem is that with **n** <= 50, there
are far too many possible such sequences. The workaround is to notice that the probabilities
P(Rock), P(Scissors) and P(Paper) depend only on the probability that each die would be picked in the next turn.

P(Rock) = Sum( For each die i, P( Die i is picked in the next throw) * P(Throwing Die i yields Rock) )

Initially, the probabilities to pick each die are 1/**n**. These probabilities will change after
a sequence of moves. Depending on the results so far, some dice could be more likely than others.

A key observation is that, given a sequence of past results, the actual order of the results does
not change the probabilities that each die is still inside the bag. For example, consider
the result sequences Rock-Rock-Scissors and Scissors-Rock-Rock. Each combination of 3 dice that
can give the first result can also give the second result. Thanks to this observation, we only need to
remember the number of Rock, Scissors and Paper results we have seen. We will call these numbers
**r**, **s** and **p**, respectively.

## Expected turn probability

There are O(**n**^{3}) triples (**r**, **s**, **p**). We could make a solution
based on recursion: F(**r**, **s**, **p**) gives us the expected score after we have seen
a total of **r** rocks, **s** scissors and **p** papers. There is another approach,
though, that works nicer in this problem. Note that your decision only has an effect on your score; it
has no effect on which dice are thrown or what face they show. The expected value
is a linear map, so the expected total score
is equal to the sum of the expected scores of the individual turns.

Consider turn #**m** (0-indexed): there were **m** turns before it and each turn yielded a result. This means
that **r** + **s** + **p** = **m**. There is a probability P(**r, s, p**) that the
numbers of results for each gesture were (**r, s, p**). The expected score for the **m**-th turn is:

E(m-th turn) = Sum( For each (r + s + p = m) : P(r,s,p) * Score(r,s,p) )

Score(**r,s,p**) is the expected score when the past results were (**r,s,p**). We
need to find P(Rock | (**r,s,p**)), P(Scissors | (**r,s,p**)) and P(Paper | (**r,s,p**)): the
conditional probabilities that the next die yields each of the gestures, given that the past results were
(**r,s,p**). Then we pick the best play: 3*(something) + 1*(something else).

The gesture probabilities depend on the remaining dice. Each die will have a conditional probability
to be picked: P(Die **i** | (**r,s,p**)). For example:

P(Rock | (r,s,p)) = Sum( For each die i : P(Die i | (r,s,p)) * P(Die i rolls Rock) )

There are currently two probabilities that we need to know how to calculate: P(**r,s,p**) and
P(Die i | (**r,s,p**)).

## Die probability

One method is to think: what is the probability that die **i** is still in the bag? P(**i** in bag | (**r,s,p**)).
There should be (**n - r - s - p**) dice in the bag. The probability that die **i** will
be picked next is then: P(**i** in bag | (**r,s,p**)) / (**n - r - s - p**).

Consider the probability of the event ( (**r,s,p**) AND (die **i** was not picked) ); we will
call it P(**i,r,s,p**). This is the total probability that die **i** is still in the bag AND the results were (**r,s,p**). By the definition
of conditional probability:

P( Die i is in the bag | (r,s,p) ) = P(Die in the bag AND (r,s,p)) / P(r,s,p) = P(i,r,s,p) / P(r,s,p)

Another method is with Bayes' theorem:

P(Die i | (r,s,p)) = ( P( (r,s,p) | Die i) * P(Die i) ) / P(r,s,p)

It does need some work in interpretation. P(Die **i**) is the probability die **i** is picked
*in the m-th turn*; this is 1 / (**n - r - s - p**), because that is the
number of remaining dice. P( (**r,s,p**) | Die **i**) is the probability that (**r,s,p**) were
the results before the **m**-th turn given that die **i** is picked next. This is the same
as the probability to get (**r,s,p**) without using die **i** = P(**i,r,s,p**). Using
both methods, the result is the same:

P(Die i | (r,s,p)) = ( P(i,r,s,p) / P(r,s,p) ) / (n - r - s - p)

## Getting rid of P(r,s,p)

Instead of calculating P(**r,s,p**), we will show that we do not need it at all. This is an interesting, although not very significant, simplification. Remember that the objective
is to find the expected score for the **m**-th turn:

Sum( For each (r+s+p = m) : P(r,s,p) * Score(r,s,p) )

Score(**r,s,p**) is equal to 3*(a probability) + 1*(another probability). The probabilities
are in the form:

P(Die i | (r,s,p)) = ( P(i,r,s,p) / P(r,s,p) ) / (n - r - s - p)

P(Rock | (r,s,p)) = Sum( For each die i : P(Die i | (r,s,p)) * P(Die i rolls Rock) )

Therefore, Score(**r,s,p**) is equal to a number divided by P(**r,s,p**). Since we only ever use
the product P(**r,s,p**) * Score(**r,s,p**), we can just cancel P(**r,s,p**) out and work with scaled quantities:

P'(Die i | (r,s,p)) = P(i,r,s,p) / (n - r - s - p)

P'(Rock | (r,s,p)) = Sum( For each die i : P'(Die i | (r,s,p)) * P(Die i rolls Rock) )
P'(Scissors | (r,s,p)) = Sum( For each die i : P'(Die i | (r,s,p)) * P(Die i rolls Scissors) )
P'(Paper | (r,s,p)) = Sum( For each die i : P'(Die i | (r,s,p)) * P(Die i rolls Paper) )

Score'(r,s,p) = max( 3*P'(Rock | (r,s,p)) + P'(Paper | (r,s,p)),
                     3*P'(Scissors | (r,s,p)) + P'(Rock | (r,s,p)),
                     3*P'(Paper | (r,s,p)) + P'(Scissors | (r,s,p)) )

## Calculating P(i,r,s,p)

We can use dynamic programming. Let
f(**t, i, r, s, p**) be the probability to get **r** rocks, **s** scissors and **p** papers in (**r+s+p**)
throws *without* picking die **i**, when considering only the first **t** dice.

- For **t** = 0, there are no dice: whenever **r+s+p** = 0 the probability is exactly 1.0, else it is 0.0. This is the base case.
- For **t** > 0, we consider die #(**t**-1). We are calculating the probability based on (**r+s+p**) picks and there are **t** dice available, so the probability that die (**t**-1) is picked in any of these throws is (**r+s+p**) / **t**. If the die is picked, there is a probability it rolled Rock, Scissors or Paper. For example, if it rolled Rock, we need f(**t**-1, **i**, **r**-1, **s**, **p**) to find the probability that the rest works out. If the die is not picked, we need f(**t**-1, **i**, **r**, **s**, **p**). An exceptional case is when **i** = **t**-1: in that case, we must ignore the case where the die is picked.

In total we will need O(1) calculations for each of the O(**n**^{5}) possible states
for the function. An issue is that naively implementing the O(**n**^{5}) states of
the dynamic programming needs around 2.5 GB of memory (TopCoder has a 64MB limit). We
still have work to do.

### In less memory

For the main solution, we do not need to remember all O(**n**^{5}) states of the
function; we only need those where **t** = **n** (all dice considered). Each
state for **t** depends only on states for **t**-1 with a smaller (**r+s+p**) or with identical (**r,s,p**). It
is therefore possible to implement the dynamic programming using only O(**n**^{4}) memory, updating the table for each **t** in place. See the
code for more details.

## Code

```cpp
#include <algorithm>
#include <cstring>
#include <vector>
using namespace std;

double P[50][51][51][51]; // P[x][r][s][p]
// probability to get r rocks, s scissors and p papers
// without throwing die #x.
// This array is barely behind the memory limit:
// 50*51*51*51*8 bytes ~= 50.60 MB

double bestScore(vector<int> rockProb, vector<int> paperProb, vector<int> scissorsProb)
{
    int n = rockProb.size();
    memset(P, 0, sizeof(P));
    for (int i=0; i<n; i++) {
        P[i][0][0][0] = 1.0; // Base case. The other cases for t=0 are 0.0
        // We are calculating f(t,i,r,s,p):
        for (int t=1; t<=n; t++) {
            // Initially, P[i][r][s][p] contains the results for (t-1).
            // As we update P[i][r][s][p] with results for (t), we
            // need to do it in an order that does not override a
            // result for (t-1) that we might still need later.
            for (int m=t; m>=0; m--) {
                // Each state depends on those with smaller m
                // or identical (r,s,p). So we update in decreasing order of m:
                for (int r=0; r <= m; r++) {
                    for (int s=0; s + r <= m; s++) {
                        int p = m - s - r;
                        // Probability die t-1 is picked:
                        double q = m / (double)t;
                        double qr = rockProb[t-1] / 300.0;
                        double qs = scissorsProb[t-1] / 300.0;
                        double qp = paperProb[t-1] / 300.0;
                        double sum = 0;
                        if (r > 0) {
                            // Since r-1+s+p < r+s+p, P[i][r-1][s][p] currently
                            // holds the result for f(t-1, i, r-1,s,p):
                            sum += qr * P[i][r-1][s][p];
                        }
                        if (s > 0) {
                            sum += qs * P[i][r][s-1][p];
                        }
                        if (p > 0) {
                            sum += qp * P[i][r][s][p-1];
                        }
                        // Note that P[i][r][s][p] currently holds f(t-1,i,r,s,p):
                        if (i == t - 1) {
                            // Die t-1 is the one to ignore. Ignore its pick:
                            P[i][r][s][p] = (1-q) * P[i][r][s][p];
                        } else {
                            P[i][r][s][p] = q*sum + (1-q) * P[i][r][s][p];
                        }
                    }
                }
            }
        }
    }
    double res = 0;
    for (int m = n-1; m >= 0; m--) {
        // How many points can you expect at the m-th throw?
        for (int r=0; r<=m; r++) {
            for (int s=0; s+r<=m; s++) {
                int p = m - s - r;
                // Score'(r,s,p):
                double prock = 0, pscissors = 0, ppaper = 0;
                for (int i=0; i<n; i++) {
                    // Probability to pick die i:
                    double cond = P[i][r][s][p] / (n - m);
                    // Add to the rock, scissors and paper probabilities:
                    prock     += cond * rockProb[i] / 300.0;
                    pscissors += cond * scissorsProb[i] / 300.0;
                    ppaper    += cond * paperProb[i] / 300.0;
                }
                // Pick the best of the three plays:
                double x[3] = { 3*prock + ppaper,
                                3*pscissors + prock,
                                3*ppaper + pscissors };
                res += *max_element(x, x+3);
            }
        }
    }
    return res;
}
```
