Did I ever mention I am an AFOL, which is french for Adult Fan of LEGO?. Well, I am. You know, I have a problem with LEGO and it is that just in 2010 they have begun selling crack inside of blind bags: Collectible mini-figures. As you can expect from LEGO, the minifigure bags are not inexpensive, they are also not evenly distributed. They come in boxes of 60 bags that each contains a specific distribution of minifigure bags. The random distribution in series 6 (Yes, already 96 different figures have been released in the last two years) is in the picture.

So, a common question is what is the expected number of random bags I would need to buy before I get a complete set. And since some minifigs are worth having many of while other are not really worth having, another more common question is what is the expected number of bags you need to get , say 5 romans.

S, how to solve it? Let us first begin with a slightly simpler problem. In some forums some people were wondering what is the

*probability*to get a complete set after you buy a given number of bags. Like 60 or 80. This is slightly easier to understand and helps explaining the expected one.

Please note that the queries can be quite varied, 10 romans, a complete set. 2 aliens and 2 space girls. So we will need to represent it in a special way. I chose to represent the input as an array of

**N**elements. Where N is the number of different figures. To each figure, I assigned a id and a weight (according to the random distribution). Thus the array of N elements, will contain in its

**i**-th position the number of minifigs of kind

**i**you want to get.

Just about 50% of the regular readers of the blog already know the answer they would use to this during a contest - dynamic programming. But let me explain it. For dynamic programming to work, we need a recurrence relation. Let us say we have a function F(wanted, bags). wanted is the array of values that I described, bags is the number of bags we will buy. F returns the probability in that setting.

The recurrence works as follows. First we will work the easiest cases, imagine if wanted had a value of 0 for each minifig kind. Then we want nothing... No matter how many bags we get, we already have "nothing". Thus the probability is 1. The second-easiest case is when we will buy 0 bags, then if we do want one figure, the probability is 0, we cannot get any figure by buying 0 bags.

In other case, we will get a new figure

**i**. This event will have a probability

**p**. The number of bags, will be reduced by 1. If minifigure

*i*had a non-zero amount in wanted, then that means it is one of those figures we wanted, but we can then think that the number of minifigures of that kind that we need will be reduced by one. We can thus make a new vector

**wanted'**that will be identical to

**wanted**but with one less of that minifigure. This is the part where the recurrence comes into play. We can call F(

**wanted'**,

**bags**-1) and we will know the probability to reach the object

*given*that we got minifig

*i*in this current step. In the case the minifig was not wanted, we do not modify the vector, thus

**wanted'**=

**wanted**. If we multiply p * F(

**wanted'**,

**bags**-1), we get the probability to get that minifigure in this step and complete the wanted set. The total sum p_i * F(

**wanted'_i**,

**bags**-1) for all

*i*s will be the solution to the recurrence.

The problem with this is the high number of states. We only have 16 kinds of minifigs which allows us to use such a slow algorithm. But still, note that the number of states of the function is equal to (number of different ways to have the wanted vector) x (number of bags + 1). If the initial wanted vector is (1,1, ... 1) (A complete set), the number of ways is 2

^{N}- there are two choices 0,1 for each minfigure. So, if we made (10,10,...) the first vector, the result would be 11

^{N}, thus we need to be careful here.

**The expected**

A slightly faster way to deal with the problem is asking for the expected number of ways you need to buy before getting a certain set. We will use exactly the same

*wanted*vector. But this time the recurrence is different and there is no

*bags*argument. A base case is when

*wanted*is full of zeros, again we have completed the set, but this time the result G(wanted) is the expected number of bags we need to reach the goal. If the goal has already been reached, the expected value is 0.

The recurrence is as follows: Once again, we will have a minifig

*i*and a probability

**p**to get it. Even more, once again the

**wanted**vector might get modified or not. This time G(

**wanted'**) will yield the expected number of bags after the wanted vector was modified. Note that since we already bought a bag, the returned expected number shall be increased by one. Thus we will have the sum: (

**p_i*** (1 + G(

**wanted_i**)) ) for each minifig

**i**. This can be converted to 1 + (sum of

**p_i***G(

**wanted_i**) ).

However, this time there is an extra complication. Remember there is a possibility

**wanted_i**is equal to the original

**wanted**. Always remember: the key rule of dynamic programming is that the recurrence should not have cycles. We do not want cycles in the recurrence. The solution is to think about equations. At the end, we will have a value

**sump**as a sum of all the parts of the recurrence that are not factors of G(

**wanted**) and a

**px**equal to the probability that the vector stays the same. We have:

G(wanted) = sump + px * G(wanted).

Which can be converted to:

G(wanted) = sump / (1 - px)

But this recurrence will no longer contain cycles and thus it solves the problem.

**Implementing**This time I decided to use python, I would probably need some flexibility in the input of the problem to try different combinations of wanted minifigures. It was a little hard getting used to dict() and its shenanigans like not admitting a list as key. Oh well.

#Minifigsseries6distribution:

FIGURES=[('surgeon',3),('sleepy', 3),('butcher', 3),('genie', 3),

('robot', 3),('roman', 3),('ladyliberty',3),

('flamenco',4),('mechanic',4),('spacegirl',4),('skater', 4),

('alien',4),('leprechaum',4),

('minotaur',5),('highlander',5),('bandit',5)]#.............#findsExpectedvalueusingdynamicprogrammingonaveryabstractargument...

mem=dict()

N=len(FIGURES)

TOTAL=0.0forxinFIGURES:

TOTAL+=x[1]defprobabilityToGet(wanted, bags):

key=(tuple(wanted), bags)

res=mem.get(key,None)

if(res==None):

res=0.0

if(wanted==[0]*N):

#ifwedonotwantanymorebags,theprobabilitytogetthemis1

res=1.0

elif(bags==0):

#ifwewon'tbuymorebags,andwestillhavefigslefttoget,

#itisimpossible

res=0.0

else:

res=0.0

foriinrange(0,N):

p=FIGURES[i][1]/TOTAL

if(wanted[i]>0):

#makeacopyofthewantedvector,withlessofthisfigure

vec=[xforxinwanted ]

vec[i]-=1

res+=p*probabilityToGet(vec, bags-1)

else:

#Nothingchanges,justreducethebags

res+=p*probabilityToGet(wanted, bags-1)

mem[key]=res

returnresdefexpectedBags(wanted):

key=tuple(wanted)

res=mem.get(key,None)

if(res==None):

res=0.0

if(wanted==[0]*N):

res=0.0

else:

px=0

sump=1.0

foriinrange(0,N):

p=FIGURES[i][1]/TOTAL

if(wanted[i]>0):

#makeacopyofthewantedvector,withlessofthisfigure

vec=[xforxinwanted ]

vec[i]-=1

sump+=p*expectedBags(vec)

else:

#Nothingchanges,addtopx

px+=p

res=sump/(1-px)

mem[key]=res

returnres#ConvertalistofwantedfiguresasusedbellowintoavectorforourfunctionsdefformatWanted(argwant):

wanted0=[0]*N

forwinargwant:

foriinrange(0,N):

if(FIGURES[i][0]==w):

wanted0[i]+=1

if((len(w)==2)and(FIGURES[i][0]==w[0])):

wanted0[i]+=w[1]

returnwanted0#--------------------------------"Expectednumberofbagsbeforewecompleteaset:"

WANTED_FIGURES=['surgeon','sleepy','butcher','genie','robot','roman',

'ladyliberty','flamenco','mechanic','spacegirl','skater',

'alien','leprechaum',

'minotaur','highlander','bandit']

wanted0=formatWanted(WANTED_FIGURES)(wanted0)#----------------------------------"Expectednumberofbagsbeforeweget10Romansoldiers"

WANTED_FIGURES=[('roman',10)]

wanted0=formatWanted(WANTED_FIGURES)(wanted0)#----------------------------------"Expectednumberofbagsbeforeweget5aliens,1spacegirland3robots:"

WANTED_FIGURES=[('alien',5),('robot',3),('spacegirl',3)]

wanted0=formatWanted(WANTED_FIGURES)(wanted0)#----------------------------------"Probabilitytogetacompletesetafterbuyingonly16bags"

WANTED_FIGURES=['surgeon','sleepy','butcher','genie','robot','roman',

'ladyliberty','flamenco','mechanic','spacegirl','skater',

'alien','leprechaum',

'minotaur','highlander','bandit']

NUMBER_OF_BAGS=16

wanted0=formatWanted(WANTED_FIGURES)(wanted0, NUMBER_OF_BAGS)#----------------------------------"Probabilitytogetacompletesetafterbuyingonly60bags(waitformorethanaminute)"

WANTED_FIGURES=['surgeon','sleepy','butcher','genie','robot','roman',

'ladyliberty','flamenco','mechanic','spacegirl','skater',

'alien','leprechaum',

'minotaur','highlander','bandit']

NUMBER_OF_BAGS=60

wanted0=formatWanted(WANTED_FIGURES)(wanted0, NUMBER_OF_BAGS)

## No comments :

Post a Comment