Monday, December 19, 2011

The minifig problem


Did I ever mention I am an AFOL, which is french for Adult Fan of LEGO?. Well, I am. You know, I have a problem with LEGO and it is that just in 2010 they have begun selling crack inside of blind bags: Collectible mini-figures. As you can expect from LEGO, the minifigure bags are not inexpensive, they are also not evenly distributed. They come in boxes of 60 bags that each contains a specific distribution of minifigure bags. The random distribution in series 6 (Yes, already 96 different figures have been released in the last two years) is in the picture.

So, a common question is what is the expected number of random bags I would need to buy before I get a complete set. And since some minifigs are worth having many of while other are not really worth having, another more common question is what is the expected number of bags you need to get , say 5 romans.

S, how to solve it? Let us first begin with a slightly simpler problem. In some forums some people were wondering what is the probability to get a complete set after you buy a given number of bags. Like 60 or 80. This is slightly easier to understand and helps explaining the expected one.

Please note that the queries can be quite varied, 10 romans, a complete set. 2 aliens and 2 space girls. So we will need to represent it in a special way. I chose to represent the input as an array of N elements. Where N is the number of different figures. To each figure, I assigned a id and a weight (according to the random distribution). Thus the array of N elements, will contain in its i-th position the number of minifigs of kind i you want to get.

Just about 50% of the regular readers of the blog already know the answer they would use to this during a contest - dynamic programming. But let me explain it. For dynamic programming to work, we need a recurrence relation. Let us say we have a function F(wanted, bags). wanted is the array of values that I described, bags is the number of bags we will buy. F returns the probability in that setting.

The recurrence works as follows. First we will work the easiest cases, imagine if wanted had a value of 0 for each minifig kind. Then we want nothing... No matter how many bags we get, we already have "nothing". Thus the probability is 1. The second-easiest case is when we will buy 0 bags, then if we do want one figure, the probability is 0, we cannot get any figure by buying 0 bags.

In other case, we will get a new figure i. This event will have a probability p. The number of bags, will be reduced by 1. If minifigure i had a non-zero amount in wanted, then that means it is one of those figures we wanted, but we can then think that the number of minifigures of that kind that we need will be reduced by one. We can thus make a new vector wanted' that will be identical to wanted but with one less of that minifigure. This is the part where the recurrence comes into play. We can call F(wanted', bags-1) and we will know the probability to reach the object given that we got minifig i in this current step. In the case the minifig was not wanted, we do not modify the vector, thus wanted' = wanted. If we multiply p * F(wanted', bags-1), we get the probability to get that minifigure in this step and complete the wanted set. The total sum p_i * F(wanted'_i, bags-1) for all is will be the solution to the recurrence.

The problem with this is the high number of states. We only have 16 kinds of minifigs which allows us to use such a slow algorithm. But still, note that the number of states of the function is equal to (number of different ways to have the wanted vector) x (number of bags + 1). If the initial wanted vector is (1,1, ... 1) (A complete set), the number of ways is 2N - there are two choices 0,1 for each minfigure. So, if we made (10,10,...) the first vector, the result would be 11N, thus we need to be careful here.

The expected
A slightly faster way to deal with the problem is asking for the expected number of ways you need to buy before getting a certain set. We will use exactly the same wanted vector. But this time the recurrence is different and there is no bags argument. A base case is when wanted is full of zeros, again we have completed the set, but this time the result G(wanted) is the expected number of bags we need to reach the goal. If the goal has already been reached, the expected value is 0.

The recurrence is as follows: Once again, we will have a minifig i and a probability p to get it. Even more, once again the wanted vector might get modified or not. This time G(wanted') will yield the expected number of bags after the wanted vector was modified. Note that since we already bought a bag, the returned expected number shall be increased by one. Thus we will have the sum: ( p_i * (1 + G(wanted_i)) ) for each minifig i. This can be converted to 1 + (sum of p_i*G(wanted_i) ).

However, this time there is an extra complication. Remember there is a possibility wanted_i is equal to the original wanted. Always remember: the key rule of dynamic programming is that the recurrence should not have cycles. We do not want cycles in the recurrence. The solution is to think about equations. At the end, we will have a value sump as a sum of all the parts of the recurrence that are not factors of G(wanted) and a px equal to the probability that the vector stays the same. We have:

G(wanted) = sump + px * G(wanted).

Which can be converted to:

G(wanted) = sump / (1 - px)

But this recurrence will no longer contain cycles and thus it solves the problem.

Implementing This time I decided to use python, I would probably need some flexibility in the input of the problem to try different combinations of wanted minifigures. It was a little hard getting used to dict() and its shenanigans like not admitting a list as key. Oh well.

# Minifigs series 6 distribution:
FIGURES = [ ('surgeon',3), ('sleepy', 3), ('butcher', 3), ('genie', 3),
('robot', 3), ('roman', 3), ('ladyliberty',3),
('flamenco',4), ('mechanic',4), ('spacegirl',4), ('skater', 4),
('alien',4), ('leprechaum',4),
('minotaur',5), ('highlander',5), ('bandit',5) ]

#.............


#finds Expected value using dynamic programming on a very abstract argument...
mem = dict()
N = len(FIGURES)
TOTAL = 0.0
for x in FIGURES:
TOTAL += x[1]


def probabilityToGet( wanted, bags ):
key = (tuple(wanted), bags)
res = mem.get(key, None)
if ( res == None) :
res = 0.0
if (wanted == [0]*N):
#if we do not want any more bags, the probability to get them is 1
res = 1.0
elif (bags == 0) :
#if we won't buy more bags, and we still have figs left to get,
#it is impossible
res = 0.0
else:
res = 0.0
for i in range(0,N):
p = FIGURES[i][1] / TOTAL
if (wanted[i] > 0) :
#make a copy of the wanted vector, with less of this figure
vec = [x for x in wanted ]
vec[i] -= 1
res += p * probabilityToGet(vec, bags-1)
else:
#Nothing changes, just reduce the bags
res += p * probabilityToGet(wanted, bags-1)

mem[key] = res

return res

def expectedBags( wanted ):
key = tuple(wanted)
res = mem.get(key, None)
if ( res == None) :
res = 0.0
if (wanted == [0]*N):
res = 0.0
else:
px = 0
sump = 1.0
for i in range(0,N):
p = FIGURES[i][1] / TOTAL
if (wanted[i] > 0) :
#make a copy of the wanted vector, with less of this figure
vec = [x for x in wanted ]
vec[i] -= 1
sump += p * expectedBags(vec)
else:
#Nothing changes, add to px
px += p
res = sump / (1 - px)
mem[key] = res

return res

# Convert a list of wanted figures as used bellow into a vector for our functions
def formatWanted(argwant):
wanted0 = [0]*N
for w in argwant:
for i in range(0,N):
if (FIGURES[i][0] == w):
wanted0[i] += 1
if ( (len(w)==2) and (FIGURES[i][0] == w[0]) ):
wanted0[i] += w[1]

return wanted0

#--------------------------------
print "Expected number of bags before we complete a set:"
WANTED_FIGURES = [ 'surgeon', 'sleepy', 'butcher', 'genie', 'robot', 'roman',
'ladyliberty', 'flamenco', 'mechanic', 'spacegirl', 'skater',
'alien', 'leprechaum',
'minotaur', 'highlander', 'bandit' ]
wanted0 = formatWanted(WANTED_FIGURES)
print expectedBags(wanted0)

#----------------------------------
print "Expected number of bags before we get 10 Roman soldiers"
WANTED_FIGURES = [ ('roman',10) ]
wanted0 = formatWanted(WANTED_FIGURES)
print expectedBags(wanted0)

#----------------------------------
print "Expected number of bags before we get 5 aliens, 1 space girl and 3 robots:"
WANTED_FIGURES = [ ('alien',5), ('robot',3), ('spacegirl',3) ]
wanted0 = formatWanted(WANTED_FIGURES)
print expectedBags(wanted0)

#----------------------------------
print "Probability to get a complete set after buying only 16 bags"
WANTED_FIGURES = [ 'surgeon', 'sleepy', 'butcher', 'genie', 'robot', 'roman',
'ladyliberty', 'flamenco', 'mechanic', 'spacegirl', 'skater',
'alien', 'leprechaum',
'minotaur', 'highlander', 'bandit' ]
NUMBER_OF_BAGS = 16
wanted0 = formatWanted(WANTED_FIGURES)
print probabilityToGet(wanted0, NUMBER_OF_BAGS)

#----------------------------------
print "Probability to get a complete set after buying only 60 bags (wait for more than a minute)"
WANTED_FIGURES = [ 'surgeon', 'sleepy', 'butcher', 'genie', 'robot', 'roman',
'ladyliberty', 'flamenco', 'mechanic', 'spacegirl', 'skater',
'alien', 'leprechaum',
'minotaur', 'highlander', 'bandit' ]
NUMBER_OF_BAGS = 60
wanted0 = formatWanted(WANTED_FIGURES)
print probabilityToGet(wanted0, NUMBER_OF_BAGS)



No comments :