Tuesday, December 04, 2012

Topcoder SRM 562 editorial preview (div1 1000)

I do not seem to finish this editorial. I am already one day after the deadline (first problems were difficult and explanations turned out be image intensive) and I still have to finish div1 250 and div2 500 (will need more images, that takes time). So here is a draft of the div1 1000 editorial. At least that should help anyone who needs it. Note that this is still a draft, and thus I guarantee you that it has far more mistakes than usual (and that is saying a lot).


Let us quickly quickly explain the problem statement: We are given a tree. Count the number of ways to assign labels 1,2,...n to each node in the tree such that the sets of vertices {1,2,..k,}, {2,3,...,k+1}, ... {n-k+1,...,n} are each connected components. This editorial will call these sets 'target' sets.

Small k versus large k

A good starting point is to notice that the problem can change dramatically depending on how large the value of k is. The main difference between small and large values of k is that when k is large enough, there will exist a group of labels that are common to every target set. For example, with n=10 an k = 7:

1 2 3 4 5 6 7
  2 3 4 5 6 7 8
    3 4 5 6 7 8 9
      4 5 6 7 8 9 10

The labels {4,5,6,7} exist in each of the target sets. Which does not happen with small values of k, for example n=6 and k=3:

1 2 3
  2 3 4
    3 4 5
      4 5 6

There is no vertex that is present in all of the sets. The requirement for us to consider a value of k large is that the set of the k smallest values intersects with the set of the k largest values. In other words: 2*k > n. This small difference makes a case much harder than the other. Let us begin with the small case, which is the simplest.

2*k <= n

In order to understand this case, let us think about what is needed to transition from the first target set to the next one making sure they stay connected. Imagine that we already found out the connected component of the first target set, the nodes with the k smallest labels. Since we are dealing with a tree and its subtrees, these labels will form a tree. Like in the following two examples for k=3:

The tree can so far have any shape, but with k connected elements. It is interesting what happens when we transition to the next target set. From {1,2,...k} to {2,...,k+1}. What it means is that the new tree will have {1} removed and {k+1} added, but it still must be connected. This has a consequence, in two of the examples above, there are only two super tree of 4 elements that allows the condition. The third example is simply not correct, if we add a {4} and remove {1} the tree will not remain connected.

Try making more transitions. Also consider that at the opposite part of the tree, the labels with larger values, the structure works the same when moving on from the largest k labels and removing {n} and adding {n-k}... At the end you will find that the tree must follow a specific structure. For example, when k=4 and n=10:

  • In the red square, we have a formed by the k smallest labels. They must follow a proper hierarchy structure. In every branch, the smallest label must be a label and the parent of each label must always be greater, up until k (4 in this case) which must be the root of the subtree. The reason for the hierarchy is that when transitioning from each target set that contains two of the smallest labels, the smallest label has to be removed, thus this label cannot act as a bridge between any other two larger labels.
  • Inside the green square, a chain of all the nodes from k to n-k+1. This must be a chain, because of how transitions work. When transitioning from {3,4,5,6} to {4,5,6,7}, 3 must be connected to 4. When transitioning from {4,5,6,7} to {5,6,7,8} then 4 must be connected to 5 and so and so...
  • Inside the blue square, also a subtree. This time it is the k largest labels. This subtree must also have a proper hierarchy, this time each parent must be smaller than each of its children.

The solution for this small case is already accessible. Given the tree, we can pick the starting and ending node of the chain. We can tell the chain must contain n - 2*(k-1) elements. Thus the distance between the two picked points must be n - 2*(k-1) - 1. These end points would get labels k and n-+1, respectively. Then we also need to verify that the subtrees rooted at each of the end points contain exactly k elements. If all checks up, the result is equal to the number of ways to assign labels to the first sub-tree multiplied by the number of ways to assign labels to the second sub-tree. This number of ways can be calculated using a dynamic programming approach.

How to assign labels to a sub-tree following a hierarchy

Let us quickly sum up the dynamic programming approach. We are given a subtree rooted at a given vertex. We also need to know its parent so that we understand what direction not to go when calculating the children (For different end points, the parents of the subtree roots change).

The base case is when the subtree contains only the root. Then we can only assign one label to the root (Either the lowest or the highest, depending on which of the subtrees we want to count, but for the sub-problem it does not matter).

The remaining case is when the root has children. We can assume that we already know the ways to assign labels to each children following the hierarchy. For example, when there are two children subtrees one with n1 nodes in total and the other with n2 nodes in total, we can assume that currently the labels that were assigned to each subtree are in the form {1,2,...,n1} and {1,2,....,n2}. We want to assign labels {1,2,...,n1+n2} to the two subtrees at the same time. This is equal to, out of a total n1+n2 labels, picking n1 labels to go to the first subtree. After picking which labels go to each subtree, we replace the original labels {1,2,...n1} and {1,2,...,n2} using the same order to make sure the hierarchies are not lost, so there is only one possible order for the labels. Then we put a label to the root, it must be larger than all the other labels, so there is only one way to assign a label to the root. This means that in total there will be: That is C(n1+n2,n1) * ways(first subtree) * ways(second subtree) ways to assign labels to the larger subtree still following the hierarchy. This approach can be adapted to when there are more than 2 children: First combine the first two subtrees into one hierarchy, then combine this new hierarchy with the third subtree and so and so.

2*k > n

The harder case is harder because of the labels that are common to each of the target sets. Each of the target sets must be a connected component. It is possible to show that this common intersection must also be a connected component. This intersection will have the labels that are not the largest and not the smallest. We will calls this group of labels 'middle'.

Once again we have to see what happens when we transition from a target subset to the next one. This time considering that the middle labels are always connected and remain so. Let us go back to the case with n=10, k=7.

  • The first target set is {1,2,3,4,5,6,7}. {4,5,6,7} must form a connected component. Thus we only need to add {1}, {2}, {3} to whatever structure the connected component has. We can add {1,2,3} as a single subtree. Or in separate partitions. We can connect these subtrees to any of {4,5,6,7} and the structure {1,2,3,4,5,6,7} will be connected.
  • But then we transition from {1,2,3,4,5,6,7} to {2,3,4,5,6,7,8} note that we deleted the smallest label, and added a new one. We once again have a process in which the smallest label is removed when performing transitions. In short, and reusing the knowledge from the previous analysis, all the subtrees of {1,2,3} that we added to {4,5,6,7} must follow a hierarchy.
  • The same will happen for {8,9,10}, all of their subtrees must follow the hierarchy.
  • {1,2,3,4,5,6,7} must be connected without the help of any of the vertices in {8,9,10}, this means that the subtrees of {1,2,3} will not have vertices from {8,9,10}. Similarly, the subtrees of {8,9,10} cannot have vertices from {1,2,3}.
  • Note that {4,5,6,7} can have any shape as long as it is a connected tree. It is not necessarily a chain. And the various subtrees do not necessarily have to connect to 4 or 7.

The following is a possible labelling for a large graph (n=28, k=19):

The white nodes would get middle labels. Let us call the remaining labels small or big depending on whether they are smaller than the middle labels or larger. The red nodes get small labels and the blue nodes get big labels. Thus any label assignment in which the white nodes get middle labels, the red ones get small labels (but in a correct hierarchy) and the blue ones get large labels (also in a correct hierarchy) will work. What is important is that the middle labels can make any shape as long as they are connected and once you assign a big label to a node, all of its children must have big labels too.

One thing is to know how the graph should look after assigning the labels, another is to actually count the ways to do it.

How to count them

It is a complicated process with many decisions. Which group of nodes will receive middle labels? Then which of the children will be small and which big? Then we also need to assign the labels to the big and small subtrees such that they keep a certain hierarchy.

The first simplification comes from just taking each middle node individually. For each vertex, count the number of valid labellings in which that vertex receives a middle label. If we sum these results together the total will contain each of the valid results repeated by the number of middle labels. We can just divide the sum by this number and we get the overall result.

When calculating the number of ways to assign the labels starting at a middle vertex, we can ignore the actual labels we place on middle vertices and then multiply by the factorial of the (number of middle labels).

What we have left is the following recursive logic: Let f(p,x, a,b) be the number of ways to assign a small labels and b big labels to the sub-tree with root x (and the parent of the root is p):

  • In a base case, we have a = (total number of nodes in the sub-tree) and b = 0. This means that the node and all its children must be given a small label. We also know the number of ways to label this following a hierarchy, it is just the same sub-problem we solved for the easier case.
  • Likewise, when a = 0 and b = (total number of nodes) then we have to make the complete sub-tree have a big label.
  • Else there are plenty of options. We can know for sure that the root has to belong to the middle. Some of the children may also belong to the middle. The rest should be big or small. In order to explain the logic, let us say the root has two children y and z, and we already know the results f(x,y, ?,?) and f(x,z, ?,?) for all pairs of combinations of big and small. Then the decision involves deciding how many of the big (b) and small (a) labels to assign to each subtree and combine their hierarchies afterwards (Using the same logic as above, e.g: C(total number of big labels, big labels used in the first child).). We can extent this logic to work with more than two children.


It has been a long journey, but we have finally reached the solution:

http://pastebin.com/4NmjuNnu (sorry, blogger is once again failing to allow me to paste my code.).

No comments :