Saturday, May 19, 2012

Topcoder SRM 543 - The non-fail shock

Here it is, a very late blog post about this. I would not put any confidence in my solutions before system tests, so I did not write this post during the challenge phase as usual.

Div1 250, the one with xor

You have (1 <= L <= R <= 4000000000), return the total xor between all numbers between L and R, inclusive.

So, you can assume that 4*109 is quite large and just simulating the xors is not a great idea.

During the match: Let us say you can have a function f(X,i) that counts the total number of times that bit position #i is turned on (equal to 1) in all numbers between 1 and X, inclusive. Then, for each bit position from 0 to 32, inclusive, then that bit will have a 1 in the result if and only if { f(R,i)-f(L-1,i) is odd }. You can test this assertion by just trying out how xor works between each position. We split the problem into two instances of f(X,i) to just have less variables to handle.

I thought f(X,i) was going to be easy to implement. And it sort of is. But It took me longer than I wanted to come up with it. It is becoming a habit that my 250 is usually surprisingly slow, even though I do not remember really slowing down while implementing it. It is almost as if I am initially calibrate to work slower and the speed increases during the game. I need to come up with a way to start in fast mode.

Anyway, let us say we have a number (in binary) 1101110?1011, the ? marks the bit at position i. Imagine that to the left of ? we placed any number strictly less than 1101110, for example: 110010?---- then ? must be 1 (we are counting those) and the ---- can be any number you want. Thus for each possible value for the left side bellow the current one, there are 2i ways to make the right side.

Imagine that the left side was exactly 1101110 then ? still has to be 1 (but if the i-th bit of X is 0, then it cannot be): 110111011---- , this time, you cannot put just any value in ----. They have to be less than or equal to the right side: 11102 = 14. Thus, if the i-th position is 1, then there are additional (right side + 1) different values)


That approach may have been a little too long to implement. There are various other approaches. For example, you can move the R-(L-1) stuff so that it is done in the main function. Let f(X) the function that calculates the xor between the numbers between 1 and X, inclusive. Then f(R) ^ f(L-1) is equal to the requested result (That is how xor works, it is its own inverse operation).

Then, to calculate f(X), you can do the same business picking each bit position between 0 and 31. Counting the number of times it appears between 1 and X as turned on. And if the number is odd, then the i-th bit position is 1 in the result.

The two approaches are mostly the same, but I like this second one a little better.

long getXor(long X) 
long res = 0;
for (long i=0; i<32; i++) {
long A = X >> (i + 1);
long B = X % (1 << i);
// X = A?B (B has i bits) ? is i-th bit.

// for each number smaller than A, there are
// (1<<i = 2 raised to the i-th power to have a valid
// number with i-th bit = 1:
long p = A * (1 << i);

// if i-th bit is on, then there are B+1 remaining valid numbers.
if ( X & (1LL << i) ) {
// additional B+1 values:
p += B + 1;
// if the number is odd, then this bit is 1
res |= ( (p & 1) << i);
return res;
long getXor(long L, long R)
return getXor(R) ^ getXor(L-1);

There are many other approaches. The most interesting one has to be the one by [b]xOberon[/b]. It is like this:

long long getXor(long long aL, long long aR) 
unsigned int L = aL, R = aR, r = aR;
while (L < R) {
r ^= L;
L ++;
return r;

How come it does not time out? Many were saying that it was unfair that a solution as easy as this one would pass. But honestly, xOberon knew what he was doing. If you were to try simple iteration in C++, it would time out unless you considered the key optimization: The maximum value of R is too large for signed 32 bits integers. But it is still small enough to fit an unsigned 32 bits integer. This means that by using unsigned integers, you can keep the operations in the 32 bits world. Keeping in mind that the topcoder servers are 32 bits, this means that xor operations will need only one instruction if done right. The next component is the g++ compiler optimizations. Topcoder uses +O2, I am sure this code gets optimized so that R is a single register and that allows R ^= L to be a single XOR assembly instruction. Allowing the code to run in 1.7seconds even though it needs 4*109 steps!

Div1 500, the one with traveling through islands

Ok, statement is too complex to abridge. Here is a link to the statement:

The first idea is to see it as a typical dynamic programming. Dynamic programming because it is obvious that you will always move right, up, or right AND up, but will never need to go back down or left, thus the sub-problems are acyclic. dp[x][y] returns the minimum time needed to reach top right, if you are currently standing in the x-th island and the y-th dock in that island. The base case is dp[w][length] = 0 (w = |width|). Because you are already in the objective place. The wanted result is dp[0][0]. For every pair (x,y), then dp[x][y] can either be:

  • 1/walk + dp[x][y+1]: Meaning that you move 1 unit up at walk speed and reach (x,y+1), from then you must still solve the rest of the problem, and the minimum time from that place is dp[x][y+1].
  • EuclideanDistance( (0,y), (width[x],y2) )/speed[x] + dp[x+1][y2]. For every y2 > y. This means that instead of moving up, you decide to cross the river in a diagonal (maybe straight line if y=y2) and reach dock #y2 at island (x+1). You move width[x] units horizontally and y2-y units vertically in a straight line, thus we need the Euclidean distance. But you move at velocity speed[x]. After reaching (x+1,y2), you still need to reach the final objective and the minimum cost is dp[x+1][y2].

If for every dp[x][y] we try all O(length) values for y2, we will likely fail, because length <= 100000 is too large for a O(length*length*w) solution.

I had two ideas initially. The first one was "Maybe binary search?" and the second one was "Maybe you can find the result for dp[x][y] based on the result of dp[x][y+1]?"

The ternary search idea was, for each dp[x][y], simply do the ternary search to find the best y2 that yields the minimum result. The question is whether the function : EuclideanDistance( (0,y), (width[x],y2) )/speed[x] + dp[x+1][y2] is ternary search-friendly for y2. To me, it was intuitive at first that it was. But here is a proof anyway: The key realization is that dp[x][a] will always be greater than dp[x][a+K] for any a, and positive K. In other words, dp[x][a] increases as a decreases. This makes sense because you always end up further than the top position. Thus dp[x+1][y2] will increase as y2 decreases. The EuclideanDistance() function, on the other hand, will decrease as y2 decreases, because the straight line will be closer. The sum between a strictly decreasing function and a strictly increasing function will have only one local minimum.

Assume that dp[x+1][a] makes a ternary-search-friendly curve, then EuclideanDistance( (0,y), (width[x],a) )/speed[x] + dp[x+1][a] will also be ternary-search-friendly; the Euclidean distance part is a single straight line equation if a is the dependent variable; adding a curve with only one local minimum to a straight line, will result in a curve with only one local minimum still. Then, just demonstrate that dp[w][a] makes a good curve (it does because it is a straight line too!)

So, ternary search gives a correct answer. I implemented ternary search and passed examples. But I always test the largest case I can imagine in Topcoder's servers before submitting the solution. Time out!. It seems that O(log(length)*length*w) is too slow!

That is when I panicked. The only other idea I had was that there was somehow a way to optimize the search for the answer for dp[x][y] based on what we found for dp[x][y+1]. But how?

I tried many ideas that were many wrong or did not optimize things too well.

I was about to give up, but around 4 minutes before the end of the coding phase I had an idea!.

Remember that the Euclid() is a straight line when we make a graphic for the values of y2? In fact, Euclid() depends only on the difference between y2 and y. This means that the Euclid used between y2 and y2 + M is the same as the one between y and y + M. Now, if for y2 we know the value of M, then we should not pick a smaller new M for y. And we cannot pick a M larger than M+1.

But why? It is difficult to explain without drawing it:

Well, the two top pictures are the values of y2 that we will try. Let us see why is it not necessary to try the bottom ones.

For the bottom versions. We will simplify notation by defining E(M) - the cost to move in a straight line between (x,y) and (x,y+M) and D(y) which is equal to dp[x+1][y]. So, the total cost to move from (x,y) to (x+1,y+M) will be: E(M)+D(y).

From the ternary search analysis we can see that D(y1) >= D(y2) for (y1 <= y2). But there is a more interesting thing to conclude: D(y - K - 1) - D(y - K) <= D(y) - D(y-1). The increase of cost between D(y) and D(y-1) is greater than or equal to the increase of cost between D(y - K) and D(y - K - 1) (for positive K). In order to verify this, simply notice that the more distance between y and the top position, the higher the straight lines we can use to cross rivers so the increase (The derivative) either is equal or smaller).

Given the optimal M for (y+1), then we have the following in-equation:

E(M) + D(y+1+M) <= E(M-1) + D(y+1+M-1)
E(M) + D(y+1+M) <= E(M-1) + D(y+M)
D(y+M)-D(y+1+M) <= D(y+M-1)-D(y+M)

If we add them together:

E(M) + D(y+1+M) + D(y+M)-D(y+1+M) <=  E(M-1) + D(y+M) + D(y+M-1)-D(y+M)
E(M)            + D(y+M)          <=  E(M-1) +          D(y+M-1)

Which means that picking (x,y) -> (x,y+M) is always at least as good as picking (x,y) -> (x,y+M-1). Invalidating the bottom left idea.

For the bottom right example, consider that E(M+1)-E(M) is smaller than E(M+2)-E(M+1). Once again, the derivative. Just imagine the straight lines, as the height increases, the differences of the lengths of the lines increases. Take the derivative of the Euclidean distance in case of doubt.

 E(M) + D(y+1+M) <= E(M+1) + D(y+1+M+1)
 E(M) + D(y+1+M) <= E(M+1) + D(y+M+2)

 E(M+1)-E(M)     <= E(M+2) - E(M+1)

 E(M+1)+D(y+1+M) <= E(M+2) + D(y+M+2)

Which means that picking (x,y)->(x,y+M+1) will always be at least as good as picking (x,y)->(x,y+M+2). Great, isn't it?

double dp[51][100001]; 
struct EllysRivers
vector<int> width, speed;

double calc(int x1, int y1, int x2, int y2) {
double dx = width[x1];
double dy = y2 - y1;
double d = sqrt(dx*dx + dy*dy);
return (d / speed[x1]) + dp[x2][y2];

double getMin(int length, int walk, vector <int> width, vector <int> speed)
this->width = width;
this->speed = speed;
int w = width.size();
// base case, for the last island ...
dp[w][length] = 0.0;
for (int i=length-1; i>=0; i--) {
dp[w][i] = ((double)1.0)/walk + dp[w][i+1];

for (int x=w-1; x>=0; x--) {
dp[x][length] = dp[x+1][length] + (double)(width[x]) / speed[x];
int M = 0;

for (int y=length-1; y>=0; y--) {
dp[x][y] = ((double)1.0)/walk + dp[x][y+1];
double z1 = calc(x,y, x+1, y + M);
double z2 = calc(x,y, x+1, y + M + 1);
if (z2 < z1) {
M = M + 1;
dp[x][y] = std::min(dp[x][y], std::min(z1,z2) );

return dp[0][0];

I will admit that I did not come even close to a formal proof during the match. Instead I just assumed things would work like this after imagining the straight lines rotating. I was desperate and gave it a try. I submitted with about a minute left before the end of the coding phase.

Challenge phase, outcome, etc

So, I knew two things. My 250 got low score and was more complicated than what other people did. My 500 stood on an idea that I was not incredibly sure about. I really did not feel optimistic.

I opened some 250s, while other people were already challenging the wrong ones. I found a 250 with: if ( L==R ) return 0; Which seemed like a bug to me, since when L=R, the result is L. I read the constraints and "noticed" that L was strictly smaller than R. So, this was not a corner case... Some time later, this submission got challenged. I re-read the constraints once again, and L=R was actually allowed in the constraints... what a waste of 50 points. Though it is strange, because L=R=5 was an example case, so this was a strange mistake to find.

Got 1992 rating. It is always good to re-bounce after a bad match with an even better score than before.


vexorian said...

I actually had quite a mistake in the explanation of 500. The updated paragraph is this:

{For the bottom right example, consider that E(M+1)-E(M) is smaller than E(M+2)-E(M+1). Once again, the derivative. Just imagine the straight lines, as the height increases, the differences of the lengths of the lines increases. Take the derivative of the Euclidean distance in case of doubt.}

It used to say basically the opposite :)

max_ashi said...

I love your explanations.. So clear to understand.. and every time i learn something new about the language as well.. Keep writing the editorials..

Connect4 said...

Another alternative 250 approach: if x is even, then x^(x+1) = 1. Further, if. Is a multiple of 4, then the range from x to x+3 xors to 0. So, removing cases where r-l is small, you have to XOR a maximum of 8 numbers.

Either way, thanks as was said above. These are fantasticd editorials

Boris Dibrov said...

Hi. Thank you for 500 explanation. It was fun to observe there was no same solutions for xor problem. I can explain O(1) for 250. Here it comes...

Consider xor result for sequence [0..N]. While the calculation the last bit would have a period of 4: 011001100110... and reminding bits would have period of 2. That's why we can treat them differently and finally merge to one number.

struct EllysXors {
long long solve(long long n) {
// for last bit calculation we must xor last and last but one bits
long long lastBit = ((n / 2) ^ n) % 2;
// the reminding bits comes to 0 for every odd number in sequence
long long residue = n % 2 == 0 ? n : 0;
// replace last bit in residue by lastBit
return lastBit | (residue & ~1);
long long getXor(long long L, long long R) {
return solve(R) ^ solve(L - 1);