Tuesday, August 30, 2011

SRM 516 - So, I forgot I have a blog...

So, there we go, SRM 516 I was feeling under normal levels of energy today. Hence why I admitted in the chat that I was kind of foreseeing dropping below 2100. If I were to listen my mother I would know that when you feel down you shouldn't do something I would have skipped the match. On the other hand, that's what wusses do. You wouldn't have good SRMs if it wasn't for the bad ones. And so it begins...

Div1 500 - RowsOrdering
(problem statement)

Meh. You know, it was very difficult for me to understand this problem. I had a lot of difficulty parsing the problem so that the examples made sense. Brain was just not working correctly.

15 minutes after the beginning, still have no idea what the problem is about. I know you have to pick some permutations, but I cannot parse how exactly the permutations affect the problem. I open division summary, tons of solutions for 250. I open it sometime after, and 10 guys already have solutions for two problems. It seems that 500 is 'easy'. But I still have no idea what to do. Time passes and more and more people solve it. I know I have to do it.

I eventually, just deconstructed the examples, and took a conclusion about how it works. It turns into a greedy algorithm, a very easy greedy algorithm. I code it, but it is very late, only few minutes left, so if I want to finish the 250 I have to rush. And I have to finish the 250 because everyone seems to have solved two problems... I pass examples and submit. Rush to 250...

This is what I submitted: http://community.topcoder.com/stat?c=problem_solution&rm=309666&rd=14541&pm=11114&cr=22652965 . Bonus points if you find both of the stupid mistakes I made.

Div1 250 - NetworkXOneTimePad
(problem statement)

Out of the scores I know that this should be a very quick problem to solve. I also know that I have to get a good score because my score for the medium is low and I am not sure what I did there. So I open the problem.

Initially confused about int result and no indication of what to do if the result is large. - Of course! That must mean that the result will always fit 32 bits int. But why. I was initially thinking of some approach in which you pick each bit and multiply by 2 if it can be different. After examining the examples, that does not really work.

I eventually notice the obvious thing. Given two fixed plaintext and cipheredtext, the key can be found by a simple xor , (reversing the process). Because of this, and because each ciphered texts must forcefully come from one of the original plaintexts, then there is a potential key for every pair of plain and ciphered texts. We can just pick all those pairs, get the potential key and then try it out.

Some difficulties when implementing: After you have a potential key, you have to convert each plain text to a new ciphered texts. The input ciphered texts must be a subset of the ones you found. There is also the chance you find the same potential key multiple times, so you have to avoid repetition.

I named a function xor and I was getting very strange mistakes. It seems someone in charge of c++ thought it would be hilarious to do a define that replaces the word xor with ^ (xor operator). Fortunately, by then my brain was starting to work, else I don't think I would have managed to understand what was going on. I replaced the xor function with bor.

The result is some ugly, very ugly code: http://community.topcoder.com/stat?c=problem_solution&rd=14541&rm=309666&cr=22652965&pm=10846.

In fact, it over does things. First of all, the constraints forbid repetitions in the plaintext array. And since all of the cipheredtexts must come from then, we can just get the key for each pair ( plaintext[i], cipheredtext[0] ) (Fixing ciphered text 0). This guarantees that potential keys are never tried twice. Also, instead of coding the plaintexts, we can decode each cipheredtext , then check if the decoded result exists in the input.

The following is the refactored code, it is much better:

#define for_each(q, s) for(typeof(s.begin()) q=s.begin(); q!=s.end(); q++)
struct NetworkXOneTimePad
{
// xor between two strings...
string sxor(string a, string b) {
string s= "";
int n = a.size();
for (int i=0; i<n; i++) {
s += ( (a[i]==b[i]) ? '0' : '1' );
}
return s;
}
int crack(vector <string> plaintexts, vector <string> ciphertexts)
{
int cnt = 0;
// plaintexts as a set, so that we can find decoded ciphertexts
// on them.
set<string> plaintextset( plaintexts.begin(), plaintexts.end() );
for_each(p , plaintexts) {
// Find the key.
string key = sxor(*p, ciphertexts[0]);

// Each of the ciphertexts must become one of the plaintexts
// when decoded.
bool good = true;
for_each(q , ciphertexts) {
good &= ( plaintextset.count(sxor(*q, key)) );
}
if (good ){
cnt ++;
}
}
return cnt;
}
};


I was barely able to submit 250, so I have two very slow submissions, the worse was yet to come.

Aftermath
Of course, 500 got challenged. If it wasn't for the dumb mistakes, I could have gotten top 100 even with the slow submissions. It seems many people managed to make mistakes in both problems. Somehow I am still above 2100, 2 points above... Is the worst of this streak done yet?

Personal opinion warning: I did not really like the 500. For an easier than usual problem, the speed becomes very important and when most of the problem solving time comes from decoding the statement, that's not good.

Div2 250 - RowsOrdering, explained
(problem statement)
Now this was more interesting than your average div2 250. The trick is to note that the special property - That each sequence of even length has the same number of x and o - forces a specific pattern.

First, let us thing of the (second) simplest contiguous sequence of even length. There are two possibilities, xo and ox.

Now try a sequence of length four. First of all, it forcefully needs 2 x and 2 o. But something like xxoo is invalid, because the previous rule still forces all of the length-2 subsequences. So, in fact, we need xoxo or oxox. Note that in both of those cases, there is a length-2 sub-sequence in the middle (ox and xo) , the rule still applies.

In fact, the message will forcefully have to be in the pattern xoxox.... or oxoxoxo... . Because the rule for length-2 subsequences will always have to apply, and once it applies, it works for all even length subsequences.

The constraints guarantee exactly one result. So we should just check the two alternating sequences of the same size as the input. xox... and oxo... and pick the one that fits the pattern.

string reconstruct(string message)
{
//s variable decides whether to begin with o or x
for (int s=0; s<2; s++) {
string fixed = message;

// pick the first character
char ch = (s? 'o' : 'x' );

bool bad = false;
// build the alternating message, check against
// the input, if at one point one of the
// known characters in the message contradicts
// the one we are building, then we can't use that
// case.
for (int i=0; i<message.size(); i++) {
if ( message[i] == '?' ) {
fixed[i] = ch;
} else if ( message[i] != ch ) {
bad = true;
}

ch = ( (ch=='x') ? 'o' : 'x');
}
if (! bad ) {
return fixed;
}
}
//constraints guarantee this won't happen.
return "!";
}


Part two is coming soon...

5 comments :

Shuaib said...

Thanks for the explanations vexorian. If only I had noticed for div1-250/div2-500 that checking keys for a single ciphertext is enough, it would have stopped me from making the mistakes that I made trying to do optimizations.

Anyway, I notice that even if you don't do that, and actually go through all the ciphertexts generating keys and then checking if they are valid, produces an algo that is at worst case O(50^5), but such an implementation passed in practice room. I wonder why... whether there wasn't a test case that forced the worst case, or whether 50^5 is ok. Any thoughts?

Shuaib said...

Actually I figured it out. Thanks :)

vexorian said...

Ah sorry, I didn't notice it. 50^5 is not that big for TC servers assuming the constant factor is not very large.

Usually, my heuristic to know if something will pass in TC is to find 50^5 = 312500000 If the number has 10 digits, then it will most likely fail. But if it has 9 digits it may actually pass assuming there is not much overhead.

ibroker said...

Could you explain what the div1-500 problem is?
I can't understand what the problem is over 1 hour. T.T

vexorian said...

I tried in part 2.

I then tried again in the editorial.

But maybe misof's explanation is still the best: http://apps.topcoder.com/forums/?module=Thread&threadID=719177&start=0