Wednesday, February 16, 2011

If all you have is a hammer...

I have not been commenting on SRMs as usual, but I did write two TopCoder editorials since the last time I posted something. Every time I write those editorials, there is something that really, really annoys me: the TC wiki has no equivalent of the forums' handle tag ([h]vexorian[/h]), so you have to write the link by hand, something like <a href='http://www.topcoder.com/tc?module=MemberProfile&cr=22652965' class='coderTextYellow'>vexorian</a>. Looking up handle codes and colors is tedious, and you have to do it for every handle you mention in the whole editorial. It is very annoying.

It is very mechanical work, and ever since the first time I have been fantasizing about coding something that would do it for me. After Sunday's editorial, I finally decided to do it. I used Python, as an excuse to keep learning it, but only Python 2 because I did not have Python 3 installed on my computer. The script translates every [h]name[/h] instance into TC wiki-friendly HTML. It downloads profile pages over the web because I am not sure whether there are data feeds for just handle names rather than for whole competitions. Maybe there are... Beware that it is very noobish Python: I spent most of the coding time googling Python's documentation to learn how to download stuff and use regexes.
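To make the translation step concrete, here is a minimal sketch of just the [h]...[/h] rewrite, written in Python 3 syntax and without any network access. It is not the script from this post: `FAKE_PROFILES`, `link_for` and `translate` are hypothetical names, and the lookup table stands in for the profile download. The only entry in it is the id and color from my own link above.

```python
import re

# Hypothetical offline lookup: handle -> (profile id, CSS class).
# The real script finds these values by downloading the profile page.
FAKE_PROFILES = {
    "vexorian": (22652965, "coderTextYellow"),
}

# Non-greedy, so two tags on the same line are matched separately.
HTAG = re.compile(r"\[h\](.+?)\[/h\]")

PROFILE_URL = "http://www.topcoder.com/tc?module=MemberProfile&cr=%d"


def link_for(match):
    """Turn one [h]handle[/h] match into an anchor tag, or plain text if unknown."""
    handle = match.group(1)
    if handle not in FAKE_PROFILES:
        return handle
    cr, css_class = FAKE_PROFILES[handle]
    return '<a href="%s" class="%s">%s</a>' % (PROFILE_URL % cr, css_class, handle)


def translate(text):
    """Replace every handle tag in text; returns (new_text, tag_count)."""
    return HTAG.subn(link_for, text)
```

Calling `translate("by [h]vexorian[/h]")` returns the text with the tag replaced by an anchor, plus the number of tags it found. Unknown handles are left as plain text, which is also what the script does when a profile is not found.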

For some reason, Blogger does not allow something as simple as sharing files that are not images or video, so here is the raw Python code.

Update: Oh well, it seems blogspot fails too much for code and does not even bother to add a scroll bar when it is too wide.

New update: Hey! The power of rockCSS has saved me once again...

Update: The new version has an interactive mode. Just call the script without arguments and it will let you type handle names, generating an anchor tag for each handle you type.


Update: Fixed the John Dethridge bug, and the script now works with the new profile pages.

# Topcoder handle tag script for HTML files you make manually. 
# it is probably a bad idea to use it in anything else because it abuses TC's
# server to find the information.
#
# err let us say it is released under the zlib/libpng license
# http://www.opensource.org/licenses/zlib-license
# (c) Victor Hugo Soliz Kuncar, 2011
#
# Update August 2012,
# Fix John Dethridge bug and make it compatible with new profile pages
#
import urllib, re, sys, os, time

#
# RatingName feature removed because it is no longer trivial to do it.
#

# Rudimentary proxy support
# For example:
# PROXIES = {'http': 'http://7.7.7.7:3128'}
PROXIES = None

# Style picker based on rating. Admins get -1 rating in this script.
RATING_CLASSES = [ (-1,-1,'coderTextOrange'), (0,0,'coderTextBlack'), (1,899,'coderTextGray'), (900,1199,'coderTextGreen'), (1200,1499,'coderTextBlue'), (1500,2199,'coderTextYellow'), (2200,1e100,'coderTextRed') ]

REGEX_RATING = re.compile(r'<span class="coderText[a-zA-Z]+">\s*(\d+)\s*</span>')

REGEX_CR = re.compile(r'cr=(\d+)')
REGEX_ADMIN = re.compile(r'<a\s+href="[^"]*"\s+class="coderTextOrange">\s*([^\s<]+)\s*</a>')

# John Dethridge is the only topcoder member that gets to have a space character in his name. Nice!
# \S+? is non-greedy so that adjacent tags like [h]a[/h],[h]b[/h] are not merged into one match.
REGEX_HTAG = re.compile(r'\[h\]((\S+?)|(John Dethridge))\[/h\]')


CACHE = dict()

QUERY_COUNT = 0

def getTCDataNoCache(handle):
    # Returns (cr, rating). cr is 0 when the profile was not found;
    # rating is -1 for admins and 0 for unrated or unparsed profiles.
    #
    # Wait 0.5 seconds between queries, and 2 seconds every 10 queries
    # we are not DDoSing TC...
    global QUERY_COUNT
    if QUERY_COUNT == 10:
        QUERY_COUNT = 0
        time.sleep(2.0)
    else:
        time.sleep(0.5)
    QUERY_COUNT += 1

    # download the profile using the search form...
    params = urllib.urlencode({'module': 'SimpleSearch', 'ha': handle.replace('_', '\\_')})
    f = urllib.urlopen("http://www.topcoder.com/tc?%s" % params, proxies=PROXIES)

    # Extract the cr
    m = REGEX_CR.search(f.geturl())
    if m is None:
        print 'Note: "%s" profile not found' % (handle)
        return (0, 0)
    cr = int(m.group(1))

    html = f.read()

    # admin check
    m = REGEX_ADMIN.search(html)
    if m is not None and m.group(1) == handle:
        print 'Note: %s is an admin.' % (handle)
        return (cr, -1)

    # extract rating (or at least try)
    m = REGEX_RATING.search(html)
    if m is None:
        print 'Note: "%s" profile not understood, assuming 0 rating.' % (handle)
        return (cr, 0)

    rat = m.group(1)

    # REGEX_RATING only captures digits, so this branch cannot trigger
    # any more; it is kept as a safeguard.
    if rat == 'not':
        print 'Note: %s is not rated.' % (handle)
        rat = 0
    return (cr, int(rat))


def getTCData(handle):
    # Query TC only once per handle.
    if handle not in CACHE:
        CACHE[handle] = getTCDataNoCache(handle)
    return CACHE[handle]


def makeLinkForHandle(handle):
    (cr, rating) = getTCData(handle)
    if cr == 0:
        # profile not found: leave the handle as plain text
        return handle
    classname = 'error'
    for (x, y, s) in RATING_CLASSES:
        if x <= rating and rating <= y:
            classname = s
    if classname == 'error':
        print "Unable to find classname for rating=[%d]" % rating
    return '<a href="http://www.topcoder.com/tc?module=MemberProfile&cr=%d" class="%s">%s</a>' % (cr, classname, handle)


def makeLink(match):
    # Callback for REGEX_HTAG.subn: group(1) is the handle inside the tag.
    print 'tag:', match.group(0)
    handle = match.group(1)
    return makeLinkForHandle(handle)


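if len(sys.argv) == 1:
    print 'Interactive mode, type a handle name and it will return the link. Type an empty string to quit.'
    while True:
        # readline() returns '' at end of input and '\n' for an empty line;
        # after strip() both become '', which ends the loop.
        s = sys.stdin.readline().strip()
        if s == '':
            break
        print makeLinkForHandle(s)
    sys.exit(0)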


if len(sys.argv) != 3:
    print 'Usual arguments are: "origin file" "destination file"'
    sys.exit(1)

if os.path.exists(sys.argv[2]):
    print '"%s" already exists. Please delete it manually before calling the script.' % sys.argv[2]
    sys.exit(1)


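f1 = open(sys.argv[1], 'r')
text = f1.read()
f1.close()

text, cnt = REGEX_HTAG.subn(makeLink, text)
print "%d tags processed." % cnt

if re.search(r'\[/?h\]', text) is not None:
    print 'Note: Found [h] or [/h] after replace, perhaps a tag is not formatted correctly...'

# Open the destination only after all the lookups are done, so a failure
# in the middle does not leave an empty file behind.
f2 = open(sys.argv[2], 'w')
f2.write(text)
f2.close()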
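
The style picker is the one part of the script that can be tried without touching TC's server, so here is a small standalone sketch of it in Python 3 syntax. The bands are copied from RATING_CLASSES above. The names `BANDS` and `class_for_rating` are made up for this example, and returning None for an unknown rating is my assumption here (the script uses the class name 'error' instead).

```python
# Same bands as RATING_CLASSES in the script; -1 is the script's marker for admins.
BANDS = [
    (-1, -1, "coderTextOrange"),
    (0, 0, "coderTextBlack"),
    (1, 899, "coderTextGray"),
    (900, 1199, "coderTextGreen"),
    (1200, 1499, "coderTextBlue"),
    (1500, 2199, "coderTextYellow"),
    (2200, float("inf"), "coderTextRed"),
]


def class_for_rating(rating):
    """Return the CSS class for a rating, or None if no band contains it."""
    for low, high, css_class in BANDS:
        if low <= rating <= high:
            return css_class
    return None
```

The bands do not overlap, so stopping at the first match gives the same answer as the loop in the script, which keeps going through the whole list.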