Sunday, September 15, 2013

Attempts to organize my topcoder folder

So when I migrated from KawigiEdit to Greed, the biggest issue was that, thanks to KawigiEdit, all my topcoder source code files were in a single folder. Greed, on the other hand, organizes them by default into sub-folders named after the contest. If I wanted to use Greed's organization powers, I would first need to organize all the files created by KawigiEdit, moving each source file to its respective contest folder.

As it turns out, my folder contained code for problems from 360 different topcoder contests! So I needed an automatic method. No other choice but to make a Python script. I refurbished some of the work I put into the handle tag script so that it searches for problem names in the problem archive, parses the results to get the contest name, and then does the move.
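The final "do the move" step is the simple part. A minimal sketch, assuming the class-to-contest mapping is already known (here a plain dictionary called `contest_of`, a hypothetical stand-in for the problem archive lookup; the function name `organize` is also made up for illustration):

```python
import os
import shutil

def organize(folder, contest_of, extensions=('.cpp', '.java', '.cs', '.vb', '.py')):
    """Move each ClassName.ext in `folder` into a sub-folder named after its contest."""
    moved = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        class_name, ext = os.path.splitext(name)
        if not os.path.isfile(path) or ext not in extensions:
            continue  # skip sub-folders and files that are not source code
        contest = contest_of.get(class_name)
        if contest is None:
            continue  # unknown problem: leave the file where it is
        # '/' cannot appear in a folder name, so replace it
        target = os.path.join(folder, contest.replace('/', '-'))
        if not os.path.isdir(target):
            os.makedirs(target)
        shutil.move(path, os.path.join(target, name))
        moved.append((name, contest))
    return moved
```

Calling `organize('.', {'SomeClass': 'SRM 500'})` from the code folder would move `SomeClass.cpp` into `./SRM 500/` and return the list of moves it made. The hard part is building that dictionary.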

It wasn't an incredibly trivial problem to solve, though:

  • Some problems were used in multiple contests, which one to choose?
  • Some of my problems couldn't be found in the statistics (the round was canceled, a bug, or the match was never added to the statistics because it was special).
  • The results have a different format for TCHS matches and other contests.
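The first point comes down to ranking the candidate contest names. A minimal sketch of that ranking as a sort key; it is an alternative formulation of the tie-breaks in the `better` function of the script below, and the sample contest names are made up for illustration:

```python
def contest_priority(name):
    """Lower tuples sort first; mirrors the tie-breaks used by better()."""
    return (
        'arallel' in name,           # parallel rounds go last
        not name.startswith('SRM'),  # names starting with "SRM" go first
        'SRM' not in name,           # then names that mention "SRM" anywhere
    )

def pick_contest(candidates):
    # min() returns the first of several equally ranked names
    return min(candidates, key=contest_priority)

candidates = ['TCO Practice Round', 'Member SRM 999', 'SRM 999']  # made-up names
print(pick_contest(candidates))
```

With these made-up candidates, the plain SRM wins over the member SRM and the tournament round.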

There is even a problem I couldn't solve in any way: for many tournament rounds, the contest name differs between the practice room and the problem archive (booo!).

But anyway, here is my script in case anyone needs it. It most likely only works on Unix-like operating systems, as it uses / for folder paths and commands like mv and mkdir. You are supposed to run it from the folder that contains all the problem codes.

Update: the script is now a Windows-friendly version, thanks to Bjarki Ágúst Guðmundsson.

# Topcoder folder organizer
# err let us say it is released under the zlib/libpng license
# (c) Victor Hugo Soliz Kuncar, 2013
import urllib, re, sys, os, time, string, glob
# Rudimentary proxy support
PROXIES = None  # None means: use the system's default proxy settings
#  For example:
#  PROXIES = {'http': ''}
# Will try all file names with these extensions, and assume they are ClassName.extension :
EXTENSIONS = [".cpp", ".java", ".cs", ".vb", ".py"]
BLACK_LIST = [ "template.cpp", "" ] #put any files that you are pretty sure are not problem source codes
#Special cases, problems that are not in the web site.
#Not by any chance an exhaustive list:
SPECIAL = {  #NetEase rounds are not in statistics, let's put them all in the same folder:
            'UnrepeatingNumbers'     : 'NetEase',
            'MaximumPalindromeScore' : 'NetEase',
            'RobotKing'              : 'NetEase',
            #This qualification round was canceled and renamed 3A, not in stats:
            'ChessTourney' :'TCO08 Qual 3A',
            'PokerSquare'  :'TCO08 Qual 3A',
            'RandomNetwork':'TCO08 Qual 3A',
            # Same with this one:
            'DNAMatching'    : "TCO'10 Qualifier 1A",
            'Palindromize3'  : "TCO'10 Qualifier 1A",
            'MegadiamondHunt': "TCO'10 Qualifier 1A",
            # SRM 377 is not in the stats, and I have no idea why:
            'AlmostPrimeNumbers'  : 'SRM 377',
            'SquaresInsideLattice': 'SRM 377',
            'GameOnAGraph'        : 'SRM 377',
            'AlienLanguage'       : 'SRM 377',
            # Member SRM 471 is gone from statistics, maybe it was cancelled, maybe
            # it is one of those matches that suffer the fushar curse - a bug that
            # removes matches by fushar from statistics.
            'PrimeContainers'   :'Member SRM 471',
            'EllysPlaylists'    :'Member SRM 471',
            'Thirteen'          :'Member SRM 471',
            'PrimeSequences'    :'Member SRM 471',
            'ThirteenHard'      :'Member SRM 471',
            'ConstructPolyline' :'Member SRM 471',
}
REGEX_CONTEST = re.compile('([a-zA-Z0-9]+)\s*<.A>\s*<.TD>\s*<TD[^>]+>\s*<A\s+HREF="[^=]+[ce]=[a-zA-Z]+ound[a-zA-Z_]+verview&rd=[0-9]+"[^>]+>\s*(.+)\s*</A>')
CACHE = dict()
QUERY_COUNT = 0  # number of queries since the last long pause
def better(c1, c2):
    # when a class has multiple contests, pick the better one.
    # Parallel rounds:
    p1 = 'arallel' in c1
    p2 = 'arallel' in c2
    if p1 and not p2:
        return c2
    elif p2 and not p1:
        return c1
    # some problems were used in special contests and also in SRMs, put priority in SRM:  
    s1 = (c1[0:3] == 'SRM')
    s2 = (c2[0:3] == 'SRM')
    if s1 and not s2:
        return c1
    elif s2 and not s1:
        return c2
    s1 = ('SRM' in c1)
    s2 = ('SRM' in c2)
    if s1 and not s2:
        return c1
    elif s2 and not s1:
        return c2
    return c1 #any one
def getProblemContest(handle):
 if handle in CACHE:
    return CACHE[handle]
 #CACHE memoizes the following function:
 def sub(handle):
    if handle in SPECIAL:
        return SPECIAL[handle]
    # Wait 0.5 seconds between queries, and 2 seconds every 10 queries
    # we are not DDoSing TC...
    global QUERY_COUNT
    time.sleep(0.5)
    if QUERY_COUNT == 10 :
        QUERY_COUNT = 0
        time.sleep(2.0)
    QUERY_COUNT += 1
    #download the problem archive search results
    params = urllib.urlencode({'module': 'ProblemArchive', 'class': handle})
    f = urllib.urlopen("" % params, proxies = PROXIES)
    html = f.read()
    f.close()
    # extract the contest name (or at least try)
    m = REGEX_CONTEST.findall(html)
    if type(m) == list:
        m = [x[1] for x in m if x[0] == handle]
    # Some problems were used in test SRMs, with a '2' added to their class name:
    if m == None or len(m) < 1:
        if  len(handle) >= 1 and handle[-1] == '2':
            return getProblemContest(handle[:-1])
        return None
    return reduce( better, m)
 CACHE[handle] = sub(handle)
 return CACHE[handle]
def inFolder():
    for f in os.listdir('.'):
        if os.path.isfile(f) and f not in BLACK_LIST:
            c, ext = os.path.splitext(f)
            if ext in EXTENSIONS:
                s = getProblemContest(c)
                if s is not None:
                    # '/' cannot be part of a folder name
                    s = s.replace('/','-')
                    if not os.path.isdir(s):
                        print ('mkdir "./%s"' % s)
                        os.mkdir(s)
                    print ('mv %s "./%s/"' % (f, s))
                    os.rename(f, os.path.join(s, f))
                else:
                    print ('%s: contest not found.' % c)

inFolder()


Bjarki Ágúst Guðmundsson said...

Thanks! This is definitely a handy script to have.

Here is an updated version that should be platform independent.

vexorian said...

I used this folder organization idea for the first time during a real match. And now I figure it is a very pointless feature. I think I will revert back to having all code in a single folder :(

The real killer for this feature is that the contest name is different between the real match and the practice room. For a real match, the contest name is "Single Round Match 591"; for the practice room, the name becomes "SRM 591", forcing you to manually move the solutions you were writing during the match. This makes the whole thing more of an annoyance than it is worth.
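One way to paper over this particular mismatch would be to normalize the name before choosing a folder. A minimal sketch, assuming the only difference is the "Single Round Match" versus "SRM" prefix (the function name is made up, and hooking it into Greed is not shown):

```python
import re

def normalize_contest(name):
    # Only handles the "Single Round Match N" -> "SRM N" case described above;
    # any other contest name is returned unchanged (apart from stripped whitespace).
    return re.sub(r'^Single Round Match\b', 'SRM', name.strip())

print(normalize_contest('Single Round Match 591'))
print(normalize_contest('SRM 591'))
```

Both calls print the same folder name, so solutions from the match and from the practice room would end up together.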

Ruby Badcoe said...

Organizing your files in folders and sub-folders, also known as folders within folders, is a neat way of doing things. Naming each accordingly will make it easier for you to locate the file you need. Great job, and thank you for sharing this with us!

Ruby @ Williams Data Management