Hosting a 700 Person Programming Contest - ECOO 2021

Posted: May 5, 2021
Updated: July 9, 2023

On May 1st, 2021, the 36th annual Educational Computing Organization of Ontario (ECOO) Programming Contest was held. This contest was the culmination of over a month of work by Theodore Preduta, Larry Yuan, Keenan Gugeler, Christopher Trevisan, and myself, alongside Valentina Krasteva and David Stermole, with the support of ECOO and ECOO-CS. This is my second year involved in hosting this event, so I figured I would write about the setup and everything that went right and wrong.

Background

This programming contest was run on a platform called DMOJ, an AGPL-licensed online judge, for accepting competitor code and rating its correctness. A fork was created with various unnecessary functionalities removed. In a broad sense, DMOJ works in the following manner:

This year’s contest contained six (6) problems ranging in difficulty from complete beginner to the highest level of competitive programming. A big thanks goes to Keenan and Chris for creating these problems and Andrew Qi Tang for testing them before the competition.

Network

Nearly 700 people signed up to do the contest and we wanted to make sure that the site ran smoothly. Last year’s approach was to use a small number (<5) of more powerful site servers behind a load balancer to distribute load. This vertical scaling approach worked quite well, but we decided to try a more horizontal approach this year with a larger number of less powerful servers. Our entire network was set up on DigitalOcean Droplets, with the database and individual sites on 2 core / 2 GB RAM systems and everything else on 1 core / 1 GB RAM. All internal networking was done with DigitalOcean’s provided private networking and formed the following topology:

Our load balancer, central servers, and judges performed fantastically. They maintained low load throughout the entire contest, even during our fuck up (explained soon). Throughout the competition, we deployed roughly fifteen (15) sites running a Django app with uWSGI and ten (10) judges running a Docker container with a Python app using the DigitalOcean API. Generally, it’s best that every judge has identical hardware so that competitor submissions run at consistent speeds and get the same number of points every time. Unfortunately, this is rather difficult to get with a cloud provider because of shared resources and different hardware.¹ For us, we figured the differences weren’t drastic enough to warrant pursuing more complicated setups.

As the contest began, we very quickly noticed horrible latencies and a constant stream of 500 errors. At the very beginning of the contest this was expected (the main contest page that competitors had open automatically refreshes when the contest begins), but several minutes in, the latencies were still getting worse. Thinking about what was different this year and looking at a browser’s network inspector while on the main scoreboard page revealed what was up. You see, the scoreboard is updated live with the help of an event server, which informs connected browsers when to fetch new results. Last year, this websocket connection was disabled for non-organizers for fear of overloading the event server. It turns out that overloading the event server wouldn’t have been an issue, but the follow-up requests from every connected browser to get the latest results were. Because of how quickly submissions were being made at the start of the contest, the sites became overloaded as each competitor’s browser opened several simultaneous connections to fetch them.
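To get a feel for why this hurt so much, here is a back-of-the-envelope model of the fan-out. It assumes every scoreboard update event causes every connected browser to make one follow-up request; the function name and the numbers in the usage line are made up for illustration and are not measurements from the contest.

```python
def follow_up_requests_per_second(connected_clients: int, updates_per_second: float) -> float:
    """Estimate the request rate caused by scoreboard update events.

    Assumes each update event makes every connected client fetch new results once.
    """
    return connected_clients * updates_per_second


# Hypothetical values: 500 open scoreboard pages, 2 scoreboard updates per second.
print(follow_up_requests_per_second(connected_clients=500, updates_per_second=2))
```

The point is that the load on the sites grows with the product of the two numbers, while the event server only ever sends one small message per client per update.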

Ideally, this would be fixed by sending the updated results over the websocket connection or, at the very least, caching the updated results that competitors were querying. As a quick fix though, we just dropped the scoreboard update event on our event server and left it at that. From then on, everything ran smoothly until the end of the competition, which finished with one competitor fully solving all the problems and several others getting partial points on the last problem.²
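For the caching option, a minimal sketch of the idea is below. This is not DMOJ’s code and not what we deployed: `TTLCache` and `render_scoreboard` are hypothetical names, and a real Django site would more likely lean on Django’s own cache framework. The sketch only shows that with a short time-to-live, many requests for the same results trigger a single expensive computation.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Cache one computed value and recompute it at most once per `ttl` seconds."""

    def __init__(self, compute: Callable[[], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._compute = compute
        self._ttl = ttl
        self._clock = clock  # injectable so the cache can be tested without sleeping
        self._value: Optional[str] = None
        self._expires_at = float("-inf")  # forces a computation on first use

    def get(self) -> str:
        now = self._clock()
        if now >= self._expires_at:
            # Cached value is missing or stale: recompute and restart the timer.
            self._value = self._compute()
            self._expires_at = now + self._ttl
        return self._value


calls = 0


def render_scoreboard() -> str:
    """Hypothetical stand-in for the expensive scoreboard query."""
    global calls
    calls += 1
    return f"scoreboard v{calls}"


cache = TTLCache(render_scoreboard, ttl=5.0)
for _ in range(700):  # roughly one request per competitor
    cache.get()
print(calls)  # 1, provided the loop finishes within the 5 second TTL
```

The trade-off is that competitors may see results that are up to `ttl` seconds old, which seems acceptable for a scoreboard.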

To end off this section, here are some mildly interesting statistics for the runtime of the competitions, including any practice rounds and some setup time. Full reports from GoAccess — an awesome log analysis tool — can also be found below.

Metric	ECOO 2020	ECOO 2021
Registered Competitors	713	692
Total requests	1,335,283	615,575
Peak New Requests/sec³	185	892
Visitors	3,700	6,646
500 Errors	62,664	22,946
GoAccess Report	report-2020.html	report-2021.html
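The peak requests/sec figures came from a quick grep of the nginx logs. An equivalent in Python might look like the sketch below. It assumes nginx’s default "combined" log format, where each line carries a timestamp like `[01/May/2021:12:00:01 -0400]` with one-second resolution; the function name and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# Timestamp as written by nginx's default "combined" format,
# e.g. [01/May/2021:12:00:01 -0400]
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def peak_requests_per_second(lines: Iterable[str]) -> int:
    """Return the largest number of log lines that share the same second."""
    per_second: Counter = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            per_second[match.group(1)] += 1
    return max(per_second.values(), default=0)


# Made-up sample lines, not real contest traffic.
sample = [
    '10.0.0.1 - - [01/May/2021:12:00:01 -0400] "GET / HTTP/1.1" 200 512 "-" "-"',
    '10.0.0.2 - - [01/May/2021:12:00:01 -0400] "GET /contest HTTP/1.1" 200 1024 "-" "-"',
    '10.0.0.3 - - [01/May/2021:12:00:02 -0400] "GET / HTTP/1.1" 500 0 "-" "-"',
]
print(peak_requests_per_second(sample))  # 2
```

With a real log, passing an open file object (for example `open("access.log")`) works the same way, since the function only iterates over lines.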

Cheating

Cheating must be considered in any competitive activity, especially those in online environments. This year, all online resources were permitted, which we hoped would decrease the usefulness of cheating. However, as this was still an individual contest, cheating by communicating with others was still likely. Luckily, since 2019,⁴ DMOJ has integrated the Stanford Measure of Software Similarity (MOSS) API to automatically submit competitor source code and check for plagiarism. This system compares code similarity but has knowledge of language syntax, so simple changes to style or variable naming are useless. As an excellent example, let’s take these three submissions from three different students at the same school (you know who you are!) which MOSS flagged:

Numbers, M, KS = input().split()
output = [-1, 0]*int(Numbers)

for i in range(int(KS)):
    First, Second, Third = map(int, input().split())

    if output[(Second*2)-1] < Third:
        output[(Second*2)-1] = Third
        output[(Second*2)-2] = First
    
for i in range(0, len(output), 2):
    print(output[i], end=" ")

This student clearly likes some verbosity in variable names.

# Inputs
# N =  number of questions
# M = number of professors
# K = number of emails sent

N, M, K = input().split()
output = [-1, 0]*int(N)

for i in range(int(K)):
    # A = the professor  
    # B = question that was asked
    # C = score given to answer
    A, B, C = map(int, input().split())

    if output[(B*2)-1] < C:
        output[(B*2)-1] = C
        output[(B*2)-2] = A
    
for i in range(0, len(output), 2):
    print(output[i], end=" ")

Short variable names which correspond to the question but nice comments to explain what they are.

N, M, K = input().split()
out = [-1, 0]*int(N)

for i in range(int(K)):
    A, B, C = map(int, input().split())
    if out[(B*2)-1] < C:
        out[(B*2)-1] = C
        out[(B*2)-2] = A
    
for i in range(0, len(out), 2):
    print(out[i], end=" ")

The absolute minimum — a standard of competitive programmers.

All three of these students, alongside nearly twenty (20) others, have been disqualified: a similar number of cheaters to last year, so nothing too surprising.
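To illustrate why renaming variables and adding comments doesn’t fool a syntax-aware checker, here is a toy sketch. It is not how MOSS works internally (MOSS’s real algorithm is different and far more robust); it just uses Python’s standard `tokenize` module to replace every identifier with a placeholder and drop comments before comparing. The function names and the two snippets are made up for illustration.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher
from typing import List

# Token types that carry no meaning for this comparison.
IGNORED = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
           tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source: str) -> List[str]:
    """Tokenize Python source, hiding identifier names, literals, and comments."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in IGNORED:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            tokens.append("ID")   # any variable or function name
        elif tok.type == tokenize.NUMBER:
            tokens.append("NUM")
        elif tok.type == tokenize.STRING:
            tokens.append("STR")
        else:
            tokens.append(tok.string)  # keywords and operators stay as they are
    return tokens


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized token streams."""
    return SequenceMatcher(None, normalized_tokens(a), normalized_tokens(b)).ratio()


first = "total = 0\nfor i in range(10):\n    total += i\n"
second = "# add things up\nacc = 0\nfor index in range(10):\n    acc += index\n"
print(similarity(first, second))  # 1.0
```

After normalization the two snippets produce exactly the same token stream, so the different names and the extra comment make no difference to the score.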

Conclusion

Overall, just like last year, I really enjoyed running this contest and setting up some of the infrastructure and systems behind it. While it definitely didn’t run perfectly, I think we did pretty well and provided a good experience to competitors. Unfortunately, apart from the nginx (and other service) logs, we didn’t really collect any kind of system metrics for future analysis. I was planning to set up Prometheus but couldn’t do it in time. Next time, I’ll definitely make it a priority so we can get some better insight into any bottlenecks and so I can write a much more detailed writeup!

¹ If you’d like to read more on this topic and the solution that DMOJ settled with, see their wonderful blog post.
² If you’re interested in the scoring/ranking format used, see the Rules.
³ From a quick grep of nginx logs; may not be accurate.
⁴ First implemented in #913 with better integration in #1118.