The first ever Men’s Roller Derby World Cup happened, and I built and managed its website. The website requirements were simple:
- Display news and information in the run-up to the tournament
- Show multiple embedded video feeds and score updates during the tournament
- Handle traffic spikes estimated at between 1,000 and 10,000 concurrent users
All code mentioned in this article is available on GitHub.
It begins pic.twitter.com/8OSP9LKXmR
— John Kershaw (@wardrox) March 13, 2014
The Front Page
The tournament needed a home on the web.
WordPress is the obvious choice, and obvious is good. I’m a fan of MediaTemple’s GridServer cloud hosting as it copes well with traffic spikes. In broad terms (and remember, this is a shared cloud service designed for spiky usage), this gave us:
- 2,000+ CPUs
- 1TB/s bandwidth
- SSD-backed MySQL
To be sure the website would hold up under load, I added the WP Super Cache plugin, which makes the site serve static, cached pages wherever possible.
Basic stress testing with loader.io showed the site could handle around 90 requests per second for dynamic content, with an average response time under load of around 500ms. The predetermined worst-case benchmark of 17 requests per second was comfortably passed.
MRDWC.com was built with the knowledge that during the tournament we would be funnelling users away from it where possible, and towards live.MRDWC.com, a sub-domain acting as the hub for all coverage.
This mornings traffic pic.twitter.com/5EL8V6Tob7
— John Kershaw (@wardrox) March 14, 2014
The Live Site
The majority of my time was spent working on live.MRDWC.com. This is the site that cannot have downtime during the tournament, and it’s where fans go for the bulk of the coverage.
It has two basic jobs: show the video feeds, and show the current and past scores.
The video feeds are simple embed codes, so no trouble there. Getting the site to display the latest data was the hard part.
With reliability the top priority, I designed a basic architecture in which live.MRDWC.com is a static site, loading the dynamic “state” of the tournament (scores, current games, past games, etc.) via a very simple Ajax call to an external source.
This makes the one page that must not go down as simple as possible, and gives us options on where to load the dynamic information. And diversity like this gives us reliability.
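To make that diversity concrete, here is a minimal sketch of what the client-side state loader could look like. The URLs and the fallback list are hypothetical, and the fetch function is injectable so the fallback logic can be exercised without a network:

```javascript
// Hypothetical state sources: primary first, manual backup second.
const STATE_SOURCES = [
  'https://query.example.com/state',  // primary (hypothetical URL)
  'https://backup.example.com/state', // manual backup (hypothetical URL)
];

// Try each source in order until one answers; fetchFn is injectable
// so the fallback behaviour can be tested with a fake fetch.
async function loadState(fetchFn, sources = STATE_SOURCES) {
  for (const url of sources) {
    try {
      const res = await fetchFn(url);
      if (res.ok) return await res.json();
    } catch (err) {
      // this source is down; fall through to the next one
    }
  }
  throw new Error('no state source available');
}
```

In the browser you would pass `fetch` (or a jQuery `$.ajax` wrapper) as `fetchFn`; the point is that the static page itself never cares which source the state came from.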
For ease of scale I chose Heroku to host the live site. You can scale up and down your resources simply and instantly. Hosting this part of the site on Heroku also puts it in a different data centre to the MediaTemple site (MT is in the US, and the Heroku apps are in the EU).
As live.MRDWC.com now only serves a very basic single page (using node.js and Express), its benchmark results were astronomical: it easily handled 10,000 requests per second on just 10% of the available resources.
Real Time Data
One of the features of the site, which hadn’t ever been done before for a derby tournament of this scale, was to stream live data from the scoreboards to viewers’ computers.
The software used by the scoreboards could, assuming you had a rock-solid internet connection, post its data to the RDNation server, which grants API access to it. In theory, this makes it very simple to hook into the API and fetch the data. In practice, getting information out of the API was indeed straightforward; getting the scoreboard software to send its data over less-than-perfect Wi-Fi was the real challenge.
Once the data had reached the Roller Derby Nation API, my code took over and, through the system outlined below, sent the information out to any device that requested it.
The rate at which the scoreboards on viewers’ machines updated could be adjusted remotely, allowing me to slow down or speed up the request rate when needed. For most of the weekend the refresh rate was set at around 8 seconds.
This put the time between an official entering a score on the scoreboard and a user at home seeing it on their device at somewhere between 2 and 18 seconds, plus latency. Given our video feed had a lag of a few seconds, this was fine.
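One way to get that remote tunability is to carry the refresh interval inside the state payload itself, so the page re-reads it on every poll and no redeploy is needed. A sketch, with a hypothetical `refreshSeconds` field:

```javascript
// Work out how long to wait before the next poll; the server can tune
// this remotely by changing a (hypothetical) refreshSeconds field.
function nextPollDelay(state, fallbackSeconds = 8) {
  const secs = state && state.refreshSeconds ? state.refreshSeconds : fallbackSeconds;
  return secs * 1000; // setTimeout wants milliseconds
}

// Recursive polling loop: fetch, render, then schedule the next fetch
// using whatever interval the latest state asked for.
function poll(fetchState, onState, timer = setTimeout) {
  fetchState().then((state) => {
    onState(state);
    timer(() => poll(fetchState, onState, timer), nextPollDelay(state));
  });
}
```

Using `setTimeout` recursively (rather than `setInterval`) also means a slow response naturally pushes the next request back instead of stacking requests up.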
The state was saved in a Mongo database hosted by MongoLab. The MongoLab plan chosen has lots of nice redundancy for reliability, and as it’s Mongo, it allows several thousand concurrent database connections.
Four different services make up the system. This allows for easy scaling and a division of labour. Here’s a brief description of each module.
mrdwc-live: This is the static website you see when you visit live.MRDWC.com. It’s super simple and does nothing dynamic. All the data is fetched via a browser Ajax call to the next module.
mrdwc-query: The horizontal scaling of the site all comes from here. It has a simple job: return a JSON representation of the current tournament state, to be displayed on the site. This includes current game stats, as well as brackets, tables, and even the text to display for the alternate-language feed options. The module gets the state from the Mongo database (hosted on MongoLab).
mrdwc-command: Easily the most complex component, this is the admin interface used throughout the tournament via its own URL. If the front end of the site dies for any reason, the state can still be calculated and served via a back-up process. This module also allows manual entry of game scores if the scoreboard software in the building has issues.
This module loads all the components which make up the tournament state, builds the complete state, and updates the Mongo database.
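A hypothetical sketch of that assembly step, reduced to a pure function; the component names and fields are illustrative, not the real schema:

```javascript
// Assemble the complete tournament state from its components before
// it gets written to Mongo (field names are hypothetical).
function buildState({ games = [], brackets = [], tables = [], feedText = {} }) {
  return {
    updatedAt: new Date().toISOString(),     // when this state was built
    current: games.filter((g) => g.live),    // games in progress
    past: games.filter((g) => !g.live),      // completed games
    brackets,
    tables,
    feedText,
  };
}
```

Keeping the build step a pure function of its inputs makes it trivial to run from the automatic rebuild path and from the manual back-up path alike.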
This simple little module pokes the mrdwc-command module every few seconds, prompting it to rebuild the state.
Distribution and Protection
In the past, roller derby tournaments have had problems with the reliability of their websites and live video. This put pressure on me to deliver a service that would stay up no matter what, without hiccups. So I built it all with two questions in mind:
- What’s the most reliable solution?
- What if that breaks?
The basic principle I tried to apply at all levels of the site was to break the problem down into smaller parts, each with its own Plan B (and C, and sometimes D). For example (this list is far from exhaustive):
- If MRDWC.com goes down … the live site (live.MRDWC.com) is on a different server so will stay up.
- If Heroku (hosting live.MRDWC.com) goes down … we can switch to a backup on the MediaTemple server.
- If there’s an unforeseen bottleneck and mrdwc-query goes down … we can switch to a manual backup for state delivery.
- If the scoreboard in the building breaks … we have an override for the website and backup software in the building.
During the weekend, the only backup system we needed was the one for handling the Wi-Fi breaking on the scoreboard laptops. When this happens, the score data can’t automatically reach the outside world. I had built manual score-entry forms into the website’s control module, so when the Wi-Fi went down, I updated the scores by hand.
For The End User
The main page for the tournament worked well, looked clean (thanks, Bootstrap), and did exactly what we needed it to do. There was a tab for each track, which the viewer could easily switch between.
As well as the two track tabs, there was a third for scores. This let fans follow the tournament really simply. The data was duplicated manually on MRDWC.com/results, so if either MRDWC.com or live.MRDWC.com went down, the results were still accessible.
Some basic, nicely abstracted code reliably fetches the state and populates a page based on jQuery selectors. So when I came to build auto-populating overlays for the video feed, the logic was already there and reusable. I created two overlays: one for the tables and one for the knockout stage. The latter was so popular it was released to the public, and widely used.
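The reusable part boils down to one mapping from state paths to CSS selectors, applied with whatever $-like function the page has. A sketch with hypothetical paths and selectors, where `$` is injectable so the same mapping logic drives the main page and the overlays:

```javascript
// Hypothetical bindings: where each piece of state lands on the page.
const BINDINGS = {
  'track1.score': '#track1-score',
  'track2.score': '#track2-score',
};

// Safely read a dotted path like 'track1.score' out of the state object.
function lookup(obj, path) {
  return path.split('.').reduce((o, key) => (o ? o[key] : undefined), obj);
}

// Write each bound value into the page via the supplied $-like function.
function populate(state, $, bindings = BINDINGS) {
  for (const [path, selector] of Object.entries(bindings)) {
    const value = lookup(state, path);
    if (value !== undefined) $(selector).text(value);
  }
}
```

A new overlay then only needs its own bindings table, not new fetching or rendering code.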
All of these pages combined to provide a fantastic user experience.
Whilst at the tournament I also saw a lot of people sat watching one game, with the video feed or scoreboard for the other track open on their phone or tablet.
- Number of concurrent devices viewing the final: 3,400
- Live Scoreboard updates: 24 million
- Website uptime: 100.00%
I covered the full cost of the website. This was the first tournament of its type, and it was being privately funded with some uncertainty about whether it’d break even. Covering the costs myself also gave me more control over what I purchased and in what quantity: the emphasis moved from getting it cheap to getting it done.
Excluding my time, which would arguably be the most significant cost to the project, the rough breakdown of pricing is as follows:
- MediaTemple hosting: $40 ($20 GridService, $20 MySQL Grid Container)
- MongoLab: $40ish
- Heroku: $76 (mrdwc-live $26, mrdwc-query $50)
- Loader.io: $5
Total cost: $161
These were not optimised for price, and in many cases I intentionally provisioned significantly more than was required, to guarantee reliability in the face of uncertainty. As such, the total is roughly double or triple what I think we could have got away with, and that’s knowledge I’ll carry forward to future projects. If the site wasn’t being privately funded by myself, I would almost certainly have been more conservative.
During the tournament I took a lead in ensuring our data connection out of the building was as solid as possible. This isn’t detailed much in this post, but as I know there was some interest here’s a quick run-down of what we had:
- 3x satellite connections (roughly 4Mbps each) for video
- 4G connection (20Mbps) for general use
- 3G connection for media Wi-Fi
- 150m Cat5 cable* from the satellite truck to our switch
- One Wi-Fi access point
- 2x 5GHz Wi-Fi bridges to connect each scoreboard
*There was a very high error rate on the cable, but upping the link speed from 100Mbps to 1Gbps allowed just enough bandwidth for us to cope.
Things I’d Like For Next Time
Even more automatic backups/redundancy/recovery
The redundancy and backup services we had worked really well over the weekend, handling everything that was thrown at them. However, there’s still scope to improve how the switch from plan A to plan B happens. Whilst there was plenty of outside-in monitoring of the system (to ensure viewers had a perfect experience), the internal and third-party systems didn’t expose much about their health.
When the Wi-Fi to the scoreboards broke, for example, I would only see it manifest as the scores on the live site not updating for a few minutes. A way to monitor these third-party dependencies directly would be great.
Release the overlays to the public earlier
The overlays produced just for TV would have been fantastic as stand-alone pages released to the public. The distribution and popularity of the auto-updating knockout page shows this. Even if they’re not perfect (e.g. not mobile-friendly), pages like these are still fine to release.
It was also apparent that the information was far easier to get at when at home on the website than when in the building. Having screens up around the building with the website on would have been very helpful.
Choice in scoreboard software
Much of my energy over the weekend went into helping officials use the scoreboard software. The software was chosen because it could send data out of the building. Given the issues we had with it (there were real bugs, as well as problems sending data out when the connection isn’t flawless), I’d like to devote some time to building ways for other scoreboard software to send data to remote third parties.