For Tech Tuesday, we asked Anna Chiaretta Lavatelli, Associate Director of Digital Media at the MCA, to share some details of how she prepared the MCA website for the first day of ticket sales for David Bowie Is and the wave of intense interest that came with it.
on load testing
Passing out David Bowie Is hand fans from the float of the classic rock radio station WXRT at the June 2014 Chicago Pride Parade, I began to realize the scale of adoration for Bowie. Watching the crowd grab up all the swag and sway to "Under Pressure" gave me a glimpse of how much Internet traffic the MCA website would need to be ready for during the upcoming show about this rock icon: a lot.
Anticipating this, the web team had upgraded the web server. However, in July, when MCA member tickets first went on sale, we felt like the technicians who, during the infamous 1974 Diamond Dogs tour, left Bowie himself stranded in a cherry picker, dangling awkwardly over the audience after he finished singing "Space Oddity." There we were, reloading the page and watching it get slower and slower and then … stop.
The web team sprang into action to bring the site back online. We installed a web server tool called Varnish and saw an immediate improvement in site performance. Varnish keeps cached copies of pages and serves repeat requests from that cache, so the web server doesn't have to rebuild the same page for every visitor; this cuts the CPU load caused by many simultaneous users. Getting the site back up and stable was a relief, but it was not a permanent solution. Member traffic was only a fraction of what we could expect once general tickets went on sale. It felt like we were rock concert promoters opening the doors to a massive show with only one small entrance. We needed to open more doors so that everyone could rush in at once, excitedly but safely.
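Varnish itself is configured in its own language (VCL) and runs as a separate service in front of the web server, so the snippet below is not Varnish code. It is a minimal Python sketch of the caching idea only, assuming a hypothetical `render_page` function that stands in for the expensive work of building a page and an arbitrary 60-second cache lifetime.

```python
import time


class TTLCache:
    """Toy page cache with a time-to-live, illustrating the idea behind
    a caching proxy such as Varnish (not Varnish itself)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url, render):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            # Fresh copy in the cache: no work for the web server.
            self.hits += 1
            return entry[1]
        # Miss or expired entry: build the page once, then cache it.
        self.misses += 1
        body = render(url)
        self.store[url] = (now + self.ttl, body)
        return body


render_calls = 0


def render_page(url):
    # Hypothetical stand-in for the CPU-heavy work of generating a page.
    global render_calls
    render_calls += 1
    return f"<html>{url}</html>"


cache = TTLCache(ttl_seconds=60)
for _ in range(1000):  # 1,000 visitors requesting the same page
    cache.get("/tickets", render_page)
print(render_calls, cache.hits)  # 1 999
```

With the cache in place, a thousand requests for the same page trigger only one expensive render; the rest are answered from memory until the entry expires.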
Our next step was to work with Rackspace server support staff, who helped us add what is known as a load balancer to our web and database servers (and also nicknamed me "Major Tom" during the process). This relatively inexpensive add-on spreads incoming requests across several machines, so a surge in traffic doesn't all land on a single server.
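The post doesn't say which distribution strategy the Rackspace load balancer used. Round robin, where each new request goes to the next server in a fixed rotation, is one common choice, so the sketch below is only an illustration of the idea, with hypothetical server names.

```python
import itertools
from collections import Counter


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("need at least one server")
        self._cycle = itertools.cycle(servers)

    def pick(self):
        # Return the server that should handle the next request.
        return next(self._cycle)


# Hypothetical pool of three web servers behind the balancer.
balancer = RoundRobinBalancer(["web1", "web2", "web3"])
assignments = Counter(balancer.pick() for _ in range(3000))
print(assignments)  # each server receives 1000 of the 3000 requests
```

Real load balancers often add health checks and weighting on top of this, but the core effect is the same: no single machine absorbs the whole surge.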
We believed this would properly prepare the website for David Bowie Is, but we needed to be certain. Could we handle the traffic surge that the exhibition would bring? Next up: load testing.
Load testing is the process of putting pressure on a system and measuring how it responds to the demand, much like gradually adding weight to the end of a cable until it breaks so that you know how much weight the cable can support.
After calling up colleagues in the field and reading about various load testing companies, I settled on working with Load Impact, based in Sweden. The company creates virtual users (robots!) that visit a website so that you can test whether your services can handle the crowds.
Load Impact offers a well-designed user interface, including a Chrome extension that records a script as you click through the path you wish to test. You can create a “user scenario” to mirror the way actual people would buy David Bowie Is tickets. I did this, then set up a test in which 2,000 virtual users visited the MCA website.
Why 2,000? The goal was to calculate the average number of concurrent users. The number I came up with was a balance between the practical and the aspirational amount of traffic we expected to receive. I used a formula provided by Load Impact (and corroborated by some additional Googling): concurrent users = (hourly visits × average visit duration in seconds) / 3,600, where 3,600 is the number of seconds in an hour.
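The formula is easy to turn into a small helper for trying out different traffic estimates. The inputs in the example below are made up for illustration and are not the MCA's actual traffic figures.

```python
SECONDS_PER_HOUR = 3_600


def concurrent_users(hourly_visits: float, visit_duration_s: float) -> float:
    """Average number of visitors on the site at any given moment.

    hourly_visits: number of visits expected in one hour
    visit_duration_s: average length of a visit, in seconds
    """
    return hourly_visits * visit_duration_s / SECONDS_PER_HOUR


# Hypothetical inputs: 6,000 visits in an hour, 5 minutes per visit.
print(concurrent_users(6_000, 5 * 60))  # 500.0
```

This is an instance of Little's law: the arrival rate (visits per hour divided by 3,600 gives visits per second) multiplied by the average time each visitor stays gives the average number of visitors present at once.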
I stayed up late one night to run the load test while the MCA website had very few visitors, knowing I could cancel it if the server's CPU got close to maxing out. I started the test and cued up the monitoring window. On a world map within the Load Impact interface, the target, Chicago (where our site is hosted), appeared, along with the locations of the robot-generating servers I had selected (Chicago, California, Virginia, and London). Once all the locations were on the map, green lines shot out of the Load Impact partner servers like rockets. The width of each line corresponded to the amount of bandwidth that server was pulling.
Below the map, an activity graph tracked the number of virtual users and the web page response time. I kept an eye on the graph as the numbers rapidly increased, ready to press the big red abort button above the rocket map.
Meanwhile, we added a New Relic server monitoring agent to the mix to gather more server performance data. This allowed us to monitor the CPU and memory performance of the database and web servers via the New Relic website.
Fortunately, that night nobody had to climb down from the cherry picker due to machine failure! The website passed the test and our upgrades proved to be a success. Yes, the website slowed down a little bit, but 2,000 virtual users visited simultaneously without crashing the servers. Our design team and developer were even able to fix a bit of code that was slowing down our service.
Because we tested the MCA website in advance, we are now able to handle the large audiences keen to get tickets for David Bowie Is. From here, we can continue to expand functionality and access to our digital content. The show can go on and, at least online, no one will get stuck up in the cherry picker.