Websites can be very useful in HPC because they make services easier to access. We recently had an issue with a website we use and learned a few things along the way.
Know your users
The website had been running fine for a number of months and building up a user base. However, we recently hit an issue where a large number of users on the service caused slow page loads and timeouts. Errors in the Apache logs suggested some sort of timeout, so naturally we tried increasing the timeout in Apache as a quick fix while we looked for the culprit. This had no effect, so we had to look elsewhere for the cause.
Know your application
One aspect of this service that made me suspicious was its use of file locks for the many operations that read and write files shared between processes. The website used Perl CGI scripts to create the pages, and the Perl used flock to lock files. These locks may have been causing lock contention and race conditions when serving many users.
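To make the contention idea concrete, here is a minimal sketch in Python rather than the site's Perl. It is not the site's code: the helper names (try_lock, unlock, demo) are made up for illustration, and it assumes a Linux machine and a local temporary file. It takes an exclusive flock, shows that a second attempt is refused while the first is held, and shows that the lock can be taken again after release.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive flock on path without blocking.

    Returns the open file object (which holds the lock) on success, or
    None if the lock is already held through another open of the file.
    """
    f = open(path, "a+")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        f.close()
        return None
    return f


def unlock(f):
    """Release the lock and close the file."""
    fcntl.flock(f, fcntl.LOCK_UN)
    f.close()


def demo():
    """Return (first_ok, second_refused, third_ok) for a scratch file."""
    fd, path = tempfile.mkstemp()  # hypothetical stand-in for a shared file
    os.close(fd)
    try:
        first = try_lock(path)    # succeeds: nobody holds the lock
        second = try_lock(path)   # refused: the first lock is still held
        unlock(first)
        third = try_lock(path)    # succeeds again after the release
        result = (first is not None, second is None, third is not None)
        unlock(third)
        return result
    finally:
        os.remove(path)
```

Without LOCK_NB each waiting request would simply block until the lock is released, which is one way many simultaneous users could pile up behind a single shared file. On Linux the flock(2) manual page notes that flock over NFS is emulated with fcntl byte-range locks, so behaviour on an NFS mount may differ from this local demo.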
Know your filesystem
Since we were serving the webpages from an NFS mount, it seemed that NFS itself might have been slowing responses. Searching the Internet for NFS and file locking (since we suspected file locking) turned up many reports of file locking having issues over NFS, but also reports that it should work. We made sure that all NFS traffic was allowed through the firewall, that the NFS server could be resolved by hostname and not just by IP, and that sync was set in the mount options just to be sure.
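The mount-option check can be scripted. The sketch below is an illustration, not what we ran: it assumes the Linux /proc/mounts layout (device, mount point, filesystem type, comma-separated options), and the function name and the sample hostnames and paths are hypothetical. It lists NFS mount points whose options do not include sync.

```python
def nfs_mounts_missing_sync(mounts_text):
    """Return mount points of NFS filesystems whose options lack 'sync'.

    mounts_text is expected in /proc/mounts format:
    device mount_point fstype options dump pass
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue  # only NFS mounts are of interest here
        if "sync" not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical sample input; real systems will show different entries.
SAMPLE = """\
fileserver.example.org:/export/www /var/www nfs4 rw,sync,relatime,vers=4.2 0 0
fileserver.example.org:/export/home /home nfs4 rw,relatime,vers=4.2 0 0
/dev/sda1 / ext4 rw,relatime 0 0
"""
```

On a live system the same function can be fed the contents of /proc/mounts instead of SAMPLE. The options are split on commas before the comparison so that a different option such as async is not mistaken for sync.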
Know your testing
Now that we had made some changes, we decided to test the site. A useful tool for this type of work was Selenium, which automates website users. We could then spawn a few “users” and make sure the system responded correctly. We hit a few issues because the site creates content dynamically (search the Internet for Selenium and AJAX). Using Selenium Builder in Firefox also helped find which calls to make in the Python script.
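A rough sketch of this kind of test is shown below. It is not our actual script: the function names are invented for illustration, and the URL and element id passed to selenium_session would be whatever your own site uses. It assumes the Selenium Python bindings and Firefox with its driver are installed. The explicit wait is one common way to cope with AJAX content, since it pauses until a dynamically created element appears rather than assuming the page is complete once it has loaded.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def selenium_session(url, element_id, timeout=30):
    """Load url in Firefox and wait for a dynamically created element."""
    # Imported here so the harness below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Raises TimeoutException if the element never appears.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, element_id))
        )
    finally:
        driver.quit()


def timed(session):
    """Run one user session; return (succeeded, elapsed_seconds)."""
    start = time.monotonic()
    try:
        session()
        ok = True
    except Exception:
        ok = False  # any failure, including a timeout, counts as an error
    return ok, time.monotonic() - start


def run_users(session, n_users):
    """Run n_users sessions at the same time, one thread per 'user'."""
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        futures = [pool.submit(timed, session) for _ in range(n_users)]
        return [f.result() for f in futures]
```

To try it, wrap selenium_session with functools.partial, giving it a page address and the id of an element the page creates dynamically, and pass the result to run_users with the number of "users" to spawn. Each result is a success flag and the time taken, which is enough to spot slow page loads and timeouts as the user count grows.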
Know your future
The lessons learnt during this time were very instructive and could be applied in many future scenarios. Apache logs are useful, but make sure you know what your web application is doing; in this case it was having issues locking files shared between users of the site. Being able to load-test a website is also useful, and doing it in Python made the tests easy to write and adapt when needed. Specific future posts about the topics encountered here may be worthwhile.