Archive for May, 2010

5/25/2010: 6:52 pm: Django, DreamHost, Google App Engine

DreamHost ended up at the top of a Lifehacker poll today for best personal web host. Well, not counting “Other”. Given how many web hosts there are and the likelihood that most of the people voting have experience with only one web host, it’s not surprising that Other came out on top. Still, DreamHost got almost 26% of the vote, more than double their closest competitor.

I’ve hosted this site on DreamHost since 2002 and have been very happy with their performance, attitude and sense of humor. I’ve added two more domains since then.

A few months ago I developed a simple Django app for calculating magic numbers for soccer leagues and deployed it on shared hosting. The app scrapes league standings off websites and displays a matrix of teams and their magic numbers. DreamHost did a fantastic job of making it really easy to deploy a Django app on shared hosting with Phusion Passenger. I only had to make a few config changes from running it locally on the Django development server.
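
As a rough illustration of the matrix idea, here is a minimal sketch. It is not the app's code, and the definition of the magic number is an assumption: the combined points a team must gain (or its rival must drop) before the rival can no longer pass it, with 3 points for a win and tiebreakers ignored. The function names and the shape of the standings data are hypothetical.

```python
POINTS_PER_WIN = 3  # assumption: standard 3-1-0 league scoring


def magic_number(team_points, rival_points, rival_games_left):
    """Points `team` still needs so that `rival` cannot finish above it.

    Assumed definition: rival's maximum possible total, minus the team's
    current points, plus one. Clamped at zero once the rival is out of reach.
    """
    rival_max = rival_points + POINTS_PER_WIN * rival_games_left
    return max(0, rival_max - team_points + 1)


def magic_matrix(standings):
    """Build {team: {rival: magic number}} from {team: (points, games_left)}."""
    matrix = {}
    for team, (points, _games_left) in standings.items():
        matrix[team] = {
            rival: magic_number(points, rival_points, rival_left)
            for rival, (rival_points, rival_left) in standings.items()
            if rival != team
        }
    return matrix
```

With hypothetical standings `{"A": (60, 4), "B": (50, 4), "C": (40, 4)}`, team A needs 3 more points to be safe from B and is already safe from C.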

Recently I moved onto a Private Server and ported my app to run on mod_wsgi. Although mod_wsgi is the preferred way (I think) of deploying Django apps, it required a lot more debugging and config work on my part. I’m working on a blog post to cover everything I had to do to get it working. Part of the problem came from my having previously had it running on shared hosting. Someone starting with a private server would have had fewer issues.
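
One way to separate mod_wsgi problems from Django problems is to start with a bare WSGI script that has no Django in it at all. The sketch below is only that: a minimal callable of the kind mod_wsgi loads from the file named in a WSGIScriptAlias directive. It is not the configuration I ended up with, and the response text is made up.

```python
def application(environ, start_response):
    """Minimal WSGI callable.

    mod_wsgi looks for a module-level callable named `application`
    in the script file, so the name matters.
    """
    body = b"mod_wsgi is serving this script\n"
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/plain"),
            ("Content-Length", str(len(body))),
        ],
    )
    # WSGI expects an iterable of byte strings.
    return [body]
```

If a script like this responds correctly, whatever still fails is in the Django settings, paths or permissions rather than in Apache and mod_wsgi.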

I’m also planning to port the app to run on Google App Engine. Since there is no database backend, it should be very straightforward.

This is all leading up to a much more complicated web app that I plan to run on my DreamHost PS, as well as GAE. The data it will be storing is a natural fit for a relational database, so it will be interesting getting the app working on the App Engine datastore. I’m also going to try MongoDB as the datastore.

5/18/2010: 9:16 pm: Everything Else

During a fun debug session at work, Drew and I tracked down the cause of a misbehaving web app. The problem related to how Apache Tomcat decides to recompile JSPs and how some of our source code was branched and modified. Beware of modified JSPs in branches when you move to newer branches.

The problem appeared after a new version was deployed in production. The symptom was that a chunk of HTML on a page in a web app displayed on the QA servers, but not on the production servers. We confirmed it was the same warfile, the same 6.0.16 version of Tomcat and the same 1.5.14 version of the JDK. There were no errors in the app log or in catalina.out. Having recently seen errors from other apps in the localhost log file, I decided to look there. We found a message logged at SEVERE regarding a JspException from not finding a value on an object using operator “.”. This indicated that a JSP couldn’t be compiled because it referred to a nonexistent field on a Java class.

Next, we hunted down the Java class that the Jasper compiler generated from the JSP and compared it between production and QA. We found that the Java code on production had an extra method that related to this missing property. So, even though we deployed the same warfile, the Java code for the JSP on the file system was different. By default, the code generated from JSPs ends up in tomcat/work/Catalina/localhost/{context}/org/apache/jsp/WEB_002dINF/jsp. Reverse engineering it to correlate the Java with the JSP code wasn’t as bad as I expected.

Then we looked at the dependencies of the previous version and found that the class in question contained the field mentioned in the error message. Drew had switched to a newer branch for the new build, so the new file wasn’t strictly newer. And there’s the rub. The timestamp of the class file in the older branch was newer than the timestamp of the class file in the newer branch, because it had been modified after it was branched.

When the app was deployed on the QA servers, the previous version had been undeployed first. This caused the context directory for this app in the work directory to be deleted. When the new version was deployed, Jasper generated Java from all the JSPs as they were accessed. On production, however, the new version was deployed as a replacement for the previous version, so the context directory wasn’t deleted. When the JSP was accessed, Jasper saw that the timestamp for the previous version was newer, so it didn’t generate new Java code for the JSP file. But the Java class the JSP page depended on had changed, and the old generated code no longer worked.

The quick fix was to undeploy the web app to let Tomcat clean up the work directory, and then to deploy it again. However, it turned out that the change in the older branch was important. So, be sure to also diff the JSPs before moving to a new branch.

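
The trap is easier to see with a simplified model of a timestamp-only staleness check. This is a sketch of the idea, not Jasper’s actual code; the file names and timestamps are made up, and the only rule it assumes is that generated code is rebuilt when it is missing or older than its JSP.

```python
import os
import tempfile


def needs_recompile(jsp_path, generated_path):
    """Simplified model: rebuild only if the generated file is missing
    or older than the JSP. Content changes are never inspected."""
    if not os.path.exists(generated_path):
        return True
    return os.path.getmtime(generated_path) < os.path.getmtime(jsp_path)


def demo():
    with tempfile.TemporaryDirectory() as work:
        jsp = os.path.join(work, "page.jsp")
        generated = os.path.join(work, "page_jsp.java")
        for path in (jsp, generated):
            with open(path, "w") as handle:
                handle.write("placeholder\n")

        # Generated code left behind by the previous deployment.
        os.utime(generated, (1_000_003_600, 1_000_003_600))
        # JSP shipped in the new build, but with an older timestamp.
        os.utime(jsp, (1_000_000_000, 1_000_000_000))
        stale_code_kept = not needs_recompile(jsp, generated)

        # Undeploying cleans the work directory, which forces a rebuild.
        os.remove(generated)
        rebuilt_after_cleanup = needs_recompile(jsp, generated)
        return stale_code_kept, rebuilt_after_cleanup
```

In this model the replacement deploy keeps the stale generated code, while the undeploy and redeploy path regenerates it, which matches what we saw on production and QA.
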
5/2/2010: 10:15 pm: Conference, MySQL

Monitoring MySQL with Cacti

  • Cacti, like many other tools (Munin, Cricket, etc.), is a wrapper around RRDTool
  • Baron feels that existing tools (Zabbix, Zenoss, OpenNMS, etc.) that try to do both graphing and alerting do at least one of them poorly, though Reconnoiter looks promising.
  • Cacti does graphing well
  • RRDTool does interpolations, so you lose the original data. By default, it doesn’t keep data very long.
  • With MRTG you have to create new graphs. Cacti has a simpler approach using templates.
  • Cacti has data source, graph and host templates.
  • Problem – templates not always well written. Not always version compatible. Not always well parameterized.
  • Better Cacti Templates project has more than just MySQL templates.
  • Cacti prefers to use SNMP, but it is often easier to connect directly to mysqld or to use ssh and the command-line client
  • The wiki for Better Cacti Templates has very detailed installation instructions.
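
To make the point about losing the original data concrete, here is a small sketch of averaging samples into fixed time buckets. It is a simplified model, not RRDTool’s algorithm (which also resamples to step boundaries and supports other consolidation functions); the function name and sample values are made up.

```python
def consolidate(samples, step):
    """Average (timestamp, value) samples into buckets of `step` seconds.

    Returns a sorted list of (bucket_start, average). The individual
    samples cannot be recovered from the result.
    """
    buckets = {}
    for timestamp, value in samples:
        bucket_start = timestamp - timestamp % step
        buckets.setdefault(bucket_start, []).append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]
```

For example, one-minute samples of 10, 10, 100, 10, 10 in the same five-minute bucket consolidate to a single value of 28.0, so the spike to 100 is no longer visible.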

High Throughput MySQL at Facebook

  • Currently upgrading to 5.1.45 and InnoDB plugin 1.6
  • Perf testing starts with Sysbench. Have a tool called Shadow that allows them to direct all traffic to another server that is heavily instrumented.
  • Across shards – 7 ms average read time; 13 million reads/sec peak; rows read/sec 370 million peak, 210 million avg; rows modified 3.5 million peak, 1.9 million avg; InnoDB disk IO/sec 4.4 million peak, 3.3 million avg
  • Row-based semi-sync replication
  • InnoDB plugin for perf and compression
  • mk-query-digest, mk-slave-prefetch, mk-upgrade
  • They don’t use stored procedures, but they might be helpful to solve some of their network latency issues
  • Get about 1500 IOPS from locally attached storage. 8 disks per server.
  • For 5.1 upgrade, planning to dump and reload and switch to InnoDB file per table (and moving to new hosts)
  • Paging via LIMIT is O(N*N)
  • Do pessimistic concurrency control with select for update and retries
  • Internal requirement for performance critical queries to be index only
  • They improved high-concurrency throughput by disabling deadlock detection and depending on the lock wait timeout. They reduced the wait timeout to 15 sec, but it is dynamic per session, and they reduce it to as low as 1 sec when a server is extremely busy.
  • They make temporary changes to some MySQL settings to survive extreme peaks
  • They use gdb (see posts by Domas) to change variables that normally aren’t dynamic
  • LRU patch protects buffer pool from full table scans (e.g., from mysqldump).
  • Need to also monitor on client side, since server doesn’t see network issues, client timeouts, etc.
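
The note about LIMIT paging can be illustrated with a short sketch. It uses SQLite from the Python standard library only so that it runs anywhere; the table, column names and the keyset alternative are my own illustration, not something from the talk. The cost model assumes the server walks past every skipped row for each page.

```python
import sqlite3


def make_db(n_rows):
    """In-memory table with ids 1..n_rows (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO items (id, name) VALUES (?, ?)",
        [(i, f"item-{i}") for i in range(1, n_rows + 1)],
    )
    return conn


def page_with_offset(conn, page_size):
    """Page with LIMIT/OFFSET: every page skips all rows before it."""
    rows, offset = [], 0
    while True:
        page = conn.execute(
            "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not page:
            return rows
        rows.extend(page)
        offset += page_size


def page_with_keyset(conn, page_size):
    """Page by remembering the last id seen (assumes positive ids)."""
    rows, last_id = [], 0
    while True:
        page = conn.execute(
            "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return rows
        rows.extend(page)
        last_id = page[-1][0]


def rows_walked_by_offset_paging(n_rows, page_size):
    """Total rows walked if each page costs offset + page_size rows."""
    return sum(
        min(offset + page_size, n_rows)
        for offset in range(0, n_rows, page_size)
    )
```

Under that cost model, paging through 100 rows in pages of 10 walks 550 rows, and paging through 1,000 rows walks 50,500: ten times the data costs roughly a hundred times the work. Both paging functions return the same rows; the keyset version just starts each page from an index lookup instead of a skip.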
