Cacti, like many other tools (Munin, Cricket, etc.), is a wrapper around RRDTool.
Baron feels that existing tools that try to do both graphing and alerting (Zabbix, Zenoss, OpenNMS, etc.) do at least one of the two poorly, though Reconnoiter looks promising.
Cacti does graphing well.
RRDTool interpolates and consolidates samples, so you lose the original data. By default, it doesn't keep full-resolution data very long.
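To see why the original samples are lost, here is a minimal Python sketch of averaging consolidation, the kind of thing an RRD archive does as it keeps fewer points. It is a simplification: real RRDTool also normalizes samples onto step boundaries and supports MIN, MAX and LAST consolidation functions. The function name and bucket size are illustrative.

```python
def consolidate_average(samples, per_bucket):
    """Average every `per_bucket` consecutive samples into one stored value,
    roughly like an AVERAGE archive. Trailing samples that don't fill a
    whole bucket are dropped here for simplicity."""
    stored = []
    for start in range(0, len(samples) - per_bucket + 1, per_bucket):
        bucket = samples[start:start + per_bucket]
        stored.append(sum(bucket) / per_bucket)
    return stored


# One-minute samples containing a short spike to 70.
raw = [10, 10, 10, 70, 10, 10]
# Keep only one value per three minutes.
stored = consolidate_average(raw, 3)
print(stored)  # [10.0, 30.0]
```

The spike to 70 is stored as 30, and the individual one-minute values can no longer be recovered from the archive.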
With MRTG you have to define each new graph yourself; Cacti takes a simpler approach using templates.
Cacti has data source, graph, and host templates.
Problem: templates are not always well written, not always version-compatible, and not always well parameterized.
Better Cacti Templates project has more than just MySQL templates.
Cacti prefers to use SNMP, but it's often easier to connect directly to mysqld or to use ssh and the command-line client.
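To illustrate the direct-connection approach, the Python sketch below converts the tab-separated output of `mysql -B -N -e "SHOW GLOBAL STATUS"` into the single line of space-separated `name:value` pairs that Cacti's script-based data input methods read. The function name, sample text and chosen counters are illustrative; this is not the Better Cacti Templates code.

```python
def status_to_cacti(status_text, wanted):
    """Turn `mysql -B -N -e "SHOW GLOBAL STATUS"` output (one
    'Variable_name<TAB>Value' pair per line) into one line of
    space-separated 'name:value' pairs for a Cacti data input script."""
    values = {}
    for line in status_text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        name, _, value = line.partition("\t")
        values[name] = value.strip()
    # Counters absent from the output are simply skipped in this sketch.
    return " ".join(f"{name}:{values[name]}" for name in wanted if name in values)


# Sample output; a real poller would capture this from the mysql client, e.g. over ssh.
sample = "Questions\t1200\nThreads_connected\t15\nUptime\t3600\n"
line = status_to_cacti(sample, ["Questions", "Threads_connected"])
print(line)  # Questions:1200 Threads_connected:15
```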
The wiki for Better Cacti Templates has very detailed installation instructions.
High Throughput MySQL at Facebook
Currently upgrading to MySQL 5.1.45 and InnoDB plugin 1.6.
Performance testing starts with Sysbench. They have a tool called Shadow that lets them direct all traffic to another, heavily instrumented server.
Across shards: 7 ms average read time; 13 million reads/sec peak; 370 million rows read/sec peak, 210 million average; 3.5 million rows modified peak, 1.9 million average; 4.4 million InnoDB disk I/O operations/sec peak, 3.3 million average.
Row-based semi-sync replication
InnoDB plugin for perf and compression
mk-query-digest, mk-slave-prefetch, mk-upgrade
They don't use stored procedures, but those might help with some of their network latency issues.
They get about 1,500 IOPS from locally attached storage, with 8 disks per server.
For the 5.1 upgrade, they plan to dump and reload, switch to InnoDB file-per-table, and move to new hosts.
Paging via LIMIT offset, count is O(N²) overall, since each page has to read and discard all the rows before its offset.
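A simple counting model makes the quadratic cost concrete. It assumes the server reads and throws away every skipped row, which is how MySQL handles LIMIT offset, count. For comparison it also counts keyset ("seek") paging (WHERE id > last_seen_id ORDER BY id LIMIT count), a common alternative that was not part of the talk. The numbers are row counts from the model, not measurements.

```python
def offset_paging_rows_read(total_rows, page_size):
    """Rows read to page through a result with LIMIT offset, count:
    each page steps over `offset` rows, then returns up to `page_size`."""
    rows_read = 0
    for offset in range(0, total_rows, page_size):
        rows_read += min(offset + page_size, total_rows)
    return rows_read


def keyset_paging_rows_read(total_rows, page_size):
    """Rows read with keyset paging, where an index lets each page start
    right after the last key seen, so no rows are skipped."""
    rows_read = 0
    for offset in range(0, total_rows, page_size):
        rows_read += min(page_size, total_rows - offset)
    return rows_read


for n in (1_000, 10_000, 100_000):
    print(n, offset_paging_rows_read(n, 100), keyset_paging_rows_read(n, 100))
```

With 100 rows per page, growing the result from 1,000 to 10,000 rows grows offset-paging reads from 5,500 to 505,000 rows (roughly 100x), while keyset paging reads exactly N rows.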
They use pessimistic concurrency control: SELECT ... FOR UPDATE, with retries.
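A retry loop around a SELECT ... FOR UPDATE transaction might look like the sketch below. The notes don't say how Facebook implements its retries. `DatabaseError` and `flaky_txn` are hypothetical stand-ins for a real driver exception and a real transaction; 1205 (lock wait timeout) and 1213 (deadlock) are MySQL's actual error numbers for the two failures usually worth retrying.

```python
import time

# MySQL error numbers: ER_LOCK_WAIT_TIMEOUT (1205) and ER_LOCK_DEADLOCK (1213).
RETRYABLE_ERRNOS = {1205, 1213}


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying a MySQL errno."""

    def __init__(self, errno, msg=""):
        super().__init__(msg)
        self.errno = errno


def run_with_retries(txn, max_attempts=3, backoff_s=0.05):
    """Run txn() (assumed to BEGIN, SELECT ... FOR UPDATE, write, COMMIT,
    and roll itself back on error) and retry on lock wait timeout or deadlock."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DatabaseError as err:
            if err.errno not in RETRYABLE_ERRNOS or attempt == max_attempts:
                raise
            time.sleep(backoff_s * attempt)  # simple linear backoff


calls = {"n": 0}


def flaky_txn():
    """Hypothetical transaction that hits a lock wait timeout once, then succeeds."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise DatabaseError(1205, "Lock wait timeout exceeded")
    return "committed"


result = run_with_retries(flaky_txn, backoff_s=0)
print(result)  # committed
```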
Internal requirement: performance-critical queries must be index-only (answerable from an index without reading the full rows).
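One way to enforce an index-only rule is to check EXPLAIN output for queries during review or in tests. This is a hypothetical helper, not Facebook's tooling. It assumes EXPLAIN rows arrive as dicts keyed by column name (as a dict cursor would return them) and treats a query as index-only when every row's Extra column includes "Using index".

```python
def is_index_only(explain_rows):
    """Return True if every table access in an EXPLAIN result is covered by
    an index, i.e. each row's Extra column lists 'Using index'."""
    if not explain_rows:
        return False
    for row in explain_rows:
        notes = [n.strip() for n in (row.get("Extra") or "").split(";")]
        # Exact match, so 'Using index condition' (a different optimization)
        # does not count; this also conservatively rejects 'Using index for group-by'.
        if "Using index" not in notes:
            return False
    return True


# Hypothetical EXPLAIN rows for illustration.
covered = [{"table": "friends", "type": "ref", "Extra": "Using where; Using index"}]
not_covered = [{"table": "friends", "type": "ref", "Extra": "Using where"}]
icp = [{"table": "friends", "type": "range", "Extra": "Using index condition"}]
print(is_index_only(covered), is_index_only(not_covered), is_index_only(icp))  # True False False
```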
They improved throughput at high concurrency by disabling deadlock detection and relying on the lock wait timeout instead. They reduced the timeout to 15 seconds, but it is dynamic per session, and they lower it to as little as 1 second when the server is extremely busy.
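The dynamic timeout can be pictured as a small policy function. This is a hypothetical sketch: it uses the `Threads_running` status counter as the load signal, and the thresholds and the 5-second middle tier are invented for illustration; only the 15-second value and the 1-second floor come from the talk. It relies on `innodb_lock_wait_timeout` being settable per session, as the notes describe.

```python
def lock_wait_timeout_for(threads_running,
                          busy_threshold=50, extreme_threshold=200,
                          normal_s=15, busy_s=5, extreme_s=1):
    """Pick a per-session innodb_lock_wait_timeout (seconds) from current load.
    Thresholds and the busy_s tier are hypothetical."""
    if threads_running >= extreme_threshold:
        return extreme_s
    if threads_running >= busy_threshold:
        return busy_s
    return normal_s


def timeout_statement(threads_running):
    """SQL a client could run at the start of its session."""
    return f"SET SESSION innodb_lock_wait_timeout = {lock_wait_timeout_for(threads_running)}"


for load in (10, 80, 500):
    print(load, timeout_statement(load))
```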
They temporarily change some MySQL settings to survive extreme peaks.
They use gdb (see posts by Domas) to change variables that normally aren’t dynamic
LRU patch protects buffer pool from full table scans (e.g., from mysqldump).
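The notes don't describe how the LRU patch works. As a rough illustration of the general idea, midpoint insertion with a minimum age before promotion (similar in spirit to the InnoDB plugin's `innodb_old_blocks_time` setting), here is a toy buffer pool. The class name, fixed capacities and tick-based ages are all made up; this is not the patch's code.

```python
from collections import OrderedDict


class MidpointLRU:
    """Toy buffer pool with a 'young' and an 'old' sublist. New pages enter
    the old sublist and are promoted only if touched again at least
    `min_age` ticks after loading, so a one-pass scan churns through the
    old sublist without evicting hot pages."""

    def __init__(self, young_capacity, old_capacity, min_age=1):
        self.young = OrderedDict()  # page -> None; last item is most recently used
        self.old = OrderedDict()    # page -> tick when it entered the old sublist
        self.young_capacity = young_capacity
        self.old_capacity = old_capacity
        self.min_age = min_age
        self.tick = 0

    def access(self, page):
        self.tick += 1
        if page in self.young:
            self.young.move_to_end(page)
        elif page in self.old:
            # Re-access too soon (e.g. several rows on one page during a scan)
            # leaves the page where it is.
            if self.tick - self.old[page] >= self.min_age:
                del self.old[page]
                self._add_young(page)
        else:
            self._add_old(page)

    def _add_young(self, page):
        self.young[page] = None
        if len(self.young) > self.young_capacity:
            demoted, _ = self.young.popitem(last=False)  # LRU end of young
            self._add_old(demoted)

    def _add_old(self, page):
        self.old[page] = self.tick
        if len(self.old) > self.old_capacity:
            self.old.popitem(last=False)  # evict the oldest old-sublist page


pool = MidpointLRU(young_capacity=2, old_capacity=2, min_age=2)
for page in ["hot1", "hot2", "hot1", "hot2"]:  # hot pages, re-read later
    pool.access(page)
for i in range(6):  # full scan: each page touched twice in a row
    pool.access(f"s{i}")
    pool.access(f"s{i}")
print(list(pool.young), list(pool.old))  # ['hot1', 'hot2'] ['s4', 's5']
```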
You also need to monitor on the client side, since the server doesn't see network issues, client timeouts, etc.
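Client-side monitoring can start as a thin wrapper that times every query from the application's point of view and counts timeouts. `ClientQueryStats` is a hypothetical name, the slow-query threshold is arbitrary, and catching Python's built-in `TimeoutError` stands in for whatever timeout exception a real driver raises.

```python
import time


class ClientQueryStats:
    """Counters for what the application actually experienced, including
    time the server never sees (connecting, network, client-side waits)."""

    def __init__(self, slow_threshold_s=0.1):
        self.slow_threshold_s = slow_threshold_s
        self.calls = 0
        self.slow = 0
        self.timeouts = 0
        self.latencies = []

    def run(self, query_fn):
        self.calls += 1
        start = time.monotonic()
        try:
            return query_fn()
        except TimeoutError:
            self.timeouts += 1
            raise
        finally:
            # Record latency for successes and failures alike.
            elapsed = time.monotonic() - start
            self.latencies.append(elapsed)
            if elapsed >= self.slow_threshold_s:
                self.slow += 1


def timed_out():
    """Hypothetical query that fails with a client-side timeout."""
    raise TimeoutError("read timeout")


stats = ClientQueryStats()
stats.run(lambda: "row")
try:
    stats.run(timed_out)
except TimeoutError:
    pass
print(stats.calls, stats.timeouts)  # 2 1
```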