MySQL


6/3/2011: 5:00 pm: Java, MySQL

When you use the MySQL JDBC driver to select rows from a table, the connection will block until the entire ResultSet has been pulled over to the client. In most cases this makes sense, especially if the server is on a different host. Retrieving the entire ResultSet will minimize the number of TCP packets that must be sent from the server.

However, if you are returning a very large ResultSet, the client will have to allocate a lot of memory on the heap. If you end up accessing each row to create an object from the data, then you will need enough heap space for the entire ResultSet plus all of the objects you instantiate.

The driver documentation explains how to force the driver to stream the ResultSet row-by-row.

The first catch is that you must be using a regular Statement object, not a PreparedStatement.

The documentation says you need to add the following non-intuitive code before executing the query:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

though you can actually just use conn.createStatement() since TYPE_FORWARD_ONLY and CONCUR_READ_ONLY are the defaults.

There are a couple caveats in the documentation, though they are fairly obvious. You should process the ResultSet as quickly as possible, since locks will be held as long as the statement (and any transaction it is in) is open.

In addition to being non-intuitive, setting the fetch size to Integer.MIN_VALUE might cause unexpected results if you run your code against a database server other than MySQL.

If you’re willing to go all out in committing to MySQL, you can cast the return value of createStatement() to com.mysql.jdbc.Statement.StatementImpl and then call enableStreamingResults(). That will, at least, make the behavior of your code more obvious.

At work I needed to cache a lot of data from a couple of tables. Using the default behavior caused the heap to grow to over 12.5 GB. That made for trouble when running on my 8 GB laptop. By switching to streaming the ResultSet, the heap maxed out at only 5 GB.

12/29/2010: 10:45 pm: MySQL

If a replication slave gets out of sync with the master, you can bring them back in sync by running statements that don’t execute on every server in the replication chain. There are sane and insane ways to do this.

The right way is to execute SET SESSION sql_log_bin=0; on your current connection before running the statements you don’t want replicated. Then, either execute SET SESSION sql_log_bin=1; or close the connection.

The crazy way is to execute the statements while the default database is set to a database that is not replicated. Many DBAs configure MySQL servers so that the mysql database is not replicated, since it may contain user and host info that is specific to a server instance. When the default database is set to a database that is not replicated, mysqld will not replicate statements affecting any database. Feature or bug, you be the judge.

In my example scenario below, db1 is the master and db2 is the slave.

First, create a table in the test database and verify it is replicated. Here’s an example create statement.

mysql> CREATE TABLE test.rs (a INT);

Then, use the MySQL CLI to connect to the master database server (in this case, db1), set the default database to mysql (-D mysql) and execute an INSERT statement. The short version of this is:

[me@server ~]$ mysql -u me -D mysql -h db1 -e "INSERT INTO test.rs VALUES (1);"

Then, verify that the row was added on the master, but not on the slave.

[me@server ~]$ mysql -u me -h db1 -e "SELECT * FROM test.rs;"
+------+
| a    |
+------+
|    1 |
+------+
[me@server ~]$ mysql -u me -h db2 -e "SELECT * FROM test.rs;"
[me@server ~]$

And if you haven’t already guessed, the insane way is frequently the accidental source of many out-of-sync situations. Use the sane way to fix the damage.

Of course, the easy way to sync up tables on out of sync servers in a replication chain is to use mk-table-sync.

10/4/2010: 2:30 pm: MySQL

Like a lot of software tools, MySQL error messages are sometimes not very precise, and sometimes not even that accurate. Last night during a database schema update, we ran into an issue creating a table with a foreign key constraint. Our tests had run successfully on several other MySQL 5.0 and 5.1 systems, so I wasn’t sure why it failed on this MySQL 5.0 instance. The actual error message was something like:

ERROR 1005 (HY000): Can't create table 'test.b' (errno: 150)
Looking up the error code confirmed that it was an issue with the foreign key constraint.
$ perror 150
MySQL error code 150: Foreign key constraint is incorrectly formed
This suggested to me that there was something wrong with the syntax. But, then why did it work elsewhere?

Oftentimes, the problem is actually with the columns that are referenced in the constraint. A careful comparison of the two tables revealed no differences in the column definitions. However, the column definition doesn’t tell the whole story.

It would be painful if you had to declare the character set each time for every character-based column. So, MySQL lets you define the default character set for the table, and each character column will inherit that specification. The table inherits from the database setting and the database inherits the systemwide setting. So, visually inspecting the CREATE TABLE statement may not be sufficient.

In our case, the first table had been created when we had been using a systemwide default character set of latin1. We had since changed to using utf8. Since the columns that were being referenced in the constraint were varchars, the character set needed to be the same. This would not have been an issue, of course, if the columns had been integers or some other non character-based column.

The statements below can be used to recreate the error message. It’s more obvious in this example, because the default character set is explicitly stated. Default values are great, until they silently sneak up on you and gnaw your hand off.

CREATE TABLE a (
	id VARCHAR(10) NOT NULL,
	PRIMARY KEY (id)
	) ENGINE=InnoDB DEFAULT CHARSET=latin1;
 
CREATE TABLE b (
	id VARCHAR(10) NOT NULL,
	FOREIGN KEY (id) REFERENCES a(id),
	PRIMARY KEY (id)
	) ENGINE=InnoDB DEFAULT CHARSET=utf8;
5/2/2010: 10:15 pm: Conference, MySQL

Monitoring MySQL with Cacti

  • Cacti, like many other tools (Munin, Cricket, etc.), is a wrapper around RRDTool
  • Baron feels that existing tools (zabbix, ZenOSS, OpenNMS, etc.) that try to do graphing and alerting do at least one of them poorly, though Reconnoiter looks promising.
  • Cacti does graphing well
  • RRDTool does interpolations, so you lose the original data. By default, it doesn’t keep data very long.
  • With MRTG you have to create new graphs. Cacti has simpler approach using templates.
  • Data source, graph and host templates.
  • Problem – templates not always well written. Not always version compatible. Not always well parameterized.
  • Better Cacti Templates project has more than just MySQL templates.
  • Cacti prefers to use SNMP, but often easier to directly connect to mysqld or to use ssh and command line client
  • The wiki for Better Cacti Templates has very detailed installation instructions.

High Throughput MySQL at Facebook

  • Currently upgrading to 5.1.45 and innodb plugin 1.6
  • Perf testing starts with Sysbench. Have a tool called Shadow that allows them to direct all traffic to another server that is heavily instrumented.
  • Across shards – 7ms read time on avg, 13 million reads/sec peak, 370 million rows read/sec peak 210 m avg, 3.5 m rows modified peak 1.9 m avg, InnoDB disk IO/sec 4.4 m peak 3.3m avg
  • Row-based semi-sync replication
  • InnoDB plugin for perf and compression
  • mk-query-digest, mk-slave-prefetch, mk-upgrade
  • They don’t use stored procedures, but they might be helpful to solve some of their network latency issues
  • Get about 1500 IOPS from locally attached storage. 8 disks per server.
  • For 5.1 upgrade, planning to dump and reload and switch to InnoDB file per table (and moving to new hosts)
  • Paging via LIMIT is O(N*N)
  • Do pessimistic concurrency control with select for update and retries
  • Internal requirement for performance critical queries to be index only
  • They improved high concurrency throughput by disabling deadlock detection and depending on lock wait timeout. Reduced wait timeout to 15 sec, but dynamic per session. Dynamically reduce as low as 1 when server extremely busy.
  • They make temporary changes to some MySQL settings to survive extreme peak
  • They use gdb (see posts by Domas) to change variables that normally aren’t dynamic
  • LRU patch protects buffer pool from full table scans (e.g., from mysqldump).
  • Need to also monitor on client side, since server doesn’t see network issues, client timeouts, etc.
4/25/2010: 9:05 pm: Conference, MySQL

Replication Tips – slides

  • Single master can typically handle a few dozen slaves
  • If you have a lot of slaves and want to do filtering for all, maybe use blackhole engine to do filtering in one place. Especially if running blackhole slave on same server to limit network bandwidth.
  • Replicate over WAN – slave_net_timeout, master_connect_retry, slave_compressed_protocol

Gearman – slides

  • UDFs allow you to run functions in a wide range of locations on servers other than the database server
  • Synch and asynch jobs
  • gman_sum() provides simple example of map-reduce with gearman
  • - Asynch queues, a.k.a., background tasks
  • e.g., email notifications (avoid blocking on sendmail), full text indexing, stat counters, etc.
  • Can enable you to do batch operations, when you don’t want to block on a large batch
  • Maybe use trigger to call a UDF that does a background task. Allows you to implement much more complex triggers with external side effects (e.g., sending notifications).
  • Putting Gearman workers between app and MySQL can allow you to parallelize queries (scatter-gather).
  • Persistent queues – if job server crashes, it can replay the queue on startup or you start another job server and point it at the queue
  • Queues can be stored in Drizzle, MySQL, PostgresQL, SQLite and even in Memcached.

Mastering the Art of Indexing – slides

  • Lock contention and indexing
    • update/delete by full scan is sequential rw. By index scan is random. So, full scan can be better in some cases.
    • In 5.1 can disable next key locking. use binlog-format=row and transaction-isolation=read-committed. But read committed can perform worse with high concurrency.
  • Deadlock caused by indexing
    • Sometimes single query update or delete has to update secondary index as well as PK table in InnoDB.
    • If 1 statement accesses and modifies sec index before accessing table records while another statement modifies same table record and needs to update same row in secondary index, you can get deadlock.
  • Covering index with range, limit, count
    • Range queries over large ranges often slow, especially if looking in secondary index before using range on table records. Can result in many random reads. Covering index can improve perf by eliminating random reads.
    • Covering index can also help a lot with LIMIT (especially when skipping) if the query would otherwise have to randomly access table records to apply other where clause conditions.
    • Similarly, covering index can help with COUNT, since you don’t have to randomly access table records.
  • Covering index with blob
    • Assuming you have a table with a blob that is rarely selected (e.g., < 10% of time). To avoid blob being loaded into buffer pool, typical optimization is to move blob into another table with 1:1 relationship, though this breaks normalization.
    • For his tests, performance with 2 tables is better until percentage of queries requesting blob (thus requiring two SELECTs) get up to around 45%.
    • Instead, can use covering index to include all desired columns (except the blob, of course) for most queries. To get good perf, initial column should be chronologically order, e.g., auto inc or a timestamp.
    • May have to use FORCE INDEX to get optimizer to use your index.
  • Sorting, indexing and query execution plans
    • If query does ORDER BY on a secondary index that is used for the query, can skip filesort, which is O(nlogn).
    • If not, optimizer chooses btw full scan and using index for sort vs. index scan and a filesort.
    • Index merge can be used to filter records, but not to sort.

Understanding the Role of IO as Bottleneck – slides

  • 7200 RPM drive can generally do about 100 IOPS.
  • HDD disk seek about 10 ms. Reading 1 MB sequentially from HDD is about 20 ms.
  • When you do a read on InnoDB, rows are loaded into buffer pool.
  • Writes to InnoDB go to buffer pool, but also are flushed to log file. Sequential IO to log file is fast. Buffer pool pages (16 KB/page) are marked as dirty until flushed to tablespace.
  • Writes to tablespace are sorted by innodb to be sequential as much as possible.
  • iostat is more useful than vmstat for IO analysis
  • If svctm large (maybe more than a few milliseconds), may have large competing IO requests, e.g., a busy shared SAN.
  • Queue size tells you how much data could be lost if server crashes. Wait time indicates how long ops may be blocking and also how much risk of lost data from power loss with no battery back up (or non-catastropic disk crash).
  • Indexes and archiving old data can reduce your working set. If your server is read-heavy, lots of RAM can really help.
  • RAID 5 good for reads, but worse for writes. RAID 10 good for both.
  • RAID controller with BBWC can reorder some random writes to sequential writes without risk of a power loss losing data.
  • Fusion-IO card can theoretically do up to 100,000 IOPS. Another presenter suggested that InnoDB is currently limited to around 15,000 on these cards.
  • Often cheaper to buy more memory, though that doesn’t help as much with writes.
  • Write options – mount files sytem noatime, compress blobs, reduce number of filesorts on disk.
  • If RAID with no BBWC, use innodb_flush_log_at_trx_commit of 2 (but be aware of possible data loss) or group statements into short transactions.
  • Referred to Facebook talk – they said maximum desirable transaction length for their workload and servers is about 4 seconds.

Mysteries of the Binary Log – slides

  • Safer to purge binary logs by using PURGE BINARY LOG {TO, BEFORE}, since it updates index file.
  • If you use expire_log_days, removal happens at server restart or log flush (which happens during a file rotate).
  • RESET MASTER will delete everything and create new binlog file. Disconnect any slaves before doing this.
  • Common header about 19 bytes. 4 byte timestamp, type, server id, length, file position, flags. File position is the end of the event.
  • SYSDATE() is dangerous, because it actually runs on the slave, so the generated timestamp will be different than on the master.
  • Context events (user defined vars, rand seed, auto_increment, last_insert_id()) are grouped with the query into a binlog event.
  • Since replication user can read binary logs, it can read all changes. Can be security issue.
  • For setting passwords, better to set passwords by setting them into a user variable as a hash and then setting hash directly as the password. Only the already hashed value in the SET statement is replicated.
  • Statements in transaction go into a cache. It is flushed to binary log on commit. It is emptied if a transaction with only transactional statements is rolled back. Complicated if you mix in MyISAM changes.
  • Transaction cache spills to a file if it gets too big. This is one of the reasons large transactions can cause bad performance.
  • DDL statements like CREATE and ALTER cause implicit commits. Changes to mysql databse and LOAD DATA INFILE also are implicitly committed.
4/19/2010: 9:01 am: Conference, MySQL

What’s New in MySQL 5.5? Performance Unleashed! - slides

  • Default storage engine is InnoDB (what had been called the InnoDB plugin, but now is the only version).
  • Many locks and mutexes held back scalability in 5.1.
  • Biggest win was fixing uses of LOCK_open. It was used for many things, some of which made no sense.
  • Introduced MetaData Locking (MDL). Added new mutex to protect MDL hash, but hash is calculated outside mutex.
  • Moved some things to 3 new mutexes, made some things atomic variables and removed useless uses of LOCK_open.
  • LOCK_thread_count used for too many things. Still used to protect list of connections, but changed other uses to atomic variables.
  • LOCK_open and the new MDL hash mutex are the biggest bottlenecks, but they are a problem at 32 cores now, instead of 8 cores.
  • InnoDB had many mutex issues for large number of cores. Buffer pool and log mutexes are worst. Adaptive hash index mutex is global and can be a problem, but can be disabled.
  • InnoDB Kernel mutex started to become a problem again, since other fixes allow you to reach higher number of concurrent transactions.
  • Log mutex held about 75% of time (for Sysbench RW) and buffer pool mutex about 50%.
  • Split the Log mutex into separate mutexes for independent functionality. Need to grab 2nd while holding 1st, but then can release 1st.
  • 1st was hottest, so separating acquisition of it from acquisition of buffer pool index was a big win.
  • Buffer pool, either split mutex functionality into multiple mutexes or split buffer pool into multiple pools.
  • InnoDB plugin with compression made having multiple mutexes much more complex, so decided to split buffer pool. Need 3 locks to get buffer pool page. Adding page hash mutex would have made it 4.
  • New config setting. Defaults to 1. Also split the flush list from buffer pool mutex.
  • Initial testing suggests 8-16 buffer pools gives best performance, but need lots more testing before making recommendations.
  • Rollback segment mutex problem with many connections. Can even perform IO while holding it. Split into 128 mutexes.
  • Purge was done as part of master thread. HIgh transaction rate mean lots of purging. Master thread can end up spending most of its time purging. Moved purging to separate thread.
  • Huge improvement in recovery (e. g., 7 hours to 14 minutes). Reduced algorithmic complexity of 2 impt activities.
  • Also huge improvement to delete performance.
  • Still looking into query cache issues.

MySQL Cluster: an Introduction – slides

  • NDBCluster Engine does not store to local disk. It pushes data to data nodes, which don’t need to be running MySQL.
  • Data nodes are ndbd processes and not necessarily bound to a single server. For testing, you can run multiple on the same server.
  • Data usually stored in 2 data nodes. Storing in 3 is experimental feature, requires more servers and performs worse. Storing in 1 obviously reduces availability, but gives great performance.
  • Storage is transactional.
  • You can have many mysqld instances all accessing the same data nodes. You can also use NDB API to access data nodes.
  • A hash is used to partition data across data nodes.
  • Always at least 1 replica of stored data. Node groups allows you to group replicas. Group selection is done according to order the server is listed in config file.
  • If a node goes down, its replica takes over as primary for the data. When node comes back, if not too far behind, it will sync the changes. If too far behind, it will sync a dump of all data.
  • Or you can bring up a replacement data node server with same IP address. Config is by IP.
  • If 2 data nodes in a group goes down, cluster must halt. Consistency over availability, CAP-wise.
  • SQL Node is a node running MySQL server.
  • Up to 48 data nodes. Very rarely more than 10 in practice, due to network and maintenance issues. Usually just a few.
  • Management node started first and it reads in the config. All data nodes need it in order to start. Also used for monitoring and arbitration.
  • If 2 data nodes can’t communicate, but 1 can still reach management node, that data node will take over. The one that can’t reach the mgmt node will shut down.
  • Mgmt node should be on different physical machine than the data nodes. You generally want two management nodes.
  • Only one will be the arbitrator. If primary fails, the data nodes will elect the 2nd mgmt node as the arbitrator.
  • There are also API nodes. A SQL node is a special case of an API node. API nodes use C++ library. Can also use Cluster/J in 7.1 for Java apps.
  • Only have to create tables on one MySQL server. The table creation will be auto detected by other servers.
  • User and grant info is independent, so that must be done on each server.
  • But, you can set up replication between MySQL servers. This allows you to more easily mix in non-cluster engine tables.
  • Need to handle retries in app. If data node dies during transaction, app must retry so transaction goes to other data node.
  • Table limits: 8052 bytes per row (can use blobs, but perf not good), 128 columns per table.
  • Indexed data limited to available RAM. Non-indexed data can be stored on disk.
  • Features: Online backup, global replication (can use MySQL to replicate clusters), NDB & MGM API, online additions of new data nodes.

EXPLAIN Demystified

  • Estimated query plan. Only for SELECT. Does not generate byte code. Unfortunately, you can’t ask MySQL for the plan it actually used for an executed query.
  • MySQL does everything as nested loop JOINs. Even a single table query is a single iteration loop. Always a left-deep tree, though it may reorder joins within the tree.
  • EXPLAIN output starts with deepest JOIN.
  • If no subquery or UNION, id for each row will be 1.
  • For select_type, ‘simple’ means only one SELECT keyword in entire query.
  • Derived means executed as a temp table, e.g., result of a subquery. If the table column is , the n is the subquery’s id column.
  • Union – rows spooled into temp table then read out with null in a row that says UNION RESULT
  • Be aware that EXPLAIN actually executes subqueries and unions, which can make it run very slowly. Also, can cause side effects if you update variables.
  • For derived, the higher id row is executed first. For UNION, the output tells you the execution order, e.g., .
  • IN and NOT IN queries are poorly optimized. They are converted into a dependent subquery.
  • UNCACHEABLE refers to the item cache. Not well documented. A few non-deterministic things like RAND() or a user variable can cause this.
  • Type is the JOIN type. Really more of an access type. Worst to best is ALL, index, range, ref, eq_ref, const, SYSTEM, null.
  • eq_ref means if there is a match, there will only be one row, perhaps due to a unique constraint on an index
  • Some non-range clauses get labeled as range because of details of the optimizer implementation.
  • ref tells you which columns or constants from preceding tables are used for lookups in the index named in the key column
  • rows is number of estimated rows that must be read. 5.1 tries to use LIMIT clause, but it is still sometimes inaccurate.
  • filtered: # of rows satisfying a condition. Added in 5.1. Not very useful.
  • extras: useful extra info (using index -> covering index; using where -> server post filters rows after using index; using temporary -> implicit temp table but not specified if memory or disk, using filesort -> means some kind of sort, but could be in memory)
  • filesort is an algorithm name. It doesn’t mean a disk-based file was used.
  • Use “pager mk-visual-explain” to pipe output through mk-visual-explain, assuming you installed Maatkit.

Connecting MySQL and Python – slides

  • Standard driver is MySQLdb. Another option is oursql.
  • If using MySQLdb library – db.connect(use_unicode=True, charset=’UTF8′)
  • The oursql driver has same API. Uses mysqlclient. Also implemented in C.
  • Both depend on MySQL client already being installed.
  • MySQL Connector/Python doesn’t depend on MySQL client libraries. Written by Geert.
  • Connector/J can be used with Jython. from java.sql import DriverManager
  • Current version is 0.1.3, but 0.1.4 coming soon.
  • Performance for INSERTs is about twice as slow as for MySQLdb or oursql.
  • Need more benchmark tests, since not clear whether that was including the network roundtrip time, since that should be roughly the same.
  • After the talk I talked to Chromakode about drivers and he recommended oursql. Much better code and documentation. I’ve switched to oursql and I like it.

Better Database debugging for shorter downtimes – slides

  • Tracking changes over time
    • Slowly Changing Dimension type-2. Start with a baseline and only track changes.
    • Need PK of original object, maybe from information_schema. Select attributes to track.
    • Add new row for each new property and when properties change. Use current_flag column to indicate which version is current. Also use effective_date and end_date, which can also tell you the current version.
  • Job logging via stored procedures
    • When they run a job (e.g., crontab entries), they use stored procedures to track when and how long it ran.
    • Really need to review slides and the stored procedures to get much value from this.

Linux Performance Tuning and Stabilization Tips – slides

  • To minimize mem usage, use compact datatypes. Remove unused indexes. Archive data when possible to minimize your working set.
  • Use direct IO -> innodb_flush_method=O_DIRECT.
  • Don’t alloc too much memory to mysqld. Makes filesystem cache too small and causes swapping.
  • If you disable swap & run out of memory, OOM killer will likely kill mysqld. It takes a long time to kill it and it aborts the process, which triggers crash recovery.
  • MyISAM, backup tools & admin tools use the filesystem cache.
  • Copying large files can cause swap. This can happen during backup.
  • set vm.swappiness=0 in /etc/sysctl.conf. Default is 60. This makes OS prefer making filesystem cache small over swapping.
  • tcmalloc faster than malloc. Can config InnoDB to use tcmalloc. Mostly matters for CPU bound loads with large buffer pool.
  • Don’t use too much session memory This can happen if you make per session memory variables to big. read_buffer_size can cause large memory blocks to be allocated.
  • Use BBWC on raid cards. Then, disable write cache on disks.
  • Don’t set write barrier if using BBWC (in ext3, barrier = 0)
  • Appending+fsync much slower than updating+fsync
  • If you use sync-binlog=1, fsync is called on append. Very slow.
  • Set innodb_autoextend_increment to higher than default of 8, maybe use 20.
  • pdflush takes care of async writes for MyISAM, file copy, mysqldump, etc.
  • Facebook moved from ext3 to xfs. ext3 very slow to delete large files. Makes dropping large table slow.
  • Writing to file is serialized.
  • Use dir_index to speed up searching and disable write barrier.
  • With xfs, disable write barrier with “nobarrier”. Concurrent writes possible with O_DIRECT.
  • Fusion SSD cards are 2-3 times faster than Intel SSD cards and 10-20 times faster than HDD. Fusion cards really benefit from xfs.
  • I/O Scheduler default is cfq, but noop or deadline are often better. InnoDB is already doing I/O scheduling itself, so cfq results in useless overhead.
  • Big difference in write performance between cfq and noop when lots of random reads mixed with writes. Recommend noop.
  • I/O scheduler queue size. Default is 128. Setting large like 100000 can greatly improve perf for MyISAM, since it doesn’t sort writes. Doesn’t help InnoDB, since it already sorts them.
  • iostat -x
  • Check r/s, w/s, svctm, %util, though svctm more reliable than %util
  • mpstat gives per core stats, vmstat averages across cores
  • mpstat -P ALL 1
  • SystemTap can be used with MySQL binary builds with DWARF symbols
  • iotop provides per process io stats
4/18/2010: 10:15 pm: MySQL

This year’s MySQL Conference was once again jam packed with very technical presentations. Best of all was the news about all the performance and feature enhancements in MySQL 5.5. The benchmark results for the performance improvements were very impressive. After attending this year’s conference, I’m more confident about MySQL’s future than I’ve been for a very long time. I’ll post my notes from the three days I attended.

It was also a great time to meet up with the friends I’ve made in the MySQL community over the last few years. Some other open source communities have a really poisonous atmosphere of spending a lot of time attacking competing open source products. Fortunately, I see very little of that time-wasting crap within the MySQL community. Well, except for the occasional fork sniping, but that’s easy for me to ignore.

Update: Finally managed to post my notes from all 3 days.

12/29/2009: 5:17 pm: Mac, MySQL, Python

I just updated an old post on using Python with MySQL on OS X. The good news is that you don’t need to create an extra directory of symbolic links anymore. The not really bad news is that you have to be careful about 32-bit versus 64-bit.

Ideally, you’ll first upgrade to a 64-bit version of MySQL 5.1 or later, unless you already have. Then, follow Geert’s simple instructions for building the MySQL Python connector.

11/12/2009: 2:41 pm: MySQL

After upgrading to the newest version Maatkit, the invaluable toolkit for MySQL, I was no longer able to run any of the Maatkit tools. The problem turned out to be caused by my update to MacPorts after upgrading to Snow Leopard.

Note, if you haven’t installed Maatkit, yet, you may need to first install the Perl DBI module. Make sure the MySQL bin directory is in your path so that perl can find mysql_config.

$ sudo perl -MCPAN -e 'install Bundle::DBI'
$ cpan
cpan[1]> get DBD::mysql
cpan[2]> quit
$ cd ~/.cpan/build/DBD-mysql   {use tab completion to get full path}
$ perl Makefile.PL --testuser='root'   {supply valid values for --testuser and --testpassword}
$ make
$ make test
$ sudo make install

By default, Apple provides perl 5.10, which is built in 64-bit mode, though backwards compatible with 32-bit binaries. However, I was seeing a different version.

$ perl --version
 
This is perl, v5.8.9 built for darwin-2level

Trying to run mk-archiver resulted in the following error (again, wrapping some lines for readability):

$ mk-archiver --version
Can't load '/opt/local/lib/perl5/site_perl/5.8.8//darwin-2level/auto/Data/Dumper/Dumper.bundle'
for module Data::Dumper:
dlopen(/opt/local/lib/perl5/site_perl/5.8.8//darwin-2level/auto/Data/Dumper/Dumper.bundle, 1):
no suitable image found.  Did find:
/opt/local/lib/perl5/site_perl/5.8.8//darwin-2level/auto/Data/Dumper/Dumper.bundle:
mach-o, but wrong architecture at
/opt/local/lib/perl5/5.8.9/darwin-2level/XSLoader.pm line 73.
 at /opt/local/lib/perl5/site_perl/5.8.8//darwin-2level/Data/Dumper.pm line 27
Compilation failed in require at /opt/local/bin/mk-archiver line 874.
BEGIN failed--compilation aborted at /opt/local/bin/mk-archiver line 874.

The “wrong architecture” note in the error message led me to eventually find the cause of my troubles. MacPorts added the following to my ~/.bash_profile (I wrapped a line to make it easier to read:

# MacPorts Installer addition on 2009-10-31_at_23:33:12:
# adding an appropriate PATH variable for use with MacPorts.
export PATH=/opt/local/bin:/opt/local/sbin:$PATH
# Finished adapting your PATH environment variable for use with MacPorts.

This caused the 32-bit perl 5.8.9 from MacPorts to be used instead of the 64-bit perl 5.10 provided with Snow Leopard. Commenting out the export line above was the first step to fixing the problem.

I also noticed that I wasn’t adding /usr/bin to my path before I added the MacPorts paths. So, I made the following change to my .bash_profile (leaving out some of the path components for clarity).

PATH_PREFIX=/usr/local/bin:/usr/bin:$JDK_HOME/bin:/opt/local/bin:/opt/local/sbin
PATH_SUFFIX=$ANT_HOME/bin:$MYSQL_HOME/bin:$SCALA_HOME/bin:$GIT_HOME/bin
export PATH=$PATH_PREFIX:$PATH:$PATH_SUFFIX

Using source .bash_profile won’t update the PATH variable properly because of the way it is being constructed above. So, start a new console to get a corrected value for the PATH. Then you should get:

$ perl --version
 
This is perl, v5.10.0 built for darwin-thread-multi-2level

and something like

$ mk-archiver --version
mk-archiver  Ver 1.0.20 Distrib 5014 Changeset 5004
10/26/2009: 12:44 pm: Conference, MySQL

OpenSQL Camp 2009 in Portland, OR

I’ll be in Portland on November 13, 14 and 15 for OpenSQL Camp, an unconference covering open source databases. In addition to learning more about MySQL and Drizzle internals, I’m looking forward to learning a bit about Postgres and some of the NoSQL database alternatives. They aren’t just your father’s high performance key-value stores, anymore.

Next Page »


Fork me on GitHub