MySQL


5/2/2010: 10:15 pm: RobertMySQL

Monitoring MySQL with Cacti

  • Cacti, like many other tools (Munin, Cricket, etc.), is a wrapper around RRDTool
  • Baron feels that existing tools (zabbix, ZenOSS, OpenNMS, etc.) that try to do graphing and alerting do at least one of them poorly, though Reconnoiter looks promising.
  • Cacti does graphing well
  • RRDTool does interpolations, so you lose the original data. By default, it doesn’t keep data very long.
  • With MRTG you have to create new graphs. Cacti has simpler approach using templates.
  • Data source, graph and host templates.
  • Problem – templates not always well written. Not always version compatible. Not always well parameterized.
  • Better Cacti Templates project has more than just MySQL templates.
  • Cacti prefers to use SNMP, but often easier to directly connect to mysqld or to use ssh and command line client
  • The wiki for Better Cacti Templates has very detailed installation instructions.

High Throughput MySQL at Facebook

  • Currently upgrading to 5.1.45 and innodb plugin 1.6
  • Perf testing starts with Sysbench. Have a tool called Shadow that allows them to direct all traffic to another server that is heavily instrumented.
  • Across shards – 7ms read time on avg, 13 million reads/sec peak, 370 million rows read/sec peak 210 m avg, 3.5 m rows modified peak 1.9 m avg, InnoDB disk IO/sec 4.4 m peak 3.3m avg
  • Row-based semi-sync replication
  • InnoDB plugin for perf and compression
  • mk-query-digest, mk-slave-prefetch, mk-upgrade
  • They don’t use stored procedures, but they might be helpful to solve some of their network latency issues
  • Get about 1500 IOPS from locally attached storage. 8 disks per server.
  • For 5.1 upgrade, planning to dump and reload and switch to InnoDB file per table (and moving to new hosts)
  • Paging via LIMIT is O(N*N)
  • Do pessimistic concurrency control with select for update and retries
  • Internal requirement for performance critical queries to be index only
  • They improved high concurrency throughput by disabling deadlock detection and depending on lock wait timeout. Reduced wait timeout to 15 sec, but dynamic per session. Dynamically reduce as low as 1 when server extremely busy.
  • They make temporary changes to some MySQL settings to survive extreme peak
  • They use gdb (see posts by Domas) to change variables that normally aren’t dynamic
  • LRU patch protects buffer pool from full table scans (e.g., from mysqldump).
  • Need to also monitor on client side, since server doesn’t see network issues, client timeouts, etc.
4/25/2010: 9:05 pm: RobertMySQL

Replication Tips – slides

  • Single master can typically handle a few dozen slaves
  • If you have a lot of slaves and want to do filtering for all, maybe use blackhole engine to do filtering in one place. Especially if running blackhole slave on same server to limit network bandwidth.
  • Replicate over WAN – slave_net_timeout, master_connect_retry, slave_compressed_protocol

Gearman – slides

  • UDFs allow you to run functions in a wide range of locations on servers other than the database server
  • Synch and asynch jobs
  • gman_sum() provides simple example of map-reduce with gearman
  • - Asynch queues, a.k.a., background tasks
  • e.g., email notifications (avoid blocking on sendmail), full text indexing, stat counters, etc.
  • Can enable you to do batch operations, when you don’t want to block on a large batch
  • Maybe use trigger to call a UDF that does a background task. Allows you to implement much more complex triggers with external side effects (e.g., sending notifications).
  • Putting Gearman workers between app and MySQL can allow you to parallelize queries (scatter-gather).
  • Persistent queues – if job server crashes, it can replay the queue on startup or you start another job server and point it at the queue
  • Queues can be stored in Drizzle, MySQL, PostgresQL, SQLite and even in Memcached.

Mastering the Art of Indexing – slides

  • Lock contention and indexing
    • update/delete by full scan is sequential rw. By index scan is random. So, full scan can be better in some cases.
    • In 5.1 can disable next key locking. use binlog-format=row and transaction-isolation=read-committed. But read committed can perform worse with high concurrency.
  • Deadlock caused by indexing
    • Sometimes single query update or delete has to update secondary index as well as PK table in InnoDB.
    • If 1 statement accesses and modifies sec index before accessing table records while another statement modifies same table record and needs to update same row in secondary index, you can get deadlock.
  • Covering index with range, limit, count
    • Range queries over large ranges often slow, especially if looking in secondary index before using range on table records. Can result in many random reads. Covering index can improve perf by eliminating random reads.
    • Covering index can also help a lot with LIMIT (especially when skipping) if the query would otherwise have to randomly access table records to apply other where clause conditions.
    • Similarly, covering index can help with COUNT, since you don’t have to randomly access table records.
  • Covering index with blob
    • Assuming you have a table with a blob that is rarely selected (e.g., < 10% of time). To avoid blob being loaded into buffer pool, typical optimization is to move blob into another table with 1:1 relationship, though this breaks normalization.
    • For his tests, performance with 2 tables is better until percentage of queries requesting blob (thus requiring two SELECTs) get up to around 45%.
    • Instead, can use covering index to include all desired columns (except the blob, of course) for most queries. To get good perf, initial column should be chronologically order, e.g., auto inc or a timestamp.
    • May have to use FORCE INDEX to get optimizer to use your index.
  • Sorting, indexing and query execution plans
    • If query does ORDER BY on a secondary index that is used for the query, can skip filesort, which is O(nlogn).
    • If not, optimizer chooses btw full scan and using index for sort vs. index scan and a filesort.
    • Index merge can be used to filter records, but not to sort.

Understanding the Role of IO as Bottleneck – slides

  • 7200 RPM drive can generally do about 100 IOPS.
  • HDD disk seek about 10 ms. Reading 1 MB sequentially from HDD is about 20 ms.
  • When you do a read on InnoDB, rows are loaded into buffer pool.
  • Writes to InnoDB go to buffer pool, but also are flushed to log file. Sequential IO to log file is fast. Buffer pool pages (16 KB/page) are marked as dirty until flushed to tablespace.
  • Writes to tablespace are sorted by innodb to be sequential as much as possible.
  • iostat is more useful than vmstat for IO analysis
  • If svctm large (maybe more than a few milliseconds), may have large competing IO requests, e.g., a busy shared SAN.
  • Queue size tells you how much data could be lost if server crashes. Wait time indicates how long ops may be blocking and also how much risk of lost data from power loss with no battery back up (or non-catastropic disk crash).
  • Indexes and archiving old data can reduce your working set. If your server is read-heavy, lots of RAM can really help.
  • RAID 5 good for reads, but worse for writes. RAID 10 good for both.
  • RAID controller with BBWC can reorder some random writes to sequential writes without risk of a power loss losing data.
  • Fusion-IO card can theoretically do up to 100,000 IOPS. Another presenter suggested that InnoDB is currently limited to around 15,000 on these cards.
  • Often cheaper to buy more memory, though that doesn’t help as much with writes.
  • Write options – mount files sytem noatime, compress blobs, reduce number of filesorts on disk.
  • If RAID with no BBWC, use innodb_flush_log_at_trx_commit of 2 (but be aware of possible data loss) or group statements into short transactions.
  • Referred to Facebook talk – they said maximum desirable transaction length for their workload and servers is about 4 seconds.

Mysteries of the Binary Log – slides

  • Safer to purge binary logs by using PURGE BINARY LOG {TO, BEFORE}, since it updates index file.
  • If you use expire_log_days, removal happens at server restart or log flush (which happens during a file rotate).
  • RESET MASTER will delete everything and create new binlog file. Disconnect any slaves before doing this.
  • Common header about 19 bytes. 4 byte timestamp, type, server id, length, file position, flags. File position is the end of the event.
  • SYSDATE() is dangerous, because it actually runs on the slave, so the generated timestamp will be different than on the master.
  • Context events (user defined vars, rand seed, auto_increment, last_insert_id()) are grouped with the query into a binlog event.
  • Since replication user can read binary logs, it can read all changes. Can be security issue.
  • For setting passwords, better to set passwords by setting them into a user variable as a hash and then setting hash directly as the password. Only the already hashed value in the SET statement is replicated.
  • Statements in transaction go into a cache. It is flushed to binary log on commit. It is emptied if a transaction with only transactional statements is rolled back. Complicated if you mix in MyISAM changes.
  • Transaction cache spills to a file if it gets too big. This is one of the reasons large transactions can cause bad performance.
  • DDL statements like CREATE and ALTER cause implicit commits. Changes to mysql databse and LOAD DATA INFILE also are implicitly committed.
4/19/2010: 9:01 am: RobertMySQL

What’s New in MySQL 5.5? Performance Unleashed! - slides

  • Default storage engine is InnoDB (what had been called the InnoDB plugin, but now is the only version).
  • Many locks and mutexes held back scalability in 5.1.
  • Biggest win was fixing uses of LOCK_open. It was used for many things, some of which made no sense.
  • Introduced MetaData Locking (MDL). Added new mutex to protect MDL hash, but hash is calculated outside mutex.
  • Moved some things to 3 new mutexes, made some things atomic variables and removed useless uses of LOCK_open.
  • LOCK_thread_count used for too many things. Still used to protect list of connections, but changed other uses to atomic variables.
  • LOCK_open and the new MDL hash mutex are the biggest bottlenecks, but they are a problem at 32 cores now, instead of 8 cores.
  • InnoDB had many mutex issues for large number of cores. Buffer pool and log mutexes are worst. Adaptive hash index mutex is global and can be a problem, but can be disabled.
  • InnoDB Kernel mutex started to become a problem again, since other fixes allow you to reach higher number of concurrent transactions.
  • Log mutex held about 75% of time (for Sysbench RW) and buffer pool mutex about 50%.
  • Split the Log mutex into separate mutexes for independent functionality. Need to grab 2nd while holding 1st, but then can release 1st.
  • 1st was hottest, so separating acquisition of it from acquisition of buffer pool index was a big win.
  • Buffer pool, either split mutex functionality into multiple mutexes or split buffer pool into multiple pools.
  • InnoDB plugin with compression made having multiple mutexes much more complex, so decided to split buffer pool. Need 3 locks to get buffer pool page. Adding page hash mutex would have made it 4.
  • New config setting. Defaults to 1. Also split the flush list from buffer pool mutex.
  • Initial testing suggests 8-16 buffer pools gives best performance, but need lots more testing before making recommendations.
  • Rollback segment mutex problem with many connections. Can even perform IO while holding it. Split into 128 mutexes.
  • Purge was done as part of master thread. HIgh transaction rate mean lots of purging. Master thread can end up spending most of its time purging. Moved purging to separate thread.
  • Huge improvement in recovery (e. g., 7 hours to 14 minutes). Reduced algorithmic complexity of 2 impt activities.
  • Also huge improvement to delete performance.
  • Still looking into query cache issues.

MySQL Cluster: an Introduction – slides

  • NDBCluster Engine does not store to local disk. It pushes data to data nodes, which don’t need to be running MySQL.
  • Data nodes are ndbd processes and not necessarily bound to a single server. For testing, you can run multiple on the same server.
  • Data usually stored in 2 data nodes. Storing in 3 is experimental feature, requires more servers and performs worse. Storing in 1 obviously reduces availability, but gives great performance.
  • Storage is transactional.
  • You can have many mysqld instances all accessing the same data nodes. You can also use NDB API to access data nodes.
  • A hash is used to partition data across data nodes.
  • Always at least 1 replica of stored data. Node groups allows you to group replicas. Group selection is done according to order the server is listed in config file.
  • If a node goes down, its replica takes over as primary for the data. When node comes back, if not too far behind, it will sync the changes. If too far behind, it will sync a dump of all data.
  • Or you can bring up a replacement data node server with same IP address. Config is by IP.
  • If 2 data nodes in a group goes down, cluster must halt. Consistency over availability, CAP-wise.
  • SQL Node is a node running MySQL server.
  • Up to 48 data nodes. Very rarely more than 10 in practice, due to network and maintenance issues. Usually just a few.
  • Management node started first and it reads in the config. All data nodes need it in order to start. Also used for monitoring and arbitration.
  • If 2 data nodes can’t communicate, but 1 can still reach management node, that data node will take over. The one that can’t reach the mgmt node will shut down.
  • Mgmt node should be on different physical machine than the data nodes. You generally want two management nodes.
  • Only one will be the arbitrator. If primary fails, the data nodes will elect the 2nd mgmt node as the arbitrator.
  • There are also API nodes. A SQL node is a special case of an API node. API nodes use C++ library. Can also use Cluster/J in 7.1 for Java apps.
  • Only have to create tables on one MySQL server. The table creation will be auto detected by other servers.
  • User and grant info is independent, so that must be done on each server.
  • But, you can set up replication between MySQL servers. This allows you to more easily mix in non-cluster engine tables.
  • Need to handle retries in app. If data node dies during transaction, app must retry so transaction goes to other data node.
  • Table limits: 8052 bytes per row (can use blobs, but perf not good), 128 columns per table.
  • Indexed data limited to available RAM. Non-indexed data can be stored on disk.
  • Features: Online backup, global replication (can use MySQL to replicate clusters), NDB & MGM API, online additions of new data nodes.

EXPLAIN Demystified

  • Estimated query plan. Only for SELECT. Does not generate byte code. Unfortunately, you can’t ask MySQL for the plan it actually used for an executed query.
  • MySQL does everything as nested loop JOINs. Even a single table query is a single iteration loop. Always a left-deep tree, though it may reorder joins within the tree.
  • EXPLAIN output starts with deepest JOIN.
  • If no subquery or UNION, id for each row will be 1.
  • For select_type, ‘simple’ means only one SELECT keyword in entire query.
  • Derived means executed as a temp table, e.g., result of a subquery. If the table column is , the n is the subquery’s id column.
  • Union – rows spooled into temp table then read out with null in a row that says UNION RESULT
  • Be aware that EXPLAIN actually executes subqueries and unions, which can make it run very slowly. Also, can cause side effects if you update variables.
  • For derived, the higher id row is executed first. For UNION, the output tells you the execution order, e.g., .
  • IN and NOT IN queries are poorly optimized. They are converted into a dependent subquery.
  • UNCACHEABLE refers to the item cache. Not well documented. A few non-deterministic things like RAND() or a user variable can cause this.
  • Type is the JOIN type. Really more of an access type. Worst to best is ALL, index, range, ref, eq_ref, const, SYSTEM, null.
  • eq_ref means if there is a match, there will only be one row, perhaps due to a unique constraint on an index
  • Some non-range clauses get labeled as range because of details of the optimizer implementation.
  • ref tells you which columns or constants from preceding tables are used for lookups in the index named in the key column
  • rows is number of estimated rows that must be read. 5.1 tries to use LIMIT clause, but it is still sometimes inaccurate.
  • filtered: # of rows satisfying a condition. Added in 5.1. Not very useful.
  • extras: useful extra info (using index -> covering index; using where -> server post filters rows after using index; using temporary -> implicit temp table but not specified if memory or disk, using filesort -> means some kind of sort, but could be in memory)
  • filesort is an algorithm name. It doesn’t mean a disk-based file was used.
  • Use “pager mk-visual-explain” to pipe output through mk-visual-explain, assuming you installed Maatkit.

Connecting MySQL and Python – slides

  • Standard driver is MySQLdb. Another option is oursql.
  • If using MySQLdb library – db.connect(use_unicode=True, charset=’UTF8′)
  • The oursql driver has same API. Uses mysqlclient. Also implemented in C.
  • Both depend on MySQL client already being installed.
  • MySQL Connector/Python doesn’t depend on MySQL client libraries. Written by Geert.
  • Connector/J can be used with Jython. from java.sql import DriverManager
  • Current version is 0.1.3, but 0.1.4 coming soon.
  • Performance for INSERTs is about twice as slow as for MySQLdb or oursql.
  • Need more benchmark tests, since not clear whether that was including the network roundtrip time, since that should be roughly the same.
  • After the talk I talked to Chromakode about drivers and he recommended oursql. Much better code and documentation. I’ve switched to oursql and I like it.

Better Database debugging for shorter downtimes – slides

  • Tracking changes over time
    • Slowly Changing Dimension type-2. Start with a baseline and only track changes.
    • Need PK of original object, maybe from information_schema. Select attributes to track.
    • Add new row for each new property and when properties change. Use current_flag column to indicate which version is current. Also use effective_date and end_date, which can also tell you the current version.
  • Job logging via stored procedures
    • When they run a job (e.g., crontab entries), they use stored procedures to track when and how long it ran.
    • Really need to review slides and the stored procedures to get much value from this.

Linux Performance Tuning and Stabilization Tips – slides

  • To minimize mem usage, use compact datatypes. Remove unused indexes. Archive data when possible to minimize your working set.
  • Use direct IO -> innodb_flush_method=O_DIRECT.
  • Don’t alloc too much memory to mysqld. Makes filesystem cache too small and causes swapping.
  • If you disable swap & run out of memory, OOM killer will likely kill mysqld. It takes a long time to kill it and it aborts the process, which triggers crash recovery.
  • MyISAM, backup tools & admin tools use the filesystem cache.
  • Copying large files can cause swap. This can happen during backup.
  • set vm.swappiness=0 in /etc/sysctl.conf. Default is 60. This makes OS prefer making filesystem cache small over swapping.
  • tcmalloc faster than malloc. Can config InnoDB to use tcmalloc. Mostly matters for CPU bound loads with large buffer pool.
  • Don’t use too much session memory This can happen if you make per session memory variables to big. read_buffer_size can cause large memory blocks to be allocated.
  • Use BBWC on raid cards. Then, disable write cache on disks.
  • Don’t set write barrier if using BBWC (in ext3, barrier = 0)
  • Appending+fsync much slower than updating+fsync
  • If you use sync-binlog=1, fsync is called on append. Very slow.
  • Set innodb_autoextend_increment to higher than default of 8, maybe use 20.
  • pdflush takes care of async writes for MyISAM, file copy, mysqldump, etc.
  • Facebook moved from ext3 to xfs. ext3 very slow to delete large files. Makes dropping large table slow.
  • Writing to file is serialized.
  • Use dir_index to speed up searching and disable write barrier.
  • With xfs, disable write barrier with “nobarrier”. Concurrent writes possible with O_DIRECT.
  • Fusion SSD cards are 2-3 times faster than Intel SSD cards and 10-20 times faster than HDD. Fusion cards really benefit from xfs.
  • I/O Scheduler default is cfq, but noop or deadline are often better. InnoDB is already doing I/O scheduling itself, so cfq results in useless overhead.
  • Big difference in write performance between cfq and noop when lots of random reads mixed with writes. Recommend noop.
  • I/O scheduler queue size. Default is 128. Setting large like 100000 can greatly improve perf for MyISAM, since it doesn’t sort writes. Doesn’t help InnoDB, since it already sorts them.
  • iostat -x
  • Check r/s, w/s, svctm, %util, though svctm more reliable than %util
  • mpstat gives per core stats, vmstat averages across cores
  • mpstat -P ALL 1
  • SystemTap can be used with MySQL binary builds with DWARF symbols
  • iotop provides per process io stats
4/18/2010: 10:15 pm: RobertMySQL

This year’s MySQL Conference was once again jam packed with very technical presentations. Best of all was the news about all the performance and feature enhancements in MySQL 5.5. The benchmark results for the performance improvements were very impressive. After attending this year’s conference, I’m more confident about MySQL’s future than I’ve been for a very long time. I’ll post my notes from the three days I attended.

It was also a great time to meet up with the friends I’ve made in the MySQL community over the last few years. Some other open source communities have a really poisonous atmosphere of spending a lot of time attacking competing open source products. Fortunately, I see very little of that time-wasting crap within the MySQL community. Well, except for the occasional fork sniping, but that’s easy for me to ignore.

Update: Finally managed to post my notes from all 3 days.

12/29/2009: 5:17 pm: RobertMac, MySQL, Python

I just updated an old post on using Python with MySQL on OS X. The good news is that you don’t need to create an extra directory of symbolic links anymore. The not really bad news is that you have to be careful about 32-bit versus 64-bit.

Ideally, you’ll first upgrade to a 64-bit version of MySQL 5.1 or later, unless you already have. Then, follow Geert’s simple instructions for building the MySQL Python connector.

11/12/2009: 2:41 pm: RobertMySQL

After upgrading to the newest version Maatkit, the invaluable toolkit for MySQL, I was no longer able to run any of the Maatkit tools. The problem turned out to be caused by my update to MacPorts after upgrading to Snow Leopard.

By default, Apple provides perl 5.10, which is built in 64-bit mode, though backwards compatible with 32-bit binaries. However, I was seeing a different version.

$ perl --version
 
This is perl, v5.8.9 built for darwin-2level

Trying to run mk-archiver resulted in the following error (again, wrapping some lines for readability):

$ mk-archiver --version
Can't load '/opt/local/lib/perl5/site_perl/5.8.8//darwin-2level/auto/Data/Dumper/Dumper.bundle'
for module Data::Dumper:
dlopen(/opt/local/lib/perl5/site_perl/5.8.8//darwin-2level/auto/Data/Dumper/Dumper.bundle, 1):
no suitable image found.  Did find:
/opt/local/lib/perl5/site_perl/5.8.8//darwin-2level/auto/Data/Dumper/Dumper.bundle:
mach-o, but wrong architecture at
/opt/local/lib/perl5/5.8.9/darwin-2level/XSLoader.pm line 73.
 at /opt/local/lib/perl5/site_perl/5.8.8//darwin-2level/Data/Dumper.pm line 27
Compilation failed in require at /opt/local/bin/mk-archiver line 874.
BEGIN failed--compilation aborted at /opt/local/bin/mk-archiver line 874.

The “wrong architecture” note in the error message led me to eventually find the cause of my troubles. MacPorts added the following to my ~/.bash_profile (I wrapped a line to make it easier to read:

# MacPorts Installer addition on 2009-10-31_at_23:33:12:
# adding an appropriate PATH variable for use with MacPorts.
export PATH=/opt/local/bin:/opt/local/sbin:$PATH
# Finished adapting your PATH environment variable for use with MacPorts.

This caused the 32-bit perl 5.8.9 from MacPorts to be used instead of the 64-bit perl 5.10 provided with Snow Leopard. Commenting out the export line above was the first step to fixing the problem.

I also noticed that I wasn’t adding /usr/bin to my path before I added the MacPorts paths. So, I made the following change to my .bash_profile (leaving out some of the path components for clarity).

PATH_PREFIX=/usr/local/bin:/usr/bin:$JDK_HOME/bin:/opt/local/bin:/opt/local/sbin
PATH_SUFFIX=$ANT_HOME/bin:$MYSQL_HOME/bin:$SCALA_HOME/bin:$GIT_HOME/bin
export PATH=$PATH_PREFIX:$PATH:$PATH_SUFFIX

Using source .bash_profile won’t update the PATH variable properly because of the way it is being constructed above. So, start a new console to get a corrected value for the PATH. Then you should get:

$ perl --version
 
This is perl, v5.10.0 built for darwin-thread-multi-2level

and something like

$ mk-archiver --version
mk-archiver  Ver 1.0.20 Distrib 5014 Changeset 5004
10/26/2009: 12:44 pm: RobertMySQL

OpenSQL Camp 2009 in Portland, OR

I’ll be in Portland on November 13, 14 and 15 for OpenSQL Camp, an unconference covering open source databases. In addition to learning more about MySQL and Drizzle internals, I’m looking forward to learning a bit about Postgres and some of the NoSQL database alternatives. They aren’t just your father’s high performance key-value stores, anymore.

6/10/2009: 10:05 pm: RobertJava, MySQL

The C3P0 database connection pooling library has worked very well for me, but I recently ran into a problem when I wanted to log the SQL that was being generated for a PreparedStatement if an insert failed. Then, if database problems caused inserts to fail beyond an automated retry period, I could easily harvest the SQL statements from the log file and retry them later. Also, if a data problem caused the inserts to fail, I could harvest the statements from the log, use a script to adjust the data and then retry them.

C3P0 wraps the JDBC driver’s PreparedStatement class with its C3P0ProxyStatement. Unfortunately, it doesn’t override toString() to return something useful. So, you just get the default of a class name and the hash code for the instance. Not helpful.

Unfortunately, there is nothing in the API that gives you an obvious means of getting at the real PreparedStatement, which I can see in the debugger inside a private variable named inner. I knew that the rawStatementOperation() method might be the key, but it wasn’t obvious. However, a bit of googling turned up a solution in the unit test code for C3P0.

Unit tests are awesome. Not just for testing your code, but also for providing example code.

So, here’s some example code. First, I created a simple table in the test database of a local MySQL install.

create table rs (a int);

And here’s some code that connects to the database, sets up a prepared statement and then extracts the SQL from the prepared statement with the parameters bound:

import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
 
import com.mchange.v2.c3p0.C3P0ProxyStatement;
import com.mchange.v2.c3p0.ComboPooledDataSource;
 
public class C3P0Demo {
 
  public static void main(String[] args) throws Exception {
 
    // Get a connection
    String url = "jdbc:mysql://localhost:3306/test";
    ComboPooledDataSource ds = new ComboPooledDataSource();
    ds.setDriverClass("com.mysql.jdbc.Driver");
    ds.setJdbcUrl(url);
    ds.setUser("myuser");
    ds.setPassword("mypassword");
    Connection conn = ds.getConnection();
 
    // Prepare the PreparedStatement
    PreparedStatement ps = conn.prepareStatement("insert into rs values (?)");
    ps.setInt(1, 10);
 
    // Extract the SQL
    String sql = "";
    try {
      C3P0ProxyStatement c3p0Stmt = (C3P0ProxyStatement) ps;
      Method toStringMethod = Object.class.getMethod("toString", new Class[] {});
      Object toStr = c3p0Stmt.rawStatementOperation(toStringMethod,
          C3P0ProxyStatement.RAW_STATEMENT, new Object[] {});
      if (sql instanceof String) {
        sql = (String) toStr;
        sql = sql.substring(sql.indexOf('-') + 1).trim() + ";";
        System.out.println(sql);
      }
    } catch (SQLException e) {
      System.out.println("Exception extracting SQL: " + e.getMessage());
    } catch (SecurityException e) {
      System.out.println("Exception extracting SQL: " + e.getMessage());
    } catch (NoSuchMethodException e) {
      System.out.println("Exception extracting SQL: " + e.getMessage());
    } catch (IllegalArgumentException e) {
      System.out.println("Exception extracting SQL: " + e.getMessage());
    } catch (IllegalAccessException e) {
      System.out.println("Exception extracting SQL: " + e.getMessage());
    } catch (InvocationTargetException e) {
      System.out.println("Exception extracting SQL: " + e.getMessage());
    }
  }
}

Inside the C3P0ProxyStatement is a com.mysql.jdbc.ServerPreparedStatement. For my example, ServerPreparedStatement.toString() returns:

com.mysql.jdbc.ServerPreparedStatement[1] - insert into rs values (10)
8/5/2008: 10:27 am: RobertMySQL

This year I attended presentations by the EBay engineering team at the MySQL Conference and JavaOne in which they talked about some very interesting patches that had made to MySQL Server. I lamented the fact that they hadn’t open sourced it yet or contributed all the patches back to MySQL. Google has also announced some great enhancements they have made to MySQL, especially around semi-synchronous replication. Since then I’ve learned a bit more from posts aggregated on Planet MySQL that contributing patches and getting them accepted into a MySQL release that will see the light of day anytime soon is not so easy.

However, a group of some of the most prominent technical employees at MySQL have been joined by community members to create a fork of MySQL Server called Drizzle. Drizzle is “being designed for massive concurrency on modern multi-cpu/core architecture”, partially by stripping away some of the enterprise database features, e.g., stored procedures, triggers, etc., that are much less important for highly scalable web applications.

Since Drizzle doesn’t have to satisfy paying enterprise customers, the team working on the project is better able, at least for now, to take on enhancements like the EBay and Google patches. Currently, the MySQL community edition is very similar to the enterprise edition. However, it seems that it is not common for features to appear in the community edition before appearing in the enterprise edition. In fact, there have been some proposals that it work in the opposite direction.

This is very unlike the relationship between Fedora and Red Hat Enterprise Linux. Fedora is a much more experimental OS than RHEL, in that new features tend to show up sooner in Fedora. Also, just because Fedora adopts or drops a feature doesn’t mean that RHEL will do the same. Fedora is not a beta version of the next release of RHEL.

While Drizzle may range even farther afield of the MySQL community and server editions than does Fedora from RHEL, I think this relationship will greatly benefit MySQL in the long run. Many of the likely users of Drizzle are probably the least likely to be paying customers of MySQL. However, if they need a more traditional enterprise database server for another project, they are more likely to turn to MySQL Server. While not the same feature set, the transition would obviously be a lot less disruptive than going to a totally different database server. Also, Drizzle should provide the kind of testing ground for new features that it seems like MySQL community edition has yet to become.

7/25/2008: 10:09 pm: RobertMySQL

Today I needed to make the same kind of update to a field in a few rows of a large database table. There were too many rows to do it with a separate SQL update statement per row, so I hunted down the MySQL REPLACE function. Okay, it was only 6 rows, but I needed to make the same change in a dev, QA and production database. The example in the documentation is pretty trivial, but it wasn’t too hard to expand it to conditionally replace the substring.

mysql> help replace;
Name: 'REPLACE'
Description:
Syntax:
REPLACE(str,from_str,to_str)

Returns the string str with all occurrences of the string from_str
replaced by the string to_str. REPLACE() performs a case-sensitive
match when searching for from_str.

URL: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html

Examples:
mysql> SELECT REPLACE('www.mysql.com', 'w', 'Ww');
        -> 'WwWwWw.mysql.com'

The UPDATE query below is pretty similar to what I needed to do, i.e., remove some characters from the end of a varchar column. The LIKE clause I’ve used in the query ensures that “cruft” is replaced only if it appears at the end of the string.

mysql> CREATE TABLE tbl (col varchar(20));
Query OK, 0 rows affected (0.41 sec)

mysql> INSERT INTO tbl VALUES ('cruft_front'), ('some_cruft_in_middle'), ('end_cruft');
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

mysql> UPDATE tbl SET col = REPLACE(col, 'cruft', '') WHERE col LIKE '%cruft';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> select * from tbl;
+----------------------+
| col                  |
+----------------------+
| cruft_front          |
| some_cruft_in_middle |
| end_                 |
+----------------------+
3 rows in set (0.00 sec)

Next Page »