Streaming ResultSets with the MySQL JDBC Driver

By | June 3, 2011

When you use the MySQL JDBC driver to select rows from a table, the connection will block until the entire ResultSet has been pulled over to the client. In most cases this makes sense, especially if the server is on a different host. Retrieving the entire ResultSet will minimize the number of TCP packets that must be sent from the server.

However, if you are returning a very large ResultSet, the client will have to allocate a lot of memory on the heap. If you end up accessing each row to create an object from the data, then you will need enough heap space for the entire ResultSet plus all of the objects you instantiate.

The driver documentation explains how to force the driver to stream the ResultSet row-by-row.

The first catch is that you must be using a regular Statement object, not a PreparedStatement.

The documentation says you need to add the following non-intuitive code before executing the query:

stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
              java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);

though you can actually just use conn.createStatement() since TYPE_FORWARD_ONLY and CONCUR_READ_ONLY are the defaults.

There are a couple caveats in the documentation, though they are fairly obvious. You should process the ResultSet as quickly as possible, since locks will be held as long as the statement (and any transaction it is in) is open.

In addition to being non-intuitive, setting the fetch size to Integer.MIN_VALUE might cause unexpected results if you run your code against a database server other than MySQL.

If you’re willing to go all out in committing to MySQL, you can cast the return value of createStatement() to com.mysql.jdbc.Statement.StatementImpl and then call enableStreamingResults(). That will, at least, make the behavior of your code more obvious.

At work I needed to cache a lot of data from a couple of tables. Using the default behavior caused the heap to grow to over 12.5 GB. That made for trouble when running on my 8 GB laptop. By switching to streaming the ResultSet, the heap maxed out at only 5 GB.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.