minimize latency to first row when scrolling huge result set?
Posted by: Nikita Tovstoles
Date: November 12, 2012 05:24PM

IN SHORT

given that:
select id from table limit 1; //table has 100m+ rows; returns in <1s
select id from table limit 1 offset 10,000,000; //returns in < 11s

why does even the 1st row of "select id from table;" query takes minutes to be made available to a client (including when using result set scrolling in JDBC) and what can I do to reduce '1st row sent to client' latency?

IN DETAIL

I need read the contents of a large partitioned INNODB table (100m+ rows) and do some processing on each row (on the client). the order in which rows are returned is unimportant; the velocity with which rows are delivered is relatively unimportant, but I would like to minimize latency with which the first row is set to the client. Ie I'd like 'writing to net' to occur as quickly as possible, so that the client can read some rows and start doing processing.

-using INNODB, mysql 5.5 on a 1.7gb 'small' EC2 instance
-using read-only transaction - tried both READ_COMMITTED and UNCOMMITTED tx iso levels
-using mysql JDBC (5.1.16)-based client

I initially tried using client-side paging (using limit and offset) but saw that response latency grew prohibitevely with offset:

select id from product_views limit 1 offset X;

offset = 100k resulted in <1s response latency;
offset = 10m -> 11s latency;
offset = 40m -> 43s latency;

offset = 100m -> latency in minutes;
offset = 100,000,000 -> 3 min latency (to first row being delivered to client)

Plus, using client-side paging means having to sort result set on the db server, so client-side paging seems like a non-starter.

Then I tried using forward / read-only result set scrolling (code below), and still the client does not get the first row until over 10 minutes after the query is issued. The server is sitting in 'Sending data' state. Is the server building up the entire result set before sending out even the 1st row? If so can I tell it to start streaming results as they become available? Given that 'limit 1' query returns right away, it seems the server can quickly return some rows while compositing the rest of the result set.

thank you,

-nikita

mysql> show create table product_views\G
*************************** 1. row ***************************
Table: product_views
Create Table: CREATE TABLE `product_views` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`dateModified` datetime DEFAULT NULL,
`hibernateVersion` bigint(20) DEFAULT NULL,
`timestamp` datetime NOT NULL,
KEY `timestamp_idx` (`timestamp`)
) ENGINE=InnoDB AUTO_INCREMENT=577265777 DEFAULT CHARSET=latin1
/*!50500 PARTITION BY RANGE COLUMNS(`timestamp`)
(PARTITION p0 VALUES LESS THAN ('2012-05-01') ENGINE = InnoDB,
PARTITION p1 VALUES LESS THAN ('2012-06-01') ENGINE = InnoDB,
PARTITION p2 VALUES LESS THAN ('2012-07-01' etc)


@Test(timeout = 1000 * 60 * 10)
public void testScrollTimeToFirstRow() throws Exception
{
Connection conn = null;
Statement stmt = null;
ResultSet rs = null;
try
{
conn = dataSource.getConnection();
conn.setReadOnly(true);
conn.setClientInfo("query_cache_type", "off");
conn.setTransactionIsolation(Connection.TRANSACTION_READ_UNCOMMITTED);
String query = "select sql_no_cache p.id, p.timestamp from product_views p;"; // limit 1 offset 10000000";
stmt = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
long now = System.currentTimeMillis();
rs = stmt.executeQuery(query);
// extract data from the ResultSet scroll from top
while (rs.next())
{
Long id = rs.getLong(1);
Date timestamp = rs.getDate(2);

//just want to capture latency to 1st row
System.out.println("first row in " + ((System
.currentTimeMillis() - now) / 1000) + "s; id=" + id + " " +
"timestamp=" + timestamp);
break;
}
} finally
{
// release database resources
if (rs != null) rs.close();
if (stmt != null) stmt.close();
if (conn != null) conn.close();
}
}

Options: ReplyQuote


Subject
Written By
Posted
minimize latency to first row when scrolling huge result set?
November 12, 2012 05:24PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.