MySQL Forums
Forum List  »  Performance

Re: Slow JOIN to convert IP to Country Code
Posted by: Rick James
Date: May 30, 2009 10:47AM

EXPLAIN formatting... How are you getting the tab-delimited output? Can you get to the commandline (mysql)? If so EXPLAIN ...; or EXPLAIN ...\G and not redirecting the output (with the [ code ] tags).

Note that the EXPLAIN says "Using index". So they query is faster? (But possibly not 2x faster.)

Your example time range is exactly 1 day + 1 second (BETWEEN includes both ends), and not on a minute boundaries. Is this typical? What if you could get any range of minutes 10x faster? Would that do? And if it would not suffice, we can arrange for the extra seconds on either end to be handled (via UNION) without too much extra cost.

I am suggesting a summary table, by the minute that is something like this:
CREATE TABLE ip_minute (
    minut TIMESTAMP NOT NULL COMMENT 'truncated to a minute',
    ip_interval  MEDIUMINT UNSIGNED  COMMENT 'IP, right shifted 8',
    code char(2) NOT NULL  COMMENT 'for group by',
    ct SMALLINT UNSIGNED  COMMENT 'COUNT(stat.ip_interval)',
    PRIMARY KEY (minut, ip_interval),
);
And you would build the last minute's worth of stuff every minute. (This assumes the data comes in in a timely manner. If it might be delayed significantly, we need to modify that table definition and take extra steps.) The initial setup would be a lengthly SELECT ... GROUP BY. The incremental updates will be fast.

With that summary table, it is still probably better to join to the other table to get the IP name. But that is something to test out. Also worth experimenting with is a subquery -- do the SUM(ct), in the subquery, then get the name in the outer query. I lean toward this as the 'best'. Something like:
SELECT 
        i.code, i.name,
        SUM(m.sum_ct) AS count
    FROM ( SELECT
            code,
            SUM(ct) AS sum_ct
         FROM ip_minute
         WHERE minut >= DATE_SUB(?, INTERVAL 24 HOUR)
           AND minut <  ?
         GROUP BY code
         ) AS m
    JOIN iplist AS i  ON m.code = i.code
    ORDER BY count DESC
Yuck -- grouping on code. Probably ought to pull code into the summary table. I am getting thoroughly lost in code vs ip_interval vs name -- which of these are 1-1? I am made some assumptions above.

I think the subquery above is important -- it concentrates on the lengthy inner table, consolidates the data into fewer rows, then reaches into the other table only that many times.

Once the data gets bigger than cache, it becomes important to shrink everything you can. I see some INTs -- can they be MEDIUMINT or smaller?

At this point
KEY `dateline` (`dateline`),
can be removed since these cover it:
KEY `ip_interval` (`dateline`,`ip_interval`),
KEY `location_id` (`dateline`,`location_id`)

Options: ReplyQuote




Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.