
Re: Optimised way to search over 2 million POI data in MySQL
Posted by: Rick James
Date: September 12, 2014 11:20AM

DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000;  -- for lat/lng scaled by 1e4 into MEDIUMINT
SET @deg2dist := 0.0069172,  -- 69.172 for miles; 111.325 for km *** (mi vs km)
    @start_deg := 15 / @deg2dist,  -- Start with this radius first (15 miles)
    @dlat := @start_deg,
    @lon2lat := COS(_deg2rad * @my_lat),
    @dlon := ABS(@dlat / @lon2lat);

That assumes a 15-mile starting square is good; you may need to adjust it. @dlat converts that radius from miles into (scaled) degrees of latitude. The equivalent @dlon is more complex, since longitude lines are closer together as you go toward the poles; this is handled by dividing by the cosine of the starting latitude (@my_lat).
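
To make the arithmetic concrete, here's a worked example at latitude 40 (the latitude and the resulting values are illustrative only):

-- Worked example at latitude 40.0, stored as 400000 after the 1e4 scaling:
SET @my_lat   := 400000;                  -- 40.0 degrees * 1e4
SET @deg2dist := 0.0069172;               -- miles per scaled unit
SET @dlat     := 15 / @deg2dist;          -- ~2169 scaled units = ~0.217 degrees of latitude
SET @dlon     := ABS(@dlat / COS(PI()/1800000 * @my_lat));  -- ~2831, since COS(...) ~ 0.766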

The SELECT you are quoting is looking at a "square" around the starting place. Its performance depends on PARTITIONing, InnoDB clustering, and the PRIMARY KEY.
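
A minimal sketch of such a square scan, assuming lat/lon are stored as scaled MEDIUMINTs in a table I'll call Locations (the table and column names are assumptions):

SELECT id, lat, lon
    FROM Locations
    WHERE lat BETWEEN @my_lat - @dlat AND @my_lat + @dlat   -- the "square"
      AND lon BETWEEN @my_lon - @dlon AND @my_lon + @dlon;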

The LOOP enlarges the square until it finds the desired number of items.
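
A sketch of that loop, as it might look inside the Stored Procedure (_limit is the desired number of items; again, the names are assumptions):

SET @iterations := 0;
main_loop: LOOP
    SET @iterations := @iterations + 1;
    SELECT COUNT(*) INTO @cnt
        FROM Locations
        WHERE lat BETWEEN @my_lat - @dlat AND @my_lat + @dlat
          AND lon BETWEEN @my_lon - @dlon AND @my_lon + @dlon;
    IF @cnt >= _limit THEN
        LEAVE main_loop;        -- enough candidates inside the square
    END IF;
    SET @dlat := @dlat * 2,     -- enlarge the square and scan again
        @dlon := @dlon * 2;
END LOOP;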

Then the final SELECT does the proper great-circle distance (GCDist) ordering, plus the LIMIT.
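
Roughly like this, assuming the GCDist() function mentioned below takes two scaled lat/lon pairs and returns a distance in the same scaled-degree units:

SELECT id, lat, lon,
       @deg2dist * GCDist(@my_lat, @my_lon, lat, lon) AS dist  -- miles (or km)
    FROM Locations
    WHERE lat BETWEEN @my_lat - @dlat AND @my_lat + @dlat
      AND lon BETWEEN @my_lon - @dlon AND @my_lon + @dlon
    ORDER BY dist
    LIMIT 10;   -- however many items you want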

(I made a few changes to the document today. I further discuss 1 versus 2 tables, plus talk some about benchmarking.)

What should the 15-mile value be set to?
First, if you prefer kilometers, change
" @deg2dist := 0.0069172, -- 69.172 for miles; 111.325 for km *** (mi vs km)"
to
" @deg2dist := 0.0111325, -- 69.172 for miles; 111.325 for km *** (mi vs km)"
Then change the 15 in
"@start_deg := 15 / @deg2dist, -- Start with this radius first (15 miles)"
If there are likely to be a lot of entries closer than 15 miles, shrink it. If the items are widely scattered, a bigger number might be better.

15 miles is reasonable for cities of the world; 1 km might be more reasonable for businesses; 0.2 (mi or km) might be good for individual homes.
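
Putting the km advice together for, say, businesses with a 1 km starting radius (an illustrative configuration, not the only right one):

SET @deg2dist := 0.0111325,       -- 111.325 km per degree, / 1e4 scaling
    @start_deg := 1 / @deg2dist,  -- ~90 scaled units = ~0.009 degrees
    @dlat := @start_deg,
    @dlon := ABS(@dlat / COS(PI()/1800000 * @my_lat));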

Performance suffers some when @start_deg is too small or too big. It's hard to say what the best value is.

Your sample data looks like individual houses or businesses, so I suggest you start with 1 (mi or km). After returning from the CALL of the Stored Procedure, you can run
SELECT @iterations;
to see how many times the LOOP was traversed.
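
For example (the procedure name and argument list here are hypothetical):

CALL FindNearest(@my_lat, @my_lon, 1.0, 10);  -- start radius 1 (mi/km), want 10 items
SELECT @iterations;                           -- how many passes the LOOP took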

@iterations usually 1: Consider decreasing @start_deg.
@iterations rarely 1: Consider increasing @start_deg.
@iterations fluctuates across tests: @start_deg is possibly set to a good value.

Why @start_deg impacts performance:
If @start_deg is too small, the LOOP will iterate more, leading to more SELECTs.
If @start_deg is too large, the first iteration fetches more rows than needed, possibly hitting more PARTITIONs than necessary, and the final SELECT calls GCDist more times.

How much does it impact performance? Probably not more than 2x, unless @start_deg is really far off.
