Re: Optimised way to search over 2 milllion poi data in mysql
Posted by:
Rick James
Date: September 12, 2014 11:20AM
DECLARE _deg2rad DOUBLE DEFAULT PI()/1800000; -- For scaled by 1e4 to MEDIUMINT
SET @deg2dist := 0.0069172, -- 69.172 for miles; 111.325 for km *** (mi vs km)
@start_deg := 15 / @deg2dist, -- Start with this radius first (15 miles)
@dlat := @start_deg,
@lon2lat := COS(_deg2rad * @my_lat);
SET @dlon := ABS(@dlat / @lon2lat);
That assumes a 15-mile starting square is a good first guess; you may need to adjust it. @dlat converts that radius from miles into (scaled) degrees of latitude. The equivalent @dlon is more complex, since longitude lines get closer together as you go toward the poles. This is handled by dividing by the cosine of the starting latitude (@my_lat).
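For readers who want to trace the arithmetic outside SQL, here is the same variable setup sketched in Python. It assumes, as above, that lat/lng are stored as MEDIUMINT scaled by 1e4 (so 40.7127 degrees is stored as 407127); the example latitude is just an illustration, not from the original post.

```python
import math

# Same setup as the SQL above, assuming lat/lng scaled by 1e4.
deg2rad = math.pi / 1800000           # scaled units -> radians (180 * 1e4)
deg2dist = 0.0069172                  # miles per scaled unit; 0.0111325 for km
start_deg = 15 / deg2dist             # 15-mile radius, in scaled units

my_lat = 407127                       # example: ~40.7127 N, scaled by 1e4
dlat = start_deg                      # half-height of the search square
lon2lat = math.cos(deg2rad * my_lat)  # longitude degrees shrink toward poles
dlon = abs(dlat / lon2lat)            # half-width, widened to compensate
```

Note that dlon comes out larger than dlat everywhere except the equator, since the cosine divisor is below 1.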
The SELECT you are quoting is looking at a "square" around the starting place. Its performance depends on PARTITIONing, InnoDB clustering, and the PRIMARY KEY.
The LOOP enlarges the square until it finds the desired number of items.
Then the final SELECT does the proper great-circle-distance (GCDist) ordering, plus the LIMIT.
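The enlarge-and-retry idea can be sketched in a few lines of Python. This is not the actual Stored Procedure: it uses plain (unscaled) degrees, an in-memory list instead of a table, and a haversine function as a stand-in for GCDist.

```python
import math

def gc_dist_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula (km)."""
    rlat1, rlon1, rlat2, rlon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((rlat2 - rlat1) / 2) ** 2
         + math.cos(rlat1) * math.cos(rlat2) * math.sin((rlon2 - rlon1) / 2) ** 2)
    return 6371.0 * 2 * math.asin(math.sqrt(a))

def nearest(points, my_lat, my_lon, limit, start_deg=0.01):
    """Expand a lat/lon square until it holds >= limit points,
    then order the survivors by true distance and keep the closest."""
    dlat = start_deg                                    # half-height, degrees
    dlon = abs(dlat / math.cos(math.radians(my_lat)))   # widen for the latitude
    iterations = 0
    while True:
        iterations += 1
        inside = [p for p in points
                  if my_lat - dlat <= p[0] <= my_lat + dlat
                  and my_lon - dlon <= p[1] <= my_lon + dlon]
        if len(inside) >= limit or dlat > 180:
            break
        dlat *= 2                                       # enlarge the square
        dlon *= 2
    inside.sort(key=lambda p: gc_dist_km(p[0], p[1], my_lat, my_lon))
    return inside[:limit], iterations
```

The real procedure does the "inside" step with a SELECT over the partitioned table, so each extra iteration costs another indexed range scan rather than a list comprehension.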
(I made a few changes to the document today. I further discuss 1 versus 2 tables, plus talk some about benchmarking.)
What should the 15-mile value be set to?
First, if you prefer kilometers, change
" @deg2dist := 0.0069172, -- 69.172 for miles; 111.325 for km *** (mi vs km)"
to
" @deg2dist := 0.0111325, -- 69.172 for miles; 111.325 for km *** (mi vs km)"
Then change the 15 in
"@start_deg := 15 / @deg2dist, -- Start with this radius first (15 miles)"
If there are likely to be a lot of entries closer than 15 miles, shrink it. If the items are widely scattered, a bigger number might be better.
15 miles is reasonable for cities in the world; 1 km might be more reasonable for businesses; 0.2 might be good for homes.
Performance suffers some when @start_deg is too small or too big. It's hard to say what the best value is.
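As a quick back-of-envelope check, this hypothetical helper (again assuming the 1e4 scaling) converts a chosen radius into the scaled units that @start_deg works in:

```python
def start_units(radius, deg2dist):
    """Convert a search radius (mi or km) into scaled latitude units."""
    return radius / deg2dist

mi = 0.0069172    # 69.172 miles per degree, divided by the 1e4 scale
km = 0.0111325    # 111.325 km per degree, divided by the 1e4 scale

print(start_units(15, mi))   # 15-mile start -> ~2168.5 scaled units
print(start_units(1, km))    # 1-km start    -> ~89.8 scaled units
```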
Your sample data looks like individual houses or businesses, so I suggest you start with 1 (mile or km). After returning from the CALL of the Stored Procedure, you can do
SELECT @iterations;
to see how many times the LOOP was traversed.
@iterations usually 1: Consider decreasing @start_deg.
@iterations rarely 1: Consider increasing @start_deg.
@iterations fluctuates with various tests: @start_deg is possibly set to a good value.
Why @start_deg impacts performance:
If @start_deg is too small, the LOOP will iterate more, leading to more SELECTs.
If @start_deg is too large, the first iteration fetches more rows than are needed, possibly hitting more PARTITIONs than necessary, and the final SELECT calls GCDist more times.
How much does it impact performance? Probably not more than 2x, unless @start_deg is really far off.