
Re: Optimised way to search over 2 million poi data in mysql
Posted by: Rick James
Date: September 07, 2014 12:25AM

> So I just maintain this, is that it: ADD PRIMARY KEY (`poiID`), ADD KEY `lat` (`lat`,`long`)?

No, see the details.

> Why would these two not be helpful: ADD KEY `lat_2` (`lat`), ADD KEY `long` (`long`)? Because we search them with WHERE lat = ? AND long = ?

MySQL almost never uses two indexes in a single query. Let's say you have 2M records, with about 40K rows in each one-degree band of latitude and about 40K in each one-degree band of longitude. Now, let's say you are looking in a 4-degree range. That would be 160K entries from the lat index and another 160K entries from the long index. The goal would be to 'merge' those two lists to discover the few hundred rows that match both the lat range and the long range. That is rather impractical, so the optimizer (usually) does not do it.
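To make that concrete, here is a sketch. The table name `poi` and the EXPLAIN outcome are my assumptions, based on the CREATE TABLE discussed earlier in the thread:

-- The two single-column indexes proposed above:
ALTER TABLE poi
    ADD KEY lat_2  (lat),
    ADD KEY `long` (`long`);

-- A 4-degree bounding box. Either index alone narrows 2M rows to ~160K;
-- merging the two ~160K lists down to the few hundred rows matching both
-- ranges is the step the optimizer (usually) refuses to attempt.
EXPLAIN SELECT poiID
  FROM poi
 WHERE lat    BETWEEN 40.0  AND 44.0
   AND `long` BETWEEN -75.0 AND -71.0;
-- Expect: type=range on just ONE of the two indexes, rows ~ 160K.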

MySQL will use both parts of a "compound index" (INDEX(a,b)) in some cases. However, for INDEX(lat, long), it won't get past lat, since the WHERE clause needs a 'range' on each field, and a range on the first column stops the index from narrowing on the second.
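For example (same assumed table):

ALTER TABLE poi ADD KEY lat_long (lat, `long`);

SELECT poiID
  FROM poi
 WHERE lat    BETWEEN 40.0  AND 44.0    -- range on the first column...
   AND `long` BETWEEN -75.0 AND -71.0;  -- ...so this part cannot narrow the scan;
-- the `long` test is applied to each of the ~160K rows the lat range finds.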

> So you mean to say that, based on my table size of 2 million rows, it will take up 200MB of space.

Based on the CREATE TABLE you provided, I estimated (not knowing how long each VARCHAR would actually be) that the typical record might take about 100 bytes. 2M rows times 100 bytes = 200MB. Similarly, I estimated the index sizes, after allowing some extra for padding and overhead in the BTree structure.
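If you want to check the real numbers rather than my estimate, something like this shows InnoDB's own (approximate) figures; `poi` is again my assumed table name:

SELECT table_name,
       ROUND(data_length  / 1048576, 1) AS data_mb,   -- row data
       ROUND(index_length / 1048576, 1) AS index_mb   -- secondary indexes
  FROM information_schema.TABLES
 WHERE table_schema = DATABASE()
   AND table_name   = 'poi';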

> But the problem is that we have many lat/long pairs to match to find the nearest poi.

Yes; with no index, you have 2M rows to test. (full table scan)
With an index on latitude, you have 160K rows to test for a 4-degree range. (range scan in the index)
So, you need some trick to cut that down even further.

> About your two tables, I think I get it now.
> In the first table you just store lat and long, nothing else.

Close... It also needs an id to JOIN with the other table. And if you need any filtering, the filter column needs to be in the first table, too. (In my example of "finding the 10 nearest pizza parlors", the presumption was that the dataset was of Restaurants or Businesses, and the query needed to restrict it to just Pizza parlors.)
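A minimal sketch of that first table, plus the detail table discussed next. All names and types here are my illustrative assumptions, not your actual schema, and the PARTITIONing mentioned below is omitted for brevity:

-- First table: narrow rows holding only what the search needs.
CREATE TABLE poi_location (
    poiID    INT UNSIGNED NOT NULL,
    lat      DECIMAL(6,4) NOT NULL,
    lng      DECIMAL(7,4) NOT NULL,
    category TINYINT UNSIGNED NOT NULL,  -- lets the query keep only Pizza parlors
    PRIMARY KEY (lat, lng, poiID)        -- InnoDB clusters the rows by latitude
) ENGINE=InnoDB;

-- Second table: the wide columns, fetched only for the final few ids.
CREATE TABLE poi_detail (
    poiID   INT UNSIGNED NOT NULL,
    name    VARCHAR(100) NOT NULL,
    address VARCHAR(200) NOT NULL,
    -- ...other descriptive columns...
    PRIMARY KEY (poiID)
) ENGINE=InnoDB;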

> The second one stores exactly the same as the first, plus the location name and other details, am I right?

Well, the lat and lng are not needed in the second table.

The idea is that the hard work is done on the first table -- doing the lat/long computations, searching again with a wider range if there is no luck within one degree, etc., plus filtering down to Pizza parlors. When finished with the first table, you have the id(s) for exactly the desired items. Then reach into the second table for the rest of the info. If you are looking for the 10 nearest Pizza parlors, the code may have to rummage through 40 rows in the first table, but only 10 lookups will be performed on the second table.
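In SQL terms, using the sketch tables above (the distance expression is a flat-earth placeholder, fine for ranking within a small box; `category = 12` is an assumed code for Pizza parlor):

SET @my_lat := 40.75, @my_lng := -73.99;  -- the search point

SELECT d.poiID, d.name, d.address
  FROM (
        SELECT poiID,
               POW(lat - @my_lat, 2) + POW(lng - @my_lng, 2) AS dist_sq
          FROM poi_location
         WHERE lat BETWEEN @my_lat - 1 AND @my_lat + 1  -- start with 1 degree
           AND lng BETWEEN @my_lng - 1 AND @my_lng + 1  -- widen and rerun if <10 hits
           AND category = 12
         ORDER BY dist_sq
         LIMIT 10
       ) AS nearest
  JOIN poi_detail AS d USING (poiID);  -- only these 10 rows touch the wide table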

The lookups in the first table will be mostly "clustered" (as a side effect of the choice of PARTITIONing and PRIMARY KEY). This means that the 40 rows fetched will come from only a few blocks, not 40. The second table is a different matter: its rows are likely to be scattered around the table (they were probably added over time, or are ordered alphabetically, or whatever), so it could well take 10 block reads to load the 10 results. This is the argument for saying that, even for huge datasets, the necessary I/O will be "small".
