MySQL :: Re: are index scans possible with bitwise comparison

New Topic

Re: are index scans possible with bitwise comparison

Posted by: Jay Pipes
Date: July 22, 2005 12:59PM

Gary Byatt wrote:
> If I add 3 further conditions:
> 1) I fix the number of bits, I currently have 7

see below

> 2) Speed is more important than space

In reality, when speaking of indexes they are one and the same; the smaller the storage space requirements for each record and the smaller the number of records contained by the index, the smaller the storage requirements will be, and thus, the more records can be read into a single index page/block at a time, saving I/O costs. Therefore, they are one and the same when talking of index optimization.

> 3) There is not a 50:50 distribution in the 1 / 0
> values and I know for each of the bits which will
> have the greater probability 1 or 0.

With the solution I outline below, it won't matter what distribution you have. No 0's will be stored at all; only records with the bit turned on will be stored... The distribution we will thus have to be concerned about is the distribution of the primary key, which in this case will be unique, which is excellent.

> Q 1) Then would it be better to simply make the 7
> bits separate indexed TINYINT(1) fields?
> Indexing binary values seems inherently
> inefficient, however so does joining tables.

No, absolutely not. Joining tables is *not* inefficient if done correctly, while having 7 TINYINT fields is a) not proper normalization, b) will not be as flexible, and c) won't be effectively indexed. This last reason is important; it would be silly to set up indexes on 7 TINYINT fields with a cardinality of 2. The index would be pointless, as the optimizer would choose to ignore it on most queries, as the number of found rows would be higher than the usual ~30% threshold that MySQL chooses to instead use an index or table scan.

> Q 2) Does the BIT type, mentioned very little in
> the 4.1 docs as available from 5.05 for InnoDB,
> give any advantage for the type of solution
> proposed in ‘Q 1’?

Off the top of my head, I can't remember the exact version in which MySQL BIT data type is optimized for storing <=8 bits in a single byte, similar to the way MS SQL does, versus the prior versions where BIT was simply an alias for TINYINT... In either case, unless you have an extremely large number of records (> 1B), I don't think you'd see a performance impact. A standard MyISAM non-clustered b-tree index organization (and the b+-tree secondary indexes in InnoDB) will both find records based on a unique key with good distribution in 2 or less disk reads, even for up to a billion records, because the branching factor is around 1024 (a developer can correct me if I'm wrong of course). This means that you can retrieve a single record in the tree with only 2 disk/memory reads for up to 1024^3 ( ~1.07 Billion entries).

So, I would recommend a solution such as this (change the bland field names, of course):

CREATE TABLE main_record (
record_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
/* some more fields... */
);

CREATE TABLE record_flag (
flag_id TINYINT UNSIGNED NOT NULL
, description VARCHAR(100) NOT NULL
, PRIMARY KEY (flag_id)
);

CREATE TABLE main_record_flag (
main_record INT UNSIGNED NOT NULL
, record_flag TINYINT UNSIGNED NOT NULL
, PRIMARY KEY (main_record, record_flag)
, UNIQUE INDEX reverse_pk (record_flag, main_record)
);

The "dual indexing" on the main_record_flag table enables superfast equality and range lookups, regardless of the join or where conditions.

For example, to find all the flags set for record # 34567:

SELECT f.flag_id, f.description
FROM record_flag f
INNER JOIN main_record_flag mf
ON f.flag_id = mf.record_flag
WHERE mf.main_record = 34567;

to get all the flags *NOT* set for the same record (note the move of the WHERE condition in the prior example to the ON clause below. This is necessary to force the filter *before* the join occurs, as opposed to after):

SELECT f.flag_id, f.description
FROM record_flag f
LEFT JOIN main_record_flag mf
ON f.flag_id = mf.record_flag
AND mf.main_record = 34567
WHERE mf.main_record IS NULL;

Both of the above examples will utilize the PRIMARY KEY index on main_record_flag, with the left-side of the index (main_record) being used by the index lookup operations.

Now, to get all records where the flag #5 is set (turned on):

SELECT mf.main_record
FROM main_record_flag mf
WHERE mf.record_flag = 5;

pretty simple. This will use the reverse_pk index, since the left side of that index (record_flag) is used in the WHERE filter.

Along those same lines, what about getting all the records which have *both* the 5th and 3rd flags turned on:

SELECT mf1.main_record
FROM main_record_flag mf1
INNER JOIN main_record_flag mf2
ON mf1.main_record = mf2.main_record
WHERE mf1.record_flag = 3
AND mf2.record_flag = 5;

Here, a simple self join accomplishes the AND condition. This query will *ALWAYS* be faster than the equivalent bitwise AND operation, because this query will cache the index rows needed, and perform eq_ref lookups based on both the PRIMARY KEY and reverse_pk indexes on main_record_flag. The EXPLAIN from the statement above will show that the optimizer will first narrow the mf1 resultset to a list of main_record ID values by using the reverse_pk index (because the filter is on the left-side of that index, the record_flag field). Next, this first resultset will then be intersected with the index records in the PRIMARY KEY index on master_record_field, using an eq_ref access strategy, paired with a const value in the right-hand side of the index lookup. (You can check this yourself if you'd like). This kind of query is the perfect example of how many-to-many relationships can be made more performant through the use of what I call a reverse primary key index.

Lastly, what if you wanted to get all the record IDs of records having the 3rd and 5th flag turned on, but having the 2nd flag turned *off*? Here's how:

SELECT mf1.main_record
FROM main_record_flag mf1
INNER JOIN main_record_flag mf2
ON mf1.main_record = mf2.main_record
LEFT JOIN main_record_flag mf3
ON mf2.main_record = mf3.main_record
AND mf3.record_flag = 2
WHERE mf1.record_flag = 3
AND mf2.record_flag = 5
AND mf3.master_record IS NULL;

Here, we combine the previous query, with a LEFT JOIN from it's final result to a copy of the PRIMARY KEY index records already cached to find those not containing any records with the record_flag of 2.

Hope this demonstrates the solution qell enough. Let me know if I can explain any of the concepts in a different way for you.

Cheers,

Jay Pipes
Community Relations Manager, North America, MySQL Inc.

Got Cluster? http://www.mysql.com/cluster
Personal: http://jpipes.com

Navigate: Previous Message• Next Message

Options: Reply• Quote

Subject

Views

Written By

Posted

are index scans possible with bitwise comparison

9660

Gary Byatt

July 21, 2005 12:38AM

Re: are index scans possible with bitwise comparison

6086

Jay Pipes

July 21, 2005 07:14AM

Re: are index scans possible with bitwise comparison

3383

Gary Byatt

July 22, 2005 02:23AM

Re: are index scans possible with bitwise comparison

4710

Jay Pipes

July 22, 2005 12:59PM

Re: are index scans possible with bitwise comparison

2758

Gary Byatt

July 25, 2005 08:00AM

Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.