MySQL Forums
Forum List  »  Performance

Re: Trying to Compete with MS SQL but no luck
Posted by: Jay Pipes
Date: July 03, 2005 09:07AM

My best advice would be: normalize the database. Storing text like this (which represents a number of flags/bits) will perform poorly as you get into the hundreds of thousands of records. Why? Because, even if you have an index on a VARCHAR(500) field, you're not storing the information in as efficient a manner as you could. A SUBSTRING() type function would be inefficient, since you aren't going to be able to use an index on the field (because you cannot isolate the field on one side of the equation. And, you're sorting through a massive amount of data that is really pointless if you're only interested in a very small portion of it.

A more efficient mechanism would be to have a one-to-many relationship defined from the main table to a detail table, with records only stored for "on" bits. Also, unless you really plan on having more than 4 *billion* records, don't use BIGINT. Use INT. You're using twice the space to store the same amount of data, since the high bytes are all redundant.

So, my advice would be to have a schema like:

CREATE TABLE master_record (
master_id INT UNSIGNED NOT NULL
, PRIMARY KEY (master_id)
);

CREATE TABLE flag_type (
flag_position SMALLINT UNSIGNED NOT NULL
, description VARCHAR(100) NOT NULL
, PRIMARY KEY (flag_position)
);

CREATE TABLE master_flag (
master_id INT UNSIGNED NOT NULL
, flag_position SMALLINT UNSIGNED NOT NULL
, PRIMARY KEY (master_id, flag_position)
);

The flag_type table is simply a lookup table which you would use to give descriptive names for the various flags represented in the strings you're currenlty using.

You would have a *very* simple script that inserts records into the master_flag table by looping through the 500-character string array and inserting a record into for the master_id at each of the string positions in which a 1 was found...

For example, in PHP we'd do something like:

<?php

$master_id = 827736;
$flag_string = '100010011101001'; // obviously, longer than this...

$len_string=strlen($flag_string);
for ($i=0;$i<$len_string;++$i) {

$flag_on = (bool) $flag_string{$i};

if ($flag_on) {
$sql = "INSERT INTO master_flag (master_id, flag_position) VALUES (" . $master_id . "," . $i . ")";
// use you're favorite DB layer to execute the SQL string...
}

}
?>

Once the data is loaded, you can find which flags are on for master record ID=827736 using:

SELECT ft.flag_position, ft.description
FROM flag_type ft
INNER JOIN master_flag mf
ON ft.flag_position = mf.flag_position
WHERE mf.master_id = 827736;

You can likewise find the flags that are OFF by using:

SELECT ft.flag_position, ft.description
FROM flag_type ft
LEFT JOIN master_flag mf
ON ft.flag_position = mf.flag_position
AND mf.master_id = 827736
WHERE mf.flag_position IS NULL;

Both these queries should run in milliseconds with millions of records. In both cases, the index on master_flag (master_id, flag_position) will be used to query for the records. Using standard b-tree indexing in MyISAM, MySQL shouldn't have to even look at the .MYD file, as all the data will be in the index leaves. So, when running an EXPLAIN on either of the above SQL statements, you will see in the Extra column: "Using where, Using index", with the ref column containing (const, ref).

Total index record size in this case would be 6-bytes (INT + SMALLINT) + 4-byte record pointer for a total of 10 bytes. Compare that to an "index" (which wouldn't relly be used of a VARCHAR(500) and BIGINT identifier with a total of 512 bytes per row (500 + 8 + 4). More index records will fit on a single 1024 byte MyISAM index block, (only 1 would fit for the VARCHAR index), meaning more records can be compared in a single pass.

Jay Pipes
Community Relations Manager, North America, MySQL Inc.

Got Cluster? http://www.mysql.com/cluster
Personal: http://jpipes.com

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Trying to Compete with MS SQL but no luck
1813
July 03, 2005 09:07AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.