Re: What's the worst that can happen with below database design? Immediately or in the future?
> i think it decreases search performance a lot, and all data fetches requires json_decode in php.
Either MySQL parses the fields its way, or json_decode parses the fields. I doubt there is much performance difference. The #1 MySQL performance cost is I/O. #2 is locating rows in the buffer pool after the I/O has pulled them into cache. I would put bursting the row apart at about #3.
SELECT mt.*
FROM main_table mt
LEFT JOIN attribute_table1 at1 ON at1.primary_key = mt.primary_key
LEFT JOIN attribute_table2 at2 ON at2.primary_key = mt.primary_key
WHERE mt.field1 LIKE '%Bamboozle%'
AND at1.field1 = '501835'
AND at2.field5 <> 'NOTTHIS'
That has to be performed as follows:
1. Scan the entire mt, filtering on field1. (A leading wildcard on the LIKE prevents the use of any index.) For each matching row:
2. Do a point query via primary_key into at1. If field1 is not '501835', remove the row from consideration.
3. Do the same for at2, removing the row if field5 is 'NOTTHIS'.
Total rows scanned: The number of rows in mt (you mentioned 10K?), plus a (hopefully) smaller number of rows in at1 and at2. If mt cannot be cached, then the query will be I/O-bound even after the first time.
Since they are _LEFT_ JOINs (presumably deliberately), the optimizer must start with mt, in spite of the likelihood that at1.field1 = '501835' would be very selective.
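You can see that plan with EXPLAIN. Roughly (the exact output depends on your version and table stats), expect a full scan on mt and primary-key lookups into the attribute tables:

```sql
EXPLAIN
SELECT mt.*
FROM main_table mt
LEFT JOIN attribute_table1 at1 ON at1.primary_key = mt.primary_key
LEFT JOIN attribute_table2 at2 ON at2.primary_key = mt.primary_key
WHERE mt.field1 LIKE '%Bamboozle%'
  AND at1.field1 = '501835'
  AND at2.field5 <> 'NOTTHIS';
-- Expect roughly: type=ALL on mt (full table scan),
--                 type=eq_ref on at1 and at2 (point lookups by PRIMARY KEY).
```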
The alternative is to understand the application well enough to realize that
1. at1.field1 is not optional,
2. it is likely to be used for lookups, and
3. it should be moved into the main table so that an index can be used.
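Concretely, that migration might look like this. (The column name account_number and the VARCHAR(10) datatype are assumptions; match whatever at1.field1 really is.)

```sql
ALTER TABLE main_table
    ADD COLUMN account_number VARCHAR(10) NOT NULL,
    ADD INDEX(account_number);

-- Backfill from the attribute table, then stop writing to at1:
UPDATE main_table mt
JOIN attribute_table1 at1 ON at1.primary_key = mt.primary_key
SET mt.account_number = at1.field1;
```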
Now, it looks more like:
SELECT mt.*
FROM main_table mt
LEFT JOIN attribute_table2 at2 ON at2.primary_key = mt.primary_key
WHERE mt.field1 LIKE '%Bamboozle%'
AND mt.account_number = '501835'
AND at2.field5 <> 'NOTTHIS'
-- with this on mt:
INDEX(account_number)
Now the query
1. hits the new index to locate one (or a small number of) rows with 501835,
2. double checks for '%Bamboozle%'
3. moves on to at2.
This adds up to a few (or few dozen) rows, not 10K+. It is unlikely to be I/O-bound, even for a huge dataset.
Then... taking the JSON step would mean putting mt.field1 and at2.field5 into a JSON column in mt:
CREATE TABLE main_table (
id ...,
account_number ...,
json TEXT NOT NULL,
PRIMARY KEY(id),
INDEX(account_number)
);
-- and the query becomes
SELECT mt.*
FROM main_table mt
WHERE mt.account_number = '501835';
-- plus code in PHP to filter for 'Bamboozle' and to exclude 'NOTTHIS'.
This is even less effort on the part of MySQL (since there is no JOIN). But you pay for it in PHP either way: with the attribute tables it's a lot of PHP code to construct all the LEFT JOINs; with JSON it's a lot of PHP code to do the filtering on the secondary search criteria.
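The PHP-side filtering might look something like this sketch. (The JSON keys 'field1' and 'field5' are assumptions about how the attributes are laid out inside the json column.)

```php
<?php
// $rows: result of  SELECT * FROM main_table WHERE account_number = '501835'
$matches = [];
foreach ($rows as $row) {
    // Burst the JSON column into an associative array.
    $attrs = json_decode($row['json'], true);
    if ($attrs === null) {
        continue;  // malformed JSON -- skip (or log it)
    }
    // Re-apply the secondary criteria that MySQL no longer sees:
    if (stripos($attrs['field1'] ?? '', 'Bamboozle') === false) {
        continue;  // stands in for  LIKE '%Bamboozle%'
    }
    if (($attrs['field5'] ?? '') === 'NOTTHIS') {
        continue;  // stands in for  field5 <> 'NOTTHIS'
    }
    $matches[] = $row;
}
```

Note that stripos() is case-insensitive, matching the usual case-insensitive collation behavior of LIKE.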