MySQL Forums
Forum List  »  Performance

Re: Performance of ON DUPLICATE KEY and other options?
Posted by: Jay Pipes
Date: September 23, 2005 03:33PM

Mark Modrall wrote:
> Jay Pipes wrote:
> > I'm not sure I understand you. What is the
> > purpose of updating the inputTable at all?
> >
>
> Well, all tables start empty when you create them.
> In the process of filling up a table like
> inputTable, where there may be multiple
> occurrences of the data you're putting in that you
> don't want duplicated (where you want a unique
> identifier to stand in for the value in normalized
> tables), you have to have some way of knowing if
> the value you want to put in is new or a duplicate
> of a value loaded prior.

I'm fully aware of this.

> When you're running through your raw input, you
> don't know if the value's already in there. You
> could try
> SELECT qid from inputTable where input='?in'
> first and then do an INSERT if you get no
> response.

That was exactly my point. There's no reason to go any further or try an "alternate technique" in this situation.

> Or you could use the ON DUPLICATE KEY UPDATE
> qid=LAST_INSERT_ID(qid) mechanism on your INSERT
> to try both with one statement. My question was
> whether the former approach performed better than
> ON DUPLICATE KEY UPDATE.

See above.

> If the above is long winded and/or missing your
> point, the other thing I think you might mean is
> that there are/were other columns in inputTable
> that also track other aspects about the rows in
> inputTable. I left those out to simplify the
> example and discussion. In my case those things
> involve booleans indicating whether the input was
> as is or changed by a normalization process. I
> can also imagine a case where inputTable might
> have an integer column counting the number of
> different users who used the same value, etc.
>
> So to flesh out the original example with the
> other boolean columns, the non-ON DUP logic would
> look
> SELECT qid FROM inputTable where input='?in'
> if (qid not null)
> UPDATE inputTable SET rawInput=true where
> qid=?qid
> else
> INSERT INTO inputTable (input, rawInput) VALUES
> (?in, ?raw);

I understand you now. OK, here's my best advice for logging search terms and the associated items/stats for searches:

a) Have only a single table, not 2 tables.
b) Structure table with an MD% hash of the text term, and index off that, not the main text. This willl be *much* faster for lookups.
c) Do a quick search on the hashed index before updating search entry stats, otherwise INSERT new search entry record.

Whole process might move something like this (in pseudo-code/stored procedure):

CREATE TABLE search_log (
log_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
, entry_hash CHAR(32) NOT NULL
, entry_text TEXT NOT NULL
, times_used INT UNSIGNED NOT NULL
, last_searched DATETIME NOT NULL
, UNIQUE INDEX (entry_hash)
);

DELIMITER ///
CREATE PROCEDURE insert_search_entry (IN search_text TEXT)
BEGIN
DECLARE entry_id INT;
SELECT @entry_id:=log_id FROM search_log WHERE entry_hash = MD5(search_text);
IF entry_id IS NOT NULL THEN
UPDATE search_log SET times_used = times_used + 1, last_searched = NOW() WHERE log_id = entry_id;
ELSE
INSERT INTO search_log (entry_hash, entry_text, times_used, last_searched)
VALUES (MD5(search_text), search_text, 0, NOW());
END IF;
END ///

This way, you've got the best of both worlds. You've got unique text entries, and fast lookups, combioned with search stats.

HTH,

Jay Pipes
Community Relations Manager, North America, MySQL Inc.

Got Cluster? http://www.mysql.com/cluster
Personal: http://jpipes.com

Options: ReplyQuote


Subject
Views
Written By
Posted
Re: Performance of ON DUPLICATE KEY and other options?
1353
September 23, 2005 03:33PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.