Habib Ayob wrote:
> Hi
>
> I have a problem with my stored procedures because
> they're way to slow. The database has a Data
> Warehouse Star Schema design. 1 fact table related
> to 7 dimensions or parent tables. The parent
> tables all have a similar design which is a PK
> field and one or two other fields which will be
> unique. For each of the tables I have a stored
> procedure which inserts a record if it doesn't
> already exist and returns the auto generated id.
>
> The function which I use looks like this:
>
> DELIMITER $$
>
> DROP FUNCTION IF EXISTS
> `w3caccdm`.`sp_protocol_method_insert` $$
> CREATE FUNCTION
> `sp_protocol_method_insert`(pprotocol
> VARCHAR(100), pmethod VARCHAR(100)) RETURNS
> int(11)
> BEGIN
> DECLARE result INT(11);
> SELECT id INTO result FROM
> protocol_method_dimension WHERE protocol =
> pprotocol AND request_method = pmethod;
> IF result IS NULL THEN
> INSERT INTO
> protocol_method_dimension(protocol,
> request_method) VALUES (pprotocol, pmethod);
> SELECT id INTO result FROM
> protocol_method_dimension WHERE protocol =
> pprotocol AND request_method = pmethod;
> END IF;
> RETURN result;
> END $$
>
> DELIMITER ;
>
> I have replaced the 2nd SELECT statement with
>
> SET result = last_insert_id();
>
> The fact table stored procedure takes all the
> values including eg. protocol and method, and
> inserts it into the relevant table by using the
> relevant stored function and uses the returned
> keys to insert a record into the fact table.
>
> In 1.5 hours it didn't finish processing
> approximately 300000 records. With Batch
> processing the same amount of data will take less
> than 10 minutes.
>
> Any help or changes to the design welcome.
>
> Habib
Habib,
Is all you're trying to do with this stored function is to get the last ID of an inserted row (if it inserts)?
If that's the case, you'd be better served by using INSERT IGNORE (with a unique index on protocol, request_method) and then using Statement.getGeneratedKeys() in Java, which should reduce your round-trips quite a bit. (see
http://dev.mysql.com/doc/refman/5.0/en/insert.html and
http://dev.mysql.com/doc/refman/5.0/en/cj-retrieve-autoinc.html)
If you want to still use your function, you could of course batch it up directly by duplicating the function call over and over again in the SELECT.
The thing you have to remember is that _every_ network round-trip takes time, so you're never going to beat "batch" processing using something like "LOAD DATA INFILE" that you clean up after the load with thousands of calls to Statement.executeUpdate() that takes a few milliseconds for every insert because of going over the network.
-Mark
Mark Matthews
Consulting Member Technical Staff - MySQL Enterprise Tools
Oracle
http://www.mysql.com/products/enterprise/monitor.html