Basically, I have multiple hashes associated with an entity:
hash1: Entity 1
hash2: Entity 1
...
hash6: Entity 2
hash7: Entity 2
Consequently, I'm attempting to create a hashes table that looks something like this:
hash (unique primary key), entity_id (auto increments with the *group*)
For example, if I inserted the above example information, I'd like it to be reflected in the hashes table like so:
hash | entity_id
-------|------------
hash1 | 1
hash2 | 1
hash6 | 2
hash7 | 2
hash8 | 2
hash13 | 3
hash14 | 3
From there, things get a bit more tricky. Say I want to add some new hashes, all for the same entity but I don't know if its in the tables or not. To decide that, I use the following logic:
[ol]
[li]If none of the new hashes exist in the hashes table, I want to create a new entity_id for that group and use that id for these hashes and in subsequent insert statements.[/li]
[li]If, however, any of the new hashes already exist in the hashes table, I want to merge all of the entity_id's into a new value in the hashes table and any other table that has rows referencing that id. Finally, I want to add any additional data using the new entity_id.[/li]
[/ol]
Here's a quick test case. Say I have these hashes to add for a single entity:
hash1 | ?
hash6 | ?
hash17 | ?
hash18 | ?
The result, after inserting them, should be:
hash | entity_id
-------|------------
hash13 | 3
hash14 | 3
hash2 | 4 -- entity_id modified 1->4, hash2 appeared in the new hashes
hash6 | 4 -- entity_id modified 2->4, hash6 appeared in the new hashes
hash1 | 4 -- entity_id modified, (due to hash2)
hash7 | 4 -- entity_id modified, (due to hash6)
hash8 | 4 -- entity_id modified, (due to hash6)
hash17 | 4 -- added, new hash
hash18 | 4 -- added, new hash
Does this design make any sense? Part of me thinks I'm going to have to do a portion of this with PHP but I am trying to make this as efficient/fast as possible (I will be handling a lot of data with whatever method is chosen). As of now, my workflow is something like this:
*Select all entity_id`s where the hash equals any of the existing hashes
*Select the largest entity_id from the `hash` table
*If no entity_id's exist from step 1, insert appropriate rows with an entity_id of max(entity_id) + 1
*If one or more entity_id exists, change all rows with those entity_id's to an entity_id of max(entity_id) + 1
*Perform steps 3 and 4 on any other tables that reference the entity_id, writing most of the logic in PHP
As you can see, my method neglects MySQL almost entirely. I have a feeling I can use `ON DUPLICATE KEY` effectively here, and I'm wondering if there's a way to skip step 5 entirely (as in other tables automatically update to reference the changes, possibly through a trigger).