Re: one table with self joins VS. several tables
Posted by:
Joe Celko
Date: January 03, 2007 12:33PM
>> I am designing a database to store biological names which are arranged hierarchically (family >genus>species etc). <<
Do you have a copy of TREES & HIERARCHIES IN SQL yet?
>> My question: Will I see a performance improvement if I switch to using a single table for all the levels of the hierarchy with self-joins? <<
No. What ou want is one table for the hierarchy, one for the proper names and one for aliases
>> i.e. Should I use a single table with the following fields: NameID(primary), Name, ParentID, ValidID with the ParentId and ValidID self-joined to the correct NameID <<
Ther is no such thing as a "name_id" in a correct data model; an attribute is a name or an identifier and NEVER both by definition. You are attempting an adjacency list model; go with a nested sets model instead. Google Nested sets to find out how (lft, rgt) pairs work -- or buy my book :)
CREATE TABLE Taxonomy
(animal_name VARCHAR(25) NOT NULL
REFERENCES Animals(animal_name),
lft INTEGER NOT NULL,
rgt INTEGER NOT NULL,
bio_category CHAR(10) NOT NULL
CHECK (bio_category IN ('family', 'genus', 'species', etc)),
etc.);
CREATE TABLE Animals
(animal_name VARCHAR(25) NOT NULL PRIMARY KEY,
etc.);
CREATE TABLE JuniorNames
(animal_name VARCHAR(25) NOT NULL
REFERENCES Animals (animal_name),
junior_name VARCHAR(25) NOT NULL,
PRIMARY KEY (animal_name, junior_name),
etc.);