Re: Does a normalized design lead to complex queries?
Posted by: phlype johnson
Date: October 19, 2006 06:32AM

To illustrate better my question on whether normalized designs lead to more complex queries yes or no, I have prepared an example. The example is a database with the following tables:
*table person with fields:
-persid: autoincrement id
-name: name of the person
*table material with fields:
-materialid: autoincrement id
-material: name of the material eg "wood"
*table color with fields:
-colorid: autoincrement id
-color: name of the color eg "green"
*table persmaterial with fields:
-persmatid: autoincrement id
-persid: link to table person
-materialid: link to table material
*table perscolor with fields:
-perscolorid: autoincrement id
-persid: link to table person
-colorid: link to table color
Using these tables it is straightforward to store the preference of a certain person for colors and materials.
Now use the statements at the end of the post to create the tables and populate them with intial values. Next we want to find all persons who like the colors red or blue and also like the color green. Furthermore the person should have a preference for iron as material. I understand there are several ways to obtain the requested result.
Option 1:
The initial query can be written in pseudocode as
Find all persons that (like as color (red OR blue) AND green) AND that (like as material iron). I do not see right away how to write this as a query with joins so I rewrite the pseudo query as follows:
Find all persons that ((like as color red AND green) AND that (like as material iron)) OR ((like as color blue AND green) AND that (like as material iron))
this leads to the following query
(SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=1 OR pc.colorid=2) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1)) UNION
(SELECT p.persid FROM person p, perscolor pc, persmaterial pm WHERE p.persid=pc.persid AND (pc.colorid=2 OR pc.colorid=3) AND p.persid=pm.persid AND pm.materialid=2 GROUP BY p.persid HAVING (count(DISTINCT pc.colorid)=2 AND count(DISTINCT pm.materialid)=1))
You can say that for someone familiar to sql this is not overly complicated but the problem I see is the following. I want to use this database in a webapplication where a user will be able to find persons with certain preferences. The user will be allowed to play around with AND and ORs in his request. To state it differently the user will have the possibility to type in a pseudo query like the example I used above. As you have seen above, to get to the final sql query I need to do a transformation of the pseudo query to the sql query. Secondly, if we have many criteria you can see easily that the number of unions can grow exponentially if the user starts playing around with ANDs and ORs for a certain property.

Option 2:
The pseudo query
Find all persons that (like as color (red OR blue) AND green) AND that (like as material iron)
can straightforwardly be coded in a query using subqueries:
SELECT persid FROM person p WHERE
(EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=1 AND p.persid=pc.persid)
OR
EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=3 AND p.persid=pc.persid))
AND
EXISTS(SELECT * FROM perscolor pc WHERE pc.colorid=2 AND p.persid=pc.persid)
AND
EXISTS(SELECT * FROM persmaterial pm WHERE pm.materialid=2 AND p.persid=pm.persid)

Both options are quite different. I am no expert but this is my understanding of the pros and contras of the 2 options:
*option 1:
-difficult to automatically generate the query from the pseudo query
*option 2:
+easy to generate query from pseudo query
-are subqueries not inefficient? If for each subquery you have to go over the parameters outside the subquery, it means pretty much that for each subquery you go over the entire person table; I might be wrong and subqueries might be more optimized but I have no idea on this

Overall, I would like to know
*if there are other options to translate the above pseudo query into an sql query?
*what of the otpions (proposed + new ones) are best from a performance point of view?

-- phpMyAdmin SQL Dump
-- version 2.6.1
-- http://www.phpmyadmin.net
--
-- Host: localhost
-- Generation Time: Oct 19, 2006 at 01:13 PM
-- Server version: 4.1.9
-- PHP Version: 4.3.10
--
-- Database: `aston`
--

-- --------------------------------------------------------

--
-- Table structure for table `color`
--

CREATE TABLE `color` (
`colorid` int(11) NOT NULL auto_increment,
`color` varchar(30) NOT NULL default '',
PRIMARY KEY (`colorid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=5 ;

--
-- Dumping data for table `color`
--

INSERT INTO `color` VALUES (1, 'red');
INSERT INTO `color` VALUES (2, 'green');
INSERT INTO `color` VALUES (3, 'blue');
INSERT INTO `color` VALUES (4, 'yellow');

-- --------------------------------------------------------

--
-- Table structure for table `material`
--

CREATE TABLE `material` (
`materialid` int(11) NOT NULL auto_increment,
`material` varchar(30) NOT NULL default '',
PRIMARY KEY (`materialid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=3 ;

--
-- Dumping data for table `material`
--

INSERT INTO `material` VALUES (1, 'wood');
INSERT INTO `material` VALUES (2, 'iron');

-- --------------------------------------------------------

--
-- Table structure for table `perscolor`
--

CREATE TABLE `perscolor` (
`perscolorid` int(11) NOT NULL auto_increment,
`persid` int(11) NOT NULL default '0',
`colorid` int(11) NOT NULL default '0',
PRIMARY KEY (`perscolorid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=7 ;

--
-- Dumping data for table `perscolor`
--

INSERT INTO `perscolor` VALUES (1, 1, 1);
INSERT INTO `perscolor` VALUES (2, 1, 2);
INSERT INTO `perscolor` VALUES (3, 2, 1);
INSERT INTO `perscolor` VALUES (5, 3, 3);
INSERT INTO `perscolor` VALUES (6, 3, 2);

-- --------------------------------------------------------

--
-- Table structure for table `persmaterial`
--

CREATE TABLE `persmaterial` (
`persmatid` int(11) NOT NULL auto_increment,
`persid` int(11) NOT NULL default '0',
`materialid` int(11) NOT NULL default '0',
PRIMARY KEY (`persmatid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=6 ;

--
-- Dumping data for table `persmaterial`
--

INSERT INTO `persmaterial` VALUES (1, 1, 1);
INSERT INTO `persmaterial` VALUES (2, 1, 2);
INSERT INTO `persmaterial` VALUES (3, 2, 1);
INSERT INTO `persmaterial` VALUES (5, 3, 2);

-- --------------------------------------------------------

--
-- Table structure for table `person`
--

CREATE TABLE `person` (
`persid` int(11) NOT NULL auto_increment,
`name` varchar(30) NOT NULL default '',
PRIMARY KEY (`persid`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=4 ;

--
-- Dumping data for table `person`
--

INSERT INTO `person` VALUES (1, 'john');
INSERT INTO `person` VALUES (2, 'emily');
INSERT INTO `person` VALUES (3, 'liz');

Options: ReplyQuote


Subject
Written By
Posted
Re: Does a normalized design lead to complex queries?
October 19, 2006 06:32AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.