On Functional Dependencies.
Posted by: Juan Casanova
Date: January 25, 2012 10:26AM
Hello, my first post in this forum (and probably one of the last too).
First, let me briefly introduce myself; I am a 20 year old Computer Science and Mathematics student and passionate who has been working at his first job in the sector for the past 6 months. The doubt and question I am going to ask has been inside my head since the moment I studied the formal relational database paradigm and at the same time the implementation of SQL for it.
I know this is not a forum for this kind of questions. This is a forum about MySQL, not about the SQL standard (and as far as I have been able to read, functional dependencies are not even a part of the SQL standard); and this subforum is about constraints within MySQL, not about functional dependencies, but it is what is most similar to functional dependencies, I guess.
Now onto the topic itself. I have always wondered, as I said, why in hell does SQL not offer the possibility to define functional dependencies and use them for all the different things they could be used?
As far as I see it, you could analyse it like this.
Advantages of having functional dependencies:
- Would allow for more clearly defined meanings and constraints of databases.
- Same as foreign keys allow you to make sure that the database data is consistent within the meaning of your universe, functional dependencies would further allow for this. It would be harder to "fuck up" with the data.
- Auto-normalization of the database would be doable, as there are well thought algorithms for doing this knowing the functional dependencies. In the cases where the amount of data is very high and normalization makes database treatment too slow, you could simply skip auto-normalization; or possibly add a more complex algorithm with parameters setting up "how much normalization" you wish to put on your database. Overall, having the possibility of auto-normalizing would never be bad and would for sure be good in certain cases (it would not be unused, for small databases at least it would be highly recommendable).
* Search and index efficiency.
- This is probably by long the most important advantage. Imagine a database table where you have a column named "Type" and a column named "Subtype". It is not far-fetched to think (and imagine in this particular case it is like that) that you have the functional dependency Subtype -> Type. That is, subtypes are actually sub - types of the types, you can't have the same subtype for two different types. Without functional dependencies, the database has no information about the relationship between these 2 columns. Now, you could make 2 indexes, one for Type and one for Subtype; or you could make a compound index of (Type, Subtype). Now imagine the functional dependencies are more complex, and you have yet another third column Class, with the functional dependency Subtype -> Class; and no relationship between Type and Class.
The result is that you can make 3 simple indexes, 2 compound-indexes or just leave one of the columns un-indexed.
Now, if you had functional dependencies, you could exclusively define the index on Subtype and still use it partially for searchs or joins on the Type or Class columns. Imagine you had Subtype indexed and you wanted to join the table with antoher table through the Type column. You could create a temporary table with the Subtype and Type of each register of your table ordered by subtype (this would be simply extending the index by adding the Type column, order 1.000.000); then proceed to do the linear search through Type on that table, and once you found a row with a matching Type, you know that all the rows with the same Subtype have therefore the same Type and can bulk-join them as your table is ordered by Subtype. Then simply continue to linearly look for more matching Types.
In the opposite case. B -> A, C -> A. You could have the index only in A; and the efficiency gain wouldn't be that big, but you could still, to search by B, do a linear search on B until you found an equivalence. Therefore, you know that all other ocurrences of that value b will have a specific value a on the A column (the one that matched). Therefore, for that row of the <<other>> column you can restrict the search to the subset which has a on the A column, which you can separate because A is indexed, and later on remove the row from the <<other>> table and continue the linear join. The gain here is significantly lower and probably deprecatable, but in the first case it isn't.
Disadvantages of functional dependencies:
- They are very easy to define and as defined over columns and no data needed to be "maintained" when adding/removing rows (it is a column constraint, as the way its defined); it wouldn't take much space to keep so it isn't a disadvantage on that sense.
- They can make operations slower when adding new rows, but since typically you will have an index on the left attribute of the dependency (A in A->B), it will not be that slow. You could even force this index the same way as an index is forced on a foreign key. In any case, this can be controlled by simply not overkilling functional dependencies and LETTING THE DATABASE DESIGNER THINK IF IT IS WORTH USING FUNCTIONAL DEPENDENCIES.
- You have to program the algorithms??? I don't think lazyness is a disadvantage if there are actual advantages to it.
I know that all of these things are well-known (for those that know them well) and etc... but I wanted to explain them to avoid discussing with other people afterwards about what I have already thought. Overall, I just don't understand the reasoning for simply not adding functional dependencies to SQL.
Thanks in advance for reading and for your comments.