How to model an i18n table and how to query it
Posted by:
Aziz
Date: July 23, 2005 07:24AM
Hi!
I've got a table in my database, where I store the multilingual message resources for the GUI of my web application. The current structure of this table looks like this:
[sql]
CREATE TABLE i18n (
IN_ID MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
N_ID TINYINT(4) NOT NULL, //FK to another table with namespace identifiers (e.g. MENU).
string VARCHAR(150) NOT NULL, //The universal, unique identifier of a resource (e.g. DB_NO_CONNECTION)
annot TEXT NOT NULL, //An annotation to the english text being translated
en TEXT NOT NULL, //The english text and the source for all translations.
de TEXT NOT NULL, //German text
ur TEXT NOT NULL, //Urdu text
en_ts TIMESTAMP NOT NULL DEFAULT 0000-00-00 00:00:00, //Date of creation/last modification
de_ts TIMESTAMP NOT NULL DEFAULT 0000-00-00 00:00:00, //-"-
ur_ts TIMESTAMP NOT NULL DEFAULT 0000-00-00 00:00:00, //-"-
PRIMARY KEY(IN_ID),
);
[/sql]
I'm aware of the fact that this table is far from being normalized, but it was satisfying enough as a first draft and I didn't want to change it, because I wanted to make faster progress in my project. Now I'd like to normalize the table, which I would do like so:
[sql]
CREATE TABLE i18n (
IN_ID MEDIUMINT(8) UNSIGNED NOT NULL AUTO_INCREMENT,
lang ENUM('en','de','ur') NOT NULL DEFAULT 'en',
N_ID TINYINT(4) NOT NULL,
string VARCHAR(150) NOT NULL,
annot TEXT NOT NULL,
resource TEXT NOT NULL,
res_ts TIMESTAMP NOT NULL DEFAULT 0000-00-00 00:00:00,
PRIMARY KEY(IN_ID, lang),
);
[/sql]
The en,de,ur columns are merged into the resource column and the en_ts,de_ts,ur_ts are merged into res_ts. Furthermore, I created a new column named lang and changed the primary key to be a combination of IN_ID and lang. Normalizing the table a bit further is still possible, but I'll do this at a later time. Having a multi-column PK, consisting of a numeric ID and an ENUM indicating the language, is necessary because it enables me to determine which resources belong to each other, for instance:
IN_ID | lang | resource
-------------------------
1 | 'en' | English
1 | 'de' | Englisch
1 | 'ur' | who knows?
2 | 'en' | German
2 | 'de' | Deutsch
2 | 'ur' | who knows?
The new version of the table has the advantage that adding a new language doesn't require changing the structure of the table. You can easily add new languages by extending the ENUM list in the lang column. However, introducing such a table structure poses the problem of building a sophisticated query, which selects all resources where each language has its own column. For example:
IN_ID | en | de | ur
-------------------------------
1 | English | Englisch | who knows?
2 | German | Deutsch | who knows?
So what does the query have to look like if I want to have such a result set?
I had only two ideas regarding this, each with some shortcomings:
1.) GROUP BY clause in combination with the GROUP_CONCAT aggregate function:
[sql]
SELECT IN_ID, GROUP_CONCAT(resource SEPARATOR '<ยง$>') AS res_list, GROUP_CONCAT(lang SEPARATOR ';')
FROM i18n
GROUP BY IN_ID
[/sql]
The problems I see with this query are:
.) My idea is to use PHP to parse the concatenated string into an array, therefore the separator defined in the first GROUP_CONCAT mustn't exist in the resources, which possibly can.
.) The second GROUP_CONCAT is needed to be able to know what language the resources in the list are in. If the order of both concatenation operations aren't exactly the same this can lead to problems in my application.
2.) Multiple self-joins with sub-selects:
[sql]
SELECT en.IN_ID, en.resource AS en, de.resource AS de, ur.resource AS ur
FROM (SELECT IN_ID, resource FROM i18n WHERE lang='en') as en,
(SELECT IN_ID, resource FROM i18n WHERE lang='de') as de,
(SELECT IN_ID, resource FROM i18n WHERE lang='ur') as ur
WHERE en.IN_ID = de.IN_ID AND en.IN_ID = ur.IN_ID
[/sql]
The problems I see with this query are:
.) Every additional language requires another self-join and sub-select. I don't know what you think, but this query seems to be kind of an overkill if you attempt to select all languages. Just imagine if you have 20 languages or more...
.) The self-join is done in the WHERE clause, which is much stricter than a LEFT JOIN. A left JOIN would also return NULL results, indicating that there exists no translation of a specific resource. The WHERE clause only returns non-NULL results, therefore only resources are returned that exist in all the sub-tables. I tried to do a sub-select in a LEFT JOIN clause but MySQL (4.1.x) refused to execute the query.
Hopefully I have been able to explain my problem as lucid as possible. I'm looking forward to a reply from a (My)SQL expert.
Regards,
Aziz