Good structure for diversified data
Posted by: Kai Schätzl
Date: October 30, 2013 07:01AM

Hello. I hope this is the correct category for this topic.

I'm working on a project that works on phrases in lots of languages, e.g. there is one phrase in English or German that gets translated to many other languages. For display we usually fetch two languages containing to a certain topic (maybe 5 or 10 or 20 records) and display this side-by-side. With the current table structure I have all language phrases in one record for a certain phrase. This is quite easy.

However, over time we got more and more fields that are language specific. e.g. first there was one field per language for the phrase, so, with 30 languages I have 30 fields per record plus some other fields with identification, codes and what not.

Now I need much more fields per language, e.g. 30 fields for a short phrase, 30 fields for a description, 30 fields for editor lock, 30 fields for comments etc., all specific to the language.

So, instead of having all this in one record/table I think of splitting in two tables. One contains the base data and the other one contains the data per individual language. So, I can reuse this structure again and again and have to add only one field in table 2 instead of 30 if I want to add a new field.

The problem I face is that I cannot generate the desired output of having two (or more) languages *side-by-side* per record with the new structure

The basic structure of the old database is like that:

Views -> Views_Relations -> Modules (=Phrases)
There is 1:N (actually M:N, as different Views can contain the same module) for Views -> Modules (that's why there is Views_Relations in-between). e.g. View no. 1 may have 5 phrase records (modules). So, a query for view 1 results in 5 records that each contain all the 30 languages.

Changing this to the new split database makes something like:

Views -> Views_Relations -> Module_Stamm -> Module_Lang
Where M:N for Views/View_Relations -> Module_Stamm and 1:N for Module_Stamm -> Module_Lang e.g. View no. 1 may have 5 phrase records (modules) as above. And each of the phrase records may have 1 - 30 records (one per language) in Module_Lang. Relations for Views -> Views_Relations are done by a ViewID. But are not really a subject here (as I know the View I can just leave out Views from the join). Relations for Views_Relations -> Module_Stamm and relations for Module_Stamm -> Module_Lang, both are done by the same ItemID that identifies the phrase.

At this point I'm not able to get only 5 records with a query for two languages. I get 5 records per language, so for two languages I get ten.
e.g.
SELECT *
FROM Views_Relations vr
LEFT JOIN Module_Stamm ms ON vr.ItemID = ms.ID
INNER JOIN Module_Lang ml ON ms.ID = ml.ID
WHERE vr.ViewID = 1
AND (ml.LangID = 'de'
OR ml.LangID = 'en')
ORDER BY vr.ItemOrder

or I can do:
SELECT *
FROM Views_Relations vr
LEFT JOIN Module_Stamm ms ON vr.ItemID = ms.ID
INNER JOIN Module_Lang ml ON vr.ItemID = ml.ID
WHERE vr.ViewID = 1
AND (ml.LangID = 'de'
OR ml.LangID = 'en')
ORDER BY vr.ItemOrder

(type of JOIN may not be correct in the end, but doesn't make a difference for now)

I get ten records that are identical except for the different language results. I would like to have the ten records merged to 5. Is this possible by using a different database structure or a different query? I can postprocess the results in PHP and thus get the desired results, but this adds extra overhead and (probably) has to be adapted each time I change the database structure.

I consider myself intermediate in SQL skills but have no experience with stored procedures and such, in case this might help to achieve my goal.

Additional information:
output result may be something like
ID, phrase, other fields
and two rows per item (= two languages)

I want to have this as
ID, phrase_Lang1, phrase_Lang2, other fields
in just *one* row.

Edit: I've just learned about GROUP_CONCAT. With GROUP BY I would indeed reduce to five rows, but put the various languages in *one* field. I need them in two fields (or three, if I query for three languages for instance).

Edit: I found a solution. Just don't know if it is the best one (performance ?) or whether I could use a better table structure. And it gets more complex with additional fields and additional joins (which are necessary to pull up phrases that are dependant on other phrases/topics).
(LF is the field with the phrase.)
SELECT * ,
MAX(IF(ml.LangID = 'de', ml.LF, NULL)) as `de`,
MAX(IF(ml.LangID = 'en', ml.LF, NULL)) as `en`
FROM Views_Relations vr
LEFT JOIN Module_Stamm ms ON vr.ItemID = ms.ID
INNER JOIN Module_Lang ml ON vr.ItemID = ml.ID
WHERE vr.ViewID = 1
AND (ml.LangID = 'de'
OR ml.LangID = 'en')
GROUP BY vr.ItemID
ORDER BY vr.ItemOrder

Thanks for any hints!

Thanks.



Edited 4 time(s). Last edit at 10/30/2013 08:17AM by Kai Schätzl.

Options: ReplyQuote


Subject
Written By
Posted
Good structure for diversified data
October 30, 2013 07:01AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.