MySQL Forums

Re: linked data
Posted by: Rick James
Date: March 21, 2009 02:15PM

<key-value-tirade>
My comments apply to any key-value system, not just RDF.

What will you do with the keys and values (predicates and objects)?

- If you are just collecting them, you don't need the power of a relational database. I recommend collecting them in a blob.

- If you are searching on them, then, sure, RDF holds that promise. But I promise you will have scaling problems after several million "items", each composed of dozens of triples, each stored as an individual row in a database table.

If your task is small enough, and you have enough RAM for it to fit in cache, you won't be caught by a serious I/O bottleneck. Instead, you will be hit by a less serious CPU bottleneck.

Queries end up looking like this (one self-JOIN per predicate being tested):
SELECT t1.subj
   FROM triples t1
   JOIN triples t2 ON t2.subj = t1.subj
   JOIN triples t3 ON t3.subj = t1.subj
   WHERE t1.pred = 'foo' AND t1.obj = 'foovalue'
     AND t2.pred = 'bar' AND t2.obj = 'something'
     AND t3.pred = 'zyx' AND t3.obj = 'else';
In RDF, the three columns are called subject/predicate/object.
In other key-value systems they may be called id/key/value.
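
To make the shape of that self-JOIN concrete, here is a minimal runnable sketch. It uses Python's built-in sqlite3 module in place of MySQL, purely so the example is self-contained; the table name triples follows the query above, while the index idx_pred_obj and the sample rows are hypothetical. Note that every additional predicate you test adds one more copy of the table to the JOIN.

```python
import sqlite3

# In-memory SQLite stands in for MySQL (an assumption for illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj INTEGER, pred TEXT, obj TEXT)")
# Hypothetical index so each JOIN leg can be located by predicate + value.
conn.execute("CREATE INDEX idx_pred_obj ON triples (pred, obj, subj)")

rows = [
    (1, "foo", "foovalue"), (1, "bar", "something"), (1, "zyx", "else"),
    (2, "foo", "foovalue"), (2, "bar", "something"), (2, "zyx", "other"),
    (3, "foo", "nope"),     (3, "bar", "something"), (3, "zyx", "else"),
]
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", rows)

# Three conditions -> three copies of the triples table.
query = """
    SELECT t1.subj
      FROM triples t1
      JOIN triples t2 ON t2.subj = t1.subj
      JOIN triples t3 ON t3.subj = t1.subj
     WHERE t1.pred = ? AND t1.obj = ?
       AND t2.pred = ? AND t2.obj = ?
       AND t3.pred = ? AND t3.obj = ?
"""
params = ("foo", "foovalue", "bar", "something", "zyx", "else")
matches = [r[0] for r in conn.execute(query, params)]
print(matches)  # [1]
```

Only subject 1 satisfies all three predicate/value pairs; subject 2 fails on zyx and subject 3 fails on foo.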

Note: If you do the 'right' thing and normalize the tables, then the predicates and objects are not literals but ids used to JOIN to other table(s). This makes the SELECT struggle even harder, _especially_ if you are checking for an object (value) in some range. For that reason, I insist that continuous values (counts, measurements, dates) must not be normalized.
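
A rough sketch of the difference, again with sqlite3 standing in for MySQL. All table names (preds, objs, triples_n, triples_d), the index idx_d, and the 'count' values are hypothetical. In the normalized form, the range test needs two extra JOINs and is applied to a converted lookup value, so no index can drive it as a range scan; in the non-normalized form the same question can be answered by a single range scan on (pred, num).

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: predicates and objects are ids into lookup tables.
    CREATE TABLE preds     (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE objs      (id INTEGER PRIMARY KEY, val  TEXT UNIQUE);
    CREATE TABLE triples_n (subj INTEGER, pred_id INTEGER, obj_id INTEGER);
    -- Not normalized: the continuous value is stored inline as a number.
    CREATE TABLE triples_d (subj INTEGER, pred TEXT, num REAL);
    CREATE INDEX idx_d ON triples_d (pred, num, subj);
""")

counts = {1: 5, 2: 12, 3: 18, 4: 25}  # hypothetical 'count' value per subject
conn.execute("INSERT INTO preds (name) VALUES ('count')")
pred_id = conn.execute("SELECT id FROM preds WHERE name = 'count'").fetchone()[0]
for subj, n in counts.items():
    conn.execute("INSERT OR IGNORE INTO objs (val) VALUES (?)", (str(n),))
    obj_id = conn.execute("SELECT id FROM objs WHERE val = ?", (str(n),)).fetchone()[0]
    conn.execute("INSERT INTO triples_n VALUES (?, ?, ?)", (subj, pred_id, obj_id))
    conn.execute("INSERT INTO triples_d VALUES (?, 'count', ?)", (subj, n))

# Normalized: extra JOINs, and the range is tested on CAST(o.val ...),
# which an ordinary index on objs.val cannot serve as a range scan.
normalized = [r[0] for r in conn.execute("""
    SELECT t.subj
      FROM triples_n t
      JOIN preds p ON p.id = t.pred_id
      JOIN objs  o ON o.id = t.obj_id
     WHERE p.name = 'count'
       AND CAST(o.val AS REAL) BETWEEN 10 AND 20
     ORDER BY t.subj
""")]

# Not normalized: one range scan on idx_d (pred, num) can answer it.
inline = [r[0] for r in conn.execute("""
    SELECT subj FROM triples_d
     WHERE pred = 'count' AND num BETWEEN 10 AND 20
     ORDER BY subj
""")]
print(normalized, inline)  # [2, 3] [2, 3]
```

Both queries return the same subjects; the point is how much work the engine has to do to get there.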

A kludge to make RDF semi-scalable is "Property tables". These are effectively ordinary database tables holding copies of selected values: the column names mimic the 'predicates', the cells contain the 'objects', and there is one row per 'subject'. Such tables can be indexed and searched in ways that database engines (like MySQL) handle far more efficiently.

SELECT subj 
   FROM propertyTable
   WHERE foo = 'foovalue'
     AND bar = 'something'
     AND zyx = 'else';
If you have a composite index on (foo, bar, zyx), this query can be done 'instantly'. And the table is _much_ smaller than the triples table (hence more easily cached).
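
Here is a minimal sketch of a Property table with that composite index, using sqlite3 in place of MySQL. The column names come from the example query; the index name idx_foo_bar_zyx and the rows are hypothetical. SQLite's EXPLAIN QUERY PLAN is used only to confirm that the index is chosen; MySQL's EXPLAIN output looks different.

```python
import sqlite3

# SQLite in memory stands in for MySQL (an assumption for illustration).
conn = sqlite3.connect(":memory:")
# One row per subject; one column per selected predicate.
conn.execute("""
    CREATE TABLE propertyTable (
        subj INTEGER PRIMARY KEY,
        foo  TEXT,
        bar  TEXT,
        zyx  TEXT
    )
""")
# Composite index covering all three searched columns.
conn.execute("CREATE INDEX idx_foo_bar_zyx ON propertyTable (foo, bar, zyx)")
conn.executemany(
    "INSERT INTO propertyTable VALUES (?, ?, ?, ?)",
    [(1, "foovalue", "something", "else"),
     (2, "foovalue", "something", "other"),
     (3, "nope", "something", "else")],
)

sql = "SELECT subj FROM propertyTable WHERE foo = ? AND bar = ? AND zyx = ?"
params = ("foovalue", "something", "else")
matches = [r[0] for r in conn.execute(sql, params)]

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = " ".join(str(row[-1]) for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))
print(matches)  # [1]
print(plan)     # should name idx_foo_bar_zyx
```

Compared with the self-JOIN version, this is a single-table lookup with one index probe.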

But once you have pulled out the things you might want to search on and put them into Property tables, there is no need to store the triples in separate rows (and it is a big waste to do so). Instead, marshal them up and store them in a blob. (I like JSON + zlib::compress.)
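
A minimal sketch of the marshal-into-a-blob idea, with Python's json and zlib modules standing in for the JSON + zlib::compress combination and sqlite3 standing in for MySQL. The helper names pack/unpack, the table items, and the sample pairs are hypothetical. Repeated predicates are kept as lists so no object is lost.

```python
import json
import sqlite3
import zlib

def pack(pairs):
    """Marshal one subject's (predicate, object) pairs into a compressed blob."""
    doc = {}
    for pred, obj in pairs:
        doc.setdefault(pred, []).append(obj)  # a predicate may repeat
    return zlib.compress(json.dumps(doc, sort_keys=True).encode("utf-8"))

def unpack(blob):
    """Reverse of pack(): decompress and parse back into a dict of lists."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# SQLite stands in for MySQL; in MySQL the column would be a BLOB type.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (subj INTEGER PRIMARY KEY, doc BLOB)")

pairs = [("foo", "foovalue"), ("bar", "something"), ("zyx", "else"),
         ("tag", "a"), ("tag", "b")]
conn.execute("INSERT INTO items VALUES (?, ?)", (1, pack(pairs)))

blob = conn.execute("SELECT doc FROM items WHERE subj = 1").fetchone()[0]
print(unpack(blob))
```

The whole item comes back in one row fetch; searching is left to the Property tables.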
</key-value-tirade>


Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.