MySQL Forums

full text hashing
Posted by: mikhail malamud
Date: May 07, 2005 08:07PM

This question does not deal directly with the usage of MySQL but rather with the underlying technologies and algorithms used in MySQL. I was hoping some developers or other knowledgeable folks could share some thoughts with me.
I am developing an application, part of which is an XPath engine. The XPath engine processes documents in XML format that are fed to it by a middleware component. The nature of the document traffic is such that many documents are identical. Since it is quite costly to parse documents into logical trees every time, I am developing a caching mechanism that will cache already parsed objects. Because documents can be quite large, using a plain hash table might not be any more efficient than parsing the documents every time. My questions are the following:

1. When implementing a text index, what type of hash function does MySQL use?

2. Assuming the hash function is h, the document is d, and h(d) = i, where i is the index of a record in the table, how do you deal with the large values of i that h(d) generates?

2a. What's a good ratio between document size and hash size (load factor) for optimal retrieval and minimum collisions?

3. How do you deal with collisions? If you are using chaining, do you implement another hashing mechanism when searching through the remaining candidates for a match?
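To make the questions concrete, here is a minimal Python sketch of the kind of cache I have in mind. It is only an illustration under my own assumptions and says nothing about how MySQL does it: the class name ParsedDocumentCache, the choice of SHA-1 as h, the bucket count, and the use of xml.etree.ElementTree as a stand-in parser are all hypothetical. The large value h(d) is reduced to a table index with a modulo, and collisions are handled by chaining with a full text comparison.

```python
import hashlib
import xml.etree.ElementTree as ET


class ParsedDocumentCache:
    """Chained hash table keyed by a fixed-size digest of the document text.

    Hypothetical sketch: names, hash choice, and sizes are assumptions.
    """

    def __init__(self, num_buckets=1024, parse=ET.fromstring):
        self.num_buckets = num_buckets
        # One list (chain) per bucket; each entry is (digest, text, parsed).
        self.buckets = [[] for _ in range(num_buckets)]
        self.parse = parse  # stand-in for the real XML-to-tree parser
        self.hits = 0
        self.misses = 0

    def _digest(self, document):
        # h(d): a fixed-size value no matter how large the document is.
        return hashlib.sha1(document.encode("utf-8")).digest()

    def _bucket_index(self, digest):
        # Reduce the very large hash value to a valid table index.
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def get(self, document):
        digest = self._digest(document)
        chain = self.buckets[self._bucket_index(digest)]
        for stored_digest, stored_text, parsed in chain:
            # Cheap digest comparison first; the full text comparison
            # only runs when the digests already match.
            if stored_digest == digest and stored_text == document:
                self.hits += 1
                return parsed
        # Not cached yet: parse once and append to this bucket's chain.
        self.misses += 1
        parsed = self.parse(document)
        chain.append((digest, document, parsed))
        return parsed


cache = ParsedDocumentCache(num_buckets=4)
first = cache.get("<a><b/></a>")
second = cache.get("<a><b/></a>")  # same text, served from the cache
```

In this sketch every document is still read once in full to compute the digest, so the saving is only the parse itself, and keeping the original text in each entry costs memory. Whether that trade-off is worthwhile is part of what I am asking.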

Thanks a lot.

