MySQL Forums

Re: full-text algorithm
Posted by: Bálint Hajduk
Date: March 05, 2018 12:42PM

I was short on time (I built this for my MSc final exam), so the program is only the first working version; parts of the algorithm that could make it faster are still missing.
I only had time to test it on Hungarian. I built it because of affixes: at least in Hungarian, 96% of search terms (people usually don't use affixes when searching) can match only about 30% of the written text (where people do use affixes). For example, a simple B-tree index won't match a search for "house" against "in your house" in Hungarian, because there the phrase is a single suffixed word.
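A minimal sketch of the affix problem, not the algorithm from the exam project. The word list, the function names, and the prefix-matching shortcut are all assumptions made for illustration; "házadban" is the Hungarian for "in your house" (ház + -ad + -ban).

```python
# Toy index: each key is a whole word exactly as it appears in the text.
# The words are illustrative: "házadban" = "in your house",
# "kertben" = "in the garden", "autó" = "car".
indexed_words = sorted(["házadban", "kertben", "autó"])


def exact_lookup(term, words):
    """Whole-word equality, the way a plain index on words would match."""
    return [w for w in words if w == term]


def prefix_lookup(term, words):
    """Naive affix-tolerant lookup: any word that starts with the term.

    This is a stand-in for real affix handling, not the post's algorithm.
    """
    return [w for w in words if w.startswith(term)]


print(exact_lookup("ház", indexed_words))   # the bare stem is not a key
print(prefix_lookup("ház", indexed_words))  # the suffixed form is found
```

Prefix matching is only the crudest possible fix: it also matches unrelated words that happen to share the prefix, and it misses words whose stem changes when a suffix is added, so a real solution needs more than this.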
Oracle has a simpler solution, which I also implemented in MySQL to test against mine. It indexes 20,000 sentences in 39-45 seconds (on my notebook) and runs 145,000 search queries against them in 13-14 seconds, with 2,129,502 matches.
My algorithm builds the index in 6-11 seconds and runs the queries in 12-13 seconds, with 2,134,123 matches. (The match counts should be the same; I checked the first 2,000 queries manually, but found errors only in the B-tree results.)
A normal B-tree builds the index in 2-3 seconds and runs the queries in 10-11 seconds, with 744,804 matches.
So the program searches around 10% faster than the Oracle-style solution, even though it is only the first working, partial version of the algorithm. Of course, there were not many test cases (1,000, 10,000, and 20,000 indexed sentences).
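The figures above can be compared with a little arithmetic. This is a rough sketch: the numbers are the ones reported in this post, but taking the midpoint of each time range is an assumption, and the dictionary keys and function names are made up for illustration.

```python
# Figures as reported in the post (20,000 indexed sentences, 145,000 queries).
# Times are (low, high) ranges in seconds.
reported = {
    "oracle_style": {"index_s": (39, 45), "query_s": (13, 14), "matches": 2129502},
    "my_algorithm": {"index_s": (6, 11), "query_s": (12, 13), "matches": 2134123},
    "plain_btree": {"index_s": (2, 3), "query_s": (10, 11), "matches": 744804},
}


def midpoint(time_range):
    """Midpoint of a (low, high) range; using it is an assumption."""
    low, high = time_range
    return (low + high) / 2


def query_speedup(fast, slow):
    """Fractional reduction in query time of `fast` relative to `slow`."""
    fast_s = midpoint(reported[fast]["query_s"])
    slow_s = midpoint(reported[slow]["query_s"])
    return 1 - fast_s / slow_s


def match_ratio(name, baseline="my_algorithm"):
    """Matches found by `name` as a fraction of the baseline's matches."""
    return reported[name]["matches"] / reported[baseline]["matches"]


print(f"query time saved vs Oracle-style: {query_speedup('my_algorithm', 'oracle_style'):.1%}")
print(f"plain B-tree matches vs mine:     {match_ratio('plain_btree'):.1%}")
```

Because the timings are ranges from a handful of runs, the speedup figure moves by several percentage points depending on which end of each range is used, so it should be read as a ballpark only.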

