MySQL :: Re: full-text algorithm

Re: full-text algorithm

Posted by: Bálint Hajduk
Date: March 05, 2018 12:42PM

I was short on time (I made this for my final exam at MSc), so the program is only the first working version; the parts of the algorithm which could make it faster are still missing.
I only had time to test it on Hungarian. I made this because, at least in Hungarian, affixes get in the way: about 96% of search terms (people do not use affixes in search terms) can only reach about 30% of the written text (where people do use affixes). For example, in Hungarian a simple B-tree lookup won't find a match when someone searches for "house" against "in your house".
Oracle has a simpler solution, which I also implemented in MySQL to test against mine. It indexes 20000 sentences in 39-45 seconds (on my notebook) and runs 145000 search queries against them in 13-14 seconds, with 2129502 matches.
My algorithm builds the index in 6-11 seconds and runs the queries in 12-13 seconds, with 2134123 matches (the match counts should be the same; I checked the first 2000 queries manually, but only found errors in the B-tree results).
A normal B-tree builds the index in 2-3 seconds and runs the queries in 10-11 seconds, with 744804 matches.
So the program searches around 10% faster than the Oracle-style solution (of course there were not many test cases: 1000, 10000 and 20000 indexed sentences), even though it is only the first working, partial version of the algorithm.
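To make the "house" versus "in your house" problem concrete, here is a minimal sketch in Python. It is not the algorithm described in this post. It only contrasts an exact-token index (what a plain B-tree lookup gives you) with a naive suffix-stripping index. The suffix list, the helper names and the two sample sentences are assumptions made up for the illustration; real Hungarian morphology is far richer than this.

```python
from collections import defaultdict

# Hypothetical toy suffix list, for illustration only.
# "házadban" = ház (house) + ad (your) + ban (in).
TOY_SUFFIXES = ["adban", "ban", "ben"]


def tokenize(text):
    """Lowercase and split on whitespace; enough for this demo."""
    return text.lower().split()


def strip_suffix(token):
    """Remove the longest matching toy suffix, keeping at least 2 characters."""
    for suffix in sorted(TOY_SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def build_index(sentences, normalize=lambda tok: tok):
    """Map each (optionally normalized) token to the ids of sentences containing it."""
    index = defaultdict(set)
    for sentence_id, sentence in enumerate(sentences):
        for token in tokenize(sentence):
            index[normalize(token)].add(sentence_id)
    return index


def search(index, term, normalize=lambda tok: tok):
    """Return the sorted ids of sentences matching the term."""
    return sorted(index.get(normalize(term.lower()), set()))


sentences = [
    "A házadban vagyok",  # "I am in your house"
    "Ez egy ház",         # "This is a house"
]

exact_index = build_index(sentences)
stemmed_index = build_index(sentences, normalize=strip_suffix)

exact_hits = search(exact_index, "ház")
stemmed_hits = search(stemmed_index, "ház", normalize=strip_suffix)

print("exact match:    ", exact_hits)    # only the sentence with the bare word
print("suffix-stripped:", stemmed_hits)  # both sentences
```

The exact index only finds the sentence containing the bare word, while the suffix-stripped index also finds the affixed form. That gap is the kind of lost match the numbers in this post are about.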
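As a quick sanity check on the figures above, the following Python sketch recomputes two ratios from the reported numbers. Two things are assumptions of this sketch and not statements from the post: it uses the midpoint of each reported time range, and it compares the new algorithm's query time against the Oracle-style implementation.

```python
# Match counts as reported in the post.
btree_matches = 744804
new_algorithm_matches = 2134123

# Share of the new algorithm's matches that the plain B-tree also finds.
btree_coverage = btree_matches / new_algorithm_matches


def midpoint(low, high):
    """Midpoint of a reported range (an assumption; the post gives only ranges)."""
    return (low + high) / 2


# Query times in seconds for 145000 queries, as reported ranges.
oracle_style_query_s = midpoint(13, 14)
new_algorithm_query_s = midpoint(12, 13)

# Relative reduction in query time versus the Oracle-style implementation.
query_time_reduction = (oracle_style_query_s - new_algorithm_query_s) / oracle_style_query_s

print(f"B-tree coverage of matches: {btree_coverage:.1%}")
print(f"Query time reduction:       {query_time_reduction:.1%}")
```

With midpoints the query time reduction comes out somewhat under 10%, and the B-tree finds roughly a third of the matches, which is in line with the "around 30% of the written text" estimate earlier in the post.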

Subject                    Written By       Posted
full-text algorithm        Bálint Hajduk    March 05, 2018 09:14AM
Re: full-text algorithm    Peter Brawley    March 05, 2018 10:23AM
Re: full-text algorithm    Bálint Hajduk    March 05, 2018 12:42PM
Re: full-text algorithm    Peter Brawley    March 05, 2018 03:21PM
Re: full-text algorithm    Bálint Hajduk    April 20, 2018 07:02AM
Re: full-text algorithm    Peter Brawley    April 20, 2018 11:34AM
