MySQL Forums
Forum List  »  Full-Text Search

MATCH Keywords with Spaces AGAINST Large Text
Posted by: David Allen
Date: December 04, 2021 12:25PM

We have a keyword database, for example:

1 | MySQL
2 | Database Developer
3 | Software Engineer
4 | Director of Development

And we need a fast way to scan a large set of text (say a resume in base text) to match these exact words, with spaces.

Granted, most of the text we're scanning is well over 1,000 characters, it could be something like:

"I am a Senior Software Engineer with Director of Development experience specializing in JavaScript, JSON, and Web Development. I also have Management experience and have worked in Banking, Cloud, and custom CRM development environments. I am also familiar with .NET and Kittens."

We need to get exact matches for the keyword phrases; for example we want "Director of Development" to match, and not "Director" or "Development" by itself.

Right now we are using REGEXP like this:

sit_keyword sk
@large_resume_text REGEXP CONCAT('\\b',sk.keyword,'\\b');

It works fine, but it's very slow, and has issues with + and -. But it works - matching the FULL TEXT of the keyword and not just pieces of it.

However, the fastest way to do this would be to use MATCH AGAINST. Note the sit_keyword table has a FULLTEXT index.


sit_keyword sk
MATCH(sk.keyword) AGAINST (@large_resume_text);

The problem is the MATCH AGAINST system breaks the words from the MATCH(col1) up into separate words. For example "Software Engineer" will return a match against the text if either word "Software" or "Engineer" is in the AGAINST target. Even if we put quotes around the col1 keywords, it's the same. This is causing our MATCH AGAINST to pull out keywords that are not true matches.

I tried swapping MATCH to use the large resume text (over 1,000 characters) but AGAINST won't allow the definition of iterative values from our keyword database (at least I don't know how to define that in AGAINST. So it seems we're stuck requiring MATCH to use the database keywords and AGAINST to use the TEXT variable.

The thing is if MATCH AGAINST actually used the whole "phrase" from the col1 keyword (e.g. "Software Engineer") I think it would work just fine. But it won't.

How can we force MATCH AGAINST to match the entire MATCH(col1) value against the target text, and not the individual words in col1? Is there another approach to do this?

Options: ReplyQuote

Sorry, only registered users may post in this forum.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.