MATCH Keywords with Spaces AGAINST Large Text
We have a keyword database, for example:
1 | MySQL
2 | Database Developer
3 | Software Engineer
4 | Director of Development
And we need a fast way to scan a large set of text (say a resume in base text) to match these exact words, with spaces.
Granted, most of the text we're scanning is well over 1,000 characters, it could be something like:
"I am a Senior Software Engineer with Director of Development experience specializing in JavaScript, JSON, and Web Development. I also have Management experience and have worked in Banking, Cloud, and custom CRM development environments. I am also familiar with .NET and Kittens."
We need to get exact matches for the keyword phrases; for example we want "Director of Development" to match, and not "Director" or "Development" by itself.
Right now we are using REGEXP like this:
SELECT
*
FROM
sit_keyword sk
WHERE
@large_resume_text REGEXP CONCAT('\\b',sk.keyword,'\\b');
It works fine, but it's very slow, and has issues with + and -. But it works - matching the FULL TEXT of the keyword and not just pieces of it.
However, the fastest way to do this would be to use MATCH AGAINST. Note the sit_keyword table has a FULLTEXT index.
e.g.
SELECT
*
FROM
sit_keyword sk
WHERE
MATCH(sk.keyword) AGAINST (@large_resume_text);
The problem is the MATCH AGAINST system breaks the words from the MATCH(col1) up into separate words. For example "Software Engineer" will return a match against the text if either word "Software" or "Engineer" is in the AGAINST target. Even if we put quotes around the col1 keywords, it's the same. This is causing our MATCH AGAINST to pull out keywords that are not true matches.
I tried swapping MATCH to use the large resume text (over 1,000 characters) but AGAINST won't allow the definition of iterative values from our keyword database (at least I don't know how to define that in AGAINST. So it seems we're stuck requiring MATCH to use the database keywords and AGAINST to use the TEXT variable.
The thing is if MATCH AGAINST actually used the whole "phrase" from the col1 keyword (e.g. "Software Engineer") I think it would work just fine. But it won't.
How can we force MATCH AGAINST to match the entire MATCH(col1) value against the target text, and not the individual words in col1? Is there another approach to do this?
Subject
Views
Written By
Posted
MATCH Keywords with Spaces AGAINST Large Text
1592
December 04, 2021 12:25PM
512
December 04, 2021 08:07PM
525
December 05, 2021 11:56AM
526
December 05, 2021 12:08PM
481
December 05, 2021 03:22PM
477
December 05, 2021 07:33PM
503
December 05, 2021 07:45PM
523
December 05, 2021 08:32PM
501
December 06, 2021 12:18PM
457
December 22, 2021 10:59AM
385
December 22, 2021 12:30PM
529
December 06, 2021 12:30PM