regexp challenger
Posted by:
loic duros
Date: August 30, 2009 09:22AM
Hello,
I'm trying to reproduce a well-known word game using a list of words in a mysql table and the regexp command.
Take a word, for instance with today's newspaper: remissible
You have to find as many words of 4 letters or more using the letters from the word only.
This would make: miss - missile - bliss matches, but not bell (two L, while the word only has one L).
Here is the regexp I came up with so far:
SELECT * FROM words WHERE words REGEXP '^[remisibl]{4,}[[:>:]]'
This will find a big list of words, but unfortunately, it includes words with more than one instance of single letters, such as:
Bell
Beriberi
and other disallowed words. The only letters present twice in a word would be: i (remIssIble), s (remiSSible), and e (rEmissiblE), making bliss, beers, and many other double i, s, and e words correct.
But my REGEXP doesn't incorporate a count of the letters, and I'm not sure how to achieve that with mysql REGEXP.
If I were to take the letters out of the class [....] and assign them with ? or {,2} counts, then the order of the letter would matter for the matches, which is of course not working since the pattern has to look for words with the letters in any order.
I've also tried the following, without success because the class is executed 4 times:
SELECT * FROM words WHERE words REGEXP '^[r?e{,2}m?i{,2}s{,2}b?l?]{,4}[[:>:]]'
I get the same words with this SELECT than with the first one...
Any idea?
Thanks for your help!