MySQL Forums
Forum List  »  Full-Text Search

implementing match scores between records
Posted by: Dustin DeVries
Date: January 01, 2006 08:08PM

I'm creating a MySQL-driven website that mines data on a particular topic from various sites and compiles it onto one site. One of the complexities is dealing with duplicate entries, i.e., when two or more of my source sites report the same story.

Is there anything built into MySQL, or perhaps some other data-matching package, that I could use? I'm not sure whether full-text search does what I need.

What I'm picturing right now is a system that compares a new story to the existing stories in the database. If there is a match, i.e., a score above a certain threshold, the new story is considered a duplicate of an entry already in the database and is flagged and/or discarded.
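
To make the idea concrete, here is a minimal sketch of that kind of comparison in Python. It is only an illustration under stated assumptions, not something built into MySQL: the scoring method (Jaccard similarity over three-word shingles), the function names, and the 0.5 threshold are all hypothetical choices, and the existing stories are assumed to have already been loaded from the database into a dictionary keyed by ID.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or nothing if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def match_score(a, b, size=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def find_duplicate(new_story, existing_stories, threshold=0.5):
    """Return (story_id, score) for the best match at or above threshold.

    existing_stories is assumed to be a dict of {story_id: text}.
    Returns None when no existing story scores high enough.
    """
    best = None
    for story_id, text in existing_stories.items():
        score = match_score(new_story, text)
        if score >= threshold and (best is None or score > best[1]):
            best = (story_id, score)
    return best
```

A new story would be passed to `find_duplicate` before insertion; a non-`None` result means it would be flagged or discarded. Comparing against every stored story is linear in the size of the table, so a real system would probably need to narrow the candidates first.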

If anyone can advise me, either on functionality native to MySQL or on some other technology that works well for this sort of thing, I would appreciate it. If it helps, I'm programming in CGI/Perl, although the data-mining portion of my site could be done in any language. I'm certainly open to using another language, as I'm fluent in many.

Also, if this is the wrong group for this question, please let me know, and my apologies in advance.

