MySQL Forums
Forum List  »  Newbie

Re: Automated User Analysis and Similarity Scoring - Social Media
Posted by: Joshua Morley
Date: March 23, 2015 10:29PM

Not exactly what you're looking for, but perhaps useful anyway.

I might be taking your sample data too literally, but since verything there is a 0 or 1, you can treat the entire set of answers for each user as a binary representation of a number. This query is one way of constructing that number, which I have called "kasafrak":

SELECT [TABLE(1)].user_id, [a1]+[a2]*2+[a3]*4+[a4]*8+[a5]*16+[a6]*32+[a7]*64+[a8]*128+[a9]*256 AS kasafrak
FROM [TABLE(1)]

My syntax might not be mySql compatible. I presently have access to something else. Here's what I get from that query:
x111 394
x112 465
x113 426
x114 394

So you can already see that x111 and x114 have the same value because they answered the questions the same way. If your data isn't actually all 0s and 1s, you might be able to create kasafrak by concatenating the answers, but make sure you're guaranteed a unique string for every possible combination of answers.

Now you use that query as the source for a 2nd query that looks for matching kasafraks:

SELECT qryKasafrakA.user_id, qryKasafrakB.user_id
FROM qryKasafrak AS qryKasafrakA INNER JOIN qryKasafrak AS qryKasafrakB ON qryKasafrakA.kasafrak = qryKasafrakB.kasafrak
WHERE (((qryKasafrakA.user_id)<>[qryKasafrakB].[user_id]))

The "WHERE" part is to eliminate showing you what you already know, that each user matches his or her self. If you actually want to see those results, then exclude every thing after and including the word "WHERE". Here's what I get if I leave out the WHERE:
x1111	x1111
x1114	x1111
x1112	x1112
x1113	x1113
x1111	x1114
x1114	x1114

You might be able to get your desired output using a crosstab query, but I'm not sure.

Options: ReplyQuote


Subject
Written By
Posted
Re: Automated User Analysis and Similarity Scoring - Social Media
March 23, 2015 10:29PM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.