MySQL Forums
Forum List  »  Perl

new to PERL and MYSQL - needing help with comparing two files and removing duplicates
Posted by: Carlo Piva
Date: May 25, 2009 10:01AM

Hi there!

Okay I'm quite new to PERL and MySQL. I've been requested to make a script that would take the input from a database (only requiring specific fields), compare the results to a flat file, and not include any duplicates. It sounds easy but for someone who hasn't programed since the early 90's I'm lost lol

To make things simple I'm starting by loading the database utilizing fetchrow_hashref.

Since the database contains the duplicate entries, I need to sort through it and remove the duplicates.

Next would be to read the flat file and compare the results.

Lastly would be to display to a webpage (not to worried about that right now)

Here is my code so far. It is only up to reading the data and my failed (possibly feeble) attempt at trying to get it to remove the duplicates.

I hope someone can give me a hand, a few pointers, and some direction. I've purchased a MySQL Developers Guide book ($60) and a Learning PERL book ($40) and have been going through them. I understand what they have in the book but nothing is covered with what I'm specifically looking to do.


:EDIT:

Okay I figured out how to remove the duplicates. I can't believe I spent 2 days researching this and I accidentally come across the answer. It was as simple as including a DISTINCT option in my SELECT clause. Man I'm smashing my head against a wall now lol

So I think the only thing I'm royally stuck on doing now is comparing the contents of the two hash arrays to each other. I need to use the one labeled as $file_ref as the master and compare the $db_ref to it. If there are duplicates across the two arrays they are to be ignored (because that means that the database has those entries). I'm only looking for the differences between them. Specifically, I'm looking for entries that are in the CSV file but missing from the Database or different than the database entry that we have. The csv file is a form of the master host file on the system (with printer, server, and firewall information stripped out leaving only the users). The database contains email connection information as it is reported from LDAP and other softwares. We have noticed a discontinuity between the ldap server and the database. Many people aren't being reported. Some are being reported but missing information (IE:Netbios name but no username or vice versa). We're not sure if they are setup properly in ldap or if it's a problem with the program used to generate the database program. As the master host file on the main server records everyone's IP and associated computer name, we decided to use that as our master comparison. We want to compare this to the database. This wont tell us what the exact problem is, but will tell us who is and who is not being reported on. From there we can further investigate. Oh and did I mention I'm only an intern and my school didn't teach any programing and that I haven't programed anything since 1993 LOL

Thank you so much in advance! Bellow is my code (current as of the last edit of this post)

#!/usr/bin/perl
#use strict;
#The Setup
use warnings;
use Text::CSV_XS;
use DBI;
my %conn_attrs = (RaiseError =>1, PrintError =>0, AutoCommit =>1);
my $dbh = DBI->connect('DBI:mysql:pstsize;host=localhost');
my $sth = $dbh->prepare("SELECT DISTINCT ipaddr,user,nbname from sizehist order by ipaddr");
my @data;
#Read the DB into a hash array
$sth->execute();
while (my $db_ref = $sth->fetchrow_hashref ())
{
# my $delim = "";
foreach my $key ("ipaddr", "user", "nbname")
{
$db_ref->{$key} = "" if !defined ($db_ref->{$key});
# print $delim, $db_ref->{$key};
# $delim = ", "
}
print "\n";
}
$dbh->disconnect () or die DBI->errstr;
#Read CSV file into hash array using CSV_XS module
my %file_ref;
my $csv = Text::CSV_XS->new ({binary => 1, eol => "\n"});
open my $io, "<", "use_me.txt" or die "use_me.txt: $!";
my @fld = @{$csv->getline ($io)};
while (my $row = $csv->getline ($io)) {
push @{$file_ref{$fld[$_]}}, $row->[$_] for 0 .. $#fld;
}
close $io;
#Use $file_ref as master record/array and compare the contents of $db_ref to the master.
#If different, report. If the same ignore.



Edited 4 time(s). Last edit at 05/25/2009 02:29PM by Carlo Piva.

Options: ReplyQuote


Subject
Written By
Posted
new to PERL and MYSQL - needing help with comparing two files and removing duplicates
May 25, 2009 10:01AM


Sorry, you can't reply to this topic. It has been closed.

Content reproduced on this site is the property of the respective copyright holders. It is not reviewed in advance by Oracle and does not necessarily represent the opinion of Oracle or any other party.