Reading in Single Characters in FASTA files
Posted by: Alex Kreibich
Date: July 02, 2009 01:58PM

Hi all,

I have code like this...

<code>

#!/usr/bin/perl

($root,$gene) = @ARGV;
# three-argument open; $! reports why the open failed
open(TXT, "<", "$root.txt") || die "Cannot open $root.txt: $!";
open(FASTA, "<", "$root.fasta") || die "Cannot open $root.fasta: $!";

$found=0;
while (<TXT>) {
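# "or" binds more loosely than "=", so the captures are assigned before the test;
# \Q..\E quotes the gene name and \.\. matches the two literal dots
($start,$stop) = m/^\Q$gene\E \((\d+)\.\.(\d+)\)/i or next;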
$found=1;
last;
}
die "Did not find gene $gene in $root.txt" unless $found;

$found = 0;
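# single pass over the file: a second while (<FASTA>) loop would start
# at end-of-file and never run
while (<FASTA>) {
chomp; # removes only a trailing newline (chop removes any last character)
@x=split;
if ($x[0] >= $start) {
# start-logic here;
}
if ($x[0] <= $stop) {
# stop-logic here
}
}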
# print logic here

my %ibase;
$ibase{"A"} = "T";
$ibase{"T"} = "A";
$ibase{"C"} = "G";
$ibase{"G"} = "C";
sub seq_inverse
{
my ($nseq) = @_;
my @seq = split(m//,$nseq);
my $lim = @seq;
my $inv = "";

$inv .= $ibase{$seq[$lim]} while($lim > 0 and $lim--);

return $inv;
}

</code>

I run it from the Windows command line like this: perl DNA_sequence.pl NC_001666 rps12

I have an NC_001666.txt file and an NC_001666.fasta file. The first part of the code searches NC_001666.txt for the given gene (in this case, rps12) and returns its range (i.e. its coding position). With that range I can go into NC_001666.fasta and extract the exact DNA base sequence for the gene. As you can see in the code, I haven't figured out how to do this yet. I want to read in the entire FASTA file, but first I need to know how to skip the first line (which is just a header line), and then how to read in one character at a time, putting each into an array. After that I can go through the array and print out the bases at the range of positions I want. I do not know how to do this, so any help would be great.
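
A minimal sketch of that FASTA step, assuming the file holds a single record (one header line, then sequence lines) and that the range from the text file uses 1-based, inclusive positions. The names read_fasta_sequence and extract_range are hypothetical and not part of the script above. It may be simpler to keep the sequence as one string and use substr than to build an array of single characters; the commented lines show the array form for comparison.

```perl
use strict;
use warnings;

# Read a single-record FASTA file and return its sequence as one string.
# The first line (the ">" header) is read and discarded.
sub read_fasta_sequence {
    my ($path) = @_;
    open(my $fh, "<", $path) or die "Cannot open $path: $!";
    my $header = <$fh>;            # throw away the header line
    my $seq = "";
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;         # strip newline, carriage return, spaces
        $seq .= uc $line;          # normalise to upper case
    }
    close($fh);
    return $seq;
}

# Return bases $start..$stop (assumed 1-based and inclusive).
sub extract_range {
    my ($seq, $start, $stop) = @_;
    return substr($seq, $start - 1, $stop - $start + 1);
}

# Array form, one base per element, if an array is really wanted:
# my @bases = split //, $seq;
# print join("", @bases[$start - 1 .. $stop - 1]), "\n";
```

In the script this would be called as something like print extract_range(read_fasta_sequence("$root.fasta"), $start, $stop), "\n"; once $start and $stop have been found.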

Also, at the end of the code, I have a subroutine that reverses and complements the bases if a gene is located on the negative strand (the strand is given in the text file). It sits at the end of the code because I don't know how to integrate it into the rest of the script, so any advice on that would be great. Thanks!
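
One possible way to wire the strand handling in, shown as a sketch rather than a definitive answer. It swaps the hash lookup for Perl's built-in reverse and tr, which avoids depending on %ibase being filled in before the subroutine is called. The names reverse_complement and gene_sequence are hypothetical, and so is the $strand value ("+" or "-"): how the strand is written in the .txt file is not shown here, so parsing it is left as an assumption.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string: reverse it, then swap A<->T and C<->G.
# Characters other than A, C, G, T (for example N) pass through unchanged.
sub reverse_complement {
    my ($seq) = @_;
    my $rc = reverse $seq;          # scalar context: reverses the string
    $rc =~ tr/ACGTacgt/TGCAtgca/;   # complement each base in place
    return $rc;
}

# Extract a gene (1-based, inclusive range assumed) and flip it when it
# lies on the negative strand. $strand is a hypothetical "+" or "-".
sub gene_sequence {
    my ($seq, $start, $stop, $strand) = @_;
    my $gene = substr($seq, $start - 1, $stop - $start + 1);
    return $strand eq "-" ? reverse_complement($gene) : $gene;
}
```

Subroutines defined with sub are available anywhere in the file, so these can stay at the bottom of the script and still be called from the main part once the range and strand are known.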
