Reading Single Characters from a FASTA File
Hi all,
Here is my code so far:
<code>
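#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl DNA_sequence.pl NC_001666 rps12
my ($root, $gene) = @ARGV;
die "Usage: $0 <root> <gene>\n" unless defined $gene;

open(my $txt,   '<', "$root.txt")   or die "Cannot open $root.txt: $!";
open(my $fasta, '<', "$root.fasta") or die "Cannot open $root.fasta: $!";

# Find the gene's coding range in a line like "rps12 (123..456)".
# "or next" (not "|| next") keeps m// in list context so it returns
# both captures; "\.\." matches two literal dots; \Q...\E quotes $gene.
my ($start, $stop);
my $found = 0;
while (<$txt>) {
    ($start, $stop) = m/^\Q$gene\E \((\d+)\.\.(\d+)\)/i or next;
    $found = 1;
    last;
}
# "==" compares; "$found=1" would assign, so the die could never fire.
die "Did not find gene $gene in $root.txt\n" unless $found == 1;

# The FASTA body is just bases after a ">" header, so positions must be
# counted rather than read from a column. One loop is enough: a second
# while (<$fasta>) would start at end-of-file and never run.
while (my $line = <$fasta>) {
    chomp $line;    # chomp strips only the newline; chop strips any last char
    # start/stop logic here
}
# print logic here

# Reverse complement for genes on the negative strand. The lookup table
# lives inside the sub so it is filled in before any call.
sub seq_inverse {
    my ($nseq) = @_;
    my %ibase = (A => 'T', T => 'A', C => 'G', G => 'C');
    my $inv = '';
    # Walk the bases from last to first, complementing each one;
    # anything not in %ibase (e.g. N) is passed through unchanged.
    for my $base (reverse split //, $nseq) {
        $inv .= exists $ibase{$base} ? $ibase{$base} : $base;
    }
    return $inv;
}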
</code>
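The gene lookup at the top of the script only works if the match runs in list context. Below is a small standalone check of that precedence issue; `parse_range` is a hypothetical helper (not part of the original script), and the "gene (start..stop)" line format is assumed from the question's own regex.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): pull (start, stop)
# out of an annotation line assumed to look like "rps12 (123..456)".
sub parse_range {
    my ($line, $gene) = @_;
    # "or" binds more loosely than "=", so the match runs in list context
    # and returns both captures. \Q...\E treats $gene as literal text.
    my ($start, $stop) = $line =~ m/^\Q$gene\E \((\d+)\.\.(\d+)\)/i
        or return;
    return ($start, $stop);
}

my @r = parse_range('rps12 (123..456)', 'rps12');
print "@r\n";                 # prints "123 456"

# For contrast: "||" evaluates the match in scalar context, so the list
# assignment receives only the success flag (1), not the captures.
my ($bad_start, $bad_stop) =
    ('rps12 (123..456)' =~ m/^rps12 \((\d+)\.\.(\d+)\)/ || ());
print "bad: $bad_start\n";    # prints "bad: 1"
```

The same reasoning applies to the `unless $found=1` test: a single `=` assigns, so the condition is always true and the `die` never fires.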
I run the script from the Windows command line like this: perl DNA_sequence.pl NC_001666 rps12
I have a NC_001666.txt file and a NC_001666.fasta file. The first part of the script searches NC_001666.txt for the given gene (in this case, rps12) and captures its range (i.e. its coding positions). With that range I want to go into NC_001666.fasta and extract the exact DNA base sequence for the gene. As you can see in the code, I haven't figured out how to do this yet.

My plan is to read in the entire FASTA file, but first I need to know how to completely discard the first line (which is just a header line), and then how to read the sequence one character at a time, putting each base into an array. After that I can go into the array and print out just the bases in the range of positions I want. I don't know how to do this, so any help would be great.
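A minimal sketch of the plan described above, assuming a single-record FASTA file and 1-based, inclusive coordinates in the annotation (so base N sits at array index N-1). The names `read_fasta_bases` and `extract_range` are hypothetical, and the demo writes a small made-up file rather than NC_001666.fasta.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read a single-record FASTA file: discard the ">" header line, join the
# remaining lines, and return the bases as a list (one character each).
sub read_fasta_bases {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my $header = <$fh>;    # first line, e.g. ">gi|...": read and discard
    die "$path: expected a '>' header line\n"
        unless defined $header && $header =~ /^>/;
    my $seq = '';
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;     # drop the newline, any "\r", and stray spaces
        $seq .= $line;
    }
    close $fh;
    return split //, $seq;     # split on the empty pattern: one base per element
}

# Coordinates are assumed 1-based and inclusive, so base N is at index N-1.
sub extract_range {
    my ($bases, $start, $stop) = @_;
    return join '', @{$bases}[ $start - 1 .. $stop - 1 ];
}

# Demo on a small made-up file (not NC_001666).
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print {$tmp} ">test header\nACGTAC\nGGTTAA\n";
close $tmp;

my @bases = read_fasta_bases($tmpname);
print scalar(@bases), "\n";                  # prints 12
print extract_range(\@bases, 3, 8), "\n";    # prints GTACGG
```

For a whole genome it may be cheaper to keep the sequence as one string and take `substr($seq, $start - 1, $stop - $start + 1)`; the array version above mirrors the one-character-per-element approach described in the question.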
Also, at the end of the code I have a subroutine that reverse-complements the bases if the gene is located on the negative strand (the strand is given in the text file). It sits at the end of the script because I don't know how to integrate it into the rest, so any advice on that would be great. Thanks!
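Named subs in Perl are compiled before the script runs, so `seq_inverse` can be called from the "print logic here" spot even though it is defined at the bottom. What breaks in the original is that `%ibase` is filled by ordinary statements at the end of the file, which execute after the print logic. A sketch that avoids the table entirely with `reverse` and `tr///` follows; `gene_sequence` and the `'+'`/`'-'` strand flag are hypothetical, since the question does not show how the strand is recorded in NC_001666.txt.

```perl
use strict;
use warnings;

# Alternative body for seq_inverse: reverse the string, then swap
# A<->T and C<->G. Lower-case bases are handled; anything else (e.g. N)
# is left unchanged.
sub seq_inverse {
    my ($nseq) = @_;
    my $inv = reverse $nseq;      # scalar context: reverses the characters
    $inv =~ tr/ACGTacgt/TGCAtgca/;
    return $inv;
}

# Hypothetical: $strand is '+' or '-' as parsed from the .txt annotation
# (the real field format is an assumption). Coordinates are 1-based.
sub gene_sequence {
    my ($seq, $start, $stop, $strand) = @_;
    my $gene_seq = substr($seq, $start - 1, $stop - $start + 1);
    return $strand eq '-' ? seq_inverse($gene_seq) : $gene_seq;
}

print gene_sequence('ACGTACGGTTAA', 3, 8, '+'), "\n";   # prints GTACGG
print gene_sequence('ACGTACGGTTAA', 3, 8, '-'), "\n";   # prints CCGTAC
```

In the main script this would mean capturing the strand alongside `$start` and `$stop` in the first loop, then printing `gene_sequence(...)` where the "print logic here" comment is.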