I have an exercise in which I have to print the first three lines of a FASTA file as well as the protein sequence. I wrote a script for this, but when I run it under Cygwin the sequence is not printed. My code is as follows:
#!/usr/bin/perl
open (IN,'P30988.txt');
while (<IN>) {
    if($_=~ m/^ID/) {
        print $_ ;
    }
    if($_=~ m/^AC/) {
        print $_ ;
    }
    if ($_=~ m/^SQ/) {
        print $_;
    }
    if ($_=~ m/\^s+(\w+)/) { #this is the part I have trouble with
        $a.=$1;
        $a=~s/\s//g; #this is for removing the spaces inside the sequence
        print $a;
    }
}
The FASTA file looks like this:
SQ SEQUENCE 474 AA; 55345 MW; 0D9FA81230B282D9 CRC64;
MRFTFTSRCL ALFLLLNHPT PILPAFSNQT YPTIEPKPFL YVVGRKKMMD AQYKCYDRMQ
QLPAYQGEGP YCNRTWDGWL CWDDTPAGVL SYQFCPDYFP DFDPSEKVTK YCDEKGVWFK
HPENNRTWSN YTMCNAFTPE KLKNAYVLYY LAIVGHSLSI FTLVISLGIF VFFRSLGCQR
VTLHKNMFLT YILNSMIIII HLVEVVPNGE LVRRDPVSCK ILHFFHQYMM ACNYFWMLCE
GIYLHTLIVV AVFTEKQRLR WYYLLGWGFP LVPTTIHAIT RAVYFNDNCW LSVETHLLYI
IHGPVMAALV VNFFFLLNIV RVLVTKMRET HEAESHMYLK AVKATMILVP LLGIQFVVFP
WRPSNKMLGK IYDYVMHSLI HFQGFFVATI YCFCNNEVQT TVKRQWAQFK IQWNQRWGRR
PSNRSARAAA AAAEAGDIPI YICHQELRNE PANNQGEESA EIIPLNIIEQ ESSA
//
To match the sequence I relied on the fact that each sequence line starts with several spaces followed only by letters. That doesn't seem to work under Cygwin. Here is the link for the sequence: https://www.uniprot.org/uniprot/P30988.txt
The problem is with this line
if ($_=~ m/\^s+(\w+)/) { #this is the part I have trouble with
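You have the backslash in the wrong place in \^s+. You are actually escaping the ^, so the pattern looks for a literal caret followed by one or more "s" characters instead of anchoring at the start of the line and matching whitespace. This has nothing to do with Cygwin. The line in your code should be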
if ($_=~ m/^\s+(\w+)/) { #this is the part I have trouble with
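To see the difference between the two patterns, here is a minimal sketch that runs both against one sample line. The sample line and the variable names are assumptions made for illustration. It also shows that the corrected pattern only captures the first block of letters, because \w+ stops at the next space.

```perl
use strict;
use warnings;

# Assumed sample: leading spaces, then two blocks of sequence letters.
my $line = "     MRFTFTSRCL ALFLLLNHPT\n";

# \^ is a literal caret, so this only matches text containing "^s".
my ($broken) = $line =~ /\^s+(\w+)/;

# ^ anchors at the start of the line, \s+ consumes the leading spaces.
my ($first_block) = $line =~ /^\s+(\w+)/;

print "escaped caret: ", (defined $broken ? $broken : "no match"), "\n";
print "anchored:      ", (defined $first_block ? $first_block : "no match"), "\n";
```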
I'd write that block of code like this
if ($_=~ m/^\s/) {
s/\s+//g; #this is for removing the spaces inside the sequence
print $_;
}
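Putting it together, a full script could look roughly like the sketch below. This is one possible arrangement, not the only one. The helper name parse_entry, the in-memory sample entry, and the choice to collect sequence lines only after the SQ line are all assumptions for illustration. The sample is made up and shortened so the code runs without P30988.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect the ID/AC/SQ lines and the joined sequence
# from an open filehandle holding one UniProt-style entry.
sub parse_entry {
    my ($fh) = @_;
    my @header;
    my $sequence    = '';
    my $in_sequence = 0;                  # true once the SQ line has been seen

    while (my $line = <$fh>) {
        last if $line =~ m{^//};          # "//" ends the entry
        if ($line =~ /^(?:ID|AC|SQ)\s/) {
            push @header, $line;
            $in_sequence = 1 if $line =~ /^SQ/;
        }
        elsif ($in_sequence && $line =~ /^\s/) {
            $line =~ s/\s+//g;            # strip spaces and the newline
            $sequence .= $line;
        }
    }
    return (\@header, $sequence);
}

# Made-up, shortened entry (assumption) standing in for the real file.
my $sample = join '',
    "ID   DEMO_ENTRY   Reviewed;   20 AA.\n",
    "AC   X00000;\n",
    "SQ   SEQUENCE   20 AA;\n",
    "     MRFTFTSRCL ALFLLLNHPT\n",
    "//\n";

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my ($header, $sequence) = parse_entry($fh);
close $fh;

print @$header;
print "$sequence\n";
```

To run it on the real file, open 'P30988.txt' instead of \$sample and keep the "or die" check, so a missing file is reported instead of silently printing nothing.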