
How to read the reference lines (starting with RN, RT, RA, RC, RX, RP, RL) and print them all


Hello everyone, I have a problem with a Perl module. I am using it to retrieve specific lines from a flat file that contains multiple sets of information, as shown in the code below (this is example code for Bio::Parse::SwissProt.pm). Whenever I run this code, the refs statement fails with the error: Modification of a read-only value attempted at c:/wamp/bin/perl/site/lib/bio/parse/swissprot.pm line 345. The input file looks like this:

Input file (flat file)

ID   P72354_STAAU            Unreviewed;       575 AA.
AC   P72354;
DT   01-FEB-1997, integrated into UniProtKB/TrEMBL.
DT   01-FEB-1997, sequence version 1.
DT   29-MAY-2013, entry version 79.
DE   SubName: Full=ATP-binding cassette transporter A;
GN   Name=abcA;
OS   Staphylococcus aureus.
OC   Bacteria; Firmicutes; Bacilli; Bacillales; Staphylococcus.
OX   NCBI_TaxID=1280;
RN   [1]
RP   NUCLEOTIDE SEQUENCE.
RC   STRAIN=NCTC 8325;
RX   PubMed=8878592;
RA   Henze U.U., Berger-Bachi B.;
RT   "Penicillin-binding protein 4 overproduction increases beta-lactam
RT   resistance in Staphylococcus aureus.";
RL   Antimicrob. Agents Chemother. 40:2121-2125(1996).
RN   [2]
RP   NUCLEOTIDE SEQUENCE.
RC   STRAIN=NCTC 8325;
RX   PubMed=9158759;
RA   Henze U.U., Roos M., Berger-Bachi B.;
RT   "Effects of penicillin-binding protein 4 overproduction in
RT   Staphylococcus aureus.";
RL   Microb. Drug Resist. 2:193-199(1996).
CC   -!- SIMILARITY: Belongs to the ABC transporter superfamily.
CC   -----------------------------------------------------------------------
CC   Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms
CC   Distributed under the Creative Commons Attribution-NoDerivs License
CC   -----------------------------------------------------------------------
DR   EMBL; X91786; CAA62898.1; -; Genomic_DNA.
DR   ProteinModelPortal; P72354; -.
DR   SMR; P72354; 335-571.
DR   GO; GO:0016021; C:integral to membrane; IEA:InterPro.
DR   GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW.
DR   GO; GO:0042626; F:ATPase activity
DR   GO; GO:0006200; P:ATP catabolic process; IEA:GOC.
DR   InterPro; IPR003593; AAA+_ATPase.
DR   InterPro; IPR003439; ABC_transporter-like.
DR   InterPro; IPR017871; ABC_transporter_CS.
DR   InterPro; IPR017940; ABC_transporter_type1.
DR   InterPro; IPR001140; ABC_transptr_TM_dom.
DR   InterPro; IPR011527; ABC_transptrTM_dom_typ1.
DR   InterPro; IPR027417; P-loop_NTPase.
DR   Pfam; PF00664; ABC_membrane; 1.
DR   Pfam; PF00005; ABC_tran; 1.
DR   SMART; SM00382; AAA; 1.
DR   SUPFAM; SSF90123; ABC_TM_1; 1.
DR   SUPFAM; SSF52540; SSF52540; 1.
DR   PROSITE; PS50929; ABC_TM1F; 1.
DR   PROSITE; PS00211; ABC_TRANSPORTER_1; 1.
DR   PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
PE   3: Inferred from homology;
KW   ATP-binding; Nucleotide-binding.
SQ   SEQUENCE   575 AA;  64028 MW;  F7E30A85971719B9 CRC64;
     MKRENPLFFL FKKLSWPVGL IVAAITISSL GSLSGLLVPL FTGRIVDKFS VSHINWNLIA
     LFGGIFVINA LLSGLGLYLL SKIGEKIIYA IRSVLWEHII QLKMPFFDKN ESGQLMSRLT
     DDTKVINEFI SQKLPNLLPS IVTLVGSLIM LFILDWKMTL LTFITIPIFV LIMIPLGRIM
     QKISTSTQSE IANFSGLLGR VLTEMRLVKI SNTERLELDN AHKNLNEIYK LGLKQAKIAA
     VVQPISGIVM LLTIAIILGF GALEIATGAI TAGTLIAMIF YVIQLSMPLI NLSTLVTDYK
     KAVGASSRIY EIMQEPIEPT EALEDSENVL IDDGVLSFEH VDFKYDVKKI LDDVSFQIPQ
     GQVSAFVGPS GSGKSTIFNL IERMYEIESG DIKYGLESVY DIPLSKWRRK IGYVMQSNSM
     MSGTIRDNIL YGINRHVSDE ELINYAKLAN CHDFIMQFDE GYDTLVGERG LKLSGGQRQR
     IDIARSFVKN PDILLLDEAT ANLDSESELK IQEALETLME GRTTIVIANR LSTIKKAGQI
     IFLDKGQVTG KGTHSELMAS HAKYKNFVVS QKLTD
//

Script (run with C:/wamp/bin/perl/bin/perl.exe)

use strict;
use warnings;
use Data::Dumper;
use SWISS::Entry;
use Bio::Parse::SwissProt;

# Open the flat file; the parser object is $sp throughout.
my $sp = Bio::Parse::SwissProt->new(FILE => "me.txt") or die $!;

my $entry_name = $sp->entry_name();
print "$entry_name\n";
my $seq_len = $sp->seq_len();
print "$seq_len\n";

# refs() is the statement that fails with the read-only error.
my $refs = $sp->refs();
$refs = $sp->refs(TITLE => 1, AUTH => 1);
for my $i (0..$#{$refs}) {
    print "@{$refs->[$i]}\n";
}

The output should look like this:

[1]
  NUCLEOTIDE SEQUENCE.
  STRAIN=NCTC 8325;
  PubMed=8878592;
  Henze U.U., Berger-Bachi B.;
  "Penicillin-binding protein 4 overproduction increases beta-lactam
  resistance in Staphylococcus aureus.";
  Antimicrob. Agents Chemother. 40:2121-2125(1996).
[2]
  NUCLEOTIDE SEQUENCE.
  STRAIN=NCTC 8325;
  PubMed=9158759;
  Henze U.U., Roos M., Berger-Bachi B.;
  "Effects of penicillin-binding protein 4 overproduction in
  Staphylococcus aureus.";
  Microb. Drug Resist. 2:193-199(1996).

Solution

  • After some searching, it appears that you are using SWISS::Entry from the Swissknife package, and that you (or someone) downloaded Bio::Parse::SwissProt as an independent project (not part of BioPerl) from SourceForge. I am not familiar with either project, but you can get the information you want by simply using Bio::SeqIO from BioPerl. Here is an example that gets the references:

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    use Bio::SeqIO;
    
    my $usage = "perl $0 swiss-file\n";
    my $infile = shift or die $usage;
    
    # Open the Swiss-Prot flat file and read the first entry.
    my $io = Bio::SeqIO->new(-file => $infile, -format => 'swiss');
    my $seq = $io->next_seq;
    my $anno_collection = $seq->annotation;
    
    for my $key ( $anno_collection->get_all_annotation_keys ) {
        my @annotations = $anno_collection->get_Annotations($key);
        for my $value ( @annotations ) {
            # Only reference annotations carry the citation fields.
            if ($value->tagname eq "reference") {
                my $hash_ref = $value->hash_tree;
                for my $field (keys %{$hash_ref}) {
                    print $field,": ",$hash_ref->{$field},"\n" if defined $hash_ref->{$field};
                }
            }
        }
    }
    

    Running this gives the information you wanted:

    authors: Henze U.U., Berger-Bachi B.
    location: Antimicrob. Agents Chemother. 40:2121-2125(1996).
    title: "Penicillin-binding protein 4 overproduction increases beta-lactam resistance in Staphylococcus aureus."
    pubmed: 8878592
    authors: Henze U.U., Roos M., Berger-Bachi B.
    location: Microb. Drug Resist. 2:193-199(1996).
    title: "Effects of penicillin-binding protein 4 overproduction in Staphylococcus aureus."
    pubmed: 9158759
    

    The BioPerl Feature Annotation HOWTO is a helpful page for parsing these types of files. If you want to fetch the entries and then parse them, you can use Bio::DB::SwissProt and add just a couple of lines of code to the above example. I know this does not fix your specific error, but it is a working solution, and you'll find that many people can help you with BioPerl.
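
    If the exact layout from the question matters (the RN number flush left, every other R* line indented beneath it), a plain-Perl pass over the file may be enough. The following is only a sketch, not part of BioPerl or Swissknife: the helper names parse_refs and format_refs are made up for illustration, and it assumes the usual flat-file layout of a two-letter line code, three spaces, then the value.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: collect the reference blocks from a list of lines.
# Returns an array ref of blocks; each block is an array ref whose first
# element is the RN value (e.g. "[1]") followed by the other R* values
# in file order.
sub parse_refs {
    my @lines = @_;
    my @refs;
    for my $line (@lines) {
        # Assumed layout: line code in columns 1-2, three spaces, value.
        next unless $line =~ /^(R[NPCXGATL])\s{3}(.*?)\s*$/;
        my ($code, $value) = ($1, $2);
        if ($code eq 'RN') {
            push @refs, [$value];          # RN opens a new reference
        }
        elsif (@refs) {
            push @{ $refs[-1] }, $value;   # attach to the current reference
        }
    }
    return \@refs;
}

# Hypothetical helper: build the layout shown in the question.
sub format_refs {
    my ($refs) = @_;
    my $out = '';
    for my $ref (@$refs) {
        my ($number, @rest) = @$ref;
        $out .= "$number\n";
        $out .= "  $_\n" for @rest;
    }
    return $out;
}

# Run as a script only when a file name is given on the command line.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my @lines = <$fh>;
    close $fh;
    print format_refs(parse_refs(@lines));
}
```

    Saved under a name such as print_refs.pl (an arbitrary choice), it would be run as perl print_refs.pl me.txt. Unlike the Bio::SeqIO version, it does no real parsing: continuation lines such as the two RT lines stay separate, and references from every entry in the file are printed one after another.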