I have this XML:
<RECEIPT receiptDate="2012-02-10T12:46:26.661Z" submissionFile="E.coli_ENT_WS.submission.xml" success="false">
<EXPERIMENT alias="ENT 23" status="PUBLIC"/>
<EXPERIMENT alias="WS 23" status="PUBLIC"/>
<RUN alias="ENT 23" status="PUBLIC"/>
<RUN alias="WS 23" status="PUBLIC"/>
<SAMPLE alias="ENT 23" status="PUBLIC"/>
<SAMPLE alias="WS 23" status="PUBLIC"/>
<STUDY alias="ENT 23" status="PUBLIC"/>
<STUDY alias="WS 23" status="PUBLIC"/>
<SUBMISSION alias="E.coli_ENT_WS"/>
<MESSAGES>
<ERROR> In run(ENT 23), the FC018_s_6_sequence_L70.txt.md5 not found </ERROR>
<ERROR> In run(ENT 23) found the file format(Illumina_native_fastq), but requires SPOT_DESCRIPTOR information in the experiment(ENT 23)</ERROR>
<ERROR>The Illumina_native_fastq file format required gzip compression for submission.</ERROR>
<ERROR> FILE attribute quality_scoring_system is required</ERROR>
<ERROR>Same file FC018_s_6_sequence_L70.txt found in Run(WS 23) has been used with other Run</ERROR>
<ERROR> In run(WS 23), the FC018_s_6_sequence_L70.txt.md5 not found </ERROR>
<ERROR> In run(WS 23) found the file format(Illumina_native_fastq), but requires SPOT_DESCRIPTOR information in the experiment(WS 23)</ERROR>
<ERROR>The Illumina_native_fastq file format required gzip compression for submission.</ERROR>
<ERROR> FILE attribute quality_scoring_system is required</ERROR>
<INFO> VALIDATE action for the following XML: E.coli_ENT_WS.study.xml E.coli_ENT_WS.sample.xml E.coli_ENT_WS.experiment.xml E.coli_ENT_WS.run.xml </INFO>
<INFO>Inform_on_error is not filled in; auto populated from Submission account. </INFO>
<INFO>Number of files in drop box = 2 & Number of files in Submission = 1</INFO>
<INFO>Deprecated element ignored: CENTER_NAME</INFO>
<INFO>Deprecated element PROJECT_ID converted to RELATED_STUDY</INFO>
<INFO>Deprecated element ignored: CENTER_NAME</INFO>
<INFO>Deprecated element PROJECT_ID converted to RELATED_STUDY</INFO>
<INFO> SPOT_DESCRIPTOR is missing</INFO><INFO> SPOT_DESCRIPTOR is missing</INFO>
<INFO>Experiment (ENT 23) SPOTDESCRIPTOR is optional is null</INFO>
<INFO>Experiment (WS 23) SPOTDESCRIPTOR is optional is null</INFO>
<INFO> In run(ENT 23) file name (FC018_s_6_sequence_L70.txt) mentioned is not found among the submitted files</INFO>
<INFO> In run(WS 23) file name (FC018_s_6_sequence_L70.txt) mentioned is not found among the submitted files</INFO>
</MESSAGES>
<ACTIONS>VALIDATE</ACTIONS>
<ACTIONS>VALIDATE</ACTIONS>
<ACTIONS>VALIDATE</ACTIONS>
<ACTIONS>VALIDATE</ACTIONS>
<ACTIONS>HOLD</ACTIONS>
</RECEIPT>
I am able to retrieve all the element tags, mainly EXPERIMENT, ERROR, INFO, ACTION, and MESSAGE.
What I would like to retrieve is the attributes from elements like EXPERIMENT and RECEIPT.
I am using Nokogiri for my parsing.
My code is like this:
@req_test = %x[curl -F "SUBMISSION=@xml/#{@experiment.alias}.submission.xml" -F "STUDY=@xml/#{@experiment.alias}.study.xml" -F "SAMPLE=@xml/#{@experiment.alias}.sample.xml" -F "RUN=@xml/#{@experiment.alias}.run.xml" -F "EXPERIMENT=@xml/#{@experiment.alias}.experiment.xml" https://www-test.ebi.ac.uk/ena/submit/drop-box/submit/]
@doc = Nokogiri::XML(@req_test)
# collecting all the errors
@expt = @doc.xpath("//ERROR")
# Collecting all the INFO
@info = @doc.xpath("//INFO")
That was my controller. My view is just for display:
<h3>This is the ERRORS Collected</h3>
<% for expt in @expt %>
<ul>
<li><%= expt %><br /></li>
</ul>
<% end %>
<br />
<h3>This is the INFO Collected</h3>
<% for info in @info %>
<ul>
<li><%= info %><br /></li>
</ul>
<% end %>
and the application renders something like this:
This is the ERRORS Collected
In run(ENT 23), the FC018_s_6_sequence_L70.txt.md5 not found
In run(ENT 23) found the file format(Illumina_native_fastq), but requires SPOT_DESCRIPTOR information in the experiment(ENT 23)
The Illumina_native_fastq file format required gzip compression for submission.
FILE attribute quality_scoring_system is required
Same file FC018_s_6_sequence_L70.txt found in Run(WS 23) has been used with other Run
In run(WS 23), the FC018_s_6_sequence_L70.txt.md5 not found
In run(WS 23) found the file format(Illumina_native_fastq), but requires SPOT_DESCRIPTOR information in the experiment(WS 23)
The Illumina_native_fastq file format required gzip compression for submission.
FILE attribute quality_scoring_system is required
This is the INFO Collected
VALIDATE action for the following XML: E.coli_ENT_WS.study.xml E.coli_ENT_WS.sample.xml E.coli_ENT_WS.experiment.xml E.coli_ENT_WS.run.xml
Inform_on_error is not filled in; auto populated from Submission account.
Number of files in drop box = 2 & Number of files in Submission = 1
Deprecated element ignored: CENTER_NAME
Deprecated element PROJECT_ID converted to RELATED_STUDY
Deprecated element ignored: CENTER_NAME
Deprecated element PROJECT_ID converted to RELATED_STUDY
SPOT_DESCRIPTOR is missing
SPOT_DESCRIPTOR is missing
Experiment (ENT 23) SPOTDESCRIPTOR is optional is null
Experiment (WS 23) SPOTDESCRIPTOR is optional is null
In run(ENT 23) file name (FC018_s_6_sequence_L70.txt) mentioned is not found among the submitted files
In run(WS 23) file name (FC018_s_6_sequence_L70.txt) mentioned is not found among the submitted files
Could someone please suggest a method for retrieving these attributes?
It's not clear to me what you are trying to do, or what your problem is. Below are a variety of answers that might help.
For any element you can use Nokogiri::XML::Node#attributes to get a hash mapping each attribute's name to a Nokogiri::XML::Attr (which has a .value you can read):
require 'nokogiri'
require 'erb'
template = <<ENDERB
<% unless @expts.empty? %>
<h3>Experiments</h3>
<ul><% for expt in @expts %>
<li><%= expt %><ul>
<% expt.attributes.each do |name,attr| %>
<li><%=name%> = <%=attr.value%></li>
<% end %>
</ul></li>
<% end %></ul>
<% end %>
ENDERB
# DATA reads whatever follows __END__ in this script; put the receipt XML there.
doc = Nokogiri.XML(DATA)
@expts = doc.xpath("//EXPERIMENT")
puts ERB.new(template).result(binding).gsub(/^[ \t]*\n/,'')
#=> <h3>Experiments</h3>
#=> <ul>
#=> <li><EXPERIMENT alias="ENT 23" status="PUBLIC"/><ul>
#=> <li>alias = ENT 23</li>
#=> <li>status = PUBLIC</li>
#=> </ul></li>
#=> <li><EXPERIMENT alias="WS 23" status="PUBLIC"/><ul>
#=> <li>alias = WS 23</li>
#=> <li>status = PUBLIC</li>
#=> </ul></li>
#=> </ul>
Instead of attributes (a Hash) you can also use .attribute_nodes, which gives you a straight array of Attrs (with a .name and .value each).
Alternatively, while iterating through your experiment elements you could use…
<%= expt['alias'] %>
…to extract the value of a known attribute (returning a string such as "ENT 23").
If you're trying to extract all the attributes on their own, you could also use…
@aliases = @doc.xpath('//@alias')
…if you wanted to get an array of just those attributes anywhere in the document (which have a .name and .value).
If you only want all the alias attributes on a particular element (e.g. EXPERIMENT) then you can use…
@expt_aliases = @doc.xpath('//EXPERIMENT/@alias')