I need to be able to read what is in each row and column, for example the contents of cell D3, but I am unsure how to do this. I know this markup produces a spreadsheet, but what code or language is it written in, and how can I learn to read specific information from this source?
I am under the impression that it is XML and that it contains the completed table it creates, but I still cannot work out how to read what is in each row or column.
<wf:table h="85" w="405" range="A1:D5">
<wf:fmts>
<wf:bdrFmts>
<wf:bdrFmt style="solid"/>
<wf:bdrFmt style="double"/>
</wf:bdrFmts>
<wf:fillFmts>
<wf:fillFmt color="#0094ff"/>
</wf:fillFmts>
<wf:valFmts>
<wf:valFmt fmtStr="MMMM D, <new_line> YYYY" typ="dateTime"/>
<wf:valFmt typ="text"/>
<wf:valFmt outScl="6" typ="accounting" thouSep="true"/>
</wf:valFmts>
<wf:txtFmts>
<wf:txtFmt fontFamily="Arial"/>
<wf:txtFmt fontWeight="bold" textAlign="center" fontFamily="Arial"/>
<wf:txtFmt fontWeight="bold" fontFamily="Arial" color="#00cc00"/>
</wf:txtFmts>
<wf:condFmts/>
</wf:fmts>
<wf:cols>
<wf:col w="201" />
<wf:col gutter="3.35" w="100" />
<wf:col w="4" />
<wf:col gutter="3.35" w="100" />
</wf:cols>
<wf:rows>
<wf:row h="25">
<wf:c tFmt="1"/>
<wf:c formattedString="June 30, 
2016" tFmt="2" val="6/30/2016" vFmt="1" bFmt="0|0|0|1"/>
<wf:c tFmt="1"/>
<wf:c formattedString="December 31, 
2015" tFmt="2" val="12/31/2015" vFmt="1" bFmt="0|0|0|1"/>
</wf:row>
<wf:row h="15">
<wf:c formattedString="Debt Securities" tFmt="1" vFmt="2" val="Debt Securities"/>
<wf:c formattedString="1,000" tFmt="1" fFmt="1" val="1000" inScl="6" vFmt="3"/>
<wf:c tFmt="1"/>
<wf:c formattedString="1,200" tFmt="1" fFmt="1" val="1200" inScl="6" vFmt="3"/>
</wf:row>
<wf:row h="15">
<wf:c formattedString="Equities" tFmt="1" vFmt="2" val="Equities"/>
<wf:c formattedString="500" tFmt="1" val="500" inScl="6" vFmt="3"/>
<wf:c tFmt="1" />
<wf:c formattedString="600" tFmt="1" val="600" inScl="6" vFmt="3"/>
</wf:row>
<wf:row h="15">
<wf:c formattedString="Money Market Funds" tFmt="1" vFmt="2" val="Money Market Funds"/>
<wf:c formattedString="200" tFmt="1" fFmt="1" val="200" inScl="6" vFmt="3"/>
<wf:c tFmt="1"/>
<wf:c formattedString="200" tFmt="1" fFmt="1" val="200" inScl="6" vFmt="3"/>
</wf:row>
<wf:row h="15">
<wf:c formattedString="Total Cash Equivalents" tFmt="1" vFmt="2" val="Total Cash Equivalents"/>
C
<wf:c tFmt="1" />
<wf:c formattedString="2,000" tFmt="3" formula="SUM(D2:D4)" val="2000" inScl="6" vFmt="3" bFmt="0|0|1|2"/>
</wf:row>
</wf:rows>
</wf:table>
</wf:Worksheet>
</WFML>
The BeautifulSoup module in Python can easily traverse any XML-like markup such as this.
After putting your code in a string that I've named pagecode, I ran this to extract what was in the fourth row, third column:
from bs4 import BeautifulSoup
soup = BeautifulSoup(pagecode, 'lxml')
rows = soup.find_all("wf:row")
cell = rows[3].find_all("wf:c")[2] # Indexing starts at 0, not 1!
print(cell) # Displays <wf:c tfmt="1"></wf:c>, the empty spacer cell at C4
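If you would rather ask for a cell by its spreadsheet-style reference (such as D3) than by two zero-based indexes, here is a minimal sketch of one way to do it. It uses only the standard library's xml.etree.ElementTree (Python 3.8+ for the `{*}` wildcard) instead of BeautifulSoup. The helper name cell_at, the trimmed SAMPLE string, and the xmlns:wf URI are my own placeholders, not part of your file, and the sketch assumes every row lists exactly one wf:c element per column.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the first three rows from the question. The xmlns:wf URI
# is a placeholder (assumption); the real file declares its own namespace
# on an outer element that is not shown in the question.
SAMPLE = """\
<wf:table xmlns:wf="urn:example:wf" range="A1:D5">
  <wf:rows>
    <wf:row>
      <wf:c tFmt="1"/>
      <wf:c formattedString="June 30, 2016" val="6/30/2016"/>
      <wf:c tFmt="1"/>
      <wf:c formattedString="December 31, 2015" val="12/31/2015"/>
    </wf:row>
    <wf:row>
      <wf:c formattedString="Debt Securities" val="Debt Securities"/>
      <wf:c formattedString="1,000" val="1000"/>
      <wf:c tFmt="1"/>
      <wf:c formattedString="1,200" val="1200"/>
    </wf:row>
    <wf:row>
      <wf:c formattedString="Equities" val="Equities"/>
      <wf:c formattedString="500" val="500"/>
      <wf:c tFmt="1"/>
      <wf:c formattedString="600" val="600"/>
    </wf:row>
  </wf:rows>
</wf:table>
"""


def cell_at(table, ref):
    """Return the wf:c element for a reference such as 'D3' (hypothetical helper)."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()

    # Convert column letters to a 1-based number: A -> 1, D -> 4, AA -> 27.
    col = 0
    for ch in letters.upper():
        col = col * 26 + (ord(ch) - ord("A") + 1)

    # "{*}row" matches a row element in any namespace, so the lookup does
    # not depend on the exact namespace URI.
    rows = table.findall(".//{*}row")
    cells = rows[int(digits) - 1].findall("{*}c")
    return cells[col - 1]


table = ET.fromstring(SAMPLE)
d3 = cell_at(table, "D3")
print(d3.get("val"))  # prints 600
```

The val attribute holds the raw value and formattedString holds the displayed text, so `cell_at(table, "B2").get("formattedString")` gives `1,000`. If you stay with BeautifulSoup and the 'lxml' parser as above, keep in mind that it lowercases attribute names (hence tfmt in the printed output), so the same attribute would be read with `cell.get("formattedstring")`.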