I have the table below, and I want to export each cell's text or image src to a *.csv file.
<table class="GridView plm-table" id="pageLayout_projectTeamMembersGridView_gridView">
<tbody>
<tr id="pageLayout_projectTeamMembersGridView_gridView_headerRow" class="GridViewHeaderRow">
<th class="GridViewHeader" scope="col">A</th>
<th class="GridViewHeader" scope="col">B</th>
<th class="GridViewHeader" scope="col">C</th>
<th class="GridViewHeader" scope="col">D</th>
<th class="GridViewHeader" scope="col">E</th>
<th class="GridViewHeader" scope="col">F</th>
<th class="GridViewHeader" scope="col">G</th>
</tr>
<tr id="pageLayout_projectTeamMembersGridView_DataRow0" class="GridViewRow">
<td class="GridViewCell" align="right"><input type="checkbox" name="ss" value="zz"></td>
<td class="GridViewCell"><img class="Icon" src="../../Images/1.png" style="border-width:0px;"></td>
<td class="GridViewCell">John</td>
<td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
<td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
<td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
<td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
</tr>
<tr id="pageLayout_projectTeamMembersGridView_DataRow1" class="GridViewRow">
<td class="GridViewCell" align="right"><input type="checkbox" name="ss" value="zz"></td>
<td class="GridViewCell"><img class="Icon" src="../../Images/1.png" style="border-width:0px;"></td>
<td class="GridViewCell">Steve</td>
<td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
<td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
<td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
<td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
</tr>
<tr id="pageLayout_projectTeamMembersGridView_DataRow2" class="GridViewRow">
<td class="GridViewCell" align="right"><input type="checkbox" name="ss" value="zz"></td>
<td class="GridViewCell"><img class="Icon" src="../../Images/1.png" style="border-width:0px;"></td>
<td class="GridViewCell">Mary</td>
<td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
<td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
<td class="GridViewCell"><img id="Image1_IDcon" src="../../Images/1.png"></td>
<td class="GridViewCell"><img id="Image0_IDcon" src="../../Images/0.png"></td>
</tr>
</tbody>
</table>
What I have done so far is:
    table1 = soup.find('table', id='pageLayout_projectTeamMembersGridView_gridView')

    headers = []
    for i in table1.find_all('th'):
        title = i.text.strip()
        headers.append(title)

    df = pd.DataFrame(columns=headers)
    for row in table1.find_all('tr')[1:]:
        data = row.find_all('td')
        row_data = [td.text.strip() for td in data]
        length = len(df)
        df.loc[length] = row_data

    df.to_csv('Export.csv', index=False)
    print("CSV created!")
I'm getting the text value in the 3rd column (C), but how can I get the src value as "0.png" or "1.png" in the image columns (B, D, E, F and G)?
This is what I get: (image not preserved)
This is what I want: (image not preserved)
The problem with the following code

    data = row.find_all('td')
    row_data = [td.text.strip() for td in data]
    length = len(df)
    df.loc[length] = row_data

is that a td can contain text, an img, or some other element, and you're not checking for that. For a cell that holds only an img, td.text is an empty string.
You can do something like

    for row in table1.find_all('tr')[1:]:
        data = row.find_all('td')
        row_data = []
        for td in data:
            if td.find("img"):
                # Keep only the file name, e.g. "../../Images/1.png" -> "1.png"
                row_data.append(td.img.attrs.get('src').split("/")[-1])
            else:
                row_data.append(td.text)
        length = len(df)
        df.loc[length] = row_data
This will output

    A,B,C,D,E,F,G
    ,1.png,John,0.png,1.png,1.png,0.png
    ,1.png,Steve,1.png,1.png,0.png,0.png
    ,1.png,Mary,0.png,1.png,1.png,0.png
Column A is empty, as expected, because its cells contain only an input element (a checkbox) and no text or image. You can handle that case in the same way, for example by reading the input's value attribute.
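One way to cover the checkbox column too is to move the per-cell logic into a small helper. The following is only a minimal sketch, not a definitive implementation: the names `cell_value` and `table_to_frame` are hypothetical, `HTML` is a trimmed-down sample of the table from the question, and exporting the checkbox's `value` attribute is an assumption about what column A should contain. It also collects all rows first and builds the DataFrame once instead of appending with `df.loc`.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Trimmed-down sample of the table from the question (hypothetical subset).
HTML = """
<table id="pageLayout_projectTeamMembersGridView_gridView">
<tr><th>A</th><th>B</th><th>C</th><th>D</th></tr>
<tr>
<td><input type="checkbox" name="ss" value="zz"></td>
<td><img src="../../Images/1.png"></td>
<td>John</td>
<td><img src="../../Images/0.png"></td>
</tr>
</table>
"""


def cell_value(td):
    """Return the image file name, the input's value, or the cell text."""
    img = td.find("img")
    if img is not None:
        # Keep only the file name, e.g. "../../Images/1.png" -> "1.png".
        return img.get("src", "").split("/")[-1]
    inp = td.find("input")
    if inp is not None:
        # Assumption: the checkbox's value attribute is what should be exported.
        return inp.get("value", "")
    return td.get_text(strip=True)


def table_to_frame(html, table_id):
    """Parse the table with the given id into a DataFrame, one row per <tr>."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    # Skip the header row, then convert every cell of every data row.
    rows = [
        [cell_value(td) for td in tr.find_all("td")]
        for tr in table.find_all("tr")[1:]
    ]
    return pd.DataFrame(rows, columns=headers)


df = table_to_frame(HTML, "pageLayout_projectTeamMembersGridView_gridView")
print(df.to_csv(index=False))
```

To write a file instead of printing, pass a path as in the question: `df.to_csv('Export.csv', index=False)`.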