I want to find colored cells by iterating the rows.
I know how to iterate columns, but not rows.
In pandas dataframe, it would be
for i in range(0, len(df.index), 1):
print(df.loc[i, 1])
But StyleFrame does not have loc
. How can I iterate it and check colored cell?
Thanks!
StyleFrame maintainer here... that was a fun one...
The problem here consists of 2 parts:
The default behavior of Pandas' read_excel
(which StyleFrame uses) is to skip blank rows. Thankfully it accepts a keyword argument to change this behavior, and StyleFrame's read_excel
passes every keyword argument it does not know to Pandas. Hence this problem can be fixed pretty easily:
StyleFrame.read_excel('test.xlsx', read_style=True, skip_blank_lines=False)
The second issue/problematic behavior is that openpyxl sets the background color of cells whose background wasn't changed (or has been set to "automatic" or "no fill", depending on your spreadsheet software) to "000000", which is black.
This introduced a bug in StyleFrame which, to be honest, I'm not sure how didn't surface until now.
To fix this issue you can make a small change in StyleFrame's code. Line #111 in styler.py (assuming your are using version 2.0.5 of StyleFrame) should be changed from
bg_color = openpyxl_style.fill.fgColor.rgb
to
bg_color = openpyxl_style.fill.fgColor.rgb if openpyxl_style.fill.patternType is not None else utils.colors.white
This change will be included in the next version.
Then, after fixing both issues above, fixing your actual problem becomes relatively easy (although not as easy as I'd have wished it to be):
from StyleFrame import StyleFrame, utils
def only_cells_with_colored_background(cell):
return cell if cell.style.bg_color not in {utils.colors.white, 'FFFFFFFF'} else np.nan
sf = StyleFrame.read_excel('test.xlsx', read_style=True, skip_blank_lines=False)
sf = StyleFrame(sf.applymap(only_cells_with_colored_background).dropna(axis=(0, 1),
how='all'))
print(sf)
Will output
Unnamed: 0
2 a
3 b
4 c
9 a
10 b
11 c
I'm planning on implementing a .style
accessor in a future version, so hopefully the above example will be as simple as
sf = StyleFrame.read_excel('test.xlsx', read_style=True, skip_blank_lines=False)
sf = sf.loc[~(sf['Unnamed: 0'].style.bg_color == utils.colors.white)]