Imagine a text file that looks like this:
Places Person Number Comments
bar anastasia 75 very lazy
home jimmy nothing to say
beach 2
Consider the first line as the header containing the names of the columns I want for my pandas data frame. You can see that there are empty cells. And there is a column that has strings with spaces. There is a readable patter in this file, columns are separated by at least 2 spaces and the information of each column can be red from the end point of the column name to the end point of the previous column name basically. There is no ambiguity in this.
If I do
df = pd.read_csv('text_file.txt')
I will get a 3 x 1 data frame where the only column gets called "Places Person Number Comments"
. So it fails to understand the table format.
If I do
df = pd.read_csv('text_file.txt', delim_whitespace = True)
It will create the good number of columns but won't be able to understand the spaces in the values in Comments
and will split the comments and send it to different cells, like so:
Places Person Number Comments
bar anastasia 75 very lazy
home jimmy nothing to say
beach 2 NaN NaN NaN
If I do
df = pd.read_csv('text_file.txt', sep = '\s{2,}', engine = 'python')
It will understand that only if there are two or more spaces it can be considered part of another column. So that's correct. But it won't be able to understand that there are empty cells and will wrongly displace cells from one column to another.
Places Person Number Comments
0 bar anastasia 75 very lazy
1 home jimmy nothing to say None
2 beach 2 None None
At this point I don't know what to do. Is there an elegant way to do this in Pandas?
You can use pd.read_fwf()
to read your file, which is a file of fixed-width formatted lines, into DataFrame.
df = pd.read_fwf('text_file.txt')
I use StringIO as demo. You can use your actual file name as parameter to the function call.
text = """
Places Person Number Comments
bar anastasia 75 very lazy
home jimmy nothing to say
beach 2
"""
from io import StringIO
df = pd.read_fwf(StringIO(text))
print(df)
Places Person Number Comments
0 bar anastasia 75.0 very lazy
1 home jimmy NaN nothing to say
2 beach NaN 2.0 NaN