I'm interested in reading fixed width text files in Python in as efficient a manner as I can. Specifically, most of the time I'm interested in one or more columns in the flat file but not entire records.
It strikes me as inefficient to read the file a line at a time and extract the desired columns after reading the entire line into memory. I think I'd rather have the option of reading only the desired columns, top to bottom, left to right (instead of reading left to right, top to bottom).
Is such a thing desirable, and if so, is it possible?
Files are laid out as a (one-dimensional) sequence of bytes. 'Lines' are just a convenience we added to make things easy for humans to read. So, in general, what you're asking is not possible on plain files. To pull this off, you would need some way of finding where a record starts. The two most common ways are:
1. Scan for a record separator such as a newline, which means reading through everything that comes before the record you want.
2. Use fixed-length records and seek to go directly to where you need to go. This avoids reading the entire file, but is painful to do manually.
I wouldn't worry too much about file reading performance unless it becomes a problem. Yes, you could memory map the file, but your OS probably already caches for you. Yes, you could use a database format (e.g., the sqlite3 file format through sqlalchemy), but it probably isn't worth the hassle.
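For reference, the memory-mapping option mentioned above might look roughly like the sketch below. It assumes every record has the same byte length (including the line terminator); the function name column_via_mmap and its parameters are made up for illustration. Whether this beats plain line-by-line reading depends on the file and the OS, so measure before committing to it.

```python
import mmap


def column_via_mmap(path, record_len, col_start, col_width):
    """Slice one fixed-width column out of a memory-mapped file.

    Assumes every record is exactly record_len bytes long, newline included.
    Note that mmap cannot map an empty file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Step through the file one record at a time, slicing only
            # the bytes that belong to the requested column.
            last_start = len(mm) - col_width
            return [
                mm[pos:pos + col_width]
                for pos in range(col_start, last_start + 1, record_len)
            ]
```

The slices come back as bytes, so decode them (for example with .decode("ascii")) if you need text.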
Side note on "fixed width": what precisely do you mean by this? If you really mean "every column always starts at the same offset relative to the start of the record", then you can definitely use Python's seek to skip past data that you are not interested in.
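A minimal sketch of that seek approach, assuming every record (line terminator included) is exactly the same number of bytes. The name read_column and its parameters are hypothetical, chosen only for this example. The file is opened in binary mode so that offsets are plain byte positions.

```python
def read_column(path, record_len, col_start, col_width):
    """Read one fixed-width column from every record using seek.

    Assumes each record is exactly record_len bytes, newline included,
    and that the column occupies bytes [col_start, col_start + col_width)
    within each record.
    """
    values = []
    with open(path, "rb") as f:
        # Find the file size so we know when to stop.
        f.seek(0, 2)  # 2 means "relative to the end of the file"
        size = f.tell()

        offset = col_start
        while offset + col_width <= size:
            f.seek(offset)                    # jump straight to the column
            values.append(f.read(col_width))  # read only the column's bytes
            offset += record_len              # same column, next record
    return values
```

For a file whose records are 12 bytes long with a 3-byte column starting at byte 6, the call would be read_column("data.txt", 12, 6, 3), where data.txt is a placeholder file name. Keep in mind that Python's file objects are buffered by default, so each read may still pull a larger block from disk than the column itself; with short records the saving over reading whole lines may be small.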