I have a conceptual question. I am new to Python and I am looking to do task that involves processing bigger log files. Some of these can get up to 5 and 6GB
I need to parse through many files in a location. These are text files.
I know of the with open() method, and recently just ran into pathlib. So I need to not only read the file line by line to extract values to upload into a DB, i also need to get file properties that Pathlib gives you and upload them as well.
Is it faster to use with open and underneath it, call a path object from which to read files... something like this:
for filename in glob('**/*.*', recursive=False):
fpath = Path(filename)
with open(filename, 'rb', buffering=102400) as logfile:
for line in logfile:
#regex operation
print(line)
Or would it be better to use Pathlib:
with Path("src/module.py") as f:
contents = open(f, "r")
for line in contents:
#regex operation
print(line)
Also since I've never used Pathlib to open files for reading. When it comes to this: Path.open(mode=’r’, buffering=-1, encoding=None, errors=None, newline=None)
What does newline and errors mean? I assume buffering here is the same as buffering in the with open function?
I also saw this contraption that uses with open in conjuction with Path object though how it works, I have no idea:
path = Path('.editorconfig')
with open(path, mode='wt') as config:
config.write('# config goes here')
pathlib
is intended to be a more elegant solution to interacting with the file system, but it's not necessary. It'll add a small amount of fixed overhead (since it wraps other lower level APIs), but shouldn't change how performance scales in any meaningful way.
Since, as noted, pathlib
is largely a wrapper around lower level APIs, you should know Path.open
is implemented in terms of open
, and the arguments all mean the same thing for both; reading the docs for the built-in open
will describe the arguments.
As for the last bit of your question (passing a Path
object to the built-in open
), that works because most file-related APIs were updated to support any object that implements the os.PathLike
ABC.