Long story short: I am building a database of all the quotations ever produced in our company, looking for files with one particular extension: *.prc. One piece of information I would like to retrieve is the owner of each file. I am using the following code (showing only part of it):
import os, time, win32security, subprocess
from threading import Thread
from time import time
def GET_THE_OWNER(FILENAME):
    open (FILENAME, "r").close ()
    sd = win32security.GetFileSecurity (FILENAME, win32security.OWNER_SECURITY_INFORMATION)
    owner_sid = sd.GetSecurityDescriptorOwner ()
    name, domain, type = win32security.LookupAccountSid (None, owner_sid)
    return name
starttime = time()
path = "C:/Users/cbabycv/Documents/Python/0. Quotations/Example"
for root, dirs, files in os.walk(path):
    for file in files:
        if (file.endswith(".prc")):
            #getting data from the file information
            Filename = os.path.join(root,file)
            try:
                Owner = GET_THE_OWNER(Filename)
            except:
                Owner = "Could not get the owner."
            print(Owner)
endtime = time()
print (Owner)
print(endtime-starttime, " sec")
The process is slow, especially when you have to read around 100,000 files. Is there another way to make it faster? Please note that I am asking about Windows only (I cannot use os.stat() in this case; it does not give the file owner on Windows). I have also tried the approach described by Paal Pedersen in "how to find the owner of a file or directory in python", but it is even slower than using the Windows API.
I am using os.walk() to find the files on the server. I do not know the exact location of the files; they could be in any folder, so I look at every file in all folders and subfolders and check whether it is a *.prc file. Someone suggested multiprocessing, many thanks :) I will try to optimize the whole code, but my question still stands: is there a faster or better way to find the owner of a file on Windows?
@theCreator suggested using PowerShell. I have tried that, and it is approximately 14 times slower:
import os, subprocess
from pathlib import Path
from time import time
starttime = time()
def GET_THE_OWNER(cmd):
    # Run PowerShell without showing a console window
    startupinfo = subprocess.STARTUPINFO()
    startupinfo.dwFlags |= subprocess.STARTF_USESHOWWINDOW
    completed = subprocess.run(["powershell.exe", "-Command", "Get-Acl ", cmd, " | Select-Object Owner"], capture_output=True, startupinfo=startupinfo)
    return completed
path = Path('C:/Users/cbabycv/Documents/Python/0. Quotations/Example')
for root, dirs, files in os.walk(path):
    for file in files:
        if (file.endswith(".prc")):
            #getting data from the file information
            Filename = os.path.join(root,file)
            Filename = "\"" + Filename +"\""
            Owner = GET_THE_OWNER(Filename)
            if Owner.returncode != 0:
                print("An error occurred: %s" % Owner.stderr)
            else:
                print(Owner.stdout)
endtime = time()
print(endtime-starttime, " sec")
It's useful in cases like this to run the code through a profiler:
> python3 -m cProfile -s cumtime owners.py
1.251999855041504  sec
         163705 function calls (158824 primitive calls) in 1.263 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      5/1    0.000    0.000    1.263    1.263 {built-in method builtins.exec}
        1    0.019    0.019    1.263    1.263 owners.py:1(<module>)
     4999    0.024    0.000    1.058    0.000 owners.py:6(GET_THE_OWNER)
     4999    0.423    0.000    0.423    0.000 {built-in method win32security.LookupAccountSid}
     4999    0.264    0.000    0.280    0.000 {built-in method io.open}
     4999    0.262    0.000    0.262    0.000 {built-in method win32security.GetFileSecurity}
 5778/938    0.011    0.000    0.130    0.000 os.py:280(walk)
   ...
Some of this can't be helped, but the calls to LookupAccountSid and io.open can. The SIDs don't change, and you no doubt have a fairly small set of SIDs compared to the number of files, so each lookup result can be cached. I'm not sure why you're opening and closing each file, but that alone takes considerable time:
_owner_sid_cache = {}
def GET_THE_OWNER(FILENAME):
    # open (FILENAME, "r").close ()
    sd = win32security.GetFileSecurity (FILENAME, win32security.OWNER_SECURITY_INFORMATION)
    owner_sid = sd.GetSecurityDescriptorOwner ()
    if str(owner_sid) not in _owner_sid_cache:
        name, _domain, _type = win32security.LookupAccountSid (None, owner_sid)
        _owner_sid_cache[str(owner_sid)] = name
    return _owner_sid_cache[str(owner_sid)]
Between using this version of the function and writing the output to a file instead of the relatively slow console, the run time dropped from 252 seconds to 5 seconds on a test folder on my local machine with 60,000 files.
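For completeness, here is a rough sketch of how the two changes (caching the SID lookup and writing to a file rather than the console) might be combined. It is not the exact script I timed. The helper names `iter_files`, `make_cached_owner_lookup` and `write_owner_report`, and the CSV layout, are made up for illustration. The two Windows calls are passed in as plain callables so the caching logic can be tried on any machine; the commented lines at the bottom show how they could be wired to the same win32security calls used above.

```python
import csv
import os


def iter_files(top, suffix):
    """Yield the full path of every file under `top` whose name ends with `suffix`."""
    for root, _dirs, files in os.walk(top):
        for name in files:
            if name.endswith(suffix):
                yield os.path.join(root, name)


def make_cached_owner_lookup(get_sid, resolve_sid):
    """Return owner_of(path), which calls resolve_sid at most once per distinct SID.

    get_sid(path)    -> SID object for the file (hypothetical callable)
    resolve_sid(sid) -> account name for that SID (hypothetical callable)
    """
    cache = {}

    def owner_of(path):
        sid = get_sid(path)
        key = str(sid)  # same cache key as in the function above
        if key not in cache:
            cache[key] = resolve_sid(sid)
        return cache[key]

    return owner_of


def write_owner_report(top, suffix, owner_of, out_path):
    """Write one 'path, owner' row per matching file; return the number of files."""
    count = 0
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "owner"])
        for path in iter_files(top, suffix):
            try:
                owner = owner_of(path)
            except Exception:
                owner = "Could not get the owner."
            writer.writerow([path, owner])
            count += 1
    return count


# Possible wiring on Windows with pywin32 (untested sketch, same calls as above):
#
#   import win32security
#   owner_of = make_cached_owner_lookup(
#       lambda p: win32security.GetFileSecurity(
#           p, win32security.OWNER_SECURITY_INFORMATION
#       ).GetSecurityDescriptorOwner(),
#       lambda sid: win32security.LookupAccountSid(None, sid)[0],
#   )
#   write_owner_report(path, ".prc", owner_of, "owners.csv")
```

Keeping the cache inside a closure instead of a module-level dict is just a matter of taste; the effect is the same as `_owner_sid_cache` above.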