
File Age Reporting


I often find my answers on this site without needing to ask, but on this occasion I need more personalized assistance. I hope someone can point me in the right direction.

I have been messing around with trying to pull a report off my NAS to get statistics on the age and size of the data, so I can attempt to provide a chargeback/showback solution.

I have managed to do this mostly with PowerShell using Get-ChildItem, and I have even tried tapping into .NET using [System.IO.Directory]::EnumerateDirectories and other methods. All these solutions work, but gathering the information seems really slow, especially compared to JAM TreeSize, which fishes it out fairly quickly.

To note, I have even tried multi-threading in PowerShell, thinking that if I collected the data from several points at once the whole run would be quicker, but I have had largely mixed results.

I'm hoping someone else has tackled this sort of project before and managed to find a nice, quick(er) way of doing this. I am even open to tackling it in other languages.

Quick notes: I am doing this in PowerShell v5. I have also started learning a bit of Python, so if anyone has a suggestion there, that would be a great place for me to learn.

Edit:

OK, here are some examples. Times:

• TreeSize: about 10 seconds
• PowerShell Get-ChildItem: about 2 minutes
• PowerShell .NET calls: about 2 minutes

Number of objects counted: 60,000; total size: 120 GB.

Get-ChildItem with -Recurse will get you all file objects in a specified location, including attributes such as last accessed time and size in bytes. With .NET you need to use a combination of EnumerateFiles etc. and then loop over the results with FileInfo, which gets the file objects in the given location so you can inspect their attributes. A sketch of both approaches is below.
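To illustrate, here is a minimal sketch of the two approaches just described, assuming a hypothetical root path of D:\Share:

```powershell
# Minimal sketch of both enumeration approaches. $root is a hypothetical placeholder.
$root = 'D:\Share'

# Approach 1: Get-ChildItem with -Recurse returns FileInfo objects directly.
$gciReport = Get-ChildItem -Path $root -Recurse -File -ErrorAction SilentlyContinue |
    Select-Object FullName, Length, LastAccessTime

# Approach 2: .NET enumeration, then wrap each path in a FileInfo to read its attributes.
$netReport = foreach ($path in [System.IO.Directory]::EnumerateFiles($root, '*', 'AllDirectories')) {
    $fi = [System.IO.FileInfo]::new($path)
    [pscustomobject]@{
        FullName       = $fi.FullName
        Length         = $fi.Length
        LastAccessTime = $fi.LastAccessTime
    }
}
```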

In terms of multithreading, I will point you to some links which I used; it would be too much to add here. I have tried creating a RunspacePool (a rough sketch is below), and I also tried manually running two separate runspaces to compare results; they were much the same.

As for why I am obsessed with times: while the test directory above only takes 2 minutes, some volumes on my NAS hold millions of files. The one test I did took an hour and a half to complete, and doing the same with other volumes would take hours. I just want to find speeds closer to TreeSize.
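For context, here is a minimal RunspacePool sketch along the lines of what I tried, splitting the scan across top-level subfolders; the root path and pool size are hypothetical:

```powershell
# Minimal RunspacePool sketch: size up each top-level folder in parallel.
# $root and the pool size are hypothetical; error handling is omitted for brevity.
$root = 'D:\Share'
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

$jobs = foreach ($dir in [System.IO.Directory]::EnumerateDirectories($root)) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript({
        param($path)
        # Sum file sizes under one branch of the tree.
        Get-ChildItem -Path $path -Recurse -File -ErrorAction SilentlyContinue |
            Measure-Object -Property Length -Sum
    }).AddArgument($dir)
    [pscustomobject]@{ Shell = $ps; Handle = $ps.BeginInvoke() }
}

# Collect the per-folder results as each runspace finishes.
$results = foreach ($job in $jobs) {
    $job.Shell.EndInvoke($job.Handle)
    $job.Shell.Dispose()
}
$pool.Close()
```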

Edit: I have marked the robocopy workaround as the answer. However, if you do have any suggestions for a different language and procedure, please feel free to comment; it is something I will look into in the future.


Solution

  • I've been there, and getting what you want is tricky, to say the least: TreeSize reads the information directly from the MFT (Master File Table), while Get-ChildItem acts at a higher level, through the OS. Therefore, the speeds vary a lot.

    So if you want to speed up your report, you really need to go under the hood and code something at a lower level.

    For me, even if it wasn't the fastest solution, I settled on a compromise and used robocopy /L /LOG:c:\mylog.txt (which doesn't copy a byte and just logs the files to mylog.txt), and then I parsed the log. You can play with the multithreading option (/MT:[N], where N is 8 by default) to speed it up. A sketch follows at the end of this answer.

    What I find useful with this method is that, if I need further investigation, I have all the data I need in a file, so it'll be faster to query. It is static and not updated, but when you're talking about millions of files, a snapshot of a certain moment is a good approach, I think.
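
    To make the workaround concrete, here is a rough sketch of the listing run and the parse. The source and destination paths are hypothetical placeholders, and the extra switches are one reasonable way to get size, timestamp, and full path onto each log line:

    ```powershell
    # /L lists without copying; /E recurses into subfolders; /BYTES, /TS, /FP
    # put size, timestamp, and full path on each line; /NC /NDL /NJH /NJS trim noise.
    robocopy 'D:\Share' 'C:\null' /L /E /BYTES /TS /FP /NC /NDL /NJH /NJS /MT:8 /LOG:c:\mylog.txt

    # Parse each log line into size, timestamp, and path. The exact layout
    # depends on the switches and locale, so treat this regex as a starting
    # point rather than a guarantee.
    $report = Get-Content c:\mylog.txt | ForEach-Object {
        if ($_ -match '^\s*(\d+)\s+(\S+ \S+)\s+(.+)$') {
            [pscustomobject]@{
                Bytes        = [int64]$Matches[1]
                LastModified = [datetime]$Matches[2]
                Path         = $Matches[3]
            }
        }
    }
    ```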