I have successfully implemented a Directory Structure to XML script, as shown here.
function GenerateLibraryMap ($path) {
function ProcessChildNode {
param (
$parentNode,
$childPath
)
$dirInfo = [System.IO.DirectoryInfo]::New($childPath)
foreach ($directory in $dirInfo.GetDirectories()) {
$childNode = $xmlDoc.CreateElement('folder')
$childNode.SetAttribute('name', $directory.Name) > $null
$parentNode.AppendChild($childNode) > $null
ProcessChildNode -parentNode:$childNode -childPath:"$childPath\$($directory.Name)"
}
foreach ($file in $dirInfo.GetFiles()) {
$childNode = $xmlDoc.CreateElement('file')
$childNode.SetAttribute('name', $file.Name) > $null
$childNode.SetAttribute('size', $file.Length) > $null
$childNode.SetAttribute('hash', (Get-FileHash -Path:$file.FullName -Algorithm:MD5).Hash) > $null
$parentNode.AppendChild($childNode) > $null
}
}
$xmlDoc = [XML]::New()
$xmlDoc.AppendChild($xmlDoc.CreateProcessingInstruction('xml', 'version="1.0"')) > $null
$rootNode = $xmlDoc.CreateElement('rootDirectory')
$rootNode.SetAttribute('path', $path) > $null
$xmlDoc.AppendChild($rootNode) > $null
ProcessChildNode -parentNode:$rootNode -childPath:$path
$xmlDoc.Save("$path\Tree.xml") > $null
Write-Host "$path\Tree.xml"
}
Measure-Command {
GenerateLibraryMap 'C:\assets\Revit\2020'
}
This works great, but takes upwards of 2 minutes on the file structure I am testing against, which is likely only 20% of my actual data. So I am looking at refactoring to use an XML Stream, as I understand that will likely be MUCH faster. I found this reference to get me started, but it only includes the root node, and no mention of how generating a hierarchy works. It seems almost as if you need to keep track of the hierarchy so you can .WriteEndElement on each node appropriately. But that makes my simple recursion break down. I THINK I would need to simply .WriteEndElement after the ProcessChildNode at line 12, but I am not sure.
So, I guess I have two questions...
1: Before I go down this rabbit hole, is this going to result in noticeably faster code? Especially when dealing with thousands of files across scores of sub folders? And...
2: Can someone point me to a resource, or provide an example, of how to deal with the recursion issue? I have a feeling that is going to be a lot of banging my head on the wall otherwise.
OK, really three questions...
3: Once I have this working, I plan to refactor to a class, both for the performance boost and as an exercise since I am learning OOP. Are there any gotchas I need to look out for when I start down that next rabbit hole?
EDIT: With Mathias's response, and some digging, I arrived at this.
function GenerateLibraryMap {
param (
[String]$path
)
function ProcessChildNode {
param (
[String]$childPath
)
$dirInfo = [System.IO.DirectoryInfo]::New($childPath)
foreach ($directory in $dirInfo.GetDirectories()) {
$xmlDoc.WriteStartElement('folder')
$xmlDoc.WriteAttributeString('name', $directory.Name)
ProcessChildNode -childPath:"$childPath\$($directory.Name)"
}
foreach ($file in $dirInfo.GetFiles()) {
$xmlDoc.WriteStartElement('file')
$xmlDoc.WriteAttributeString('name', $file.Name)
$xmlDoc.WriteAttributeString('size', $file.Length)
$xmlDoc.WriteAttributeString('hash', (Get-FileHash -Path:$file.FullName -Algorithm:MD5).Hash)
$xmlDoc.WriteEndElement()
}
$xmlDoc.WriteEndElement()
}
$mapFilePath = "$(Split-Path $path -parent)\Tree_Stream.xml"
$xmlSettings = [System.XMl.XmlWriterSettings]::New()
# FileMode 'Create' overwrites an existing map; 'Append' would add a second XML document to the same file on every re-run
$fileStream = [System.IO.FileStream]::New($mapFilePath, [System.IO.FileMode]::Create, [System.IO.FileAccess]::Write, [System.IO.FileShare]::Read)
$xmlSettings.Indent = $true
$xmlSettings.IndentChars = ' '
$xmlSettings.ConformanceLevel = 'Auto'
$xmlDoc = [System.Xml.XmlWriter]::Create($fileStream, $xmlSettings)
$xmlDoc.WriteStartDocument()
$xmlDoc.WriteStartElement('rootDirectory')
$xmlDoc.WriteAttributeString('path', $path)
ProcessChildNode -childPath:$path
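# The final WriteEndElement() inside ProcessChildNode has already closed rootDirectory,
# so only the document remains to be finished. Without parentheses a method is not invoked.
$xmlDoc.WriteEndDocument()
$xmlDoc.Close()
$fileStream.Dispose()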
Write-Host $mapFilePath
}
CLS
Measure-Command {
GenerateLibraryMap 'C:\assets\Revit\2020'
}
I tried it both with the [System.IO.FileStream] and instantiating $xmlDoc directly with the file path (still not 100% sure I understand the difference). In any case, all three approaches are within just a few seconds of each other, right around 2 minutes. So it seems in this case there's no meaningful difference. If anyone sees some opportunity to increase performance I am all ears, but for now I will move ahead with the refactor to classes.
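For anyone unsure about the same difference, here is a minimal sketch of the two ways of creating the writer. The file names, the `Write-SampleTree` helper and the `C:\example` value are made up for illustration. As far as I understand it, the main practical difference is ownership: given a path, the writer opens and closes the file itself; given a stream, the caller controls how the file is opened and, with default settings, is also responsible for disposing of the stream.

```powershell
# Hypothetical helper: writes a tiny document through whatever writer it is given
function Write-SampleTree {
    param ([System.Xml.XmlWriter]$Writer)
    $Writer.WriteStartDocument()
    $Writer.WriteStartElement('rootDirectory')
    $Writer.WriteAttributeString('path', 'C:\example')
    $Writer.WriteEndElement()
    $Writer.WriteEndDocument()
    $Writer.Close()   # flushes the writer
}

$settings = [System.Xml.XmlWriterSettings]::new()
$settings.Indent = $true

# Hypothetical output locations (absolute paths, since .NET ignores the PowerShell location)
$pathA = Join-Path ([System.IO.Path]::GetTempPath()) 'Sketch_FromPath.xml'
$pathB = Join-Path ([System.IO.Path]::GetTempPath()) 'Sketch_FromStream.xml'

# Variant A: the writer creates the file and closes it when Close() is called
$writerA = [System.Xml.XmlWriter]::Create($pathA, $settings)
Write-SampleTree -Writer $writerA

# Variant B: the caller opens the file, so it chooses mode, access and sharing
$stream = [System.IO.FileStream]::new($pathB, [System.IO.FileMode]::Create,
    [System.IO.FileAccess]::Write, [System.IO.FileShare]::Read)
$writerB = [System.Xml.XmlWriter]::Create($stream, $settings)
Write-SampleTree -Writer $writerB
$stream.Dispose()     # CloseOutput defaults to $false, so the stream is still ours to close
```

Both variants should produce the same XML, which fits with the timings being nearly identical.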
EDIT #2: Well, I implemented the Class based approach like this...
class GenerateLibraryMap {
# Properties
[XML.XMLDocument]$XML = [XML]::New()
[String]$MapFilePath
# Constructor
GenerateLibraryMap ([String]$path) {
$this.MapFilePath = "$(Split-Path $path -parent)\Tree_Class.xml"
$this.XML.AppendChild($this.XML.CreateProcessingInstruction('xml', 'version="1.0"')) > $null
$rootNode = $this.XML.CreateElement('rootDirectory')
$rootNode.SetAttribute('path', $path) > $null
$this.XML.AppendChild($rootNode) > $null
$this.ProcessChildNode($rootNode, $path)
$this.XML.Save($this.MapFilePath)
}
# Method
[Void] ProcessChildNode([XML.XMLElement]$parentNode, [String]$childPath) {
$dirInfo = [System.IO.DirectoryInfo]::New($childPath)
foreach ($directory in $dirInfo.GetDirectories()) {
$childNode = $this.XML.CreateElement('folder')
$childNode.SetAttribute('name', $directory.Name)
$parentNode.AppendChild($childNode)
$this.ProcessChildNode($childNode, "$childPath\$($directory.Name)")
}
foreach ($file in $dirInfo.GetFiles()) {
$childNode = $this.XML.CreateElement('file')
$childNode.SetAttribute('name', $file.Name)
$childNode.SetAttribute('size', $file.Length)
$childNode.SetAttribute('hash', (Get-FileHash -Path:$file.FullName -Algorithm:MD5).Hash)
$parentNode.AppendChild($childNode)
}
}
}
Measure-Command {
$xml = [GenerateLibraryMap]::New('C:\assets\Revit\2020')
}
Write-Host "$($xml.MapFilePath)"
Takes EXACTLY the same amount of time. But, educational in any case. It does look like the Stream based version is slightly more memory efficient. Hopefully someone finds the results useful.
is this going to result in noticeably faster code?
Maybe. The easiest way of finding out (without actually profiling your current approach) is to just go ahead and do it, then compare the results :)
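If you do want a rough profile first, one option is to time the directory walk with and without the hashing step. The sketch below builds a small throwaway folder so it runs anywhere; `$samplePath` and the file names are hypothetical, and the idea that `Get-FileHash` might dominate the run time is only a guess that this kind of measurement could confirm or rule out.

```powershell
# Hypothetical sample tree; point $samplePath at a real folder to measure real data
$samplePath = Join-Path ([System.IO.Path]::GetTempPath()) ("timing_" + [guid]::NewGuid().ToString('N'))
New-Item -ItemType Directory -Path $samplePath -Force | Out-Null
1..5 | ForEach-Object {
    Set-Content -Path (Join-Path $samplePath "file$_.txt") -Value ('x' * 1000)
}

$dirInfo = [System.IO.DirectoryInfo]::new($samplePath)

# Pass 1: enumerate the files and read their size, nothing else
$enumerateOnly = Measure-Command {
    foreach ($file in $dirInfo.GetFiles('*', [System.IO.SearchOption]::AllDirectories)) {
        $null = $file.Length
    }
}

# Pass 2: same walk, plus the MD5 hash used in the question
$withHash = Measure-Command {
    foreach ($file in $dirInfo.GetFiles('*', [System.IO.SearchOption]::AllDirectories)) {
        $null = (Get-FileHash -LiteralPath $file.FullName -Algorithm MD5).Hash
    }
}

Write-Host ("Enumerate only: {0:N0} ms" -f $enumerateOnly.TotalMilliseconds)
Write-Host ("With MD5 hash:  {0:N0} ms" -f $withHash.TotalMilliseconds)
```

If the second number is far larger than the first on real data, the XML API is not where the time goes, and switching to a stream will not change much.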
how to deal with the recursion issue?
Easy!
Follow this rule:
function Recurse
{
WriteStartElement
if($shouldRecurse){
Recurse
}
WriteEndElement
}
As long as you stick to this form, you'll be fine.
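Applied to a directory tree, the rule could look roughly like the sketch below. `Write-FolderElement` is a made-up name, the sample folder is created only so the example runs on its own, and hashing is left out to keep it short.

```powershell
# Hypothetical helper: one call writes exactly one <folder> element, start to end
function Write-FolderElement {
    param (
        [System.Xml.XmlWriter]$Writer,
        [System.IO.DirectoryInfo]$Directory
    )
    $Writer.WriteStartElement('folder')                    # open this node
    $Writer.WriteAttributeString('name', $Directory.Name)  # attributes go before any children
    foreach ($child in $Directory.GetDirectories()) {
        Write-FolderElement -Writer $Writer -Directory $child   # recurse while the node is open
    }
    foreach ($file in $Directory.GetFiles()) {
        $Writer.WriteStartElement('file')
        $Writer.WriteAttributeString('name', $file.Name)
        $Writer.WriteEndElement()                          # close <file>
    }
    $Writer.WriteEndElement()                              # close the node opened above
}

# Throwaway sample tree: <temp>/tree_xxx/sub/a.txt
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("tree_" + [guid]::NewGuid().ToString('N'))
$sub = Join-Path $root 'sub'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
Set-Content -Path (Join-Path $sub 'a.txt') -Value 'x'

# Write into a StringBuilder so the result is easy to inspect
$sb = [System.Text.StringBuilder]::new()
$settings = [System.Xml.XmlWriterSettings]::new()
$settings.OmitXmlDeclaration = $true
$writer = [System.Xml.XmlWriter]::Create($sb, $settings)
Write-FolderElement -Writer $writer -Directory ([System.IO.DirectoryInfo]::new($root))
$writer.Close()
$xmlText = $sb.ToString()
Write-Host $xmlText
```

Because every call closes exactly the element it opened, the function never needs to know how deep it is in the hierarchy.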
I plan to refactor to a class, both for the performance boost and as an exercise since I am learning OOP. Are there any gotchas I need to look out for when I start down that next rabbit hole?
Probably, yes?
Again, the easiest way to find out is to go ahead and just do it - StackOverflow is still gonna be here if and when you bump into a wall :)