I have 50K xml files (zml technically, but behave the same), each with metadata for a single file. I need to combine that metadata into a single SQL table so I can manipulate it there. The xml files have varying metadata tags, with some overlapping and some not. How can I get from 50K xml to a single SQL table?
I've looked at Powershell and can get part of the data from a single file parsed out, but haven't figured out writing it into a csv yet.
I've looked at XML data maps in Excel but it only seems to allow the import of a single xml file (1 row of data essentially).
I've also used a co-worker's script to import a single file's worth into SQL, but it puts the whole SQL into a single field, and would then need to be parsed out.
Looking for the best, most efficient path forward, or other ideas.
I have also considered combining all the individual xml files first, but there isn't anything in the file that would identify which file its from - so no way to link it back to the file its describing.
Figured it out! Used powershell to parse out each section.
##File with list of paths to parse
$CSV = Import-Csv "C:\Folder\AllZMLFiles.csv"
foreach($LINE in $CSV)
{
$Path = $LINE.filpath
[xml]$xmlFile = Get-Content -Path $Path
## For each XML section, list out all objects
$xmlFile.document.File | ForEach-Object {
## output the result object
[pscustomobject]@{
Source = $Path
DateCreated = $_.DateCreated
DateModified = $_.DateModified
JustUploaded = $_.JustUploaded
OrigFilename = $_.OrigFilename
}
} | Export-Csv -Path C:\Folder\File.csv -NoTypeInformation -Append ##Append to CSV
[xml]$xmlFile = Get-Content -Path $Path
$xmlFile.document.Fields | ForEach-Object {
## output the result object
[pscustomobject]@{
Source = $Path
Producer = $_.Producer
ProcessDate = $_.ProcessDate
Version = $_.Version
OcrEngine = $_.OcrEngine
secure = $_.secure
LABEL = $_.LABEL
Numero_Contrat = $_.Numero_Contrat
Date_du_contrat = $_.Date_du_contrat
Fin_du_contrat = $_.Fin_du_contrat
Duree = $_.Duree
Type_Contrat = $_.Type_Contrat
Artiste = $_.Artiste
Nom_civil_artiste = $_.Nom_civil_artiste
Contractant = $_.Contractant
Type_Document = $_.Type_Document
Objet = $_.Objet
Titre_phono_video = $_.Titre_phono_video
Commentaires = $_.Commentaires
Statut = $_.Statut
Conteneur = $_.Conteneur
Boite = $_.Boite
Origine = $_.Origine
OrigFilename = $_.OrigFilename
Type = $_.Type
}
} | Export-Csv -Path C:\Folder\Fields.csv -NoTypeInformation -Append
[xml]$xmldtInfos = Get-Content -Path $Path
$xmldtInfos.document.dtInfos | ForEach-Object {
## output the result object
[pscustomobject]@{
Source = $Path
ID = $_.ID
TypeID = $_.TypeID
WordCount = $_.WordCount
}
} | Export-Csv -Path C:\ACG\Zedoc\dtInfos.csv -NoTypeInformation -Append
}