Search code examples
jsonpowershellavro

Convert avro file to json with powershell


I'm capturing data in avro via Azure EventHub so it can be handled via a Azure Automation Runbook. The issue I'm facing is when I try to parse the avro file to be used in a powershell script.

Here the content of the file. enter image description here

    Objavro.codecnullavro.schemaì{"type":"record","name":"EventData","namespace":"Microsoft.ServiceBus.Messaging","fields":[{"name":"SequenceNumber","type":"long"},{"name":"Offset","type":"string"},{"name":"EnqueuedTimeUtc","type":"string"},{"name":"SystemProperties","type":{"type":"map","values":["long","double","string","bytes"]}},{"name":"Properties","type":{"type":"map","values":["long","double","string","bytes","null"]}},{"name":"Body","type":["null","bytes"]}]} Êg;Ï7ˆ@žkß6Þ¿¼¬.

16904(6/25/2024 5:50:16 AM&x-opt-enqueued-time ¸–߉d  ²{"value":[{"subscriptionId":"5exxxxa10333","subscriptionExpirationDateTime":"2024-06-28T00:23:31.567249+00:00","changeType":"created","resource":"Users/2ffxxxx81-a8d0xxxx83c/Messages/AAMkAxxxxxB7VrlAAA=","resourceData":{"@odata.type":"#Microsoft.Graph.Message","@odata.id":"Users/2ffxxxx-9f81-a8d****83c/Messages/AAMkAGY4xxxxxAAKB7VrlAAA=","@odata.etag":"W/\"CQAAABYAAxxxxxA38cxAAKB/4Iw\"","id":"AAMkAxxxxxxlAAA="},"clientState":"MlaxxxxyJ.MF7kmc3z","tenantId":"e1fxxxxxc83"}]}Êg;Ï7ˆ@žkß6Þ¿¼

I tried to parse the file with the pwsh module AvroTools but the read-avro cmdlet returns a PSCustom Object that I can't use:

Fields        : {SequenceNumber, Offset, EnqueuedTimeUtc, SystemProperties…}
Count         : 6
SchemaName    : Microsoft.ServiceBus.Messaging.EventData
Name          : EventData
Namespace     : Microsoft.ServiceBus.Messaging
Fullname      : Microsoft.ServiceBus.Messaging.EventData
Documentation :
Tag           : Record

When checking the Fields, below the content:

Aliases       :
Pos           : 0
Documentation :
DefaultValue  :
Ordering      : ignore
Schema        : {"type":"long"}
Name          : SequenceNumber

Aliases       :
Pos           : 1
Documentation :
DefaultValue  :
Ordering      : ignore
Schema        : {"type":"string"}
Name          : Offset

Aliases       :
Pos           : 2
Documentation :
DefaultValue  :
Ordering      : ignore
Schema        : {"type":"string"}
Name          : EnqueuedTimeUtc

Aliases       :
Pos           : 3
Documentation :
DefaultValue  :
Ordering      : ignore
Schema        : {"type":"map","values":["long","double","string","bytes"]}
Name          : SystemProperties

Aliases       :
Pos           : 4
Documentation :
DefaultValue  :
Ordering      : ignore
Schema        : {"type":"map","values":["long","double","string","bytes","null"]}
Name          : Properties

Aliases       :
Pos           : 5
Documentation :
DefaultValue  :
Ordering      : ignore
Schema        : ["null","bytes"]
Name          : Body

I would like to retrieve the content as json like:

[{"subscriptionId":"5e6eec86-xxxxxxxxxxxxx-f7xxxx333","subscriptionExpirationDateTime":"2024-06-28T00:23:31.567249+00:00","changeType":"created","resource":"Users/2ff83-fxxxxxxxxxxxx-a8d0afceb83c/Messages/AAMkAGY4ZTZlNTI1xxxxxxxxxxxxxxxxxxxtRXSbFa_gJA38cxAAKB7VrlAAA=","resourceData":{"@odata.type":"#Microsoft.Graph.Message","@odata.id":"Users/2ff-xxxxxxxxxxxx81-a8d0afceb83c/Messages/AAMkAGY4ZxxxxxxxxxxxxxxxxxxxxxxxxCeBfyMDtRXSbFa_gJA38cxAAKB7VrlAAA=","@odata.etag":"W/\"CQAAABYAAACeBfyMDtRXSbFa+gJA38cxAAKB/4Iw\"","id":"AAMkAGY4ZTZxxxxxxxxxxxxxxxxxxxxxxxxbFa_gJA38cxAAKB7VrlAAA="},"clientState":"Mla8Q~xxxxxxxxxxxxxxxxxxxxxxxMF7kmc3z","tenantId":"e1f8axxxxxxxxxxxxxxxxxxxxc83"}]

Solution

  • It looks like you're using Read-AvroSchema rather than Read-Avro to parse your file (both cmdlets are part of the AvroTools module that can be installed with, e.g., Install-Module -Scope CurrentUser AvroTools):

    • Read-AvroSchema reads metadata (structural data, like a database schema) from an Avro Object Container File (*.avro), whereas
    • Read-Avro reads data, in the form of [pscustomobject] instances (PowerShell's type for dynamically created "property bag" objects), whose properties reflect the fields in the input Avro data.

    Without having access to the source data in its original form (what is quoted in your question isn't valid as the content of an *.avro file), I think what will work is:

    # Extract the value of the "Body" field, assumed to be the JSON of interest
    # as bytes representing UTF-8 encoding.
    [System.Text.Encoding]::UTF8.GetString(
      (Read-Avro yourFile.avro).Body
    )
    

    Note:

    • I'm assuming that the Body field contains the JSON text of interest.

    • Since the type information lists of this field lists ["null", "bytes"], I'm assuming that the value is reported as a byte array.

    • I'm further assuming that the bytes represent the UTF-8 encoding of the JSON text; adjust as needed for a different character encoding.