Search code examples
excelvbaparsingpowerquerym

Is there a simple way to parse comma separated Key:Value pairs in Excel, Power Query or VBA if the values contain unescaped commas?


I'm working with a CSV export of identity data containing ~22000 records. One of the fields included is titled 'ExtendedAttributes' and each cell in the column contains a quote bound string of comma separated Key:Value pairs. Each record in the file has an arbitrary number of extended attributes (up to around 50). My ultimate objective is to expand these extended attributes into their own columns in Excel (2016). I already have solutions for the expansions into columns from other data using formulae, simple VBA and most recently Power Query based approaches.

However, my previous solutions have all been based on the Key:Value pairs being simple to delimit. In this export, the ExtendedAttributes field has:

  1. Value data that may contain unescaped/unquoted commas. e.g.

    "Key1: Value1, name: surname, forename, Key2: Value2, ... "

  2. Keys that may contain multiple comma separated values, which are also unquoted/unescaped. e.g.

    "Key1: Value1, emailAlias: alias1@domain, alias2@domain, alias3@domain, Key2: Value2, ... "

My usual approach to this, where Key:Value pairs don't have these problems would be to delimit using commas to break it into the key value pairs, transpose the data into rows, then delimit using the colon to populate my new columns and their values as described here in the PowerBI community pages

This doesn't work here because delimiting using a comma breaks the values.

Is there a straightforward way to parse this into the constituent Key:Value pairs using (ideally) Power Query? Happy to also go with VBA or formula based solutions.

My instinctive approach would be to try and identify substrings containing a colon and prepend them with a unique character, which can then be used as a delimiter. (It's not impossible that the data may also include unescaped colons, but I'm happy to assume that it doesn't) But recognise that this may be a needlessly complex approach and I'm unsure how best to do it.

I'm happy to keep values with multiple comma separated items as a single unit (A problem for me to deal with later).

For the example data:

"Key1: Value1, name: surname, forename, emailAlias: alias1@domain, alias2@domain, alias3@domain, Key2: Value2, ... "

I'd like to end up with something that lets me treat the data like this, using maybe a ! as an example unique character that I could then use as a delimiter:

"Key1: Value1!name: surname, forename!emailAlias: alias1@domain, alias2@domain, alias3@domain!Key2: Value2!..."

I don't have access to the original data (vendor controlled system) and have limited data processing tools on my corporate desktop (Excel 2016, VBA, PQ). Appreciate any help.


Solution

  • In Power Query, you can define a function Partition as follows:

    let
        Output = (str as text, sep as text) as text =>
        Text.RemoveRange(
            Text.Replace(
                Text.Combine(
                    List.Transform(
                        Text.Split(str, " "),
                        each if Text.Contains(_, ":") then sep & _ else _
                    ),
                    " "
                ), ", " & sep, sep
            ),
        0, Text.Length(sep)
        )
    in
        Output
    

    Example text transformation using separator !

    Starting text:

    Key1: Value1, name: surname, forename, emailAlias: alias1@domain
    
    1. Split the string based on spaces into a list
    Key1:
    Value1,
    name:
    surname,
    forename,
    emailAlias:
    alias1@domain
    
    1. Prepend any list items containing : with separator !
    !Key1:
    Value1,
    !name:
    surname,
    forename,
    !emailAlias:
    alias1@domain
    
    1. Combine the list back into a string
    !Key1: Value1, !name: surname, forename, !emailAlias: alias1@domain
    
    1. Replace , ! with !
    !Key1: Value1!name: surname, forename!emailAlias: alias1@domain
    
    1. Remove the first separator
    Key1: Value1!name: surname, forename!emailAlias: alias1@domain
    

    Once you have this function defined, you can call it in a column transformation that would look something like

    = Table.TransformColumns(#"Prev Step", {{"ColName", each Partition(_,"!") , type text}})