Merging events with consecutive numeric IDs in jq fails because of floating-point numbers



Given JSON like the one below, I would like to concatenate the message fields of events whose eventId values are consecutive.

However, even if the eventId is consecutive, a message that starts with # Time must begin a new entry instead of being appended.

This is the original JSON.

{
    "events": [
        {
            "message": "# Time: 1",
            "eventId": "38636469249093328935961608873790523617989208925384015872"
        },
        {
            "message": ", 2, 3",
            "eventId": "38636469249093328935961608873790523617989208925384015873"
        },
        {
            "message": "# Time: 11",
            "eventId": "38636469249093328935961608873790523617989208925384015875"
        },
        {
            "message": "# Time: 12",
            "eventId": "38636469249093328935961608873790523617989208925384015876"
        },
        {
            "message": "# Time: A",
            "eventId": "1"
        },
        {
            "message": ", B, C",
            "eventId": "2"
        },
        {
            "message": "# Time: C",
            "eventId": "3"
        },
        {
            "message": "# Time: D",
            "eventId": "5"
        }
    ]
}

Here's what I want it to look like:
[
  {
    "message": "# Time: 1, 2, 3"
  },
  {
    "message": "# Time: 11"
  },
  {
    "message": "# Time: 12"
  },
  {
    "message": "# Time: A , B, C"
  },
  {
    "message": "# Time: C"
  },
  {
    "message": "# Time: D"
  }
]

I asked a question about this before and received an answer from a very helpful person. (Thank you, pmf.)
After that I tried to refine the filter further, but it didn't work, so I am asking once more.

First, when I run this jq filter, I get the following result.

jq -r '.events 
| reduce .[1:][] as $i (.[:1];
    if ((.[-1].eventId | tonumber + 1 | tostring) != $i.eventId) or ($i.message | startswith("# Time:")) then
        . += [$i]
    else
        .[-1].message += " " + $i.message
    end
) 
| del(.[].eventId)
'

Output:

[
  {
    "message": "# Time: 1"
  },
  {
    "message": ", 2, 3"
  },
  {
    "message": "# Time: 11"
  },
  {
    "message": "# Time: 12"
  },
  {
    "message": "# Time: A , B, C"
  },
  {
    "message": "# Time: C"
  },
  {
    "message": "# Time: D"
  }
]

I suspected a floating-point precision issue and wanted to compare the IDs as strings instead, so I wrote the following function, but it did not work either.
jq '
# Function to add one to a large number represented as a string
def add_one(num):
  (num | split("") | reverse | map(tonumber)) as $digits
  | reduce range(0; length) as $i ([];
      . + if $i == 0 or .[-1] == 10 then
            [($digits[$i] + 1) % 10]
         else
            [$digits[$i]]
         end
    )
  | reverse | map(tostring) | join("");

# Process events and check continuity
.events 
| map(.eventId |= tostring) # Ensure all eventIds are strings
| reduce .[1:][] as $i (.[:1];
    if (add_one(.[-1].eventId) != $i.eventId) or ($i.message | startswith("# Time:")) then
        . += [$i]
    else
        .[-1].message += " " + $i.message
    end
)
| del(.[].eventId)
'

Output:

[
  {
    "message": "# Time: 1"
  },
  {
    "message": ", 2, 3"
  },
  {
    "message": "# Time: 11"
  },
  {
    "message": "# Time: 12"
  },
  {
    "message": "# Time: A"
  },
  {
    "message": ", B, C"
  },
  {
    "message": "# Time: C"
  },
  {
    "message": "# Time: D"
  }
]

While searching further I found a tool called gojq, but first I would like to know whether this is possible with jq itself.

I am using jq version 1.7.1.
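
The precision loss suspected above can be checked outside jq. The sketch below uses Python, whose float is the same IEEE 754 double that jq uses for arithmetic; the variable names are illustrative and not part of the original filters. It suggests that adding 1 to an ID of this size changes nothing once the value is a double, and that the double's string form no longer matches the original digits, which is consistent with the first two events not being merged by the first filter.

```python
# Two consecutive eventIds from the sample input, kept as strings.
prev_id = "38636469249093328935961608873790523617989208925384015872"
next_id = "38636469249093328935961608873790523617989208925384015873"

# Convert to a double and add 1, mirroring `tonumber + 1`.
as_double = float(prev_id) + 1

# The increment is far below the spacing between neighbouring doubles
# at this magnitude, so it is rounded away.
increment_lost = as_double == float(prev_id)

# Converting back to text (like `tostring`) gives a shortened form in
# scientific notation, not the original 56 digits.
string_mismatch = str(as_double) != next_id

# Exact integer arithmetic, by contrast, yields the expected successor.
exact_match = str(int(prev_id) + 1) == next_id

print(increment_lost, string_mismatch, exact_match)
```

The small IDs ("1", "2", "3", "5") are exactly representable as doubles, which is why the same filter merges the second group correctly.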


Solution

  • If you can rely on the ordering of the items, and on # appearing only at the start of messages that begin a new entry, you only have to check for that prefix.

    reduce .events[] as {$message} ([];   # iterate over events, extract message
      if $message | startswith("#")       # if message starts with #
      then . + [{$message}]               # add a new item to the result list
      else last.message += $message end   # otherwise append it to the last item's message
    )
    

    However, implementing the stricter requirements is not much more difficult. Note that because the eventId values are carried along for the comparison, they have to be removed afterwards.

    def check($e1; $e2):
      ($e2.message[:1] != "#") and
      ($e1.eventId | tonumber) + 1 == ($e2.eventId | tonumber);
    
    reduce .events[1:][] as $e (.events[:1];
      if check(last; $e) then last.message += $e.message else . + [$e] end
    )
    | map({message})  # drop eventId, just keep message
    

    Output:

    [
      {
        "message": "# Time: 1, 2, 3"
      },
      {
        "message": "# Time: 11"
      },
      {
        "message": "# Time: 12"
      },
      {
        "message": "# Time: A, B, C"
      },
      {
        "message": "# Time: C"
      },
      {
        "message": "# Time: D"
      }
    ]
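
One caveat on the numeric comparison in check: the addition is carried out in double precision, so for IDs as large as the first group it most likely cannot distinguish an ID from its close neighbours, and the sample is separated correctly there thanks to the # test. If exact consecutiveness matters, the IDs can be compared as strings with a digit-by-digit successor. The following is a minimal sketch of that idea in Python, not a jq implementation; succ_decimal and merge_events are hypothetical names, and it assumes every eventId is a non-empty string of decimal digits without sign or leading zeros. Unlike the filter above, it compares each event with the ID of the event immediately before it, which is an assumption about what "consecutive" should mean.

```python
def succ_decimal(digits: str) -> str:
    """Return the decimal string of int(digits) + 1, working digit by digit.

    Assumes a non-empty string of ASCII digits (no sign, no leading zeros).
    """
    out = list(digits)
    i = len(out) - 1
    # Propagate the carry from the right: trailing 9s become 0s.
    while i >= 0 and out[i] == "9":
        out[i] = "0"
        i -= 1
    if i < 0:
        # Every digit was 9, so the result gains a leading 1.
        return "1" + "".join(out)
    out[i] = str(int(out[i]) + 1)
    return "".join(out)


def merge_events(events: list) -> list:
    """Append a message to the previous entry when its eventId is the exact
    successor of the preceding event's id and it does not start with '#'."""
    merged = []
    last_id = None
    for event in events:
        continues = (
            bool(merged)
            and not event["message"].startswith("#")
            and succ_decimal(last_id) == event["eventId"]
        )
        if continues:
            merged[-1]["message"] += event["message"]
        else:
            merged.append({"message": event["message"]})
        last_id = event["eventId"]
    return merged


sample = [
    {"message": "# Time: 1",
     "eventId": "38636469249093328935961608873790523617989208925384015872"},
    {"message": ", 2, 3",
     "eventId": "38636469249093328935961608873790523617989208925384015873"},
    {"message": "# Time: 11",
     "eventId": "38636469249093328935961608873790523617989208925384015875"},
]
result = merge_events(sample)
# result == [{"message": "# Time: 1, 2, 3"}, {"message": "# Time: 11"}]
```

The carry loop is the part the add_one attempt in the question was aiming for; a jq port would follow the same steps on the string's characters.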