html bash curl html-parsing stackexchange-api

How can I programmatically retrieve my SO reputation and number of badges from the command line?

Original question

My initial attempt was to run curl https://stackoverflow.com/users/5825294/enlico and pipe the result into sed/awk. However, as I've frequently read, sed and awk are not the best tools to parse HTML code. Furthermore, the above URL changes if I change my user name.

Oh, this is my quick attempt with sed, written on multiple lines for readability:

curl https://stackoverflow.com/users/5825294/enlico 2> /dev/null | sed -nE '
/title="reputation"/,/bronze badges/{
    /"reputation"/{
        N
        N
        s!.*>(.*)</.*!\1!p
    }
/badges/s/.*[^1-9]([1-9]+[0-9]*,*[0-9]* (gold|silver|bronze) badges).*/\1/p
}'

which prints

10,968
5 gold badges
27 silver badges
56 bronze badges

Obviously this script heavily relies on the peculiar structure of the specific HTML page, the most notable example being that I run N twice because I've verified that the reputation is two lines below the first line in the file containing "reputation".

Update based on the answers

Léa Gris' answer almost answers my question. The missing bit is that I have 5 gold, 27 silver, and 56 bronze badges, not 5, 18, 7.

In this respect, I've noticed that 18 is the number of silver badges I have if I don't count those awarded multiple times. I therefore played around with jq and discovered that I can query award_count alongside rank, and I thought I could use that to take badges awarded multiple times into account. This kind of works: running the following (fetch_user_badges is from Léa Gris' answer) produces the correct number of silver badges but the wrong number of bronze badges:

$ fetch_user_badges stackoverflow 5825294 | jq -r '
.items
| map({rank: .rank, count: .award_count})
| group_by(.rank)
| map([[.[0].rank],map(.count) | add])'

[
  [
    "bronze",
    22
  ],
  [
    "gold",
    5
  ],
  [
    "silver",
    27
  ]
]

Does anybody know why that is?
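A cross-check that does not depend on the jq grouping can help tell "the filter is wrong" apart from "the input is incomplete". The following is only a sketch: sum_by_rank is a hypothetical helper, the sample input is made up, and the jq filter in the comment is one assumed way to produce its input. It may also be worth noting that 5 + 18 + 7 = 30 items, which matches the API's default page size of 30, so the response may contain only the first page of badges; that is a guess, not something verified here.

```bash
#!/usr/bin/env bash

# Sum award counts per rank from "rank count" lines on stdin.
# Such lines could be produced with, for example:
#   jq -r '.items[] | "\(.rank) \(.award_count)"'
sum_by_rank() {
  awk '{ total[$1] += $2 }
       END { for (rank in total) print rank, total[rank] }' | sort
}

# Made-up sample input, not real API data.
printf '%s\n' 'gold 1' 'silver 2' 'bronze 3' 'silver 4' 'bronze 1' | sum_by_rank
```

If the totals from this sum agree with the jq output, the aggregation itself is probably fine and the number of items returned by the API is the next thing to inspect.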

Solution

A full example using the Stack Exchange API, with jq parsing the response.

#!/usr/bin/env bash

# This script fetches and prints some user info
# from a Stack Exchange site using the Stack Exchange API

# Change STACK_UID to your numerical Stack Overflow user ID

STACK_UID=5825294
STACK_SITE='stackoverflow'
STACK_API='https://api.stackexchange.com/2.2'

API_CACHE=~/.cache/stack_api

mkdir -p "$API_CACHE"

# Gets a user from a Stack Exchange site via the API and caches the result
# @Params:
# $1: the website (example: stackoverflow)
# $2: the numerical user ID
# @Output:
# &1: API JSON reply
stack_api::user() {
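  local stack_site=$1
  local stack_uid=$2

  local cache_file="${API_CACHE}/${stack_site}-users-${stack_uid}.json"

  # Reference file whose modification time is set to 24 hours ago
  # (the -d option with a relative date is a GNU touch feature)
  local yesterday_ref="${API_CACHE}/yesterday.ref"
  touch -d yesterday "$yesterday_ref"

  # Expire the cache if it is older than the reference file
  [ "$cache_file" -ot "$yesterday_ref" ] && rm -f -- "$cache_file"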

  # Call stack API only if no cached answer
  [ -f "$cache_file" ] || curl \
    --silent \
    --output "$cache_file" \
    --request GET \
    --url "${STACK_API}/users/${stack_uid}?site=${stack_site}"

  # Return cached answer
  zcat --force -- "$cache_file" 2>/dev/null
}

IFS=$'\n' read -r -d '' username reputation bronze silver gold < <(
  # Fetch user from a stack site
  stack_api::user "$STACK_SITE" "$STACK_UID" |

  # Parse the stack_api user data from the JSON response
  jq -r '
.items[0] |
  .display_name,
  .reputation,
  ( .badge_counts |
    .bronze,
    .silver,
    .gold
  )
  '
)

printf 'Badges from UserID %d %s on the %s website:\n\n' \
  "$STACK_UID" "$username" "$STACK_SITE"
printf 'Reputation: %6d\n' "$reputation"
printf 'Bronze:     %6d\n' "$bronze"
printf 'Silver:     %6d\n' "$silver"
printf 'Gold:       %6d\n' "$gold"

Example output:

Badges from UserID 5825294 Enlico on the stackoverflow website:

Reputation:  11144
Bronze:         56
Silver:         27
Gold:            5
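The script fills five variables from the newline-separated jq output with a single read call. A minimal sketch of that technique in isolation follows; fake_user_fields and its values are hypothetical stand-ins for the API pipeline, so this runs without network access or jq.

```bash
#!/usr/bin/env bash

# Hypothetical stand-in for the `stack_api::user ... | jq -r ...` pipeline:
# it prints one value per line, in a fixed order.
fake_user_fields() {
  printf '%s\n' 'Some User' 12345 56 27 5
}

# Read five newline-separated values into five variables.
# -d '' makes read consume input up to EOF (no NUL byte is present), so it
# returns non-zero even though the variables are filled in; `|| true`
# keeps that from aborting a script that runs under `set -e`.
IFS=$'\n' read -r -d '' username reputation bronze silver gold \
  < <(fake_user_fields) || true

printf '%s has %d reputation and %d/%d/%d bronze/silver/gold badges\n' \
  "$username" "$reputation" "$bronze" "$silver" "$gold"
```

Because IFS is set to a newline only for the read command, a display name containing spaces stays in one variable.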