html bash curl html-parsing stackexchange-api

How can I programmatically retrieve my SO rep and number of badges from the command line?


Original question

My initial attempt was to run curl https://stackoverflow.com/users/5825294/enlico and pipe the result into sed/awk. However, as I've frequently read, sed and awk are not the best tools to parse HTML code. Furthermore, the above URL changes if I change my user name.

Oh, this is my quick attempt with sed, written on multiple lines for readability:

curl https://stackoverflow.com/users/5825294/enlico 2> /dev/null | sed -nE '
/title="reputation"/,/bronze badges/{
    /"reputation"/{
        N
        N
        s!.*>(.*)</.*!\1!p
    }
/badges/s/.*[^1-9]([1-9]+[0-9]*,*[0-9]* (gold|silver|bronze) badges).*/\1/p
}'

which prints

10,968
5 gold badges
27 silver badges
56 bronze badges

Obviously this script heavily relies on the peculiar structure of the specific HTML page, the most notable example being that I run N twice because I've verified that the reputation is two lines below the first line in the file containing "reputation".

Update based on the answers

Léa Gris' answer almost answers my question. The missing bit is that I have 5 gold, 27 silver, and 56 bronze badges, not 5, 18, 7.

In this respect, I've noticed that 18 is the number of silver badges I have if I don't count those awarded multiple times. I therefore played around with jq and discovered that I can query the award_count alongside the rank, and I thought I could use that to take multiply awarded badges into account. This kind of works: running the following (fetch_user_badges is from Léa Gris' answer) gives the correct number of silver badges but the wrong number of bronze badges:

$ fetch_user_badges stackoverflow 5825294 | jq -r '
.items
| map({rank: .rank, count: .award_count})
| group_by(.rank)
| map([[.[0].rank],map(.count) | add])'
[
  [
    "bronze",
    22
  ],
  [
    "gold",
    5
  ],
  [
    "silver",
    27
  ]
]

Does anybody know why that is?
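
A possible explanation, offered as an assumption rather than something confirmed in this thread: the Stack Exchange API pages its results, returning 30 items per page by default and at most 100 when pagesize is set. The 5, 18 and 7 reported earlier add up to exactly 30, which fits a single default-sized page that ran out before all bronze badges were listed. The sketch below assumes that cause. fetch_all_badge_pages and sum_badges_by_rank are hypothetical helper names, not part of Léa Gris' answer: the first follows the has_more flag across pages, the second adds up award_count per rank.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print every page of a user's badges, one JSON
# reply after another, following the API's has_more flag.
# $1: site (for example, stackoverflow), $2: numerical user ID
fetch_all_badge_pages() {
  local site="$1" uid="$2" page=1 reply
  while :; do
    # --compressed lets curl decode the compressed API reply
    reply=$(curl --silent --compressed \
      "https://api.stackexchange.com/2.2/users/${uid}/badges?site=${site}&pagesize=100&page=${page}") || return
    printf '%s\n' "$reply"
    # Stop when the API reports no further pages (or the reply is an error)
    [ "$(printf '%s\n' "$reply" | jq -r '.has_more')" = true ] || break
    page=$((page + 1))
  done
}

# Hypothetical helper: read one or more API replies on stdin and print
# "<rank> <total award_count>" for each rank, sorted by rank name.
sum_badges_by_rank() {
  jq -rs '
    [ .[].items[]? ]
    | group_by(.rank)
    | map("\(.[0].rank) \(map(.award_count) | add)")
    | .[]'
}
```

With both functions defined, `fetch_all_badge_pages stackoverflow 5825294 | sum_badges_by_rank` would print one line per rank. Whether the totals then match the profile page depends on the paging assumption above being the actual cause.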


Solution

  • Full example using the Stack Exchange API, with jq parsing the response.

    #!/usr/bin/env bash
    
    # This script fetches and prints some user info
    # from a Stack Exchange site using the Stack Exchange API
    
    # Change this to your numerical user ID on the chosen site
    
    STACK_UID=5825294
    STACK_SITE='stackoverflow'
    STACK_API='https://api.stackexchange.com/2.2'
    
    API_CACHE=~/.cache/stack_api
    
    mkdir -p "$API_CACHE"
    
    # Fetch a user from a Stack Exchange site via the API and cache the result
    # @Params:
    # $1: the website (for example, stackoverflow)
    # $2: the numerical user ID
    # @Output:
    # &1: API JSON reply
    stack_api::user() {
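      # Keep working variables local to the function
      local stack_site=$1
      local stack_uid=$2
      local cache_file yesterday_ref
    
      cache_file="${API_CACHE}/${stack_site}-users-${stack_uid}.json"
    
      # Reference file dated 24 hours ago (relies on GNU touch's -d date parsing)
      yesterday_ref="${API_CACHE}/yesterday.ref"
      touch -d yesterday "$yesterday_ref"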
    
      # Expire cache
      [ "$cache_file" -ot "$yesterday_ref" ] && rm -f -- "$cache_file"
    
      # Call stack API only if no cached answer
      [ -f "$cache_file" ] || curl \
        --silent \
        --output "$cache_file" \
        --request GET \
        --url "${STACK_API}/users/${stack_uid}?site=${stack_site}"
    
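      # Return the cached answer: the API compresses its replies, and
      # zcat --force also passes through data that is not compressed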
      zcat --force -- "$cache_file" 2>/dev/null
    }
    
    IFS=$'\n' read -r -d '' username reputation bronze silver gold < <(
      # Fetch user from a stack site
      stack_api::user "$STACK_SITE" "$STACK_UID" |
    
      # Parse the stack_api user data from the JSON response
      jq -r '
      .items[0] |
      .display_name,
      .reputation,
      ( .badge_counts |
        .bronze,
        .silver,
        .gold
      )
      '
    )
    
    printf 'Badges from UserID %d %s on the %s website:\n\n' \
      "$STACK_UID" "$username" "$STACK_SITE"
    printf 'Reputation: %6d\n' "$reputation"
    printf 'Bronze:     %6d\n' "$bronze"
    printf 'Silver:     %6d\n' "$silver"
    printf 'Gold:       %6d\n' "$gold"
    

    Example output:

    Badges from UserID 5825294 Enlico on the stackoverflow website:
    
    Reputation:  11144
    Bronze:         56
    Silver:         27
    Gold:            5
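
The cache-expiry step in the script compares the cache file against a reference file created with `touch -d yesterday`, which depends on GNU touch. As a hedged alternative, the freshness check can be sketched with find instead. cache_is_stale is a hypothetical helper name, and -mmin is a find option available in GNU and BSD find but not required by POSIX, so treat this as a sketch under those assumptions.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed (exit 0) when the cache file is missing
# or was last modified more than max_age_min minutes ago.
# $1: cache file, $2: maximum age in minutes (default 1440 = 24 hours)
cache_is_stale() {
  local file="$1" max_age_min="${2:-1440}"
  # A missing file counts as stale so the caller fetches it
  [ -f "$file" ] || return 0
  # find prints the path only if the file is older than the limit
  [ -n "$(find "$file" -mmin "+${max_age_min}" -print 2>/dev/null)" ]
}
```

In the script above, `cache_is_stale "$cache_file" && rm -f -- "$cache_file"` could stand in for the `-ot` comparison, which would also remove the need for the yesterday.ref file.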