I am trying to add a badge-style indicator to a GitHub README that shows how many times a dataset has been downloaded from Kaggle so far (much like a page visit counter). Is there any way to add this?
In particular, I want to display the current count (602 downloads), and the counter should update automatically in real time as new downloads occur.
I didn't find any specific badge to integrate in the README (from Shields.io or elsewhere).
The simplest way to do it would be to implement your own web scraper and use a GitHub workflow to periodically scrape the required data.
In the root directory of your GitHub repository, create a requirements.txt file with the following:
selenium==4.6.0
Selenium will be used to fetch data from the Kaggle website.
In the root directory of your GitHub repository, create a badge_generator.py file with the following:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
from selenium.webdriver.common.by import By

def update_readme(readme_file_path, badge_id, new_badge):
    new_file_content = ''
    # id used to identify position of badge
    line_id = f'![kaggle-badge-{badge_id}]'
    badge_found = False
    # open readme and update badge
    with open(readme_file_path, 'r', encoding='utf-8') as f:
        # get all lines in readme
        lines = [line for line in f]
        for i in range(0, len(lines)):
            if line_id in lines[i]:
                # replace old badge with new badge
                lines[i] = new_badge
                badge_found = True
                break
        # concatenate lines
        new_file_content = ''.join(lines) if len(lines) > 0 else new_badge
    if not badge_found:
        raise Exception(
            f"Badge {badge_id} not found in {readme_file_path}")
    # update readme
    with open(readme_file_path, 'w', encoding='utf-8') as f:
        f.write(new_file_content)

def create_badge(badge_id, badge_value,
                 badge_name='Downloads', badge_color='orange'):
    # static Shields.io badge: <label>-<message>-<color>
    badge_url = (f'https://img.shields.io/badge/{badge_name}'
                 f'-{badge_value}-{badge_color}')
    # the alt text doubles as the marker that update_readme searches for
    markdown = f'![kaggle-badge-{badge_id}]({badge_url})\n'
    return markdown

def get_download_count(kaggle_url: str):
    chrome_options = Options()
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--headless')
    driver = webdriver.Chrome(options=chrome_options)
    try:
        driver.get(kaggle_url)
        # give the page time to render its JavaScript content
        time.sleep(3)
        # this XPath depends on Kaggle's current page layout and may need updating
        downloads_element = driver.find_element(
            By.XPATH,
            '//*[@id="site-content"]/div[2]/div/div[5]/div[6]/div[2]/div[1]/div/div[3]/h1')
        download_count = downloads_element.get_attribute("textContent")
        return download_count.strip()
    finally:
        # always close the browser, even if the element is not found
        driver.quit()

def main():
    readme_file_path = "README.md"  # relative to root directory
    # change this url
    url = 'https://www.kaggle.com/datasets/utkarshx27/marijuana-arrests-in-columbia'
    badge_id = 1  # each badge must be given a unique id
    download_count = get_download_count(url)
    badge = create_badge(badge_id, download_count)
    update_readme(readme_file_path, badge_id, badge)

if __name__ == '__main__':
    main()
Replace the value of url in the main function with the URL of your Kaggle dataset page.
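One thing to watch for in create_badge: the scraped text is interpolated straight into the Shields.io URL. Shields.io static badges use a dash as the separator between label, message and color, and read a single underscore as a space, so a label or value containing those characters needs escaping. The following is a minimal sketch and not part of the script above; shields_escape and build_badge_url are hypothetical helper names, and it assumes the scraped count may carry surrounding whitespace or a thousands separator.

```python
from urllib.parse import quote


def shields_escape(text: str) -> str:
    """Escape one path segment of a Shields.io static badge URL."""
    # A literal dash is written as '--' and a literal underscore as '__';
    # everything else that is not URL-safe is percent-encoded.
    escaped = text.strip().replace('-', '--').replace('_', '__')
    return quote(escaped, safe='')


def build_badge_url(label: str, message: str, color: str = 'orange') -> str:
    """Build a static badge URL of the form /badge/<label>-<message>-<color>."""
    return ('https://img.shields.io/badge/'
            f'{shields_escape(label)}-{shields_escape(message)}-{color}')


if __name__ == '__main__':
    print(build_badge_url('Downloads', ' 602 '))
    print(build_badge_url('Kaggle downloads', '1,234'))
```

With plain values such as Downloads and 602 this produces the same URL as create_badge, so it only matters if you change the label or if Kaggle formats the count differently.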
Your README.md file must be in the root directory. In your README file, add the following line where you want the badge to appear:
![kaggle-badge-1]()
This line must be present before the script is run. When the script runs, the whole line is overwritten with the updated badge.
Do not write anything else on this line, because anything else on it will be lost.
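To see why the placeholder line must stand alone, here is a small sketch of the same replacement logic working on a string instead of a file. It mirrors update_readme but is only an illustration; replace_badge_line is a hypothetical name and the README text is made up.

```python
def replace_badge_line(readme_text: str, badge_id: int, new_badge: str) -> str:
    """Replace the first line containing the badge marker with new_badge."""
    marker = f'![kaggle-badge-{badge_id}]'
    lines = readme_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if marker in line:
            # the entire line is swapped out, not just the marker
            lines[i] = new_badge
            return ''.join(lines)
    raise ValueError(f'Badge {badge_id} not found')


if __name__ == '__main__':
    before = '# My dataset\n![kaggle-badge-1]()\nSome description.\n'
    badge = '![kaggle-badge-1](https://img.shields.io/badge/Downloads-602-orange)\n'
    print(replace_badge_line(before, 1, badge))
```

Because the generated badge keeps the same alt text, the marker is still present after the first run, so later runs find the line again and replace it with the new count.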
Create a .github folder in the root directory of your GitHub repository, and inside it create another folder named workflows. Place badge.yml inside workflows:
name: Kaggle Badge Generator
on:
  push:
  workflow_dispatch:
  schedule:
    - cron: '0 * * * *' # run every hour
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: checkout repo content
        uses: actions/checkout@v3
      - name: setup python with pip cache
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'
          cache: 'pip' # caching pip dependencies
      - name: install any new dependencies
        run: pip install -r requirements.txt
      - name: execute py script
        run: python badge_generator.py
      - name: commit files
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"
          git add -A
          timestamp=$(date -u)
          git diff-index --quiet HEAD || (git commit -a -m "Last badge update : ${timestamp}" --allow-empty)
      - name: push changes
        uses: ad-m/github-push-action@master
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          branch: main
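The Python script will run every hour and update the badge. The cron expression can be changed to run more often, but GitHub Actions runs scheduled workflows at most once every 5 minutes, so the badge is refreshed periodically rather than in real time.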
The badge will show the label Downloads next to the current count, on an orange background.
Your repository's directory structure will look like this:
.github/
├─ workflows/
│ ├─ badge.yml
badge_generator.py
requirements.txt
README.md
... your other files
If you don't want the script to modify your README directly, you will have to implement some sort of API, for example with a free serverless function on Vercel or a REST API on Render. This could be paired with a dynamic badge, so that the README only contains a fixed badge URL and the count is served from your API.
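As a rough sketch of what such an API could return, assuming you use the Shields.io endpoint badge (which renders a badge from a JSON document served at a URL you control): the function names below are hypothetical, and https://example.com/kaggle-downloads.json is a placeholder for wherever your serverless function or API would be hosted.

```python
import json
from urllib.parse import quote


def endpoint_payload(download_count, label='Downloads', color='orange') -> dict:
    """JSON body in the shape expected by the Shields.io endpoint badge."""
    return {
        'schemaVersion': 1,  # required by the endpoint badge schema
        'label': label,
        'message': str(download_count),
        'color': color,
    }


def endpoint_badge_url(json_url: str) -> str:
    """Badge URL to place in the README; json_url is where the payload is served."""
    return 'https://img.shields.io/endpoint?url=' + quote(json_url, safe='')


if __name__ == '__main__':
    # what the API would respond with
    print(json.dumps(endpoint_payload('602')))
    # what the README would link to (placeholder host)
    print(endpoint_badge_url('https://example.com/kaggle-downloads.json'))
```

With this setup the README is written once, and only the JSON served by your API changes when the download count does.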