I've created a Bash function to summarize monthly Python and SAS commits from the past three years in a Git repository. My current function iterates through the most recent 36 months, generates a Git log for each month, and then counts the commits that touch Python and SAS files within that month. While it works, the loop takes a bit of time to execute.
When I attempt to generate a single Git log for the entire three-year period and count monthly commits from that, I end up with no counts in my output. I would prefer this method if I can get it to work, as it would eliminate the need for looping and improve performance. Has anyone successfully done this, or can anyone suggest how I might revise my current function to achieve this?
Here is my current working function (with loop):
#!/bin/bash
function count_commits_by_month() {
    local start_date
    start_date=$(date -d "$(date +%Y-%m-01) -36 months" +%Y-%m-01)
    local end_date
    end_date=$(date -d "$(date +%Y-%m-01) +1 month" +%Y-%m-01)
    local current_date="$start_date"
    echo -e "Month\tPython\tSAS"
    while [[ "$current_date" < "$end_date" ]]; do
        local next_month
        next_month=$(date -d "$current_date +1 month" +%Y-%m-01)
        # Count Python commits
        local py_commits
        py_commits=$(git log --no-merges --since="$current_date" --until="$next_month" --pretty=format:"%h" --name-only -- "*.py" | \
            awk 'NF && !seen[$0]++' | wc -l)
        # Count SAS commits
        local sas_commits
        sas_commits=$(git log --since="$current_date" --until="$next_month" --pretty=format:"%h" --name-only -- "*.sas" | \
            awk 'NF && !seen[$0]++' | wc -l)
        # Print the results for the current month
        echo -e "$(date -d "$current_date" +%Y-%m)\t$py_commits\t$sas_commits"
        # Move to the next month
        current_date="$next_month"
    done
}
Here is my non-working function (without loop):
function get_commits() {
    local start_date
    start_date=$(date -d "$(date +%Y-%m-01) -36 months" +%Y-%m-01)
    local end_date
    end_date=$(date -d "$(date +%Y-%m-01) +1 month" +%Y-%m-01)
    # Print the header with aligned columns
    printf "%-10s %-10s %-10s\n" "Month" "Python" "SAS"
    # Use a single git log call to get all commits in the date range
    git log --no-merges --since="$start_date" --until="$end_date" --pretty=format:"%ad %h" --date=format:'%Y-%m' --name-only -- "*.py" "*.sas" | \
    awk '
    BEGIN {
        OFS = "\t";
    }
    /^[0-9]{4}-[0-9]{2}/ {
        date = $1;
        commit = $2;
        seen_py[commit] = 0;
        seen_sas[commit] = 0;
    }
    /\.py$/ {
        if (!seen_py[commit]++) {
            py[date]++;
        }
    }
    /\.sas$/ {
        if (!seen_sas[commit]++) {
            sas[date]++;
        }
    }
    END {
        for (date in py) {
            if (!(date in sas)) {
                sas[date] = 0;
            }
        }
        for (date in sas) {
            if (!(date in py)) {
                py[date] = 0;
            }
        }
        PROCINFO["sorted_in"] = "@ind_str_asc"
        for (date in py) {
            printf "%-10s %-10d %-10d\n", date, py[date], sas[date];
        }
    }'
}
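# One git log pass: print the month of each commit (committer date),
# followed by the .py/.sas files it touched in name-status form.
git log --date=format:%Y-%m --pretty=format:%cd \
    --date-order --no-merges --name-status \
    -- \*.py \*.sas \
| awk -F$'\t' '
    # Status line (tab-separated): reduce it to the file suffix and remember it.
    NF>1 { sub(/.*\./,""); suf[$0]=1; next }
    # Blank line ends a commit: credit it once per suffix it touched.
    NF<1 { for ( s in suf ) ++ctouch[s]; delete suf; next }
    # Flush the final commit, which has no trailing blank line.
    END { for ( s in suf ) ++ctouch[s]; delete suf }
    # Print the totals for the finished month, then start the next one.
    function endmonth() {
        for (s in ctouch)
            printf( \
                "%s: %7d commits touched some .%s file(s)\n",
                last,ctouch[s],s)
        last=$1
        delete ctouch
    }
    # Month line: a change of month closes out the previous one.
    NF==1 { if ( $1!=last ) endmonth() }
    END { endmonth() }
'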
seems to do the trick for me.
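If you want the same single-pass idea with the 36-month window and the Month/Python/SAS table from the question, something along these lines might work. This is only a sketch under a few assumptions: the function names `summarize_by_month` and `count_commits_single_pass` are made up for illustration, GNU `date` is available (as in the question), and the month is taken from the committer date (`%cd`) because `--since`/`--until` filter on committer date. The month pattern is spelled out without `{4}`-style intervals so that it does not depend on the awk implementation supporting them.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads output of
#   git log --pretty=format:%cd --date=format:%Y-%m --name-only
# on stdin and prints "month<TAB>python<TAB>sas", one line per month.
summarize_by_month() {
    awk '
        # A line that is exactly YYYY-MM starts a new commit.
        /^[0-9][0-9][0-9][0-9]-[0-9][0-9]$/ {
            month = $0
            months[month] = 1
            saw_py = 0
            saw_sas = 0
            next
        }
        # File lines: count each commit at most once per language.
        /\.py$/  { if (!saw_py++)  py[month]++ }
        /\.sas$/ { if (!saw_sas++) sas[month]++ }
        END {
            for (m in months)
                printf "%s\t%d\t%d\n", m, py[m], sas[m]
        }
    ' | sort
}

# Hypothetical wrapper: one git log call for the whole window.
count_commits_single_pass() {
    local start_date end_date
    # GNU date assumed, same date arithmetic as in the question.
    start_date=$(date -d "$(date +%Y-%m-01) -36 months" +%Y-%m-01)
    end_date=$(date -d "$(date +%Y-%m-01) +1 month" +%Y-%m-01)
    printf 'Month\tPython\tSAS\n'
    git log --no-merges --since="$start_date" --until="$end_date" \
        --pretty=format:%cd --date=format:%Y-%m --name-only \
        -- '*.py' '*.sas' | summarize_by_month
}
```

Call `count_commits_single_pass` from inside the repository. Unlike the looping version, months in which no commit touched a `.py` or `.sas` file are simply absent from the output rather than listed with zeros.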