I am using GitPython to find the files changed over a certain period of time (for example, between now and one week ago):
repo = Repo(self.repo_directory)
for item in repo.head.commit.diff('develop@{1 weeks ago}'):
    print("smth")
However, nothing is printed, even when I change the number of weeks, which suggests that no diff is detected for that time period. If I change 'develop@{1 weeks ago}' to 'HEAD@{1 weeks ago}', the number of changes is huge, far more than is plausible for a week. Any help is appreciated.
Based on the discussion in the comments, I came up with the following solution using GitPython (only the relevant code is shown; the rest is omitted to avoid confusion):
import git
from git import Repo
from git import RemoteProgress

class MyProgressPrinter(RemoteProgress):
    def update(self, op_code, cur_count, max_count=None, message=''):
        print(op_code, cur_count, max_count, cur_count / (max_count or 100.0), message or "NO MESSAGE")

# The methods below belong to a class whose other members are omitted.
def _get_commits_info(self):
    for fetch_info in self.repo.remotes.origin.fetch(progress=MyProgressPrinter()):
        self.commits_info.append(
            (fetch_info.commit.committed_date, fetch_info.commit))
    self.commits_info = sorted(self.commits_info, key=lambda x: x[0])  # sort by committed date

def _get_the_changed_components(self):
    self._get_commits_info()
    last_date = self.commits_info[-1][0]
    last_commit = self.commits_info[-1][1]
    since_date = last_date - self.time_period * 86400  # for example, time_period is 7 (days)
    since_commit = self._get_since_commit(since_date)  # finds since_commit in the sorted commits_info list
    for item in last_commit.diff(since_commit):
        if item.a_path.find('certain_path') != -1:
            self.paths.add(item.a_path)  # self.paths is a set()
However, the size of self.paths does not seem reasonable to me: it captures too many changes, and I am not sure why. Basically, what I did is this: I found all the commits, sorted them by committed_date, and then found the commit (since_commit in the code) whose committed_date is 7 days ago. After that, I took the diff between the last commit in the sorted commits_info list and since_commit, and saved the a_path values into a set.
I also tried another way: taking the diff between every two consecutive commits in the sorted commits_info list, from since_commit all the way up to the last commit. This way, the number of changes is even higher.
Any comments or help? Is this the correct way to get the diff for a time period, with the higher number of changes being just a coincidence?
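The lookup of since_commit is not shown above, so here is a minimal sketch of how it might work, assuming commits_info is a list of (committed_date, commit) tuples already sorted by date. The function name find_since_commit is hypothetical, and plain strings stand in for GitPython commit objects.

```python
from bisect import bisect_left


def find_since_commit(commits_info, since_date):
    """Return the first commit whose committed date is >= since_date.

    commits_info must be a list of (committed_date, commit) tuples
    sorted by committed_date. Returns None if every commit is older
    than since_date.
    """
    dates = [committed_date for committed_date, _ in commits_info]
    index = bisect_left(dates, since_date)
    if index == len(commits_info):
        return None
    return commits_info[index][1]


# Illustrative data only: timestamps in seconds, strings instead of commits.
commits = [(100, "c1"), (200, "c2"), (300, "c3")]
since_commit = find_since_commit(commits, 150)
print(since_commit)
```

Picking the first commit at or after the cutoff is one possible choice; picking the last commit before the cutoff would include one more commit's worth of changes in the diff.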
UPDATE and FINAL SOLUTION
So it seems that comparing (diffing) two commits does not give the changes that happened between now and some time ago, because commits made before a merge may include changes from before the time period of interest. I found two solutions. The first is to count the number of HEAD changes from that time until the current date, which is not very accurate. For that we can use:
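from git import Git

g = Git(self.repo_directory)
# %s prints each commit's subject line; with an empty format there is no text to search
loginfo = g.log('--since={}'.format(since), '--pretty=tformat:%s')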
Then count the occurrences of the string Merge pull request in the output, which is basically the number of merges into the repo, each of which usually moves HEAD. This is not accurate, but let's assume the count is 31. Then:
for item in self.repo.head.commit.diff('develop~31'):
    if item.a_path.find('certain_path') != -1:
        self.paths.add(item.a_path)  # self.paths is a set()
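The counting step can be sketched as follows. This is only an illustration: count_merge_commits is a hypothetical helper, the sample log text is made up, and it assumes the log output contains one commit subject per line.

```python
def count_merge_commits(log_output, marker="Merge pull request"):
    """Count the log lines that contain the merge marker."""
    return sum(1 for line in log_output.splitlines() if marker in line)


# Made-up log text standing in for the real output of git log.
sample_log = "\n".join([
    "Merge pull request #12 from someone/feature-a",
    "Fix typo in README",
    "Merge pull request #13 from someone/feature-b",
    "Add unit tests",
])

merge_count = count_merge_commits(sample_log)
revision = "develop~{}".format(merge_count)  # revision to diff HEAD against
print(merge_count, revision)
```

Note that develop~N follows first parents only, so this stays an approximation whenever the history contains commits that are not pull-request merges.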
The solution that works and is straightforward:
import datetime as DT
from datetime import date
from git import Git

def _get_the_changed_components(self):
    g = Git(self.repo_directory)
    today = date.today()
    since = today - DT.timedelta(self.time_period)  # self.time_period days ago
    loginfo = g.log('--since={}'.format(since), '--pretty=tformat:', '--name-only')
    files = loginfo.split('\n')
    for file in files:
        if file:  # skip blank lines in the log output
            self.paths.add(file)
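To keep only the files under a particular path, as in the earlier attempts, the log output can be filtered while it is parsed. The sketch below is an assumption-laden illustration: changed_paths is a hypothetical helper, and the sample text only imitates what git log with --name-only and an empty format prints.

```python
def changed_paths(log_output, path_fragment=""):
    """Collect the unique file paths in name-only log output.

    Blank lines are skipped, and only paths containing path_fragment
    are kept (an empty fragment keeps every path).
    """
    paths = set()
    for line in log_output.splitlines():
        line = line.strip()
        if line and path_fragment in line:
            paths.add(line)
    return paths


# Made-up output: the same file touched by two commits, plus one other file.
sample_output = "src/certain_path/a.py\n\nsrc/other/b.py\nsrc/certain_path/a.py\n"
print(sorted(changed_paths(sample_output, "certain_path")))
```

Using a set means a file changed in several commits during the period is counted once.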