python, git, git-diff, gitpython

Git diff between the current date and some time ago in GitPython


I am using GitPython to find the files changed over a certain period of time (for example, between now and one week ago):

 from git import Repo

 repo = Repo(self.repo_directory)
 for item in repo.head.commit.diff('develop@{1 weeks ago}'):
     print("smth")

But nothing is printed, even when I change the number of weeks, which means no diff is detected for that time period. If I change 'develop@{1 weeks ago}' to 'HEAD@{1 weeks ago}', the number of changes is huge, which cannot be right for a single week. Any help is appreciated.
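A likely cause: `develop@{1 weeks ago}` is resolved through the reflog of the *local* `develop` ref, i.e. where that ref pointed in this clone a week ago, not which commit was the branch tip a week ago in the project history. If the local branch has not moved in that window, both sides of the diff are the same commit and the diff is empty. `HEAD@{...}` follows HEAD's reflog, which also records checkouts of other branches, and that could explain the very large diff. Below is a minimal sketch of a history-based alternative, assuming GitPython is installed. The helper `changed_paths_since` is hypothetical. It picks the newest commit on the branch committed before the cutoff using `iter_commits(..., before=..., max_count=1)`, which passes `--before`/`--max-count` to `git rev-list`, and diffs that commit against the branch tip.

```python
import datetime as dt


def cutoff_date(days, today=None):
    """ISO date string `days` before `today` (defaults to the current date)."""
    today = today or dt.date.today()
    return (today - dt.timedelta(days=days)).isoformat()


def changed_paths_since(repo_dir, branch="develop", days=7):
    """Hypothetical helper: paths that differ between the branch tip and the
    newest commit on that branch committed before the cutoff date."""
    from git import Repo  # GitPython; imported here so cutoff_date() works without it

    repo = Repo(repo_dir)
    # Walk the branch history (git rev-list), not the reflog.
    old = next(repo.iter_commits(branch, before=cutoff_date(days), max_count=1), None)
    if old is None:
        return set()  # no commit on the branch is older than the cutoff
    new = repo.commit(branch)
    # old.diff(new) describes the changes going from `old` to `new`;
    # b_path is the new path, a_path the old one (covers renames/deletions).
    return {d.b_path or d.a_path for d in old.diff(new)}
```

Example call (the path is a placeholder): `changed_paths_since("/path/to/repo", "develop", 7)`.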


Solution

  • Based on the discussion in the comments, I came up with the following solution using GitPython (only the relevant code is shown; the rest is omitted to avoid confusion):

       from git import Repo
       from git import RemoteProgress

       class MyProgressPrinter(RemoteProgress):
           def update(self, op_code, cur_count, max_count=None, message=''):
               print(op_code, cur_count, max_count, cur_count / (max_count or 100.0), message or "NO MESSAGE")


       def _get_commits_info(self):
           for fetch_info in self.repo.remotes.origin.fetch(progress=MyProgressPrinter()):
               self.commits_info.append(
                   (fetch_info.commit.committed_date, fetch_info.commit))
           # sort once, after the loop, by committed date
           self.commits_info = sorted(self.commits_info, key=lambda x: x[0])


       def _get_the_changed_components(self):
           self._get_commits_info()
           last_date = self.commits_info[-1][0]
           last_commit = self.commits_info[-1][1]
           since_date = last_date - self.time_period * 86400  # time_period in days (e.g. 7); 86400 s per day
           since_commit = self._get_since_commit(since_date)  # finds since_commit in the sorted commits_info list

           for item in last_commit.diff(since_commit):
               if 'certain_path' in item.a_path:
                   self.paths.add(item.a_path)  # self.paths is a set()
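`_get_since_commit` is not shown. Here is a minimal sketch of what it might look like, written as a standalone function. It assumes `commits_info` is a list of `(committed_date, commit)` tuples sorted ascending by `committed_date` (Unix seconds, as GitPython's `Commit.committed_date` provides), and it returns the newest commit committed at or before `since_date`. Separately, `Remote.fetch()` returns one `FetchInfo` per fetched ref, so `commits_info` as built above holds branch tips rather than every commit in the history. That may be part of why the resulting diff looks too large.

```python
import bisect


def get_since_commit(commits_info, since_date):
    """Return the newest commit with committed_date <= since_date, or None.

    Hypothetical stand-in for the omitted _get_since_commit(); assumes
    commits_info is a list of (committed_date, commit) tuples sorted
    ascending by committed_date.
    """
    dates = [date for date, _ in commits_info]
    # Index of the first entry strictly newer than since_date.
    idx = bisect.bisect_right(dates, since_date)
    if idx == 0:
        return None  # every commit is newer than since_date
    return commits_info[idx - 1][1]


# Usage with placeholder strings standing in for commit objects:
info = [(100, "c1"), (200, "c2"), (300, "c3")]
print(get_since_commit(info, 250))  # c2
```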
    

    However, the size of self.paths does not seem reasonable to me: it captures too many changes, and I am not sure why. Basically, what I did was: find all the commits, sort them by committed_date, and then find the commit (since_commit in the code) whose committed_date is 7 days ago. After that, I took the diff between the last commit in the sorted commits_info list and since_commit, and saved the a_path values into a set.

    I also tried another approach: taking the diff between every pair of consecutive commits in the sorted commits_info list, from since_commit all the way up to the last commit. This gives an even higher number of changes.

    Any comments or help? Is this the correct way to get the diff for a time period, and is the higher number of changes just a coincidence?
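The consecutive-diff approach reports more paths for at least one structural reason. A file that is edited and then reverted inside the window shows up in the union of pairwise diffs, but not in a diff of the two endpoints. Sorting commits from several branches by date adds a second effect: neighbouring commits in that list may sit on different branches, so each pairwise diff also includes how those branches diverge. The toy illustration below shows only the first effect. It models each snapshot as a plain dict of path to content, involves no Git, and uses illustrative names only.

```python
def changed(old, new):
    """Paths whose content differs between two snapshots (dicts of path -> content)."""
    return {p for p in old.keys() | new.keys() if old.get(p) != new.get(p)}


def net_changes(snapshots):
    """Diff of the first and last snapshot only (like diffing two commits)."""
    return changed(snapshots[0], snapshots[-1])


def pairwise_changes(snapshots):
    """Union of diffs between every pair of consecutive snapshots."""
    paths = set()
    for old, new in zip(snapshots, snapshots[1:]):
        paths |= changed(old, new)
    return paths


# Toy history: b.txt is edited and then reverted within the window.
history = [
    {"a.txt": "1", "b.txt": "x"},
    {"a.txt": "2", "b.txt": "y"},
    {"a.txt": "2", "b.txt": "x"},
]
print(sorted(net_changes(history)))       # ['a.txt']
print(sorted(pairwise_changes(history)))  # ['a.txt', 'b.txt']
```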

    UPDATE and FINAL SOLUTION

    It seems that diffing two commits does not give the changes that happened between now and some time ago, because the commits being merged may contain changes made before the period of interest. I found two solutions. The first is to count how many times HEAD changed between that time and the current date, which is not very accurate. For that we can use:

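     from git import Git

     g = Git(self.repo_directory)
     # %s prints each commit's subject, so "Merge pull request" lines can be counted
     loginfo = g.log('--since={}'.format(since), '--pretty=tformat:%s')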
    

    Then count the occurrences of the string "Merge pull request", which roughly counts how many merges have happened in the repository (each merge usually moves HEAD). This is not exact, but let's assume the count is 31. Then:

      for item in self.repo.head.commit.diff('develop~31'):
          if 'certain_path' in item.a_path:
              self.paths.add(item.a_path)  # self.paths is a set()
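The two steps above can be sketched as follows, assuming GitPython is installed. `count_pr_merges` mirrors the counting idea. It assumes the log text contains one subject line per commit (for example from `git log --pretty=tformat:%s`) and that merges use GitHub's default "Merge pull request" subject. As an alternative that swaps in a different technique, `git rev-list --count --first-parent --since=<date> develop` counts the commits on the branch's first-parent chain that are newer than the date. That count is exactly the N for which `develop~N` is the last first-parent commit before the cutoff, provided commit dates increase along that chain (usual, but not guaranteed). All helper names here are hypothetical.

```python
def count_pr_merges(log_output, marker="Merge pull request"):
    """Count log lines (one subject per commit) that start with `marker`."""
    return sum(1 for line in log_output.splitlines() if line.startswith(marker))


def ancestor_rev(branch, n):
    """Revision string for the n-th first-parent ancestor, e.g. 'develop~31'."""
    return f"{branch}~{n}"


def changed_paths_first_parent(repo_dir, since, branch="develop", path_part="certain_path"):
    """Hypothetical helper: diff the branch tip against the first-parent
    commit just before `since` (e.g. '2020-01-03')."""
    from git import Repo  # GitPython; imported here so the helpers above work without it

    repo = Repo(repo_dir)
    # Number of first-parent commits on `branch` newer than `since`.
    n = int(repo.git.rev_list("--count", "--first-parent", f"--since={since}", branch))
    tip = repo.commit(branch)
    paths = set()
    # Fails with a git error if the whole first-parent chain is newer than `since`.
    for item in tip.diff(ancestor_rev(branch, n)):
        if item.a_path and path_part in item.a_path:
            paths.add(item.a_path)
    return paths


sample = (
    "Merge pull request #2 from example/feature-a\n"
    "Fix typo in README\n"
    "Merge pull request #1 from example/feature-b\n"
)
print(count_pr_merges(sample))  # 2
```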
    

    The second solution, which works and is straightforward:

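      import datetime as DT
      from git import Git

      def _get_the_changed_components(self):
          g = Git(self.repo_directory)
          today = DT.date.today()
          since = today - DT.timedelta(days=self.time_period)  # e.g. time_period = 7 days ago
          # An empty tformat suppresses commit headers, so only file names (and blank separator lines) are printed
          loginfo = g.log('--since={}'.format(since), '--pretty=tformat:', '--name-only')
          for file in loginfo.split('\n'):
              if file:  # skip the blank lines between commits
                  self.paths.add(file)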
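The parsing step of this solution can be pulled out into a pure function that is easy to test. The sketch below assumes output shaped like `git log --pretty=tformat: --name-only`, which contains file names with blank lines between commits. `parse_name_only` and its optional `path_part` filter are hypothetical names, and the filter mirrors the `'certain_path'` check used earlier. Unless options such as `-m` are given, `git log` shows no file list for merge commits, so each file appears only via the commits that actually changed it.

```python
def parse_name_only(log_output, path_part=None):
    """Unique file paths from `git log --pretty=tformat: --name-only` output.

    Blank separator lines are skipped; if `path_part` is given, only paths
    containing it are kept.
    """
    paths = set()
    for line in log_output.splitlines():
        line = line.strip()
        if not line:
            continue  # blank line between commits
        if path_part is None or path_part in line:
            paths.add(line)
    return paths


sample = "\nsrc/a.py\nsrc/b.py\n\nsrc/a.py\ndocs/notes.md\n"
print(sorted(parse_name_only(sample)))          # ['docs/notes.md', 'src/a.py', 'src/b.py']
print(sorted(parse_name_only(sample, "src/")))  # ['src/a.py', 'src/b.py']
```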