I run the following MWE of a Python script to read through commits and create another git project somewhere else.
I call the script from bash like this, to iterate through git projectA and create another git projectB:
git filter-branch -f --tree-filter "python3 /media/sf_git/register-commits.py /home/mercury/splitted" --prune-empty --tag-name-filter cat -- --all
The argument to python3 is the script that runs on each commit, and the path after it is the location where projectB is supposed to be created.
/media/sf_git/register-commits.py
import os
import sys

def git_init(module):
    os.system('git init ' + module)

def create_project(parent, module):
    os.chdir(parent)
    print('parent:', parent)
    git_init(module)
    if not os.path.exists(os.path.join(parent, module, '.git')):
        sys.exit('.git folder is not created.')

arg1 = sys.argv[1]
if arg1 is None:
    sys.exit('The script argument is not provided')
commit_id = os.environ["GIT_COMMIT"]
module = 'projectB'
cwd = os.getcwd()
try:
    dst_module_path = os.path.join(arg1, module)
    if not os.path.exists(dst_module_path):
        create_project(arg1, module)
except Exception as e:
    print('Error: ' + str(e))
finally:
    os.chdir(cwd)
The problem is not os.chdir itself: it does change the path, and I have even printed it to confirm that it is correct. But the git init command still operates on the working directory of project A instead of project B. It gives me the following output:
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
Proceeding with filter-branch...
Rewrite 8a30d5630ab7ead31ecc3b30122054d27eec0dbe (1/3058) (0 seconds passed, remaining 0 predicted)
Reinitialized existing Git repository in /home/mercury/projectA/.git/
.git folder is not created.
parent: /home/mercury/splitted
tree filter failed: python3 /media/sf_git/register-commits.py /home/mercury/splitted
It creates an empty folder projectB under /home/mercury/splitted with no .git folder inside it.
There also seems to be a side problem: projectA itself is changed, because when I run the script a second time I get this error:
Proceeding with filter-branch...
You need to run this command from the toplevel of the working tree.
It looks like projectA is damaged. The only fix I know of is to restore the .git folder of projectA from a backup.
Using subprocess.Popen gives me a similar result:
import subprocess  # needed in addition to the imports of the script above

def git_init(module):
    parent = os.getcwd()
    print('parent:', parent)
    proc = subprocess.Popen(['git', 'init', module], stdout=subprocess.PIPE, stderr=subprocess.STDOUT, cwd=parent)
    p_status = proc.wait()
    (output, err) = proc.communicate()
    print(output)
Output:
WARNING: git-filter-branch has a glut of gotchas generating mangled history
rewrites. Hit Ctrl-C before proceeding to abort, then use an
alternative filtering tool such as 'git filter-repo'
(https://github.com/newren/git-filter-repo/) instead. See the
filter-branch manual page for more details; to squelch this warning,
set FILTER_BRANCH_SQUELCH_WARNING=1.
Proceeding with filter-branch...
Rewrite 8a30d5630ab7ead31ecc3b30122054d27eec0dbe (1/3058) (0 seconds passed, remaining 0 predicted)
parent: /home/mercury/splitted
parent: /home/mercury/splitted
b'Reinitialized existing Git repository in /home/mercury/projectA/.git/\n'
.git folder is not created.
tree filter failed: python3 /media/sf_git/register-commits.py /home/mercury/splitted
It is strange that git creates a folder inside /home/mercury/splitted but tries to initialize .git under /home/mercury/projectA.
When I run the script in a normal Python environment, everything is fine. But under git filter-branch the paths do not apply to git, even though the working directory changes correctly. In addition, it looks like projectA gets corrupted when git init is applied to another directory.
I am not sure whether this is a git problem or a Python problem.
What is wrong and how to fix this problem?
What is wrong ...
There are two things you must not do in a tree filter in git filter-branch, in general:
- change the working directory;
- run Git commands.
This is not necessarily an exhaustive list, and, luckily, there are some ways around these two.
and how to fix this problem?
The limitation on changing directories is actually specific to shell commands run in the top-level shell (filter-branch evals your filter here). Since you're firing up a completely separate process, python, you are free to change the working directory there. But it's worth mentioning the issue, since an attempt to optimize your filter might result in running into it.
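To illustrate why a separate process is safe here, the following minimal sketch starts a child Python process that changes its own directory and then checks that the parent's working directory is untouched. The helper name child_cwd_after_chdir is hypothetical and only serves the demonstration.

```python
import os
import subprocess
import sys


def child_cwd_after_chdir(target):
    """Start a child process that chdirs to `target`.

    Returns a tuple (parent_cwd_afterwards, cwd_reported_by_child).
    """
    # The child changes its own working directory and prints it.
    code = "import os, sys; os.chdir(sys.argv[1]); print(os.getcwd())"
    result = subprocess.run(
        [sys.executable, "-c", code, target],
        check=True,
        capture_output=True,
        text=True,
    )
    # The parent's working directory is read only after the child has exited.
    return os.getcwd(), result.stdout.strip()
```

The same isolation does not hold for a filter written directly as shell commands, because those run inside the filter-branch shell itself.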
The limitation on using Git commands exists because a tree filter is specifically aimed at letting you use non-Git commands to rework the contents of each commit. Using git filter-branch simply to examine the contents of each commit wasn't the intent here.
Fortunately, there is a simple workaround for running git init like this: you just need to remove the environment variable GIT_DIR from the environment when you invoke Git. If you invoke other Git commands, there may be more environment variables you must unset.
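A minimal sketch of that workaround, assuming the script invokes Git through subprocess. The names GIT_VARS_TO_DROP, clean_git_env and git_init are hypothetical. Only GIT_DIR is named above; dropping GIT_WORK_TREE and GIT_INDEX_FILE as well is an assumption about which other variables may be set by filter-branch.

```python
import os
import subprocess

# GIT_DIR is the variable named in the answer. The other two are an
# assumption: they are further variables that can pin Git to a repository.
GIT_VARS_TO_DROP = ("GIT_DIR", "GIT_WORK_TREE", "GIT_INDEX_FILE")


def clean_git_env(environ=None):
    """Return a copy of the environment without the repository-pinning variables."""
    env = dict(os.environ if environ is None else environ)
    for name in GIT_VARS_TO_DROP:
        env.pop(name, None)  # ignore variables that are not set
    return env


def git_init(parent, module):
    """Run `git init <module>` inside `parent` using the cleaned environment."""
    subprocess.run(
        ["git", "init", module],
        cwd=parent,
        env=clean_git_env(),
        check=True,  # raise CalledProcessError if git fails
    )
```

With this in place, create_project could call git_init(parent, module) without relying on os.chdir, since cwd is passed to the child process directly.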
Overall, though, it's not clear why you're trying to use git filter-branch for this. If you want to get a list of commits, the correct tool is usually git rev-list. If you want to get files from those commits, things get more complex, but filter-branch is still probably not the right tool.
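As a rough sketch of the git rev-list route, the following lists commits from Python without involving filter-branch at all. The function names parse_rev_list and list_commits are hypothetical, and defaulting to --all with --reverse (oldest first) is an assumption about what the script needs.

```python
import subprocess


def parse_rev_list(output):
    """Split the text printed by `git rev-list` into a list of commit hashes."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def list_commits(repo_path, *rev_args):
    """Return commit hashes from the repository at `repo_path`, oldest first.

    `rev_args` are passed through to `git rev-list`; with none given,
    every ref is walked (`--all`).
    """
    args = rev_args or ("--all",)
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-list", "--reverse", *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_rev_list(result.stdout)
```

For example, list_commits('/home/mercury/projectA') would return the hashes that the script could then process one by one.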