I want to use Python's re module
to split a Git log string like the one below:
commit 8e018dbcdbff15c3fc9ef4460b4214f47f71ddf6
Author: ISAAC.NEWTON <[email protected]>
Date: Fri Apr 28 18:58:00 2023 +0800
new cat
commit 9274b33435238122c8d6d389e73266f6a3e68745
Author: ISAAC.NEWTON <[email protected]>
Date: Wed Apr 19 11:04:04 2023 +0800
meow
commit 4f113912741f753c75a44f18790ff5903e910fad
Author: ISAAC.NEWTON <[email protected]>
Date: Fri Apr 14 17:55:55 2023 +0800
Add test files
commit 9274b33435238122c8d6d389e73266f6a3e68745
Author: ISAAC.NEWTON <[email protected]>
Date: Wed Apr 19 11:04:04 2023 +0800
Second commit test
commit 9274b33435238122c8d6d389e73266f6a3e68745
Author: ISAAC.NEWTON <[email protected]>
Date: Wed Apr 19 11:04:04 2023 +0800
First commit
Then I want to get a list of commits like this:
[
'
commit 8e018dbcdbff15c3fc9ef4460b4214f47f71ddf6
Author: ISAAC.NEWTON <[email protected]>
Date: Fri Apr 28 18:58:00 2023 +0800
new cat
',
'
commit 9274b33435238122c8d6d389e73266f6a3e68745
Author: ISAAC.NEWTON <[email protected]>
Date: Wed Apr 19 11:04:04 2023 +0800
meow
',
...
]
I am finding it hard to come up with a clean, general pattern that matches a single commit.
Frame the problem as locating blocks with known start and end patterns.
Then define where each block starts and ends; here, both are anchored to the commit hash line.
import re
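# A block starts at "commit <40 hex digits>" and runs lazily up to the next such header or the end of the string.
# Note: no comma inside the character class; [0-9,a-f] would also accept a literal ','.
rgx = r'(commit\s[0-9a-f]{40}.*?)(?=commit\s[0-9a-f]{40}|\Z)'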
text = '''commit 8e018dbcdbff15c3fc9ef4460b4214f47f71ddf6
Author: ISAAC.NEWTON <[email protected]>
Date: Fri Apr 28 18:58:00 2023 +0800
new cat
commit 9274b33435238122c8d6d389e73266f6a3e68745
Author: ISAAC.NEWTON <[email protected]>
Date: Wed Apr 19 11:04:04 2023 +0800
meow
commit 4f113912741f753c75a44f18790ff5903e910fad
Author: ISAAC.NEWTON <[email protected]>
Date: Fri Apr 14 17:55:55 2023 +0800
Add test files
commit 87053deb6ad07fa1ea6dd7a5acfee075ce5b6322
Author: ISAAC.NEWTON <[email protected]>
Date: Fri Apr 14 15:16:57 2023 +0800
Add cat.jpg
'''
re.findall(rgx, text, re.DOTALL)
This gives the expected output:
['commit 8e018dbcdbff15c3fc9ef4460b4214f47f71ddf6\nAuthor: ISAAC.NEWTON <[email protected]>\nDate: Fri Apr 28 18:58:00 2023 +0800\n\n new cat\n\n',
'commit 9274b33435238122c8d6d389e73266f6a3e68745\nAuthor: ISAAC.NEWTON <[email protected]>\nDate: Wed Apr 19 11:04:04 2023 +0800\n\n meow\n\n',
'commit 4f113912741f753c75a44f18790ff5903e910fad\nAuthor: ISAAC.NEWTON <[email protected]>\nDate: Fri Apr 14 17:55:55 2023 +0800\n\n Add test files\n\n',
'commit 87053deb6ad07fa1ea6dd7a5acfee075ce5b6322\nAuthor: ISAAC.NEWTON <[email protected]>\nDate: Fri Apr 14 15:16:57 2023 +0800\n\n Add cat.jpg\n']
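EDIT: note the \Z alternative in the lookahead. It matches only at the end of the string, so the last commit is still captured even though no other commit header follows it.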
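A possible variation on the same idea, in case a commit message itself mentions another commit hash: the pattern above looks for the next "commit <hash>" anywhere in the text, so a message containing such a phrase would be cut in two. The sketch below instead splits only where a line begins with the header, using re.split with a zero-width lookahead and re.MULTILINE. It assumes Python 3.7+ (needed for splitting on empty matches), 40-character SHA-1 hashes, and the default git log layout in which message lines are indented. The name split_commits and the sample text are made up for illustration.

```python
import re

# Zero-width match at the start of any line that begins with a commit header.
# Assumption: message lines are indented, so they never start with "commit ".
COMMIT_START = re.compile(r'^(?=commit [0-9a-f]{40}\b)', re.MULTILINE)


def split_commits(log_text):
    """Split git log text into one string per commit (hypothetical helper)."""
    # Splitting on an empty match requires Python 3.7 or later.
    parts = COMMIT_START.split(log_text)
    # Drop the empty piece that precedes the first commit header.
    return [part for part in parts if part.strip()]


# Made-up sample: the first message mentions another commit's hash mid-line.
sample = (
    "commit 8e018dbcdbff15c3fc9ef4460b4214f47f71ddf6\n"
    "Author: ISAAC.NEWTON <isaac@example.com>\n"
    "\n"
    "    new cat, see commit 9274b33435238122c8d6d389e73266f6a3e68745\n"
    "\n"
    "commit 9274b33435238122c8d6d389e73266f6a3e68745\n"
    "Author: ISAAC.NEWTON <isaac@example.com>\n"
    "\n"
    "    meow\n"
)

commits = split_commits(sample)
print(len(commits))  # 2: the mid-line mention does not start a new block
```

Because the split points are zero-width, joining the pieces back together reproduces the input exactly, which makes the result easy to sanity-check.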