I was hired as a consultant to work with a terrible in-house DSL used by a large corporation.
I say terrible because instead of carriage returns or linefeeds to end each line of code, lines of code are separated with the five-character ASCII string <EOL>. These files are thousands of "lines" long. Any embedded carriage returns or linefeeds tend to crash their interpreter.
I cannot change their interpreter or language, but I need to work with a massive (>100 MB) codebase written in this language.
Before making any changes to this code, I want to put it into a git repository to track it. Is there a way to tell git that the string <EOL> represents an end-of-line, much like you can specify LF or CR+LF with core.eol=lf? For example, core.eol="<EOL>". If so, this would make my life rather easier in two ways:
<EOL> as the line ending, then check it out on another machine with core.eol=lf set, and git would convert back and forth automatically. (I could use a regular text editor and regular tools!)
I do recognize that this is a niche edge case. I also understand I could add an intermediate processing step to convert back and forth before interacting with git, but I want to avoid that unless absolutely necessary, as I'd prefer to import their existing codebase directly into git without pre-processing it first.
If this feature is not available, I might even prefer creating a custom version of git to adding an extra processing step, so if anyone knows what complexities might be involved in that, I'd be interested in learning about those.
This custom filter setup will result in *.dsl files containing <EOL> in Git storage, but \n when checked out in your working directory. Tools such as git diff will operate on the checked-out versions (i.e. with \n line endings). Is that what you want?
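~/.gitconfig or .git/config:

    [filter "crazy-eol"]
        # Double quotes inside a Git config value must be escaped as \";
        # unescaped quotes are stripped by Git before awk sees the program.
        # clean runs on "git add": working-tree \n becomes stored <EOL>.
        clean = awk 'BEGIN{ORS=\"<EOL>\"}1'
        # smudge runs on checkout: stored <EOL> becomes working-tree \n.
        smudge = awk 'BEGIN{RS=\"<EOL>\"}1'
    [diff "crazy-eol"]
        # textconv lets git diff show stored blobs with \n line endings.
        textconv = awk 'BEGIN{RS=\"<EOL>\"}1'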
.gitattributes or .git/info/attributes:

    *.dsl filter=crazy-eol diff=crazy-eol
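Before wiring the filter into a real repository, it may be worth checking that the two awk one-liners undo each other on your system. The following is only a sketch under stated assumptions: the sample DSL content and the file names (stored.dsl, working.dsl, restored.dsl) are made up for illustration, and it assumes an awk that treats a multi-character RS as a regular expression (gawk and mawk do; a strictly POSIX awk may only honor the first character).

```shell
#!/bin/sh
# Round-trip check for the smudge and clean awk commands, run outside Git.
# Assumption: awk accepts a multi-character RS (e.g. gawk or mawk).
set -eu

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical stand-in for a stored DSL file: three "lines", no real newlines.
printf 'SET X 1<EOL>SET Y 2<EOL>PRINT X<EOL>' > "$tmp/stored.dsl"

# smudge direction: what Git would write into the working tree.
awk 'BEGIN{RS="<EOL>"}1' "$tmp/stored.dsl" > "$tmp/working.dsl"

# clean direction: what Git would store again on "git add".
awk 'BEGIN{ORS="<EOL>"}1' "$tmp/working.dsl" > "$tmp/restored.dsl"

# Count newline-terminated lines in the working-tree form.
working_lines=$(wc -l < "$tmp/working.dsl" | tr -d ' ')

# Compare the original stored bytes with the re-cleaned bytes.
if cmp -s "$tmp/stored.dsl" "$tmp/restored.dsl"; then
    roundtrip=identical
else
    roundtrip=different
fi

echo "working-tree lines: $working_lines, round trip: $roundtrip"
```

One thing to watch when importing the existing codebase: the clean command appends <EOL> after every newline-delimited record, so a file that is still in <EOL> form (a single record with no newlines) would gain an extra trailing <EOL> if added as-is. Converting the files to \n form with the smudge command before the first git add should avoid that.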