Custom line-endings in git (other than LF and CR+LF)


I was hired as a consultant to work with a terrible in-house DSL used by a large corporation.

I say terrible because, instead of ending each line of code with a carriage return or linefeed, it separates lines with the five-character ASCII string <EOL>. These files are thousands of "lines" long. Any embedded carriage returns or linefeeds tend to crash their interpreter.

I cannot change their interpreter or language, but I need to work with a massive (>100 MB) codebase written in this language.

Before making any changes to this code, I want to put it into a git repository to track it. Is there a way to tell git that the string <EOL> represents an end-of-line, much like you can specify LF or CR+LF with core.eol=lf? For example, core.eol="<EOL>". If so, this would make my life rather easier in two ways:

  1. It would make merges and diffs work intelligently; git would know where the "lines" are.
  2. I could (for example) check in the original code with <EOL> as the line ending, then check it out on another machine with core.eol=lf set, and git would convert back and forth automatically. (I could use a regular text editor and regular tools!)

I do recognize that this is a niche edge case. I also understand I could add an intermediate processing step to convert back and forth before interacting with git, but I want to avoid that unless absolutely necessary, as I'd prefer to import their existing codebase directly into git without pre-processing it first.

If this feature is not available, I might even prefer creating a custom version of git to adding an extra processing step, so if anyone knows what complexities might be involved in that, I'd be interested in learning about those.


Solution

  • This custom filter setup stores *.dsl files with <EOL> separators in the Git repository, but checks them out with \n in your working directory. Because of the textconv setting, tools such as git diff will compare the converted form (i.e. split on \n). Is that what you want?

    ~/.gitconfig or .git/config

    [filter "crazy-eol"]
        clean = awk 'BEGIN{ORS="<EOL>"}1'
        smudge = awk 'BEGIN{RS="<EOL>"}1'
    [diff "crazy-eol"]
        textconv = awk 'BEGIN{RS="<EOL>"}1'
    

    .gitattributes or .git/info/attributes

    *.dsl filter=crazy-eol diff=crazy-eol
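
The awk one-liners above rely on a multi-character RS, which gawk supports but POSIX leaves unspecified. As a rough sketch of the same two conversions, assuming Python 3 is available, the script below could stand in for them. The file name eol_filter.py and the clean/smudge command-line argument are illustrative choices, not anything Git prescribes. Unlike the awk version, it converts only the newlines that are actually present and never appends a trailing separator.

```python
import sys

EOL = b"<EOL>"  # the DSL's line separator, as described in the question


def clean(data: bytes) -> bytes:
    """Working tree -> repository: turn every newline back into <EOL>."""
    return data.replace(b"\n", EOL)


def smudge(data: bytes) -> bytes:
    """Repository -> working tree: turn every <EOL> into a newline."""
    return data.replace(EOL, b"\n")


FILTERS = {"clean": clean, "smudge": smudge}

# Hypothetical CLI: `eol_filter.py clean` or `eol_filter.py smudge`.
# Git pipes the file content to stdin and reads the result from stdout.
# Bytes are used throughout so no text-mode newline translation occurs.
if __name__ == "__main__" and sys.argv[1:] and sys.argv[1] in FILTERS:
    convert = FILTERS[sys.argv[1]]
    sys.stdout.buffer.write(convert(sys.stdin.buffer.read()))
```

To try it, the filter entries would point at the script instead of awk, for example clean = python3 /path/to/eol_filter.py clean and smudge = python3 /path/to/eol_filter.py smudge (the path is a placeholder). The round trip is lossless only if the stored files contain no raw newlines, which matches the constraint stated in the question.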