Convert regex syntax to glob (switching from Mercurial to Git)


Because Bitbucket are winding down their support for Mercurial, I'm switching from Mercurial to Git on a number of projects. The switch itself seems straightforward, but I think I need to make some manual changes to my .hgignore files before they can be used as .gitignore files, as they use regex syntax, which I don't think Git supports.

I have to say I'm not clear about how one does things like make patterns only applicable to root level. I'm hoping that someone can give me the glob equivalent of the following example lines from my .hgignore:

\.project
\.settings/
\.idea/
^out/
web-app/WEB-INF/classes

Solution

  • It's true that Mercurial's .hgignore is (much) more flexible than Git's .gitignore, because Mercurial supports both regular expressions and glob syntax. However, glob syntax tends to be much easier to get right, and I recommend it even in Mercurial (it used to be slow, but globs are now translated into regexes internally, so there should be no real speed penalty).

    The equivalent of \.project is just .project.[1] The equivalent of \.settings/ is just .settings/. This also works for .idea/. The only slightly difficult ones here are ^out/ and web-app/WEB-INF/classes. You certainly want:

    /out/
    

    which in Git anchors the out/ part to the level at which the .gitignore file appears, and you probably want a simple:

    web-app/WEB-INF/classes
    

    unless you mean to match the two-component path web-app/WEB-INF at any level below this point, or to match classes.* (in regex terms) after it. In those cases you may want:

    **/web-app/WEB-INF/classes*
    

    or similar. The reason for the leading **/ is that in Git .gitignore files, any glob pattern with an embedded slash is treated as if it started with a leading slash. To see this, start with the simple case:

    $ cat .gitignore
    foo
    

    This tells Git not to complain about a file or directory named foo at any level underneath this point, i.e., in this folder or any sub-folder. On the other hand:

    $ cat .gitignore
    /foo
    

    means only foo at the top level, which also makes sense to everyone. But the weird thing is that:

    bar/foo
    

    means exactly the same thing as:

    /bar/foo
    

    because the embedded (not trailing) slash means "match only in this folder", just as a leading slash would.

    (Trailing slashes mean "only a subdirectory / sub-folder should match this rule". They get removed just for the "is there a slash" test; if there is a slash after removing any trailing one, the whole thing is anchored to this particular folder / directory.)
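The "is there a slash" test described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this answer, not Git's actual matching code; the name is_anchored is made up for the example, and treating a leading **/ as "not anchored" is my own simplification of what that prefix is for.

```python
def is_anchored(pattern: str) -> bool:
    """Return True if a .gitignore pattern only applies relative to the
    directory holding the .gitignore file (hypothetical helper)."""
    # A trailing slash only means "directories only", so drop it
    # before looking for any other slash.
    if pattern.endswith("/"):
        pattern = pattern[:-1]
    # Assumption for this sketch: a leading "**/" asks for a match at
    # any depth, so we report such patterns as floating.
    if pattern.startswith("**/"):
        return False
    # Any remaining slash (leading or embedded) anchors the pattern.
    return "/" in pattern


for p in ["foo", "/foo", "bar/foo", "/bar/foo", ".settings/", "/out/"]:
    print(f"{p!r:15} anchored: {is_anchored(p)}")
```

Note that bar/foo and /bar/foo come out the same, while .settings/ stays unanchored because its only slash is the trailing one.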


    [1] If I needed more ammunition in favor of glob over regex: \.project was probably wrong, as it also excludes a file named this.project.file. But if you really did mean the equivalent of ^.*\.project.*$, that one is hard to express with glob patterns.
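To make the footnote concrete, here is a small Python comparison of the unanchored regex \.project with the plain glob .project. It uses Python's re and fnmatch modules as stand-ins, on the assumption that they behave like Mercurial's and Git's matchers for these simple patterns; the file names are invented for the example.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical file names, chosen only for illustration.
names = [".project", "this.project.file", "xproject"]

# An unanchored regex matches anywhere inside the name.
regex_hits = [n for n in names if re.search(r"\.project", n)]

# A slash-free glob has to match the whole name.
glob_hits = [n for n in names if fnmatchcase(n, ".project")]

print(regex_hits)  # ['.project', 'this.project.file']
print(glob_hits)   # ['.project']
```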