Is there a specific regular expression of HTTP URIs a git client operates on?


I'm trying to configure my Apache reverse proxy to match the URIs that Git clients access over the HTTP backend, for authentication purposes¹. To do this, I want the proxy to match those requests by URI and treat them differently from other traffic. Treating them differently is no problem, but I'm having trouble finding a good URI pattern or list to match them.

What I've found so far is:

  • Experimented with logging server side (access logs) and client side (GIT_CURL_VERBOSE=1). Observed so far:
    • GETs on <base-url>/info/refs?service=git-upload-pack
      (ls-remote, or preliminary to fetch/clone)
    • GETs on <base-url>/info/refs?service=git-receive-pack
      (preliminary to git push)
    • POSTs to <base-url>/git-upload-pack
      (git fetch)
    • POSTs to <base-url>/git-receive-pack
      (git push)
  • Documentation in the Git book on transfer protocols, but this seems incomplete by design:

    This section contains a very basic overview of the transfer protocols. The protocol includes many other features, such as multi_ack or side-band capabilities, but covering them is outside the scope of this book.

  • A suggested Apache configuration in the manpage of git-http-backend.

    • It assumes you're serving git repositories on a separate prefix, which is not always the case (see my footnote).
    • The part like RewriteCond %{QUERY_STRING} service=git-receive-pack assumes that nothing else is served on the same VirtualHost. It would break non-Git resources unless I also required the URI path (without the query string) to match /info/refs$.
    • Although it may still be accurate, it looks dated: it still shows Apache 2.2 configuration examples for authorization. This makes me wonder whether it is kept up to date and can be relied on as a credible source.
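
To make the observations above concrete, here is a minimal Python sketch of the matching logic I have in mind. It covers only the four smart-HTTP requests seen in the logs, and it requires both the /info/refs path and the service parameter before treating a GET as Git traffic. The function name is_observed_git_request is hypothetical, and this is an illustration of the rule, not a proxy configuration.

```python
from urllib.parse import urlsplit, parse_qs

# Smart-HTTP services seen in the access logs above.
SERVICES = ("git-upload-pack", "git-receive-pack")


def is_observed_git_request(method: str, uri: str) -> bool:
    """Return True if (method, uri) looks like one of the four observed requests.

    Hypothetical helper: it only encodes the requests listed above and
    deliberately ignores anything not observed (e.g. the dumb protocol).
    """
    parts = urlsplit(uri)
    path = parts.path

    if method == "GET" and path.endswith("/info/refs"):
        # Require the path *and* the service parameter, so that unrelated
        # resources carrying a similar query string are left alone.
        service = parse_qs(parts.query).get("service", [])
        return len(service) == 1 and service[0] in SERVICES

    if method == "POST":
        # POSTs go to <base-url>/git-upload-pack or <base-url>/git-receive-pack.
        return any(path.endswith("/" + name) for name in SERVICES)

    return False
```

With this rule, GET /search?service=git-receive-pack is not treated as Git traffic, because the path does not end in /info/refs. A GET on /info/refs without a service parameter is also rejected, which is exactly the kind of gap I'm worried about.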

What also worries me with simply listing the above patterns is:

  • Perhaps some clients operate differently, e.g. using the 'dumb' protocol or smart protocol v2?
  • Does Git protocol version 2 change any of these URIs?
  • I can't really find a specification on the HTTP parts of the protocol. I can find a lot on the Git-level of the protocol, but that's not what I'm interested in from a reverse proxy perspective.
  • As a result, I might break stuff for users, which is hard to debug due to an obscure URI matching on the proxy...

So, ideally, I'd like to be pointed at some piece of documentation or code that gives a complete overview of the URIs that Git HTTP clients may operate on. It may be a simple regular expression (that's what I'm looking for in the end anyway), as long as it's authoritative.


¹ I'm trying to perform SSO login using Apache as an authenticating reverse proxy, with a different type of authentication for Git over HTTPS than for regular web pages. The app, Gerrit Code Review, serves both pages and Git repositories under a common URL prefix with SSO authentication and auth.trustContainerAuth enabled, so I can't simply match on e.g. ^/git/.* as suggested in the man page of git-http-backend.


Solution

  • The list of paths for both smart and dumb HTTP is in the source code. Note that it does not include query parameters or Content-Types.

    Note that there is ongoing work to add SHA-256 support to Git, and consequently anything that now accepts a 40-character hex string will also in the future handle a 64-character hex string.
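
    As a rough illustration, the path list can be folded into one regular expression. The sketch below is written from memory of the service table in Git's http-backend.c, so treat the exact set of paths as an assumption and check it against the source of the Git version you run. The names GIT_PATH and is_git_path are hypothetical. Both 40- and 64-character object names are accepted, to cover SHA-1 and SHA-256.

```python
import re

# Assumed path suffixes for smart and dumb HTTP (verify against the Git source).
# 38/62 and 40/64 hex digits correspond to SHA-1 and SHA-256 object names.
GIT_PATH = re.compile(
    r"""
    /(?:
        HEAD
      | info/refs
      | objects/info/(?:alternates|http-alternates|packs)
      | objects/[0-9a-f]{2}/(?:[0-9a-f]{38}|[0-9a-f]{62})
      | objects/pack/pack-(?:[0-9a-f]{40}|[0-9a-f]{64})\.(?:pack|idx)
      | git-(?:upload|receive)-pack
    )$
    """,
    re.VERBOSE,
)


def is_git_path(path: str) -> bool:
    """Return True if the URI path (without query string) ends in a Git path."""
    return GIT_PATH.search(path) is not None
```

    As with the source list, this matches on the path only. The HTTP method, query parameters and Content-Type would have to be checked separately if the proxy needs them.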