I'm studying the SSH URI specification to better understand the URLs used to access repositories on services like GitHub or Bitbucket via SSH.
A typical SSH GitHub URL looks like this: git@github.com:myuser/myrepo.git, which I think can be decomposed into the following sections:
scheme          authority
  |                 |
/‾‾\/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
ssh://git@github.com:myuser/myrepo.git
      \_/ \________/ \_______________/
       |      |              |
     user   host           port
What I don't understand is the port section. The official general URI specification states that ports should contain only numeric values. The SSH URI scheme specification sticks to the general URI specification, and so does the OpenSSH config manual.
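A quick way to see the conflict is to hand the string from the diagram to a generic URL parser. This sketch uses Python's standard urllib.parse.urlsplit, a general-purpose parser rather than an SSH-specific one: it takes everything between "//" and the next "/" as the authority, so myuser lands in the port slot and is rejected because it is not numeric.

```python
from urllib.parse import urlsplit

# The questioner's string, read as a generic URI: the authority runs from "//"
# up to the next "/", so "myuser" is taken as the port and "/myrepo.git" as the path.
parts = urlsplit("ssh://git@github.com:myuser/myrepo.git")

print(parts.netloc)    # git@github.com:myuser
print(parts.hostname)  # github.com
print(parts.path)      # /myrepo.git

# Accessing .port validates it; a non-numeric port raises ValueError.
try:
    port = parts.port
except ValueError as err:
    port = None
    print("rejected port:", err)
```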
Why, then, do they use path-like text in the port section? Is this a de facto deviation from the standard? Or am I misunderstanding this whole thing?
I'd appreciate it if anyone could clarify this.
TL;DR: The syntax described in the question is inherited from an old program called rcp, and does not conform to the URI spec.
Explanation
Thanks to @jthill and @torek for pointing me in the right direction. This is my best guess after some days of research.
All three of the remote operation programs in OpenSSH (ssh, scp, sftp) accept arguments that comply with the SSH URI scheme, described in its own IETF specification.
Broadly speaking, any URI is composed of five components: scheme, authority, path, query, and fragment. Furthermore, the authority can be composed of user info, host, and port. In the case of the SSH URI scheme, only scheme, authority, and path are allowed.
scheme   authority            path
  |          |                 |
/‾‾\/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾\
ssh://git@github.com:22/myuser/myrepo.git
      \_/\_________/\_/
       |      |      |
     user   host   port
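Parsing the compliant form with a generic URL parser gives the same split as the diagram above. This is a sketch using Python's urllib.parse.urlsplit, which is a general-purpose parser, not an implementation of the SSH URI scheme, so it does not enforce any SSH-specific rules.

```python
from urllib.parse import urlsplit

# A standards-shaped SSH URI: scheme, authority (user, host, numeric port), path.
uri = urlsplit("ssh://git@github.com:22/myuser/myrepo.git")

print(uri.scheme)    # ssh
print(uri.username)  # git
print(uri.hostname)  # github.com
print(uri.port)      # 22 (converted to an int)
print(uri.path)      # /myuser/myrepo.git

# No query or fragment, which the SSH URI scheme does not use.
print(repr(uri.query), repr(uri.fragment))  # '' ''
```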
However, there's an alternative syntax that's quite similar to the standard SSH URI scheme, but not compliant with it. This syntax is inherited from the old rcp (remote copy) program, created in 1981, almost a decade and a half before URIs were defined. When the Berkeley r-commands were released, specifying the target of an operation in rsh (remote shell) or rlogin (remote login) only required naming the remote machine on which the operation would run, and probably the user on that machine as well. At that time, the conventional syntax was username@hostname.
The rcp program, however, had an additional requirement: to copy files from or to a remote location, the user had to specify the path of the file on the remote machine in addition to the user and host, just as the cp program does for local files. The syntax they devised was to append the path to the user/host declaration, separated by a colon: username@hostname:path/to/file (as described in its own man page).
git@github.com:myuser/myrepo.git
\_/ \________/ \_______________/
 |      |              |
user   host          path
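The rcp-style rule is simple enough to sketch in a few lines. The hypothetical parse_scp_like function below splits [user@]host:path at the first colon and at the last "@" before it. It is an illustration under those assumptions, not scp's or Git's actual parser, and it does not support bracketed IPv6 hosts.

```python
from typing import NamedTuple, Optional


class ScpLikeRemote(NamedTuple):
    user: Optional[str]  # None when no "user@" prefix is given
    host: str
    path: str


def parse_scp_like(spec: str) -> ScpLikeRemote:
    """Split an rcp/scp-style "[user@]host:path" string (simplified sketch).

    Splits at the first ':' and at the last '@' before it.
    Bracketed IPv6 hosts such as "[::1]:path" are not supported.
    """
    host_part, sep, path = spec.partition(":")
    if not sep:
        raise ValueError(f"no ':' separating host and path in {spec!r}")
    user, at, host = host_part.rpartition("@")
    if not host:
        raise ValueError(f"empty host in {spec!r}")
    return ScpLikeRemote(user if at else None, host, path)


remote = parse_scp_like("git@github.com:myuser/myrepo.git")
print(remote)
# ScpLikeRemote(user='git', host='github.com', path='myuser/myrepo.git')
```

Note that the path comes back as myuser/myrepo.git with no leading slash. On an ordinary SSH server such a relative path is resolved against the remote user's home directory, which is one reason the scp-like form is not a mechanical rewrite of an ssh:// URL.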
When the OpenSSH team implemented the scp program, they wanted to provide a familiar interface to developers already using rcp, so they kept support for this old, custom syntax in addition to the newer, standardized SSH URI scheme. Then, in 2005, Git entered the scene with built-in support for SSH transport (which works by invoking an external ssh client, usually OpenSSH), and the rcp-like syntax ended up being used for SSH remotes in operations like clone, fetch, push, and pull.
The Git documentation calls this the scp-like syntax, and it is briefly described in the scp man page, the git-clone docs, and the Git protocols docs. In the end, despite its lack of standardization, it seems to be the syntax that cloud Git services like GitHub and Bitbucket use when offering an SSH URL for cloning repositories.
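The git-clone docs also explain how Git tells the forms apart: the scp-like syntax is only recognized if there are no slashes before the first colon, so a local path such as ./foo:bar is not mistaken for an SSH remote. The hypothetical classify_remote function below is a simplified sketch of that rule; Git's real logic handles further cases (for example Windows drive letters), so treat it as an illustration only. The sample paths are made up.

```python
def classify_remote(spec: str) -> str:
    """Roughly classify a Git remote string (simplified illustration).

    - contains "://"                    -> "url"       (ssh://..., https://...)
    - has ':' with no '/' before it     -> "scp-like"  (git@github.com:a/b.git)
    - anything else                     -> "local"     (./foo:bar, /srv/repo.git)
    """
    if "://" in spec:
        return "url"
    colon = spec.find(":")
    if colon != -1 and "/" not in spec[:colon]:
        return "scp-like"
    return "local"


for spec in (
    "ssh://git@github.com:22/myuser/myrepo.git",
    "git@github.com:myuser/myrepo.git",
    "./foo:bar",
    "/srv/git/myrepo.git",
):
    print(f"{spec:45} {classify_remote(spec)}")
```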