git: send contents of a single file to stdout? like rcs/cvs -p


BRIEF: what is the best way in git to send the contents of a single file to STDOUT?

E.g. the equivalent of RCS co -p, which "[P]rints the retrieved revision on the standard output rather than storing it in the working file. This option is useful when co is part of a pipe." --https://linux.die.net/man/1/co

E.g. similarly cvs co -p: "Pipe files to the standard output." --https://linux.die.net/man/1/cvs

E.g. in a git work tree/clone, the equivalent of cvs co -p some-relative-pathname which I can pipe into other commands.

E.g. the equivalent of svn cat.

E.g. hg cat [OPTION]... FILE...

Why do I want to do this?

One reason, which prompted me to make this post, is that regrettably often I have to deal with UTF-16 XML, and other formats that my tools (UNIX-ish, emacs, cygwin, etc.) do not support all that well. I find myself frequently doing things like checking out the UTF-16, converting to UTF-8, editing UTF-8, saving UTF-8, wanting to diff ... leading to a proliferation of temporary files. Whereas with RCS/CVS co -p I was able to script a lot more of this. Of course I expect that somebody has already done this for git, hence this question.

I actually want more than just sending to STDOUT and thence to a pipeline. I probably want to do things like output to a specific modified filename and so on. But if you can do STDOUT you can write to any file, whereas if you can only write to a non-stdout file, while you can then cat that file to STDOUT, you have left a turd behind that you have to clean up. Plus you cannot do -o tmp-file;cat tmp-file if you don't have filesystem write permissions. On the other hand, writing directly to a file has the advantage that it can preserve file properties like permissions, and special stuff like symlinks.
Ideally I could do both: output content to STDOUT, and output to specified files with properties as well as content.

DETAIL:

What makes a Best Way

By "best way" I mean a way that involves the least work. Ideally just a single command line or pipeline (although obviously would script it if that is not possible).

Something that keeps the simple cases simple, but can be extended to more complex cases.

Ideally does not involve creating temporary clones or checkouts and then discarding them. Ideally does not modify the repository, or create a temporary repository/clone. (Believe it or not, at least one webpage suggests first "move all files in path/to to the repository root, then, remove all files except file.txt"!!)

Ideally works when chdir'ed into a git work tree. In this case I would like to simply specify a file pathname present in the local, checked out tree: preferably relative to the current directory, or slightly suboptimally relative to the root of the repo, submodule or super-project. But of course I would like to be able to specify any normal git-tree-ish.

I.e. XXX ./local-file equivalent to cvs co -p ./local-file

Of course would like to be able to do this when I'm not actually in such a repository, in a remote. But for my current usage model that's just icing on the cake.

Ideally does not require a git-server listening on localhost... some systems do not consider that necessarily secure..

Minimally can output the contents of a single file. Bonus points if it usefully handles multiple files, although that raises the issue of how to do so: just concatenate as cvs -p does, losing file boundaries, or emit something useful like a tar.

Is this a duplicate question?

As far as I can tell this is not a duplicate of Q1 How to sparsely checkout only one single file from a git repository?.

Or at least that Q&A thread suffers from a lot of confusion: e.g. whether Q1's OP actually wanted to do a checkout or a sparse checkout, or just wanted to output the contents of the file. Q1 sent me on a bit of a wild goose chase through git archive.

Closer: Q2 How to retrieve a single file from a specific revision in Git?, which arrives at git show object, and git show $REV:$FILE.

Although Q2's question name is very close to mine, the answers are mostly relevant to Q2's first paragraph:

I have a Git repository and I'd like to see how some files looked a few months ago. I found the revision at that date; it's 27cf8e84bb88e24ae4b4b3df2b77aab91a3735d8. I need to see what one file looks like, and also save it as a ("new") file.

So they don't mention the simplest thing that is equivalent to cvs co -p FILE --- git show HEAD:./FILE.
If Q2's accepted answer mentioned this I would delete my question - but that looks like a significant change. (If I were maintaining an FAQ list I would do it, but not sure that that is appropriate for StackOverflow.)

Perhaps git show HEAD:./file should have been obvious to me, but it wasn't.

In part because of the git show HEAD:FILE --- root versus current working directory relative confusion.

Also because of some git show gotchas that I mention in the answer that I am supplying with my question, such as silent errors for files that do not exist and confusing behavior for symlinks.

Both of these and similar Q&As are full of misleading wild goose chases.

E.g. many suggested that git archive be used to do this.
I was pursuing this wild goose before switching back to git show.
I think that it might be inefficient to tar a directory and then extract a single file. However I think it would be useful to have a git archive answer that is substantially equivalent to cvs -p. Especially since this is probably the best way to extract multiple files or a subtree.

However, the biggest thrashing was everyone suggesting hashes that make your eyes bleed, rather than simply using HEAD or some similar. Many of these answers probably work in the general case, where you are not extracting a single file from a working tree, but are overkill for some of the simple cases that I am most interested in. What I really needed was a refresher on https://git-scm.com/docs/gitrevisions

  • I am pretty sure I knew the answer to this question in the past, but did not know it off the top of my head, and basically searching for it afresh led to a lot of wasted time, wild goose chases with or without dead ends, and so on.

It's not enough to have the right answer implied but possibly obscure, or even explicit but hidden in a way such that standard searches don't find it, or find it far down the list with lots of other off target answers.

I may have found an answer before posting the question...

My efforts to provide background information for this question (to avoid accusations of not having tried to figure it out myself, and mostly to avoid annoying pseudo-answers that miss the point or are incomplete or just plain wrong) led me to eventually find what I think is the BKM (best known method):

git show HEAD:./git-file-cwd-relative-pathname == cvs co -p some-relative-pathname.

git show HEAD:git-root-relative-pathname == cvs co -p ..., where it is a bit more work in CVS/RCS to get such a pathname relative to a project root.
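A minimal sketch of how this fits the UTF-16 use case described earlier. The function name git_cat_utf16 and the file name config.xml are made up for illustration, and iconv is assumed to be installed.

```shell
# Hypothetical helper: print the committed (HEAD) version of a UTF-16 file
# as UTF-8 on stdout, without creating any temporary file.
# "$1" is a pathname relative to the current working directory.
git_cat_utf16() {
    git show "HEAD:./$1" | iconv -f UTF-16 -t UTF-8
}

# Example use inside a work tree (config.xml is a placeholder name):
#   git_cat_utf16 config.xml | grep -n 'setting'
```

Because the content only ever travels through the pipe, nothing is left behind to clean up, and no write permission in the work tree is needed.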

I will post this question and immediately answer it myself.

I think it is worth posting this if only to try to make it easier for somebody else to find this answer quickly, and to avoid some wild goose chases. I did not find it quickly, even though pieces of it are scattered around other Q&A threads.

What do you call this operation?

"Extract"?

My first thought was to ask for "how to extract the contents of a single file from git". But googling that finds many posts and webpages that "checkout" or "pull".

"extract-file-markdown" looked promising, but clicking through https://gist.github.com/ssp/1663093#file-git-extract-file-markdown the title is "How to extract a single file with its history from a git repository". Whereas what I am trying to do is "extract a single file WITHOUT its history." Well, at least the author of that tool made that clear after only one level of onion peeling click.

The term "extract" leads to confusion.

Hence the title that I'm providing to this question: "send contents of a single file to stdout". With "like rcs/cvs -p" to further reduce ambiguity.

Yada yada yada ... in a current working tree, or from a remote repository ... when you just have a file pathname relative to the current working directory, not necessarily the git repo root ... yada yada yada....

Probably the closest existing answer used the word "retrieve" a single file, which neither I nor Google thought was equivalent. I only found that answer a day or so later, after I had written almost all of this.


Solution

  • WIP: cleaning up -1 answer since no better available

    More than a year later: Nobody seems to be volunteering to provide a better answer to this, so I will incrementally clean it up. Mostly, incorporating @torek's comments, trying to make them understandable, and provide a quick compact answer that doesn't require looking at a lot of manpages and putting the pieces together.

    (I am not disputing the -1 score; I was just hoping that somebody could provide a clean answer. By "clean" I do not mean that the solution suggested by @torek is incorrect, or even inelegant in matters like error codes, but that the suggestions from the comments are actually formulated as a compact answer.)

    git cat-file -p <rev>:<cwd-or-root-relative-pathname> is probably the answer

    Simplest case: CWD relative, as checked out

    The closest thing to cvs co -p <cwd-relative-pathname>

    • seems to be git cat-file -p HEAD:./<cwd-relative-pathname>
    • with close contender git show HEAD:./<cwd-relative-pathname>

    differing in some ways.
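One visible difference shows up when the path names a directory rather than a file. The sketch below builds a throwaway repository (the names sub and f.txt are made up) and runs both commands on a file and on a directory.

```shell
# Scratch repository, made up for this demonstration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir sub
echo "hello" > sub/f.txt
git add sub/f.txt
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "add sub/f.txt"

# On a file (blob) both commands print the same bytes.
show_file=$(git show HEAD:sub/f.txt)
cat_file=$(git cat-file -p HEAD:sub/f.txt)

# On a directory (tree) they differ: git show prints a "tree ..." header
# followed by bare names, while cat-file -p prints mode, type, object id
# and name for each entry.
show_dir=$(git show HEAD:sub)
cat_dir=$(git cat-file -p HEAD:sub)
printf '%s\n---\n%s\n' "$show_dir" "$cat_dir"
```

Another difference worth keeping in mind: git show is a porcelain command and may start a pager when stdout is a terminal, whereas git cat-file is plumbing and does not.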

    Specifying Revision

    Both CVS and GIT can specify or request a revision in many different ways.

    CVS examples: a numeric version number like -r 1.2.3.4, or more human friendly -r named-tag, with special tags like HEAD and BASE. -D date, etc.

    GIT has many more different ways of specifying or requesting revisions. Ranging from relatively user-friendly stuff like HEAD and HEAD^, through SHA-1 160-bit 40-hex-digit hashcodes like d08c8b53975f513ad3b320bf55273b852f25247c that you can abbreviate with the 1st few hex digits, with many places in between.

    CVS defaults to the current revision if you say cvs co -p some-relative-pathname

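    GIT does not default this way, so explicitly providing a git revision like HEAD: or any of the umpteen other possible formats is usually required. It is rather pleasant that GIT allows revspecs like HEAD to be joined to the pathname with a colon. If you leave out something like HEAD:, git show may print nothing at all, with no error message: when README is not a revision name but is a file in the work tree, git show README is read as "show the HEAD commit, limited to changes to README", which is empty whenever the latest commit did not touch that file. git cat-file -p README at least fails with an error.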

    Summing up:

    The closest thing to cvs co -r CVS_REV -p <cwd-relative-pathname>

    • seems to be git cat-file -p GIT_REVSPEC:./<cwd-relative-pathname>
    • similarly for git show...
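A small sketch of the revision-plus-path form. The helper name git_cat_rev, the file name notes.txt and the tag v1.0 are placeholders, not anything from a real project.

```shell
# Hypothetical helper: print <path> (cwd-relative) as of <rev> on stdout.
# <rev> can be anything gitrevisions(7) accepts: HEAD, HEAD~1, a tag,
# a branch name, or an (abbreviated) commit hash.
git_cat_rev() {
    rev=$1
    path=$2
    git cat-file -p "$rev:./$path"
}

# Examples (notes.txt and v1.0 are placeholder names):
#   git_cat_rev HEAD   notes.txt    # as last committed
#   git_cat_rev HEAD~1 notes.txt    # one commit earlier
#   git_cat_rev v1.0   notes.txt    # as of tag v1.0
```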

    With variations like cvs co -D date... <--> ...

    CWD vs repo ROOT relative pathnames

    In CVS, cvs co -p a/b/c defaults to interpreting the path relative to the current working directory CWD. If you wanted to be relative to the "root" of the repository, you have to do some work.

    In git, git cat-file -p HEAD:a/b/c is interpreted not relative to CWD, but relative to the root of the repository.

    In git, to specify relative to the current working directory, you need to say ./, e.g. git cat-file -p HEAD:./a/b/c. For that matter ../ also works. This is a fairly common convention nowadays, although it will break any scripts that assume that prepending ./ to any pathname that does not begin with / is effectively a NOP.
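The difference is easy to see in a throwaway repository that has a file named c both at the top level and in a/b. Everything below (directory layout, file contents) is made up for the demonstration.

```shell
# Scratch repository, made up for this demonstration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
mkdir -p a/b
echo "top-level c" > c
echo "nested c"    > a/b/c
git add c a/b/c
git -c user.name=demo -c user.email=demo@example.invalid commit -q -m "two files named c"

cd a/b || exit 1
root_relative=$(git cat-file -p HEAD:c)          # resolved from the repo root
cwd_relative=$(git cat-file -p HEAD:./c)         # resolved from a/b
parent_relative=$(git cat-file -p HEAD:../b/c)   # ../ is also resolved from the cwd

echo "HEAD:c      -> $root_relative"     # top-level c
echo "HEAD:./c    -> $cwd_relative"      # nested c
echo "HEAD:../b/c -> $parent_relative"   # nested c
```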

    You can get the GIT root relative pathname via something like git ls-files --full-name <cwd-relative-pathname> (without --full-name, git ls-files prints paths relative to the current directory). In CVS ... well, in CVS/RCS it is a bit more work to get such a pathname relative to a project root. RCS does not really have the concept of a "project root" and depends on other tools wrapped around basic RCS. This can happen even with CVS, or at least would be happening if people were still using CVS.
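A sketch of that conversion, assuming the file is tracked. The helper name git_root_relative is invented here; --full-name is the git ls-files option that forces paths to be printed relative to the top of the repository instead of the current directory.

```shell
# Hypothetical helper: print the repo-root-relative form of a tracked,
# cwd-relative pathname. Prints nothing if the path is not tracked.
git_root_relative() {
    git ls-files --full-name -- "$1"
}

# Example, run from a subdirectory (c is a placeholder name):
#   git cat-file -p "HEAD:$(git_root_relative c)"
```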

    Therefore:

    The closest thing to cvs co -p <cwd-relative-pathname>

    • seems to be git cat-file -p HEAD:./<cwd-relative-pathname>
    • similarly for git show...

    The closest thing to cvs co -p <root-relative-pathname>

    • seems to be git cat-file -p HEAD:<root-relative-pathname>
    • similarly for git show...

    TBD: I may provide examples of some confusing error messages, or, worse, examples where you get no error message or status code but you are still doing the wrong thing.

    TBD: Example if you accidentally provide an absolute path to GIT.

    That's All

    That's all I was really looking for. Modulo obvious extensions to use remote repositories, fix up the git show formatting, etc. However, rather than ending the answer here, I hope that it might help somebody else to see some of the dead ends that I went through - things that I need to be careful of in shell scripts and Makefiles using this. Some shell session extracts, edited for clarity.

    Old stuff kept around, to be cleaned up

    I don't want to completely eliminate the stuff below yet, because some of the gotchas are still relevant. But I do want to rework it.

    git show silent failures and other surprises

    git show NO-SUCH-FILE => silent error

    cwd1> git show README --> No output, not even an error code

    That's disappointing, but I know enough about git revisions to try HEAD:README

    cwd1> git show HEAD:README
    fatal: path 'some-dir/README' exists, but not 'README'
    hint: Did you mean 'HEAD:some-dir/README' aka 'HEAD:./README'?

    That's a useful error message, and even provides an error code :-)

    cwd1> git show HEAD:./README ...
    ...
    --> yay! I get the output

    That's not too bad. It is annoying not to get an error code if the object is not found; that will probably lead to bugs in scripts that use this, but a check can be added.
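One way such a check could be added is a small wrapper that refuses to print anything unless the revision and path really name a file. This is only a sketch: the name git_cat_strict and the exit codes 1 and 2 are choices made here, not git conventions.

```shell
# Hypothetical wrapper: print a file from a revision, or fail loudly.
# Usage: git_cat_strict <rev> <cwd-relative-path>
git_cat_strict() {
    spec="$1:./$2"
    # cat-file -t prints the object type and fails if the object is missing.
    type=$(git cat-file -t "$spec" 2>/dev/null) || {
        echo "git_cat_strict: no such object: $spec" >&2
        return 1
    }
    if [ "$type" != blob ]; then
        echo "git_cat_strict: $spec is a $type, not a file" >&2
        return 2
    fi
    git cat-file blob "$spec"
}

# Example (README is a placeholder name):
#   git_cat_strict HEAD README | wc -l
```

Always passing an explicit revision also sidesteps the silent git show README case, since the argument can no longer be mistaken for a path filter.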

    git show SYMLINKs => more silent confusion

    ... In a different directory cwd2 where the README was a symlink.

    First, the usual:

    $ cwd2> git show README --> no output, no error code :-(
    $ cwd2> git show ./README --> no output, no error code :-(

    $ cwd2> git show HEAD:./README
    some-dir/README
    --> Huh?
    Oh, it's a symlink: ./README -> some-dir/README. Git show is outputting the target of the link. Which can be easily confused with the contents of a text file. Q: is there an option to reduce such confusion?

    (BTW, I did this on cygwin, whose symlink handling is interesting. It might be different on Linux. I haven't checked. Either case would be unfortunate.)
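Git records a symlink as a blob whose content is the link target, with file mode 120000, so the output above should not be specific to cygwin. A script can check the mode before trusting the content. The helper name git_is_symlink below is invented for this sketch.

```shell
# Hypothetical helper: succeed if <path> (cwd-relative) is a symlink in <rev>.
# Usage: git_is_symlink <rev> <path>
git_is_symlink() {
    # ls-tree prints "<mode> <type> <object>TAB<path>";
    # mode 120000 marks a symlink.
    mode=$(git ls-tree "$1" -- "$2" | cut -d' ' -f1)
    [ "$mode" = 120000 ]
}

# Example (README is a placeholder name):
#   if git_is_symlink HEAD README; then
#       echo "README is a symlink to: $(git cat-file -p HEAD:./README)" >&2
#   fi
```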

    git show ROOT-and-LOCAL => more silent confusion

    $ cwd2> cd some-dir
    $ some-dir> git show README --> no output, no error code
    $ some-dir> git show ./README --> no output no error code

    $ some-dir> git show HEAD:README
    some-dir/README
    --> Huh?

    In this example cwd2 just happened to be the root of the repository.

    But even if cwd2 is not the repo-root, it can happen that there might be a repo-root/README that might also be a symlink. I don't know if it would be more confusing to point to some-dir/README or some-other-dir/README.

    $ some-dir> git show HEAD:./README
    ... --> Yay! The actual contents!