ctags, exuberant-ctags, tagbar

Can I add scope information to tags generated with `--regex-<LANG>` in exuberant ctags?


Technically, I'm using Tagbar in Vim to view a file's tags, but this question should apply generally to Exuberant Ctags v5.8.

Suppose I've got the following python file, call it foo.py:

class foo:
    def bar(baz):
        print(baz)

Let's run ctags on it: ctags foo.py. The resulting tags file looks like this:

!_ some ctags version / formatting stuff not worth pasting
bar foo.py  /^    def bar(baz):$/;" m   class:foo
foo foo.py  /^class foo:$/;"    c

The bit I'm interested in is the last field of the second line, class:foo. That's the scope of the bar() function. If I use tagbar in vim, it nests the function in the class accordingly.
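To make the layout of that line concrete, here is a minimal Python sketch that splits one tags-file line into its parts. The helper name `parse_tag_line` is made up for illustration, and the sketch assumes tab-separated fields and that `;"` does not occur inside the search pattern itself.

```python
def parse_tag_line(line):
    """Split one tags-file line into name, file, pattern and extension fields.

    Simplifying assumption: the first ;" in the line ends the search pattern.
    """
    head, _, tail = line.rstrip("\n").partition(';"')
    name, path, pattern = head.split("\t", 2)
    fields = {}
    for field in tail.split("\t"):
        if not field:
            continue
        key, sep, value = field.partition(":")
        if sep:
            # key:value extension field, e.g. class:foo
            fields[key] = value
        else:
            # A bare field with no key is the tag's kind letter
            fields["kind"] = field
    return {"name": name, "file": path, "pattern": pattern, "fields": fields}


line = 'bar\tfoo.py\t/^    def bar(baz):$/;"\tm\tclass:foo'
tag = parse_tag_line(line)
print(tag["fields"])
```

The `class` key in the resulting dictionary is the scope information that Tagbar uses for nesting.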

Now suppose I'm adding support for a new language in my ~/.ctags. In fact, I'm adding support for this puppet file:

class foo {
    include bar
}

Suppose I use the following ~/.ctags arguments. The 'include' regex is ugly (errr... ugly for regex) but it gets the job done well enough for this example:

--langdef=puppet
--langmap=puppet:.pp
--regex-puppet=/^class[ \t]*([:a-zA-Z0-9_\-]+)[ \t]*/\1/c,class,classes/
--regex-puppet=/^\ \ \ \ include[ \t]*([:a-zA-Z0-9_\-]+)/\1/i,include,includes/

That generates the following tags in my tags file:

bar foo.pp  /^    include bar$/;"   i
foo foo.pp  /^class foo {$/;"   c

Notice that neither line contains scoping information. My question is this: is there any way for me to construct the --regex-puppet argument, or --regex-<LANG> lines generally, to collect information about a tag's scope? Perhaps to declare that tags meeting criterion A are always going to be scope-parents of tags meeting criterion B?

man ctags suggests no clear way to add arbitrary scope information, but I might be overlooking another solution (snipped slightly for emphasis):

--regex-<LANG>=/regexp/replacement/[kind-spec/][flags]

        Unless modified by flags, regexp is interpreted as a Posix extended regular expression. The replacement should expand for all matching lines to a non-empty string of
        characters, or a warning message will be reported. An optional kind specifier for tags matching regexp may follow replacement, which will determine what kind of tag is
        reported in the "kind" extension field (see TAG FILE FORMAT, below). The full form of kind-spec is in the form of a single letter, a comma, a name (without spaces), a
        comma, a description, followed by a separator, which specify the short and long forms of the kind value and its textual description (displayed using --list-kinds). Either
        the kind name and/or the description may be omitted. If kind-spec is omitted, it defaults to "r,regex". Finally, flags are one or more single-letter characters having the
        following effect upon the interpretation of regexp:

           b   The pattern is interpreted as a Posix basic regular expression.

           e   The pattern is interpreted as a Posix extended regular expression (default).

           i   The regular expression is to be applied in a case-insensitive manner.

Solution

  • No, unfortunately that is not possible with the regex pattern support in ctags. The only way to get ctags to generate correct scopes is to write a parser as an additional module in C. I would like to add better handling of new languages to ctags if I find the time, but so far that hasn't worked out, and I'm also still unsure about the best approach.

    If you're mostly interested in Tagbar support there is another approach, though: Tagbar supports arbitrary tag-generating programs as long as their output is compatible with that of ctags, so you could write a simple parser in, say, Python and configure Tagbar to use that. Have a look at :h tagbar-extend (especially the last subsection "Writing your own tag-generating program") if that would be an option for you.
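As a rough illustration of that second approach, here is a minimal sketch of such a tag generator for the puppet example from the question. It is not taken from Tagbar or ctags: the function names, the two regexes (loosely mirroring the ones in the question), the naive rule that a line starting with `}` closes the current class, and the choice to emit a `line:` field are all assumptions made for this sketch.

```python
import re
import sys

# Hypothetical patterns, loosely mirroring the --regex-puppet lines in the question
CLASS_RE = re.compile(r"^class[ \t]+([:a-zA-Z0-9_-]+)")
INCLUDE_RE = re.compile(r"^[ \t]+include[ \t]+([:a-zA-Z0-9_-]+)")


def escape_pattern(text):
    """Escape backslashes and slashes so the line can sit inside /^...$/."""
    return text.replace("\\", "\\\\").replace("/", "\\/")


def generate_tags(lines, filename):
    """Return sorted ctags-style lines, adding class:<name> scope to includes."""
    tags = []
    current_class = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        address = "/^{}$/".format(escape_pattern(line))
        match = CLASS_RE.match(line)
        if match:
            current_class = match.group(1)
            fields = ["c", "line:{}".format(lineno)]
            tags.append("\t".join([current_class, filename, address + ';"'] + fields))
            continue
        match = INCLUDE_RE.match(line)
        if match:
            fields = ["i", "line:{}".format(lineno)]
            if current_class:
                # This is the scope field that the regex approach cannot produce
                fields.append("class:" + current_class)
            tags.append("\t".join([match.group(1), filename, address + ';"'] + fields))
            continue
        if line.startswith("}"):
            # Naive assumption: a closing brace in column 0 ends the class
            current_class = None
    return sorted(tags)


sample = ["class foo {\n", "    include bar\n", "}\n"]
sample_tags = generate_tags(sample, "foo.pp")

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python puppet_tags.py foo.pp  (the script name is hypothetical)
    with open(sys.argv[1]) as handle:
        print("\n".join(generate_tags(handle.readlines(), sys.argv[1])))
```

For the sample file, the include tag now ends in `class:foo`, matching what ctags produces for the Python example. Wiring the script into Tagbar still requires a type definition for puppet, as described in :h tagbar-extend.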