Search code examples
postgresqlgitlabgitlab-omnibus

Accessing gitlab postgres omnibus database


I'm trying to access my gitlab omnibus's postgres installation from other apps so that I can share data within. How do I find the login information, eg user/pass?


Solution

  • There should be no password.

    If you have sudo access on the machine where you installed GitLab Omnibus, then you can confirm this with:

    sudo grep gitlab-psql /etc/shadow
    

    and it should show '!' in the password field, something like:

    gitlab-psql:!!:16960::::::
    

    Faced with a similar goal (accessing GitLab's DB in order to derive some usage plots, counts of issues opened/closed over time, etc.), here is what I did (assuming sudo ability):

    sudo su -l gitlab-psql
    mkdir -p ~/.ssh
    chmod 0700 ~/.ssh
    cat >> ~/.ssh/authorized_keys << "EOF"
    <your ssh public key here>
    EOF
    
    chmod 0600 ~/.ssh/authorized_keys
    

    Once this is done, first check that you can ssh to that host as gitlab-psql, using the proper key, of course, either from a remote host: ssh gitlab-psql@my-gitlab-host, or locally: ssh gitlab-psql@localhost.

    After that, you should be able to access the DB from other apps via ssh. For example, here is a way to query the DB directly from a Python notebook (running on another host somewhere in EC2), and using Pandas:

    def gitlab_query(query):
        cmdargs = [
            'ssh', 'gitlab-psql@my-gitlab-host',
            f"""/opt/gitlab/embedded/bin/psql -h /var/opt/gitlab/postgresql/ gitlabhq_production -A -F $'\t' -c "{query}" """,
        ]
        proc = subprocess.Popen(cmdargs, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        try:
            outs, errs = proc.communicate(timeout=15)
        except subprocess.TimeoutExpired:
            proc.kill()
            outs, errs = proc.communicate()
        errors = errs.decode('utf-8')
        if errors:
            raise ValueError(errors)
        result = outs.decode('utf-8')
        result = result[:result.rfind('\n', 0, -1)]
        return result
    
    
    # simple example
    # NOTE: as is, this is incomplete, because many issues are closed by other
    # actions (e.g. commits or merges) and in those cases, there is no
    # closed_at date. See further below for better queries. (not included in
    # this SO answer as this is getting beyond the scope of the question).
    
    q = """
    select
      b.name, a.title, a.created_at, a.closed_at
    from issues a inner join projects b on (a.project_id = b.id)
    where closed_at > '2018-01-09' and b.name='myproject'
    order by 1,4 limit 10
    """
    
    pd.read_csv(io.StringIO(gitlab_query(q)), sep='\t', parse_dates=['created_at', 'closed_at'])