python gitlab dockerfile gitlab-api

How can I extract the contents of a file stored in a GitLab repository?


Using the python-gitlab package, I'd like to extract lines from all Dockerfiles. With the code below I can get the project names and the URL of each repo, but how can I check that a project has a Dockerfile and read its contents?

import gitlab
import json
from pprint import pprint
import requests
import urllib.request


# private token authentication
gl = gitlab.Gitlab('<path_to_gitlab_repo>', private_token='<token_here>')

gl.auth()

# list all projects
projects = gl.projects.list()
for project in projects:
    # print(project) # prints all the meta data for the project
    print("Project: ", project.name)
    print("Gitlab URL: ", project.http_url_to_repo)
    # print("Branches: ", project.repo_branches)
    pprint(project.repository_tree(all=True))
    f = urllib.request.urlopen(project.http_url_to_repo)
    myfile = f.read()
    print(myfile)
    print("\n\n")

The output I get now is:

Gitlab URL:  <path_to_gitlab_repo>
[{'id': '0c4a64925f5c129d33557',
  'mode': '1044',
  'name': 'README.md',
  'path': 'README.md',
  'type': 'blob'}]

Solution

  • You can use the project.files.get() method (see documentation) to get the Dockerfile of the project.

    You can then print the content of the Dockerfile, or process it however you like:

    import gitlab
    import base64
    
    
    # private token authentication
    gl = gitlab.Gitlab('<gitlab-url>', private_token='<private-token>')
    
    gl.auth()
    
    # list all projects
    projects = gl.projects.list(all=True)
    for project in projects:
        # print(project) # prints all the meta data for the project
        # print("Project: ", project.name)
        # print("Gitlab URL: ", project.http_url_to_repo)
    
        # Skip projects without branches (fetch the branch list only once)
        branches = project.branches.list()
        if not branches:
            continue
    
        # Use the first branch returned by the API
        branch = branches[0].name
    
        try:
            f = project.files.get(file_path='Dockerfile', ref=branch)
        except gitlab.exceptions.GitlabGetError:
            # Skip projects without Dockerfile
            continue
    
        file_content = base64.b64decode(f.content).decode("utf-8")
        print(file_content.replace('\\n', '\n'))
    
    

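    This uses the first branch returned by the API, so you might have to adjust the branch name if a project has multiple branches (for example, by using project.default_branch when it is set).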
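
As a rough sketch of the two steps asked about in the question (checking that a Dockerfile exists, then reading its lines), the helpers below work on data shaped like the API responses shown above. The function names `has_dockerfile` and `dockerfile_lines` and the sample values are illustrative assumptions and not part of python-gitlab; in a real script the inputs would come from `project.repository_tree()` and `f.content`.

```python
import base64


def has_dockerfile(tree):
    """Return True if a repository tree listing contains a top-level Dockerfile."""
    return any(
        entry.get("type") == "blob" and entry.get("path") == "Dockerfile"
        for entry in tree
    )


def dockerfile_lines(encoded_content):
    """Decode base64-encoded file content into a list of text lines."""
    text = base64.b64decode(encoded_content).decode("utf-8")
    return text.splitlines()


# Hypothetical sample data, shaped like the repository_tree() output in the question.
sample_tree = [
    {"name": "README.md", "path": "README.md", "type": "blob"},
]
# Hypothetical Dockerfile content, base64-encoded the way the files API returns it.
sample_content = base64.b64encode(
    b"FROM python:3.11\nRUN pip install python-gitlab\n"
)

print(has_dockerfile(sample_tree))
for line in dockerfile_lines(sample_content):
    print(line)
```

Checking the tree first avoids relying on an exception for control flow, but it only sees the top level of the repository unless the tree is requested recursively, so the try/except approach in the answer may still be the simpler choice.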