How to ignore certain files when branching / checking out?


I'd like to compare a few files from the Bazaar branch lp:ubuntu/nvidia-graphics-drivers. I'm mainly interested in the debian subdirectory inside that branch, but because of the binary blob in http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files, it takes ages to get just the text files. I've already downloaded 555 MB and the transfer is still running.

Is it possible to retrieve a bazaar branch, including or excluding certain files by one of the following properties:

  • file size
  • file extension
  • file name (include only debian/ for example)

I do not need to push back any changes, nor do I need to view the history of a file. I just want to compare two files in the debian/ directory, files with the .in extension and files without.


Solution

  • I ended up doing some dirty grepping on the HTTP response, since bzr info "$branch" and bzr ls -d "$branch" "$directory" did not give me enough information.

    The Bash script below relies on the HTML produced by Loggerhead, Launchpad's web front end. It recursively downloads from a given URL and currently skips *.run files. Save it as bzrdl in a directory on your $PATH and run it with bzrdl http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/debian/. All files are saved in the current directory, so make sure it is empty to avoid conflicts.

    #!/bin/bash
    max_retries=5
    rooturl="$1"
    if ! [[ $rooturl =~ /$ ]]; then
        echo "Usage: ${0##*/} URL"
        echo "URL must end with a slash. Example URL:"
        echo "http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-graphics-drivers/oneiric/files/head:/"
        exit 1
    fi
    tmpdir="$(mktemp -d)"
    target="$(pwd)"
    # used for holding HTTP response before extracting data
    tmp="$(mktemp)"
    # url_filter reads download URLs from stdin (piped)
    url_filter() {
        grep -v '\.run$'
    }
    get_files_from_dir() {
        local slash=/
        local dir="$1"
        # to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
        local storedir="${dir//$slash/.d${slash}}"
        mkdir -p "$tmpdir/$storedir" "$target/$dir"
        local i subdir
        for ((i=0; i<$max_retries; i++ )); do
            if wget -O "$tmp" "$rooturl$dir"; then
                # store file list
                grep -F -B 1 '<img src="/static/images/ico_file_download.gif" alt="Download File" />' "$tmp" |\
                    grep '^<a' | cut -d '"' -f 2 | url_filter \
                    > "$tmpdir/$storedir/files"
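                # read the subdirectory names into a local array first, so the
                # recursive calls below can overwrite "$tmp" safely and the
                # global IFS is never modified
                local -a subdirs=()
                mapfile -t subdirs < <(grep -F -B 1 '<img src="/static/images/ico_folder.gif" ' "$tmp" | \
                    grep -F '<a ' | rev | cut -d / -f 2 | rev)
                for subdir in "${subdirs[@]}"; do
                    get_files_from_dir "$dir$subdir/"
                done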
                return
            fi
        done
        echo "Failed to download directory listing of: $dir" >> "$tmpdir/errors"
    }
    download_files() {
        local slash=/ 
        local dir="$1"
        # to avoid name collision: a/b/c/ -> a.d/b.d/c.d/
        local storedir="${dir//$slash/.d${slash}}"
        local done=false
        # keep the loop counter local so recursive calls cannot clobber it
        local i subdir
        cd "$tmpdir/$storedir"
        for ((i=0; i<$max_retries; i++)); do
            if wget -B "$rooturl$dir" -nc -i files -P "$target/$dir"; then
                done=true
                break
            fi
        done  
        $done || echo "Failed to download all files from $dir" >> "$tmpdir/errors"
        for subdir in *.d; do 
            download_files "$dir${subdir%%.d}/"
        done
    }
    get_files_from_dir ''
    # make *.d expand to nothing if no directories are found
    shopt -s nullglob
    download_files ''
    echo "TMP dir: $tmpdir"
    # count the lines of the error log; report 0 if no error was logged
    echo "Errors : $(if [[ -f "$tmpdir/errors" ]]; then wc -l < "$tmpdir/errors"; else echo 0; fi)"
    

    The temporary directory and the temporary file are not removed afterwards; that must be done manually. Any errors (failed downloads) are written to $tmpdir/errors.
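
    If manual cleanup is a nuisance, a trap could remove the scratch files when the script exits. The following is a minimal sketch and not part of the original script: the helper name keep_errors_and_cleanup and the log name bzrdl-errors.log are assumptions chosen for illustration. The error log is copied to the target directory first because it lives inside $tmpdir and would otherwise be deleted with it.

```shell
# Same scratch locations as in the script above
tmpdir="$(mktemp -d)"
tmp="$(mktemp)"
target="$(pwd)"

# Hypothetical helper: keep the error log, then delete the scratch files
keep_errors_and_cleanup() {
    # copy the log only if it exists and is non-empty
    if [ -s "$tmpdir/errors" ]; then
        cp "$tmpdir/errors" "$target/bzrdl-errors.log"
    fi
    rm -rf -- "$tmpdir" "$tmp"
}

# run the helper whenever the script exits
trap keep_errors_and_cleanup EXIT
```

    With this in place, the final "TMP dir" message would point at a directory that no longer exists after the script ends, so it could be dropped or replaced by a pointer to the copied log.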

    It's confirmed to work with:

    bzrdl http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/oneiric/nvidia-settings/oneiric/files/head:/debian/
    

    Feel free to correct any mistakes or add improvements.
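
    One possible improvement is to adapt url_filter to the criteria from the question. The sketch below shows two hypothetical variants; the function names and the extension list are assumptions, not something the original script defines. The first skips several binary extensions, the second keeps only *.in files. Both read download URLs from stdin, one per line, exactly as url_filter does.

```shell
# Hypothetical variant 1: skip several large binary types
# (the extension list is an assumption; adjust it to the branch at hand)
url_filter_skip_binaries() {
    grep -v -E '\.(run|bin|tar\.gz)$'
}

# Hypothetical variant 2: keep only template files ending in .in
url_filter_only_in() {
    grep -E '\.in$'
}
```

    To use one of them, rename it to url_filter in the script. Filtering by file size is not possible this way, because the filter only sees the URLs and not the size of the files behind them.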