chef-infra archive-file

Is there a way for Chef to become aware of an archive file's contents during a run?


I have a chef recipe which clones a specific branch of a git repository that contains two .tgz files and an .sql file. The file names in the repo follow a convention, but are timestamped, which means there's no way to be sure of their exact names with each run. After cloning the repository, I'd like chef to extract both of the .tgz files.

I've gotten everything to work up until the part where chef needs to extract the .tgz files. The client run always errors out with the tgz filenames as nil. I believe the problem is that because of the way chef works, it may not be possible for chef to "discover" a file name that's been added to a directory during its run phase.

During my testing I found that if I clone the git repository before the chef run so that its contents are stored inside of the recipe's files/ directory, those files are included in chef's cache and are extracted as expected. I believe this works because the .tgz files are known to chef at this point; they aren't being made available during the run. This is a solution I can consider as a last resort, but it's not ideal as I'd like to do as little work on the end user's local machine as possible.

I'd like to know if my understanding is correct and if there's a way to achieve what I've outlined. Here's my code:

# Clone the repository
execute "Cloning the #{backup_version} from the #{backup_repository_url} repository" do
    command "su #{user} -c 'git clone --single-branch --branch #{backup_version} #{backup_repository_url} #{backup_holding_area}'"
    cwd web_root
end

# I need all three files eventually, so find their paths in the directory 
# they were cloned to and store them in a hash
backup_files = Hash.new
["code", "media", "db"].each do |type|
    backup_files[type.to_sym] = Dir["#{backup_holding_area}/*"].find{ |file| file.include?(type) }
end

# I need to use all three files eventually, but only code and media are .tgz files
# This nil check is where chef fails
unless backup_files[:code].nil? || backup_files[:media].nil? || backup_files[:db].nil?
    backup_files.slice(:code, :media).each do |key, file|
        archive_file "Restore the backup from #{file}" do
            path file
            destination web_root
            owner user
            group group
            overwrite :auto
            only_if { ::File.exist?(file) }
        end
    end
end

Solution

  • A chef-client run has several phases. The two that matter here are "compile" and "converge", and compile always comes first.

    • Compile phase: all Ruby code in the recipe that is not inside a Chef resource is evaluated, and the resources are only collected, not executed
    • Converge phase: the collected resources are executed in order, which is when the code inside them runs

    For example, the variable assignment below runs during the compile phase.

    backup_files = Hash.new
    

    An execute resource such as the one below, on the other hand, runs during converge:

    execute "Cloning the #{backup_version} from the #{backup_repository_url} repository" do
        command "su #{user} -c 'git clone --single-branch --branch #{backup_version} #{backup_repository_url} #{backup_holding_area}'"
        cwd web_root
    end
    

    Because all of the variable assignments sit outside resource blocks, they are evaluated during compile, well before convergence, i.e. before the repository has been cloned and while the holding directory is still empty. The Dir lookup therefore finds nothing, and the filenames come back as nil.

    One way to get the filenames is to do the lookup inside a Chef resource, so that it happens during converge. The ruby_block resource is meant for this.

    With it, the recipe can look like this:

    # Clone with the git resource (or keep using execute, as required)
    git backup_holding_area do
      repository backup_repository_url
      revision backup_version
      action :checkout
    end

    # Runs during converge, after the clone, so the files exist by now
    ruby_block 'get and extract code and media tar files' do
      block do
        Dir.children(backup_holding_area).each do |file|
          # The backups in the question are .tgz; also accept .tar.gz
          next unless file.end_with?('.tgz', '.tar.gz')

          # Adjust the tar flags as required; fail the run if tar fails
          archive = ::File.join(backup_holding_area, file)
          system('tar', 'xzf', archive, '-C', web_root) || raise("tar failed for #{archive}")
        end
      end
    end
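
To see the timing issue in isolation, here is a minimal plain-Ruby sketch that runs without Chef. It is only a toy model: the queue of lambdas stands in for Chef's resource collection, and `find_backup_files`, `simulate_run`, and the timestamped file names are hypothetical names made up for illustration. It performs the same directory lookup twice, once immediately (as bare recipe code would) and once deferred (as a ruby_block would).

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: map each backup type to the first file in +dir+
# whose name contains that type (nil when nothing matches).
def find_backup_files(dir, types = %w[code media db])
  names = Dir.children(dir).sort
  types.each_with_object({}) do |type, found|
    match = names.find { |name| name.include?(type) }
    found[type.to_sym] = match && File.join(dir, match)
  end
end

# Toy model of a run: compile queues blocks, converge calls them in order.
# Returns the lookup results taken at compile time and at converge time.
def simulate_run(holding_area)
  queue = []

  # Stands in for the clone resource: queued now, executed later.
  queue << lambda do
    %w[code-20240101.tgz media-20240101.tgz db-20240101.sql].each do |name|
      FileUtils.touch(File.join(holding_area, name))
    end
  end

  # Bare recipe code: evaluated immediately, before anything is cloned.
  at_compile = find_backup_files(holding_area)

  # Stands in for ruby_block: the same lookup, deferred until converge.
  at_converge = nil
  queue << -> { at_converge = find_backup_files(holding_area) }

  queue.each(&:call) # converge phase
  [at_compile, at_converge]
end

Dir.mktmpdir do |dir|
  at_compile, at_converge = simulate_run(dir)
  puts "compile:  #{at_compile.inspect}"
  puts "converge: #{at_converge.transform_values { |p| File.basename(p) }.inspect}"
end
```

The immediate lookup sees an empty directory, while the deferred one sees the files. In a real recipe, a helper like this would presumably be called from inside a ruby_block, or from a `lazy { }` block on a resource property, since both are evaluated during converge rather than compile.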