I have a chef recipe which clones a specific branch of a git repository that contains two .tgz files and an .sql file. The file names in the repo follow a convention, but are timestamped, which means there's no way to be sure of their exact names with each run. After cloning the repository, I'd like chef to extract both of the .tgz files.
I've gotten everything to work up until the part where chef needs to extract the .tgz files. The client run always errors out with the tgz filenames as nil. I believe the problem is that because of the way chef works, it may not be possible for chef to "discover" a file name that's been added to a directory during its run phase.
During my testing I found that if I clone the git repository before the chef run so that its contents are stored inside of the recipe's files/ directory, those files are included in chef's cache and are extracted as expected. I believe this works because the .tgz files are known to chef at this point; they aren't being made available during the run. This is a solution I can consider as a last resort, but it's not ideal as I'd like to do as little work on the end user's local machine as possible.
I'd like to know if my understanding is correct and if there's a way to achieve what I've outlined. Here's my code:
# Clone the repository
execute "Cloning the #{backup_version} from the #{backup_repository_url} repository" do
  command "su #{user} -c 'git clone --single-branch --branch #{backup_version} #{backup_repository_url} #{backup_holding_area}'"
  cwd web_root
end
# I need all three files eventually, so find their paths in the directory
# they were cloned to and store them in a hash
backup_files = Hash.new
["code", "media", "db"].each do |type|
  backup_files[type.to_sym] = Dir["#{backup_holding_area}/*"].find{ |file| file.include?(type) }
end
# I need to use all three files eventually, but only code and media are .tgz files
# This nil check is where chef fails
unless backup_files[:code].nil? || backup_files[:media].nil? || backup_files[:db].nil?
  backup_files.slice(:code, :media).each do |key, file|
    archive_file "Restore the backup from #{file}" do
      path file
      destination web_root
      owner user
      group group
      overwrite :auto
      only_if { ::File.exist?(file) }
    end
  end
end
A chef-client run goes through several phases. The two that matter here are "compile" and "converge": the compile phase always comes first, followed by converge.
For example, the variable assignment below runs during the compile phase:
backup_files = Hash.new
Whereas an execute block like the one below runs during converge:
execute "Cloning the #{backup_version} from the #{backup_repository_url} repository" do
  command "su #{user} -c 'git clone --single-branch --branch #{backup_version} #{backup_repository_url} #{backup_holding_area}'"
  cwd web_root
end
Because all of the variable assignments sit outside the resource blocks, they are evaluated during compile, long before convergence, i.e. before the repository has been cloned and before the files exist in the destination directory. That is why the lookups return nil instead of the file names we expect.
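To make the ordering concrete, here is a small plain-Ruby sketch that imitates the two phases. It is only a toy model and not how Chef is implemented: ToyRun, demo_two_phase and the file name code-20200101.tgz are made up for illustration. Resource blocks are stored during "compile" and only executed during "converge", while a bare Dir lookup runs immediately.

```ruby
require 'tmpdir'
require 'fileutils'

# Toy model of a two-phase run. This is NOT Chef's actual implementation.
class ToyRun
  def initialize
    @resources = []
  end

  # "Compile": remember the block, but do not run it yet.
  def resource(name, &action)
    @resources << [name, action]
  end

  # "Converge": run the remembered blocks in the order they were declared.
  def converge
    @resources.each { |_name, action| action.call }
  end
end

def demo_two_phase
  Dir.mktmpdir do |holding_area|
    run = ToyRun.new
    found_in_block = nil

    # Stands in for the git clone: the file only appears at converge time.
    run.resource('clone') do
      FileUtils.touch(File.join(holding_area, 'code-20200101.tgz'))
    end

    # Plain Ruby outside any resource block runs right away, during "compile".
    found_at_compile = Dir["#{holding_area}/*"].find { |f| f.include?('code') }

    # The same lookup inside a deferred block runs after the clone.
    run.resource('lookup') do
      found_in_block = Dir["#{holding_area}/*"].find { |f| f.include?('code') }
    end

    run.converge
    [found_at_compile, found_in_block && File.basename(found_in_block)]
  end
end

p demo_two_phase # prints [nil, "code-20200101.tgz"]
```

The first element is nil for the same reason backup_files[:code] is nil in the question: the lookup happened before anything had been cloned.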
One way to make sure we get the file names is to do the lookup inside a Chef resource, so that it runs during converge, after the clone. One such resource is the ruby_block resource.
Using it, the recipe could look like this:
# use execute to clone or use the git resource with properties as required
git backup_holding_area do
  repository backup_repository_url
  revision backup_version
  action :checkout
end
# Iterating over the files in the directory is still OK as there are only 3 files
ruby_block 'get and extract code and media tar files' do
  block do
    Dir.entries(backup_holding_area).each do |file|
      # the archives in the question end in .tgz; .tar.gz is accepted as well
      next unless file.end_with?('.tgz', '.tar.gz')
      # appropriate flags can be used for the "tar" command as per requirement
      system('tar', 'xzf', ::File.join(backup_holding_area, file), '-C', web_root)
    end
  end
end
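If you still want the code/media/db hash from the question, the lookup itself can be moved into a small helper and called at converge time, for example from inside the ruby_block's block. The sketch below is plain Ruby so it can be tried outside of Chef; find_backup_files, archives_to_extract and the timestamped file names are hypothetical, and it assumes each type matches exactly one file in the holding area.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: maps each backup type to the first matching path,
# or nil when no file name contains that type.
# Call it at converge time, e.g. inside a ruby_block's block.
def find_backup_files(holding_area, types = %w[code media db])
  entries = Dir.children(holding_area).sort
  types.each_with_object({}) do |type, found|
    name = entries.find { |entry| entry.include?(type) }
    found[type.to_sym] = name && File.join(holding_area, name)
  end
end

# Only the .tgz archives need extracting; the .sql dump is left alone.
def archives_to_extract(backup_files)
  backup_files.values.compact.select { |path| path.end_with?('.tgz') }
end

# Stand-alone demonstration with made-up, timestamped file names.
Dir.mktmpdir do |dir|
  %w[code-20200101.tgz media-20200101.tgz db-20200101.sql].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  files = find_backup_files(dir)
  p archives_to_extract(files).map { |path| File.basename(path) }
  # prints ["code-20200101.tgz", "media-20200101.tgz"]
end
```

Inside a recipe, the returned hash could be kept in node.run_state so that later resources can read it, and Chef's lazy { } property evaluation is another way to defer a lookup until converge. Which of these fits best depends on how the rest of the recipe uses the paths.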