My question centers on the for-loop in the listDirs function, where I am launching async tasks. I am passing path by reference to std::async, which then invokes the listDir function in a separate thread.
I am aware that once the for-loop moves to the next iteration, the path variable, which is a const reference to a std::filesystem::path instance in the paths vector, goes out of scope. However, listDir's parameter is a reference, which should be bound to whatever path refers to.
My understanding is that even though path goes out of scope in the listDirs function, the actual std::filesystem::path instances in the paths vector persist for the entire duration of the listDirs function, and that passing with std::ref keeps listDir bound to those instances. But I'm not certain if this understanding is correct.
Can someone please clarify how this works? Specifically:
Does std::ref in std::async ensure that listDir gets a valid reference even when path goes out of scope in the listDirs function?
Is there any risk of a dangling reference in this scenario?
#include <algorithm>
#include <filesystem>
#include <functional>
#include <future>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>

using Iterator = std::filesystem::directory_iterator;

// The caller of this function is the thread runtime
std::vector<std::string> listDir(const std::filesystem::path& directory)
{
    std::vector<std::string> files;
    for (Iterator it(directory); it != Iterator(); ++it)
    {
        if (it->is_regular_file())
        {
            files.emplace_back(it->path().filename().string());
        }
    }
    // Returning a local variable by name allows named return value optimization (NRVO);
    // if the compiler does not apply it, the vector is moved rather than copied
    return files;
}

std::vector<std::string> listDirs(const std::vector<std::filesystem::path>& paths)
{
    std::vector<std::future<std::vector<std::string>>> futures; // listDir returns std::vector<std::string> type
    // iterate over all the directory paths
    for (const std::filesystem::path& path : paths)
    {
        // launch one task per directory; with the default launch policy the
        // implementation may run it on a new thread or defer it until get()
        futures.emplace_back(std::async(listDir, std::ref(path)));
    }
    std::vector<std::string> allFiles;
    for (std::future<std::vector<std::string>>& fut : futures)
    {
        std::vector<std::string> files = fut.get(); // initialized directly from the returned value, no copy
        std::move(files.begin(), files.end(), std::back_inserter(allFiles));
    }
    // Returning a local variable by name allows named return value optimization (NRVO);
    // if the compiler does not apply it, the vector is moved rather than copied
    return allFiles;
}

int main()
{
    std::filesystem::path currentPath("G:\\lesson4");
    std::vector<std::filesystem::path> paths;
    for (Iterator it(currentPath); it != Iterator(); ++it)
    {
        if (it->is_directory())
        {
            std::cout << it->path() << '\n';
            paths.emplace_back(it->path());
        }
    }
    for (const auto& fileName : listDirs(paths))
    {
        std::cout << fileName << std::endl;
    }
}
In your loop, the variable path is a reference. You can think of it as a little like a pointer, although it is not one.
for (const std::filesystem::path& path : paths)
{
    // start each task using std::async
    futures.emplace_back(std::async(listDir, std::ref(path)));
}
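At the first iteration of your loop, path refers to the first element of the vector paths. At the second iteration, it refers to the second element of the vector. And so on... What ends with each iteration is only the name path, not the element it refers to.
Because paths is neither modified nor destroyed while any reference into its elements is still in use (including the references held by the tasks behind futures), this is safe: listDirs calls get() on every future before it returns, so every task has finished by then. When you pass std::ref(path) to std::async, the resulting reference wrapper refers to the element that path is currently bound to, not to the loop variable itself.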
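To make that concrete, here is a minimal sketch showing that std::ref applied to a range-for loop variable wraps the vector element itself. The helpers collectRefs and refsPointIntoVector are hypothetical names made up for this illustration, and std::string stands in for std::filesystem::path to keep it short.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: creates one std::ref per element, the same way the
// loop in listDirs creates one per iteration.
std::vector<std::reference_wrapper<const std::string>>
collectRefs(const std::vector<std::string>& items)
{
    std::vector<std::reference_wrapper<const std::string>> refs;
    for (const std::string& item : items)
    {
        // `item` is only a name for the current element; the wrapper
        // stores a reference to that element, not to the loop variable.
        refs.push_back(std::ref(item));
    }
    return refs;
}

// Hypothetical check: true if every wrapper refers to the matching element.
bool refsPointIntoVector(const std::vector<std::string>& items)
{
    const auto refs = collectRefs(items);
    if (refs.size() != items.size())
    {
        return false;
    }
    for (std::size_t i = 0; i < items.size(); ++i)
    {
        if (&refs[i].get() != &items[i])
        {
            return false;
        }
    }
    return true;
}
```

The wrappers stay usable after the loop has finished, as long as the vector they refer into is still alive and has not reallocated.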
In fact, reference wrappers are typically implemented using a pointer under the hood, because that's the only practical way to pass around a reference as a copyable, assignable object.
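As a rough illustration of that idea, a stripped-down wrapper might look like the sketch below. SimpleRef is a hypothetical name and this is not how any particular standard library defines std::reference_wrapper; it only shows the pointer-based approach.

```cpp
#include <memory>

// Hypothetical, simplified stand-in for std::reference_wrapper.
template <typename T>
class SimpleRef
{
public:
    // Store the address of the referenced object.
    explicit SimpleRef(T& target) noexcept : ptr_(std::addressof(target)) {}

    // Refuse temporaries, which would dangle immediately.
    SimpleRef(T&&) = delete;

    // Access the referenced object.
    T& get() const noexcept { return *ptr_; }
    operator T&() const noexcept { return *ptr_; }

private:
    T* ptr_; // a plain pointer makes the wrapper copyable and assignable
};
```

Copying a SimpleRef copies only the pointer, so the copy refers to the same object, and assigning one SimpleRef to another rebinds it instead of overwriting the referenced object.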
Even if the loop moves on to the second iteration before the first task starts running, the wrapper stored for that task still refers to the first element of paths.
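One way to check this yourself is the sketch below, which launches every task inside the loop and only collects the results afterwards. The names isSameObject and countTasksSeeingOriginal are hypothetical, std::string again stands in for std::filesystem::path, and std::launch::async is passed explicitly (unlike in your code) so that each task really runs on its own thread.

```cpp
#include <cstddef>
#include <functional>
#include <future>
#include <string>
#include <vector>

// Hypothetical task: reports whether its parameter is the very object
// located at `original`.
bool isSameObject(const std::string& seen, const std::string* original)
{
    return &seen == original;
}

// Hypothetical driver: launches one task per element, then counts how many
// tasks were handed the original vector element.
std::size_t countTasksSeeingOriginal(const std::vector<std::string>& items, bool useRef)
{
    std::vector<std::future<bool>> futures;
    for (const std::string& item : items)
    {
        if (useRef)
        {
            // The task receives a reference to the element itself.
            futures.emplace_back(
                std::async(std::launch::async, isSameObject, std::ref(item), &item));
        }
        else
        {
            // Without std::ref, std::async stores its own copy of the argument.
            futures.emplace_back(
                std::async(std::launch::async, isSameObject, item, &item));
        }
    }

    // The loop variable is long gone here, yet every task still had a valid argument.
    std::size_t count = 0;
    for (std::future<bool>& fut : futures)
    {
        if (fut.get())
        {
            ++count;
        }
    }
    return count;
}
```

With useRef set to true every task sees the original element; with false none of them do, because each task works on a private copy. Both variants are safe here. The std::ref version would only dangle if the vector were destroyed or reallocated before the tasks finished.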