I want to copy files from a container in Azure Blob Storage that contains around 10,000,000 CSV or ZIP files.
The filename format is "Energy_ReportName_Timestamp_VersionNumber.zip"; a sample filename is "Energy_Payment_20231209110007_0000000404988124.zip". The VersionNumber at the end of the filename doesn't follow a regular pattern.
I want to filter zip files for a specific ReportName and Date, and copy those files to another container.
For example, files with ReportName = Payment and Date = 20231209 (any time on that date, any VersionNumber).
Since there are millions of files in the source container, I am looking for an approach that can quickly find the desired files and copy them to the sink container using the Copy activity in Azure Data Factory.
Note that using Azure Data Factory is a client requirement, and this is part of a larger pipeline. All files sit directly in the container; there are no subfolders or nested folders.
Please let me know if you have any ideas.
You can use a wildcard file name in the Copy activity source, as suggested by @Simon Goater in the comments.
Go through the below demo:
These are the files in my input container, named inputdata:
Energy_Payment_20231209110007_0000000404988124.zip
Source_Payment_20231209010215_0000000404982426.zip
Employee_Payment_20231209122424_00000004049171224.zip
Business_Cash_20231209101602_0000000404988124.zip
Fund_Payment_20231208161001_0000000404988124.zip
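ADF wildcard file names use glob-style matching (`*` matches zero or more characters), which Python's `fnmatch` also implements, so you can sanity-check the pattern locally before building the pipeline. A quick sketch against the sample filenames above:

```python
from fnmatch import fnmatch

# Sample filenames from the input container
files = [
    "Energy_Payment_20231209110007_0000000404988124.zip",
    "Source_Payment_20231209010215_0000000404982426.zip",
    "Employee_Payment_20231209122424_00000004049171224.zip",
    "Business_Cash_20231209101602_0000000404988124.zip",
    "Fund_Payment_20231208161001_0000000404988124.zip",
]

# Wildcard used in the Copy activity: ReportName "Payment" on 2023-12-09
pattern = "*Payment_20231209*.zip"

matches = [f for f in files if fnmatch(f, pattern)]
for f in matches:
    print(f)
```

This selects the three Payment files dated 20231209 and skips Business_Cash (wrong ReportName) and Fund_Payment_20231208 (wrong date).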
In the Copy activity of the ADF pipeline, use Binary datasets for both source and sink. In the source dataset, set the path only up to the container name. Similarly, set the path up to the target container in the sink Binary dataset.
Now, assign the source dataset to the Copy activity source and set the wildcard file name to *Payment_20231209*.zip.
Assign the target dataset to the Copy activity sink and debug the pipeline. You will see that the required files get copied to the target location.
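If ReportName and Date arrive as pipeline parameters (the parameter names here are assumptions for illustration), the wildcard file name can be built with dynamic content instead of hard-coding it, using ADF's `concat` expression function:

```
@concat('*_', pipeline().parameters.ReportName, '_', pipeline().parameters.Date, '*.zip')
```

With ReportName = Payment and Date = 20231209, this evaluates to *_Payment_20231209*.zip, so the same pipeline can be reused for any report and date.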