azure zip unzip azure-data-factory

Using Azure Data Factory to unzip multiple files from an HTTP site


I have set up an "http file" data source in my ADF to connect to a specific URL (https://www.sos.wa.gov/_assets/corps/txtCorpsData.zip) which points to a ZIP file that contains 4 separate .txt files.

The service successfully connects and unzips the file but it's only reading the first file in the ZIP archive. How do I make the source separate out into 4 different individual sources? I'm guessing there's some parameter I need to use but not sure what that could be.

Here's a screenshot of the connection detail: [screenshot: connection detail]


Solution

  • I think I figured it out, kinda: use a "Copy Data" task that points to an "Http file" as the source. That "Http file" source uses the URL from my question as its Linked Service, and the source also decompresses the ZIP. Within the "Copy Data" task, the sink is a blob connection.

    When I run this task, it extracts the ZIP file into a new folder underneath the path at the blob connection. This presents a new issue that I'm working on now: the new folder's name appears to be the GUID of the pipeline run, so I need to figure out a way to specify the folder name so it's consistent. I'll likely post another question asking that later.
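
For anyone who wants to check the archive outside ADF, here is a minimal local sketch of the same download-and-extract step in Python. It is not how ADF does it internally, and it does not touch blob storage. The URL comes from the question; the function names `download_zip` and `extract_all` and the folder name `corps_data` are made up for illustration. The point is that every file in the archive gets extracted, and the destination folder name is one you choose rather than a generated GUID.

```python
import io
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL taken from the question; all names below are illustrative assumptions.
ZIP_URL = "https://www.sos.wa.gov/_assets/corps/txtCorpsData.zip"


def download_zip(url: str = ZIP_URL) -> bytes:
    """Fetch the archive over HTTP and return its raw bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def extract_all(zip_bytes: bytes, dest_folder: str) -> List[str]:
    """Extract every file in the archive into dest_folder.

    Returns the names of the extracted files in archive order, so the
    caller can confirm that all of them (not just the first) were read.
    """
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)  # fixed, caller-chosen folder name
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Skip directory entries; keep only actual files.
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
        archive.extractall(dest)
    return names
```

Calling `extract_all(download_zip(), "corps_data")` should write each .txt file from the archive into a local `corps_data` folder and return their names, assuming the URL is still reachable.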