python azure azure-blob-storage urllib azure-storage-explorer

IngestionTimestamp contains %3A instead of a colon


In my Azure Blob Storage container I have a folder whose name is an IngestionTimestamp in ISO 8601 format. I want literal colons in the time part, but I get %3A instead.

I've tried the unquote function from Python's urllib.parse module to decode the string.

I also use the datetime library to format the timestamp:

    current_timestamp_str = current_timestamp.strftime('%Y-%m-%dT%H:%M:%S')
    decoded_timestamp = unquote(current_timestamp_str + '+00:00')

In VS Code the timestamp is shown with colons. But when I land the data in the Blob container, the folder name looks like this:

IngestionTimestamp=2021-01-01T01%3A20%3A33.

I want this:

IngestionTimestamp=2021-01-01T01:20:33+00:00

Code that builds the date/time strings:

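    from datetime import date, datetime, timezone
    from urllib.parse import unquote

    current_date = date.today()
    current_date_str = str(current_date)
    current_timestamp = datetime.now(timezone.utc)
    current_timestamp_str = current_timestamp.strftime('%Y-%m-%dT%H:%M:%S')
    # unquote() leaves this string unchanged: it contains no %XX escapes yet
    decoded_timestamp = unquote(current_timestamp_str + '+00:00')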
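
To narrow down where the %3A comes from, it may help to look at what unquote and quote do to this string in isolation. The sketch below uses a fixed example instant (an assumption, chosen to match the folder name above) and only the standard library. It suggests that the string is fine when it leaves this code, so the percent-encoding presumably happens later, in whatever step writes or displays the folder name.

```python
from datetime import datetime, timezone
from urllib.parse import quote, unquote

# Fixed example instant so the result is reproducible (assumption for illustration).
ts = datetime(2021, 1, 1, 1, 20, 33, tzinfo=timezone.utc)

# isoformat() on a timezone-aware datetime already appends the UTC offset,
# so no manual "+00:00" concatenation is needed.
timestamp_str = ts.isoformat(timespec="seconds")

# unquote() only rewrites %XX escapes; a string with literal colons passes through unchanged.
unchanged = unquote(timestamp_str)

# quote() with its default safe set is the kind of step that turns ':' into %3A.
encoded = quote(timestamp_str)

print(timestamp_str)
print(unchanged == timestamp_str)
print(encoded)
```

Calling unquote on the encoded form gives the original string back, which is why decoding before the upload has no visible effect.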

Solution

  • In VS Code, it shows me the timestamp with colons. However, when I land data into the Blob container, I have the folder name like this: IngestionTimestamp=2021-01-01T01%3A20%3A33. I want this: IngestionTimestamp=2021-01-01T01:20:33+00:00

    You can use the code below to create a directory (folder) with that name format in Azure Storage using Python.

    Code:

    from datetime import datetime, timezone
    import urllib.parse
    from azure.storage.filedatalake import DataLakeServiceClient
    
    current_timestamp = datetime.now(timezone.utc)
    # strftime already emits colons, so no character replacement is needed
    current_timestamp_str = current_timestamp.strftime('%Y-%m-%dT%H:%M:%S')
    encoded_timestamp_str = urllib.parse.quote(current_timestamp_str + '+00:00', safe=';/?:@&=+$,-_.!~*()')
    
    folder_name = 'IngestionTimestamp=' + encoded_timestamp_str
    
    connection_string = 'xxxxx'
    filesystem_name = 'data'
    
    service_client = DataLakeServiceClient.from_connection_string(conn_str=connection_string)
    file_system_client = service_client.get_file_system_client(filesystem_name)
    directory_client = file_system_client.get_directory_client(folder_name)
    directory_client.create_directory()
    

    Output:

    [Image: IngestionTimestamp directory]
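
The role of the safe argument in the quote call above can be checked without touching Azure. The following is a minimal sketch using a fixed example folder name (an assumption for illustration): with that safe list, ':', '+' and '=' are left as they are, so for this input the call returns the string unchanged, while the default safe set would escape all three.

```python
from urllib.parse import quote

# Same safe list as in the code above.
SAFE = ";/?:@&=+$,-_.!~*()"

# Fixed example name (assumption for illustration).
name = "IngestionTimestamp=2021-01-01T01:20:33+00:00"

# With the safe list, every character in this name is either alphanumeric or marked safe.
with_safe = quote(name, safe=SAFE)

# With the default safe set ('/'), the '=', ':' and '+' characters are percent-encoded.
default = quote(name)

# Characters outside the safe list, such as a space, are still escaped.
with_space = quote("Ingestion Timestamp", safe=SAFE)

print(with_safe)
print(default)
print(with_space)
```

In other words, for timestamps in this format the quote call acts as a pass-through, and the folder name handed to get_directory_client is the plain string with colons.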

    Update:

    You can use the code below to rename an existing folder.

    Code:

    from datetime import datetime, timezone
    import urllib.parse
    from azure.storage.filedatalake import DataLakeServiceClient
    
    current_timestamp = datetime.now(timezone.utc)
    # strftime already emits colons, so no character replacement is needed
    current_timestamp_str = current_timestamp.strftime('%Y-%m-%dT%H:%M:%S')
    encoded_timestamp_str = urllib.parse.quote(current_timestamp_str + '+00:00', safe=';/?:@&=+$,-_.!~*()')
    
    new_folder_name = 'IngestionTimestamp=' + encoded_timestamp_str
    
    connection_string = 'xxxx'
    filesystem_name = 'data'
    foldername = 'sample123'  # existing folder to rename
    
    service_client = DataLakeServiceClient.from_connection_string(conn_str=connection_string)
    file_system_client = service_client.get_file_system_client(filesystem_name)
    directory_client = file_system_client.get_directory_client(foldername)
    # new_name is given as '<file system>/<new directory path>'
    directory_client.rename_directory(
        new_name=f"{directory_client.file_system_name}/{new_folder_name}")
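
If the goal is to fix a folder that already exists with %3A in its name, the rename target could be derived from the old name instead of from the current time. The sketch below is one way to build that string. The helper name rename_target is hypothetical, and it assumes the stored name really contains the literal characters %3A rather than merely being displayed that way by a tool.

```python
from urllib.parse import unquote


def rename_target(file_system_name: str, encoded_folder_name: str) -> str:
    """Hypothetical helper: build a '<file system>/<decoded folder>' string for new_name."""
    # unquote() turns %3A back into ':' and leaves unescaped characters alone.
    decoded = unquote(encoded_folder_name)
    return f"{file_system_name}/{decoded}"


# Example values taken from the question and the code above.
target = rename_target("data", "IngestionTimestamp=2021-01-01T01%3A20%3A33")
print(target)
```

The resulting string would be passed as new_name to rename_directory, with get_directory_client pointed at the old, encoded folder name.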
    

    Reference:

    Use Python to manage data in Azure Data Lake Storage Gen2 - Azure Storage | Microsoft Learn