Tags: python, azure, azure-data-lake, u-sql

Programmatically submit a U-SQL job with code-behind


I'm currently submitting my U-SQL jobs via the Python library, and I want to add extra logic in a C# or Python code-behind file. Are code-behind files supported, either in Python or through a CLI-based method that I could easily automate?

Ideally I'd like to use the Azure CLI or the Python library so this can run on both Linux and Windows (i.e. without relying on Visual Studio). I've checked the documentation for both PowerShell and Python, but I don't see any instructions on how to submit jobs with code-behind logic.

Here is my python code:

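import uuid

from azure.common.client_factory import get_client_from_cli_profile
from azure.mgmt.datalake.analytics.job import DataLakeAnalyticsJobManagementClient
from azure.mgmt.datalake.analytics.job.models import JobInformation, USqlJobProperties

# Placeholder: replace with the name of your Data Lake Analytics account.
ADLA_ACCOUNT_NAME = '<your-adla-account-name>'

# Build the job client from the credentials of the current Azure CLI login.
adlaJobClient = get_client_from_cli_profile(
    DataLakeAnalyticsJobManagementClient,
    adla_job_dns_suffix='azuredatalakeanalytics.net')

def submit_usql_job(script):
    """Submit a U-SQL script as a new job and return the generated job ID."""
    job_id = str(uuid.uuid4())
    job_result = adlaJobClient.job.create(
        ADLA_ACCOUNT_NAME,
        job_id,
        JobInformation(
            name='Sample Job',
            type='USql',
            properties=USqlJobProperties(script=script)
        )
    )
    print("Submitted job ID '{}'".format(job_id))
    return job_id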

Solution

  • Once compiled, the DLL file for your code-behind can be serialized into a hexadecimal string and then imported inline with a few extra lines of U-SQL. This avoids the need to upload and register the DLL separately.

    CREATE ASSEMBLY [__TMP_inline_dll] FROM 0x4D5A900003000...
    WITH ADDITIONAL_FILES = (0x2A543C... AS "__TMP_inline_dll.pdb");
    REFERENCE ASSEMBLY [__TMP_inline_dll];
    
    /* Your USQL Code Here... */
    
    DROP ASSEMBLY [__TMP_inline_dll];
    

    The files can be serialized to hexadecimal using this Python code:

    import binascii
    
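    def get_file_hex_string(filepath: str) -> str:
        """Open file in binary mode and return its contents as an uppercase hex string."""
        with open(filepath, 'rb') as f:
            hexdata = binascii.hexlify(f.read())
        # hexlify() returns bytes; decode so the result can be concatenated into the script text.
        return hexdata.decode('ascii').upper()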
    

    Notes:

    • The above assumes you have already compiled the DLL.
    • The boilerplate includes a PDB file, passed as an "additional" file, which should be optional.
    • The DROP ASSEMBLY statement at the end is needed to clean up after the job, although I've been informed that a future version of U-SQL will no longer require it.
    • I received this method from the very helpful support team of the VS Code U-SQL add-in.
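
Putting the pieces together, here is a minimal sketch of how the inline-assembly boilerplate might be generated from Python before submitting the job. The helper names `file_to_hex_literal` and `wrap_with_inline_assembly` are hypothetical (they are not part of any Azure library), and the sketch assumes the DLL has already been compiled and that the PDB is optional, as noted above.

```python
import binascii
from typing import Optional


def file_to_hex_literal(filepath: str) -> str:
    """Return the file's bytes as a binary literal such as '0x4D5A...'."""
    with open(filepath, 'rb') as f:
        return '0x' + binascii.hexlify(f.read()).decode('ascii').upper()


def wrap_with_inline_assembly(script: str,
                              dll_path: str,
                              pdb_path: Optional[str] = None,
                              assembly_name: str = '__TMP_inline_dll') -> str:
    """Surround a U-SQL script with CREATE / REFERENCE / DROP ASSEMBLY statements.

    Hypothetical helper: it only builds the script text and does not
    validate the U-SQL or contact Azure.
    """
    create = 'CREATE ASSEMBLY [{}] FROM {}'.format(
        assembly_name, file_to_hex_literal(dll_path))
    if pdb_path is not None:
        # The PDB is attached as an additional file of the same statement.
        create += '\nWITH ADDITIONAL_FILES = ({} AS "{}.pdb")'.format(
            file_to_hex_literal(pdb_path), assembly_name)
    return '\n'.join([
        create + ';',
        'REFERENCE ASSEMBLY [{}];'.format(assembly_name),
        '',
        script,
        '',
        'DROP ASSEMBLY [{}];'.format(assembly_name),
    ])
```

With the `submit_usql_job` function from the question, usage could then look like `submit_usql_job(wrap_with_inline_assembly(script, 'MyCodeBehind.dll'))`, where `MyCodeBehind.dll` is a placeholder for the path to your compiled assembly.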