Palantir Foundry - How to load PDF files from Compass folder into code repository transform

In Palantir Foundry, my goal is to:

1. Find all PDFs in a Compass folder.
2. In a transform, copy each PDF (e.g. with shutil) from Compass into a dataset's file system.

I have retrieved the list of PDF files stored in a Compass folder from the endpoint compass/api/folders/{compass_rid}/children, and I have also set up a Compass File Lister. I'm stuck on what to do next with either option: I haven't figured out how to use that information to actually read a Blobster file from a transform.
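Once the children listing has been retrieved, it can be narrowed down to just the PDFs before anything is downloaded. The following is a minimal sketch. It assumes the response JSON has the shape the solution's code relies on (a `values` list of objects with `rid` and `name`), and that a PDF can be recognized by its `.pdf` extension. `select_pdf_files` is a hypothetical helper name, and the example RIDs are made up.

```python
from typing import Any, Dict


def select_pdf_files(children_json: Dict[str, Any]) -> Dict[str, str]:
    """Map Blobster RID -> file name for the PDF entries in a Compass children listing.

    Assumes the listing looks like {"values": [{"rid": ..., "name": ...}, ...]}.
    """
    pdfs: Dict[str, str] = {}
    for child in children_json.get("values", []):
        rid = child.get("rid") or ""
        name = child.get("name") or ""
        # Same test the solution uses: Blobster-backed files have "blobster" in
        # their RID. The extension check (case-insensitive) keeps only PDFs.
        if "blobster" in rid and name.lower().endswith(".pdf"):
            pdfs[rid] = name
    return pdfs


# Made-up listing for illustration only.
example = {
    "values": [
        {"rid": "ri.blobster.main.pdf.1", "name": "report.PDF"},
        {"rid": "ri.blobster.main.image.2", "name": "photo.png"},
        {"rid": "ri.compass.main.folder.3", "name": "subfolder"},
    ]
}
print(select_pdf_files(example))  # prints {'ri.blobster.main.pdf.1': 'report.PDF'}
```

Inside a transform, the result could take the place of the solution's `rid_filename_map`, so that non-PDF files in the folder are skipped.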

Is it possible to read these PDFs in a transform so that I can copy them into an unstructured dataset's file system?

Based on other SO questions, I read through the "Read files in a repository" documentation, but that approach seems to require the files to be imported into the repository first, so I don't see how it would help here.

I also read through the Compass endpoints, but I don't see a way to move or copy files from Compass into a dataset file system. At most, they seem to support moving files from one Compass folder to another.

Solution

Here is an updated version that pulls all Blobster files from a specified Compass folder and writes them to a single output dataset.

from transforms.api import transform, Output, configure
from transforms.external.systems import (
    EgressPolicy,
    Credential,
    use_external_systems
)
import requests


@configure(profile=["KUBERNETES_NO_EXECUTORS_SMALL"])
@use_external_systems(
    egress_policy=EgressPolicy("<POLICY_RID>"),
    creds=Credential("<SAVED_CREDENTIALS_RID>")
)
@transform(
    output=Output("<OUTPUT_RID>"),
)
def compute(egress_policy, creds, output):
    # egress_policy is not used directly; the decorator applies it so the
    # requests below are allowed to reach <ROOT_URL>.

    url_root = '<ROOT_URL>'
    compass_folder_read = '<FOLDER_RID>'
    resources_lister_url = f'{url_root}/compass/api/folders/{compass_folder_read}/children'
    get_blobster_url_root = f'{url_root}/blobster/api/salt/'
    TOKEN = creds.get("token")

    headers_compass = {
        'Authorization': f'Bearer {TOKEN}',
        'Content-Type': 'application/json',
    }

    headers_blobster = {
        'cookie': f'PALANTIR_TOKEN={TOKEN}'
    }

    # List the folder's children and fail fast on an HTTP error.
    files_response = requests.get(resources_lister_url, headers=headers_compass, timeout=60)
    files_response.raise_for_status()

    # Keep only Blobster-backed files (RID -> file name).
    rid_filename_map = {
        f.get('rid'): f.get('name')
        for f in files_response.json().get('values', [])
        if 'blobster' in (f.get('rid') or '')
    }

    for blobster_rid, filename in rid_filename_map.items():
        url = get_blobster_url_root + blobster_rid
        file_contents_response = requests.get(url, headers=headers_blobster, timeout=60)
        file_contents_response.raise_for_status()
        # The with-block closes the file, so no explicit close() is needed.
        with output.filesystem().open(filename, 'wb') as f:
            f.write(file_contents_response.content)
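The question mentions `shutil`. Instead of loading each PDF fully into memory through `.content`, the download can be streamed into the output file in chunks with `shutil.copyfileobj`. The following is a sketch, not part of the original answer. It assumes the same Blobster URL and cookie header as above, and `copy_stream` is a hypothetical helper name. Here it is demonstrated with in-memory buffers so it runs outside Foundry.

```python
import io
import shutil
from typing import BinaryIO


def copy_stream(source: BinaryIO, destination: BinaryIO, chunk_size: int = 1024 * 1024) -> None:
    """Copy everything from source to destination, chunk_size bytes at a time."""
    # copyfileobj reads and writes one chunk at a time, so a large PDF never
    # has to be held in memory all at once.
    shutil.copyfileobj(source, destination, length=chunk_size)


# Stand-ins for the HTTP response body and the dataset file handle.
src = io.BytesIO(b"%PDF-1.7 example bytes")
dst = io.BytesIO()
copy_stream(src, dst)
print(dst.getvalue())  # prints b'%PDF-1.7 example bytes'
```

To use this in the transform, call `requests.get(url, headers=headers_blobster, stream=True)` and then `raise_for_status()`. Next, set `response.raw.decode_content = True` so that any gzip/deflate content encoding is undone. Finally, pass `response.raw` as `source` and the handle from `output.filesystem().open(filename, 'wb')` as `destination`.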