In Palantir Foundry, my goal is to copy PDF files stored in a Compass folder into a dataset from inside a transform.
I have retrieved the list of PDF files in a Compass folder from this endpoint (`compass/api/folders/{compass_rid}/children`), and I have also set up a Compass File Lister. I'm stuck on what to do next with either option: I haven't figured out how to use this information to actually read a Blobster file from a transform.
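The folder listing from that endpoint already carries the two things needed later: each child's RID and its name. A minimal sketch of extracting the Blobster files from the listing, assuming the response shape that the answer's code relies on (a top-level `"values"` list of objects with `"rid"` and `"name"`). The helper name `blobster_files` and the sample RIDs are made up for illustration.

```python
from typing import Any, Dict


def blobster_files(children_json: Dict[str, Any]) -> Dict[str, str]:
    """Return {Blobster RID: file name} from a Compass folder-children response.

    Assumed shape (as used in the answer): {"values": [{"rid": ..., "name": ...}, ...]}
    """
    result: Dict[str, str] = {}
    for resource in children_json.get("values", []):
        rid = resource.get("rid") or ""
        # Only resources with a Blobster RID (e.g. uploaded PDFs) can be downloaded
        # from the Blobster API; folders and other resource types are skipped.
        if "blobster" in rid:
            result[rid] = resource.get("name") or rid
    return result


# Hypothetical sample listing: one uploaded PDF and one subfolder.
sample = {
    "values": [
        {"rid": "ri.blobster.main.pdf.1111", "name": "report.pdf"},
        {"rid": "ri.compass.main.folder.2222", "name": "Archive"},
    ]
}
print(blobster_files(sample))
```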
Is it possible to read these PDFs in a transform and copy them into the filesystem of an unstructured dataset?
Based on other SO questions, I read through "Read files in a repository", but that approach seems to require the files to be imported into the repository itself, so I don't see how it would help here.
I also read through the Compass API endpoints, but I don't see a way to move or copy files from Compass into a dataset filesystem, only from one Compass folder to another.
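Here is an updated version that pulls all Blobster files from a specified Compass folder and writes them into a single output dataset. It lists the folder's children through the Compass API, keeps the resources whose RID contains `blobster`, downloads each one from the Blobster API, and writes the raw bytes to the output dataset's filesystem. The HTTP calls go through an egress policy and a saved credential holding the token, so both need to be set up and referenced in the decorator.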
```python
from transforms.api import transform, Output, configure
from transforms.external.systems import (
    EgressPolicy,
    Credential,
    use_external_systems,
)
import requests


@configure(profile=["KUBERNETES_NO_EXECUTORS_SMALL"])
@use_external_systems(
    egress_policy=EgressPolicy("<POLICY_RID>"),
    creds=Credential("<SAVED_CREDENTIALS_RID>"),
)
@transform(
    output=Output("<OUTPUT_RID>"),
)
def compute(egress_policy, creds, output):
    url_root = '<ROOT_URL>'
    compass_folder_read = '<FOLDER_RID>'  # Compass folder containing the PDFs
    resources_lister_url = f'{url_root}/compass/api/folders/{compass_folder_read}/children'
    get_blobster_url_root = f'{url_root}/blobster/api/salt/'

    # Token stored in the saved credential; used for both APIs.
    TOKEN = creds.get("token")
    headers_compass = {
        'Authorization': f'Bearer {TOKEN}',
        'Content-Type': 'application/json',
    }
    # The Blobster request passes the token as a PALANTIR_TOKEN cookie.
    headers_blobster = {
        'cookie': f'PALANTIR_TOKEN={TOKEN}'
    }

    # List the children of the Compass folder; fail loudly on HTTP errors.
    files_response = requests.get(resources_lister_url, headers=headers_compass, timeout=60)
    files_response.raise_for_status()

    # Keep only resources with a Blobster RID (e.g. the uploaded PDFs), keyed by RID.
    rid_filename_map = {
        res.get('rid'): res.get('name')
        for res in files_response.json().get('values', [])
        if 'blobster' in (res.get('rid') or '')
    }

    for blobster_rid, filename in rid_filename_map.items():
        url = get_blobster_url_root + blobster_rid
        file_contents_response = requests.get(url, headers=headers_blobster, timeout=300)
        file_contents_response.raise_for_status()
        # Write the raw bytes into the output dataset's filesystem.
        # The `with` block closes the file, so no explicit close() is needed.
        with output.filesystem().open(filename, 'wb') as f:
            f.write(file_contents_response.content)
```
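Because the transform only runs inside Foundry, it can help to pull the download-and-write loop out into a plain function that can be exercised locally with stubs. This is a minimal sketch under that assumption, not part of the answer above: `copy_blobster_files`, `http_get`, and `open_output` are hypothetical names. In the transform, `http_get` would wrap `requests.get(url, headers=headers).content` and `open_output` would be `output.filesystem().open`. The URL and cookie header are built the same way as in the answer.

```python
from typing import Any, Callable, Dict, List


def copy_blobster_files(
    url_root: str,
    token: str,
    rid_filename_map: Dict[str, str],
    http_get: Callable[[str, Dict[str, str]], bytes],
    open_output: Callable[[str, str], Any],
) -> List[str]:
    """Download each Blobster file and write it to the output; return written names."""
    # Same cookie-based auth header the answer uses for Blobster.
    headers = {"cookie": f"PALANTIR_TOKEN={token}"}
    written: List[str] = []
    for rid, filename in rid_filename_map.items():
        url = f"{url_root}/blobster/api/salt/{rid}"
        data = http_get(url, headers)
        with open_output(filename, "wb") as f:
            f.write(data)
        written.append(filename)
    return written


class _CapturingWriter:
    """In-memory stand-in for a file opened with mode 'wb'; stores bytes on exit."""

    def __init__(self, store: Dict[str, bytes], name: str) -> None:
        self._store = store
        self._name = name
        self._chunks: List[bytes] = []

    def write(self, data: bytes) -> int:
        self._chunks.append(data)
        return len(data)

    def __enter__(self) -> "_CapturingWriter":
        return self

    def __exit__(self, *exc: Any) -> bool:
        self._store[self._name] = b"".join(self._chunks)
        return False


# Local dry run with a fake HTTP client and filesystem (no Foundry access needed).
requested_urls: List[str] = []
stored: Dict[str, bytes] = {}


def fake_get(url: str, headers: Dict[str, str]) -> bytes:
    requested_urls.append(url)
    return b"%PDF-1.4 fake"


names = copy_blobster_files(
    "https://example.invalid",
    "dummy-token",
    {"ri.blobster.main.pdf.1111": "report.pdf"},
    fake_get,
    lambda name, mode: _CapturingWriter(stored, name),
)
print(names, requested_urls, stored)
```

Injecting the HTTP call and the file opener keeps the loop identical to the one in the transform while letting you check URL construction and file naming without network access.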