I am trying to read a schema from a text file that sits in the same package as my code, but I cannot read that file from an AWS Glue job. I will use the schema to create a DataFrame with PySpark. I can load the file locally. I zip the code files into a .zip, place it in an S3 bucket, and then reference it in the Glue job. Everything else works fine, but the code below does not:
import os
from pathlib import Path

file_path = os.path.join(Path(os.path.dirname(os.path.relpath(__file__))), "verifications.txt")
multiline_data = None
with open(file_path, 'r') as data_file:
    multiline_data = data_file.read()
self.logger.info(f"Schema is {multiline_data}")
This code throws the below error:
Error Category: UNCLASSIFIED_ERROR; NotADirectoryError: [Errno 20] Not a directory: 'src.zip/src/ingestion/jobs/verifications.txt'
I also tried with abs_path, but it didn't help either. The same block of code works fine locally.
I also tried passing the path "./verifications.txt" directly, but no luck.
So how do I read this file?
As @Bogdan mentioned, the way to do this is to store the verifications.txt file in S3 and read it from there. Here's some example code using boto3:
import boto3
# Hardcoded S3 bucket/key (these are normally passed in as Glue Job params)
s3_bucket = 'your-bucket-name'
s3_key = 'path/to/verifications.txt'
# Read data from S3 using boto3
s3_client = boto3.client('s3')
response = s3_client.get_object(Bucket=s3_bucket, Key=s3_key)
multiline_data = response['Body'].read().decode('utf-8')
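Since the bucket and key are normally passed in as Glue job parameters, here is a rough sketch of how that could look. The parameter names `schema_bucket` and `schema_key` and both helper names are made up for illustration; `getResolvedOptions` comes from `awsglue.utils`, which is only available inside the Glue runtime, so it is imported lazily.

```python
import sys


def read_s3_text(s3_client, bucket, key, encoding="utf-8"):
    """Fetch one S3 object and return its body decoded as text."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return response["Body"].read().decode(encoding)


def load_schema_from_job_args(argv=None):
    """Resolve the (hypothetical) schema_bucket/schema_key job params and read the file."""
    # Imported here because awsglue only exists inside the Glue runtime.
    import boto3
    from awsglue.utils import getResolvedOptions

    args = getResolvedOptions(argv or sys.argv, ["schema_bucket", "schema_key"])
    return read_s3_text(boto3.client("s3"), args["schema_bucket"], args["schema_key"])
```

In the job you would then call `multiline_data = load_schema_from_job_args()` and start the job with `--schema_bucket` and `--schema_key` set. Keeping `read_s3_text` separate from the argument handling makes it easy to exercise with a stub client outside Glue.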
If you want to access the file from inside the zip directly (given your comment), you might have to get a bit fancier:
import boto3
import zipfile
import io
# Initialize boto3 client for S3
s3 = boto3.client('s3')
# Define the bucket name and the zip file key
bucket_name = 'your-bucket-name'
zip_file_key = 'path/to/src.zip'
# Download the zip file from S3
zip_obj = s3.get_object(Bucket=bucket_name, Key=zip_file_key)
buffer = io.BytesIO(zip_obj['Body'].read())
# Open the zip file in memory
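with zipfile.ZipFile(buffer, 'r') as zip_ref:
    # List all files in the zip
    print("Files in the zip:", zip_ref.namelist())
    # Open and read a specific file within the zip without extracting.
    # The member name must be the full path inside the archive; this one is
    # taken from the error message in the question, so adjust it to match namelist().
    with zip_ref.open('src/ingestion/jobs/verifications.txt') as file:
        text_content = file.read().decode('utf-8')
        print("Contents of the text file:", text_content)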
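Another option that avoids a second download: Glue imports your modules straight out of the zip, which is why `open()` fails with `NotADirectoryError` (the path runs through `src.zip`, which is a file, not a directory). The standard library's `pkgutil.get_data` asks the package's import loader for the bytes instead, and that works for packages imported from a zip. This is a minimal sketch, not tested on Glue itself; the helper name is made up, and it assumes the directory holding the file is a regular package with an `__init__.py`.

```python
import pkgutil


def read_package_text(package, resource, encoding="utf-8"):
    """Read a data file shipped inside an importable package (zip or plain directory)."""
    data = pkgutil.get_data(package, resource)
    if data is None:
        # get_data returns None when the package's loader cannot serve data,
        # e.g. for a namespace package without an __init__.py.
        raise FileNotFoundError(f"{resource!r} could not be loaded from package {package!r}")
    return data.decode(encoding)
```

Going by the path in the error message, the call from your job module would presumably be `multiline_data = read_package_text("src.ingestion.jobs", "verifications.txt")`, or pass `__package__` instead of the hard-coded name.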