I have nested JSON files, and I need to put each of them in one cell of a DataFrame.
The original idea is to take a nested JSON file, add one extra column holding the value of a key called "DataType", put the whole JSON document in a second column, and write it out to an S3 bucket partitioned by that data type. The writing code looks like this:
def write_data(data_df, output_path):
    data_df.coalesce(100).write.partitionBy("DataType").mode("append").parquet(output_path)
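With partitionBy("DataType"), Spark creates one subdirectory per distinct value, so the bucket ends up laid out roughly like this (bucket name, type values, and file names are illustrative):

s3://your-bucket/output/DataType=TypeA/part-00000-...snappy.parquet
s3://your-bucket/output/DataType=TypeB/part-00000-...snappy.parquet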
Basically, it will be a sorting Glue job.
I have tried this:
from pyspark.sql.functions import lit

df = dyf.toDF()
df2 = df.withColumn("data", lit(df.toJSON().first()))
And it looks fine until I take multiple JSON files to process: the output has the same JSON in every row, because first() always returns only the first row.
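To see why, here is a minimal sketch (the sample data is illustrative): df.toJSON() produces one JSON string per row, .first() takes only the first of them, and lit() broadcasts that single string as a constant column.

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("TypeA", 1), ("TypeB", 2)], ["DataType", "value"])

first_json = df.toJSON().first()  # '{"DataType":"TypeA","value":1}'
df2 = df.withColumn("data", lit(first_json))
df2.show(truncate=False)  # both rows carry the TypeA document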
Adding a working solution here:
from pyspark.sql.functions import to_json, struct

df2 = df.withColumn("data", to_json(struct([df[col] for col in df.columns])))
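Here to_json(struct(...)) serializes each row's own columns back into a JSON string, so every record keeps its own document instead of a single broadcast value. For completeness, a minimal sketch of the whole flow, assuming dyf and write_data() from above and that every document carries a top-level "DataType" key (the bucket path is a placeholder):

from pyspark.sql.functions import to_json, struct

df = dyf.toDF()
df2 = df.withColumn("data", to_json(struct([df[col] for col in df.columns])))

# Keep only the partition key and the serialized document, then write
# partitioned by DataType:
write_data(df2.select("DataType", "data"), "s3://your-bucket/output/")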