python-3.x excel pyspark azure-databricks

How to read an Excel file (.xlsx) using PySpark and store it in a DataFrame?

I have data in an Excel file (.xlsx). How can I read this Excel data and store it in a Spark DataFrame?

Solution

On your Databricks cluster, install the following two libraries:

Clusters -> select your cluster -> Libraries -> Install New -> Maven -> in Coordinates: com.crealytics:spark-excel_2.12:0.13.5

Clusters -> select your cluster -> Libraries -> Install New -> PyPI -> in Package: xlrd
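If you run PySpark outside Databricks, the cluster UI steps are not available, but the same Maven coordinate can be passed through Spark's `spark.jars.packages` setting so the library is downloaded when the session starts. Below is a minimal sketch, assuming a local PySpark install with access to Maven Central; the helper names `spark_excel_coordinate` and `build_session` are hypothetical. The `_2.12` suffix must match the Scala version your Spark build uses.

```python
def spark_excel_coordinate(scala_version: str = "2.12", version: str = "0.13.5") -> str:
    """Build the Maven coordinate for spark-excel (hypothetical helper)."""
    return f"com.crealytics:spark-excel_{scala_version}:{version}"


def build_session(app_name: str = "excel-reader"):
    """Create a SparkSession that pulls spark-excel from Maven at startup (hypothetical helper)."""
    # Imported lazily so the coordinate helper can be used without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        # spark.jars.packages takes a comma-separated list of Maven coordinates.
        .config("spark.jars.packages", spark_excel_coordinate())
        .getOrCreate()
    )
```

On Databricks this is unnecessary: the `spark` session already exists, and the library installed through the cluster UI is on its classpath.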

Then you can read your Excel file as follows:

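sparkDF = (
    spark.read.format("com.crealytics.spark.excel")
    .option("header", "true")        # first row holds the column names
    .option("inferSchema", "true")   # infer column types instead of reading everything as strings
    .option("dataAddress", "'NameOfYourExcelSheet'!A1")  # sheet name and top-left cell to start reading from
    .load(filePath)                  # filePath: path to your .xlsx file
)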
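If you read several sheets, it can help to build the reader options in one place and pass them with `DataFrameReader.options(**kwargs)`. The sketch below does that. It assumes spark-excel follows Excel's convention of quoting the sheet name in single quotes and doubling any apostrophe inside the name (as Apache POI's reference parser does). The helpers `excel_data_address`, `excel_read_options`, and `read_excel` are hypothetical, not part of spark-excel.

```python
def excel_data_address(sheet_name: str, start_cell: str = "A1") -> str:
    """Format a dataAddress value such as 'Sheet1'!A1 (hypothetical helper)."""
    # Assumption: apostrophes inside a sheet name are escaped by doubling them, as in Excel.
    escaped = sheet_name.replace("'", "''")
    return f"'{escaped}'!{start_cell}"


def excel_read_options(sheet_name: str, start_cell: str = "A1",
                       header: bool = True, infer_schema: bool = True) -> dict:
    """Collect the spark-excel options used in the answer as strings (hypothetical helper)."""
    return {
        "header": str(header).lower(),
        "inferSchema": str(infer_schema).lower(),
        "dataAddress": excel_data_address(sheet_name, start_cell),
    }


def read_excel(spark, file_path: str, sheet_name: str, **kwargs):
    """Read one sheet into a Spark DataFrame (hypothetical helper; needs spark-excel installed)."""
    return (
        spark.read.format("com.crealytics.spark.excel")
        .options(**excel_read_options(sheet_name, **kwargs))
        .load(file_path)
    )
```

For example, `read_excel(spark, filePath, "NameOfYourExcelSheet")` should be equivalent to the chained `.option(...)` calls in the answer.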
