This recent blog post from Databricks (https://databricks.com/blog/2021/10/04/pandas-api-on-upcoming-apache-spark-3-2.html) says that the only change needed to a pandas program to run it under pyspark.pandas is to change `from pandas import read_csv` to `from pyspark.pandas import read_csv`.
But that does not seem right. What about all the other (non-`read_csv`) references to pandas? Isn't the right approach to change `import pandas as pd` to `import pyspark.pandas as pd`? Then all the other pandas references in your existing program will point to the pyspark version of pandas.
You got that right. The canonical form they suggest, however, is `from pyspark import pandas as ps`.
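As a minimal sketch of why the aliasing approach works: the program below is ordinary pandas, and (assuming pyspark >= 3.2 is installed) only the import line would need to change for it to run on Spark, since every other reference goes through the alias.

```python
# To run this on Spark, swap only the import line:
#   import pyspark.pandas as pd     # aliasing approach from the question
# or use the canonical alias from the answer:
#   from pyspark import pandas as ps
import pandas as pd

# Every pandas reference below goes through the `pd` alias,
# so nothing else in the program needs to change.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
total = df["a"].sum()
print(total)  # prints 6
```

Note that aliasing pyspark.pandas as `pd` works for code you own, but `ps` is the conventional alias in the Spark documentation, which makes it clear at a glance that the DataFrame is Spark-backed rather than in-memory pandas.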