In the latest version of PySpark, there’s a fantastic new feature that
simplifies how we run SQL queries on DataFrames.
Previously, we had to register our DataFrames as temporary views using createOrReplaceTempView or createOrReplaceGlobalTempView before we could run SQL queries.
But now, we can skip that step and run SQL queries directly on our DataFrames!
How cool is that?
First, let’s start by creating a Spark session and a sample DataFrame.
Alright, so we have our Spark session and a DataFrame with some sample data.
Now, let’s move on to the exciting part: running SQL queries directly on this DataFrame without creating a temp view.
Here’s the magic line of code: