How to use ‘spark.sql()’ in Python
The spark.sql() method runs relational SQL queries inside Spark itself and returns the result as a DataFrame. It allows the execution of relational queries, including those expressed in SQL, using Spark.
Syntax: spark.sql(expression)
Example: Using ‘spark.sql()’
Python
reg_df.createOrReplaceTempView("reg_view")

reg_df2 = spark.sql('''
    SELECT
        SUBSTR(LicenseNo, 1, 2) AS State,
        SUBSTR(LicenseNo, 3, 4) AS RegYear,
        SUBSTR(LicenseNo, 7, 8) AS RegID,
        SUBSTR(ExpiryDate, 1, 4) AS ExpYr,
        SUBSTR(ExpiryDate, 6, 2) AS ExpMo,
        SUBSTR(ExpiryDate, 9, 2) AS ExpDt
    FROM reg_view
''')

reg_df2.show()
Output:
Here, the expression passed to spark.sql() is an ordinary relational SQL query. The same query can be run in any SQL query editor to fetch the same output.
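The same extraction can also be written with the DataFrame API instead of a temporary view, using pyspark.sql.functions.substring. The sketch below is not from the original article; the app name and the single sample row are illustrative assumptions.

Python

from pyspark.sql import SparkSession
from pyspark.sql.functions import substring

# Illustrative session and sample data (assumed, not from the article)
spark = SparkSession.builder.appName('substr_demo').getOrCreate()
reg_df = spark.createDataFrame(
    [("MH201411094334", "2024-11-19")],
    ["LicenseNo", "ExpiryDate"],
)

# substring(col, pos, len) mirrors SQL's SUBSTR; positions are 1-based
reg_df2 = reg_df.select(
    substring("LicenseNo", 1, 2).alias("State"),    # 2-letter state code
    substring("LicenseNo", 3, 4).alias("RegYear"),  # 4-digit registration year
    substring("LicenseNo", 7, 8).alias("RegID"),    # 8-digit registration ID
)
reg_df2.show()

Both approaches produce the same columns; the DataFrame API avoids registering a temp view when the query is simple.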
How to check for a substring in a PySpark dataframe?
In this article, we are going to see how to check for a substring in PySpark dataframe.
A substring is a continuous sequence of characters within a larger string. For example, “learning pyspark” is a substring of “I am learning pyspark from w3wiki”. Let us look at different ways in which we can find a substring in one or more columns of a PySpark dataframe.
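One common way to check for a substring is the Column.contains() method, used inside filter(). The sketch below is not part of the original article; the app name and the two sample sentences are illustrative assumptions.

Python

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Illustrative session and sample data (assumed, not from the article)
spark = SparkSession.builder.appName('contains_demo').getOrCreate()
df = spark.createDataFrame(
    [("I am learning pyspark from w3wiki",), ("hello world",)],
    ["sentence"],
)

# Keep only rows whose 'sentence' column contains the substring
matches = df.filter(col("sentence").contains("learning pyspark"))
matches.show()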
Creating Dataframe for demonstration:
Python
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# Column names for the dataframe
columns = ["LicenseNo", "ExpiryDate"]

# Row data for the dataframe
data = [
    ("MH201411094334", "2024-11-19"),
    ("AR202027563890", "2030-03-16"),
    ("UP202010345567", "2035-12-30"),
    ("KN201822347800", "2028-10-29"),
]

# Create the dataframe using the above values
reg_df = spark.createDataFrame(data=data, schema=columns)

# View the dataframe
reg_df.show()
Output:
In the above dataframe, LicenseNo is composed of three pieces of information: a 2-letter state code + a 4-digit year of registration + an 8-digit registration number.
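Another SQL-style way to check for a substring in this dataframe is Column.like() with % wildcards. The sketch below is not from the original article; the app name and the choice of "2014" as the search pattern are illustrative assumptions.

Python

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Illustrative session reusing the same sample data (assumed app name)
spark = SparkSession.builder.appName('like_demo').getOrCreate()
reg_df = spark.createDataFrame(
    [("MH201411094334", "2024-11-19"),
     ("AR202027563890", "2030-03-16")],
    ["LicenseNo", "ExpiryDate"],
)

# SQL-style LIKE: keep rows whose LicenseNo contains the substring "2014"
matches_2014 = reg_df.filter(col("LicenseNo").like("%2014%"))
matches_2014.show()

Unlike contains(), like() supports SQL wildcard patterns, so "MH%" would match only licenses that start with the MH state code.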