Add Column to DataFrame using SQL Expression

In this method, the user has to use SQL expression with SQL function to add a column. Before that, we have to create a temporary view, From that view, we have to add and select columns.

Syntax:

dataframe.createOrReplaceTempView("name")
spark.sql("select 'value' as column_name from view")

where,

dataframe is the input dataframe
name is the temporary view name
sql function will take SQL expression as input to add a column
column_name is the new column name
value is the column value

Example:

Add new column named salary with 34000 value

Python3

# importing module
import pyspark
 
# import concat_ws and lit function
from pyspark.sql.functions import concat_ws, lit
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# create view
dataframe.createOrReplaceTempView("view")
 
# add new column named salary with 34000 value
spark.sql("select '34000' as salary from view").show()

Output:

How to add a new column to a PySpark DataFrame ?

In this article, we will discuss how to add a new column to PySpark Dataframe.

Create the first data frame for demonstration:

Here, we will be creating the sample data frame which we will be used further to demonstrate the approach purpose.

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
dataframe.show()

Output:

Add Column to DataFrame using SQL Expression

Python3

How to add a new column to a PySpark DataFrame ?

Python3

Categories

Contact US

Add Column to DataFrame using SQL Expression

Python3

How to add a new column to a PySpark DataFrame ?

Python3

Similar Reads

Categories

Contact US