Add Column to DataFrame using SQL Expression

In this method, we use a SQL expression with the sql() function to add a column. First, we create a temporary view of the DataFrame; then we select from that view, adding the new column in the SELECT clause.

Syntax:

dataframe.createOrReplaceTempView("name")
spark.sql("select 'value' as column_name from view")

where,

  • dataframe is the input dataframe
  • name is the temporary view name
  • the sql function takes the SQL expression that adds the column as its input
  • column_name is the new column name
  • value is the column value
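
Note that the query above returns only the new column. To keep the existing columns alongside it, the SELECT list can also include * (a small variation on the same syntax, shown here as a sketch):

spark.sql("select *, 'value' as column_name from view")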

Example:

Add a new column named salary with the value 34000

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# create view
dataframe.createOrReplaceTempView("view")
 
# add new column named salary with 34000 value
spark.sql("select '34000' as salary from view").show()


Output:
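
Running the query above prints one salary row for each of the five rows in the view, roughly:

+------+
|salary|
+------+
| 34000|
| 34000|
| 34000|
| 34000|
| 34000|
+------+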

How to add a new column to a PySpark DataFrame?

In this article, we will discuss how to add a new column to a PySpark DataFrame.

Create the first data frame for demonstration:

Here, we create the sample data frame that will be used further on to demonstrate the approach.

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
dataframe.show()


Output:
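
The dataframe.show() call prints the sample data, roughly:

+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  1| sravan|company 1|
|  2| ojaswi|company 1|
|  3| rohith|company 2|
|  4|sridevi|company 1|
|  5|  bobby|company 1|
+---+-------+---------+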

Similar Reads

Method 1: Add New Column With Constant Value

In this approach, to add a new column with constant values, the user calls the lit() function as a parameter of the withColumn() function and passes the required value to these functions. Here, lit() is available in the pyspark.sql.functions module....
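
A minimal sketch of this approach (reusing the dataframe built above; not code from the original section) could be:

from pyspark.sql.functions import lit

# add a constant salary column with withColumn() and lit()
dataframe.withColumn("salary", lit(34000)).show()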

Method 2: Add Column Based on Another Column of DataFrame

...

Method 3: Add Column When not Exists on DataFrame

...

Method 4: Add Column to DataFrame using select()

Under this approach, the user can add a new column based on an existing column in the given dataframe....
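
For example, a minimal sketch of this idea (reusing the dataframe built above, with upper() chosen purely for illustration) could be:

from pyspark.sql.functions import upper

# derive a new column from the existing NAME column with select()
dataframe.select("*", upper(dataframe.NAME).alias("NAME_UPPER")).show()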

Method 5: Add Column to DataFrame using SQL Expression

...

Method 6: Add Column Value Based on Condition

...