Add Column When not Exists on DataFrame

In this method, a column is added only when it does not already exist: an if condition checks dataframe.columns for the column name, and withColumn() together with the lit() function adds the column with a constant value.

Syntax:

if 'column_name' not in dataframe.columns:
    dataframe = dataframe.withColumn("column_name", lit(value))

where,

dataframe.columns returns the list of column names

withColumn() returns a new DataFrame and does not modify the original in place
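The check-then-add pattern above can be wrapped in a small helper. The following is a minimal sketch, not part of the PySpark API: the function name add_column_if_missing and its case_sensitive parameter are hypothetical names chosen for illustration. It assumes Spark's default behaviour of matching column names case-insensitively, so by default it compares lower-cased names.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # imported for type hints only
    from pyspark.sql import Column, DataFrame


def add_column_if_missing(df: "DataFrame", name: str, column: "Column",
                          case_sensitive: bool = False) -> "DataFrame":
    """Return df with `name` added, or df itself if the column already exists.

    Hypothetical helper for illustration; not a PySpark function.
    """
    if case_sensitive:
        exists = name in df.columns
    else:
        # Assumption: Spark resolves column names case-insensitively by
        # default, so 'Salary' and 'salary' are treated as the same column.
        exists = name.lower() in (c.lower() for c in df.columns)

    if exists:
        return df

    # withColumn() returns a new DataFrame; the input is left unchanged.
    return df.withColumn(name, column)
```

With a DataFrame named dataframe, a call might look like dataframe = add_column_if_missing(dataframe, "salary", lit(34000)), where lit comes from pyspark.sql.functions. The result is assigned back because the helper, like withColumn(), returns a DataFrame instead of changing one in place.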

Example:

In this example, we add a salary column with the value 34000 in every row, using an if condition together with withColumn() and the lit() function.

Python3

# importing module
import pyspark
 
# import the lit function
from pyspark.sql.functions import lit
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# add salary column by checking its existence
if 'salary' not in dataframe.columns:
    dataframe.withColumn("salary", lit(34000)).show()

Output:

How to add a new column to a PySpark DataFrame?

In this article, we will discuss how to add a new column to a PySpark DataFrame.

Create the first data frame for demonstration:

Here, we create the sample data frame that is used further on to demonstrate the approach.

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
dataframe.show()

Output:
