Add New Column With Constant Value

In this approach, a new column with a constant value is added by passing the lit() function as the second argument to withColumn(). The lit() function is available in the pyspark.sql.functions module.

Syntax:

dataframe.withColumn("column_name", lit(value))

where,

  • dataframe is the pyspark input dataframe
  • column_name is the new column to be added
  • value is the constant value to be assigned to this column
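
The same pattern works for other constant types as well. Below is a minimal sketch, assuming dataframe is the sample dataframe created later in this article; the column names department and bonus are hypothetical:

from pyspark.sql.functions import lit

# add a string constant column (hypothetical name "department")
df_str = dataframe.withColumn("department", lit("engineering"))

# add a numeric constant and cast it to an explicit type (hypothetical name "bonus")
df_num = dataframe.withColumn("bonus", lit(1000).cast("int"))

df_num.show()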

Example:

In this example, we add a column named salary with a constant value of 34000 to the dataframe, using the withColumn() function with lit() as its argument.

Python3




# importing module
import pyspark
 
# import lit function
from pyspark.sql.functions import lit
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# Add a column named salary with value as 34000
dataframe.withColumn("salary", lit(34000)).show()


Output:
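
The show() call above should print something close to the following (exact column widths can vary with the Spark version):

+---+-------+---------+------+
| ID|   NAME|  Company|salary|
+---+-------+---------+------+
|  1| sravan|company 1| 34000|
|  2| ojaswi|company 1| 34000|
|  3| rohith|company 2| 34000|
|  4|sridevi|company 1| 34000|
|  5|  bobby|company 1| 34000|
+---+-------+---------+------+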

How to add a new column to a PySpark DataFrame?

In this article, we will discuss how to add a new column to a PySpark DataFrame.

Create the dataframe for demonstration:

Here, we create a sample dataframe that is used throughout the article to demonstrate each approach.

Python3




# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
dataframe.show()


Output:
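
The dataframe.show() call above should print something close to the following (exact column widths can vary with the Spark version):

+---+-------+---------+
| ID|   NAME|  Company|
+---+-------+---------+
|  1| sravan|company 1|
|  2| ojaswi|company 1|
|  3| rohith|company 2|
|  4|sridevi|company 1|
|  5|  bobby|company 1|
+---+-------+---------+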
