Add Column Based on Another Column of DataFrame

In this approach, we add a new column whose values are computed from an existing column of the DataFrame.

Example 1: Using withColumn() method

In this example, we pass withColumn() the name of the new column and a column expression built from the existing column. withColumn() returns a new DataFrame and leaves the original one unchanged. If the name matches a column that already exists, that column is replaced.

Syntax:

dataframe.withColumn("column_name", dataframe.existing_column)

where,

dataframe is the input DataFrame
column_name is the name of the new column
existing_column is the existing column that the new values are derived from

In this example, we add a column named salary whose value is the ID column multiplied by 2300:

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# Add a column named salary: ID holds strings, so cast it to int
# before multiplying by 2300 (otherwise Spark implicitly casts to double)
dataframe.withColumn("salary", dataframe.ID.cast("int") * 2300).show()

Output:

Example 2: Using concat_ws()

In this example, we concatenate two existing columns into a new column with concat_ws(), which is imported from the pyspark.sql.functions module. concat_ws() joins the column values with the given separator.

Syntax:

dataframe.withColumn("column_name", concat_ws("Separator", "existing_column1", "existing_column2"))

where,

dataframe is the input DataFrame
column_name is the name of the new column
existing_column1 and existing_column2 are the existing columns whose values are joined to form the values of the new column
Separator is the string inserted between the values of the two columns

Example:

In this example, we add a column named Details that combines the NAME and Company columns, separated by "-".

Python3

# importing module
import pyspark
 
# import concat_ws function
from pyspark.sql.functions import concat_ws
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
# Add a column named Details from Name and Company columns separated by -
dataframe.withColumn("Details", concat_ws("-", "NAME", 'Company')).show()

Output:

How to add a new column to a PySpark DataFrame?

In this article, we will discuss how to add a new column to a PySpark DataFrame.

Create the first data frame for demonstration:

Here, we create the sample DataFrame that the examples use to demonstrate each approach.

Python3

# importing module
import pyspark
 
# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession
 
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
 
# list  of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["5", "bobby", "company 1"]]
 
# specify column names
columns = ['ID', 'NAME', 'Company']
 
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
 
dataframe.show()

Output:
