How to use unionByName() in Python
In PySpark you can easily concatenate two DataFrames with unionByName(), which matches columns by name rather than by position (the method is available since Spark 2.3; its allowMissingColumns option was added in Spark 3.1).
Syntax: dataframe_1.unionByName(dataframe_2)
where,
- dataframe_1 is the first dataframe
- dataframe_2 is the second dataframe
Example:
Python3
# Union the two dataframes by using unionByName()
result1 = df1.unionByName(df2)

# Display the result
result1.show()
Output:
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|   Ram|1991-04-01|     M|  3000|
|  Mike|2000-05-19|     M|  4000|
|Rohini|1978-09-05|     M|  4000|
| Maria|1967-12-01|     F|  4000|
| Jenis|1980-02-17|     F|  1200|
|  Mohi|1991-04-01|     M|  3000|
|   Ani|2000-05-19|     F|  4300|
|Shipta|1978-09-05|     F|  4200|
| Jessy|1967-12-01|     F|  4010|
| kanne|1980-02-17|     F|  1200|
+------+----------+------+------+
Concatenate two PySpark dataframes
In this article, we are going to see how to concatenate two PySpark DataFrames using Python.
Creating Dataframe for demonstration:
Python3
# Importing necessary libraries
from pyspark.sql import SparkSession

# Create a spark session
spark = SparkSession.builder.appName('pyspark - example join').getOrCreate()

# Create data in dataframe
data = [('Ram', '1991-04-01', 'M', 3000),
        ('Mike', '2000-05-19', 'M', 4000),
        ('Rohini', '1978-09-05', 'M', 4000),
        ('Maria', '1967-12-01', 'F', 4000),
        ('Jenis', '1980-02-17', 'F', 1200)]

# Column names in dataframe
columns = ["Name", "DOB", "Gender", "salary"]

# Create the spark dataframe
df1 = spark.createDataFrame(data=data, schema=columns)

# Print the dataframe
df1.show()
Output:
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|   Ram|1991-04-01|     M|  3000|
|  Mike|2000-05-19|     M|  4000|
|Rohini|1978-09-05|     M|  4000|
| Maria|1967-12-01|     F|  4000|
| Jenis|1980-02-17|     F|  1200|
+------+----------+------+------+
Creating Second dataframe for demonstration:
Python3
# Create data in dataframe
data2 = [('Mohi', '1991-04-01', 'M', 3000),
         ('Ani', '2000-05-19', 'F', 4300),
         ('Shipta', '1978-09-05', 'F', 4200),
         ('Jessy', '1967-12-01', 'F', 4010),
         ('kanne', '1980-02-17', 'F', 1200)]

# Column names in dataframe
columns = ["Name", "DOB", "Gender", "salary"]

# Create the spark dataframe (from data2, not data)
df2 = spark.createDataFrame(data=data2, schema=columns)

# Print the dataframe
df2.show()
Output:
+------+----------+------+------+
|  Name|       DOB|Gender|salary|
+------+----------+------+------+
|  Mohi|1991-04-01|     M|  3000|
|   Ani|2000-05-19|     F|  4300|
|Shipta|1978-09-05|     F|  4200|
| Jessy|1967-12-01|     F|  4010|
| kanne|1980-02-17|     F|  1200|
+------+----------+------+------+