Example 1: Creating a JSON structure from a Pyspark DataFrame

In this example, we will create a PySpark DataFrame and convert it to a JSON string. First, import all the required modules and create a Spark session. Construct the PySpark DataFrame schema using StructField() and create the DataFrame with the createDataFrame() function. Transform the DataFrame into JSON strings using the toJSON() function and print the result. Finally, we save the JSON string to the file “example1.json” using file handling in Python.

Python3




from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import SparkSession
  
# Create a SparkSession
spark = SparkSession.builder.appName("JSON Creation").getOrCreate()
  
# Define the PySpark DataFrame schema
schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType()),
    StructField("city", StringType())
])
  
# Create a PySpark DataFrame
data = [("Shyam", 25, "New York"),
        ("Ram", 30, "San Francisco")]
df = spark.createDataFrame(data, schema)
  
# toJSON() yields one JSON string per row; take the first row's JSON string
json_string = df.toJSON().collect()[0]
  
print(json_string)
  
# Write the JSON string to a file
with open("example1.json", "w") as f:
    f.write(json_string)


Output:

{"name":"Shyam","age":25,"city":"New York"}

Create a JSON structure in Pyspark

In this article, we are going to learn how to create a JSON structure using Pyspark in Python.

PySpark, the Python interface for Apache Spark, is a powerful and widely used tool for working with massive amounts of data. It is a distributed processing system built for large datasets: it not only lets us create Spark applications in Python, but also provides the PySpark shell for interactively inspecting data in a distributed environment. In this article, we will look at how to use PySpark to construct a JSON structure.

To build a JSON structure in PySpark, a PySpark DataFrame must be converted into a JSON string. PySpark's built-in modules and functions make this transformation straightforward. The following PySpark components are used in this article; a short sketch showing them working together follows the list:

  • pyspark.sql.functions: provides built-in functions for working with PySpark DataFrames.
  • pyspark.sql.types: provides the data types used to define a PySpark DataFrame schema.
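
As a quick, hedged illustration of these two modules working together (the data, column names, and app name below are made-up placeholders, not taken from the article), to_json() and struct() from pyspark.sql.functions can turn selected columns into a JSON string column, while pyspark.sql.types supplies the schema:

from pyspark.sql import SparkSession
from pyspark.sql.functions import to_json, struct
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("Modules Sketch").getOrCreate()

# Schema defined with types from pyspark.sql.types
schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType())
])

df = spark.createDataFrame([("Asha", 28)], schema)

# to_json() + struct() from pyspark.sql.functions build a JSON string column
df_json = df.withColumn("json", to_json(struct("name", "age")))
df_json.show(truncate=False)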
