Example 1: Creating a JSON structure from a PySpark DataFrame
In this example, we will create a PySpark DataFrame and convert it to a JSON string. First, import all required modules and create a SparkSession. Define the DataFrame schema using StructField(), create the DataFrame with the createDataFrame() function, convert it to JSON with the toJSON() function, and print the resulting JSON string. Finally, save the JSON string to the "example1.json" file using file handling in Python.
Python3
from pyspark.sql.functions import *
from pyspark.sql.types import *
from pyspark.sql import SparkSession

# Create a SparkSession
spark = SparkSession.builder.appName("JSON Creation").getOrCreate()

# Define the PySpark DataFrame schema
schema = StructType([
    StructField("name", StringType()),
    StructField("age", IntegerType()),
    StructField("city", StringType())
])

# Create a PySpark DataFrame
data = [("Shyam", 25, "New York"),
        ("Ram", 30, "San Francisco")]
df = spark.createDataFrame(data, schema)

# Convert the first row of the DataFrame to a JSON string
json_string = df.toJSON().collect()[0]
print(json_string)

# Write the JSON string to file
with open("example1.json", "w") as f:
    f.write(json_string)
Output:
{"name":"Shyam","age":25,"city":"New York"}
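Note that `collect()[0]` keeps only the first row; `df.toJSON().collect()` actually returns one JSON string per row, which is conventionally joined with newlines (the JSON Lines format Spark itself writes). A minimal sketch of that row-per-line shape, using plain Python dicts and the standard `json` module in place of a live DataFrame:

```python
import json

# Rows as plain Python dicts, mirroring the DataFrame built above
rows = [
    {"name": "Shyam", "age": 25, "city": "New York"},
    {"name": "Ram", "age": 30, "city": "San Francisco"},
]

# df.toJSON().collect() yields one compact JSON document per row;
# joining with newlines reproduces the JSON Lines layout.
json_lines = "\n".join(json.dumps(r, separators=(",", ":")) for r in rows)
print(json_lines)
# {"name":"Shyam","age":25,"city":"New York"}
# {"name":"Ram","age":30,"city":"San Francisco"}
```

To save every row instead of just the first, the same joined string can be written to the file in place of `json_string`.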
Create a JSON structure in PySpark
In this article, we are going to learn how to create a JSON structure using PySpark in Python.
PySpark, the Python interface for Apache Spark, is a powerful and widely used tool for working with massive amounts of data. It is a distributed processing system built for managing large datasets: it not only lets us create Spark applications in Python, but also provides the PySpark shell for interactively inspecting data in a distributed environment. In this article, we'll look at how to use PySpark to construct a JSON structure.
In order to build a JSON structure in PySpark, a PySpark DataFrame must be converted into a JSON string. PySpark's built-in modules and functions make this conversion straightforward. The following PySpark components will be used in this article:
- pyspark.sql.functions: provides built-in functions for working with PySpark DataFrames.
- pyspark.sql.types: provides data types for defining a PySpark DataFrame schema.