Read Large JSON files in R using read_json()
read_json() is a function from the jsonlite package that reads a JSON file and converts it into an R object. Note that it parses the whole file in a single call rather than line by line, so the parsed data must fit in memory. For data that is too large for that, jsonlite also provides stream_in(), which processes newline-delimited JSON in batches.
Install the jsonlite library and load it
One of the most popular packages for reading JSON files in R is jsonlite. This package provides a simple and efficient way to parse JSON data and convert it into an R object. To install and load jsonlite, you can use the following commands:
install.packages("jsonlite")
library(jsonlite)
Creating a Random Dataset
Here we create our own dataset with one million rows. You can generate your own in the same way, or use any large JSON dataset you already have.
R
library(jsonlite)

# number of rows to generate
n <- 1000000

# generate random id
generate_id <- function() {
  paste0(sample(c(letters, LETTERS, 0:9), 10, replace = TRUE), collapse = "")
}

# real first names of people
first_names <- c("John", "Jane", "Michael", "Emily", "William", "Ashley",
                 "David", "Jessica", "Andrew", "Jennifer", "Matthew", "Sarah",
                 "Daniel", "Amanda", "Christopher", "Elizabeth", "Nicholas",
                 "Megan", "Robert", "Lauren", "Joseph", "Ava", "Jacob",
                 "Sophia", "Jonathan", "Natalie", "Ryan", "Madison", "Adam",
                 "Chloe")

# real last names of people
last_names <- c("Smith", "Johnson", "Williams", "Jones", "Brown", "Davis",
                "Miller", "Wilson", "Moore", "Taylor", "Anderson", "Thomas",
                "Jackson", "White", "Harris", "Martin", "Thompson", "Garcia",
                "Martinez", "Robinson", "Clark", "Rodriguez", "Lewis", "Lee",
                "Walker", "Hall", "Allen", "King", "Wright", "Scott")

# education qualifications
qualifications <- c("Primary Education", "Secondary Education", "High School",
                    "Undergraduate", "Postgraduate")

# create a data frame
df <- data.frame(
  ID = sapply(seq_len(n), function(i) generate_id()),
  First_Name = sample(first_names, n, replace = TRUE),
  Last_Name = sample(last_names, n, replace = TRUE),
  Age = sample(18:30, n, replace = TRUE),
  Highest_qualification = sample(qualifications, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# write the data frame to a JSON file
write_json(df, "people.json")
You can check the size of the file, reported in bytes, using the following code.
R
file.info("people.json")$size
Output:
113428352
Read the JSON file into R
By default, the read_json() function returns the parsed JSON as nested R lists. If you pass simplifyVector = TRUE, it converts JSON arrays into vectors, and arrays of records into a data frame, where possible. Once you have the data in an R object, you can use the standard R functions and packages to manipulate and analyze it.
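To make the difference between the two return types concrete, here is a minimal sketch. It assumes a tiny three-row data frame written to a temporary file; the names small, tmp, as_list and as_df are illustrative and not part of the original example.

```r
library(jsonlite)

# Write a tiny JSON array of records to a temporary file (illustrative data)
tmp <- tempfile(fileext = ".json")
small <- data.frame(
  ID = c("a1", "b2", "c3"),
  Age = c(21L, 25L, 30L),
  stringsAsFactors = FALSE
)
write_json(small, tmp)

# Default (simplifyVector = FALSE): one list element per record
as_list <- read_json(tmp)
class(as_list)
length(as_list)

# simplifyVector = TRUE: the array of records becomes a data frame
as_df <- read_json(tmp, simplifyVector = TRUE)
class(as_df)
str(as_df)
```

For tabular data such as the people dataset, the data frame form is usually the more convenient one to work with, while the list form preserves the original nesting of the JSON.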
You can use the read_json() function to read a JSON file into R. For example, to read the “people.json” file created above from your working directory, you would use the following code:
R
data <- jsonlite::read_json("people.json")
head(data, 3)
How to Read Large JSON Files in R
First, it is important to understand that JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and easy for machines to parse and generate. JSON files are often used to transmit data between a server and a web application, and they can be quite large.
In this article, we’ll cover the basics of using read_json and split to read large JSON files in R. We’ll also explore some advanced techniques for optimizing performance and reducing memory usage. Whether you’re a seasoned R programmer or a beginner, this article will provide you with the knowledge and skills you need to read large JSON files in R with confidence.
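As one possible way to reduce memory usage, the sketch below streams newline-delimited JSON (NDJSON) in batches with jsonlite's stream_out() and stream_in(), keeping only a running summary instead of the full dataset. This swaps in a different technique from read_json(), and it assumes the data is stored as one JSON object per line; a file written with write_json() is a single JSON array and would first have to be rewritten in that form. The small records data frame and the names ndjson_path, rows_seen and age_total are illustrative.

```r
library(jsonlite)

# Illustrative data; in practice this would be the large dataset
records <- data.frame(
  ID = sprintf("id%03d", 1:250),
  Age = rep(18:22, 50),
  stringsAsFactors = FALSE
)

# stream_out() writes one JSON object per line (NDJSON)
ndjson_path <- tempfile(fileext = ".ndjson")
stream_out(records, file(ndjson_path), verbose = FALSE)

# Running totals updated by the handler; only one page is in memory at a time
rows_seen <- 0
age_total <- 0

# stream_in() parses the file in pages of `pagesize` records and passes
# each page (a data frame) to the handler
stream_in(
  file(ndjson_path),
  handler = function(page) {
    rows_seen <<- rows_seen + nrow(page)
    age_total <<- age_total + sum(page$Age)
  },
  pagesize = 100,
  verbose = FALSE
)

rows_seen               # number of records processed
age_total / rows_seen   # mean age computed without loading all rows at once
```

The pagesize value controls the trade-off: larger pages mean fewer handler calls but more memory per batch.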