How to read multiline JSON in Spark



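Suppose the data is in the multiline format shown in example.json.

In multiline JSON, a single JSON document spans several lines. Here each row is declared as an individual JSON object, and the objects are enclosed together in square brackets as one array.

Spark cannot split a multiline JSON file when loading it, because a record may cross line boundaries, so each file is read as a whole. Single-line JSON (one object per line) is splittable.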
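The difference in splittability can be seen without Spark. The sketch below is only an illustration using Python's standard json module, not how Spark is implemented: it treats every line as its own JSON document, the way a line-oriented reader would. The helper name count_parseable_lines and the sample strings are made up for this example.

```python
import json

# Three sample records in two layouts (illustrative data).
json_lines_text = "\n".join([
    '{"name": "shiva", "gender": "male", "id": "1"}',
    '{"name": "ram", "gender": "male", "id": "2"}',
    '{"name": "raju", "gender": "male", "id": "3"}',
])
# The same records wrapped in one array that spans several lines.
multiline_text = "[\n" + ",\n".join(json_lines_text.splitlines()) + "\n]"


def count_parseable_lines(text):
    """Return (parsed, failed) counts when each line is parsed as its own JSON document."""
    parsed = failed = 0
    for line in text.splitlines():
        try:
            json.loads(line)
            parsed += 1
        except json.JSONDecodeError:
            failed += 1
    return parsed, failed


print(count_parseable_lines(json_lines_text))  # every line is a complete record
print(count_parseable_lines(multiline_text))   # most lines are not valid on their own
print(len(json.loads(multiline_text)))         # the whole document parses as one array
```

In the single-line layout every line parses, so a reader can start at any line boundary. In the array layout the brackets and the lines ending in a comma fail on their own, which is why the file has to be parsed as a whole.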

example.json

[
{"name": "shiva","gender": "male","id": "1"},
{"name": "ram","gender": "male","id": "2"},
{"name": "raju","gender": "male","id": "3"}
]

Each object in the array becomes one row:

Row 1: {"name": "shiva","gender": "male","id": "1"}
Row 2: {"name": "ram","gender": "male","id": "2"}
Row 3: {"name": "raju","gender": "male","id": "3"}
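Since each object in the array maps to one row, the same data can also be stored with one object per line (JSON Lines), which is the layout Spark's JSON reader expects by default. The sketch below is one possible way to do that conversion with the standard library. The function name array_to_json_lines is hypothetical, and it loads the whole array into memory, so it assumes a file small enough for one machine.

```python
import json


def array_to_json_lines(array_text):
    """Convert a JSON array of objects into JSON Lines text (one object per line)."""
    records = json.loads(array_text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    # json.dumps emits each record on a single line by default.
    return "\n".join(json.dumps(record) for record in records)


# Contents of example.json from this post.
example_json = """[
{"name": "shiva","gender": "male","id": "1"},
{"name": "ram","gender": "male","id": "2"},
{"name": "raju","gender": "male","id": "3"}
]"""

json_lines = array_to_json_lines(example_json)
print(json_lines)
```

A file written in this layout can be loaded with spark.read.json(path) without the multiline option.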


Source Code:

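# `spark` is an existing SparkSession (for example, the one provided by the pyspark shell).
# multiLine=true makes Spark parse the whole file as one JSON document instead of one record per line.
df = spark.read.option("multiLine", "true").json("s3://mybucket/example.json")

# Print the parsed rows.
df.show()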


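Output:

+------+---+-----+
|gender| id| name|
+------+---+-----+
|  male|  1|shiva|
|  male|  2|  ram|
|  male|  3| raju|
+------+---+-----+
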