How to create StructType schema from JSON schema | PySpark

BigDataEnthusiast
Nov 25, 2024

The fromJson class method of pyspark.sql.types.StructType builds a StructType schema from a schema defined as JSON. It takes the parsed JSON as a Python dict, not the raw JSON string.

Refer to the official PySpark documentation for details.

classmethod fromJson(json: Dict[str, Any]) → pyspark.sql.types.StructType

Here is a simple example of converting a JSON schema to a StructType. To get the correct StructType, the JSON has to adhere to the syntax Spark uses for schemas.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
import json

spark = SparkSession.builder.getOrCreate()

# Schema in the JSON layout Spark uses: a "struct" with a list of fields.
json_schema = """
{
  "fields": [
    {
      "metadata": {},
      "name": "age",
      "nullable": true,
      "type": "long"
    },
    {
      "metadata": {},
      "name": "firstName",
      "nullable": true,
      "type": "string"
    },
    {
      "metadata": {},
      "name": "lastName",
      "nullable": true,
      "type": "string"
    }
  ],
  "type": "struct"
}
"""

# fromJson expects a dict, so parse the JSON string first.
structType_schema = StructType.fromJson(json.loads(json_schema))
print(structType_schema)

Output (the exact formatting of the printed StructType varies between Spark versions):

StructType(List(StructField(age,LongType,true),StructField(firstName,StringType,true),StructField(lastName,StringType,true)))
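Because fromJson depends on the JSON following Spark's layout, it can help to check a schema dict before parsing it. The sketch below is only an illustration: check_struct_schema is a hypothetical helper, not part of PySpark, and it assumes every field needs the four keys shown in the example above ("name", "type", "nullable", "metadata"). Which keys are strictly required may differ between Spark versions.

```python
import json


def check_struct_schema(schema: dict, path: str = "root") -> list:
    """Return a list of problems found in a Spark-style JSON struct schema.

    Hypothetical helper; assumes all four field keys are required.
    """
    problems = []
    if schema.get("type") != "struct":
        problems.append(f"{path}: expected type 'struct'")
    fields = schema.get("fields")
    if not isinstance(fields, list):
        problems.append(f"{path}: 'fields' must be a list")
        return problems
    for i, field in enumerate(fields):
        name = field.get("name", f"<field {i}>")
        for key in ("name", "type", "nullable", "metadata"):
            if key not in field:
                problems.append(f"{path}.{name}: missing key '{key}'")
        # Nested structs carry their own "fields" list, so recurse into them.
        ftype = field.get("type")
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            problems.extend(check_struct_schema(ftype, f"{path}.{name}"))
    return problems


good = json.loads(
    '{"type": "struct", "fields": '
    '[{"name": "age", "type": "long", "nullable": true, "metadata": {}}]}'
)
bad = json.loads(
    '{"type": "struct", "fields": '
    '[{"name": "age", "nullable": true, "metadata": {}}]}'
)

print(check_struct_schema(good))
print(check_struct_schema(bad))
```

An empty list means the dict has the expected shape and can be handed to StructType.fromJson.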

In another example, we read a complex JSON file using spark.read and print its JSON schema. DataFrame.schema returns a StructType, and the StructType.json() method returns that schema as a JSON string.

import json
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType

spark = SparkSession.builder.getOrCreate()

df = spark.read.option('multiLine', True).json("data/cust.json")
json_schema = df.schema.json()
print(json_schema)

Output (formatted here for readability):

{
  "fields": [
    {
      "metadata": {},
      "name": "address",
      "nullable": true,
      "type": {
        "fields": [
          {
            "metadata": {},
            "name": "city",
            "nullable": true,
            "type": "string"
          },
          {
            "metadata": {},
            "name": "postalCode",
            "nullable": true,
            "type": "string"
          },
          {
            "metadata": {},
            "name": "state",
            "nullable": true,
            "type": "string"
          },
          {
            "metadata": {},
            "name": "streetAddress",
            "nullable": true,
            "type": "string"
          }
        ],
        "type": "struct"
      }
    },
    {
      "metadata": {},
      "name": "age",
      "nullable": true,
      "type": "long"
    },
    {
      "metadata": {},
      "name": "firstName",
      "nullable": true,
      "type": "string"
    },
    {
      "metadata": {},
      "name": "lastName",
      "nullable": true,
      "type": "string"
    },
    {
      "metadata": {},
      "name": "phoneNumber",
      "nullable": true,
      "type": {
        "containsNull": true,
        "elementType": {
          "fields": [
            {
              "metadata": {},
              "name": "number",
              "nullable": true,
              "type": "string"
            },
            {
              "metadata": {},
              "name": "type",
              "nullable": true,
              "type": "string"
            }
          ],
          "type": "struct"
        },
        "type": "array"
      }
    }
  ],
  "type": "struct"
}
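StructType.json() returns the schema as a compact, single-line string, so a multi-line listing like the one above comes from formatting that string afterwards. A minimal sketch of doing so with the standard json module follows. The compact string here is a shortened, hand-written stand-in for df.schema.json(), so the example runs without a Spark session.

```python
import json

# Stand-in for df.schema.json(): a compact, single-line schema string.
compact = (
    '{"fields":[{"metadata":{},"name":"age","nullable":true,"type":"long"}],'
    '"type":"struct"}'
)

# Parse the string and dump it again with indentation for readability.
pretty = json.dumps(json.loads(compact), indent=2, sort_keys=True)
print(pretty)
```

The formatted text still parses to the same dict, so json.loads(pretty) can be passed to StructType.fromJson just like the compact form.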

We can use the same approach to build more complex schemas: export the JSON schema of an existing DataFrame and load it back with StructType.fromJson.
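One way to build a more complex schema is to edit the schema dict before converting it. The sketch below appends a nested struct field to a small base schema using only the standard library. with_field is a hypothetical helper introduced for illustration, and the field names are made up for the example.

```python
import copy
import json


def with_field(schema: dict, name: str, dtype, nullable: bool = True) -> dict:
    """Return a copy of a struct schema dict with one extra field appended.

    Hypothetical helper, not part of PySpark.
    """
    extended = copy.deepcopy(schema)  # leave the caller's dict untouched
    extended["fields"].append(
        {"metadata": {}, "name": name, "nullable": nullable, "type": dtype}
    )
    return extended


base = json.loads(
    '{"type": "struct", "fields": '
    '[{"metadata": {}, "name": "age", "nullable": true, "type": "long"}]}'
)

# A nested struct type is itself a dict with "type": "struct" and its own "fields".
address_type = {
    "type": "struct",
    "fields": [
        {"metadata": {}, "name": "city", "nullable": True, "type": "string"},
    ],
}

extended = with_field(base, "address", address_type)
print([f["name"] for f in extended["fields"]])
```

With PySpark available, the extended dict can then be passed to StructType.fromJson in the same way as the earlier examples.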

Written by BigDataEnthusiast

AWS Certified Data Engineer | Databricks Certified Apache Spark 3.0 Developer | Oracle Certified SQL Expert
