How to create StructType schema from JSON schema | PySpark
Using the fromJson method of the Apache Spark class pyspark.sql.types.StructType, we can create a StructType schema from a defined JSON schema.
Refer to the official documentation for details.

classmethod fromJson(json: Dict[str, Any]) → pyspark.sql.types.StructType
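Here is a simple example of converting a JSON schema to a StructType. To get the correct StructType, we have to adhere to the JSON schema syntax that Spark expects. Note that fromJson takes a Python dict, so the JSON string is parsed with json.loads first.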
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
import json
spark = SparkSession.builder.getOrCreate()
json_schema = """
{
"fields": [
{
"metadata": {},
"name": "age",
"nullable": true,
"type": "long"
},
{
"metadata": {},
"name": "firstName",
"nullable": true,
"type": "string"
},
{
"metadata": {},
"name": "lastName",
"nullable": true,
"type": "string"
}
],
"type": "struct"
}
"""
structType_schema = StructType.fromJson(json.loads(json_schema))
print(structType_schema)
Output:
StructType(List(StructField(age,LongType,true),StructField(firstName,StringType,true),StructField(lastName,StringType,true)))
In another example, we read a complex JSON file using spark.read and print its JSON schema. DataFrame.schema returns a StructType, and the StructType.json() method returns that schema as a JSON string.
import json
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType
spark = SparkSession.builder.getOrCreate()
df = spark.read.option('multiLine', True).json("data/cust.json")
json_schema = df.schema.json()
print(json_schema)
Output (formatted for readability):
{
"fields":[
{
"metadata":{
},
"name":"address",
"nullable":true,
"type":{
"fields":[
{
"metadata":{
},
"name":"city",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"postalCode",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"state",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"streetAddress",
"nullable":true,
"type":"string"
}
],
"type":"struct"
}
},
{
"metadata":{
},
"name":"age",
"nullable":true,
"type":"long"
},
{
"metadata":{
},
"name":"firstName",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"lastName",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"phoneNumber",
"nullable":true,
"type":{
"containsNull":true,
"elementType":{
"fields":[
{
"metadata":{
},
"name":"number",
"nullable":true,
"type":"string"
},
{
"metadata":{
},
"name":"type",
"nullable":true,
"type":"string"
}
],
"type":"struct"
},
"type":"array"
}
}
],
"type":"struct"
}
We can use this approach to build more complex schemas, including nested structs and arrays of structs.