Apache Spark — Column Object

BigDataEnthusiast
3 min read · Oct 8, 2022


There are many ways to create a column object in Apache Spark.

Let’s explore.

A column can be created using the col and column functions from org.apache.spark.sql.functions.
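A minimal sketch (the column name here is just a placeholder):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, column}

// Both functions return a free-standing Column that simply refers
// to a column named "first_name"; no DataFrame is involved yet.
val c1: Column = col("first_name")
val c2: Column = column("first_name")
```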

Another way is to use dollar ($) prefixed strings or single-quote (') prefixed Scala symbols. These only work if you have imported the spark.implicits._ conversions. Check this blog on Spark implicits for more details.

In the Scala spark-shell, spark.implicits._ is already imported, so you can create columns directly using $-prefixed strings.

Spark implicits includes the following members:

implicit class StringToColumn extends AnyRef
Converts $"col name" into a Column.

implicit def symbolToColumn(s: Symbol): ColumnName
An implicit conversion that turns a Scala Symbol into a Column.

See below example:
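A minimal sketch, assuming a SparkSession named spark (which is what spark-shell gives you):

```scala
import org.apache.spark.sql.{Column, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// StringToColumn: the $ interpolator turns the string into a ColumnName
val c1 = $"first_name"

// symbolToColumn: a Scala Symbol is implicitly converted to a ColumnName
// (symbol literals like 'first_name are deprecated in Scala 2.13+)
val c2: Column = 'first_name
```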

We can also create TypedColumns (columns that carry an encoder for their type). A TypedColumn is created by calling the as operator on a Column.

def as[U](implicit arg0: Encoder[U]): TypedColumn[Any, U]
Provides a type hint about the expected return value of this column. This information can be used by operations such as select on a Dataset to automatically convert the results into the correct JVM types.
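A sketch of as[U] in action, continuing with the spark session above (the Person case class is just an illustration; in compiled code it must be defined at the top level):

```scala
import spark.implicits._

case class Person(first_name: String, age: Long)

val ds = Seq(Person("Alice", 30L), Person("Bob", 25L)).toDS()

// $"age" is a generic Column; .as[Long] turns it into a
// TypedColumn[Any, Long], so select yields a Dataset[Long]
// instead of an untyped DataFrame.
val ages: org.apache.spark.sql.Dataset[Long] = ds.select($"age".as[Long])
ages.show()
```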

So far we have seen generic columns that are not tied to any DataFrame.

+----------------------+
| columns              |
+----------------------+
| $"first_name"        |
| 'first_name          |
| col("first_name")    |
| column("first_name") |
+----------------------+

A new column can be constructed based on the input columns present in a DataFrame:
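For example, with a hypothetical two-column DataFrame:

```scala
import spark.implicits._

val df = Seq(("Alice", 30), ("Bob", 25)).toDF("first_name", "age")

// Columns resolved against this specific DataFrame
val byApply = df("age")      // DataFrame.apply
val byCol   = df.col("age")  // Dataset.col

// A new column derived from an input column
df.select(byApply + 1).show()
```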

Extracting a struct field:
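A sketch using a struct built with the struct function:

```scala
import spark.implicits._
import org.apache.spark.sql.functions.struct

val people = Seq(("Alice", "Smith"), ("Bob", "Jones"))
  .toDF("first", "last")
  .select(struct($"first", $"last").as("name"))

// Two equivalent ways to extract a field from a struct column
people.select($"name.first").show()
people.select($"name".getField("last")).show()
```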

Column objects can be composed to form complex expressions:
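For instance, reusing the df sketched above: operators on Column each return a new Column, so expressions compose freely:

```scala
import org.apache.spark.sql.functions.{col, upper}

// Arithmetic, comparison, and boolean operators all yield new Columns
val isAdultAlice = (col("age") > 20) && (upper(col("first_name")) === "ALICE")

df.filter(isAdultAlice).show()
```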
