Apache Spark — Column Object

BigDataEnthusiast
3 min read · Oct 8, 2022


There are many ways to create a column object in Apache Spark.

Let’s explore.

A column can be created using the col and column functions from org.apache.spark.sql.functions.
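A minimal sketch (the column name here is just a placeholder):

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, column}

// Both functions return a free-standing Column that simply refers
// to a column named "first_name"; no DataFrame is involved yet.
val c1: Column = col("first_name")
val c2: Column = column("first_name")
```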

Another way is to use dollar ($) prefixed strings or single-quote (') prefixed Scala symbols. These only work if you have imported the spark.implicits._ conversions. Check this blog on Spark implicits for more details.

In the Scala spark-shell, spark.implicits._ is already imported, so you can create columns directly using $-prefixed strings.

Spark implicits includes the following members:

implicit class StringToColumn extends AnyRef
Converts $"col name" into a Column.

implicit def symbolToColumn(s: Symbol): ColumnName
An implicit conversion that turns a Scala Symbol into a Column.

See below example:
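A minimal sketch, assuming a SparkSession named spark (which is what spark-shell gives you):

```scala
import org.apache.spark.sql.{Column, SparkSession}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// StringToColumn: the $ interpolator turns the string into a ColumnName
val c1 = $"first_name"

// symbolToColumn: a Scala Symbol is implicitly converted to a ColumnName
// (symbol literals like 'first_name are deprecated in Scala 2.13+)
val c2: Column = 'first_name
```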

We can also create TypedColumns (columns that carry an encoder for their type). A TypedColumn is created by calling the as operator on a Column.

def as[U](implicit arg0: Encoder[U]): TypedColumn[Any, U]
Provides a type hint about the expected return value of this column. This information can be used by operations such as select on a Dataset to automatically convert the results into the correct JVM types.
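A sketch of as[U] in action, continuing with the spark session above (the Person case class is just an illustration; in compiled code it must be defined at the top level):

```scala
import spark.implicits._

case class Person(first_name: String, age: Long)

val ds = Seq(Person("Alice", 30L), Person("Bob", 25L)).toDS()

// $"age" is a generic Column; .as[Long] turns it into a
// TypedColumn[Any, Long], so select yields a Dataset[Long]
// instead of an untyped DataFrame.
val ages: org.apache.spark.sql.Dataset[Long] = ds.select($"age".as[Long])
ages.show()
```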

So far we have seen generic columns that are not tied to any DataFrame.

+----------------------+
| columns              |
+----------------------+
| $"first_name"        |
| 'first_name          |
| col("first_name")    |
| column("first_name") |
+----------------------+

A new column can be constructed based on the input columns present in a DataFrame:
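For example, with a hypothetical two-column DataFrame:

```scala
import spark.implicits._

val df = Seq(("Alice", 30), ("Bob", 25)).toDF("first_name", "age")

// Columns resolved against this specific DataFrame
val byApply = df("age")      // DataFrame.apply
val byCol   = df.col("age")  // Dataset.col

// A new column derived from an input column
df.select(byApply + 1).show()
```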

Extracting a struct field:
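A sketch using a struct built with the struct function:

```scala
import spark.implicits._
import org.apache.spark.sql.functions.struct

val people = Seq(("Alice", "Smith"), ("Bob", "Jones"))
  .toDF("first", "last")
  .select(struct($"first", $"last").as("name"))

// Two equivalent ways to extract a field from a struct column
people.select($"name.first").show()
people.select($"name".getField("last")).show()
```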

Column objects can be composed to form complex expressions:
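For instance, reusing the df sketched above: operators on Column each return a new Column, so expressions compose freely:

```scala
import org.apache.spark.sql.functions.{col, upper}

// Arithmetic, comparison, and boolean operators all yield new Columns
val isAdultAlice = (col("age") > 20) && (upper(col("first_name")) === "ALICE")

df.filter(isAdultAlice).show()
```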
