Apache Spark — Column Object

BigDataEnthusiast
3 min readOct 8, 2022

There are many ways to create a column object in Apache Spark.

Let’s explore.

A column can be created using the col & column function.

Another way is by creating dollar($)/symbol — backtick(`) prefixed strings. This will only work if you have imported the spark.implicits conversion. Check this blog for spark implicits link

In spark-shell in scala spark.implicits is already imported. So you directly create columns using $ strings.

Spark Implicits has member:

implicit class StringToColumn extends AnyRef Converts $"col name" into a Column.
implicit def symbolToColumn(s: Symbol): ColumnName An implicit conversion that turns a Scala Symbol into a Column.

See below example:

We can also create TypedColumns (having encoder for type). TypedColumn is created using as operator on a Column.

def as[U](implicit arg0: Encoder[U]): TypedColumn[Any, U] Provides a type hint about the expected return value of this column. This information can be used by operations such as select on a Dataset to automatically convert the results into the correct JVM types.

Till now we have seen columns which are generic columns not tied with any Dataframe.

+------------------------------+
| columns |
+------------------------------+
| $”first_name” |
| `first_name |
| col(“first_name”) |
| column(“first_name”) |
+------------------------------+

A new column can be constructed based on the input columns present in a DataFrame:

Extracting a struct field:

Column objects can be composed to form complex expressions:

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

BigDataEnthusiast
BigDataEnthusiast

Written by BigDataEnthusiast

AWS Certified Data Engineer | Databricks Certified Apache Spark 3.0 Developer | Oracle Certified SQL Expert

No responses yet

Write a response