pyspark.sql.DataFrame.__getitem__

DataFrame.__getitem__(item: Union[int, str, pyspark.sql.column.Column, List, Tuple]) → Union[pyspark.sql.column.Column, pyspark.sql.dataframe.DataFrame]

Returns a Column when given a column name or index, or a filtered or projected DataFrame when given a Column, list, or tuple.

New in version 1.3.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
item : int, str, Column, list or tuple

column index, column name, column, or a list or tuple of columns

Returns
Column or DataFrame

a specified column, or a filtered or projected DataFrame.

  • If the input item is an int or str, the output is a Column.

  • If the input item is a Column, the output is a DataFrame filtered by this given Column.

  • If the input item is a list or tuple, the output is a DataFrame projected by this given list or tuple.
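
The dispatch on the type of item can be illustrated without a Spark session. The following is a minimal sketch in plain Python, not PySpark's implementation: ToyFrame, ToyColumn, and ToyCondition are hypothetical stand-ins invented for this illustration, and the toy list branch accepts only column names, whereas the real method accepts a list or tuple of columns.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ToyColumn:
    """Hypothetical stand-in for Column: a named reference into a row."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __gt__(self, value: Any) -> "ToyCondition":
        # Comparing a column yields a row-level boolean condition.
        return ToyCondition(lambda row: row[self.name] > value)


class ToyCondition:
    """Hypothetical stand-in for a boolean Column expression."""

    def __init__(self, test: Callable[[Row], bool]) -> None:
        self.test = test


class ToyFrame:
    """Hypothetical in-memory frame; NOT pyspark.sql.DataFrame."""

    def __init__(self, rows: List[Row], columns: List[str]) -> None:
        self.rows = rows
        self.columns = columns

    def __getitem__(self, item: Any) -> Any:
        if isinstance(item, str):
            # Name lookup: the result is a column reference.
            return ToyColumn(item)
        if isinstance(item, int):
            # Positional lookup: resolve the index to a name first.
            return ToyColumn(self.columns[item])
        if isinstance(item, ToyCondition):
            # Condition: the result is a frame holding only matching rows.
            kept = [row for row in self.rows if item.test(row)]
            return ToyFrame(kept, self.columns)
        if isinstance(item, (list, tuple)):
            # Names: the result is a frame with those columns, in that order.
            names = list(item)
            projected = [{n: row[n] for n in names} for row in self.rows]
            return ToyFrame(projected, names)
        raise TypeError(f"unexpected item type: {type(item).__name__}")


df = ToyFrame(
    [{"age": 2, "name": "Alice"}, {"age": 5, "name": "Bob"}],
    ["age", "name"],
)
by_index = df[1]                 # column reference for "name"
filtered = df[df["age"] > 3]     # frame keeping only Bob's row
projected = df[["name", "age"]]  # frame with columns reordered
```

The toy mirrors the three cases listed above: a name or index gives back a column reference, a condition gives back a frame with fewer rows, and a list or tuple gives back a frame with the selected columns.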

Examples

>>> df = spark.createDataFrame([
...     (2, "Alice"), (5, "Bob")], schema=["age", "name"])

Retrieve a column instance.

>>> df.select(df['age']).show()
+---+
|age|
+---+
|  2|
|  5|
+---+
>>> df.select(df[1]).show()
+-----+
| name|
+-----+
|Alice|
|  Bob|
+-----+

Select multiple columns by passing a list of column names.

>>> df[["name", "age"]].show()
+-----+---+
| name|age|
+-----+---+
|Alice|  2|
|  Bob|  5|
+-----+---+

Filter rows by passing a boolean Column.

>>> df[df.age > 3].show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
+---+----+
>>> df[df[0] > 3].show()
+---+----+
|age|name|
+---+----+
|  5| Bob|
+---+----+