pyspark.sql.functions.array_max

pyspark.sql.functions.array_max(col)

Array function: returns the maximum value of the array. NULL elements are skipped, and an empty array yields NULL.

New in version 2.4.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters
col : Column or str

The name of the column or an expression that represents the array.

Returns
Column

A new column that contains the maximum value of each array.

Examples

Example 1: Basic usage with integer array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([2, 1, 3],), ([None, 10, -1],)], ['data'])
>>> df.select(sf.array_max(df.data)).show()
+---------------+
|array_max(data)|
+---------------+
|              3|
|             10|
+---------------+

Example 2: Usage with string array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['apple', 'banana', 'cherry'],)], ['data'])
>>> df.select(sf.array_max(df.data)).show()
+---------------+
|array_max(data)|
+---------------+
|         cherry|
+---------------+

Example 3: Usage with mixed type array

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(['apple', 1, 'cherry'],)], ['data'])
>>> df.select(sf.array_max(df.data)).show()
+---------------+
|array_max(data)|
+---------------+
|         cherry|
+---------------+

Example 4: Usage with array of arrays

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([([[2, 1], [3, 4]],)], ['data'])
>>> df.select(sf.array_max(df.data)).show()
+---------------+
|array_max(data)|
+---------------+
|         [3, 4]|
+---------------+

Example 5: Usage with empty array

>>> from pyspark.sql import functions as sf
>>> from pyspark.sql.types import ArrayType, IntegerType, StructType, StructField
>>> schema = StructType([
...   StructField("data", ArrayType(IntegerType()), True)
... ])
>>> df = spark.createDataFrame([([],)], schema=schema)
>>> df.select(sf.array_max(df.data)).show()
+---------------+
|array_max(data)|
+---------------+
|           NULL|
+---------------+
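
The per-row behaviour shown in the examples above (NULL elements skipped, NULL returned for an empty array) can be approximated in plain Python. This is only a sketch for building intuition, not how Spark implements the function: `array_max_sketch` is a hypothetical helper name, and the handling of a NULL array and of an all-NULL array is an assumption that the examples above do not demonstrate. The sketch relies on Python's own ordering and does not try to reproduce Spark's type-specific comparison rules.

```python
from typing import Any, Optional, Sequence


def array_max_sketch(values: Optional[Sequence[Any]]) -> Any:
    """Hypothetical pure-Python stand-in for array_max applied to one row."""
    if values is None:
        # Assumption: a NULL array produces a NULL result.
        return None
    # Skip NULL (None) elements, as in Example 1.
    non_null = [v for v in values if v is not None]
    if not non_null:
        # Empty array (Example 5); an all-NULL array is assumed to behave the same.
        return None
    return max(non_null)


print(array_max_sketch([2, 1, 3]))                     # 3
print(array_max_sketch([None, 10, -1]))                # 10
print(array_max_sketch(["apple", "banana", "cherry"]))  # cherry
print(array_max_sketch([]))                            # None
```

The printed values line up with the rows produced in Examples 1, 2, and 5, with Python's `None` standing in for NULL.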