spark.
transform
Applies a function that takes and returns a Spark column. It allows to natively apply a Spark function and column APIs with the Spark column internally used in Series or Index. The output length of the Spark column should be same as input’s.
Note
It requires to have the same input and output length; therefore, the aggregate Spark functions such as count does not work.
Function to use for transforming the data by using Spark columns.
Examples
>>> from pyspark.sql.functions import log >>> df = ps.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, columns=["a", "b"]) >>> df a b 0 1 4 1 2 5 2 3 6
>>> df.a.spark.transform(lambda c: log(c)) 0 0.000000 1 0.693147 2 1.098612 Name: a, dtype: float64
>>> df.index.spark.transform(lambda c: c + 10) Int64Index([10, 11, 12], dtype='int64')
>>> df.a.spark.transform(lambda c: c + df.b.spark.column) 0 5 1 7 2 9 Name: a, dtype: int64