pyspark.sql.functions.instr

pyspark.sql.functions.instr(str, substr)

Locate the position of the first occurrence of substr in the given string. Returns null if either argument is null.

New in version 1.5.0.

Changed in version 3.4.0: Supports Spark Connect.

Parameters

str: Column or column name. Target column to work on.
substr: Column or literal string. Substring to look for.

Changed in version 4.0.0: substr now accepts a Column.

Returns

Column: position of the first occurrence of the substring, as an integer.

See also

pyspark.sql.functions.locate()
pyspark.sql.functions.substr()
pyspark.sql.functions.substring()
pyspark.sql.functions.substring_index()

Notes

The position is 1-based, not zero-based. Returns 0 if substr is not found in str.
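
The three rules stated on this page (null in, null out; 1-based positions; 0 when the substring is absent) can be modelled in plain Python. This is only an illustrative sketch: `instr_model` is a hypothetical helper, not part of PySpark, and it makes no claim about how Spark evaluates the function internally.

```python
from typing import Optional


def instr_model(s: Optional[str], substr: Optional[str]) -> Optional[int]:
    """Hypothetical plain-Python model of the documented instr rules."""
    # Null in, null out: if either argument is None, the result is None.
    if s is None or substr is None:
        return None
    # str.find returns a 0-based index, or -1 when the substring is absent,
    # so adding 1 yields a 1-based position, or 0 when absent.
    return s.find(substr) + 1


print(instr_model("abcd", "b"))  # 2
print(instr_model("xyz", "b"))   # 0
print(instr_model(None, "b"))    # None
```

Because 0 is reserved for "not found", a result greater than 0 can be read as "the string contains substr", which is a common way to use the returned position in a filter.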

Examples

Example 1: Using a literal string as the ‘substring’

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("abcd",), ("xyz",)], ["s",])
>>> df.select("*", sf.instr(df.s, "b")).show()
+----+-----------+
|   s|instr(s, b)|
+----+-----------+
|abcd|          2|
| xyz|          0|
+----+-----------+

Example 2: Using a Column ‘substring’

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([("abcd",), ("xyz",)], ["s",])
>>> df.select("*", sf.instr("s", sf.lit("abc").substr(0, 2))).show()
+----+---------------------------+
|   s|instr(s, substr(abc, 0, 2))|
+----+---------------------------+
|abcd|                          1|
| xyz|                          0|
+----+---------------------------+