pyspark.pandas.DataFrame.insert

DataFrame.insert(loc: int, column: Union[Any, Tuple[Any, …]], value: Union[int, float, bool, str, bytes, decimal.Decimal, datetime.date, datetime.datetime, None, Series, Iterable], allow_duplicates: bool = False) → None

Insert column into DataFrame at specified location.

Raises a ValueError if column is already contained in the DataFrame, unless allow_duplicates is set to True.
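The duplicate-label rule can be illustrated on a plain Python list of column labels. The helper `check_new_label` is hypothetical and is not part of pyspark.pandas. Its error message is also only illustrative. The sketch shows nothing more than when the ValueError described above is raised.

```python
def check_new_label(columns, column, allow_duplicates=False):
    """Hypothetical helper mimicking the duplicate check described for insert.

    Raises ValueError when `column` already appears in `columns`,
    unless `allow_duplicates` is True. Returns None otherwise.
    """
    if not allow_duplicates and column in columns:
        raise ValueError(f"cannot insert {column!r}, already exists")


columns = ["x", 0]

# A fresh label passes the check silently.
check_new_label(columns, "y")

# Re-using an existing label is rejected by default...
try:
    check_new_label(columns, "x")
except ValueError as exc:
    print(exc)  # cannot insert 'x', already exists

# ...but accepted when duplicates are explicitly allowed.
check_new_label(columns, "x", allow_duplicates=True)
```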

Parameters

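loc (int): Insertion index. Must satisfy 0 <= loc <= len(columns).
column (str, number, or hashable object): Label of the inserted column.
value (scalar, Series, or array-like): Content of the inserted column. A scalar is repeated in every row.
allow_duplicates (bool, optional): If True, allow inserting a column whose label already exists in the DataFrame. Defaults to False.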
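The constraint 0 <= loc <= len(columns) means that a frame with n columns has n + 1 valid insertion points, running from before the first column to after the last. Python's own `list.insert` does not enforce this: it silently appends when the index is too large and counts a negative index from the end. The hypothetical helper `insert_label` below therefore checks the bound explicitly. It models only the column labels, not pyspark.pandas itself, and raising IndexError is a choice made for this sketch.

```python
def insert_label(columns, loc, column):
    """Hypothetical helper: return a new label list with `column` at `loc`.

    Enforces 0 <= loc <= len(columns) instead of relying on list.insert,
    which would clamp an out-of-range index or count a negative one
    from the end. The input list is left unchanged.
    """
    if not 0 <= loc <= len(columns):
        raise IndexError(f"loc must satisfy 0 <= loc <= {len(columns)}, got {loc}")
    result = list(columns)  # copy so the caller's list is not mutated
    result.insert(loc, column)
    return result


columns = ["x", 0]
# Every valid position for a new column "y" in a two-column frame.
for loc in range(len(columns) + 1):
    print(loc, insert_label(columns, loc, "y"))
# 0 ['y', 'x', 0]
# 1 ['x', 'y', 0]
# 2 ['x', 0, 'y']
```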

Examples

>>> psdf = ps.DataFrame([1, 2, 3])
>>> psdf.sort_index()
   0
0  1
1  2
2  3
>>> psdf.insert(0, 'x', 4)
>>> psdf.sort_index()
   x  0
0  4  1
1  4  2
2  4  3

>>> from pyspark.pandas.config import set_option, reset_option
>>> set_option("compute.ops_on_diff_frames", True)

>>> psdf.insert(1, 'y', [5, 6, 7])
>>> psdf.sort_index()
   x  y  0
0  4  5  1
1  4  6  2
2  4  7  3

>>> psdf.insert(2, 'z', ps.Series([8, 9, 10]))
>>> psdf.sort_index()
   x  y   z  0
0  4  5   8  1
1  4  6   9  2
2  4  7  10  3

>>> reset_option("compute.ops_on_diff_frames")
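The column order and values produced by the three calls above can be traced without Spark. The sketch below uses a hypothetical model in which a frame is an ordered list of (label, values) pairs. In this model a scalar is broadcast to every row, and a list must supply exactly one entry per row. That length rule is a simplification of the sketch, not a statement about how pyspark.pandas aligns data.

```python
def insert_column(frame, loc, column, value):
    """Hypothetical model of insert on an ordered list of (label, values) pairs.

    Only placement and value handling are modeled; the duplicate-label and
    loc-range checks described above are left out for brevity.
    Returns a new list and leaves `frame` unchanged.
    """
    n_rows = len(frame[0][1]) if frame else 0
    if isinstance(value, (list, tuple)):
        if len(value) != n_rows:
            raise ValueError("array-like value must have one entry per row")
        values = list(value)
    else:
        values = [value] * n_rows  # broadcast a scalar to every row
    return frame[:loc] + [(column, values)] + frame[loc:]


frame = [(0, [1, 2, 3])]
frame = insert_column(frame, 0, "x", 4)
frame = insert_column(frame, 1, "y", [5, 6, 7])
frame = insert_column(frame, 2, "z", [8, 9, 10])
for label, values in frame:
    print(label, values)
# x [4, 4, 4]
# y [5, 6, 7]
# z [8, 9, 10]
# 0 [1, 2, 3]
```

The final label order x, y, z, 0 matches the doctest output above.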
