pyspark.pandas.DataFrame.align

DataFrame.align(other: Union[DataFrame, Series], join: str = 'outer', axis: Union[int, str, None] = None, copy: bool = True) → Tuple[DataFrame, Union[DataFrame, Series]]

Align two objects on their axes with the specified join method.

The join method is applied to the Index of each axis being aligned.

Parameters
other : DataFrame or Series
join : {‘outer’, ‘inner’, ‘left’, ‘right’}, default ‘outer’
axis : allowed axis of the other object, default None

Align on index (0), columns (1), or both (None).

copy : bool, default True

By default, new objects are always returned. If copy=False and no reindexing is required, the original objects are returned.

Returns
(left, right) : (DataFrame, type of other)

Aligned objects.

Examples

>>> ps.set_option("compute.ops_on_diff_frames", True)
>>> df1 = ps.DataFrame({"a": [1, 2, 3], "b": ["a", "b", "c"]}, index=[10, 20, 30])
>>> df2 = ps.DataFrame({"a": [4, 5, 6], "c": ["d", "e", "f"]}, index=[10, 11, 12])

Align on both axes:

>>> aligned_l, aligned_r = df1.align(df2)
>>> aligned_l.sort_index()
      a     b   c
10  1.0     a NaN
11  NaN  None NaN
12  NaN  None NaN
20  2.0     b NaN
30  3.0     c NaN
>>> aligned_r.sort_index()
      a   b     c
10  4.0 NaN     d
11  5.0 NaN     e
12  6.0 NaN     f
20  NaN NaN  None
30  NaN NaN  None

Align only on axis=0 (index):

>>> aligned_l, aligned_r = df1.align(df2, axis=0)
>>> aligned_l.sort_index()
      a     b
10  1.0     a
11  NaN  None
12  NaN  None
20  2.0     b
30  3.0     c
>>> aligned_r.sort_index()
      a     c
10  4.0     d
11  5.0     e
12  6.0     f
20  NaN  None
30  NaN  None

Align only on axis=1 (columns):

>>> aligned_l, aligned_r = df1.align(df2, axis=1)
>>> aligned_l.sort_index()
    a  b   c
10  1  a NaN
20  2  b NaN
30  3  c NaN
>>> aligned_r.sort_index()
    a   b  c
10  4 NaN  d
11  5 NaN  e
12  6 NaN  f

Align with the join type “inner”:

>>> aligned_l, aligned_r = df1.align(df2, join="inner")
>>> aligned_l.sort_index()
    a
10  1
>>> aligned_r.sort_index()
    a
10  4

Align with a Series:

>>> s = ps.Series([7, 8, 9], index=[10, 11, 12])
>>> aligned_l, aligned_r = df1.align(s, axis=0)
>>> aligned_l.sort_index()
      a     b
10  1.0     a
11  NaN  None
12  NaN  None
20  2.0     b
30  3.0     c
>>> aligned_r.sort_index()
10    7.0
11    8.0
12    9.0
20    NaN
30    NaN
dtype: float64
>>> ps.reset_option("compute.ops_on_diff_frames")
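The way join and axis together decide the labels of the aligned objects can be sketched in plain Python, without Spark. This is only a conceptual illustration, not how pandas-on-Spark implements align: the helpers join_labels and aligned_axes are hypothetical names introduced here, and the sketch assumes labels are unique and mutually comparable. It uses the same index and column labels as df1 and df2 above.

```python
def join_labels(left, right, join="outer"):
    """Combine two label sequences as the named join would (hypothetical helper)."""
    if join == "left":
        return list(left)
    if join == "right":
        return list(right)
    if join == "inner":
        right_set = set(right)
        return [label for label in left if label in right_set]
    if join == "outer":
        # Sorted for readability; assumes the labels are mutually comparable.
        return sorted(set(left) | set(right))
    raise ValueError(f"unknown join: {join!r}")


def aligned_axes(left_index, left_columns, right_index, right_columns,
                 join="outer", axis=None):
    """Return ((index, columns), (index, columns)) for the two aligned frames.

    Hypothetical helper: only the labels are computed, no data is moved.
    """
    align_rows = axis in (None, 0, "index")
    align_cols = axis in (None, 1, "columns")

    l_index, r_index = list(left_index), list(right_index)
    l_cols, r_cols = list(left_columns), list(right_columns)

    if align_rows:
        # Both results share the joined row labels.
        l_index = r_index = join_labels(left_index, right_index, join)
    if align_cols:
        # Both results share the joined column labels.
        l_cols = r_cols = join_labels(left_columns, right_columns, join)

    return (l_index, l_cols), (r_index, r_cols)


# Labels taken from df1 and df2 in the examples above.
df1_index, df1_columns = [10, 20, 30], ["a", "b"]
df2_index, df2_columns = [10, 11, 12], ["a", "c"]

left_axes, right_axes = aligned_axes(
    df1_index, df1_columns, df2_index, df2_columns, join="inner"
)
print(left_axes)   # ([10], ['a'])
print(right_axes)  # ([10], ['a'])

left_axes, right_axes = aligned_axes(
    df1_index, df1_columns, df2_index, df2_columns, join="left", axis=0
)
print(left_axes)   # ([10, 20, 30], ['a', 'b'])
print(right_axes)  # ([10, 20, 30], ['a', 'c'])
```

Positions that exist in the joined labels but not in the original object are the ones filled with NaN or None in the outputs above. With join="left" the result keeps the labels of the calling object, and with join="right" it keeps those of other.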