DataFrame.
kde
Generate Kernel Density Estimate plot using Gaussian kernels.
The method used to calculate the estimator bandwidth. See KernelDensity in PySpark for more information.
Evaluation points for the estimated PDF. If None (default), 1000 equally spaced points are used. If ind is a NumPy array, the KDE is evaluated at the points passed. If ind is an integer, ind number of equally spaced points are used.
Keyword arguments to pass on to pandas-on-Spark.Series.plot().
pandas-on-Spark.Series.plot()
plotly.graph_objs.Figure
Return an custom object when backend!=plotly. Return an ndarray when subplots=True (matplotlib-only).
backend!=plotly
subplots=True
Examples
A scalar bandwidth should be specified. Using a small bandwidth value can lead to over-fitting, while using a large bandwidth value may result in under-fitting:
>>> s = ps.Series([1, 2, 2.5, 3, 3.5, 4, 5]) >>> s.plot.kde(bw_method=0.3)
>>> s = ps.Series([1, 2, 2.5, 3, 3.5, 4, 5]) >>> s.plot.kde(bw_method=3)
The ind parameter determines the evaluation points for the plot of the estimated KDF:
>>> s = ps.Series([1, 2, 2.5, 3, 3.5, 4, 5]) >>> s.plot.kde(ind=[1, 2, 3, 4, 5], bw_method=0.3)
For DataFrame, it works in the same way as Series:
>>> df = ps.DataFrame({ ... 'x': [1, 2, 2.5, 3, 3.5, 4, 5], ... 'y': [4, 4, 4.5, 5, 5.5, 6, 6], ... }) >>> df.plot.kde(bw_method=0.3)
>>> df = ps.DataFrame({ ... 'x': [1, 2, 2.5, 3, 3.5, 4, 5], ... 'y': [4, 4, 4.5, 5, 5.5, 6, 6], ... }) >>> df.plot.kde(bw_method=3)
>>> df = ps.DataFrame({ ... 'x': [1, 2, 2.5, 3, 3.5, 4, 5], ... 'y': [4, 4, 4.5, 5, 5.5, 6, 6], ... }) >>> df.plot.kde(ind=[1, 2, 3, 4, 5, 6], bw_method=0.3)