pyspark.sql.DataFrameStatFunctions.freqItems¶
- 
DataFrameStatFunctions.freqItems(cols: List[str], support: Optional[float] = None) → pyspark.sql.dataframe.DataFrame[source]¶
- Finding frequent items for columns, possibly with false positives. Using the frequent element count algorithm described in “https://doi.org/10.1145/762471.762473, proposed by Karp, Schenker, and Papadimitriou”. - DataFrame.freqItems()and- DataFrameStatFunctions.freqItems()are aliases.- New in version 1.4.0. - Changed in version 3.4.0: Supports Spark Connect. - Parameters
- colslist or tuple
- Names of the columns to calculate frequent items for as a list or tuple of strings. 
- supportfloat, optional
- The frequency with which to consider an item ‘frequent’. Default is 1%. The support must be greater than 1e-4. 
 
- Returns
- DataFrame
- DataFrame with frequent items. 
 
 - Notes - This function is meant for exploratory data analysis, as we make no guarantee about the backward compatibility of the schema of the resulting - DataFrame.- Examples - >>> df = spark.createDataFrame([(1, 11), (1, 11), (3, 10), (4, 8), (4, 8)], ["c1", "c2"]) >>> df.freqItems(["c1", "c2"]).show() +------------+------------+ |c1_freqItems|c2_freqItems| +------------+------------+ | [4, 1, 3]| [8, 11, 10]| +------------+------------+