IDFModel

class pyspark.mllib.feature.IDFModel(java_model: py4j.java_gateway.JavaObject)
- Represents an IDF model that can transform term frequency vectors.
- New in version 1.2.0.
- Methods
  - call(name, *a): Calls a method of java_model.
  - docFreq(): Returns the document frequency.
  - idf(): Returns the current IDF vector.
  - numDocs(): Returns the number of documents evaluated to compute idf.
  - transform(x): Transforms term frequency (TF) vectors to TF-IDF vectors.
- Methods Documentation
call(name: str, *a: Any) → Any
- Calls a method of java_model.
idf() → pyspark.mllib.linalg.Vector
- Returns the current IDF vector.
- New in version 1.4.0.
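The IDF vector returned by idf() is derived from the per-term document frequencies and the document count that the method summary lists as docFreq() and numDocs(). The following is a minimal pure-Python sketch of that relationship, not PySpark code. It assumes the formula given in the Spark MLlib documentation, idf = log((numDocs + 1) / (docFreq + 1)), and assumes that terms below minDocFreq get a weight of 0, as described for transform. The helper name idf_vector and the sample statistics are hypothetical.

```python
import math
from typing import List


def idf_vector(doc_freq: List[int], num_docs: int, min_doc_freq: int = 0) -> List[float]:
    # Assumed formula (Spark MLlib docs): log((m + 1) / (df + 1)), where m is
    # the number of documents and df is the term's document frequency.
    # Terms seen in fewer than min_doc_freq documents get an IDF of 0.0.
    return [
        math.log((num_docs + 1.0) / (df + 1.0)) if df >= min_doc_freq else 0.0
        for df in doc_freq
    ]


# Hypothetical statistics: a 4-term vocabulary counted over 3 documents.
doc_freq = [3, 1, 0, 2]
num_docs = 3

print(idf_vector(doc_freq, num_docs))                  # no threshold applied
print(idf_vector(doc_freq, num_docs, min_doc_freq=2))  # indices 1 and 2 zeroed
```

A term that appears in every document (index 0 here) already gets log(1) = 0, so the threshold only changes rare terms.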
transform(x: Union[VectorLike, pyspark.rdd.RDD[VectorLike]]) → Union[pyspark.mllib.linalg.Vector, pyspark.rdd.RDD[pyspark.mllib.linalg.Vector]]
- Transforms term frequency (TF) vectors to TF-IDF vectors.
- If minDocFreq was set for the IDF calculation, terms that occur in fewer than minDocFreq documents will have an entry of 0.
- New in version 1.2.0.
- Parameters
  - x : pyspark.mllib.linalg.Vector or pyspark.RDD
    an RDD of term frequency vectors or a term frequency vector
- Returns
  - pyspark.mllib.linalg.Vector or pyspark.RDD
    an RDD of TF-IDF vectors or a TF-IDF vector
- Notes
  - In Python, transform cannot currently be used within an RDD transformation or action (for example, inside a function passed to RDD.map). Call transform directly on the RDD instead.
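The TF to TF-IDF step scales each term frequency by that term's IDF weight, which is why a term whose IDF weight is 0 (for example, one filtered out by minDocFreq) ends up with an entry of 0. The following is a minimal pure-Python sketch of that arithmetic, not the PySpark implementation. The helper names tf_idf_dense and tf_idf_sparse are hypothetical, the IDF weights are made up, and sparse vectors are assumed to be plain {index: value} dicts instead of pyspark.mllib.linalg.SparseVector.

```python
from typing import Dict, List


def tf_idf_dense(tf: List[float], idf: List[float]) -> List[float]:
    # TF-IDF weight = term frequency * IDF weight, computed index by index.
    if len(tf) != len(idf):
        raise ValueError("tf and idf vectors must have the same length")
    return [t * w for t, w in zip(tf, idf)]


def tf_idf_sparse(tf: Dict[int, float], idf: List[float]) -> Dict[int, float]:
    # For a sparse vector stored as {index: value}, only the stored entries
    # need scaling; absent indices stay implicitly 0.
    if any(i < 0 or i >= len(idf) for i in tf):
        raise IndexError("sparse index outside the IDF vector")
    return {i: v * idf[i] for i, v in tf.items()}


# Hypothetical IDF weights; index 0 has weight 0.0, as a term filtered
# out by minDocFreq would.
idf = [0.0, 0.5, 2.0]

print(tf_idf_dense([3.0, 2.0, 1.0], idf))    # [0.0, 1.0, 2.0]
print(tf_idf_sparse({0: 4.0, 2: 1.0}, idf))  # {0: 0.0, 2: 2.0}
```

Handling the sparse case separately avoids materializing every vocabulary index, which matters when the vocabulary is large and each document uses only a few terms.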