pyspark.RDD.subtractByKey
RDD.subtractByKey(other: pyspark.rdd.RDD[Tuple[K, Any]], numPartitions: Optional[int] = None) → pyspark.rdd.RDD[Tuple[K, V]]
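Return each (key, value) pair in self that has no pair with matching key in other. Only keys are compared; the values in other are ignored.

New in version 0.9.1.

Parameters
- other: RDD of (key, value) pairs. Another RDD whose keys are removed from self.
- numPartitions: int, optional. The number of partitions in the new RDD.

Returns
- RDD of (key, value) pairs from self whose keys are not in other.

See also
- RDD.subtract

Examples
>>> rdd1 = sc.parallelize([("a", 1), ("b", 4), ("b", 5), ("a", 2)])
>>> rdd2 = sc.parallelize([("a", 3), ("c", None)])
>>> sorted(rdd1.subtractByKey(rdd2).collect())
[('b', 4), ('b', 5)]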
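The key-only matching described above can be modeled locally without a Spark context. The sketch below is an illustration of the semantics only, not Spark's implementation: `subtract_by_key` and `subtract` are hypothetical plain-Python helpers (they are not part of the PySpark API) that operate on ordinary lists. Comparing the two shows why the `("a", ...)` pairs disappear under key-based subtraction even though no whole pair in the second list matches them.

```python
from typing import Any, Hashable, Iterable, List, Tuple

Pair = Tuple[Hashable, Any]


def subtract_by_key(left: Iterable[Pair], right: Iterable[Pair]) -> List[Pair]:
    """Hypothetical local model: keep pairs from `left` whose key never appears in `right`."""
    right_keys = {key for key, _ in right}  # values in `right` are ignored
    return [(key, value) for key, value in left if key not in right_keys]


def subtract(left: Iterable[Pair], right: Iterable[Pair]) -> List[Pair]:
    """Hypothetical local model: keep pairs from `left` that do not appear in `right` as whole pairs."""
    right_items = set(right)
    return [item for item in left if item not in right_items]


left = [("a", 1), ("b", 4), ("b", 5), ("a", 2)]
right = [("a", 3), ("c", None)]

# Key "a" appears in `right`, so both ("a", 1) and ("a", 2) are dropped.
by_key = sorted(subtract_by_key(left, right))
# No whole pair in `left` equals ("a", 3) or ("c", None), so nothing is dropped.
by_pair = sorted(subtract(left, right))

print(by_key)   # [('b', 4), ('b', 5)]
print(by_pair)  # [('a', 1), ('a', 2), ('b', 4), ('b', 5)]
```

Duplicate keys in the first collection, such as the two `"b"` pairs, are all kept as long as the key is absent from the second collection.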