package rdd
Provides several RDD implementations. See org.apache.spark.rdd.RDD.
Type Members
-  class AsyncRDDActions[T] extends Serializable with Logging
      A set of asynchronous RDD actions available through an implicit conversion.
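To make "asynchronous actions" concrete, here is a minimal sketch using `countAsync`, one of the methods this class adds to an RDD. The object name, the data, and the 30-second timeout are assumptions made for illustration.

```scala
import scala.concurrent.Await
import scala.concurrent.duration._
import org.apache.spark.SparkContext

object AsyncCountExample {
  // Submits a count job without blocking, then waits for its result.
  def asyncCount(sc: SparkContext): Long = {
    val rdd = sc.parallelize(1 to 1000, numSlices = 4)
    // countAsync() returns a FutureAction[Long], which is a scala.concurrent.Future.
    val pending = rdd.countAsync()
    // Other driver-side work could run here while the job executes.
    Await.result(pending, 30.seconds)
  }
}
```

The returned `FutureAction` can also be cancelled with `cancel()`, which is the main practical difference from wrapping a blocking `count()` in a plain `Future`.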
-  class CoGroupedRDD[K] extends RDD[(K, Array[Iterable[_]])]
      :: DeveloperApi :: An RDD that cogroups its parents. For each key k in parent RDDs, the resulting RDD contains a tuple with the list of values for that key.
      Annotations: @DeveloperApi()
      Note: This is an internal API. We recommend that users call RDD.cogroup(...) instead of instantiating this class directly.
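The recommended `RDD.cogroup(...)` route might look like the sketch below. The object name and sample data are assumptions. Sorting the grouped values is only there to make the result deterministic.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object CogroupExample {
  // For each key, collects the values from `left` and the values from `right`.
  def run(sc: SparkContext): Map[String, (Seq[Int], Seq[String])] = {
    val left  = sc.parallelize(Seq("a" -> 1, "a" -> 2, "b" -> 3))
    val right = sc.parallelize(Seq("a" -> "x", "c" -> "y"))
    val grouped: RDD[(String, (Iterable[Int], Iterable[String]))] = left.cogroup(right)
    grouped
      .mapValues { case (ls, rs) => (ls.toSeq.sorted, rs.toSeq.sorted) }
      .collect()
      .toMap
  }
}
```

A key that appears in only one parent still shows up in the result, paired with an empty collection for the other parent.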
 
-  class DoubleRDDFunctions extends Logging with Serializable
      Extra functions available on RDDs of Doubles through an implicit conversion.
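As an illustration, the sketch below calls `sum()` and `stats()` on an `RDD[Double]`. Both become available through the implicit conversion. The object name and input values are assumptions.

```scala
import org.apache.spark.SparkContext

object DoubleStatsExample {
  // Returns (sum, mean, max) of a small RDD[Double].
  def run(sc: SparkContext): (Double, Double, Double) = {
    val xs = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0))
    // stats() computes count, mean, variance, min and max in a single pass.
    val stats = xs.stats()
    (xs.sum(), stats.mean, stats.max)
  }
}
```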
-  class HadoopRDD[K, V] extends RDD[(K, V)] with Logging
      :: DeveloperApi :: An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the older MapReduce API (org.apache.hadoop.mapred).
      Annotations: @DeveloperApi()
      Note: Instantiating this class directly is not recommended; use org.apache.spark.SparkContext.hadoopRDD() instead.
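A minimal sketch of the `SparkContext.hadoopRDD()` route, reading a text file through the older `mapred` API. The object and method names and the `path` argument are assumptions. Any path readable by the configured Hadoop file system should work.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.{FileInputFormat, JobConf, TextInputFormat}
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object HadoopRddExample {
  // Reads `path` as lines of text using the old-API TextInputFormat.
  def lines(sc: SparkContext, path: String): RDD[String] = {
    val jobConf = new JobConf(sc.hadoopConfiguration)
    FileInputFormat.setInputPaths(jobConf, path)
    sc.hadoopRDD(jobConf, classOf[TextInputFormat], classOf[LongWritable], classOf[Text])
      // Hadoop record readers reuse Writable objects, so copy each value
      // into an immutable String before caching or collecting.
      .map { case (_, text) => text.toString }
  }
}
```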
 
-  type IsWritable[A] = (A) ⇒ Writable
-  class JdbcRDD[T] extends RDD[T] with Logging
      An RDD that executes a SQL query on a JDBC connection and reads results. For a usage example, see the test case JdbcRDDSuite.
-  class NewHadoopRDD[K, V] extends RDD[(K, V)] with Logging
      :: DeveloperApi :: An RDD that provides core functionality for reading data stored in Hadoop (e.g., files in HDFS, sources in HBase, or S3), using the new MapReduce API (org.apache.hadoop.mapreduce).
      Annotations: @DeveloperApi()
      Note: Instantiating this class directly is not recommended; use org.apache.spark.SparkContext.newAPIHadoopRDD() instead.
 
-  class OrderedRDDFunctions[K, V, P <: Product2[K, V]] extends Logging with Serializable
      Extra functions available on RDDs of (key, value) pairs where the key is sortable through an implicit conversion. They will work with any key type K that has an implicit Ordering[K] in scope. Ordering objects already exist for all of the standard primitive types. Users can also define their own orderings for custom types, or to override the default ordering. The implicit ordering that is in the closest scope will be used.

          import java.util.Locale
          import org.apache.spark.SparkContext._
          import org.apache.spark.rdd.RDD

          val rdd: RDD[(String, Int)] = ...
          implicit val caseInsensitiveOrdering = new Ordering[String] {
            override def compare(a: String, b: String) =
              a.toLowerCase(Locale.ROOT).compare(b.toLowerCase(Locale.ROOT))
          }

          // Sort by key, using the above case insensitive ordering.
          rdd.sortByKey()
-  class PairRDDFunctions[K, V] extends Logging with Serializable
      Extra functions available on RDDs of (key, value) pairs through an implicit conversion.
-  trait PartitionCoalescer extends AnyRef
      :: DeveloperApi :: A PartitionCoalescer defines how to coalesce the partitions of a given RDD.
      Annotations: @DeveloperApi()
 
-  class PartitionGroup extends AnyRef
      :: DeveloperApi :: A group of Partitions.
      Annotations: @DeveloperApi()
 
-  class PartitionPruningRDD[T] extends RDD[T]
      :: DeveloperApi :: An RDD used to prune RDD partitions so we can avoid launching tasks on all partitions. An example use case: if we know the RDD is partitioned by range, and the execution DAG has a filter on the key, we can avoid launching tasks on partitions whose range does not cover the key.
      Annotations: @DeveloperApi()
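A small sketch of partition pruning using `PartitionPruningRDD.create`, which takes a parent RDD and a predicate on partition indices. For simplicity it filters by index directly instead of deriving the indices from a range partitioner. The object name and data are assumptions.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.{PartitionPruningRDD, RDD}

object PruningExample {
  // Keeps only partitions 0 and 1 of a 10-partition RDD; no tasks are
  // launched for the remaining eight partitions.
  def firstTwoPartitions(sc: SparkContext): RDD[Int] = {
    val base = sc.parallelize(1 to 100, numSlices = 10)
    PartitionPruningRDD.create(base, partitionIndex => partitionIndex < 2)
  }
}
```

With a range-partitioned RDD, the predicate would typically be built by asking the partitioner which partition indices can contain the keys of interest.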
 
-  abstract class RDD[T] extends Serializable with Logging
      A Resilient Distributed Dataset (RDD), the basic abstraction in Spark. Represents an immutable, partitioned collection of elements that can be operated on in parallel. This class contains the basic operations available on all RDDs, such as map, filter, and persist. In addition, org.apache.spark.rdd.PairRDDFunctions contains operations available only on RDDs of key-value pairs, such as groupByKey and join; org.apache.spark.rdd.DoubleRDDFunctions contains operations available only on RDDs of Doubles; and org.apache.spark.rdd.SequenceFileRDDFunctions contains operations available on RDDs that can be saved as SequenceFiles. All operations are automatically available on any RDD of the right type (e.g. RDD[(Int, Int)]) through implicit conversions.
      Internally, each RDD is characterized by five main properties:
      - A list of partitions
      - A function for computing each split
      - A list of dependencies on other RDDs
      - Optionally, a Partitioner for key-value RDDs (e.g. to say that the RDD is hash-partitioned)
      - Optionally, a list of preferred locations to compute each split on (e.g. block locations for an HDFS file)
      All of the scheduling and execution in Spark is done based on these methods, allowing each RDD to implement its own way of computing itself. Indeed, users can implement custom RDDs (e.g. for reading data from a new storage system) by overriding these functions. Please refer to the Spark paper for more details on RDD internals.
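To show how the five properties map onto code, here is a minimal sketch of a custom RDD. `RangeSlice` and `SimpleRangeRDD` are hypothetical names invented for this example. Only the two required members, `getPartitions` and `compute`, are overridden. The dependency list is empty, and the optional partitioner and preferred locations keep their defaults.

```scala
import org.apache.spark.{Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition type: the half-open slice [start, end) of the range.
class RangeSlice(val index: Int, val start: Int, val end: Int) extends Partition

// Hypothetical RDD that yields the integers 0 until n, split into numSlices partitions.
class SimpleRangeRDD(sc: SparkContext, n: Int, numSlices: Int)
    extends RDD[Int](sc, Nil) { // Nil: this RDD has no parent dependencies

  // Property 1: the list of partitions.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numSlices) { i =>
      new RangeSlice(i, i * n / numSlices, (i + 1) * n / numSlices)
    }

  // Property 2: the function for computing each split.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
    val slice = split.asInstanceOf[RangeSlice]
    (slice.start until slice.end).iterator
  }
}
```

Once defined, the class behaves like any other RDD: for example, `new SimpleRangeRDD(sc, 10, 3).map(_ * 2).collect()` goes through the normal scheduler. A storage-backed RDD would usually also override `getPreferredLocations` so tasks run near the data.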
-  class RDDBarrier[T] extends AnyRef
      :: Experimental :: Wraps an RDD in a barrier stage, which forces Spark to launch tasks of this stage together. org.apache.spark.rdd.RDDBarrier instances are created by org.apache.spark.rdd.RDD#barrier.
      Annotations: @Experimental() @Since("2.4.0")
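A rough sketch of a barrier stage follows. It assumes the cluster (or local master) has at least as many free task slots as the RDD has partitions, since all tasks of a barrier stage must start together. The object name and data are assumptions.

```scala
import org.apache.spark.{BarrierTaskContext, SparkContext}

object BarrierExample {
  // Runs a two-task barrier stage in which both tasks synchronize before
  // processing their partition.
  def run(sc: SparkContext): Array[Int] = {
    val rdd = sc.parallelize(1 to 8, numSlices = 2)
    rdd.barrier().mapPartitions { iter =>
      val ctx = BarrierTaskContext.get()
      // Blocks until every task in this stage has reached this call.
      ctx.barrier()
      iter.map(_ * 2)
    }.collect()
  }
}
```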
 
-  class SequenceFileRDDFunctions[K, V] extends Logging with Serializable
      Extra functions available on RDDs of (key, value) pairs to create a Hadoop SequenceFile, through an implicit conversion.
      Note: This can't be part of PairRDDFunctions because we need more implicit parameters to convert our keys and values to Writable.
 
-  class ShuffledRDD[K, V, C] extends RDD[(K, C)]
      :: DeveloperApi :: The resulting RDD from a shuffle (e.g. repartitioning of data).
      K: the key class.
      V: the value class.
      C: the combiner class.
      Annotations: @DeveloperApi()
 
-  class UnionRDD[T] extends RDD[T]
      Annotations: @DeveloperApi()
 
Value Members
-  object JdbcRDD extends Serializable
-  object PartitionPruningRDD extends Serializable
      Annotations: @DeveloperApi()
 
-  object RDD extends Serializable
      Defines implicit functions that provide extra functionality on RDDs of specific types. For example, RDD.rddToPairRDDFunctions converts an RDD of key-value pairs into a PairRDDFunctions, enabling extra functionality such as PairRDDFunctions.reduceByKey.
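The sketch below spells out the conversion described above by calling `RDD.rddToPairRDDFunctions` explicitly, then shows the usual form where the compiler inserts it. The object name and word list are assumptions.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object ImplicitConversionExample {
  // Counts occurrences per word twice: once with the conversion written out,
  // once relying on the implicit. Both results are returned for comparison.
  def run(sc: SparkContext): (Map[String, Int], Map[String, Int]) = {
    val pairs: RDD[(String, Int)] =
      sc.parallelize(Seq("a", "b", "a", "c", "a")).map(word => (word, 1))

    // Explicit: wrap the RDD in PairRDDFunctions by hand.
    val explicitCounts = RDD.rddToPairRDDFunctions(pairs).reduceByKey(_ + _)

    // Implicit: the same conversion is applied automatically by the compiler.
    val implicitCounts = pairs.reduceByKey(_ + _)

    (explicitCounts.collect().toMap, implicitCounts.collect().toMap)
  }
}
```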
-  object UnionRDD extends Serializable