Commit

RDD Contract impls
jaceklaskowski committed Jan 1, 2024
1 parent dc43e46 commit 541514f
Showing 7 changed files with 22 additions and 22 deletions.
23 changes: 19 additions & 4 deletions docs/rdd/RDD.md
@@ -17,34 +17,49 @@ compute(
context: TaskContext): Iterator[T]
```

-Computes the input [Partition](Partition.md) (with the [TaskContext](../scheduler/TaskContext.md)) to produce values (of type `T`).
+Computes the input [Partition](Partition.md) (with the [TaskContext](../scheduler/TaskContext.md)) to produce values (of type `T`)

See:

* [LocalCheckpointRDD](LocalCheckpointRDD.md#compute)
* [MapPartitionsRDD](MapPartitionsRDD.md#compute)
* [ReliableCheckpointRDD](ReliableCheckpointRDD.md#compute)
* [ShuffledRDD](ShuffledRDD.md#compute)

Used when:

* `RDD` is requested to [computeOrReadCheckpoint](#computeOrReadCheckpoint)

-### getPartitions { #getPartitions }
+### Partitions { #getPartitions }

```scala
getPartitions: Array[Partition]
```

[Partition](Partition.md)s of this `RDD`

See:

* [LocalCheckpointRDD](LocalCheckpointRDD.md#getPartitions)
* [MapPartitionsRDD](MapPartitionsRDD.md#getPartitions)
* [ReliableCheckpointRDD](ReliableCheckpointRDD.md#getPartitions)
* [ShuffledRDD](ShuffledRDD.md#getPartitions)

Used when:

* `RDD` is requested for the [partitions](#partitions)
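
The two abstract methods above can be illustrated with a minimal sketch of a custom `RDD`. It is not part of this commit: `SlicePartition`, `SimpleRangeRDD` and `RDDContractDemo` are hypothetical names made up for the example, and the code assumes Spark Core is on the classpath and a local `SparkContext` can be created.

```scala
import org.apache.spark.{Partition, SparkConf, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Hypothetical partition: a contiguous slice [start, end) of a range.
class SlicePartition(override val index: Int, val start: Int, val end: Int)
  extends Partition

// Hypothetical RDD with no parent dependencies (hence Nil) that produces
// the integers 0 until n, split across numSlices partitions.
class SimpleRangeRDD(sc: SparkContext, n: Int, numSlices: Int)
  extends RDD[Int](sc, Nil) {

  // Contract, part 1: describe the partitions of this RDD.
  override protected def getPartitions: Array[Partition] =
    Array.tabulate[Partition](numSlices) { i =>
      new SlicePartition(i, i * n / numSlices, (i + 1) * n / numSlices)
    }

  // Contract, part 2: produce the values of a single partition.
  override def compute(split: Partition, context: TaskContext): Iterator[Int] = {
    val slice = split.asInstanceOf[SlicePartition]
    (slice.start until slice.end).iterator
  }
}

object RDDContractDemo {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[2]").setAppName("rdd-contract-sketch")
    val sc = new SparkContext(conf)
    try {
      val rdd = new SimpleRangeRDD(sc, n = 10, numSlices = 3)
      // partitions goes through getPartitions; collect runs compute per partition
      println(rdd.partitions.length)
      println(rdd.collect().mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

In this sketch `getPartitions` is only called on the driver (through `partitions`), while `compute` is what tasks execute for each partition, which is why the partition class carries everything `compute` needs.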

## Implementations

* [CheckpointRDD](CheckpointRDD.md)
-* CoalescedRDD
+* [CoalescedRDD](CoalescedRDD.md)
* [CoGroupedRDD](CoGroupedRDD.md)
* [HadoopRDD](HadoopRDD.md)
* [MapPartitionsRDD](MapPartitionsRDD.md)
* [NewHadoopRDD](NewHadoopRDD.md)
* [ParallelCollectionRDD](ParallelCollectionRDD.md)
* [ReliableCheckpointRDD](ReliableCheckpointRDD.md)
* [ShuffledRDD](ShuffledRDD.md)
* [SubtractedRDD](SubtractedRDD.md)
* _others_

## Creating Instance
3 changes: 1 addition & 2 deletions docs/rdd/ShuffleDependency.md
@@ -29,7 +29,6 @@ ShuffleDependency[K: ClassTag, V: ClassTag, C: ClassTag]

* `CoGroupedRDD` is requested for the [dependencies](CoGroupedRDD.md#getDependencies) (for RDDs with different partitioners)
* `ShuffledRDD` is requested for the [dependencies](ShuffledRDD.md#getDependencies)
-* `SubtractedRDD` is requested for the [dependencies](SubtractedRDD.md#getDependencies) (for an RDD with different partitioner)
* `ShuffleExchangeExec` ([Spark SQL]({{ book.spark_sql }}/physical-operators/ShuffleExchangeExec)) physical operator is requested to prepare a `ShuffleDependency`

When created, `ShuffleDependency` gets the [shuffle id](../SparkContext.md#nextShuffleId).
@@ -100,5 +99,5 @@ shuffleId: Int

The `ShuffleHandle` is used when:

-* [CoGroupedRDDs](CoGroupedRDD.md#compute), [ShuffledRDD](ShuffledRDD.md#compute), [SubtractedRDD](SubtractedRDD.md#compute), and `ShuffledRowRDD` ([Spark SQL]({{ book.spark_sql }}/ShuffledRowRDD)) are requested to compute a partition (to get a [ShuffleReader](../shuffle/ShuffleReader.md) for a `ShuffleDependency`)
+* [CoGroupedRDDs](CoGroupedRDD.md#compute), [ShuffledRDD](ShuffledRDD.md#compute), and `ShuffledRowRDD` ([Spark SQL]({{ book.spark_sql }}/ShuffledRowRDD)) are requested to compute a partition (to get a [ShuffleReader](../shuffle/ShuffleReader.md) for a `ShuffleDependency`)
* `ShuffleMapTask` is requested to [run](../scheduler/ShuffleMapTask.md#runTask) (to get a `ShuffleWriter` for a ShuffleDependency).
12 changes: 0 additions & 12 deletions docs/rdd/SubtractedRDD.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/rdd/index.md
@@ -7,7 +7,7 @@

Read the paper and skip the rest of this page. You'll save a great deal of your precious time 😎

-An `RDD` is a description of a fault-tolerant and resilient computation over a distributed collection of records (spread over [one or many partitions](#getPartitions)).
+An `RDD` is a description of a fault-tolerant and resilient computation over a distributed collection of records (spread over [one or many partitions](RDD.md#getPartitions)).

!!! note "RDDs and Scala Collections"
RDDs are like Scala collections, and they only differ by their distribution, i.e. a RDD is computed on many JVMs while a Scala collection lives on a single JVM.
1 change: 0 additions & 1 deletion docs/shuffle/ShuffleManager.md
@@ -25,7 +25,6 @@ Used when the following `RDD`s are requested to [compute a partition](../rdd/RDD

* `CoGroupedRDD` is requested to [compute a partition](../rdd/CoGroupedRDD.md#compute)
* `ShuffledRDD` is requested to [compute a partition](../rdd/ShuffledRDD.md#compute)
-* `SubtractedRDD` is requested to [compute a partition](../rdd/SubtractedRDD.md#compute)
* `ShuffledRowRDD` ([Spark SQL]({{ book.spark_sql }}/ShuffledRowRDD)) is requested to `compute` a partition

### <span id="getReaderForRange"> getReaderForRange
2 changes: 1 addition & 1 deletion docs/shuffle/ShuffleReader.md
@@ -12,7 +12,7 @@ read(): Iterator[Product2[K, C]]

Used when:

-* [CoGroupedRDD](../rdd/CoGroupedRDD.md#compute), [ShuffledRDD](../rdd/ShuffledRDD.md#compute), and [SubtractedRDD](../rdd/SubtractedRDD.md#compute) are requested to compute a partition (for a `ShuffleDependency` dependency)
+* [CoGroupedRDD](../rdd/CoGroupedRDD.md#compute), [ShuffledRDD](../rdd/ShuffledRDD.md#compute) are requested to compute a partition (for a `ShuffleDependency` dependency)
* `ShuffledRowRDD` ([Spark SQL]({{ book.spark_sql }}/ShuffledRowRDD)) is requested to `compute` a partition

## Implementations
1 change: 0 additions & 1 deletion mkdocs.yml
@@ -529,7 +529,6 @@ nav:
- ReliableCheckpointRDD: rdd/ReliableCheckpointRDD.md
- ShuffleDependency: rdd/ShuffleDependency.md
- ShuffledRDD: rdd/ShuffledRDD.md
-- SubtractedRDD: rdd/SubtractedRDD.md
- Operators:
- Operators: rdd/spark-rdd-operations.md
- Transformations: rdd/spark-rdd-transformations.md