Commit

Deployed d139926 with MkDocs version: 1.5.3
Unknown committed Feb 17, 2024
1 parent 051f2cd commit fedafc4
Showing 5 changed files with 18 additions and 27 deletions.
22 changes: 5 additions & 17 deletions SparkContext/index.html

Large diffs are not rendered by default.

4 changes: 2 additions & 2 deletions scheduler/DAGScheduler/index.html
@@ -95,9 +95,9 @@
</code></pre></div> <p><code>submitMissingTasks</code> requests the <a href=#taskScheduler>TaskScheduler</a> to <a href=../TaskScheduler/#submitTasks>submit the tasks for execution</a> (as a new <a href=../TaskSet/ >TaskSet</a>).</p> <p>With no tasks to submit for execution, <code>submitMissingTasks</code> <a href=#markStageAsFinished>marks the stage as finished successfully</a>.</p> <p><code>submitMissingTasks</code> prints out one of the following DEBUG messages, depending on the type of the stage:</p> <div class=highlight><pre><span></span><code>Stage [stage] is actually done; (available: [isAvailable],available outputs: [numAvailableOutputs],partitions: [numPartitions])
</code></pre></div> <p>or</p> <div class=highlight><pre><span></span><code>Stage [stage] is actually done; (partitions: [numPartitions])
</code></pre></div> <p>for <code>ShuffleMapStage</code> and <code>ResultStage</code>, respectively.</p> <p>In the end, with no tasks to submit for execution, <code>submitMissingTasks</code> <a href=#submitWaitingChildStages>submits waiting child stages for execution</a> and exits.</p> <p><code>submitMissingTasks</code> is used when <code>DAGScheduler</code> is requested to <a href=#submitStage>submit a stage for execution</a>.</p> <h2 id=getPreferredLocs>Finding Preferred Locations for Missing Partitions<a class=headerlink href=#getPreferredLocs title="Permanent link">&para;</a></h2> <div class=highlight><pre><span></span><code><span class=n>getPreferredLocs</span><span class=p>(</span>
-<span class=w> </span><span class=n>rdd</span><span class=p>:</span><span class=w> </span><span class=nc>RDD</span><span class=p>[</span><span class=n>_</span><span class=p>],</span>
+<span class=w> </span><span class=n>rdd</span><span class=p>:</span><span class=w> </span><span class=nc>RDD</span><span class=p>[</span><span class=n>_</span><span class=p>],</span>
<span class=w> </span><span class=n>partition</span><span class=p>:</span><span class=w> </span><span class=nc>Int</span><span class=p>):</span><span class=w> </span><span class=nc>Seq</span><span class=p>[</span><span class=nc>TaskLocation</span><span class=p>]</span>
-</code></pre></div> <p><code>getPreferredLocs</code> is simply an alias for the internal (recursive) <a href=#getPreferredLocsInternal>getPreferredLocsInternal</a>.</p> <p><code>getPreferredLocs</code> is used when...FIXME</p> <h2 id=getCacheLocs>Finding BlockManagers (Executors) for Cached RDD Partitions (aka Block Location Discovery)<a class=headerlink href=#getCacheLocs title="Permanent link">&para;</a></h2> <div class=highlight><pre><span></span><code><span class=n>getCacheLocs</span><span class=p>(</span>
+</code></pre></div> <p><code>getPreferredLocs</code> is simply an alias for the internal (recursive) <a href=#getPreferredLocsInternal>getPreferredLocsInternal</a>.</p> <hr> <p><code>getPreferredLocs</code> is used when:</p> <ul> <li><code>SparkContext</code> is requested to <a href=../../SparkContext/#getPreferredLocs>getPreferredLocs</a></li> <li><code>DAGScheduler</code> is requested to <a href=#submitMissingTasks>submit the missing tasks of a stage</a></li> </ul> <h2 id=getCacheLocs>Finding BlockManagers (Executors) for Cached RDD Partitions (aka Block Location Discovery)<a class=headerlink href=#getCacheLocs title="Permanent link">&para;</a></h2> <div class=highlight><pre><span></span><code><span class=n>getCacheLocs</span><span class=p>(</span>
<span class=w> </span><span class=n>rdd</span><span class=p>:</span><span class=w> </span><span class=nc>RDD</span><span class=p>[</span><span class=n>_</span><span class=p>]):</span><span class=w> </span><span class=nc>IndexedSeq</span><span class=p>[</span><span class=nc>Seq</span><span class=p>[</span><span class=nc>TaskLocation</span><span class=p>]]</span>
</code></pre></div> <p><code>getCacheLocs</code> gives <a href=../TaskLocation/ >TaskLocations</a> (block locations) for the partitions of the input <code>rdd</code>. <code>getCacheLocs</code> caches lookup results in the <a href=#cacheLocs>cacheLocs</a> internal registry.</p> <p>NOTE: The size of the collection from <code>getCacheLocs</code> is exactly the number of partitions in <code>rdd</code>.</p> <p>NOTE: The size of every <a href=../TaskLocation/ >TaskLocation</a> collection (i.e. every entry in the result of <code>getCacheLocs</code>) is exactly the number of <a href=../../storage/BlockManager/ >BlockManagers</a> (on executors) that manage the block of that partition.</p> <p>Internally, <code>getCacheLocs</code> finds <code>rdd</code> in the <a href=#cacheLocs>cacheLocs</a> internal registry (of partition locations per RDD).</p> <p>If <code>rdd</code> is not in the <a href=#cacheLocs>cacheLocs</a> internal registry, <code>getCacheLocs</code> branches per its <a href=../../storage/StorageLevel/ >storage level</a>.</p> <p>For <code>NONE</code> storage level (i.e. no caching), the result is an empty collection of locations for every partition (i.e. no location preference).</p> <p>For any other (non-<code>NONE</code>) storage level, <code>getCacheLocs</code> <a href=../../storage/BlockManagerMaster/#getLocations-block-array>requests <code>BlockManagerMaster</code> for block locations</a> that are then mapped to <a href=../TaskLocation/ >TaskLocations</a> with the hostname of the owning <code>BlockManager</code> for a block (of a partition) and the executor id.</p> <p><code>getCacheLocs</code> records the computed block locations per partition (as <a href=../TaskLocation/ >TaskLocation</a>) in the <a href=#cacheLocs>cacheLocs</a> internal registry.</p> <p>NOTE: <code>getCacheLocs</code> requests locations from <code>BlockManagerMaster</code> using <a href=../../storage/BlockId/#RDDBlockId>RDDBlockId</a> with the RDD id and the partition indices (which implies that the order of the partitions matters to request proper blocks).</p> <p>NOTE: <code>DAGScheduler</code> uses <a href=../TaskLocation/ >TaskLocations</a> (with host and executor) while <a href=../../storage/BlockManagerMaster/ >BlockManagerMaster</a> uses <a href=../../storage/BlockManagerId/ >BlockManagerId</a> (to track similar information, i.e. block locations).</p> <p><code>getCacheLocs</code> is used when <code>DAGScheduler</code> is requested to find <a href=#getMissingParentStages>missing parent MapStages</a> and <a href=#getPreferredLocsInternal>getPreferredLocsInternal</a>.</p> <h2 id=getPreferredLocsInternal>Finding Placement Preferences for RDD Partition (recursively)<a class=headerlink href=#getPreferredLocsInternal title="Permanent link">&para;</a></h2> <div class=highlight><pre><span></span><code><span class=n>getPreferredLocsInternal</span><span class=p>(</span>
<span class=w> </span><span class=n>rdd</span><span class=p>:</span><span class=w> </span><span class=nc>RDD</span><span class=p>[</span><span class=n>_</span><span class=p>],</span>
17 changes: 10 additions & 7 deletions scheduler/TaskSetManager/index.html

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion search/search_index.json

Large diffs are not rendered by default.

Binary file modified sitemap.xml.gz
Binary file not shown.
