This repository has been archived by the owner on Sep 18, 2023. It is now read-only.

[NSE-1075] Dynamically adjust input partition size #1076

Open · wants to merge 8 commits into main

Conversation

PHILO-HE (Collaborator)

No description provided.

@github-actions

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/native-sql-engine/issues

Then could you also rename the commit message and pull request title in the following format?

[NSE-${ISSUES_ID}] ${detailed message}

See also:

@PHILO-HE PHILO-HE changed the title Dynamically adjust input partition size [NSE-1075] Dynamically adjust input partition size Aug 19, 2022
@github-actions

#1075

@zhouyuan (Collaborator)

@jackylee-ch

val minPartitionNum = sparkSession.sessionState.conf.filesMinPartitionNum
  .getOrElse(SparkShimLoader.getSparkShims.leafNodeDefaultParallelism(sparkSession))
val PREFERRED_PARTITION_SIZE_LOWER_BOUND: Long = 128 * 1024 * 1024
val PREFERRED_PARTITION_SIZE_UPPER_BOUND: Long = 512 * 1024 * 1024
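(A minimal sketch, not this PR's actual code: assuming these two bounds clamp the dynamically computed split size in the usual FilePartition.maxSplitBytes style; selectedPartitions and the clamp itself are assumptions for illustration.)

val openCostInBytes = sparkSession.sessionState.conf.filesOpenCostInBytes
// totalBytes mirrors Spark's maxSplitBytes computation: each file pays an open cost.
val totalBytes = selectedPartitions
  .flatMap(_.files.map(_.getLen + openCostInBytes)).sum
val bytesPerCore = totalBytes / minPartitionNum
// Clamp the target split size into [lower bound, upper bound].
val maxSplitBytes = math.min(PREFERRED_PARTITION_SIZE_UPPER_BOUND,
  math.max(PREFERRED_PARTITION_SIZE_LOWER_BOUND, bytesPerCore))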
Contributor


Maybe add a new config for these two values?

Collaborator Author


Thanks for your advice. PREFERRED_PARTITION_SIZE_UPPER_BOUND may impose the same limit as Spark's max partition size configuration, so the two can be unified. Maybe we can make PREFERRED_PARTITION_SIZE_LOWER_BOUND configurable.
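(For illustration only, a sketch of making the lower bound configurable via a session conf; the key name below is hypothetical, not something this PR defines.)

// Hypothetical config key; the real name would be decided in this PR.
val preferredPartitionSizeLowerBound: Long =
  sparkSession.sessionState.conf.getConfString(
    "spark.oap.sql.columnar.preferredPartitionSizeLowerBound",
    (128 * 1024 * 1024).toString).toLong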


// This implementation is ported from spark FilePartition.scala with changes for
// adjusting openCost.
def getFilePartitions(sparkSession: SparkSession,
Collaborator Author


@jackylee-ch, please put your code changes for open cost here. It should be workable.
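(For context, upstream Spark's FilePartition.getFilePartitions packs files with a next-fit loop like the sketch below; the open-cost changes discussed here would adjust openCostInBytes before the loop. This is the upstream logic, not this PR's final code.)

import scala.collection.mutable.ArrayBuffer

def getFilePartitions(
    sparkSession: SparkSession,
    partitionedFiles: Seq[PartitionedFile],
    maxSplitBytes: Long): Seq[FilePartition] = {
  val partitions = new ArrayBuffer[FilePartition]
  val currentFiles = new ArrayBuffer[PartitionedFile]
  var currentSize = 0L

  // Close the current bin and start a new one.
  def closePartition(): Unit = {
    if (currentFiles.nonEmpty) {
      partitions += FilePartition(partitions.size, currentFiles.toArray)
    }
    currentFiles.clear()
    currentSize = 0
  }

  val openCostInBytes = sparkSession.sessionState.conf.filesOpenCostInBytes
  // Next-fit packing: close the bin once the next file would overflow it;
  // every file also pays an open cost on top of its length.
  partitionedFiles.foreach { file =>
    if (currentSize + file.length > maxSplitBytes) {
      closePartition()
    }
    currentSize += file.length + openCostInBytes
    currentFiles += file
  }
  closePartition()
  partitions.toSeq
}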

Contributor


Okay

// val openCostInBytes = sparkSession.sessionState.conf.filesOpenCostInBytes
// // val minPartitionNum = sparkSession.sessionState.conf.filesMinPartitionNum
// // .getOrElse(sparkSession.leafNodeDefaultParallelism)
// val minPartitionNum = sparkSession.sessionState.conf.filesMinPartitionNum
Collaborator Author


@jackylee-ch, I noticed you have introduced a computation for taskParallelismNum. Is it the same as minPartitionNum? This piece of code is ported from the Spark source code.

Contributor


No, they are not the same. taskParallelismNum is actually spark.sql.files.expectedPartitionNum, which can be configured by the user; its default value is the maximum number of tasks that can run in parallel in the current application.
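(A minimal sketch of that lookup, assuming the default "maximum parallel tasks" can be approximated with SparkContext.defaultParallelism; the PR's exact derivation may differ.)

val conf = sparkSession.sessionState.conf
val taskParallelismNum: Int = {
  // spark.sql.files.expectedPartitionNum is the config named in this discussion.
  val configured = conf.getConfString("spark.sql.files.expectedPartitionNum", "")
  if (configured.nonEmpty) configured.toInt
  // Fallback assumption: total task slots approximated by defaultParallelism.
  else sparkSession.sparkContext.defaultParallelism
}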
