add support of init_spark from existing SparkSession? #416

Open

zhh210 opened this issue Sep 24, 2024 · 0 comments

zhh210 commented Sep 24, 2024

Is it possible to initialize the spark object from an existing SparkSession? The use case is that my work environment requires a specially customized SparkSession that is wrapped up with complicated corporate credentials and setup. Running init_spark() as in the raydp example won't work because it is not aware of them. I can create a SparkSession object using the customized wrapper, but I don't know how to pass it over to raydp.
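
For concreteness, here is a minimal sketch of what such a wrapper might look like. get_customized_ss and every config key below are hypothetical placeholders, just to illustrate why a plain init_spark() call cannot reproduce the setup:

from pyspark.sql import SparkSession

def get_customized_ss():
    # Hypothetical corporate wrapper: builds a SparkSession with
    # credential and cluster settings that a plain init_spark() call
    # knows nothing about. All keys and values here are placeholders.
    return (SparkSession.builder
            .appName('corporate-app')
            .config('spark.hadoop.fs.s3a.access.key', '<injected-by-corp-tooling>')
            .config('spark.hadoop.fs.s3a.secret.key', '<injected-by-corp-tooling>')
            .config('spark.kerberos.keytab', '/etc/security/keytabs/app.keytab')
            .getOrCreate())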

The raydp example using standard Spark:

import ray
import raydp

# connect to ray cluster
ray.init(address='auto')

# create a Spark cluster with specified resource requirements
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB')

# normal data processing with Spark
df = spark.createDataFrame([('look',), ('spark',), ('tutorial',), ('spark',), ('look', ), ('python', )], ['word'])
df.show()
word_count = df.groupBy('word').count()
word_count.show()

# stop the spark cluster
raydp.stop_spark()

Proposed raydp usage with an existing SparkSession:

spark_session = get_customized_ss()
spark = raydp.init_spark(spark_session)
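
Until init_spark can consume an existing session directly, one possible workaround is to copy the conf of the customized session into init_spark's configs argument (a dict of Spark properties that raydp.init_spark accepts). This is only a sketch under assumptions: it presumes the corporate setup is expressible as ordinary Spark conf key/value pairs, which may not hold for credentials injected outside of Spark conf:

import ray
import raydp

ray.init(address='auto')

# Pull the key/value conf out of the customized session, then stop it
# so only the RayDP-managed session stays live. get_customized_ss()
# is the hypothetical corporate wrapper from above; you may also need
# to filter out keys RayDP sets itself (e.g. spark.master).
existing = get_customized_ss()
conf = dict(existing.sparkContext.getConf().getAll())
existing.stop()

spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB',
                         configs=conf)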