add support of init_spark from existing SparkSession? #416

Open

zhh210 opened this issue Sep 24, 2024 · 0 comments

zhh210 commented Sep 24, 2024

Is it possible to initialize the spark object from an existing SparkSession? The use case is that my work environment requires a specially customized SparkSession that is wrapped up with complicated corporate credentials and setup. Running init_spark() as in the raydp example won't work because it is not aware of them. I can create a SparkSession object using the customized wrapper, but I don't know how to pass it over to raydp.
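
For concreteness, here is a minimal sketch of what such a wrapper might look like. get_customized_ss and every config key below are hypothetical placeholders, just to illustrate why a plain init_spark() call cannot reproduce the setup:

from pyspark.sql import SparkSession

def get_customized_ss():
    # Hypothetical corporate wrapper: builds a SparkSession with
    # credential and cluster settings that a plain init_spark() call
    # knows nothing about. All keys and values here are placeholders.
    return (SparkSession.builder
            .appName('corporate-app')
            .config('spark.hadoop.fs.s3a.access.key', '<injected-by-corp-tooling>')
            .config('spark.hadoop.fs.s3a.secret.key', '<injected-by-corp-tooling>')
            .config('spark.kerberos.keytab', '/etc/security/keytabs/app.keytab')
            .getOrCreate())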

The raydp example using standard Spark:

import ray
import raydp

# connect to ray cluster
ray.init(address='auto')

# create a Spark cluster with specified resource requirements
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB')

# normal data processing with Spark
df = spark.createDataFrame([('look',), ('spark',), ('tutorial',), ('spark',), ('look', ), ('python', )], ['word'])
df.show()
word_count = df.groupBy('word').count()
word_count.show()

# stop the spark cluster
raydp.stop_spark()

Proposed raydp usage with an existing SparkSession:

spark_session = get_customized_ss()
spark = raydp.init_spark(spark_session)
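
Until init_spark can consume an existing session directly, one possible workaround is to copy the conf of the customized session into init_spark's configs argument (a dict of Spark properties that raydp.init_spark accepts). This is only a sketch under assumptions: it presumes the corporate setup is expressible as ordinary Spark conf key/value pairs, which may not hold for credentials injected outside of Spark conf:

import ray
import raydp

ray.init(address='auto')

# Pull the key/value conf out of the customized session, then stop it
# so only the RayDP-managed session stays live. get_customized_ss()
# is the hypothetical corporate wrapper from above; you may also need
# to filter out keys RayDP sets itself (e.g. spark.master).
existing = get_customized_ss()
conf = dict(existing.sparkContext.getConf().getAll())
existing.stop()

spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB',
                         configs=conf)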