How to Connect Python to Spark Session and Keep RDDs Alive


How can a small Python script hook into an existing instance of Spark and perform operations on existing RDDs?

I'm in the early stages of working with Spark on Windows 10, trying out scripts on a "local" instance. I'm working with the latest stable build of Spark (Spark 2.0.1 for Hadoop 2.7). I've installed Hadoop 2.7.3 and set the environment variables for it. I'm experimenting with both the PySpark shell and Visual Studio 2015 Community with Python.

I'm trying to build a large engine on which I'll run individual scripts to load, massage, format, and access data. I'm sure there's a normal way to do that; isn't that the point of Spark?

Anyway, here's the experience I have so far, and it's what I expected: when I build a small Spark script in Python and run it using Visual Studio, the script runs, does its job, and exits. In the process of exiting, it also shuts down the Spark context it was using.

So I had the following thought: what if I started a persistent Spark context in PySpark and then set the SparkConf and SparkContext in each Python script to connect to that same Spark context? So, after looking online for the defaults PySpark uses, I tried the following:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[*]").setAppName("PySparkShell")
sc = SparkContext(conf=conf)

I started PySpark. In a separate script in Visual Studio, I used that code to create a SparkContext. I loaded a text file into an RDD named rddFromFilename. But I couldn't access that RDD in the PySpark shell once the script had run. A sketch of that script is below.
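Here is a minimal sketch of the kind of script described, assuming PySpark is importable from the Visual Studio Python environment; the file path is hypothetical, and only the RDD name rddFromFilename comes from the question:

from pyspark import SparkConf, SparkContext

# Same configuration as the PySpark shell defaults shown above.
conf = SparkConf().setMaster("local[*]").setAppName("PySparkShell")
sc = SparkContext(conf=conf)

# Load a text file into an RDD; the path is only an assumption for illustration.
rddFromFilename = sc.textFile("C:/data/example.txt")
print(rddFromFilename.count())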

How do I start a persistent Spark context, create an RDD in it from one Python script, and access that RDD from subsequent Python scripts? Particularly on Windows?

There is no such solution built into Spark itself. You may consider external tools that host a long-running Spark context and let multiple clients connect to it; I think that, out of those, only Apache Zeppelin officially supports Windows.
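To illustrate why plain Spark cannot do this, here is a minimal sketch under the local-mode setup described above: every Python process that runs this code gets its own, independent SparkContext, and an RDD is just a handle owned by the driver process that created it, so it disappears when that context stops.

from pyspark import SparkConf, SparkContext

# This creates a brand-new context for this process, even if a PySpark shell
# with the same master and app name is already running.
conf = SparkConf().setMaster("local[*]").setAppName("PySparkShell")
sc = SparkContext(conf=conf)

rdd = sc.parallelize(range(10))  # this RDD exists only inside this process
print(rdd.count())

sc.stop()  # once the context stops, the RDD handle is gone with it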

