join - Can IndexedRDD be used in a streaming context?

I want to use a fast join operation in a Spark Streaming context, for example joining A and B, where A is a fixed dataset read from a file and B is a small streaming RDD read from a socket. I've tried the common way provided by Spark: joining an RDD of 5,000,000 records with a streaming RDD of 10 records costs 4 seconds. Later I tried using IndexedRDD, but I can't make it work. I have the following questions:

Is 4 seconds slow? Can I use a performance-tuning method such as a broadcast join to improve it? If it is slow, why? I've heard that an RDD's join operation is a linear search; is that true?
Can IndexedRDD's join operation be faster than the common way?
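For reference on the first two questions: Spark's RDD join is built on cogroup, which shuffles both RDDs by key (unless they already share a partitioner) and then matches keys within each partition using hash-based grouping, so it is not a linear search per key. With 5,000,000 records on one side and 10 on the other, shuffling the large side on every batch is likely where most of the 4 seconds goes. A broadcast (map-side) join avoids that shuffle: the small batch becomes a lookup table that is shipped to every task, and each partition of the large RDD probes it locally. Each batch still scans the large RDD once, but without moving it across the network. The sketch below models only the join logic on plain Scala collections; the partition layout and all names are hypothetical, not Spark API. In Spark the table would typically be built by collecting the small batch on the driver and shipping it with SparkContext.broadcast.

```scala
// Local model of a broadcast (map-side) hash join. Plain Scala collections stand in
// for Spark partitions; nothing here calls the Spark API. All names are hypothetical.
object BroadcastJoinModel {

  // The large, static dataset split into "partitions".
  type Partition[K, V] = Seq[(K, V)]

  // Map-side join: the small side is turned into a hash map once (the content a
  // broadcast variable would carry to every executor), then each partition of the
  // large side probes it independently. The large side is never regrouped or moved.
  def broadcastJoin[K, V, W](large: Seq[Partition[K, V]], small: Seq[(K, W)]): Seq[(K, (V, W))] = {
    // Group the small side by key so duplicate keys in the batch are all kept.
    val lookup: Map[K, Seq[W]] =
      small.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }
    large.flatMap { partition =>
      partition.flatMap { case (k, v) =>
        lookup.getOrElse(k, Nil).map(w => (k, (v, w)))
      }
    }
  }

  // Nested-loop join, used only as a reference result: compares every pair, O(n * m).
  def nestedLoopJoin[K, V, W](large: Seq[(K, V)], small: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k1, v) <- large
      (k2, w) <- small
      if k1 == k2
    } yield (k1, (v, w))
}
```

Both functions return the same pairs; the difference is cost. The hash lookup touches each large-side record once, whereas the nested loop compares every record with every batch element.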

How do I use IndexedRDD in a streaming context? I've tried this:

streaming_rdd.transform { rdd =>
  // indexed_data: IndexedRDD built once from the file; rdd: the current socket batch
  indexed_data.innerJoin(IndexedRDD(rdd)) { (id, a, b) => (a, b) }
}
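The transform above has the intended shape: the static side is indexed once, and each micro-batch only has to be matched against it. The sketch below is a plain-Scala model of that idea under stated assumptions. It is not IndexedRDD's implementation, and StaticIndex, partitionOf and numPartitions are hypothetical names. The static data is hash-partitioned once with a hash index per partition, and each batch record is routed to a single partition and looked up there, so per-batch work grows with the batch size rather than with the 5,000,000-record side. It also assumes keys in the static data are unique.

```scala
// Local model of "index the static side once, probe it per micro-batch".
// Not IndexedRDD's implementation; all names here are hypothetical.
object IndexedLookupModel {

  // Routing rule shared by both sides, in the spirit of Spark's HashPartitioner:
  // non-negative hashCode modulo the number of partitions.
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Built once for the static dataset: one hash index per partition.
  // Assumes keys are unique (for a duplicated key, the last value wins).
  final class StaticIndex[K, V](data: Seq[(K, V)], val numPartitions: Int) {
    private val parts: Vector[Map[K, V]] = {
      val grouped = data.groupBy { case (k, _) => partitionOf(k, numPartitions) }
      Vector.tabulate(numPartitions)(p => grouped.getOrElse(p, Seq.empty[(K, V)]).toMap)
    }

    // Per-batch inner join: each batch record is routed to exactly one partition
    // and looked up there; records whose key is absent are dropped.
    def innerJoin[W, R](batch: Seq[(K, W)])(f: (K, V, W) => R): Seq[(K, R)] =
      batch.flatMap { case (k, w) =>
        parts(partitionOf(k, numPartitions)).get(k).map(v => (k, f(k, v, w)))
      }
  }
}
```

The same routing rule has to be applied to both sides; if the batch were partitioned differently from the index, its records would first have to be moved, which is the cost a shared partitioner avoids.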

It compiles, but when running it I got this error:

java.lang.ClassCastException: scala.collection.immutable.$colon$colon cannot be cast to [Lscala.Tuple2;
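The class names in this message can be decoded. scala.collection.immutable.$colon$colon is the JVM name of ::, the class of a non-empty Scala List, and [Lscala.Tuple2; is the JVM name of an Array of Tuple2. So a List of pairs reached code that cast it to an Array of pairs. The snippet below only reproduces that naming and that kind of cast in plain Scala. It does not show which code inside IndexedRDD or the serialization path performs the cast in the original program.

```scala
// Decodes the JVM class names that appear in the reported ClassCastException.
object CastMessageDemo {
  // `::` (a non-empty immutable List) compiles to a class whose name is encoded
  // as `$colon$colon`.
  val listClassName: String = List(("k", 1)).getClass.getName

  // An Array of Tuple2 erases to the JVM array type `[Lscala.Tuple2;`.
  val arrayClassName: String = Array(("k", 1)).getClass.getName

  // Reproduces the same kind of failure: a List held as Any and then cast to an
  // Array of pairs. Returns the exception message (or a marker if no cast failed).
  def failingCastMessage(): String = {
    val value: Any = List(("k", 1))
    try {
      val arr = value.asInstanceOf[Array[(String, Int)]]
      s"no exception, length ${arr.length}"
    } catch {
      case e: ClassCastException => e.getMessage
    }
  }
}
```

The exact wording of the message differs between JVM versions, but both class names appear in it.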

I don't know whether this is the proper way to use IndexedRDD, and I don't know what caused the error either. Can anyone help me?
