sql - MySQL query to Hiveql -


work(id, rank)

data:

work ------------------ 1 | 1 | b 1 | c 1 | d 2 | 2 | c 2 | b 3 | c 

i need find pairs of ids have common rank count , should display if count of rank greater 2 , print them in descending order. have written mysql query but, new sparksql , hiveql. please me how that. example using data above result set should be:

mysql query is:

select a.id,b.id work a, work b a.id>b.id group a.id,b.id having group_concat(distinct a.rank order a.rank)=group_concat(distinct b.rank order b.rank)  --------------------- id1 | id2 | count ---------------------   | b   |  3  b  | c   |  3 

i don't think hive supports group_concat(). think same thing:

select a.id, b.id, a.cnt (select a.*, count(*) on (partition a.id) cnt       work      ) join      (select b.*, count(*) on (partition b.id) cnt       work b      ) b      on a.rank = b.rank , a.cnt = b.cnt a.id < b.id   -- *think* allowed in hive; not, subquery or expression in `having` clause same thing group a.id, b.id, a.cnt having count(*) = a.cnt; 

this more natural way pairs of ids same ranking. in fact, should more efficient in database mysql version. cross join generates lot of data.


Comments

Popular posts from this blog

java - SSE Emitter : Manage timeouts and complete() -

jquery - uncaught exception: DataTables Editor - remote hosting of code not allowed -

java - How to resolve error - package com.squareup.okhttp3 doesn't exist? -