sql - MySQL query to Hiveql -

work(id, rank)

data:

work ------------------ 1 | 1 | b 1 | c 1 | d 2 | 2 | c 2 | b 3 | c

i need find pairs of ids have common rank count , should display if count of rank greater 2 , print them in descending order. have written mysql query but, new sparksql , hiveql. please me how that. example using data above result set should be:

mysql query is:

select a.id,b.id work a, work b a.id>b.id group a.id,b.id having group_concat(distinct a.rank order a.rank)=group_concat(distinct b.rank order b.rank)  --------------------- id1 | id2 | count ---------------------   | b   |  3  b  | c   |  3

i don't think hive supports group_concat(). think same thing:

select a.id, b.id, a.cnt (select a.*, count(*) on (partition a.id) cnt       work      ) join      (select b.*, count(*) on (partition b.id) cnt       work b      ) b      on a.rank = b.rank , a.cnt = b.cnt a.id < b.id   -- *think* allowed in hive; not, subquery or expression in `having` clause same thing group a.id, b.id, a.cnt having count(*) = a.cnt;

this more natural way pairs of ids same ranking. in fact, should more efficient in database mysql version. cross join generates lot of data.

Search This Blog

WIKI

sql - MySQL query to Hiveql -

Comments

Post a Comment

Popular posts from this blog

jquery - uncaught exception: DataTables Editor - remote hosting of code not allowed -

java - SSE Emitter : Manage timeouts and complete() -

java - How to resolve error - package com.squareup.okhttp3 doesn't exist? -