sql - MySQL query to Hiveql -
work(id, rank)
data:
work ------------------ 1 | 1 | b 1 | c 1 | d 2 | 2 | c 2 | b 3 | c
i need find pairs of ids have common rank count , should display if count of rank greater 2 , print them in descending order. have written mysql query but, new sparksql , hiveql. please me how that. example using data above result set should be:
mysql query is:
select a.id,b.id work a, work b a.id>b.id group a.id,b.id having group_concat(distinct a.rank order a.rank)=group_concat(distinct b.rank order b.rank) --------------------- id1 | id2 | count --------------------- | b | 3 b | c | 3
i don't think hive supports group_concat()
. think same thing:
select a.id, b.id, a.cnt (select a.*, count(*) on (partition a.id) cnt work ) join (select b.*, count(*) on (partition b.id) cnt work b ) b on a.rank = b.rank , a.cnt = b.cnt a.id < b.id -- *think* allowed in hive; not, subquery or expression in `having` clause same thing group a.id, b.id, a.cnt having count(*) = a.cnt;
this more natural way pairs of ids same ranking. in fact, should more efficient in database mysql version. cross join
generates lot of data.
Comments
Post a Comment