Hadoop - Apache Pig program
I need to write a Pig script that counts the number of words in a
file containing the text below:
what|is|hadoop
history|of|hadoop
how|hadoop|name|was|given
problems|with|traditional|large-scale|systems|and|need|for|hadoop
understanding|hadoop|architecture
fundamental|of|hdfs|(blocks,|name|node,|data|node,|secondary|name|node)
rack|awareness
read/write|from|hdfs
hdfs|federation|and|high|availability
Load each line of the data as a chararray, replace every '|' with a space (' '), tokenize the line to get the individual words, group by word, and count the words in each group.
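-- Load each line of the file as a single chararray field.
a = LOAD '/user/hadoop/data.txt' AS (line:chararray);
-- Replace '|' with a space, split the line into words, one word per row.
b = FOREACH a GENERATE FLATTEN(TOKENIZE(REPLACE(line, '\\|', ' '))) AS word;
-- Group identical words together and count each group.
c = GROUP b BY word;
d = FOREACH c GENERATE group, COUNT(b);
DUMP d;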