hadoop - Apache Pig word count program


I need to write a Pig script that counts the number of occurrences of each word in a file containing the text below:

    what|is|hadoop
    history|of|hadoop
    how|hadoop|name|was|given
    problems|with|traditional|large-scale|systems|and|need|for|hadoop
    understanding|hadoop|architecture
    fundamental|of|hdfs|(blocks,|name|node,|data|node,|secondary|name|node)
    rack|awareness
    read/write|from|hdfs
    hdfs|federation|and|high|availability

Load each line as a chararray, use REPLACE to turn every '|' into a space (' '), TOKENIZE the line to get the individual words, then group by word and count the words in each group.

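    a = load '/user/hadoop/data.txt' as (line:chararray);                 -- one record per line
    b = foreach a generate flatten(TOKENIZE(REPLACE(line, '\\|', ' '))) as word; -- one record per word
    c = group b by word;                                                   -- one group per distinct word
    d = foreach c generate group, COUNT(b);                                -- (word, occurrences)
    dump d;

Note that built-in functions such as TOKENIZE, REPLACE and COUNT are case-sensitive and must be written in upper case, LOAD needs "as" before the schema, FOREACH needs the input relation (a), and GROUP needs "by".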

Output: dump d prints one (word,count) tuple per distinct word, for example (hadoop,5) and (hdfs,3).
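To check the expected counts without a Hadoop cluster, the same REPLACE, TOKENIZE, GROUP and COUNT steps can be mimicked locally. This is a minimal Python sketch, not Pig itself. It assumes TOKENIZE splits on the separators the Pig documentation lists (space, double quote, comma, parentheses, star) and that the file holds the lines shown above. The names `pig_style_word_count`, `DATA` and `TOKENIZE_SEPARATORS` are hypothetical, introduced only for illustration.

```python
import re
from collections import Counter

# Assumption: approximates the separators documented for Pig's TOKENIZE
# (space, double quote, comma, parentheses, star). Not Pig itself.
TOKENIZE_SEPARATORS = re.compile(r'[ ",()*]+')

# Hypothetical copy of data.txt, one topic per line.
DATA = [
    "what|is|hadoop",
    "history|of|hadoop",
    "how|hadoop|name|was|given",
    "problems|with|traditional|large-scale|systems|and|need|for|hadoop",
    "understanding|hadoop|architecture",
    "fundamental|of|hdfs|(blocks,|name|node,|data|node,|secondary|name|node)",
    "rack|awareness",
    "read/write|from|hdfs",
    "hdfs|federation|and|high|availability",
]


def pig_style_word_count(lines):
    """Mirror the script: REPLACE -> TOKENIZE/FLATTEN -> GROUP BY -> COUNT."""
    counts = Counter()
    for line in lines:
        spaced = line.replace("|", " ")  # like REPLACE(line, '\\|', ' ')
        # like TOKENIZE + FLATTEN: every non-empty token becomes one record
        words = [w for w in TOKENIZE_SEPARATORS.split(spaced) if w]
        counts.update(words)  # like GROUP b BY word followed by COUNT(b)
    return counts


if __name__ == "__main__":
    # Sorted for readability; the order of Pig's dump may differ.
    for word, n in sorted(pig_style_word_count(DATA).items()):
        print(f"({word},{n})")
```

With this input the sketch yields 32 distinct words and 44 tokens in total. "hadoop" appears 5 times and "name", "node" and "hdfs" 3 times each. Tokens such as "large-scale" and "read/write" stay whole because neither "-" nor "/" is a TOKENIZE separator.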

