R - Remove values using grepl
I am trying to remove retweets (strings that start with rt) from a dataset, but the grepl command doesn't seem to work right.
This works fine:
grepl("[^rt|rt][:alnum]",c("rt hi","rt boo","rtlolo","im goodrt"),ignore.case=TRUE)
This fails. Why?
data<-structure(list(data = c("rt @4mysquad: makes me sick!\\n#whiteprivilege\\n#blacklivesmatter \\n#policestate https:\\/\\/t.co\\/ndl0ahwwtd", "rt @weaselzippers: d.c. police want identifying #blacklivesmatter supporters beat , left hero marine dead\\u2026 https:\\/\\/t.co\\/tbmo\\u2026", "rt @vicegandako: #prayformannypacquiao #lovewins", "\\dig out of binaries of right , wrong\\ - #blacklivesmatter @ mizzou", "even democrats think #bernie 's ideas unrealistic #insane #unlv #bigbangtheory #hillary2016 #blacklivesmatter https:\\/\\/t.co\\/itdyxoavtk", "rt @eelawl1966: former naacp president ben jealous endorses bernie sanders\\n#blacklivesmatter #blm #bernie2016 \\n https:\\/\\/t.co\\/qom1kmwlhs", "#saynotohillary #nomoreclintons #feelthebern #berniesanders #blacklivesmatter #disabled4bernie #women4bernie... https:\\/\\/t.co\\/i8f21iljgv", "rt @joshuamannery: #blacklivesmatter \\ud83d\\udc4a\\ud83c\\udffd https:\\/\\/t.co\\/tceitkkghd", "lang:und", "@foxnews did not say, \\yes\\? story won't gain traction bc it's not reflective of #blacklivesmatter movement", "president barack obama doing big things cuba + #blacklivesmatter https:\\/\\/t.co\\/6gejreoiuc", "rt @uberarabic: \\u0644\\u0644\\u0639\\u0644\\u0645 \\u0639\\u0642\\u0648\\u0628\\u0629 \\u0627\\u0644\\u0645\\u062b\\u0644\\u064a\\u064a\\u0646 \\u0641\\u064a \\u062c\\u0645\\u064a\\u0639 \\u0627\\u0644\\u062f\\u0627\\u064a\\u0627\\u0646\\u0627\\u062a \\u0627\\u0644\\u0633\\u0645\\u0627\\u0648\\u064a\\u0629 \\u0647\\u064a \\u0627\\u0644\\u0642\\u062a\\u0644\\n\\n#lovewins", "rt @aishayesufu: let's not forget 219#chibokgirls still in captivity today 676 days \\n#nevertobeforgotten #cryingtoberescued #bringbackourgi\\u2026", "rt @arctic_matters: chukchi sea. #lovewins https:\\/\\/t.co\\/gh8kzgvzk3", ". 
@doublefine r u joking, tim u know servers aren't working dumb asshole #gamergate", "rt @realkingcalii: #blacklivesmatter kendrick lamar \\alright\\ - https:\\/\\/t.co\\/amlrn0fksa", "rt @dreamersmoms: community representing #cca & @geogroups making dirty $$$$ w\\/immigrants. #weareflorida #not1more #immigration https:\\/\\/t.c\\u2026", "id_str:700012325831581696", "rt @dreamersmoms: con compa\\u00f1eras de carolina del norte apoy\\u00e1ndonos en #tallahassee. #proteccionnodeportation #not1more @grisalonso https:\\/\\/\\u2026", "rt @ikeisaacson2: hey #blacklivesmatter hate crime done racists in name. https:\\/\\/t.co\\/6ugsxajcrm" )), .names = "data", row.names = c(NA, 20L), class = "data.frame")
data[grepl("[^rt|rt][:alnum]",data,ignore.case=TRUE)]
This question uses Twitter data, but it has a different approach.
We can specify a pattern that matches strings starting (^) with rt followed by one or more whitespace characters (\\s+). With ignore.case = TRUE, it matches all elements that start with rt, in either case, followed by a space.
grepl("^rt\\s+",c("rt hi","rt boo","rtlolo","im goodrt"), ignore.case=TRUE)
#[1]  TRUE  TRUE FALSE FALSE

grep("^rt\\s+", data$data, ignore.case=TRUE, value = TRUE)
#[1] "rt @4mysquad: makes me sick!\\n#whiteprivilege\\n#blacklivesmatter \\n#policestate https:\\/\\/t.co\\/ndl0ahwwtd"
#[2] "rt @weaselzippers: d.c. police want identifying #blacklivesmatter supporters beat , left hero marine dead\\u2026 https:\\/\\/t.co\\/tbmo\\u2026"
#[3] "rt @vicegandako: #prayformannypacquiao #lovewins"
#[4] "rt @eelawl1966: former naacp president ben jealous endorses bernie sanders\\n#blacklivesmatter #blm #bernie2016 \\n https:\\/\\/t.co\\/qom1kmwlhs"
#[5] "rt @joshuamannery: #blacklivesmatter \\ud83d\\udc4a\\ud83c\\udffd https:\\/\\/t.co\\/tceitkkghd"
#[6] "rt @uberarabic: \\u0644\\u0644\\u0639\\u0644\\u0645 \\u0639\\u0642\\u0648\\u0628\\u0629 \\u0627\\u0644\\u0645\\u062b\\u0644\\u064a\\u064a\\u0646 \\u0641\\u064a \\u062c\\u0645\\u064a\\u0639 \\u0627\\u0644\\u062f\\u0627\\u064a\\u0627\\u0646\\u0627\\u062a \\u0627\\u0644\\u0633\\u0645\\u0627\\u0648\\u064a\\u0629 \\u0647\\u064a \\u0627\\u0644\\u0642\\u062a\\u0644\\n\\n#lovewins"
#[7] "rt @aishayesufu: let's not forget 219#chibokgirls still in captivity today 676 days \\n#nevertobeforgotten #cryingtoberescued #bringbackourgi\\u2026"
#[8] "rt @arctic_matters: chukchi sea. #lovewins https:\\/\\/t.co\\/gh8kzgvzk3"
#[9] "rt @realkingcalii: #blacklivesmatter kendrick lamar \\alright\\ - https:\\/\\/t.co\\/amlrn0fksa"
#[10] "rt @dreamersmoms: community representing #cca & @geogroups making dirty $$$$ w\\/immigrants. #weareflorida #not1more #immigration https:\\/\\/t.c\\u2026"
#[11] "rt @dreamersmoms: con compa\\u00f1eras de carolina del norte apoy\\u00e1ndonos en #tallahassee. #proteccionnodeportation #not1more @grisalonso https:\\/\\/\\u2026"
#[12] "rt @ikeisaacson2: hey #blacklivesmatter hate crime done racists in name. https:\\/\\/t.co\\/6ugsxajcrm"
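
As for why the call in the question fails, two things seem to be going on. First, [^rt|rt] is a bracket expression: it matches any single character other than r, t, or |, not the word rt, and the POSIX class is written [[:alnum:]] rather than [:alnum]. Second, the whole data frame is passed to grepl instead of the data column. grepl coerces its input with as.character, which turns a data frame into one string per column, so the result is a single TRUE or FALSE and the subsetting selects either every column or none. The sketch below illustrates this and then drops the retweets by negating the index. The tweets data frame and its contents are hypothetical stand-ins for the data in the question.

```r
# Hypothetical miniature of the question's data frame (same column name, `data`).
tweets <- data.frame(
  data = c("rt @someone: hello", "RT @other: hi", "rtlolo", "im goodrt", "lang:und"),
  stringsAsFactors = FALSE
)

# Passing the whole data frame: it is coerced with as.character(),
# giving one string per column, so only one logical comes back.
whole <- grepl("^rt\\s+", tweets, ignore.case = TRUE)
length(whole)
# [1] 1

# Passing the column gives one logical per row.
is_rt <- grepl("^rt\\s+", tweets$data, ignore.case = TRUE)
is_rt
# [1]  TRUE  TRUE FALSE FALSE FALSE

# Negate the index to remove the retweets; drop = FALSE keeps a data frame.
no_rt <- tweets[!is_rt, , drop = FALSE]
no_rt$data
# [1] "rtlolo"    "im goodrt" "lang:und"
```

The same negated index should work on the original data, for example data[!grepl("^rt\\s+", data$data, ignore.case = TRUE), , drop = FALSE].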