R - String replacement using the sub function
I am attempting to extract the names of NBA players from a column in a database. However, the format of the names in the names column is the following:
"lebron james\\jamesle01"
I used the following regex expression inside the sub function to attempt to keep only the name portion:
sub("([a-z]\\w+\\s*-*'*[a-z]*\\s*\\.*|[a-z]\\.\\s*)\\*\\*[a-z]*\\d*\\d*", replacement = "\\1", x = nba_salaries$names)
The expression is meant to take into account unusual names that contain more than alphanumeric characters (e.g. michael kidd-gilchrist, de'andre jordan, luc mbah a moute, etc.).
However, when I run the following,
head(nba_salaries$names)
the names end up being in the same format.
I have used regexr.com to ensure that the regex expression captures the strings properly.
How about this: you can split the text at the "\\" string, and then take the first element:
text <- c( "lebron james\\jamesle01", "michael jordan\\jamesle01" )
sapply( strsplit( text, "\\\\" ), "[", 1 )
which gives
[1] "lebron james" "michael jordan"
To explain: "[" is a function*, being called within sapply. We pass the result of strsplit as X in sapply, and then apply the [ function to it, with the parameter 1, to take the 1st element. Here's another way to put it:
text <- strsplit( text, "\\\\" )
This outputs a list, with each list element containing a vector whose first element is the text before the "\\" string, and whose second element contains the text after it. Then use the "[" function*, passing the parameter 1, to take the first element of each of the vectors:
text <- sapply( X = text, FUN = "[", 1 )
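To make the intermediate step concrete, here is a minimal sketch that inspects the list returned by strsplit before taking the first elements. The variable names `parts` and `first` are illustrative and not from the original post, and vapply is swapped in for sapply only to show a stricter variant.

```r
# Sample input copied from the example above.
text <- c("lebron james\\jamesle01", "michael jordan\\jamesle01")

# strsplit() returns a list with one character vector per input string.
# The pattern "\\\\" is the regex for a single literal backslash.
parts <- strsplit(text, "\\\\")
str(parts)

# vapply() does the same job as sapply() here, but also checks that
# every result is a single character value.
first <- vapply(parts, function(p) p[1], character(1))
print(first)
```

The str() call is there only to show the shape of the list; it can be dropped once the structure is clear.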
Edit to add: using the magrittr pipe, you can do things like this, to make it a little more readable:
library( magrittr )
text <- strsplit( x = text, split = "\\\\" ) %>% sapply( FUN = "[", 1 )
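Since the question asks about sub specifically, here is a hedged sketch of the same result with sub and a much simpler pattern. It assumes each value contains a single literal backslash separating the name from the ID. The third sample value uses a made-up ID (`example01`), and `names_only` is an illustrative variable name.

```r
# Sample input; the third entry's ID is invented for illustration.
text <- c("lebron james\\jamesle01",
          "michael jordan\\jamesle01",
          "de'andre jordan\\example01")

# Pattern: a literal backslash, then everything up to the end of the string.
# sub() returns a new vector and does not modify its input,
# so the result has to be assigned.
names_only <- sub("\\\\.*$", "", text)
print(names_only)
```

Because the pattern only describes what to remove, hyphens, apostrophes and extra spaces in the name need no special handling. For a data frame column, the result would have to be assigned back in the same way, e.g. `nba_salaries$names <- sub("\\\\.*$", "", nba_salaries$names)`.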
* The "[" function is the function that is called when you subset with []. E.g. vector[1:3] or, in this case, vector[1] (thanks @mathewlundberg for the suggestion here).
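As a small illustration of the footnote, the sketch below calls the subsetting function by name and compares it with the usual bracket syntax. The vector `v` and the result names are illustrative only.

```r
v <- c("a", "b", "c")  # illustrative vector

# Backticks let you call the subsetting function by name.
same_single <- identical(v[1], `[`(v, 1))
same_range  <- identical(v[1:3], `[`(v, 1:3))
print(c(same_single, same_range))

# sapply() accepts the function's name as a string, so "[" can be
# passed directly, with 1 forwarded as the index.
firsts <- sapply(list(c("a", "b"), c("c", "d")), "[", 1)
print(firsts)
```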