R - String replacement using the sub function


I am attempting to extract the names of NBA players from a column in a database. However, the names in the names column have the following format:

"lebron james\\jamesle01"

I used the following regular expression inside the sub function in an attempt to keep only the name portion:

sub("([a-z]\\w+\\s*-*'*[a-z]*\\s*\\.*|[a-z]\\.\\s*)\\*\\*[a-z]*\\d*\\d*", replacement = "\\1", x = nba_salaries$names) 

The expression is meant to take into account unusual names that contain more than just alphanumeric characters (e.g. Michael Kidd-Gilchrist, De'Andre Jordan, Luc Mbah a Moute, etc.).

However, when I run the following,

head(nba_salaries$names) 

the names end up being in the same format as before.

I have used regexr.com to check that the regular expression captures the strings properly.

How about this? You can split the text on the "\\" string, and then take the first element:

text <- c( "lebron james\\jamesle01", "michael jordan\\jamesle01" )
sapply( strsplit( text, "\\\\" ), "[", 1 )

which gives

[1] "lebron james"   "michael jordan" 
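To apply the same idea to the column from the question, note that the result has to be assigned back to the column, since neither sub nor sapply modifies its input in place. The following is only a minimal sketch: it assumes nba_salaries is a data frame with a character column called names, and the sample rows (including the player IDs) are made up for illustration. The sub call at the end is an alternative that is not part of the original answer.

```r
# Hypothetical stand-in for the nba_salaries data frame from the question;
# the player IDs after the backslash are invented for this example.
nba_salaries <- data.frame(
  names = c("lebron james\\jamesle01",
            "michael kidd-gilchrist\\kiddgmi01",
            "de'andre jordan\\jordade01"),
  stringsAsFactors = FALSE
)

# Keep a copy of the raw values so both approaches can be compared
raw_names <- nba_salaries$names

# Split on the literal backslash, keep the first piece,
# and assign the result back to the column
nba_salaries$names <- sapply(strsplit(raw_names, "\\\\"), "[", 1)

# Alternative with sub: delete everything from the first backslash onwards
cleaned_with_sub <- sub("\\\\.*$", "", raw_names)

head(nba_salaries$names)
```

Because both approaches key on the backslash rather than on the shape of the name, hyphens, apostrophes and extra spaces in the name need no special handling.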

To explain: "[" is a function*, and it is being called within sapply. We pass the result of strsplit as X in sapply, and apply the [ function to it, with the parameter 1 to take the 1st element. Here's another way to put it:

text <- strsplit( text, "\\\\" ) 

This outputs a list, with each list element containing a vector: the first element of that vector is the text before the "\\" string, and the second element is the text after it. Now use the "[" function*, passing the parameter 1, to take the first element of each of those vectors:

text <- sapply( X = text, FUN = "[", 1 )
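The two steps can be inspected one at a time to see the intermediate list. This is just an illustrative sketch; the variable names player_ids, pieces and player_names, and the second player ID, are made up for the example.

```r
# Hypothetical sample input in the same format as the question
player_ids <- c("lebron james\\jamesle01", "michael jordan\\jordami01")

# Step 1: split each string on a literal backslash.
# The result is a list with one character vector per input string.
pieces <- strsplit(player_ids, "\\\\")
str(pieces)

# Step 2: apply "[" with the argument 1 to every vector in the list,
# i.e. take the first element of each one
player_names <- sapply(X = pieces, FUN = "[", 1)
print(player_names)
```

Note that the argument names X and FUN are upper case; R argument matching is case sensitive.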

Edit to add: using the magrittr pipe for things like this can make it a little more readable:

library( magrittr )
text <- strsplit( x = text, split = "\\\\" ) %>%
    sapply( FUN = "[", 1 )

* The "[" function is the function that is called when you subset with []. E.g. vector[1:3], or in this case vector[1]. (Thanks to @mathewlundberg for the suggestion here.)
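As a small illustration of that footnote, the bracket syntax and an explicit call to the "[" function give the same result. The vector v below is a made-up example.

```r
# Hypothetical example vector: the two pieces of one split string
v <- c("lebron james", "jamesle01")

# Ordinary subsetting syntax
first_by_bracket <- v[1]

# The same operation written as an explicit function call
first_by_call <- `[`(v, 1)

# Looking the function up by name, which is what sapply does
# when FUN is given as the string "["
bracket_fun <- match.fun("[")
first_by_lookup <- bracket_fun(v, 1)

identical(first_by_bracket, first_by_call)
identical(first_by_bracket, first_by_lookup)
```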
