Removing Duplicates From a Dataframe in R


I am trying to clean a data set of student results for processing, and I'm having issues removing duplicates: I only want to look at "first attempts", but some students have taken the course multiple times. An example of the data, using one of the duplicates, is:

            id period desc
    632   1507   1101 90714 research contemporary biological issue
    633   1507   1101 6317 explain process of speciation
    634   1507   1101 8931 describe gene expression
    14448 1507   1201 8931 describe gene expression
    14449 1507   1201 6317 explain process of speciation
    14450 1507   1201 90714 research contemporary biological issue
    25884 1507   1301 6317 explain process of speciation
    25885 1507   1301 8931 describe gene expression
    25886 1507   1301 90714 research contemporary biological issue

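The first two digits of reg_period (the period column above) are the year the paper was sat. As can be seen, I want to keep the rows where id is 1507 and reg_period is 1101. So far, an example of my code to get the values I want to trim out is: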

    unique.rows <- unique(df[c("id", "period")])
    dups <- (unique.rows[duplicated(unique.rows$id),])

However, there are a couple of problems I'm running into. This only works because the data is ordered by id and reg_period, which isn't guaranteed in the future. Plus, I don't know how to take this list of duplicate entries and select the rows that are not in it, because %in% doesn't seem to work with it and a loop with rbind runs out of memory.

What's the best way to handle this?

I would use dplyr. Calling your data df:

    library(dplyr)

    # For each id, keep only the rows from its earliest period
    result = df %>%
        group_by(id) %>%
        filter(period == min(period))

If you prefer base R, I would pull the id/period combinations to keep into a separate data frame and then do an inner join with the original data:

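    # Sort by id and period so the earliest period comes first within each id
    id_pd = df[order(df$id, df$period), c("id", "period")]
    # Keep the first (earliest) id/period pair for each id
    id_pd = id_pd[!duplicated(id_pd$id), ]
    # Inner join on the shared columns id and period
    result = merge(df, id_pd)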
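To check that the idea does not depend on row order, here is a minimal base R sketch run on the sample rows from the question. It is not the approach above but a variant using ave(); it assumes period is numeric and that a smaller value means an earlier attempt. The names first_period and repeats are only illustrative.

```r
# Sample rows copied from the question (row names omitted).
df <- data.frame(
  id     = rep(1507, 9),
  period = rep(c(1101, 1201, 1301), each = 3),
  desc   = c(
    "90714 research contemporary biological issue",
    "6317 explain process of speciation",
    "8931 describe gene expression",
    "8931 describe gene expression",
    "6317 explain process of speciation",
    "90714 research contemporary biological issue",
    "6317 explain process of speciation",
    "8931 describe gene expression",
    "90714 research contemporary biological issue"
  ),
  stringsAsFactors = FALSE
)

# Shuffle the rows to show the result does not rely on the input order.
set.seed(1)
df <- df[sample(nrow(df)), ]

# ave() returns, for every row, the minimum period within that row's id group.
# Assumption: period is numeric, so min() picks the earliest attempt.
first_period <- ave(df$period, df$id, FUN = min)

# Keep only the rows from each student's earliest period.
result <- df[df$period == first_period, ]

# Rows from later attempts, selected with the opposite logical mask.
repeats <- df[df$period != first_period, ]
```

Because the selection is a logical mask over the original rows, there is no need for %in% on a two-column key or for growing a data frame with rbind in a loop.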
