R - Moving average over 5 years with irregular dates


I have a large number of files (~1200), each of which contains a long time series of groundwater height. The starting date and the length of the series differ for each file. There can be large gaps between dates. For example (a small part of such a file):

    date        height (cm)
    14-1-1980   7659
    28-1-1980   7632
    14-2-1980   7661
    14-3-1980   7638
    28-3-1980   7642
    14-4-1980   7652
    25-4-1980   7646
    14-5-1980   7635
    29-5-1980   7622
    13-6-1980   7606
    27-6-1980   7598
    14-7-1980   7654
    28-7-1980   7654
    14-8-1980   7627
    28-8-1980   7600
    12-9-1980   7617
    14-10-1980  7596
    28-10-1980  7601
    14-11-1980  7592
    28-11-1980  7614
    11-12-1980  7650
    29-12-1980  7670
    14-1-1981   7698
    28-1-1981   7700
    13-2-1981   7694
    17-3-1981   7740
    30-3-1981   7683
    14-4-1981   7692
    14-5-1981   7682
    15-6-1981   7696
    17-7-1981   7706
    28-7-1981   7699
    28-8-1981   7686
    30-9-1981   7678
    17-11-1981  7723
    11-12-1981  7803
    18-2-1982   7757
    16-3-1982   7773
    13-5-1982   7753
    11-6-1982   7740
    14-7-1982   7731
    15-8-1982   7739
    14-9-1982   7722
    14-10-1982  7794
    15-11-1982  7764
    14-12-1982  7790
    14-1-1983   7810
    28-3-1983   7836
    28-4-1983   7815
    31-5-1983   7857
    29-6-1983   7801
    28-7-1983   7774
    24-8-1983   7758
    28-9-1983   7748
    26-10-1983  7727
    29-11-1983  7782
    27-1-1984   7801
    28-3-1984   7764
    27-4-1984   7752
    28-5-1984   7795
    27-7-1984   7748
    27-8-1984   7729
    28-9-1984   7752
    26-10-1984  7789
    28-11-1984  7797
    18-12-1984  7781
    28-1-1985   7833
    21-2-1985   7778
    22-4-1985   7794
    28-5-1985   7768
    28-6-1985   7836
    26-8-1985   7765
    19-9-1985   7760
    31-10-1985  7756
    26-11-1985  7760
    20-12-1985  7781
    17-1-1986   7813
    28-1-1986   7852
    26-2-1986   7797
    25-3-1986   7838
    22-4-1986   7807
    27-5-1986   7785
    24-6-1986   7787
    26-8-1986   7744
    23-9-1986   7742
    22-10-1986  7752
    1-12-1986   7749
    17-12-1986  7758

I want to calculate the average height over 5 years. So, in the case of the example: 14-1-1980 + 5 years, 14-1-1985 + 5 years, and so on. The number of data points differs for each calculation of the average. It is also likely that the date 5 years later is not itself a data point in the dataset. Hence, I think I need to tell R somehow to take the average over a time span.

I searched the internet but didn't find anything that fitted my needs. A lot of useful packages like uts, zoo and lubridate, and the function aggregate, passed by. Instead of getting closer to a solution, I get more and more confused about which approach is best for my problem.

Thanks a lot in advance!

As @vagabond points out, you'll want to combine your 1200 files into a single data frame (the plyr package would allow you to do something simple like: data.all <- adply(dir([data folder]), 1, read.csv)).
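If you would rather not depend on plyr, the same idea can be sketched in base R. This is only a sketch under assumptions: the helper name read_all_series is made up, the files are assumed to be CSVs with a header row and identical columns, and the extra file column is added so you can still tell the series apart after combining.

```r
# Sketch: read every CSV in a folder into one data frame, tagging each row
# with the file it came from. Assumes all files share the same columns.
read_all_series <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    d <- read.csv(f, stringsAsFactors = FALSE)
    d$file <- basename(f)  # remember which file (series) the row belongs to
    d
  })
  do.call(rbind, pieces)   # stack all pieces into a single data frame
}
```

Calling read_all_series on your data folder would then give one long data frame that you can split or group by the file column.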

Once you have the data, the first step is to transform the date column into a proper date class such as Date or POSIXct. Right now the dates appear to be strings, and we want them to have an underlying numerical representation (which both Date and POSIXct have):

    library(lubridate)
    # dmy() parses day-month-year strings such as "14-1-1980"
    df$date.new <- as.Date(dmy(df$date))

           date height   date.new
    1 14-1-1980   7659 1980-01-14
    2 28-1-1980   7632 1980-01-28
    3 14-2-1980   7661 1980-02-14
    4 14-3-1980   7638 1980-03-14
    5 28-3-1980   7642 1980-03-28
    6 14-4-1980   7652 1980-04-14

Note that the date.new column looks like a string, but is in fact Date data, and can be handled with numerical operations (addition, comparison, etc.).
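A small illustration of what that means, using base R only and two dates from the example data (the variable name d is just for this sketch):

```r
# Parse two day-month-year strings into Date objects
d <- as.Date(c("14-1-1980", "28-1-1980"), format = "%d-%m-%Y")

class(d)       # "Date"
d[2] - d[1]    # Time difference of 14 days
d[1] + 30      # "1980-02-13": adding a number adds days
d[1] < d[2]    # TRUE: dates can be compared directly
```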

Next, you might construct a set of date periods over which you want to compute your averages. Your example mentions 5 years, but with the data you provided, that's not a very illustrative example. So here I'm creating 1-year periods starting at every day between Jan 14 1980 and Jan 14 1985:

    # One start date for every day from 1980-01-14 to 1985-01-14
    date.start <- as.Date(as.Date('1980-01-14') : as.Date('1985-01-14'), origin = '1970-01-01')
    # Note: adding years(1) to a Feb 29 start gives NA; lubridate's %m+% rolls back instead
    date.end <- date.start + years(1)
    dates <- data.frame(start = date.start, end = date.end)

           start        end
    1 1980-01-14 1981-01-14
    2 1980-01-15 1981-01-15
    3 1980-01-16 1981-01-16
    4 1980-01-17 1981-01-17
    5 1980-01-18 1981-01-18
    6 1980-01-19 1981-01-19

Then you can use the dplyr package to move through each row of this data frame and compute a summary average of height:

    library(dplyr)
    # Each (start, end) pair is its own group; average the heights that fall in [start, end)
    df.mean <- dates %>%
      group_by(start, end) %>%
      summarize(height.mean = mean(df$height[df$date.new >= start & df$date.new < end]))

           start        end height.mean
          <date>     <date>       <dbl>
    1 1980-01-14 1981-01-14    7630.273
    2 1980-01-15 1981-01-15    7632.045
    3 1980-01-16 1981-01-16    7632.045
    4 1980-01-17 1981-01-17    7632.045
    5 1980-01-18 1981-01-18    7632.045
    6 1980-01-19 1981-01-19    7632.045
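If what you actually want are the consecutive 5-year blocks from the question (14-1-1980 to 14-1-1985, 14-1-1985 to 14-1-1990, and so on) rather than a window starting at every day, something along these lines might work. It is a base R sketch, not the approach used above: the function name five_year_means and the output columns are made up, and it assumes a data frame with a date column in day-month-year form and a numeric height column, as in the example.

```r
# Sketch: mean height in consecutive windows of `years` years,
# anchored at the first date in the series.
five_year_means <- function(df, years = 5) {
  d <- as.Date(df$date, format = "%d-%m-%Y")
  # Window edges: first date, first date + 5 years, ... until past the last date
  n_edges <- ceiling(as.numeric(diff(range(d))) / (365 * years)) + 2
  edges <- seq(min(d), by = paste(years, "years"), length.out = n_edges)
  # findInterval() assigns each date to the window [edge_i, edge_{i+1})
  idx <- findInterval(as.numeric(d), as.numeric(edges))
  means <- tapply(df$height, idx, mean)
  counts <- tapply(df$height, idx, length)
  win <- as.integer(names(means))  # windows that contain at least one point
  data.frame(start = edges[win],
             end = edges[win + 1],
             n = as.vector(counts),
             height.mean = as.vector(means))
}
```

Calling five_year_means(df) on one series returns one row per window that contains data, together with the number of points that went into each mean, which is worth checking when the gaps are large. For the 1200 files you could apply it per file, for instance with lapply over split(data.all, data.all$file) if you kept a file column when combining.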
