R - Moving average over 5 years with irregular dates
I have a large number of files (~1200), each of which contains a long time series of groundwater height. The starting date and the length of the series differ for each file. There can be large gaps between dates, for example (a small part of such a file):
date        height (cm)
14-1-1980   7659
28-1-1980   7632
14-2-1980   7661
14-3-1980   7638
28-3-1980   7642
14-4-1980   7652
25-4-1980   7646
14-5-1980   7635
29-5-1980   7622
13-6-1980   7606
27-6-1980   7598
14-7-1980   7654
28-7-1980   7654
14-8-1980   7627
28-8-1980   7600
12-9-1980   7617
14-10-1980  7596
28-10-1980  7601
14-11-1980  7592
28-11-1980  7614
11-12-1980  7650
29-12-1980  7670
14-1-1981   7698
28-1-1981   7700
13-2-1981   7694
17-3-1981   7740
30-3-1981   7683
14-4-1981   7692
14-5-1981   7682
15-6-1981   7696
17-7-1981   7706
28-7-1981   7699
28-8-1981   7686
30-9-1981   7678
17-11-1981  7723
11-12-1981  7803
18-2-1982   7757
16-3-1982   7773
13-5-1982   7753
11-6-1982   7740
14-7-1982   7731
15-8-1982   7739
14-9-1982   7722
14-10-1982  7794
15-11-1982  7764
14-12-1982  7790
14-1-1983   7810
28-3-1983   7836
28-4-1983   7815
31-5-1983   7857
29-6-1983   7801
28-7-1983   7774
24-8-1983   7758
28-9-1983   7748
26-10-1983  7727
29-11-1983  7782
27-1-1984   7801
28-3-1984   7764
27-4-1984   7752
28-5-1984   7795
27-7-1984   7748
27-8-1984   7729
28-9-1984   7752
26-10-1984  7789
28-11-1984  7797
18-12-1984  7781
28-1-1985   7833
21-2-1985   7778
22-4-1985   7794
28-5-1985   7768
28-6-1985   7836
26-8-1985   7765
19-9-1985   7760
31-10-1985  7756
26-11-1985  7760
20-12-1985  7781
17-1-1986   7813
28-1-1986   7852
26-2-1986   7797
25-3-1986   7838
22-4-1986   7807
27-5-1986   7785
24-6-1986   7787
26-8-1986   7744
23-9-1986   7742
22-10-1986  7752
1-12-1986   7749
17-12-1986  7758
I want to calculate the average height over 5 years. So, in the case of the example: 14-1-1980 + 5 years, 14-1-1985 + 5 years, and so on. The number of data points differs for each calculation of the average. The date 5 years later may well not be in the dataset as a data point. Hence, I think I need to tell R somehow to take the average over a timespan.
I searched the internet but didn't find anything that fitted my needs. A lot of useful packages like uts, zoo and lubridate, and the function aggregate, passed by. Instead of getting closer to a solution, I get more and more confused about which approach is best for my problem.
Thanks a lot in advance!
As @vagabond points out, you'll want to combine your 1200 files into a single data frame (the plyr package would allow you to do something simple like: data.all <- adply(dir([data folder]), 1, read.csv)).
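If you would rather not depend on plyr, the same idea can be sketched in base R. This is only a sketch under my own assumptions: that every file is a CSV with `date` and `height` columns, and that you want to keep the file name so the series stay distinguishable. The helper `read_all_series` and the file names `well_a.csv` and `well_b.csv` are hypothetical; the demo writes two tiny files (holding the first rows of your example) into a temporary folder so that it runs on its own.

```r
# Hypothetical helper: read every CSV in a folder into one data frame,
# adding a `file` column so each series can still be told apart.
read_all_series <- function(folder) {
  files <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    d <- read.csv(f, stringsAsFactors = FALSE)
    d$file <- basename(f)
    d
  })
  do.call(rbind, pieces)
}

# Demo: two small files in a temporary folder (assumed layout: date, height)
demo.dir <- file.path(tempdir(), "groundwater_demo")
dir.create(demo.dir, showWarnings = FALSE)
write.csv(data.frame(date = c("14-1-1980", "28-1-1980"), height = c(7659, 7632)),
          file.path(demo.dir, "well_a.csv"), row.names = FALSE)
write.csv(data.frame(date = "14-2-1980", height = 7661),
          file.path(demo.dir, "well_b.csv"), row.names = FALSE)

data.all <- read_all_series(demo.dir)
```

With the `file` column in place, later steps can be run per series, for example with `split(data.all, data.all$file)`.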
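Once you have the data, the first step is to transform the date column into proper date data. Right now the dates appear to be strings, and we want them to have an underlying numerical representation (which R's Date class has, as does POSIXct; the code below produces a Date column):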
    library(lubridate)
    # dmy() parses day-month-year strings such as "14-1-1980"
    df$date.new <- as.Date(dmy(df$date))

           date height   date.new
    1 14-1-1980   7659 1980-01-14
    2 28-1-1980   7632 1980-01-28
    3 14-2-1980   7661 1980-02-14
    4 14-3-1980   7638 1980-03-14
    5 28-3-1980   7642 1980-03-28
    6 14-4-1980   7652 1980-04-14
Note that the date.new column looks like a string, but is in fact Date data, and can be handled with numerical operations (addition, comparison, etc.).
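A quick way to convince yourself of this, using base R only (the variable names here are just for illustration, and the two dates are the first two from your example):

```r
# Parse two of the example dates with base R (day-month-year format)
d <- as.Date(c("14-1-1980", "28-1-1980"), format = "%d-%m-%Y")

days.since.epoch <- as.numeric(d[1])  # Dates are stored as days since 1970-01-01
gap <- d[2] - d[1]                    # subtraction gives a difftime in days
later <- d[1] + 30                    # adding a number adds that many days
is.earlier <- d[1] < d[2]             # comparisons work directly on Dates
```

This is what makes the window filtering further down possible: a condition such as `date >= start & date < end` is an ordinary numeric comparison underneath.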
Next, we might construct a set of date periods over which we want to compute averages. Your example mentions 5 years, but with the data you provided, that's not a very illustrative example. So here I'm creating 1-year periods starting at every day between Jan 14 1980 and Jan 14 1985:
    # One start date for every day in the range
    date.start <- as.Date(as.Date('1980-01-14') : as.Date('1985-01-14'), origin = '1970-01-01')
    # %m+% rolls a Feb 29 start back to Feb 28; plain `+ years(1)` would give NA there
    date.end <- date.start %m+% years(1)
    dates <- data.frame(start = date.start, end = date.end)

           start        end
    1 1980-01-14 1981-01-14
    2 1980-01-15 1981-01-15
    3 1980-01-16 1981-01-16
    4 1980-01-17 1981-01-17
    5 1980-01-18 1981-01-18
    6 1980-01-19 1981-01-19
Then we can use the dplyr package to move through each row of this data frame and compute a summary average of height:
    library(dplyr)
    df.mean <- dates %>%
      group_by(start, end) %>%
      summarize(height.mean = mean(df$height[df$date.new >= start & df$date.new < end]))

           start        end height.mean
          <date>     <date>       <dbl>
    1 1980-01-14 1981-01-14    7630.273
    2 1980-01-15 1981-01-15    7632.045
    3 1980-01-16 1981-01-16    7632.045
    4 1980-01-17 1981-01-17    7632.045
    5 1980-01-18 1981-01-18    7632.045
    6 1980-01-19 1981-01-19    7632.045
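If what you actually want is the consecutive 5-year blocks from your question (14-1-1980 + 5 years, 14-1-1985 + 5 years, ...) rather than a window sliding day by day, a base R sketch could look like the following. This is a different technique from the dplyr code above: it bins each observation into a window with `findInterval()` instead of filtering once per window. The helper `window_means` is hypothetical, the windows are anchored at the first observation, and missing heights are not handled. The demo uses six rows taken from your example data.

```r
# Hypothetical helper: mean height per consecutive window of `years` years.
# Windows are left-closed, right-open, and start at the earliest date.
window_means <- function(dates, heights, years = 5) {
  # Window start dates: first date, first date + 5 years, ... up to the last date
  breaks <- seq(min(dates), max(dates), by = paste(years, "years"))
  # Index of the window that each observation falls into
  idx <- findInterval(as.numeric(dates), as.numeric(breaks))
  data.frame(
    start       = breaks[sort(unique(idx))],            # only windows that contain data
    n           = as.vector(table(idx)),                # number of points per window
    height.mean = as.vector(tapply(heights, idx, mean))
  )
}

# Demo with six rows from the example data
demo <- data.frame(
  date   = as.Date(c("14-1-1980", "28-1-1981", "14-12-1982",
                     "18-12-1984", "28-1-1985", "17-12-1986"),
                   format = "%d-%m-%Y"),
  height = c(7659, 7700, 7790, 7781, 7833, 7758)
)
res <- window_means(demo$date, demo$height)
```

Here the first four rows fall into the window starting 1980-01-14 and the last two into the window starting 1985-01-14. The `n` column is worth keeping, since with irregular sampling some windows rest on far fewer points than others.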