Title: | Non-Parametric Bayesian Analyses of Animal Movement |
---|---|
Description: | Methods for assessing animal movement from telemetry and biologging data using non-parametric Bayesian methods. This includes features for pre-processing and analyzing data, as well as for visualizing the results from the models. This framework does not rely on standard parametric density functions, which provides flexibility during model fitting. Further details regarding part of this framework can be found in Cullen et al. (2022) <doi:10.1111/2041-210X.13745>. |
Authors: | Joshua Cullen [aut, cre, cph] , Denis Valle [aut, cph] |
Maintainer: | Joshua Cullen <[email protected]> |
License: | GPL-3 |
Version: | 0.2.3 |
Built: | 2024-12-07 05:41:51 UTC |
Source: | https://github.com/joshcullen/bayesmove |
Assign behavior estimates to observations
assign_behavior(dat.orig, dat.seg.list, theta.estim.long, behav.names = NULL)
dat.orig |
A data frame that contains all of the original data for all
animal IDs. Must be same as was used to originally segment the tracks. Must
have columns |
dat.seg.list |
A list of data associated with each animal ID where names
of list elements are the ID names and tracks have already been segmented.
Must have columns |
theta.estim.long |
A data frame in long format where each observation (time1) of each track segment (tseg) of each animal ID (id) has separate rows for behavior proportion estimates per state. Columns for behavior and proportion estimates should be labeled behavior and prop, respectively. Date (in POSIXct format) should also be included as a column labeled date. |
behav.names |
deprecated. Now taken from the |
A data frame of all animal IDs where columns (with names from behav.names) include proportions of each behavioral state per observation, as well as a column (behav) that stores the dominant behavior within the track segment to which the observation belongs. This is merged with the original data frame dat.orig, so any observations that were excluded (not at the primary time interval) will show NA for behavior estimates.
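The idea of a dominant-behavior column can be pictured with a minimal base-R sketch. The helper dominant_behavior() and the toy proportions are hypothetical, and breaking ties by taking the first state is an assumption, not the package's documented rule.

```r
# Hypothetical proportions for three observations (rows) across three states.
props <- data.frame(Encamped = c(0.70, 0.70, 0.10),
                    ARS      = c(0.20, 0.20, 0.30),
                    Transit  = c(0.10, 0.10, 0.60))

# Label each row by the state with the largest proportion.
# ties.method = "first" makes the result deterministic when states tie.
dominant_behavior <- function(props) {
  colnames(props)[max.col(as.matrix(props), ties.method = "first")]
}

props$behav <- dominant_behavior(props[, c("Encamped", "ARS", "Transit")])
print(props$behav)
```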
#load original and segmented data
data(tracks)
data(tracks.seg)
#convert segmented dataset into list
tracks.list<- df_to_list(dat = tracks.seg, ind = "id")
#select only id, tseg, SL, and TA columns
tracks.seg2<- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs<- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res<- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000, nburn = 500, nmaxclust = 7, ndata.types = 2)
#Extract proportions of behaviors per track segment
theta.estim<- extract_prop(res = res, ngibbs = 1000, nburn = 500, nmaxclust = 7)
#Create augmented matrix by replicating rows (tsegs) according to obs per tseg
theta.estim.long<- expand_behavior(dat = tracks.seg, theta.estim = theta.estim, obs = obs, nbehav = 3, behav.names = c("Encamped","ARS","Transit"), behav.order = c(1,2,3))
#Run function
dat.out<- assign_behavior(dat.orig = tracks, dat.seg.list = tracks.list, theta.estim.long = theta.estim.long)
After breakpoints have been extracted for each animal ID, this function assigns the associated segment number to observations for each animal ID. These segments of observations will be used in the second stage of the model framework to perform mixed-membership clustering by Latent Dirichlet Allocation.
assign_tseg(dat, brkpts)
dat |
A list where each element stores the data for a unique animal ID.
Each element is a data frame that contains all data associated for a given
animal ID and must include a column labeled |
brkpts |
A data frame of breakpoints for each animal ID (as generated by
|
A data frame that updates the original data object by including the segment number associated with each observation in relation to the extracted breakpoints.
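As a rough illustration of how breakpoints map onto segment numbers, the sketch below uses base R findInterval(). The helper assign_segments() is hypothetical, and it assumes that each breakpoint is the last observation of the segment it closes; the package's own boundary convention may differ.

```r
# Hypothetical helper: label observations (time1) with segment numbers.
# ASSUMPTION: a breakpoint is the last observation of the segment it closes.
assign_segments <- function(time1, brkpts) {
  brkpts <- sort(brkpts[!is.na(brkpts)])   # NA placeholders are ignored
  # With left.open = TRUE, findInterval() counts breakpoints strictly below
  # each observation, so an observation equal to a breakpoint stays in the
  # earlier segment.
  findInterval(time1, brkpts, left.open = TRUE) + 1
}

tseg <- assign_segments(time1 = 1:10, brkpts = c(3, 7, NA))
print(tseg)
```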
#load data
data(tracks.list)
#subset only first track
tracks.list<- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2<- purrr::map(tracks.list, subset, select = c(id, SL, TA))
set.seed(1)
#Define model params
alpha<- 1
ngibbs<- 1000
nbins<- c(5,8)
#future::plan(future::multisession)  #run all MCMC chains in parallel
dat.res<- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins, alpha = alpha)
#Determine MAP iteration for selecting breakpoints and store breakpoints
MAP.est<- get_MAP(dat = dat.res$LML, nburn = ngibbs/2)
brkpts<- get_breakpts(dat = dat.res$brkpts, MAP.est = MAP.est)
#Assign track segments to all observations by ID
tracks.seg<- assign_tseg(dat = tracks.list, brkpts = brkpts)
After breakpoints have been extracted for each animal ID, this function assigns the associated segment number to observations for each animal ID. These segments of observations will be used in the second stage of the model framework to perform mixed-membership clustering by Latent Dirichlet Allocation.
assign_tseg_internal(dat, brkpts)
dat |
A data frame that contains all data associated for a given animal
ID. Must include a column labeled |
brkpts |
A data frame of breakpoints for each animal ID (as generated by
|
A data frame that updates the original data object by including the segment number associated with each observation in relation to the extracted breakpoints.
This function serves as a wrapper for samp_move by running this sampler for each iteration of the MCMC chain. It is called by segment_behavior to run the RJMCMC on all animal IDs simultaneously.
behav_gibbs_sampler(dat, ngibbs, nbins, alpha, breakpt, p)
dat |
A data frame that only contains columns for the animal IDs and for each of the discretized movement variables. |
ngibbs |
numeric. The total number of iterations of the MCMC chain. |
nbins |
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within
|
alpha |
numeric. A single value used to specify the hyperparameter for
the prior distribution. A standard value for |
breakpt |
numeric. A vector of breakpoints if pre-specifying where they
may occur, otherwise |
p |
An object storing information from
|
A list of the breakpoints, the number of breakpoints, and the log marginal likelihood at each MCMC iteration, as well as the time it took the model to finish running. This is only provided for the data of a single animal ID.
Transforms vectors of bin numbers into full matrices for plotting as a heatmap.
behav_seg_image(dat, nbins)
dat |
A data frame for a single animal ID that contains only columns for
the ID and each of the movement variables that were analyzed by
|
nbins |
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within
|
A list where each element stores the presence-absence matrix for each of the movement variables.
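A presence-absence matrix of this kind can be sketched in base R with two-column matrix indexing. bins_to_matrix() is a hypothetical helper, not the package's code; rows with an NA bin are simply left as all zeros here, which is an assumption.

```r
# Hypothetical sketch: turn a vector of bin numbers (1..nbins) into an
# n x nbins presence-absence matrix with a single 1 per non-missing row.
bins_to_matrix <- function(x, nbins) {
  m <- matrix(0L, nrow = length(x), ncol = nbins)
  ok <- !is.na(x)                       # rows with NA bins stay all zero
  m[cbind(which(ok), x[ok])] <- 1L      # set cells by (row, bin) pairs
  m
}

# One matrix per movement variable, returned as a named list
dat <- data.frame(SL = c(1, 5, 3), TA = c(8, 2, NA))
nbins <- c(5, 8)
mats <- Map(bins_to_matrix, dat, nbins)
print(mats$SL)
```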
This function uses a Gibbs sampler within a mixture model to estimate the optimal number of behavioral states and the state-dependent distributions, and to assign behavioral states to each observation. This model does not assume an underlying mechanistic process.
cluster_obs(dat, alpha, ngibbs, nmaxclust, nburn)
dat |
A data frame that **only** contains columns for the discretized movement variables. |
alpha |
numeric. A single value used to specify the hyperparameter for the prior distribution. |
ngibbs |
numeric. The total number of iterations of the MCMC chain. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
nburn |
numeric. The length of the burn-in phase. |
The mixture model analyzes all animal IDs pooled together, thus providing a population-level estimate of behavioral states.
A list of model results is returned whose elements include the phi matrix for each data stream, the theta matrix, the log-likelihood estimates for each iteration of the MCMC chain (loglikel), a list of the MAP estimates of the latent states for each observation (z.MAP), a matrix of the full posterior of state assignments per observation (z.posterior), and a vector of estimates for the gamma hyperparameter (gamma1).
data(tracks.list)
#convert from list to data frame
tracks.list<- dplyr::bind_rows(tracks.list)
#only retain discretized step length (SL) and turning angle (TA) columns
tracks<- subset(tracks.list, select = c(SL, TA))
set.seed(1)
#Define model params
alpha=0.1
ngibbs=1000
nburn=ngibbs/2
nmaxclust=7
dat.res<- cluster_obs(dat = tracks, alpha = alpha, ngibbs = ngibbs, nmaxclust = nmaxclust, nburn = nburn)
This function runs a Gibbs sampler within the Latent Dirichlet Allocation (LDA) model to estimate the proportions of each behavioral state for all time segments generated by segment_behavior. This is the second stage of the two-stage Bayesian model, which estimates proportions of behavioral states after first segmenting individual tracks into relatively homogeneous segments of movement.
cluster_segments(dat, gamma1, alpha, ngibbs, nmaxclust, nburn, ndata.types)
dat |
A data frame returned by |
gamma1 |
numeric. A hyperparameter for the truncated stick-breaking
prior for estimating the |
alpha |
numeric. A hyperparameter for the Dirichlet distribution when
estimating the |
ngibbs |
numeric. The total number of iterations of the MCMC chain. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
nburn |
numeric. The length of the burn-in phase. |
ndata.types |
numeric. A vector of the number of bins used to discretize
each movement variable. These must be in the same order as the columns
within |
The LDA model analyzes all animal IDs pooled together, thereby providing population-level estimates of behavioral states.
A list of model results is returned whose elements include the phi matrix for each data stream, the theta matrix, the log-likelihood estimates for each iteration of the MCMC chain (loglikel), and matrices of the latent cluster estimates for each data stream (z.agg).
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2<- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs<- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res<- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000, nburn = 500, nmaxclust = 7, ndata.types = 2)
Internal function that calculates the inverted cumulative sum (cumsum).
CumSumInv(ntsegm, nmaxclust, z)
ntsegm |
An integer. |
nmaxclust |
An integer. |
z |
An integer matrix. |
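An R sketch of one plausible reading of the "inverted cumsum" follows. It ASSUMES z holds counts per time segment (rows) and cluster (columns), and that the result for cluster k is the total count in clusters strictly greater than k; quantities of this kind commonly appear in the Beta full conditional of stick-breaking weights. cumsum_inv() is a hypothetical name, not the package's function.

```r
# Hypothetical sketch: for each row, res[, k] = sum of z[, l] for all l > k.
cumsum_inv <- function(z) {
  res <- matrix(0, nrow = nrow(z), ncol = ncol(z))
  if (ncol(z) > 1) {
    for (k in (ncol(z) - 1):1) {
      res[, k] <- res[, k + 1] + z[, k + 1]   # running total from the right
    }
  }
  res
}

z <- matrix(c(2L, 1L, 3L,
              0L, 4L, 1L), nrow = 2, byrow = TRUE)
print(cumsum_inv(z))
```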
Converts an object of class data.frame to a list where each element is a separate animal ID. This function prepares the data for further analysis and for mapping other functions onto the data of each animal ID.
df_to_list(dat, ind)
dat |
A data frame containing the data for each animal ID. |
ind |
character. The name of the column storing the animal IDs. |
A list where each element stores the data for a separate animal ID.
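For comparison only, base R split() produces a similar per-ID list; this is not presented as the implementation of df_to_list. Note that split() orders list elements by the sorted levels of the ID column, which may not match the order produced by df_to_list. split_by_id() and the toy data are hypothetical.

```r
# Base-R comparison: one list element per animal ID.
split_by_id <- function(dat, ind) {
  split(dat, f = dat[[ind]])
}

tracks_demo <- data.frame(id = c("id1", "id1", "id2"), x = 1:3)
dat.list <- split_by_id(tracks_demo, "id")
print(names(dat.list))
```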
#load data
data(tracks)
#convert to list
dat.list<- df_to_list(dat = tracks, ind = "id")
Convert movement variables from continuous to discrete values for analysis by segment_behavior.
discrete_move_var(dat, lims, varIn, varOut)
dat |
A data frame that contains the variable(s) of interest to convert from continuous to discrete values. |
lims |
A list of the bin limits for each variable. Each element of the list should be a vector of real numbers. |
varIn |
A vector of names for the continuous variables stored as columns
within |
varOut |
A vector of names for the new columns that will store the discretized variables returned by the function. |
A data frame with new columns of discretized variables as labeled by varOut.
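Binning by a vector of limits can be sketched with base R findInterval(). The helper discretize() is hypothetical, and treating bins as left-closed (with the maximum limit placed in the last bin) is an assumption about how discrete_move_var assigns boundary values.

```r
# Hypothetical sketch: bin k when lims[k] <= x < lims[k + 1];
# rightmost.closed = TRUE puts x == max(lims) in the last bin.
discretize <- function(x, lims) {
  bin <- findInterval(x, lims, rightmost.closed = TRUE)
  bin[which(bin == 0 | bin == length(lims))] <- NA   # outside the limits
  bin
}

angle.bin.lims <- seq(from = -pi, to = pi, by = pi/4)   # 9 limits -> 8 bins
ta <- c(-pi, -0.1, 0.1, 3, pi, NA)
print(discretize(ta, angle.bin.lims))
```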
#load data
data(tracks)
#subset only first track
tracks<- tracks[tracks$id == "id1",]
#calculate step lengths and turning angles
tracks<- prep_data(dat = tracks, coord.names = c("x","y"), id = "id")
#round times to nearest interval of interest (e.g. 3600 s or 1 hr)
tracks<- round_track_time(dat = tracks, id = "id", int = 3600, tol = 180, time.zone = "UTC", units = "secs")
#create list from data frame
tracks.list<- df_to_list(dat = tracks, ind = "id")
#filter observations to only 1 hr (or 3600 s)
tracks_filt.list<- filter_time(dat.list = tracks.list, int = 3600)
#define bin number and limits for turning angles and step lengths
angle.bin.lims=seq(from=-pi, to=pi, by=pi/4)  #8 bins
dist.bin.lims=quantile(tracks[tracks$dt == 3600,]$step, c(0,0.25,0.50,0.75,0.90,1), na.rm=TRUE)  #5 bins
#Assign bins to observations
tracks_disc.list<- purrr::map(tracks_filt.list, discrete_move_var, lims = list(dist.bin.lims, angle.bin.lims), varIn = c("step", "angle"), varOut = c("SL", "TA"))
Expand behavior estimates from track segments to observations
expand_behavior(dat, theta.estim, obs, nbehav, behav.names, behav.order)
dat |
A data frame of the animal ID, track segment labels, and all other data per observation. Animal ID, date, track segment, and observation number columns must be labeled id, date, tseg, and time1, respectively. |
theta.estim |
A matrix (returned by |
obs |
A data frame summarizing the number of observations within each
bin per movement variable that is returned by
|
nbehav |
numeric. The number of behavioral states to retain (a value from 1 to nmaxclust). |
behav.names |
character. A vector of names to label each state (in order). |
behav.order |
numeric. A vector that identifies the order in which the user would like to rearrange the behavioral states. If satisfied with the order returned by the LDA model, this must still be specified. |
A new data frame that expands behavior proportions for each observation within all track segments, including the columns labeled time1 and date from the original dat data frame.
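The expansion step can be sketched in base R: each segment's row of proportions is repeated once per observation in that segment, then reshaped to a long format with behavior and prop columns. The segment sizes (n.obs) and the toy matrix are hypothetical, and the sketch omits the reordering and date handling that expand_behavior performs.

```r
# Hypothetical segment-level proportions (rows = segments, cols = states)
theta.estim <- matrix(c(0.7, 0.2, 0.1,
                        0.1, 0.3, 0.6), nrow = 2, byrow = TRUE,
                      dimnames = list(NULL, c("Encamped", "ARS", "Transit")))
n.obs <- c(3, 2)   # hypothetical number of observations in segments 1 and 2

# Wide form: one row per observation, copying its segment's proportions
wide <- as.data.frame(theta.estim[rep(seq_len(nrow(theta.estim)), times = n.obs), ])
wide$time1 <- seq_len(sum(n.obs))

# Long form: one row per observation and behavior
long <- data.frame(
  time1    = rep(wide$time1, times = ncol(theta.estim)),
  behavior = rep(colnames(theta.estim), each = nrow(wide)),
  prop     = as.vector(as.matrix(wide[, colnames(theta.estim)]))
)
print(head(long))
```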
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2<- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs<- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res<- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000, nburn = 500, nmaxclust = 7, ndata.types = 2)
#Extract proportions of behaviors per track segment
theta.estim<- extract_prop(res = res, ngibbs = 1000, nburn = 500, nmaxclust = 7)
#Create augmented matrix by replicating rows (tsegs) according to obs per tseg
theta.estim.long<- expand_behavior(dat = tracks.seg, theta.estim = theta.estim, obs = obs, nbehav = 3, behav.names = c("Encamped","ARS","Transit"), behav.order = c(1,2,3))
Calculates the mean of the posterior for the proportions of each behavior within track segments. These results can be explored to determine the optimal number of latent behavioral states.
extract_prop(res, ngibbs, nburn, nmaxclust)
res |
A list of results returned by |
ngibbs |
numeric. The total number of iterations of the MCMC chain. |
nburn |
numeric. The length of the burn-in phase. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
A matrix that stores the proportions of each state/cluster (columns) per track segment (rows).
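A posterior mean after burn-in can be sketched as follows. The storage layout (an array indexed by iteration, segment, and cluster) and the helper posterior_mean_theta() are assumptions for illustration; the layout of the package's results list may differ.

```r
# Hypothetical sketch: average sampled theta over post-burn-in iterations.
# theta.draws: array [iteration, segment, cluster] (assumed layout)
posterior_mean_theta <- function(theta.draws, nburn) {
  keep <- (nburn + 1):dim(theta.draws)[1]
  apply(theta.draws[keep, , , drop = FALSE], c(2, 3), mean)
}

set.seed(1)
ngibbs <- 10; nburn <- 5; ntsegm <- 4; nmaxclust <- 3
draws <- array(runif(ngibbs * ntsegm * nmaxclust), dim = c(ngibbs, ntsegm, nmaxclust))
# rescale so the proportions in each draw and segment sum to one
draws <- draws / as.vector(apply(draws, c(1, 2), sum))
theta.estim <- posterior_mean_theta(draws, nburn)   # ntsegm x nmaxclust
```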
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2<- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs<- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res<- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000, nburn = 500, nmaxclust = 7, ndata.types = 2)
#Extract proportions of behaviors per track segment
theta.estim<- extract_prop(res = res, ngibbs = 1000, nburn = 500, nmaxclust = 7)
Selects observations that belong to the time interval of interest and removes all others. This function also removes entire IDs from the dataset when there are one or fewer observations at this time interval. This function works closely with round_track_time to retain only observations sampled at a regular time interval, which is important for analyzing step lengths and turning angles. The column storing the time intervals must be labeled dt.
filter_time(dat.list, int)
dat.list |
A list of data associated with each animal ID where names of list elements are the ID names. |
int |
numeric. The time interval of interest. |
A list where observations for each animal ID (element) have been filtered for int. Two columns (obs and time1) are added for each list element (ID), which store the original observation number before filtering and the new observation number after filtering, respectively.
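The filtering rule can be sketched in base R. filter_interval() and filter_all() are hypothetical helpers; dropping rows where dt is NA is an assumption about how missing intervals are treated.

```r
# Hypothetical sketch for one animal ID: keep rows sampled at `int`,
# store the original row number (obs) and a new consecutive index (time1).
filter_interval <- function(dat, int) {
  dat$obs <- seq_len(nrow(dat))
  out <- dat[!is.na(dat$dt) & dat$dt == int, , drop = FALSE]
  out$time1 <- seq_len(nrow(out))
  out
}

filter_all <- function(dat.list, int) {
  res <- lapply(dat.list, filter_interval, int = int)
  res[vapply(res, nrow, integer(1)) > 1]   # drop IDs with one or fewer obs
}

dat.list <- list(id1 = data.frame(dt = c(3600, 7200, 3600, 3600)),
                 id2 = data.frame(dt = c(3600, 1800)))
out <- filter_all(dat.list, int = 3600)
print(out$id1)
```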
#load data
data(tracks)
#subset only first track
tracks<- tracks[tracks$id == "id1",]
#calculate step lengths and turning angles
tracks<- prep_data(dat = tracks, coord.names = c("x","y"), id = "id")
#round times to nearest interval of interest (e.g. 3600 s or 1 hr)
tracks<- round_track_time(dat = tracks, id = "id", int = 3600, tol = 180, time.zone = "UTC", units = "secs")
#create list from data frame
tracks.list<- df_to_list(dat = tracks, ind = "id")
#filter observations to only 1 hr (or 3600 s)
tracks_filt.list<- filter_time(dat.list = tracks.list, int = 3600)
Identify changes within a discrete variable. These values can be used to pre-specify breakpoints within the segmentation model using segment_behavior.
find_breaks(dat, ind)
dat |
A data frame containing the data for each animal ID. |
ind |
character. The name of the column storing the discrete variable of interest. |
A vector of breakpoints is returned based on the data provided. To identify separate breakpoints per animal ID, this function should be mapped onto a list generated by df_to_list.
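Detecting changes in a discrete variable amounts to comparing each value with its successor. change_points() is a hypothetical helper; reporting the last observation before each change (rather than the first one after it) is an assumption about the convention used by find_breaks.

```r
# Hypothetical sketch: index of the last observation before each change.
change_points <- function(x) {
  which(x[-1] != x[-length(x)])
}

var <- c(1, 1, 2, 2, 2, 3, 1)
print(change_points(var))
```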
#simulate data
var<- sample(1:3, size = 50, replace = TRUE)
var<- rep(var, each = 20)
id<- rep(1:10, each = 100)
#create data frame
dat<- data.frame(id, var)
#create list
dat.list<- df_to_list(dat = dat, ind = "id")
#run function using purrr::map()
breaks<- purrr::map(dat.list, ~find_breaks(dat = ., ind = "var"))
#or with lapply()
breaks1<- lapply(dat.list, find_breaks, ind = "var")
Pulls model results for the estimates of bin proportions per movement variable from the posterior distribution. These can be used to visualize the distribution of each movement variable for each estimated behavior.
get_behav_hist(dat, nburn, ngibbs, nmaxclust, var.names)
dat |
The list object returned by the LDA model
( |
nburn |
numeric. The length of the burn-in phase. |
ngibbs |
numeric. The total number of iterations of the MCMC chain. |
nmaxclust |
numeric. The maximum number of clusters on which to attribute behaviors. |
var.names |
character. A vector of names used for each of the movement
variables. Must be in the same order as were listed within the data frame
returned by |
A data frame that contains columns for bin number, behavioral state, proportion represented by a given bin, and movement variable name. This is displayed in a long format, which is easier to visualize using ggplot2.
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2<- tracks.seg[,c("id","tseg","SL","TA")]
#summarize data by track segment
obs<- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
#cluster data with LDA
res<- cluster_segments(dat = obs, gamma1 = 0.1, alpha = 0.1, ngibbs = 1000, nburn = 500, nmaxclust = 7, ndata.types = 2)
#Extract proportions of behaviors per track segment
theta.estim<- extract_prop(res = res, ngibbs = 1000, nburn = 500, nmaxclust = 7)
#run function for clustered segments
behav.res<- get_behav_hist(dat = res, nburn = 500, ngibbs = 1000, nmaxclust = 7, var.names = c("Step Length","Turning Angle"))
Extract breakpoints for each animal ID
get_breakpts(dat, MAP.est)
dat |
A list of lists where animal IDs are separated as well as the
breakpoints estimated for each iteration of the MCMC chain. This is stored
within |
MAP.est |
numeric. A vector of values at which the maximum a posteriori
(MAP) estimate was identified for each of the animal IDs as returned by
|
A data frame where breakpoints are returned per animal ID within each row. For animal IDs that have fewer breakpoints than the maximum number that were estimated, NA values are used as placeholders for the breakpoints that do not exist.
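Padding uneven breakpoint vectors with NA can be sketched in base R. pad_breakpoints() and the column names brk1, brk2, ... are hypothetical; the package's column naming may differ.

```r
# Hypothetical sketch: one row per animal ID, NA where an ID has fewer
# breakpoints than the maximum.
pad_breakpoints <- function(brk.list) {
  nmax <- max(lengths(brk.list))
  padded <- lapply(brk.list, function(b) c(b, rep(NA_real_, nmax - length(b))))
  mat <- matrix(unlist(padded), nrow = length(brk.list), byrow = TRUE)
  colnames(mat) <- paste0("brk", seq_len(nmax))   # hypothetical column names
  data.frame(id = names(brk.list), mat)
}

brkpts <- pad_breakpoints(list(id1 = c(10, 45, 80), id2 = c(30)))
print(brkpts)
```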
#load data
data(tracks.list)
#subset only first track
tracks.list<- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2<- purrr::map(tracks.list, subset, select = c(id, SL, TA))
set.seed(1)
#Define model params
alpha<- 1
ngibbs<- 1000
nbins<- c(5,8)
#future::plan(future::multisession)  #run all MCMC chains in parallel
dat.res<- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins, alpha = alpha)
#Determine MAP iteration for selecting breakpoints and store breakpoints
MAP.est<- get_MAP(dat = dat.res$LML, nburn = ngibbs/2)
brkpts<- get_breakpts(dat = dat.res$brkpts, MAP.est = MAP.est)
Identify the MCMC iteration that holds the MAP estimate. This is used to inform get_breakpts which breakpoints should be retained when assigning track segments to the observations of each animal ID.
get_MAP(dat, nburn)
dat |
A data frame where each row holds the log marginal likelihood values at each iteration of the MCMC chain. |
nburn |
numeric. The size of the burn-in phase after which the MAP estimate will be identified. |
A numeric vector of iterations at which the MAP estimate was found for each animal ID.
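Locating the MAP iteration after burn-in can be sketched as a row-wise which.max(). map_iteration() is hypothetical, and the sketch assumes a numeric matrix of log marginal likelihoods with one row per animal ID and any ID column already removed.

```r
# Hypothetical sketch: MAP iteration per row, searching only after burn-in.
map_iteration <- function(lml, nburn) {
  post <- lml[, (nburn + 1):ncol(lml), drop = FALSE]
  nburn + apply(post, 1, which.max)   # convert back to the full-chain index
}

lml <- rbind(c(-50, -40, -45, -42, -41, -39),
             c(-60, -30, -55, -52, -51, -58))
print(map_iteration(lml, nburn = 3))
```

In the second row the overall maximum (-30) falls within the burn-in, so it is ignored.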
#load data
data(tracks.list)
#subset only first track
tracks.list<- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2<- purrr::map(tracks.list, subset, select = c(id, SL, TA))
set.seed(1)
#Define model params
alpha<- 1
ngibbs<- 1000
nbins<- c(5,8)
#future::plan(future::multisession)  #run all MCMC chains in parallel
dat.res<- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins, alpha = alpha)
#Determine MAP iteration for selecting breakpoints and store breakpoints
MAP.est<- get_MAP(dat = dat.res$LML, nburn = ngibbs/2)
An internal function that calculates the sufficient statistics to be used within the reversible-jump MCMC Gibbs sampler called by samp_move.
get_summary_stats(breakpt, dat, max.time, nbins, ndata.types)
breakpt |
numeric. A vector of breakpoints. |
dat |
A matrix that only contains columns storing discretized data for each of the movement variables. |
max.time |
numeric. The number of the last observation of |
nbins |
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within
|
ndata.types |
numeric. The length of |
Returns the sufficient statistics associated with the provided breakpoints for a given animal ID.
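For discretized data, natural sufficient statistics are the bin counts of each movement variable within each segment. The sketch below computes such counts; segment_bin_counts() is hypothetical, and it reuses the assumption that a breakpoint is the last observation of the segment it closes. The exact statistics and layout used internally may differ.

```r
# Hypothetical sketch: per variable, a segments x bins table of counts.
segment_bin_counts <- function(dat, breakpt, nbins) {
  seg <- findInterval(seq_len(nrow(dat)), sort(breakpt), left.open = TRUE) + 1
  lapply(seq_along(nbins), function(j) {
    x <- dat[, j]
    ok <- !is.na(x)                     # NA bins are not counted
    table(factor(seg[ok], levels = seq_len(length(breakpt) + 1)),
          factor(x[ok], levels = seq_len(nbins[j])))
  })
}

dat <- cbind(SL = c(1, 2, 2, 5, 5), TA = c(8, 8, 1, 1, NA))
stats <- segment_bin_counts(dat, breakpt = 2, nbins = c(5, 8))
print(stats[[1]])
```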
Calculates the log-likelihood of the mixture model based on estimates for theta and phi.
get.llk.mixmod(phi, theta, ndata.types, dat, nobs, nmaxclust)
phi |
A list of proportion estimates that characterize distributions (bins) for each data stream and possible behavioral state. |
theta |
numeric. A vector of values that sum to one. |
ndata.types |
numeric. The number of data streams being analyzed. |
dat |
A data frame containing only columns of the discretized data streams for all observations. |
nobs |
numeric. The total number of rows in the dataset. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
A numeric value of the log-likelihood based upon the current values for phi and theta.
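For a mixture of categorical distributions with streams treated as conditionally independent given the state, the log-likelihood is the sum over observations i of log(sum over states k of theta_k times the product over streams j of phi_j[k, x_ij]). The sketch below computes that quantity; mixture_loglik() is hypothetical and derives nobs and nmaxclust from its inputs instead of taking them as arguments.

```r
# Hypothetical sketch of a categorical mixture log-likelihood.
# phi:   list (one per data stream) of nmaxclust x nbins matrices
# theta: mixture weights of length nmaxclust (sums to one)
# dat:   matrix of bin numbers, one column per data stream
mixture_loglik <- function(phi, theta, dat) {
  dat <- as.matrix(dat)
  nobs <- nrow(dat)
  # prob[i, k] = theta_k * prod_j phi_j[k, x_ij]
  prob <- matrix(rep(theta, each = nobs), nrow = nobs, ncol = length(theta))
  for (j in seq_along(phi)) {
    prob <- prob * t(phi[[j]][, dat[, j], drop = FALSE])
  }
  sum(log(rowSums(prob)))
}

phi <- list(rbind(c(0.9, 0.1),
                  c(0.2, 0.8)))
llk <- mixture_loglik(phi, theta = c(0.5, 0.5), dat = matrix(c(1, 2), ncol = 1))
print(llk)
```

With many streams, a log-sum-exp formulation would be more robust to underflow than multiplying probabilities directly.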
Calculates values of theta matrix within Gibbs sampler. Not for calling directly by users.
get.theta(v, nmaxclust, ntsegm)
v |
A matrix returned by |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
ntsegm |
numeric. The total number of time segments from all animal IDs. |
A matrix of proportion estimates that represent proportions of different behavioral states per time segment.
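Under a truncated stick-breaking prior, theta is typically obtained from v as theta_k = v_k * prod_{l<k} (1 - v_l), applied separately to each time segment. The hypothetical helper stick_breaking_theta() below sketches this transform, assuming v has one row per time segment and one column per cluster, with the last column equal to 1 so that each row of theta sums to one. It is an illustration, not the package's get.theta.

```r
# Hypothetical helper (not bayesmove's get.theta).
# v: matrix of stick-breaking fractions, rows = time segments, cols = clusters
stick_breaking_theta <- function(v) {
  theta <- v
  # length of stick still available before each cluster, per row
  remaining <- rep(1, nrow(v))
  for (k in seq_len(ncol(v))) {
    # cluster k takes fraction v_k of what is left
    theta[, k] <- v[, k] * remaining
    remaining <- remaining * (1 - v[, k])
  }
  theta
}
```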
Insert NA gaps to regularize a time series
insert_NAs(data, int, units)
data |
A data frame that minimally contains columns for animal ID, date,
and time step. These must be labeled |
int |
integer. An integer that characterizes the desired interval on which to insert new rows. |
units |
character. The units of the selected time interval |
A data frame where new rows have been inserted to regularize the date column. This results in values provided for id, date, and dt, while NAs are inserted for all other columns. Additionally, observations with duplicate date-times are removed.
#load data
data(tracks)
#remove rows to show how function works (create irregular time series)
set.seed(1)
ind<- sort(sample(2:15003, 500))
tracks.red<- tracks[-ind,]
#calculate step lengths, turning angles, net-squared displacement, and time steps
tracks.red<- prep_data(dat = tracks.red, coord.names = c("x","y"), id = "id")
#round times to nearest interval
tracks.red<- round_track_time(dat = tracks.red, id = "id", int = c(3600, 7200, 10800, 14400), tol = 300, units = "secs")
#insert NA gaps
dat.out<- insert_NAs(tracks.red, int = 3600, units = "secs")
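The regularization step can be pictured as an outer join between the observed rows and a complete, evenly spaced sequence of date-times. The hypothetical helper regularize_with_NAs() below sketches this for a single animal ID, assuming date is POSIXct, int is in seconds, and dates have already been rounded onto the interval (as round_track_time does). It is not the insert_NAs implementation.

```r
# Hypothetical helper (not bayesmove::insert_NAs) for a single animal ID.
# df: data frame with columns id, date (POSIXct) and any other variables
# int: regular interval in seconds
regularize_with_NAs <- function(df, int) {
  # drop rows with duplicate date-times
  df <- df[!duplicated(df$date), , drop = FALSE]
  # complete regular sequence of date-times spanning the track
  grid <- data.frame(date = seq(min(df$date), max(df$date), by = int))
  # outer join: grid times without an observation get NA in other columns,
  # and any off-grid observations are kept rather than dropped
  out <- merge(grid, df, by = "date", all = TRUE)
  out$id <- df$id[1]
  # time step to the next row, in seconds
  out$dt <- c(as.numeric(diff(out$date), units = "secs"), NA)
  out
}
```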
An internal function that is used to calculate the log marginal likelihood of models for the current and proposed sets of breakpoints. Called within samp_move.
log_marg_likel(alpha, summary.stats, nbins, ndata.types)
alpha |
numeric. A single value used to specify the hyperparameter for
the prior distribution. A standard value for |
summary.stats |
A matrix of sufficient statistics returned from
|
nbins |
numeric. A vector of the number of bins used to discretize each movement variable. |
ndata.types |
numeric. The length of |
The log marginal likelihood is calculated for a model with a given set of breakpoints and the discretized data.
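Assuming a symmetric Dirichlet(alpha) prior on the bin proportions of each segment, the marginal likelihood of a segment's discretized observations has the closed Dirichlet-multinomial form lgamma(B * alpha) - lgamma(n + B * alpha) + sum_b [lgamma(n_b + alpha) - lgamma(alpha)], where B is the number of bins, n_b is the count in bin b, and n is the segment total. The hypothetical helper dm_log_marg_likel() below sketches this standard form (without the multinomial coefficient); it is not guaranteed to match log_marg_likel exactly. With several movement variables, the per-variable values would be summed.

```r
# Hypothetical helper (not bayesmove's log_marg_likel).
# counts: matrix of bin counts, one row per segment, one column per bin
# alpha: symmetric Dirichlet hyperparameter
dm_log_marg_likel <- function(counts, alpha) {
  nbins <- ncol(counts)
  n_seg <- rowSums(counts)
  # Dirichlet-multinomial marginal likelihood of each segment, on the log scale
  per_seg <- lgamma(nbins * alpha) - lgamma(n_seg + nbins * alpha) +
    rowSums(lgamma(counts + alpha)) - nbins * lgamma(alpha)
  # segments are independent given the breakpoints
  sum(per_seg)
}
```

With alpha = 1 and two bins, a single observation in bin 1 gives log(1/2), because the uniform prior makes either bin equally likely.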
Visualize the breakpoints estimated by the segmentation model as they relate to either the original (continuous) or discretized data. These plots assist in determining whether too many or too few breakpoints were estimated as well as whether the user needs to redefine how they discretized their data before analysis.
plot_breakpoints(data, as_date = FALSE, var_names, var_labels = NULL, brkpts)
data |
A list where each element stores a data frame for a given animal
ID. Each of these data frames contains columns for the ID, date or time1
generated by |
as_date |
logical. If |
var_names |
A vector of the column names for the movement variables to be plotted over time. |
var_labels |
A vector of the labels to be plotted on the y-axis for each
movement variable. Set to |
brkpts |
A data frame that contains the breakpoints associated with each
animal ID. This data frame is returned by |
A line plot per animal ID for each movement variable showing how the estimated breakpoints relate to the underlying data. Depending on the user input for var_names, this may either be on the scale of the original continuous data or the discretized data.
#load data
data(tracks.list)
#subset only first track
tracks.list<- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2<- purrr::map(tracks.list, subset, select = c(id, SL, TA))
set.seed(1)
# Define model params
alpha<- 1
ngibbs<- 1000
nbins<- c(5,8)
#future::plan(future::multisession)  #run all MCMC chains in parallel
dat.res<- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins, alpha = alpha)
# Determine MAP iteration for selecting breakpoints and store breakpoints
MAP.est<- get_MAP(dat = dat.res$LML, nburn = ngibbs/2)
brkpts<- get_breakpts(dat = dat.res$brkpts, MAP.est = MAP.est)
#run function
plot_breakpoints(data = tracks.list, as_date = FALSE, var_names = c("step","angle"), var_labels = c("Step Length (m)", "Turning Angle (rad)"), brkpts = brkpts)
An internal function for plotting the results of the segmentation model.
plot_breakpoints_behav(data, as_date, var_names, var_labels, brkpts)
data |
A data frame for a single animal ID that contains columns for the
ID, date or time variable, and each of the movement variables that were
analyzed by |
as_date |
logical. If |
var_names |
A vector of the column names for the movement variables to be plotted over time. |
var_labels |
A vector of the labels to be plotted on the y-axis for each
movement variable. Set to |
brkpts |
A data frame that contains the breakpoints associated with each
animal ID. This data frame is returned by |
A line plot for each movement variable showing how the estimated breakpoints relate to the underlying data. Depending on the user input for var_names, this may either be on the scale of the original continuous data or the discretized data.
Calculates step lengths, turning angles, and net-squared displacement based on coordinates for each animal ID and calculates time steps based on the date-time. Provides a self-contained method to calculate these variables without needing to rely on other R packages (e.g., adehabitatLT). However, functions from other packages can also be used to perform this step in data preparation.
prep_data(dat, coord.names, id)
dat |
A data frame that contains a column for animal IDs, the columns
associated with the x and y coordinates, and a column for the date. For
easier interpretation of the model results, it is recommended that
coordinates be stored in a UTM projection (meters) as opposed to
unprojected in decimal degrees (map units). Date-time should be of class
|
coord.names |
character. A vector of the column names under which the coordinates are stored. The name for the x coordinate should be listed first and the name for the y coordinate second. |
id |
character. The name of the column storing the animal IDs. |
A data frame where all original data are returned and new columns are added for step length (step), turning angle (angle), net-squared displacement (NSD), and time step (dt). Names for coordinates are changed to x and y. Units for step and NSD depend on the projection of the coordinates; angle is returned in radians and dt in seconds.
#load data
data(tracks)
#subset only first track
tracks<- tracks[tracks$id == "id1",]
#calculate step lengths and turning angles
tracks<- prep_data(dat = tracks, coord.names = c("x","y"), id = "id")
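To make the geometry behind these variables concrete, the hypothetical base-R helper movement_metrics() below derives step length, relative turning angle, net-squared displacement, and time step for one track with at least two locations. It assumes projected x/y coordinates and POSIXct dates in time order, assigns each step and time step to the row where it starts (so the last row is NA), assigns each turning angle to the location between the two steps, and wraps angles to [-pi, pi). These row conventions are assumptions and may differ from prep_data.

```r
# Hypothetical helper (not bayesmove::prep_data) for a single animal ID.
# x, y: projected coordinates; date: POSIXct date-times in time order
movement_metrics <- function(x, y, date) {
  dx <- diff(x)
  dy <- diff(y)
  # Euclidean distance between successive locations (NA for the last row)
  step <- c(sqrt(dx^2 + dy^2), NA)
  # absolute heading of each step, in radians
  heading <- atan2(dy, dx)
  # change in heading between consecutive steps, wrapped to [-pi, pi)
  turn <- (diff(heading) + pi) %% (2 * pi) - pi
  # first and last locations have no turning angle
  angle <- c(NA, turn, NA)
  # squared distance from the first location
  NSD <- (x - x[1])^2 + (y - y[1])^2
  # time step to the next observation, in seconds
  dt <- c(as.numeric(diff(date), units = "secs"), NA)
  data.frame(step, angle, NSD, dt)
}
```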
An internal function that calculates step lengths, turning angles, and time steps for a given animal ID.
prep_data_internal(dat, coord.names)
dat |
A data frame that contains the columns associated with the x and y
coordinates as well as the date-time. For easier interpretation of the
model results, it is recommended that coordinates be stored after UTM
projection (meters) as opposed to unprojected in decimal degrees (map
units). Date-time should be of class |
coord.names |
character. A vector of the column names under which the coordinates are stored. The name for the x coordinate should be listed first and the name for the y coordinate second. |
A data frame where all original data are returned and new columns are added for step length (step), turning angle (angle), net-squared displacement (NSD), and time step (dt).
Internal function that samples z's from a categorical distribution
rmultinom1(prob, randu)
prob |
A numeric matrix. |
randu |
A numeric vector. |
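Because randu supplies pre-generated uniform draws, one natural reading is inverse-CDF sampling: each uniform value is compared with the cumulative probabilities of its row of prob. The hypothetical base-R helper sample_categorical() below sketches that idea, assuming each row of prob holds non-negative weights; it is not the C++ rmultinom1.

```r
# Hypothetical helper (not the C++ rmultinom1): inverse-CDF draw of one
# category per row of prob, using one pre-generated uniform per row.
sample_categorical <- function(prob, randu) {
  K <- ncol(prob)
  vapply(seq_len(nrow(prob)), function(i) {
    # normalize in case a row does not sum exactly to 1
    cp <- cumsum(prob[i, ]) / sum(prob[i, ])
    # number of cumulative probabilities <= u, plus one, is the category;
    # min() guards against floating-point round-off at the upper end
    min(findInterval(randu[i], cp) + 1L, K)
  }, integer(1))
}
```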
Internal function that samples z's from a multinomial distribution
rmultinom2(prob, n, randu, nmaxclust)
prob |
A numeric vector. |
n |
An integer. |
randu |
A numeric vector. |
nmaxclust |
An integer. |
Rounds sampling intervals that are close to, but not exactly equal to, the time interval of interest (e.g., 240 s instead of 300 s). This can be performed on multiple time intervals, but only using a single tolerance value. This function prepares the data to be analyzed by segment_behavior, which requires that all time intervals exactly match the primary time interval when analyzing step lengths and turning angles. Columns storing the time intervals and dates must be labeled dt and date, respectively, where dates are of class POSIXct.
round_track_time(dat, id, int, tol, time.zone = "UTC", units)
dat |
A data frame that contains the sampling interval of the observations. |
id |
character. The name of the column storing the animal IDs. |
int |
numeric. A vector of the time interval(s) on which to perform rounding. |
tol |
numeric. A single tolerance value on which to round any |
time.zone |
character. Specify the time zone for which the date-times
were recorded. Set to UTC by default. Refer to |
units |
character. The units of the selected time interval |
A data frame where dt and date are both adjusted based upon the rounding of time intervals according to the specified tolerance.
#load data
data(tracks)
#subset only first track
tracks<- tracks[tracks$id == "id1",]
#calculate step lengths and turning angles
tracks<- prep_data(dat = tracks, coord.names = c("x","y"), id = "id")
#round times to nearest interval of interest (e.g. 3600 s or 1 hr)
tracks<- round_track_time(dat = tracks, id = "id", int = 3600, tol = 180, time.zone = "UTC", units = "secs")
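The core of the rounding rule is that any time step within tol of a target interval is replaced by that interval. The hypothetical helper round_intervals() below sketches only this part for a numeric vector of time steps; unlike round_track_time, it does not shift the corresponding date values, and it assumes dt, int, and tol share the same units.

```r
# Hypothetical helper (not bayesmove::round_track_time): snap time steps
# that lie within tol of any target interval onto that interval.
round_intervals <- function(dt, int, tol) {
  for (target in int) {
    # NA time steps are left untouched
    close <- !is.na(dt) & abs(dt - target) <= tol
    dt[close] <- target
  }
  dt
}
```

For example, with int = c(3600, 7200) and tol = 180, a time step of 3500 s becomes 3600 s, while 3790 s is left unchanged because it is 190 s away from the nearest target.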
This is the RJMCMC algorithm that drives the proposal and selection of breakpoints for the data based on the difference in log marginal likelihood. This function is called within behav_gibbs_sampler.
samp_move(breakpt, max.time, dat, alpha, nbins, ndata.types)
breakpt |
numeric. A vector of breakpoints. |
max.time |
numeric. The number of the last observation of |
dat |
A matrix that only contains columns storing discretized data for
each of the movement variables used within |
alpha |
numeric. A single value used to specify the hyperparameter for
the prior distribution. A standard value for |
nbins |
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within
|
ndata.types |
numeric. The length of |
The breakpoints and log marginal likelihood are retained from the selected model from the Gibbs sampler and returned as elements of a list. This is performed for each iteration of the MCMC algorithm.
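The selection step can be pictured as a Metropolis-style accept/reject decision on the difference in log marginal likelihood between the proposed and current breakpoints. The hypothetical helper accept_proposal() below is a simplified sketch that assumes a symmetric proposal; it omits the proposal and prior ratios that a full reversible-jump birth or death move would include, so it is not the exact acceptance rule used by samp_move.

```r
# Hypothetical helper (not bayesmove's samp_move): Metropolis-style decision
# on whether to keep a proposed set of breakpoints.
# Assumption: symmetric proposal, so the acceptance probability is
# min(1, exp(lml_proposed - lml_current)) with no additional ratios.
accept_proposal <- function(lml_current, lml_proposed, u = runif(1)) {
  # comparing on the log scale avoids overflow in exp()
  log(u) < lml_proposed - lml_current
}
```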
Internal function to sample the gamma hyperparameter
sample.gamma.mixmod(v, ngroup, gamma.possib)
v |
numeric. A vector of proportions for each of the possible clusters. |
ngroup |
numeric. The total number of possible clusters. |
gamma.possib |
numeric. A vector of possible values that gamma can take ranging between 0.1 and 1. |
A single numeric value for gamma that falls within gamma.possib for calculation of the log-likelihood.
Estimates values of phi matrix for use in characterizing distributions of the movement variables. Not for calling directly by users.
sample.phi(z.agg, alpha, nmaxclust, nbins, ndata.types)
z.agg |
A list of latent cluster estimates provided by
|
alpha |
numeric. A hyperparameter for the Dirichlet distribution. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
nbins |
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within
|
ndata.types |
numeric. The number of data streams being analyzed. |
A matrix of proportion estimates that characterize distributions (bins) for each movement variable and possible behavioral state.
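In a conjugate Gibbs step of this kind, phi for each behavioral state and movement variable is usually drawn from a Dirichlet distribution whose parameters are the bin counts assigned to that state plus alpha. The hypothetical helper rdirichlet_rows() below sketches one such draw via normalized Gamma variates, assuming counts is a matrix with one row per state and one column per bin; it is not the package's sample.phi.

```r
# Hypothetical helper (not bayesmove's sample.phi): one Dirichlet draw per
# row of counts, with a symmetric Dirichlet(alpha) prior.
rdirichlet_rows <- function(counts, alpha) {
  shape <- counts + alpha
  # independent Gamma(shape, 1) draws arranged like counts
  g <- matrix(rgamma(length(shape), shape = shape), nrow = nrow(shape))
  # dividing each row by its sum yields a Dirichlet(shape) draw
  g / rowSums(g)
}
```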
Estimates values of phi matrix for use in characterizing distributions of the movement variables. Not for calling directly by users.
sample.phi.mixmod(alpha, nmaxclust, nbins, ndata.types, nmat)
alpha |
numeric. A hyperparameter for the Dirichlet distribution. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
nbins |
numeric. A vector of the number of bins used to discretize each
data stream. These must be in the same order as the columns within
|
ndata.types |
numeric. The number of data streams being analyzed. |
nmat |
A list based on |
A list of proportion estimates that characterize distributions (bins) for each data stream and possible behavioral state.
This function samples the latent v parameter within the Gibbs sampler. Calls on the CumSumInv function written in C++. Not for calling directly by users.
sample.v(z.agg, gamma1, ntsegm, ndata.types, nmaxclust)
z.agg |
A list of latent cluster estimates provided by
|
gamma1 |
numeric. Hyperparameter for the truncated stick-breaking prior. |
ntsegm |
numeric. The total number of time segments from all animal IDs. |
ndata.types |
numeric. The number of data streams being analyzed. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
A matrix with estimates for v for each of the number of time segments and possible states.
This function samples the latent v parameter within the Gibbs sampler. Not for calling directly by users.
sample.v.mixmod(z, gamma1, nmaxclust)
z |
A vector of latent cluster estimates provided by
|
gamma1 |
numeric. Hyperparameter for the truncated stick-breaking prior. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
A list with estimates for v and theta for each of the possible states.
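Under a truncated stick-breaking prior, a standard conjugate update draws v_k from Beta(1 + n_k, gamma1 + n_{>k}), where n_k counts observations currently assigned to cluster k and n_{>k} counts those assigned to later clusters, with the last v fixed at 1. The hypothetical helper sample_v_stick() below sketches this standard update; it is not the package's sample.v.mixmod, which also returns theta.

```r
# Hypothetical helper (not bayesmove's sample.v.mixmod).
# z: integer cluster labels in 1..nmaxclust; gamma1: stick-breaking hyperparameter
sample_v_stick <- function(z, gamma1, nmaxclust) {
  # number of observations in each cluster
  n_k <- tabulate(z, nbins = nmaxclust)
  # number of observations in clusters after k
  n_greater <- rev(cumsum(rev(n_k))) - n_k
  v <- rbeta(nmaxclust, 1 + n_k, gamma1 + n_greater)
  # truncation: the last cluster takes whatever stick remains
  v[nmaxclust] <- 1
  v
}
```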
This function samples the latent z parameter within the Gibbs sampler. Calls on the SampleZAgg function written in C++. Not for calling directly by users.
sample.z(ntsegm, nbins, y, nmaxclust, phi, ltheta, zeroes, ndata.types)
ntsegm |
numeric. The total number of time segments from all animal IDs. |
nbins |
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within
|
y |
A list where each element stores separate aggregated count data per bin per time segment for each movement variable being analyzed. These are stored as matrices. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
phi |
A list where each element stores separate proportions per bin per time segment for each movement variable. |
ltheta |
A matrix storing the log-transformed values from the
|
zeroes |
A list of three-dimensional arrays (ntsegm, nbins[i], nmaxclust) that contain only zeros. |
ndata.types |
numeric. The number of data streams being analyzed. |
A list with estimates for z where the number of elements is equal to the number of movement variables.
This function samples the latent z parameter within the Gibbs sampler. Calls on the rmultinom1 function written in C++. Not for calling directly by users.
sample.z.mixmod(nobs, nmaxclust, dat, ltheta, lphi, ndata.types)
nobs |
numeric. The total number of rows in the dataset. |
nmaxclust |
numeric. A single number indicating the maximum number of clusters to test. |
dat |
A data frame containing only columns of the discretized data streams for all observations. |
ltheta |
numeric. A vector of log-transformed estimates for parameter theta. |
lphi |
A list containing log-transformed estimates for each data stream of the phi parameter. |
ndata.types |
numeric. The number of data streams being analyzed. |
A vector with estimates for z for each observation within dat.
Internal function that samples z1 aggregate
SampleZAgg(ntsegm, b1, y1, nmaxclust, lphi1, ltheta, zeroes)
ntsegm |
An integer. |
b1 |
An integer. |
y1 |
An integer matrix. |
nmaxclust |
An integer. |
lphi1 |
A numeric matrix. |
ltheta |
A numeric matrix. |
zeroes |
A numeric vector. |
This function performs the reversible-jump MCMC algorithm using a Gibbs sampler, which estimates the breakpoints of the movement variables for each of the animal IDs. This is the first stage of the two-stage Bayesian model that estimates proportions of behavioral states by first segmenting individual tracks into relatively homogeneous segments of movement.
segment_behavior( data, ngibbs, nbins, alpha, breakpt = purrr::map(names(data), ~NULL) )
data |
A list where each element stores the data for a separate animal ID. List elements are data frames that only contain columns for the animal ID and for each of the discretized movement variables. |
ngibbs |
numeric. The total number of iterations of the MCMC chain. |
nbins |
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within
|
alpha |
numeric. A single value used to specify the hyperparameter for
the prior distribution. A standard value for |
breakpt |
A list where each element stores a vector of breakpoints if
pre-specifying where they may occur for each animal ID. By default this is
set to |
This model is run in parallel using the future package. To run the model in parallel, future::plan() must be called with future::multisession as the argument (for most operating systems) before running segment_behavior. Otherwise, the model runs sequentially by default.
A list of model results is returned where elements include the breakpoints, number of breakpoints, and log marginal likelihood at each iteration of the MCMC chain for all animal IDs. The time it took the model to finish running for each animal ID is also stored and returned.
#load data
data(tracks.list)
#subset only first track
tracks.list<- tracks.list[1]
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2<- purrr::map(tracks.list, subset, select = c(id, SL, TA))
set.seed(1)
# Define model params
alpha<- 1
ngibbs<- 1000
nbins<- c(5,8)
future::plan(future::multisession, workers = 3)  #run all MCMC chains in parallel
dat.res<- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins, alpha = alpha)
future::plan(future::sequential)  #return to single core
This Shiny application allows for the exploration of animal movement patterns. Options are available to interactively filter the plotted tracks by a selected time period of a given variable, which is then displayed on an interactive map. Additionally, a data table is shown with options to filter and export this table once satisfied.
shiny_tracks(data, epsg)
data |
A data frame that must contain columns labeled |
epsg |
numeric. The coordinate reference system (CRS) as an EPSG code or a PROJ string. |
Currently, the time series plot shown for the exploration of individual tracks cannot display variables of class character or factor. Therefore, these should be changed to numeric values if they are to be plotted.
If the data are stored as longitude and latitude (i.e., WGS84), the EPSG code is 4326. All other codes will need to be looked up if they are not already known.
## Not run:
#load data
data(tracks)
#run Shiny app
shiny_tracks(data = tracks, epsg = 32617)
## End(Not run)
This function helps store z from all iterations after burn-in.
StoreZ(z, store_z, nobs)
z |
An integer vector. |
store_z |
An integer matrix. |
nobs |
An integer. |
Prepares the data that has already been segmented for clustering by Latent Dirichlet Allocation. This function summarizes the counts observed per movement variable bin within each track segment per animal ID.
summarize_tsegs(dat, nbins)
dat |
A data frame of only the animal ID, track segment number,
and the discretized data for each movement variable. Animal ID and time
segment must be the first two columns of this data frame. This should be a
simplified form of the output from |
nbins |
numeric. A vector of the number of bins used to discretize each
movement variable. These must be in the same order as the columns within
|
A new data frame that contains the animal ID, track segment number, and the counts per bin for each movement variable. The names for each of these bins are labeled according to the order in which the variables were provided to summarize_tsegs.
#load data
data(tracks.seg)
#select only id, tseg, SL, and TA columns
tracks.seg2<- tracks.seg[,c("id","tseg","SL","TA")]
#run function
obs<- summarize_tsegs(dat = tracks.seg2, nbins = c(5,8))
Internal function that summarizes bin distributions of track segments
summarize1(VecVals, Breakpts, nobs, nbins, nbreak)
VecVals |
A vector of bin values. |
Breakpts |
A vector of breakpoints. |
nobs |
The number of observations. |
nbins |
The number of bins for a given data stream. |
nbreak |
The number of estimated breakpoints. |
Internal function that generates nmat matrix to help with multinomial draws
SummarizeDat(z, dat, ncateg, nbehav, nobs)
z |
An integer vector. |
dat |
An integer vector. |
ncateg |
An integer. |
nbehav |
An integer. |
nobs |
An integer. |
Visualize trace-plots of the number of breakpoints estimated by the model as well as the log marginal likelihood (LML) for each animal ID.
traceplot(data, type)
data |
A list of model results that is returned as output from |
type |
character. The type of data that are being plotted from the Bayesian segmentation model results. Takes either 'nbrks' for the number of breakpoints or 'LML' for the log marginal likelihood. |
Trace-plots for the number of breakpoints or the log marginal likelihood are displayed for each of the animal IDs that were analyzed by the segmentation model.
#load data
data(tracks.list)
#only retain id and discretized step length (SL) and turning angle (TA) columns
tracks.list2<- purrr::map(tracks.list, subset, select = c(id, SL, TA))
set.seed(1)
# Define model params
alpha<- 1
ngibbs<- 1000
nbins<- c(5,8)
future::plan(future::multisession, workers = 3)  #run all MCMC chains in parallel
dat.res<- segment_behavior(data = tracks.list2, ngibbs = ngibbs, nbins = nbins, alpha = alpha)
future::plan(future::sequential)  #return to single core
#run function
traceplot(data = dat.res, type = "nbrks")
traceplot(data = dat.res, type = "LML")
A dataset containing the IDs, dates, and x and y coordinates for three tracks of 5001 observations each (15,003 in total).
tracks
A data frame with 15003 rows and 4 variables:
ID for each simulated track
date, recorded as datetime
x coordinate of tracks
y coordinate of tracks
A dataset containing the prepared data after discretizing step lengths and turning angles, as well as filtering observations at the primary time step.
tracks.list
A list with three elements, each containing a data frame with ~4700 rows and 11 variables:
ID for each simulated track
date, recorded as datetime
x coordinate of tracks
y coordinate of tracks
the step length calculated as the distance between successive locations measured in units
the relative turning angle measured in radians
the time step or sampling interval between datetimes of successive observations
the ordered number of observations per ID before filtering for the primary time step
the ordered number of observations per ID after filtering for the primary time step
discretized step lengths, separated into five bins
discretized turning angles, separated into eight bins
A dataset containing the filtered track data with time segments assigned to all observations on an individual basis.
tracks.seg
A data frame with 14096 rows and 12 variables:
ID for each simulated track
date, recorded as datetime
x coordinate of tracks
y coordinate of tracks
the step length calculated as the distance between successive locations measured in units
the relative turning angle measured in radians
the time step or sampling interval between datetimes of successive observations
the ordered number of observations per ID before filtering for the primary time step
the ordered number of observations per ID after filtering for the primary time step
discretized step lengths, separated into five bins
discretized turning angles, separated into eight bins
time segment assigned to a given set of observations per ID