Working with spatial data

library(ARUtools)
library(sf)
library(dplyr)

Here we’ll cover some workflows that use spatial data. For a general workflow, it’s best to read “Getting started with ARUtools” first.

We’ll start with our metadata data frame.

m <- clean_metadata(project_files = example_files)
#> Extracting ARU info...
#> Extracting Dates and Times...
m
#> # A tibble: 42 × 8
#>   file_name   type  path  aru_type aru_id site_id date_time           date      
#>   <chr>       <chr> <chr> <chr>    <chr>  <chr>   <dttm>              <date>    
#> 1 P01_1_2020… wav   a_BA… BarLT    BARLT… P01_1   2020-05-02 05:00:00 2020-05-02
#> 2 P01_1_2020… wav   a_BA… BarLT    BARLT… P01_1   2020-05-03 05:20:00 2020-05-03
#> 3 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-04 05:25:00 2020-05-04
#> 4 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-05 07:30:00 2020-05-05
#> # ℹ 38 more rows

This data frame isn’t spatial yet, because it doesn’t record where the sites are located. Our next step is to get the site coordinates.

Let’s assume we have a spatial data frame containing our sites and where they are located.

s <- st_as_sf(example_sites, coords = c("lon", "lat"), crs = 4326)
s
#> Simple feature collection with 10 features and 6 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -91.38 ymin: 45 xmax: -84.45 ymax: 52.68
#> Geodetic CRS:  WGS 84
#>    Sites Date_set_out Date_removed        ARU Plots Subplot
#> 1  P01_1   2020-05-01   2020-05-03 BARLT10962 Plot1       a
#> 2  P02_1   2020-05-03   2020-05-05   S4A01234 Plot1       a
#> 3  P03_1   2020-05-05   2020-05-06 BARLT10962 Plot2       a
#> 4  P04_1   2020-05-05   2020-05-07 BARLT11111 Plot2       a
#> 5  P05_1   2020-05-06   2020-05-07 BARLT10962 Plot3       b
#> 6  P06_1   2020-05-08   2020-05-09 BARLT10962 Plot1       a
#> 7  P07_1   2020-05-08   2020-05-10   S4A01234 Plot1       a
#> 8  P08_1   2020-05-10   2020-05-11 BARLT10962 Plot2       a
#> 9  P09_1   2020-05-10   2020-05-11   S4A02222 Plot2       a
#> 10 P10_1   2020-05-10   2020-05-11   S4A03333 Plot3       b
#>                 geometry
#> 1   POINT (-85.03 50.01)
#> 2   POINT (-87.45 52.68)
#> 3   POINT (-90.38 48.99)
#> 4      POINT (-85.53 45)
#> 5   POINT (-88.45 51.05)
#> 6      POINT (-90.08 52)
#> 7   POINT (-86.03 50.45)
#> 8  POINT (-84.45 48.999)
#> 9      POINT (-91.38 45)
#> 10     POINT (-90 50.01)
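
If you are unsure whether an object is already spatial, or which coordinate reference system it uses, a few sf helpers can check. The following is a minimal sketch on a made-up two-row table: `toy_sites` and its values are hypothetical, and only the column names mirror `example_sites`.

```r
library(sf)

# Hypothetical site table (values are illustrative only)
toy_sites <- data.frame(
  Sites = c("A", "B"),
  lon = c(-85.03, -87.45),
  lat = c(50.01, 52.68)
)

# `coords` takes the x column (longitude) first, then y (latitude)
toy_sf <- st_as_sf(toy_sites, coords = c("lon", "lat"), crs = 4326)

inherits(toy_sf, "sf")   # is this a spatial data frame?
st_crs(toy_sf)$epsg      # EPSG code of the coordinate reference system
st_coordinates(toy_sf)   # coordinate matrix with columns X and Y
```

Swapping the order of `lon` and `lat` in `coords` is an easy mistake to make, so it can be worth comparing the bounding box against what you expect.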

As in a non-spatial workflow, we’ll clean up this site index so we can add the sites to our metadata.

sites <- clean_site_index(s,
  name_aru_id = "ARU",
  name_site_id = "Sites",
  name_date_time = c("Date_set_out", "Date_removed")
)
#> There are overlapping date ranges
#> • Shifting start/end times to 'noon'
#> • Skip this with `resolve_overlaps = FALSE`
sites
#> Simple feature collection with 10 features and 6 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -91.38 ymin: 45 xmax: -84.45 ymax: 52.68
#> Geodetic CRS:  WGS 84
#> # A tibble: 10 × 7
#>   site_id aru_id   date_time_start     date_time_end       date_start date_end  
#> * <chr>   <chr>    <dttm>              <dttm>              <date>     <date>    
#> 1 P01_1   BARLT10… 2020-05-01 12:00:00 2020-05-03 12:00:00 2020-05-01 2020-05-03
#> 2 P02_1   S4A01234 2020-05-03 12:00:00 2020-05-05 12:00:00 2020-05-03 2020-05-05
#> 3 P03_1   BARLT10… 2020-05-05 12:00:00 2020-05-06 12:00:00 2020-05-05 2020-05-06
#> 4 P04_1   BARLT11… 2020-05-05 12:00:00 2020-05-07 12:00:00 2020-05-05 2020-05-07
#> # ℹ 6 more rows
#> # ℹ 1 more variable: geometry <POINT [°]>

Note that we still have a spatial data set.

Now let’s add this site-related information to our metadata.

m <- add_sites(m, sites)
#> Joining by columns `date_time_start` and `date_time_end`
m
#> Simple feature collection with 42 features and 8 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -91.38 ymin: 45 xmax: -84.45 ymax: 52.68
#> Geodetic CRS:  WGS 84
#> # A tibble: 42 × 9
#>   file_name   type  path  aru_type aru_id site_id date_time           date      
#> * <chr>       <chr> <chr> <chr>    <chr>  <chr>   <dttm>              <date>    
#> 1 P01_1_2020… wav   a_BA… BarLT    BARLT… P01_1   2020-05-02 05:00:00 2020-05-02
#> 2 P01_1_2020… wav   a_BA… BarLT    BARLT… P01_1   2020-05-03 05:20:00 2020-05-03
#> 3 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-04 05:25:00 2020-05-04
#> 4 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-05 07:30:00 2020-05-05
#> # ℹ 38 more rows
#> # ℹ 1 more variable: geometry <POINT [°]>

Again, our output is a spatial data set.

Let’s continue by calculating the time to sunrise and sunset for each recording (the t2sr and t2ss columns).

m <- calc_sun(m)
m
#> Simple feature collection with 42 features and 11 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -91.38 ymin: 45 xmax: -84.45 ymax: 52.68
#> Geodetic CRS:  WGS 84
#> # A tibble: 42 × 12
#>   file_name   type  path  aru_type aru_id site_id date_time           date      
#>   <chr>       <chr> <chr> <chr>    <chr>  <chr>   <dttm>              <date>    
#> 1 P01_1_2020… wav   a_BA… BarLT    BARLT… P01_1   2020-05-02 05:00:00 2020-05-02
#> 2 P01_1_2020… wav   a_BA… BarLT    BARLT… P01_1   2020-05-03 05:20:00 2020-05-03
#> 3 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-04 05:25:00 2020-05-04
#> 4 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-05 07:30:00 2020-05-05
#> # ℹ 38 more rows
#> # ℹ 4 more variables: tz <chr>, t2sr <dbl>, t2ss <dbl>, geometry <POINT [°]>

All done! And we’ve retained a spatial data set the entire way.
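
If a later step needs a plain data frame instead, the geometry can be converted back into coordinate columns. This is a minimal sketch using sf only, on a hypothetical two-row object (`toy_sf` is made up). The `longitude` and `latitude` column names are chosen here for illustration.

```r
library(sf)

# Hypothetical spatial data frame (values are illustrative only)
toy_sf <- st_as_sf(
  data.frame(site_id = c("A", "B"), lon = c(-85.03, -87.45), lat = c(50.01, 52.68)),
  coords = c("lon", "lat"), crs = 4326
)

# Pull the coordinates out before dropping the geometry column
coords <- st_coordinates(toy_sf)
flat <- st_drop_geometry(toy_sf)
flat$longitude <- coords[, "X"]
flat$latitude <- coords[, "Y"]

flat  # a regular data frame with site_id, longitude and latitude
```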

Problems

However, sometimes spatial data sets might be trickier to use.

For example, sf spatial data sets cannot have missing coordinates. If you use add_sites() with an incomplete list of sites, some files will have no coordinates, so you’ll get a warning and a non-spatial data frame back.

m <- clean_metadata(project_files = example_files)
#> Extracting ARU info...
#> Extracting Dates and Times...

sites <- st_as_sf(example_sites, coords = c("lon", "lat"), crs = 4326) |>
  clean_site_index(
    name_aru_id = "ARU",
    name_site_id = "Sites",
    name_date_time = c("Date_set_out", "Date_removed")
  )
#> There are overlapping date ranges
#> • Shifting start/end times to 'noon'
#> • Skip this with `resolve_overlaps = FALSE`

sites <- sites[-1, ] # Omit the first site

m <- add_sites(m, sites)
#> Joining by columns `date_time_start` and `date_time_end`
#> Identified possible problems with metadata extraction:
#> ✖ Not all files were matched to a site reference (6/42)
#> • Consider adjusting the `by` argument
#> Warning in add_sites(m, sites): Cannot have missing coordinates in spatial data frames
#> • Returning non-spatial data frame
m
#> # A tibble: 42 × 10
#>   file_name   type  path  aru_type aru_id site_id date_time           date      
#>   <chr>       <chr> <chr> <chr>    <chr>  <chr>   <dttm>              <date>    
#> 1 P01_1_2020… wav   a_BA… BarLT    BARLT… P01_1   2020-05-02 05:00:00 2020-05-02
#> 2 P01_1_2020… wav   a_BA… BarLT    BARLT… P01_1   2020-05-03 05:20:00 2020-05-03
#> 3 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-04 05:25:00 2020-05-04
#> 4 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-05 07:30:00 2020-05-05
#> # ℹ 38 more rows
#> # ℹ 2 more variables: longitude <dbl>, latitude <dbl>
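
The underlying sf constraint can be seen in isolation. The sketch below uses a made-up table (`toy` is hypothetical) to show that `st_as_sf()` stops with an error when a coordinate is missing under its default `na.fail = TRUE`, and that the conversion succeeds once incomplete rows are removed.

```r
library(sf)

# Hypothetical table where site "B" has no coordinates
toy <- data.frame(
  site_id = c("A", "B"),
  lon = c(-85.03, NA),
  lat = c(50.01, NA)
)

# Capture the error message instead of stopping the script
result <- tryCatch(
  st_as_sf(toy, coords = c("lon", "lat"), crs = 4326),
  error = function(e) conditionMessage(e)
)
result

# Keeping only rows with complete coordinates lets the conversion succeed
complete <- toy[!is.na(toy$lon) & !is.na(toy$lat), ]
complete_sf <- st_as_sf(complete, coords = c("lon", "lat"), crs = 4326)
nrow(complete_sf)
```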

To resolve this, either add the missing site information or omit the unmatched files before joining.

m <- clean_metadata(project_files = example_files) |>
  filter(date > "2020-05-03") # Filter out recordings that don't match a site
#> Extracting ARU info...
#> Extracting Dates and Times...

m <- add_sites(m, sites)
#> Joining by columns `date_time_start` and `date_time_end`
m
#> Simple feature collection with 36 features and 8 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -91.38 ymin: 45 xmax: -84.45 ymax: 52.68
#> Geodetic CRS:  WGS 84
#> # A tibble: 36 × 9
#>   file_name   type  path  aru_type aru_id site_id date_time           date      
#> * <chr>       <chr> <chr> <chr>    <chr>  <chr>   <dttm>              <date>    
#> 1 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-04 05:25:00 2020-05-04
#> 2 P02_1_2020… wav   a_S4… SongMet… S4A01… P02_1   2020-05-05 07:30:00 2020-05-05
#> 3 P03_1_2020… wav   a_BA… BarLT    BARLT… P03_1   2020-05-06 10:00:00 2020-05-06
#> 4 P04_1_2020… wav   a_BA… BarLT    BARLT… P04_1   2020-05-06 05:00:00 2020-05-06
#> # ℹ 32 more rows
#> # ℹ 1 more variable: geometry <POINT [°]>

Fixed! The result is a spatial data set again.
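
The date cutoff above works because we know which site was removed. When that isn’t known in advance, one option is to look for files that fall outside every deployment window before joining. The sketch below is an illustration with dplyr (1.1.0 or later, for `join_by()`), not a description of how add_sites() matches files internally. The `files` and `deployments` tables are made up, and matching on `aru_id` plus a date range is an assumption.

```r
library(dplyr)

# Hypothetical recordings (stand-in for the cleaned metadata)
files <- tibble(
  file_name = c("f1.wav", "f2.wav", "f3.wav"),
  aru_id = c("ARU1", "ARU1", "ARU2"),
  date_time = as.POSIXct(
    c("2020-05-02 05:00:00", "2020-05-06 05:00:00", "2020-05-04 05:25:00"),
    tz = "UTC"
  )
)

# Hypothetical deployments (stand-in for the cleaned site index)
deployments <- tibble(
  aru_id = c("ARU1", "ARU2"),
  site_id = c("S1", "S2"),
  date_time_start = as.POSIXct(c("2020-05-05 12:00:00", "2020-05-03 12:00:00"), tz = "UTC"),
  date_time_end = as.POSIXct(c("2020-05-06 12:00:00", "2020-05-05 12:00:00"), tz = "UTC")
)

# Match on ARU and on the recording time falling inside the deployment window
by_deployment <- join_by(aru_id, between(date_time, date_time_start, date_time_end))

unmatched <- anti_join(files, deployments, by = by_deployment) # files with no site
matched <- semi_join(files, deployments, by = by_deployment)   # files safe to join

unmatched
```

Inspecting `unmatched` first makes it easier to decide whether to supply the missing site information or to drop those files.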