Northern Cardinal image credit: Public Domain Files

Hamilton Christmas Bird Count: Part 2a

Further cleaning of the Hamilton Christmas Bird Count data

Note

This is the second part of four for this dataset.

  • Part 1 contains data downloading and cleaning

  • Part 2b contains visualizations

  • Part 3 contains my Shiny app!

  • Part 4 contains a gganimated plot.

Introduction

While I was visualizing the data, I realized I still needed to do a bit more cleaning. So this is a short post outlining my steps to do so.

To start, we’ll load all of the packages we’ll be using and read in the data:

library(dplyr)
library(readr)
library(stringr)
library(here)
library(knitr)
library(kableExtra)
hamilton_cbc <- read_rds(here("content",
                              "post",
                              "hamilton_cbc_part_3",
                              "hamilton_cbc_shiny",
                              "hamilton_cbc_output.rds"))

Final cleaning touches

In particular, I want to:

  1. Remove hybrid birds

  2. Consolidate the names of some species that had variations in them

Let’s see how many hybrid species we have and remove them:

hamilton_cbc %>%
  filter(str_detect(species, "hybrid")) %>%
  distinct(species) %>%
  kable() %>%
  kable_styling(full_width = FALSE, position = "left")
species
Snow x Canada Goose (hybrid)
American Black Duck x Mallard (hybrid)
Mallard x Northern Pintail (hybrid)
Herring x Glaucous Gull (hybrid)
Herring x Great Black-backed Gull (hybrid)
hamilton_cbc <- hamilton_cbc %>%
  filter(!str_detect(species, "hybrid"))
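To see what this filter does on a small scale, here is a minimal sketch on a made-up tibble. The object names toy_counts and toy_no_hybrids and the counts are hypothetical and not part of the real dataset.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data; the real hamilton_cbc has more columns
toy_counts <- tibble(
  species = c("Mallard",
              "Mallard x Northern Pintail (hybrid)",
              "Herring Gull"),
  how_many_counted = c(120, 1, 45)
)

# Keep only the rows whose species name does not contain "hybrid"
toy_no_hybrids <- toy_counts %>%
  filter(!str_detect(species, "hybrid"))

toy_no_hybrids$species
```

Note that str_detect() is case-sensitive by default, so a name containing "Hybrid" would slip through. If that were a concern, wrapping the pattern as regex("hybrid", ignore_case = TRUE) would be one way to catch it.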

Now, onto cleaning the trickier stuff. Sometimes, species have sub-species names or groups that have different total counts. For example, the Juncos (where total_counted is the sum of the counts over all years for that species):

hamilton_cbc %>%
  filter(str_detect(species, "Junco")) %>%
  group_by(species, species_latin) %>%
  summarise(total_counted = sum(how_many_counted)) %>%
  ungroup() %>%
  kable() %>%
  kable_styling(full_width = FALSE, position = "left")
species species_latin total_counted
Dark-eyed Junco Junco hyemalis 14426
Dark-eyed Junco (Oregon) Junco hyemalis [oreganus Group] 39
Dark-eyed Junco (Slate-colored) Junco hyemalis hyemalis/carolinensis 46764
Dark-eyed Junco (White-winged) Junco hyemalis aikeni 1

I just want there to be one Dark-eyed Junco species in this dataset, so I am going to consolidate these four entries (the species itself plus its three sub-species groups) into one species. (Even though people get way more excited about seeing the Oregon sub-species in Hamilton than the Slate-colored 😄.)

The first step is to only keep the first two words of the species_latin variable:

hamilton_cbc <- hamilton_cbc %>%
  mutate(species_latin = word(species_latin, start = 1, end = 2))
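As a quick illustration of what word() does here, the sketch below runs it on a few Latin names taken from the tables in this post. The vector names latin_names and short_names are just for this example.

```r
library(stringr)

# Latin names as they appear in the tables in this post
latin_names <- c("Junco hyemalis",
                 "Junco hyemalis hyemalis/carolinensis",
                 "Junco hyemalis aikeni",
                 "Aythya marila/affinis")

# word() splits on single spaces by default and keeps words start..end
short_names <- word(latin_names, start = 1, end = 2)
short_names
```

The three Junco names all collapse to the same genus and species pair, while a slash name like the Scaup's passes through unchanged because the slash is not a space.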

We can also see who else is in this list:

hamilton_cbc %>%
  group_by(species_latin) %>%
  filter(n_distinct(species) > 1) %>%
  group_by(species, species_latin) %>%
  summarise(total_counted = sum(how_many_counted)) %>%
  ungroup() %>%
  kable() %>%
  kable_styling(full_width = FALSE, position = "left")
species species_latin total_counted
American Kestrel Falco sparverius 1520
American Kestrel (Northern) Falco sparverius 4
Brant Branta bernicla 8
Brant (Atlantic) Branta bernicla 1
Common Grackle Quiscalus quiscula 173
Common Grackle (Purple) Quiscalus quiscula 17
Dark-eyed Junco Junco hyemalis 14426
Dark-eyed Junco (Oregon) Junco hyemalis 39
Dark-eyed Junco (Slate-colored) Junco hyemalis 46764
Dark-eyed Junco (White-winged) Junco hyemalis 1
Green-winged Teal Anas crecca 671
Green-winged Teal (American) Anas crecca 469
Horned Lark Eremophila alpestris 1712
Horned Lark (Eastern dark Group) Eremophila alpestris 24
Iceland Gull Larus glaucoides 153
Iceland Gull (kumlieni) Larus glaucoides 9
Northern Flicker Colaptes auratus 99
Northern Flicker (Yellow-shafted) Colaptes auratus 727
Northern Goshawk Accipiter gentilis 51
Northern Goshawk (American) Accipiter gentilis 10
Purple Finch Haemorhous purpureus 1355
Purple Finch (Eastern) Haemorhous purpureus 3
Song Sparrow Melospiza melodia 4352
Song Sparrow (melodia/atlantica) Melospiza melodia 41
Tundra Swan Cygnus columbianus 1032
Tundra Swan (Whistling) Cygnus columbianus 4

The second step is to sum up the counts for each year across all of the sub-species, so that every row for a given species and year carries the same combined count, and then filter to only keep the first instance of each species (which, when arranged alphabetically, is the shortest species name):

hamilton_cbc <- hamilton_cbc %>%
  group_by(year, species_latin) %>%
  mutate(how_many_counted = sum(how_many_counted)) %>%
  arrange(year, species) %>%
  filter(row_number() == 1) %>%
  ungroup()

hamilton_cbc %>%
  filter(str_detect(species, "Junco")) %>%
  group_by(species, species_latin) %>%
  summarise(total_counted = sum(how_many_counted)) %>%
  ungroup() %>%
  kable() %>%
  kable_styling(full_width = FALSE, position = "left")
species species_latin total_counted
Dark-eyed Junco Junco hyemalis 61230
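To make the sum-then-keep-first pattern easier to follow, here is the same pipeline on a tiny made-up example. The tibble toy_juncos and its counts are hypothetical.

```r
library(dplyr)

# Hypothetical counts for two years; the values are made up
toy_juncos <- tibble(
  year = c(2000, 2000, 2001, 2001),
  species = c("Dark-eyed Junco (Oregon)", "Dark-eyed Junco",
              "Dark-eyed Junco", "Dark-eyed Junco (Slate-colored)"),
  species_latin = "Junco hyemalis",
  how_many_counted = c(2, 100, 80, 40)
)

toy_collapsed <- toy_juncos %>%
  group_by(year, species_latin) %>%
  # Every row in a year/Latin-name group now carries the group total
  mutate(how_many_counted = sum(how_many_counted)) %>%
  # The plain species name is a prefix of the others, so it sorts first
  arrange(year, species) %>%
  # Keep one row per year and Latin name
  filter(row_number() == 1) %>%
  ungroup()

toy_collapsed
```

Each year ends up with a single "Dark-eyed Junco" row whose count is the sum over all of that year's entries.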

Perfect! No more sub-species. The last group of species to deal with is species where the name has either a ( or a /:

hamilton_cbc %>%
  group_by(species, species_latin) %>%
  summarise(total_counted = sum(how_many_counted)) %>%
  ungroup() %>%
  filter(str_detect(species, "\\(|/")) %>% # The "|" is an "or" within the regex
  kable() %>%
  kable_styling(full_width = FALSE, position = "left")
species species_latin total_counted
Barn Owl (American) Tyto alba 1
Bullock’s/Baltimore Oriole Icterus bullockii/galbula 1
Great Blue Heron (Blue form) Ardea herodias 362
Greater/Lesser Scaup Aythya marila/affinis 26558
Pacific/Winter Wren Troglodytes pacificus/hiemalis 498
Palm Warbler (Western) Setophaga palmarum 1
Rock Pigeon (Feral Pigeon) Columba livia 60114
Spotted/Eastern Towhee (Rufous-sided Towhee) Pipilo maculatus/erythrophthalmus 28
Western/Eastern Meadowlark Sturnella neglecta/magna 49
Wilson’s/Common Snipe Gallinago delicata/gallinago 13
Yellow-rumped Warbler (Myrtle) Setophaga coronata 65
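The pattern used in that filter can be tried on its own. This is a small sketch with a hand-picked vector of names; the character class "[(/]" is shown only as an equivalent way of writing the same test.

```r
library(stringr)

species <- c("Barn Owl (American)", "Greater/Lesser Scaup", "Song Sparrow")

# "\\(" is a literal opening parenthesis; "|" means "or"; "/" is a plain slash
has_qualifier <- str_detect(species, "\\(|/")

# Same test written as a character class
has_qualifier_alt <- str_detect(species, "[(/]")

has_qualifier
```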

I am going to make some executive decisions about what to do with these species:

  1. Delete species guess: Greater/Lesser Scaup
  2. Assume super-rare species were in fact the more common species:
    • Bullock’s/Baltimore Oriole were Baltimore Orioles
    • Western/Eastern Meadowlark were Eastern Meadowlarks
    • Wilson’s/Common Snipe were Common Snipes
    • Spotted/Eastern Towhee (Rufous-sided Towhee) were Eastern Towhees
    • Pacific/Winter Wren were Winter Wrens
  3. Remove the parenthetical qualifiers from the remaining species for neatness

hamilton_cbc <- hamilton_cbc %>%
  filter(!(species == "Greater/Lesser Scaup")) %>%
  mutate(species = case_when(species == "Bullock's/Baltimore Oriole" ~ "Baltimore Oriole",
                             species == "Western/Eastern Meadowlark" ~ "Eastern Meadowlark",
                             species == "Wilson's/Common Snipe" ~ "Common Snipe",
                             species == "Spotted/Eastern Towhee (Rufous-sided Towhee)" ~ "Eastern Towhee",
                             species == "Pacific/Winter Wren" ~ "Winter Wren",
                             TRUE ~ species),
         species_latin = case_when(species_latin == "Icterus bullockii/galbula" ~ "Icterus galbula",
                             species_latin == "Sturnella neglecta/magna" ~ "Sturnella magna",
                             species_latin == "Gallinago delicata/gallinago" ~ "Gallinago gallinago",
                             species_latin == "Pipilo maculatus/erythrophthalmus" ~ "Pipilo erythrophthalmus",
                             species_latin == "Troglodytes pacificus/hiemalis" ~ "Troglodytes hiemalis",
                             TRUE ~ species_latin),
         species = case_when(species == "Barn Owl (American)" ~ "Barn Owl",
                             species == "Great Blue Heron (Blue form)" ~ "Great Blue Heron",
                             species == "Palm Warbler (Western)" ~ "Palm Warbler",
                             species == "Rock Pigeon (Feral Pigeon)" ~ "Rock Pigeon",
                             species == "Yellow-rumped Warbler (Myrtle)" ~ "Yellow-rumped Warbler",
                             TRUE ~ species))

# Consolidate the counts between the species whose names were just updated (same step as was done in the earlier sub-species section)
hamilton_cbc <- hamilton_cbc %>%
  group_by(year, species) %>%
  mutate(how_many_counted = sum(how_many_counted)) %>%
  arrange(year, species) %>%
  filter(row_number() == 1) %>%
  ungroup()
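For the third decision, the parenthetical qualifiers could also be stripped with a regular expression rather than listing each name. This is only a sketch of an alternative, not what was run for this post. It would also strip "(hybrid)", so it only makes sense once the hybrids are gone.

```r
library(stringr)

species <- c("Barn Owl (American)",
             "Rock Pigeon (Feral Pigeon)",
             "Winter Wren")

# Drop a trailing " (...)" qualifier, if there is one
cleaned <- str_remove(species, " \\(.*\\)$")
cleaned
```

Spelling the names out with case_when() is more verbose, but it has the advantage that nothing gets renamed by accident.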

Finally, I am going to recalculate the how_many_counted_by_hour variable that depends on how_many_counted:

hamilton_cbc <- hamilton_cbc %>%
  mutate(how_many_counted_by_hour = as.double(how_many_counted) / total_hours)

Number of species counted each year

While creating a plot, I noticed what I believe is an error in the total hours recorded for 1982: the total was only 64 hours, yet there was no drop in the number of species counted that year. I think it should have been 164 hours, because there were 167 hours in 1981 and 168 hours in 1983. So, in the chunk below, I’ve set 1982 to have 164 total hours.

# Mutating total_hours and how_many_counted_by_hour that depends on it

hamilton_cbc <- hamilton_cbc %>%
  mutate(total_hours = ifelse(year == 1982, 164, total_hours),
         how_many_counted_by_hour = as.double(how_many_counted) / total_hours)
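The chunk above relies on mutate() evaluating its expressions in order, so the rate is computed from the corrected hours. A minimal sketch on made-up rows shows this; the hours are the ones mentioned above, while toy_hours and the count of 500 are hypothetical.

```r
library(dplyr)

# Hours for 1981 to 1983 as described above; how_many_counted is made up
toy_hours <- tibble(
  year = c(1981, 1982, 1983),
  total_hours = c(167, 64, 168),
  how_many_counted = c(500, 500, 500)
)

toy_fixed <- toy_hours %>%
  mutate(
    # Overwrite the suspect 1982 value first ...
    total_hours = ifelse(year == 1982, 164, total_hours),
    # ... so this line already sees 164 hours for 1982
    how_many_counted_by_hour = as.double(how_many_counted) / total_hours
  )

toy_fixed
```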

And thank you to the Christmas Bird Count! The Christmas Bird Count Data was provided by National Audubon Society and through the generous efforts of Bird Studies Canada and countless volunteers across the western hemisphere.

Sharleen
Statistician
