So You Want to Use Occurrence Data? Presenting the R Package BeeBDC

Dr James Dorey1,2, Erica Fischer3, Dr Michael Orr4, Prof Laura Melissa Guzman5, Prof John Ascher6, Prof Alice Hughes7, Dr Neil Cobb8

1University of Wollongong, Wollongong, Australia, 2Flinders University, Adelaide, Australia, 3King's College, London, UK, 4Staatliches Museum für Naturkunde, Stuttgart, Germany, 5University of Southern California, Los Angeles, USA, 6National University of Singapore, Singapore, Singapore, 7University of Hong Kong, Hong Kong, China, 8Biodiversity Outreach Network, Flagstaff, USA

Biography:

I'm an evolutionary biologist who mostly researches wild bees in Australia and Fiji, and on a global scale. I have a particular interest in the drivers behind diversity and how those same drivers might threaten it. My interests span macroevolution, macroecology, conservation, systematics, and organismal biology. I try to answer questions relating to these topics using diverse methods such as R-coding, phylogenetics, geographical information systems (GIS), field work, morphometrics, statistics, and whatever else I can use to learn about the natural world. I prefer an integrative approach, falsifying or supporting hypotheses with several complementary methods.

Abstract:

As an evolutionary biologist, the questions that I ask are varied and often interdisciplinary. No matter the question, I find myself answering some aspects of it with species occurrence datasets. At the very least, species occurrence data provide important context. But this is where the problems start, and you may find yourself asking some critical questions, such as: Do I only need to download data from one place? Are my data really fit for purpose? Is this datum really unique? This is the point at which some researchers give up or decide to ignore the many issues with occurrence datasets. To remedy this, my collaborators and I have created BeeBDC, a new R package that can be used for any taxon, together with a global bee occurrence dataset. In our example, we combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller private datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R-workflow. By publishing reproducible R workflows and globally cleaned datasets, we aim to increase the accessibility and reliability of downstream analyses. Our workflow can be implemented for any taxon to support research and conservation.
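
To make the sequence of steps concrete, the following is a minimal, language-agnostic sketch written in Python. It is not BeeBDC code and does not use the BeeBDC API: the Record type, the flag names, the duplicate key, and the example records are all hypothetical and chosen only to illustrate the combine, standardise, flag, deduplicate, and clean stages on a toy dataset.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Record:
    """A hypothetical, heavily simplified occurrence record."""
    source: str
    species: str
    lat: Optional[float]
    lon: Optional[float]
    event_date: str
    flags: List[str] = field(default_factory=list)


def standardise(rec: Record) -> Record:
    # Collapse repeated whitespace and normalise capitalisation of the name.
    rec.species = " ".join(rec.species.split()).capitalize()
    rec.event_date = rec.event_date.strip()
    return rec


def flag(rec: Record) -> Record:
    # Flag (rather than delete) records with suspicious coordinates.
    if rec.lat is None or rec.lon is None:
        rec.flags.append("missing_coordinates")
    elif not (-90 <= rec.lat <= 90 and -180 <= rec.lon <= 180):
        rec.flags.append("coordinates_out_of_range")
    elif rec.lat == 0 and rec.lon == 0:
        rec.flags.append("zero_coordinates")
    return rec


def flag_duplicates(records: List[Record]) -> List[Record]:
    # Assumed duplicate key: same name, coordinates, and date.
    # The first record with a given key is kept; later ones are flagged.
    seen = set()
    for rec in records:
        key = (rec.species, rec.lat, rec.lon, rec.event_date)
        if key in seen:
            rec.flags.append("duplicate")
        else:
            seen.add(key)
    return records


def clean(records: List[Record]) -> List[Record]:
    # Keep only the records that carry no flags.
    return [rec for rec in records if not rec.flags]


def run_workflow(sources: List[List[Record]]):
    merged = [rec for batch in sources for rec in batch]   # combine
    merged = [flag(standardise(rec)) for rec in merged]    # standardise + flag
    flag_duplicates(merged)                                # deduplicate
    return merged, clean(merged)                           # flagged, cleaned


# Invented example records from two repositories.
gbif = [
    Record("GBIF", "apis  mellifera", -34.4, 150.9, "2020-01-05"),
    Record("GBIF", "Bombus terrestris", 0.0, 0.0, "2019-06-01"),
]
ala = [
    Record("ALA", "Apis mellifera", -34.4, 150.9, "2020-01-05 "),
    Record("ALA", "Homalictus fijiensis", None, None, "2018-03-10"),
]

flagged, cleaned = run_workflow([gbif, ala])
for rec in flagged:
    print(rec.source, rec.species, rec.flags)
```

Flagging records before filtering, rather than deleting them outright, keeps each decision visible and reversible, so a downstream user can choose which flags matter for their own question.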