Web Scraping with R

05 March 2020, PwC Waterfall City, 4 Lisbon Lane, Jukskei View, Midrand, 2090

Objectives

The contemporary Internet Viking uses web scraping techniques to systematically extract information from web pages. This workshop will demonstrate the process of web scraping. Here’s the battle plan:

  • Sharpening the Axe: Understanding the structure of an HTML document.
  • Preparing the Longships: Using the DOM to select HTML elements.
  • Doing Battle: Using {rvest} to extract data from an HTML document.
  • Stashing the Treasure: Storing data as CSV or JSON.
  • Triumphant Return: Handling dynamic content using {RSelenium}.

The first two topics will be fairly brief, covering the material at a high level. We’ll dig much deeper into the remaining three.
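To give a taste of the middle of the battle plan, here is a minimal sketch of selecting elements with a CSS selector, extracting data with {rvest} and stashing it as CSV. It is not taken from the workshop material. The HTML fragment, the `li.ship a` selector, the ship names and the variable names are all made up for illustration, and the sketch assumes {rvest} 1.0.0 or later (for `html_elements()` and `html_text2()`).

```r
library(rvest)

# Hypothetical HTML fragment standing in for a real web page.
html <- read_html('
  <html><body>
    <ul id="longships">
      <li class="ship"><a href="/ships/1">Ormen Lange</a></li>
      <li class="ship"><a href="/ships/2">Skidbladnir</a></li>
    </ul>
  </body></html>
')

# Select every link inside a list item with class "ship" (CSS selector).
links <- html_elements(html, "li.ship a")

# Pull out the link text and the href attribute into a data frame.
ships <- data.frame(
  name = html_text2(links),
  url = html_attr(links, "href"),
  stringsAsFactors = FALSE
)

# Store the result as CSV (a temporary file here, purely for illustration).
csv_path <- tempfile(fileext = ".csv")
write.csv(ships, csv_path, row.names = FALSE)
```

On a real site you would pass a URL to `read_html()` instead of a literal string, after checking that the site's terms and robots.txt allow scraping.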

Outcomes

By the end of the workshop you will be able to scrape large swathes of the Internet easily and confidently. We’re considerate Vikings, so you’ll also learn how to do this ethically and mindfully.

Who should attend?

This workshop is suitable for Vikings with low to moderate levels of R experience.

Requirements

Participants are assumed to have prior exposure to R, or at least to programming of some variety. Some familiarity with HTML and CSS will be an advantage but is not mandatory. We’ll use RStudio Cloud to ensure that everybody has the same infrastructure and (hopefully) avoid most technical issues.

Interactive course material

Our training emphasises practical skills. So, although you’ll be learning concepts and theory, you’ll see how everything is applied in the real world as we work through examples and exercises based on real datasets.

We like questions!

A firm understanding of the course content will let you apply your new skills with confidence. So, if at any point you’re unsure of something, just ask!

Purchase a ticket

Contact us at training@exegetic.biz if you have any questions.