Web Data Collection
The web is the largest source of data accessible today. Web Data Collection shows how such data can be retrieved and processed into structured information.
Regular Expressions (RegEx): Students can use RegEx to search for and extract relevant information from unstructured text.
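As an illustration of this objective, the following minimal sketch pulls e-mail addresses and ISO dates out of free text with Python's `re` module. The sample text and both patterns are assumptions made for the example; the patterns are deliberately simple and do not cover every valid address or date.

```python
import re

# Hypothetical sample text; addresses and date are made up for illustration.
text = "Contact alice@example.com or bob@example.org before 2024-05-17."

# Simplified patterns: good enough for common cases, not a full validator.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

# findall returns the full match because the only group is non-capturing.
emails = EMAIL.findall(text)

# finditer gives access to the captured year, month, and day groups.
dates = [tuple(int(part) for part in m.groups()) for m in ISO_DATE.finditer(text)]

print(emails)
print(dates)
```

Compiling the patterns once and reusing them keeps the extraction step readable when the same expressions are applied to many documents.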
XPath/JSONPath: Students can use XPath and JSONPath to search for and extract relevant information from XML and JSON documents.
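A small sketch of what such queries can look like, using only the Python standard library. Two caveats apply: `xml.etree.ElementTree` supports only a subset of XPath, and the standard library has no JSONPath engine, so the JSON part shows the plain-Python equivalent of the JSONPath expression `$.books[*].title` rather than evaluating it. The catalog data is invented for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample documents describing the same two books.
XML_DOC = """
<catalog>
  <book lang="en"><title>Dune</title><price>9.99</price></book>
  <book lang="de"><title>Momo</title><price>12.50</price></book>
</catalog>
"""
JSON_DOC = '{"books": [{"title": "Dune", "price": 9.99}, {"title": "Momo", "price": 12.5}]}'

root = ET.fromstring(XML_DOC)

# XPath subset: child step with an attribute predicate, then another child step.
english_titles = [t.text for t in root.findall("./book[@lang='en']/title")]

# ".//price" selects every <price> element at any depth below the root.
all_prices = [float(p.text) for p in root.findall(".//price")]

data = json.loads(JSON_DOC)
# Plain-Python equivalent of the JSONPath expression $.books[*].title
json_titles = [book["title"] for book in data["books"]]

print(english_titles, all_prices, json_titles)
```

For full XPath 1.0 or real JSONPath evaluation, a third-party library would be needed; which one to use is left open here.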
Web API: Students understand the concepts behind REST and GraphQL. Students will be able to programmatically access REST APIs, query data, and extract relevant elements. Students will be able to provide a REST API for their own application.
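The two REST skills named here, providing an API and consuming one, can be sketched together with the standard library alone. The `/items` routes, the `ITEMS` data, and the helper names are assumptions made for this example; a real application would more likely use a web framework. GraphQL is not covered by this sketch.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory data standing in for an application's database.
ITEMS = {1: {"id": 1, "name": "keyboard"}, 2: {"id": 2, "name": "mouse"}}

class ItemHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Routes: /items -> whole collection, /items/<id> -> one resource.
        parts = self.path.strip("/").split("/")
        if parts == ["items"]:
            self._send(200, list(ITEMS.values()))
            return
        if len(parts) == 2 and parts[0] == "items" and parts[1].isdecimal():
            item = ITEMS.get(int(parts[1]))
            if item is not None:
                self._send(200, item)
                return
        self._send(404, {"error": "not found"})

    def _send(self, status, payload):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging

# Opener without proxies so requests to the local server are sent directly.
OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def fetch_json(url):
    """GET a URL and decode the JSON response body."""
    with OPENER.open(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))

# Provide the API: port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), ItemHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_port}"

# Consume the API: query the collection, then a single resource.
all_items = fetch_json(f"{base_url}/items")
names = [item["name"] for item in all_items]
second = fetch_json(f"{base_url}/items/2")

server.shutdown()
server.server_close()
print(names, second)
```

Running server and client in one process keeps the example self-contained; in practice the client would call a remote API and handle HTTP errors, authentication, and pagination.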
Web Crawling/Web Scraping:
Web Scraping: Students understand the structure of an HTML page and can programmatically select elements from it, locate the relevant text passages, and extract their content.
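As a rough sketch of element selection, the example below collects the headline of every `<div class="post">` from an HTML string with the standard library's `html.parser`. The page markup and the class name `post` are assumptions made for the example; third-party parsers with CSS selector support are a common alternative.

```python
from html.parser import HTMLParser

# Hypothetical page; only the two "post" blocks are of interest.
HTML_DOC = """
<html><body>
  <h1>Course News</h1>
  <div class="post"><h2>Exam dates</h2><p>The exam is in June.</p></div>
  <div class="post"><h2>Lab hours</h2><p>Labs open at <b>9 am</b>.</p></div>
  <div class="sidebar"><h2>Links</h2></div>
</body></html>
"""

class PostTitleParser(HTMLParser):
    """Collects the text of every <h2> inside a <div class="post">."""

    def __init__(self):
        super().__init__()
        self.div_stack = []    # one bool per open <div>: is it a post?
        self.in_title = False  # True while inside a relevant <h2>
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            self.div_stack.append("post" in classes)
        elif tag == "h2" and any(self.div_stack):
            self.in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()
        elif tag == "h2":
            self.in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self.in_title:
            self.titles[-1] += data

parser = PostTitleParser()
parser.feed(HTML_DOC)
titles = [t.strip() for t in parser.titles]
print(titles)
```

The stack of open `<div>` elements is what restricts the extraction to headlines inside posts and skips the sidebar.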
Web Crawling: Students are able to programmatically crawl a web domain.
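One possible shape of a domain-restricted crawler is a breadth-first traversal over links, sketched below. To stay runnable offline, the sketch reads pages from a hypothetical in-memory dictionary `PAGES` through a `fetch` callable; the URLs, the `crawl` function, and its `max_pages` parameter are assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical site, kept in memory so the sketch needs no network access.
PAGES = {
    "https://example.org/": '<a href="/a">A</a> <a href="b.html">B</a>',
    "https://example.org/a": '<a href="/">home</a> <a href="https://other.example/x">ext</a>',
    "https://example.org/b.html": '<a href="/a#top">A again</a>',
}

class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl that stays on the start URL's host."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # fetch returns None for unknown pages
        if html is None:
            continue
        order.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order

visited = crawl("https://example.org/", PAGES.get)
print(visited)
```

Against a live site, `fetch` would perform an HTTP request, and the crawler should additionally honor `robots.txt` (for instance via `urllib.robotparser`) and pause between requests.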
Web Feeds: Students know how web feeds work technically and can programmatically search them for relevant information.
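Technically, a feed is an XML document with a fixed vocabulary, so it can be examined with an ordinary XML parser. The sketch below assumes an RSS 2.0 feed; the feed content and the keyword are invented for the example, and Atom feeds would need different element names and namespace handling.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 feed; a real one would be downloaded from a feed URL.
RSS_DOC = """<rss version="2.0"><channel>
  <title>Example Feed</title>
  <item><title>New lab schedule</title><link>https://example.org/1</link>
        <pubDate>Mon, 07 Oct 2024 12:00:00 GMT</pubDate></item>
  <item><title>Exam registration open</title><link>https://example.org/2</link>
        <pubDate>Tue, 08 Oct 2024 08:30:00 GMT</pubDate></item>
</channel></rss>"""

root = ET.fromstring(RSS_DOC)

# Each <item> is one feed entry; pubDate uses the RFC 822 date format.
entries = []
for item in root.iter("item"):
    entries.append({
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        "published": parsedate_to_datetime(item.findtext("pubDate")),
    })

# Examine the feed: keyword filter on titles and the most recent entry.
keyword = "exam"
matches = [e["title"] for e in entries if keyword in e["title"].lower()]
latest = max(entries, key=lambda e: e["published"])

print(matches, latest["link"])
```

Converting `pubDate` to a `datetime` makes entries comparable, which is what allows sorting a feed or detecting entries added since the last check.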