Web Scraping Workshop
A web scraping workshop for the BSc students was held from October 2020 to January 2021, conducted by Mr Rahul Gupta. The focus was on getting students accustomed to platforms such as Anaconda, Python and Jupyter. It started with core Python skills, namely data types and the common operations on them. Students were then introduced to the basics of HTML and CSS, including the basic tags and their relevance. The learning process also included inspecting pages to decide what data to extract and narrowing down to the exact element that holds it. To help students get better acquainted with the interface, a live extraction of election results from the ECI website was demonstrated. They also learnt to make the best use of GitHub and other online sources to build their understanding, and covered the basics of environments: how to switch between them, and how to keep a designated environment limited to the Python packages a task needs. Furthermore, they were taught to run code, which was followed by an inquiry-based guided project. The project focused on an introduction to the basics of:
- Beautiful Soup
- Requests
- Selenium
- Pandas
and showed students how to
- get web pages
- spot the data
- navigate the web page
- extract the data
- write it to an Excel file
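
The steps listed above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the workshop: the sample HTML, the table id `results`, and the function names `get_page`, `extract_rows` and `save_to_excel` are all assumptions made for the example, and the data is made up. It uses Requests, Beautiful Soup and Pandas; Selenium, which is typically needed only when a page builds its content with JavaScript, is left out.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Made-up sample page (an assumption for this sketch) so the example
# runs without network access.
SAMPLE_HTML = """
<html><body>
<table id="results">
  <tr><th>Constituency</th><th>Candidate</th><th>Votes</th></tr>
  <tr><td>North</td><td>A. Kumar</td><td>1200</td></tr>
  <tr><td>South</td><td>B. Singh</td><td>950</td></tr>
</table>
</body></html>
"""


def get_page(url):
    """Get the web page: download the HTML at `url` and return it as text."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on HTTP errors such as 404
    return response.text


def extract_rows(html):
    """Spot, navigate to and extract the table data as a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    # Spot the data: the table id is hypothetical; on a real site it is
    # found by inspecting the page in the browser.
    table = soup.find("table", id="results")
    # Navigate the page: walk through the rows of that table.
    rows = table.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        # Extract the data: one text value per cell, keyed by column header.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        records.append(dict(zip(headers, cells)))
    return records


def save_to_excel(records, path):
    """Write the extracted records to an Excel file (needs openpyxl)."""
    pd.DataFrame(records).to_excel(path, index=False)


records = extract_rows(SAMPLE_HTML)
print(records)
```

For a live site, the HTML would come from `get_page(url)` instead of `SAMPLE_HTML`, and `save_to_excel(records, "results.xlsx")` would write the file; both the URL and the file name are left to the reader.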
Towards the end, each student chose a site to scrape, was guided on how to approach it, and then built their own scraper.