This week we were taught about the process of web scraping and a basic way of carrying it out. Web scraping is the process of extracting data from websites by reading parts of their HTML code and pulling out the relevant information. It is useful when researchers want to study the kind of content contained on websites, because that content exists in such high quantities that collecting it by hand would be impractical. By using a web scraper, this information can be collected much more quickly.
We downloaded a web scraping bot and accessed a streaming site, BBC iPlayer, which has lots of programmes listed on it and therefore plenty of listable data that we could scrape. We used the web scraper to select a part of the website's HTML, such as the episode titles, and present them to us as a list.
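Out of interest, here is a minimal sketch of the same idea written in Python with the requests and BeautifulSoup libraries, rather than the scraper tool we used in class. The URL and the `h2.episode-title` selector are made up for illustration; the real page's markup (and the site's terms of use) would need to be checked before scraping it for real.

```python
# Minimal web scraping sketch: fetch a page, select elements, collect their text.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/programmes"  # hypothetical listing page, not the real iPlayer URL
response = requests.get(url, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, "html.parser")

# Select every element matching the (assumed) CSS selector for an episode title
# and gather its text into a list, much like the scraper tool did for us.
titles = [tag.get_text(strip=True) for tag in soup.select("h2.episode-title")]

for title in titles:
    print(title)
```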
At first, I found this a bit difficult, as some websites have a lot of code to inspect, so locating the series title or episode title took a bit of trial and error. Once I had located it and inputted it into the web scraper, I got the results I was after.
I understand how, in a specific research situation, this would be a useful tool for processing a lot of data. This task was also insightful because it helped me continue building my understanding of HTML and websites. Seeing which lines of the code corresponded to which parts of the website was interesting and helped me understand the structure.