Queen's Business School

ERS Blog

Responsible Web Scraping

Dr Alan Hanna discusses the ethical considerations behind the new module ‘Python for Finance’ on our BSc Finance programme.

The word ‘hack’ and its derivatives have been partially rehabilitated in recent years. A ‘life hack’ is considered a useful shortcut to boost efficiency or wellbeing. ‘Hackathons’ are collaborative events that bring developer communities together to learn, share, and create solutions. This is a departure from the more sinister use of the word ‘hacker’ to describe a cybercriminal, or its more derogatory use for a less-than-professional developer (akin to its use in golf). Like most skills, coding can be used for good or ill.

At Queen’s Management School, we have recently added a new module ‘Python for Finance’ to our BSc Finance programme. Since most of the students are new to coding, the module can be viewed as an introduction to the programming language itself and to the universe of possibilities that it opens. Part of the appeal of Python is that it allows novice developers to accomplish significant results with just a few lines of relatively simple code by building on an ever-expanding collection of freely available packages.

With these new-found skills comes a potential minefield of ethical issues and professional responsibilities in areas such as privacy and data-driven decision making. These are issues that can all too easily be overlooked and for which students need some guidance. An excellent starting point is the Association for Computing Machinery’s (ACM) Code of Ethics, which reminds computing professionals to ‘act responsibly’ and to ‘reflect upon the wider impacts of their work, consistently supporting the public good’.

Web scraping

Extracting data from the internet, particularly in an automated fashion, is referred to as web scraping. As an industry, finance has an insatiable appetite for information. Using free-to-download libraries like Beautiful Soup, Selenium, and Scrapy, one can easily write a few lines of code to acquire data. The benefits are clear and immediate: through automation one can replace tedious manual processes and increase the speed and volume of data acquisition to realise huge productivity gains.
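
To make the idea concrete, here is a minimal sketch of the extraction step. It deliberately uses Python's built-in html.parser module rather than any of the libraries named above, so that it runs without extra installs. The HTML fragment, the tickers, the prices, and the names TableCellParser and extract_prices are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: the tickers and prices are invented.
SAMPLE_HTML = """
<table>
  <tr><th>Ticker</th><th>Price</th></tr>
  <tr><td>AAA</td><td>101.5</td></tr>
  <tr><td>BBB</td><td>47.25</td></tr>
</table>
"""


class TableCellParser(HTMLParser):
    """Collect the text of each <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr":
            if self._row:  # header rows hold no <td> cells, so they are skipped
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def extract_prices(html):
    """Return a {ticker: price} dictionary from a two-column HTML table."""
    parser = TableCellParser()
    parser.feed(html)
    return {ticker: float(price) for ticker, price in parser.rows}


prices = extract_prices(SAMPLE_HTML)
print(prices)
```

In practice the HTML would come from a live page rather than a string, which is exactly where the questions discussed in the rest of this post arise.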

Such an approach, though, is not always welcomed by the organisation behind the website. Recognising the value of their content, some websites try to hinder wholesale harvesting of data and automated website access. For this, we all pay the small cost of trying to prove from time to time that we are in fact ‘not a robot’ via the CAPTCHA system. Blade Runner-style tests aside, other techniques include limiting the number of search results per page, restricting the frequency (throttling) of requests from a single IP address, and generating content in non-HTML format. Thus, while it may be technically possible to extract data from a website, one should always pause to ask if it is legitimate to do so.

To begin with, one should consult the website’s terms of service. These can clarify, for example, whether personal, educational, or commercial usage is permitted. If in doubt, consider reaching out to the company to check. Most domains also include a robots.txt file (see for example https://www.yahoo.com/robots.txt). While primarily aimed at search engine crawlers, this file can indicate parts of a website where automated requests are unwelcome.
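
As a rough sketch of how such a check might be automated, Python's standard library includes urllib.robotparser. The example below parses a made-up robots.txt from a string so that it runs offline. The rules, the example.com URLs, the user-agent name, and the helper is_allowed are all assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, not taken from any real site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_text, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS, "student-research-scraper", "https://example.com/quotes"))
print(is_allowed(SAMPLE_ROBOTS, "student-research-scraper", "https://example.com/private/report"))
```

Against a live site, one would instead call set_url() with the address of the robots.txt file and then read() before calling can_fetch(). A permissive robots.txt does not override the terms of service.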

Some websites are happy to share their data to the point of facilitating information requests via an application programming interface (API), often with accompanying documentation and sample code. APIs define protocols for requesting data (or performing other operations) and allow companies to better marshal such requests. Where available, an API should be the preferred mode of access.
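
The shape of a typical API interaction can be sketched as follows: build a request URL from named parameters, then read structured data (often JSON) from the response. Everything specific here is hypothetical, including the api.example.com endpoint, the parameter names, the response layout, and the helpers build_request_url and parse_closes. A real API's documentation defines its own.

```python
import json
from urllib.parse import urlencode

API_BASE = "https://api.example.com/v1/prices"  # hypothetical endpoint


def build_request_url(symbol, start, end):
    """Compose a query URL; the parameter names are assumed, not a real API's."""
    query = urlencode({"symbol": symbol, "start": start, "end": end})
    return f"{API_BASE}?{query}"


# Hypothetical response body, standing in for what a server might return.
SAMPLE_RESPONSE = (
    '{"symbol": "AAA", "prices": ['
    '{"date": "2024-01-02", "close": 101.5}, '
    '{"date": "2024-01-03", "close": 102.0}]}'
)


def parse_closes(body):
    """Extract (date, close) pairs from the assumed JSON layout."""
    payload = json.loads(body)
    return [(row["date"], row["close"]) for row in payload["prices"]]


url = build_request_url("AAA", "2024-01-01", "2024-01-31")
closes = parse_closes(SAMPLE_RESPONSE)
print(url)
print(closes)
```

Compared with parsing HTML, the data arrives already structured, and the provider can publish usage limits and authentication requirements up front.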

A responsible web scraper can choose to share additional information via the user-agent request header. This allows servers to check the application (normally a browser), version, and operating system from where requests have been made. Some automated requests can be distinguished (and blocked) in this way. To promote transparency, the header can be customised to provide additional information (such as a contact email address) that would allow the domain owner to understand or query unexpected usage.
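
One way this might look in practice, using only the standard library's urllib.request, is sketched below. The application name, version, email address, URL, and the helper build_request are placeholders chosen for illustration. The request object is only constructed here, not sent.

```python
from urllib.request import Request


def build_request(url, contact_email):
    """Create a request whose User-Agent identifies the script and its owner."""
    # The application name and version are placeholders for illustration.
    user_agent = f"student-research-scraper/0.1 (contact: {contact_email})"
    return Request(url, headers={"User-Agent": user_agent})


request = build_request("https://example.com/quotes", "student@example.com")
print(request.get_header("User-agent"))
```

Passing the request to urllib.request.urlopen() would send it with the customised header, giving the site owner a way to get in touch rather than simply blocking the traffic.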

A further consideration is fair usage of a shared resource. While a single user running a single process is unlikely to overload a server, sending too many requests could impact the service available to others. Taken to an extreme, this could result in a situation similar to a denial-of-service (DoS) attack, rendering the website unavailable to users. A simple solution is to slow the speed of requests by adding sleep commands to periodically pause the execution of code.
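
A minimal sketch of this throttling idea follows. The function name fetch_politely, the one-second default delay, and the way the download function is passed in are all assumptions made for illustration. A suitable delay depends on the site and on any limits it publishes.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    fetch is whatever function performs the download; sleep can be
    swapped out (for example in tests) and defaults to time.sleep.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # pause between requests, not before the first
        results.append(fetch(url))
    return results


# Demonstration with a stand-in fetch function, so no network access is needed.
pages = fetch_politely(["page-1", "page-2"], fetch=str.upper, delay_seconds=0.1)
print(pages)
```

Keeping the pause in one place makes it easy to lengthen the delay if the site owner asks, or if responses begin to slow down.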

Post-scraping, the developer is also faced with responsible storage, processing, and interpretation of the data. One should consider how the data itself will be used. For example, if content is subject to copyright, can it be reshared in raw or derived formats, what attribution is required, and is commercial exploitation permitted?

As the infamous web-crawler Peter Parker was reminded, with great power comes great responsibility. This is true not only of our students with their new-found coding skills, but also for those who impart such knowledge.

Dr Alan Hanna
Senior Lecturer in Finance
© Queen's University Belfast 2024