Chapter 11 Web Scraping Practice Questions

  1. Briefly describe the differences between the webbrowser, requests, BeautifulSoup, and selenium modules.

webbrowser: Comes with Python and opens a browser to a specific page. requests: Downloads files and web pages from the Internet. BeautifulSoup: Parses HTML, the format that web pages are written in. selenium: Launches and controls a web browser; it can fill in forms and simulate mouse clicks in that browser.

  1. What type of object is returned by requests.get()? How can you access the downloaded content as a string value?

requests.get() returns a Response object. For example:

import requests
res = requests.get('https://painset.com')
type(res)

In the interactive shell, this displays <class 'requests.models.Response'>.

If the request succeeded, the downloaded web page is stored as a string in the Response object's text attribute. For example:

res.text

  1. What Requests method checks that the download worked?

One way to check for success is to call the raise_for_status() method on the Response object, which raises an exception if the download failed. For example:

res.raise_for_status()

  1. How can you get the HTTP status code of a Requests response?

Read the Response object's status_code attribute, for example res.status_code. If it is equal to the value of requests.codes.ok, the request succeeded.
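
The two checks described above, status_code and raise_for_status(), can be sketched together as follows. This is a minimal illustration, not the book's code: the helper name download_succeeded is hypothetical, and the Response objects are built by hand only so the sketch runs without network access. In practice res would come from requests.get().

```python
import requests


def download_succeeded(res):
    """Report whether a Response represents a successful download."""
    # Check 1: compare the numeric status code with the named constant.
    print('Status code:', res.status_code)
    if res.status_code != requests.codes.ok:
        print('Status differs from requests.codes.ok')

    # Check 2: raise_for_status() raises HTTPError for 4xx and 5xx codes.
    try:
        res.raise_for_status()
    except requests.exceptions.HTTPError as exc:
        print('There was a problem: %s' % exc)
        return False
    return True


# Hand-built responses (an assumption made so the sketch runs offline).
ok_response = requests.Response()
ok_response.status_code = 200
missing_response = requests.Response()
missing_response.status_code = 404

print(download_succeeded(ok_response))
print(download_succeeded(missing_response))
```

Wrapping raise_for_status() in try/except lets a program report a failed download and carry on instead of crashing.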

  1. How do you save a Requests response to a file?

To write the web page to a file, open the file in write binary mode ('wb') and use a for loop with the Response object's iter_content() method:

import requests
res = requests.get('https://.../...txt')
res.raise_for_status()
# Write the response to disk in 100,000-byte chunks; 'wb' keeps the bytes as downloaded.
with open('RomeoAndJuliet.txt', 'wb') as playFile:
    for chunk in res.iter_content(100000):
        playFile.write(chunk)

  1. What is the keyboard shortcut for opening a browser’s developer tools?

In Chrome, press F12 on Windows or Command + Option + I on OS X. In Internet Explorer on Windows, press F12.

  1. How can you view (in the developer tools) the HTML of a specific element on a web page?

Right-click the specific element on the page and select Inspect Element from the context menu that appears.

  1. What is the CSS selector string that would find the element with an id attribute of main?

'#main'

  1. What is the CSS selector string that would find the elements with a CSS class of highlight?

'.highlight'

  1. What is the CSS selector string that would find all the <div> elements inside another <div> element?

'div div'

  1. What is the CSS selector string that would find the <button> element with a value attribute set to favorite?

'button[value="favorite"]'
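
The four selector strings above ('#main', '.highlight', 'div div', and 'button[value="favorite"]') can be tried out with Beautiful Soup's select() method. The HTML below is a made-up fragment used only for illustration, and the variable names are hypothetical.

```python
import bs4

# Hypothetical HTML fragment, invented for this example.
html = '''
<div id="main">
  <div class="highlight">outer text
    <div>inner text</div>
  </div>
  <button value="favorite">Like</button>
  <button value="other">Skip</button>
</div>
'''

soup = bs4.BeautifulSoup(html, 'html.parser')

# select() returns a list of Tag objects matching the CSS selector.
main_elems = soup.select('#main')            # element whose id is main
highlighted = soup.select('.highlight')      # elements with class highlight
nested_divs = soup.select('div div')         # <div> elements inside another <div>
favorite_buttons = soup.select('button[value="favorite"]')

print(len(main_elems), len(highlighted), len(nested_divs), len(favorite_buttons))
```

Because select() always returns a list, even a selector that matches a single element needs an index such as [0] to reach the Tag object itself.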

  1. Say you have a Beautiful Soup Tag object stored in the variable spam for the element <div>Hello world!</div>. How could you get a string 'Hello world!' from the Tag object?

spam.getText()

  1. How would you store all the attributes of a Beautiful Soup Tag object in a variable named linkElem?

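The attrs attribute of a Tag object holds all of its attributes as a dictionary. For a Tag object stored in linkElem:

linkElem.attrs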
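
As a small sketch of the last two answers, the following builds a Tag object from a hypothetical element and reads both its text and its attributes. The id and title attributes are assumptions added so that attrs has something to show.

```python
import bs4

# Hypothetical element; the id and title attributes are invented for the example.
soup = bs4.BeautifulSoup(
    '<div id="greeting" title="demo">Hello world!</div>', 'html.parser')
spam = soup.select('div')[0]

text = spam.getText()    # the text between the opening and closing tags
attributes = spam.attrs  # dictionary mapping attribute names to values

print(text)
print(attributes)
```

A single attribute can also be read with the Tag object's get() method, for example spam.get('id').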

  1. Running import selenium doesn’t work. How do you properly import the selenium module?

from selenium import webdriver

  1. What’s the difference between the find_element_* and find_elements_* methods?

The find_element_* methods return a single WebElement object, representing the first element on the page that matches your query.

The find_elements_* methods return a list of WebElement objects, one for every matching element on the page.

  1. What methods do Selenium’s WebElement objects have for simulating mouse clicks and keyboard keys?

The click() and send_keys() methods.

  1. You could call send_keys(Keys.ENTER) on the Submit button’s WebElement object, but what is an easier way to submit a form with Selenium?

Calling the submit() method on any element will have the same result as clicking the Submit button for the form that element is in.
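
The WebElement methods mentioned above can be combined in a small helper. This is only a sketch: fill_and_submit is a hypothetical name, and the field argument is assumed to be a WebElement for a text field that has already been located on the page.

```python
def fill_and_submit(field, text):
    """Type text into a form field, then submit the form containing it.

    field is assumed to be a selenium WebElement (any object that
    provides click(), send_keys(), and submit() will work).
    """
    field.click()          # simulate a mouse click to focus the field
    field.send_keys(text)  # simulate typing on the keyboard
    field.submit()         # same result as clicking the form's Submit button
```

Since submit() works on any element inside the form, the helper never needs to locate the Submit button itself.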

  1. How can you simulate clicking a browser’s Forward, Back, and Refresh buttons with Selenium?

Call browser.forward(), browser.back(), and browser.refresh(), respectively, on the WebDriver object stored in browser.
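
A short sketch of these navigation calls in sequence follows. The function name revisit_previous_page is hypothetical, and browser is assumed to be a selenium WebDriver object (for example, one created with webdriver.Firefox()) that has already visited at least two pages.

```python
def revisit_previous_page(browser):
    """Go back one page, return forward, then reload the current page.

    browser is assumed to be a selenium WebDriver object.
    """
    browser.back()     # like clicking the Back button
    browser.forward()  # like clicking the Forward button
    browser.refresh()  # like clicking the Refresh/Reload button
```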
