Simple Web Scraper

Opens a web page and stores some content.

Robocorp

This example task opens a web page, scrapes the target values from it, and stores them in a file in the "output" directory.

When run, the task will:

  • Open a real web browser
  • Navigate to "https://finance.yahoo.com/crypto"
  • Dismiss the cookie-consent pop-up
  • Check that the target content (the Cryptocurrencies table) is available
  • Collect the Top 10 cryptocurrencies
  • Pretty print them to the screen
  • Save the collected data to a CSV file

The Preparation

The conda.yaml file in this project lists the Python packages the robot needs, with explanatory comments or links.

The robot.yaml file in this project contains the metadata that configures how the tasks are executed.

The Main Task

The main robot file (tasks.py) contains the task your robot completes when run: web_scraper_top_10_crypto. The @task decorator, imported from the robocorp.tasks library, marks the function as a task.

The Python script needs nothing else to run with the robocorp library and command-line tools. You can run the @task from the Robocorp Code VSCode Extension.

Find comments and helpful insights in the code itself.

Note on creating locators: Simply put, a locator is an object built from a selector. Selectors are often hard to write by hand, so you can use the Robocorp Web Inspector from the Robocorp Code VSCode Extension to build valid ones.

When the @task finishes executing, the screen output will look similar to this:

```
############################################
Top 10 Cryptocurrencies:
############################################
1 | Bitcoin USD | $ 42,383.66
2 | Ethereum USD | $ 2,208.85
3 | Tether USDt USD | $ 1.0007
4 | BNB USD | $ 304.37
5 | Solana USD | $ 96.09
6 | XRP USD | $ 0.550529
7 | USD Coin USD | $ 1.0002
8 | Lido Staked ETH USD | $ 2,206.01
9 | Cardano USD | $ 0.534951
10 | Avalanche USD | $ 35.68
############################################
```
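
As a rough illustration of how a listing like the one above can be produced, the sketch below formats already-scraped rows in the same layout. It is not the code from tasks.py: the function name format_top_cryptos, the (name, price) tuple shape, and the banner width are assumptions made for this example.

```python
def format_top_cryptos(rows):
    """Return the banner-framed listing as a list of lines.

    `rows` is a list of (name, price) string tuples; this shape is an
    assumption for the sketch, not the structure used in tasks.py.
    """
    banner = "#" * 44  # assumed width, chosen to resemble the sample output
    lines = [banner, "Top 10 Cryptocurrencies:", banner]
    # Number the rows from 1 so the rank matches the table order.
    for rank, (name, price) in enumerate(rows, start=1):
        lines.append(f"{rank} | {name} | $ {price}")
    lines.append(banner)
    return lines


# Two sample rows taken from the output shown above.
sample = [("Bitcoin USD", "42,383.66"), ("Ethereum USD", "2,208.85")]
print("\n".join(format_top_cryptos(sample)))
```

Returning the lines instead of printing them directly keeps the formatting easy to check separately from the scraping.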

The output folder holds the files produced by the execution: log files, journals, and the CSV output of the @task, named top-10-cryptos-(today's date).csv
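
A minimal sketch of how a date-stamped CSV file like this could be written with Python's standard library. The function name save_to_csv, the column headers, and the ISO date format in the file name are assumptions; the actual tasks.py may do this differently.

```python
import csv
from datetime import date
from pathlib import Path


def save_to_csv(rows, output_dir="output", today=None):
    """Write (name, price) rows to top-10-cryptos-<date>.csv and return the path.

    The column names and the YYYY-MM-DD date format are assumptions.
    """
    today = today or date.today()
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = out_dir / f"top-10-cryptos-{today.isoformat()}.csv"
    # newline="" lets the csv module control line endings itself.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["rank", "name", "price"])
        for rank, (name, price) in enumerate(rows, start=1):
            writer.writerow([rank, name, price])
    return path
```

Calling save_to_csv(rows) with no other arguments writes into the "output" directory and returns the path of the new file. Prices that contain a thousands separator, such as 42,383.66, are quoted automatically by csv.writer.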

Note: The Control Room Work Items are, and should be, part of the output folder as well.

The output folder also contains the log.html file, which gives a detailed, nicely formatted view of the @task execution.

Note: Observe how calls to the print function appear in log.html.

Summary

Congratulations, you executed a web scraper task! Along the way, you saw how to:

  • Set up a project and its dependencies
  • Define a task
  • Use the browser library provided by robocorp
  • Navigate to a new web page and wait for the load state
  • Resolve intermediate steps before getting to the target content
  • Ignore issues when elements do not exist
  • Build valid selectors with the Robocorp Web Inspector from the Robocorp Code VSCode Extension
  • Wait for elements and assert that they actually exist on the web page
  • Scrape the values of the targeted elements
  • Pretty print them to the screen output
  • Save everything to a CSV file

Technical information

Last updated

4 January 2024

License

Apache License 2.0

Dependencies