Simple Web Scraper
Opens a web page and stores some of its content.
The example task opens a web page, scrapes it for the intended values, and stores those values in a file. The output will be stored in the "output" directory.
When run, the task will (see the sketch after this list):
- Open a real web browser
- Navigate to "https://finance.yahoo.com/crypto"
- Dismiss the "Accept cookies" pop-up
- Check that the targeted information, the Cryptocurrencies table, is available
- Collect the Top 10 cryptocurrencies
- Pretty print them to the screen
- Save the collected data to a CSV file
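The sketch below outlines that flow, assuming the `robocorp.browser` wrapper around Playwright; the selectors, button text, and column order are hypothetical placeholders, and the real values live in `tasks.py`:
```
from robocorp import browser
from robocorp.tasks import task


@task
def web_scraper_top_10_crypto():
    # Open a real (non-headless) browser window and navigate to the page.
    browser.configure(headless=False)
    page = browser.goto("https://finance.yahoo.com/crypto")

    # Dismiss the cookie consent pop-up if it shows up; ignore it otherwise.
    # The button text is a hypothetical placeholder.
    try:
        page.click("button:has-text('Accept all')", timeout=5_000)
    except Exception:
        pass  # No pop-up appeared, which is fine.

    # Wait for the cryptocurrencies table, then collect the first 10 rows.
    rows = page.locator("table tbody tr")
    rows.first.wait_for()
    for index in range(10):
        cells = rows.nth(index).locator("td").all_inner_texts()
        # The column order (name in column 1, price in column 2) is assumed here.
        print(index + 1, "|", cells[1], "|", cells[2])
    # ... pretty printing and CSV writing follow in the real task.
```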
The Preparation
The `conda.yaml` file found in this project contains the necessary Python packages.
You'll find appropriate explanations or links in it.
The `robot.yaml` file found in this project contains the meta information that configures how the tasks are executed.
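For orientation, a typical `conda.yaml` and `robot.yaml` for a Python robot look roughly like this; the exact versions and keys used in this project may differ:
```
# conda.yaml -- declares the Python environment (versions are illustrative)
channels:
  - conda-forge

dependencies:
  - python=3.10.12
  - pip=23.2.1
  - pip:
      - robocorp==1.4.0
      - robocorp-browser==2.2.1
```
```
# robot.yaml -- tells the tooling how to run the task (simplified)
tasks:
  Run all tasks:
    shell: python -m robocorp.tasks run tasks.py

environmentConfigs:
  - conda.yaml

artifactsDir: output
```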
The Main Task
The main robot file (`tasks.py`) contains the task `web_scraper_top_10_crypto` that your robot is going to complete when run.
This is indicated by the `@task` decorator, which is imported from the `robocorp.tasks` library.
The Python script doesn't need anything else to be executed with the `robocorp` library & CLI facilitators.
You can easily execute the `@task` using the Robocorp Code VSCode Extension.
Find comments and helpful insights in the code itself.
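If you prefer the command line, the same task can typically be started with the `robocorp.tasks` runner, which is roughly what the extension does behind the scenes (the exact invocation may vary with your setup):
```
python -m robocorp.tasks run tasks.py -t web_scraper_top_10_crypto
```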
Note on creating locators: Simply put, a locator is the object constructed from a selector once that selector is executed against the page. The selector value is usually pretty difficult to construct by hand, so you can use the Robocorp Web Inspector from the Robocorp Code VSCode Extension to build valid selectors with ease.
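As an illustration, once the inspector has produced a selector string, the code turns it into a locator and waits on it roughly like this (the selector below is a made-up placeholder, not the one used in `tasks.py`):
```
from robocorp import browser

# Hypothetical selector copied out of the Robocorp Web Inspector.
CRYPTO_TABLE_SELECTOR = "section[data-testid='screener-table'] table"

page = browser.goto("https://finance.yahoo.com/crypto")
table = page.locator(CRYPTO_TABLE_SELECTOR)  # locator built from the selector string
table.wait_for(state="visible")              # fails loudly if the table never appears
```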
When the `@task` finishes executing, the screen output will look similar to this:
```
############################################
Top 10 Cryptocurrencies:
############################################
1 | Bitcoin USD | $ 42,383.66
2 | Ethereum USD | $ 2,208.85
3 | Tether USDt USD | $ 1.0007
4 | BNB USD | $ 304.37
5 | Solana USD | $ 96.09
6 | XRP USD | $ 0.550529
7 | USD Coin USD | $ 1.0002
8 | Lido Staked ETH USD | $ 2,206.01
9 | Cardano USD | $ 0.534951
10 | Avalanche USD | $ 35.68
############################################
```
You can take a look inside the `output` folder and find different files related to the execution.
The `output` folder will contain log files and journals, but also the CSV output of the `@task`: `top-10-cryptos-(today's date).csv`.
Note: It is important to know that the Control Room Work Items will and should be part of the `output` folder as well.
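For reference, writing that CSV boils down to something like the sketch below; it assumes the scraped rows were collected as (rank, name, price) tuples and that the date suffix is in ISO format, which may differ from the project's exact naming:
```
import csv
from datetime import date
from pathlib import Path


def save_top_10_to_csv(rows, output_dir=Path("output")):
    """Write the scraped rows to output/top-10-cryptos-<today's date>.csv."""
    output_dir.mkdir(parents=True, exist_ok=True)
    csv_path = output_dir / f"top-10-cryptos-{date.today().isoformat()}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Rank", "Name", "Price (USD)"])
        writer.writerows(rows)
    return csv_path
```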
There you can also find the `log.html` file, which you might find interesting.
It provides keen insights into the `@task` execution in a detailed and nicely formatted log.
Note: Observe how `log.html` formats the run details.
Summary
You executed a web scraper task, congratulations! Along the way, you learned:
- How to set up a project and its dependencies
- How to define a task
- Use the `browser` library provided by `robocorp`
- Navigate to a new web page & wait for the load state
- Resolve intermediate steps before getting to the target content
- Ignore issues if elements are non-existent
- Detect valid selectors by using the Robocorp Web Inspector from the Robocorp Code VSCode Extension
- Wait and assert that the elements actually exist on the web page
- Scrape the values of the targeted elements
- Pretty print them to the screen output
- Save all to a CSV file
Technical information
Last updated: 4 January 2024
License: Apache License 2.0