Web page retrieval
In this part, you will retrieve a web page from a given URL and return its content as a String, or return a proper error if the page retrieval fails.
Initialization
But first of all, ensure that your Rust installation is up-to-date by typing:
$ rustup update
If you get an error, it means that you have not installed Rust through rustup. In this case, make sure that your system is up-to-date.
For this lab, you will create a new Rust program named "lab4". Create a new binary (executable) project by typing:
$ cargo new --bin lab4
$ cd lab4
Everything you will do today will happen in the lab4 directory, in particular in the src/main.rs file.
Adding a dependency
You could make HTTP requests "by hand" by opening a TCP socket to the right host and port, and by sending HTTP commands and parsing the responses. However, you can take advantage of existing libraries written in Rust that can do that already.
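To give an idea of what "by hand" would involve, here is a minimal sketch of the text you would have to send and parse. It does not open any socket and ignores TLS, headers, and bodies entirely; the helper names build_get_request and parse_status_line are made up for this illustration and are not part of any library.

```rust
/// Build the text of a minimal HTTP/1.1 GET request for `path` on `host`.
/// (Hypothetical helper, for illustration only.)
fn build_get_request(host: &str, path: &str) -> String {
    format!("GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n")
}

/// Extract the numeric status code from a status line such as
/// "HTTP/1.1 404 Not Found". Returns `None` if the line is malformed.
fn parse_status_line(line: &str) -> Option<u16> {
    let mut parts = line.split_whitespace();
    // The first word must be the protocol version.
    let version = parts.next()?;
    if !version.starts_with("HTTP/") {
        return None;
    }
    // The second word must be a number that fits in a u16.
    parts.next()?.parse().ok()
}

fn main() {
    print!("{}", build_get_request("example.com", "/"));
    println!("{:?}", parse_status_line("HTTP/1.1 404 Not Found"));
}
```

Even this small fragment leaves out most of what a real client needs (redirects, encodings, https://), which is why relying on an existing crate is the more practical choice.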
The library ("crate" in Rust terminology) you will want to use is called reqwest (this is not a typo). In order to use it in your program, you have to add it to Cargo.toml in the [dependencies] section.
Rather than adding it by hand, you can use the cargo add command. Moreover, we want to use the "blocking" API of reqwest, which is the simplest one at this stage. This "blocking" API is enabled by requesting (ah ah) the "blocking" feature of reqwest:
$ cargo add reqwest --features blocking
If you look at your Cargo.toml, you should see something like:
[dependencies]
reqwest = { version = "0.12.8", features = ["blocking"] }
It indicates that your Rust program will use version 0.12.8 of the "reqwest" crate with the "blocking" feature enabled. 0.12.8 was the latest published version of the crate at the time this page was written.
Adding the "reqwest" crate as a dependency means that you will be able to use its types and functions by prefixing them with reqwest::.
⚠️ If you later receive a compilation error, you might need to install the pkg-config and libssl-dev packages (or equivalent) on your system using your package manager, so that reqwest can find the library it needs (namely, OpenSSL for cryptographic routines required for https:// handling).
Fetching a web page
Your first function will retrieve a web page from its URL using the blocking API of reqwest, whose documentation is accessible online.
Exercise 1.a: write a function with signature fn get(url: &str) -> Result<String, reqwest::Error> which returns the content of the web page located at url (use code similar to the example in the documentation).
Use the following main() program to test it:
fn main() -> Result<(), reqwest::Error> {
    println!("{}", get("https://rfc1149.net/")?);
    Ok(())
}
Note how main() can return a Result<(), E> instead of returning nothing (which is written () in Rust and is the equivalent of void in C-like languages). Either get("https://rfc1149.net/") returns an error, in which case the ? operator makes main() return that error immediately, or it returns a String, which is displayed. At the end of the main() program, Ok(()) ensures that () is returned in an Ok.
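The same pattern can be tried without any network access. The sketch below uses only the standard library; parse_port is a hypothetical function invented for this illustration, and std::num::ParseIntError plays the role that reqwest::Error plays in your program.

```rust
use std::num::ParseIntError;

/// Parse a port number; any parsing failure is handed back to the caller.
/// (Hypothetical helper, for illustration only.)
fn parse_port(text: &str) -> Result<u16, ParseIntError> {
    // `?` returns early with the error, or unwraps the successful value.
    let port = text.trim().parse::<u16>()?;
    Ok(port)
}

fn main() -> Result<(), ParseIntError> {
    // If parsing failed here, main() would return the error instead of printing.
    println!("{}", parse_port("8080")?);
    Ok(())
}
```

When main() returns an Err, the Rust runtime prints the error using its Debug representation and the process exits with a non-zero status.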
Returning a better error
Now, try replacing the URL with one that returns a 404 (not found) code. You can use "https://rfc1149.net/nonexistent", which does not exist.
Note how your get() function returns without an error: it returns the content of the error page. This is not a good idea: in your program, you only want to return pages which were successfully found.
However, you will not be able to return a reqwest::Error to indicate that the page was not found, as it is not an existing error condition for reqwest::Error. You will have to write your own Error type with two variants for now:
#[derive(Debug)]
enum Error {
    Reqwest(reqwest::Error),
    BadHttpResult(u16),
}
This Error type can be an Error::Reqwest and encapsulate a reqwest::Error, or it can be an Error::BadHttpResult and encapsulate the HTTP error code returned by the web server (for example 404 for "not found").
The #[derive(Debug)] will be explained in a later class. For the time being, you just need to know that it will allow an Error value to be displayed by the main() program if needed. Also, you can display an Error yourself by using the {:?} placeholder:
let my_error = Error::BadHttpResult(404);
println!("The error is {my_error:?}.");
would display:
The error is BadHttpResult(404).
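If you want to see what a derived Debug implementation produces for different kinds of variants, the following self-contained sketch may help. The enum here is a simplified stand-in, not the lab's Error type: the Timeout variant and the describe function are hypothetical and exist only for this illustration.

```rust
#[derive(Debug)]
enum Error {
    // Tuple-like variant, as in the lab.
    BadHttpResult(u16),
    // Hypothetical struct-like variant, added only to show its formatting.
    Timeout { seconds: u64 },
}

/// Format an error with the `{:?}` (Debug) placeholder.
fn describe(e: &Error) -> String {
    format!("The error is {e:?}.")
}

fn main() {
    println!("{}", describe(&Error::BadHttpResult(404)));
    println!("{}", describe(&Error::Timeout { seconds: 30 }));
}
```

The derived representation contains the variant name and its fields, but not the name of the enum itself.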
Also, in order to take advantage of the automatic conversion performed by the ? operator, you want to implement From<reqwest::Error> for your Error type:
impl From<reqwest::Error> for Error {
    fn from(e: reqwest::Error) -> Error {
        // Encapsulate the error into an Error::Reqwest variant
        Error::Reqwest(e)
    }
}
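To observe the conversion performed by ? in isolation, here is a sketch that uses only the standard library. It is not the lab's solution: the Parse variant wraps std::num::ParseIntError as a stand-in for reqwest::Error, and check_status is a hypothetical function that reads a status code from text rather than from an HTTP response.

```rust
use std::num::ParseIntError;

#[derive(Debug)]
enum Error {
    // Stand-in for the lab's Reqwest variant.
    Parse(ParseIntError),
    BadHttpResult(u16),
}

impl From<ParseIntError> for Error {
    fn from(e: ParseIntError) -> Error {
        Error::Parse(e)
    }
}

/// Parse a status code given as text and accept it only if it is 200.
/// (Hypothetical helper, for illustration only.)
fn check_status(text: &str) -> Result<u16, Error> {
    // `parse` fails with a ParseIntError; `?` converts it into our
    // Error type by calling `Error::from` before returning it.
    let code: u16 = text.parse()?;
    if code == 200 {
        Ok(code)
    } else {
        Err(Error::BadHttpResult(code))
    }
}

fn main() {
    println!("{:?}", check_status("200"));
    println!("{:?}", check_status("404"));
    println!("{:?}", check_status("oops"));
}
```

Without the From implementation, the line using ? would not compile, because the compiler would have no way to turn a ParseIntError into an Error.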
Exercise 1.b: update your get() function so that it checks the status code of the response before reading its text. Your get() function will return a Result<String, Error>, and your main() function will return a Result<(), Error>, in order to accommodate both error conditions.
Look at the documentation for the reqwest::blocking::get() function: what is its return type? What method can be called on a Response to get a StatusCode and compare it with StatusCode::OK? How can you get the numerical u16 code of a StatusCode?
Note: instead of typing qualified type names such as reqwest::StatusCode, you can add a use reqwest::StatusCode; at the beginning of your program: this will import StatusCode in your namespace, and you will be able to use StatusCode instead of the longer reqwest::StatusCode.
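The effect of use is the same for any path. As a small sketch with a standard library type (std::time::Duration is chosen only because it needs no dependency; the two function names are made up for this illustration):

```rust
use std::time::Duration;

// Without relying on the import, the full path is spelled out each time.
fn two_seconds_qualified() -> std::time::Duration {
    std::time::Duration::from_secs(2)
}

// Thanks to the `use` line, the short name refers to the same type.
fn two_seconds_imported() -> Duration {
    Duration::from_secs(2)
}

fn main() {
    println!("{:?}", two_seconds_qualified());
    println!("{:?}", two_seconds_imported());
}
```

Both functions return the very same type; the import only changes how you are allowed to write its name.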
Check that your new version of get() works by ensuring that an error is displayed when trying to print the content of "https://rfc1149.net/nonexistent". You should see a 404 error.