Getting big
Now that our program looks fine, we want to go bigger and consider larger texts.
Exercise 1.a: Download the novel Moby-Dick; or The Whale, by Herman Melville, and place it in the same directory as your Cargo.toml.
This novel, which totals 22316 lines, will be more interesting than our hand-crafted two lines.
Reading the file
Exercise 1.b: Since this lab is not about how to read files, copy the following use statements and function into your program (the easiest exercise ever):
use std::fs::File;
use std::io::{self, BufRead};
fn load_file(name: &str) -> Result<Vec<String>, io::Error> {
    io::BufReader::new(File::open(name)?).lines().collect()
}
Have you noticed that load_file() returns a Vec<String>? A Vec<String> cannot be viewed directly as the &[&str] that count_chars() currently expects, so some adapting is needed.
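To make the mismatch concrete, here is a minimal sketch of one possible workaround: building a second vector that borrows every String as a &str. The helper name borrow_all() is hypothetical and not part of the lab. It works, but it allocates an extra vector, which is one reason a generic signature may be preferable.

```rust
/// Hypothetical helper (not part of the lab): borrows every `String` as a `&str`.
fn borrow_all(lines: &[String]) -> Vec<&str> {
    lines.iter().map(String::as_str).collect()
}

fn main() {
    let owned: Vec<String> = vec![String::from("Call me"), String::from("Ishmael.")];
    // let slice: &[&str] = &owned; // would not compile: a String is not a &str
    let borrowed: Vec<&str> = borrow_all(&owned); // extra allocation for the new vector
    let slice: &[&str] = &borrowed;
    assert_eq!(slice, ["Call me", "Ishmael."]);
}
```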
Adapting count_chars()
We want to adapt count_chars() so that it accepts a slice of &str, as it did before, but also a slice of String. In fact, we would like to accept a slice of any type which can be easily seen as a &str.
The trait AsRef<T> means exactly that: when implemented on a type U, it means that a value of type U can be viewed as a &T cheaply, without copying the underlying data. For example, String implements AsRef<str>: calling .as_ref() on a String will return a &str pointing to data owned by the String.
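Note that AsRef is not reflexive in general: the standard library provides no blanket implementation of AsRef<T> for every type T. However, str implements AsRef<str>, and a reference &U implements AsRef<T> whenever U does, so a &str can be used wherever an AsRef<str> is expected.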
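As an illustration, the following sketch uses a hypothetical function shout() (not part of the lab) that accepts any value viewable as a &str. It also checks that the &str obtained from a String points to the same bytes, which suggests that no copy took place.

```rust
/// Hypothetical function (not part of the lab): accepts anything viewable as a `&str`.
fn shout<S: AsRef<str>>(text: S) -> String {
    text.as_ref().to_uppercase()
}

fn main() {
    let owned = String::from("whale");

    // The view shares the String's buffer: both pointers are equal.
    let view: &str = owned.as_ref();
    assert_eq!(view.as_ptr(), owned.as_ptr());

    assert_eq!(shout(&owned), "WHALE"); // &String, through the impl for references
    assert_eq!(shout("whale"), "WHALE"); // &str
    assert_eq!(shout(owned), "WHALE"); // String, moved into the call
}
```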
Exercise 1.c: Change the signature of count_chars() to the following one, accepting a slice of any type that can be seen as a &str. Also, use .as_ref() on the provided data (in the inner loop) to convert the real type S to a &str.
fn count_chars<S: AsRef<str>>(input: &[S]) -> HashMap<char, usize>
As soon as you have done that, you are able to pass either a &[&str] or a &[String] to count_chars(), and of course a &Vec<String>: Vec<T> implements Deref<Target = [T]>, so the compiler automatically coerces a reference to a vector into a slice.
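The three kinds of arguments can be tried on a smaller function with the same kind of signature. The sketch below uses a hypothetical total_bytes() (not part of the lab) so as not to give away the body of count_chars().

```rust
/// Hypothetical function (not part of the lab) with the same generic bound
/// as count_chars(): sums the byte lengths of all the strings in the slice.
fn total_bytes<S: AsRef<str>>(input: &[S]) -> usize {
    input.iter().map(|s| s.as_ref().len()).sum()
}

fn main() {
    let literals: [&str; 2] = ["Call me", "Ishmael."];
    let owned: Vec<String> = literals.iter().map(|s| s.to_string()).collect();

    assert_eq!(total_bytes(&literals), 15); // array of &str, coerced to &[&str]
    assert_eq!(total_bytes(&owned[..]), 15); // explicit &[String]
    assert_eq!(total_bytes(&owned), 15); // &Vec<String>, coerced to &[String]
}
```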
Exercise 1.d: Change the main() function signature so that it returns Result<(), io::Error>, and make it load and analyze the character frequency of Moby-Dick.
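For reference, here is a minimal sketch of a main() that returns a Result and uses the ? operator. To stay runnable without a file on disk, it reads from an in-memory byte slice through a hypothetical read_lines() function, a stand-in for load_file() and not part of the lab.

```rust
use std::io::{self, BufRead};

/// Hypothetical stand-in for load_file(): reads the lines of any buffered reader.
fn read_lines<R: BufRead>(reader: R) -> Result<Vec<String>, io::Error> {
    // Collecting into a Result stops at the first line that fails to read.
    reader.lines().collect()
}

fn main() -> Result<(), io::Error> {
    // `?` makes main() return early with the error if reading fails.
    let lines = read_lines("Call me\nIshmael.".as_bytes())?;
    println!("{} lines read", lines.len());
    Ok(())
}
```

When main() returns an Err, the program prints the error and exits with a non-zero status, so no explicit error handling is needed in this sketch.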
Have you noticed that it takes more time than when using our two lines? Let's parallelize this!