Translating strings from English to French
We want to write a `#[translate]` attribute macro that translates English into French for string literals that represent numbers. For example, the following code:
```rust
#[translate]
fn main() {
    let res = "forty two";
    println!("12 + 30 = {}", res);
}
```
will display:

```text
12 + 30 = quarante-deux
```
Only integer strings within an arbitrary range (e.g., `0..=100`) will be translated.
We will use the following two crates to implement this functionality:

- `english_numbers` for English numbers
- `french_numbers` for French numbers
Exercise 5.a: Add these crates as dependencies to the `macros` crate.
Preloading Strings
The `english_numbers` crate does not provide a way to recognize an English number and retrieve its numeric value. Therefore, we will build a dictionary that associates each string representation with its numeric value.
Exercise 5.b: Create a `Translate` struct that contains a dictionary associating a string with an `i64`, the type used by the `english_numbers` crate.
Exercise 5.c: Create an associated function `new()` that returns a `Translate` object with a preloaded dictionary. We will only enable the `spaces` formatting option and leave the other options disabled.
Choosing the String Replacement Technique
We could choose to use a mutable visitor to rewrite the `LitStr` nodes that correspond to an English number and replace them with the corresponding French term. However, this technique, which seems to work at first glance, fails on simple tests like:
```rust
#[test]
#[translate]
fn test_translate() {
    assert_eq!("trois", "three");
}
```
When analyzing this function, the visitor will visit the `Macro` node as it encounters `assert_eq!`. It will correctly visit the `path` and `delimiter` fields, but it will not visit the `tokens` field (available as a `proc_macro2::TokenStream`), which holds the content of the macro, as that content may not be valid Rust code at this stage.
Therefore, we would also need to intercept the visits of `Macro` nodes to replace the literal tokens we are interested in. But since our procedural macro already works with a `TokenStream`, why not implement this solution directly? We don't need a visitor.
Transforming the Token Stream
Exercise 5.d: Write a method that substitutes the tokens corresponding to a string literal representing an English number in our dictionary with the corresponding French number. Be sure to recursively call this method when encountering a delimited group of tokens.
```rust
impl Translate {
    fn substitute_tokens(stream: proc_macro2::TokenStream) -> proc_macro2::TokenStream {
        todo!()
    }
}
```
Note that the literal representation we have access to is the one in the source code, enclosed in double quotes (we can ignore string literals using other delimiters like `r#""#`). Instead of removing these quotes, it may be easier to add them to the dictionary entries for direct comparison.
Exercise 5.e: Write a procedural macro `#[translate]` that constructs a `Translate` object and uses it to transform the `TokenStream`. Remember that conversions with `From` and `Into` are implemented between `proc_macro::TokenStream` (at the macro interface) and `proc_macro2::TokenStream` (used inside the macro).
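Putting the pieces together, the entry point in the proc-macro crate might look like the following sketch (assuming the `Translate` type from the previous exercises is in scope):

```rust
use proc_macro::TokenStream;

#[proc_macro_attribute]
pub fn translate(_attr: TokenStream, item: TokenStream) -> TokenStream {
    let translate = Translate::new();
    // `From`/`Into` convert between `proc_macro::TokenStream` at the
    // macro interface and `proc_macro2::TokenStream` used internally.
    let stream: proc_macro2::TokenStream = item.into();
    translate.substitute_tokens(stream).into()
}
```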
Exercise 5.f: Write tests for your macro. It may be useful to define a `str!(a, b)` macro with `macro_rules!` that dynamically constructs a string from `a` and `b`, without the `ab` string appearing in the source code:
```rust
// Check that out-of-range (0..=100) values are not translated
assert_eq!(str!("one h", "undred and one"), "one hundred and one");
```
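One way to define such a macro is to build the string at run time, so the concatenated literal never appears among the tokens seen by `#[translate]` (a sketch; any equivalent run-time concatenation works):

```rust
// Concatenate two pieces at run time; the result is a `String`, and
// the full text never appears as a literal in the source code.
macro_rules! str {
    ($a:expr, $b:expr) => {
        format!("{}{}", $a, $b)
    };
}
```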
Determining the Bounds
We want to optionally specify the bounds for the numbers to be translated using an attribute argument. The following notations should be accepted:

```rust
#[translate] fn f() { ... }        // Default bounds (0..=100)
#[translate(0..10)] fn f() { ... }
#[translate(0..=10)] fn f() { ... }
```
However, we want to reject incorrect constructions with clear error messages:

```text
error: unexpected end of input, expected `..=` or `..`
 --> tests/ui/translate.rs:3:1
  |
3 | #[translate(10)]
  | ^^^^^^^^^^^^^^^^
  |
  = note: this error originates in the attribute macro `translate` (in Nightly builds, run with -Z macro-backtrace for more info)

error: expected integer literal
 --> tests/ui/translate.rs:6:13
  |
6 | #[translate(..10)]
  |             ^^

error: unexpected end of input, expected integer literal
 --> tests/ui/translate.rs:9:1
  |
9 | #[translate(10..)]
  | ^^^^^^^^^^^^^^^^^^
  |
  = note: this error originates in the attribute macro `translate` (in Nightly builds, run with -Z macro-backtrace for more info)

error: expected integer literal
  --> tests/ui/translate.rs:12:13
   |
12 | #[translate(x)]
   |             ^
```
To achieve this, we will build a structure on which we can implement the `syn::parse::Parse` trait:

```rust
struct Bounds { low: i64, high: i64 }
```
Exercise 5.g: Implement the `Parse` trait on `Bounds`. You have to read an integer with type `LitInt` (`syn` handles the unary minus sign), look for one of `..=` and `..`, read the higher bound, and build the `Bounds` object. You might want to use `Lookahead1` to make things easier.
Exercise 5.h: Add specific tests to check that you can read the various intervals. To avoid exporting private types, you may add the tests in a submodule which is defined only in testing mode:

```rust
#[cfg(test)]
mod tests {
    // …
}
```
You can parse a string with parser `T` using `syn::parse_str::<T>(s)`; this might be handy in your tests.
Exercise 5.i: Update the `translate` macro so that it reads the bounds from its attribute when it is not empty, and initializes the `Translate` object appropriately.
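A sketch of the updated entry point, assuming `Translate::new` has been extended to take the bounds, and using `parse_macro_input!` so that parse failures surface as the compile errors shown earlier:

```rust
use proc_macro::TokenStream;
use syn::parse_macro_input;

#[proc_macro_attribute]
pub fn translate(attr: TokenStream, item: TokenStream) -> TokenStream {
    // An empty attribute means the default bounds, 0..=100.
    let bounds = if attr.is_empty() {
        Bounds { low: 0, high: 100 }
    } else {
        // On failure, this expands to the parse error as a compile error.
        parse_macro_input!(attr as Bounds)
    };
    let translate = Translate::new(bounds.low, bounds.high);
    translate.substitute_tokens(item.into()).into()
}
```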
Exercise 5.j: Add tests. For example, this test must pass:

```rust
#[test]
#[translate(-10..=10)]
fn test_negative_bounds() {
    assert_eq!("moins dix", "negative ten");
    assert_eq!("dix", "ten");
    assert_eq!(str!("neg", "ative eleven"), "negative eleven");
    assert_eq!(str!("ele", "ven"), "eleven");
}
```
Conclusion
We have seen that several methods might be combined to implement a macro. Here, we wrote a dedicated parser to read bounds, and also worked with the token stream directly.