Parsing input data
It is now time to extract frames from the input. To be able to process frames in parallel as much as possible later on, the input is parsed in two steps:
- Do the minimum work needed to extract a frame from the input (parsing phase).
- Later, once frames have been separated, do the rest of the work needed to decode each frame's compressed content (decoding phase).
Since zstd-compressed input has no table of contents indicating where each frame starts, locating a frame requires having parsed all the frames that precede it. This sequential parsing phase is therefore kept as light as possible.
Forward byte parser
The first parser you need is a forward byte parser, which delivers bytes from the input in order and never delivers the same byte twice. A slice of bytes is enough to implement it: the slice references the part of the input that has not been consumed yet, and consuming bytes amounts to replacing it with a shorter sub-slice.
✅ Create a `parsing` module (in file `parsing.rs`) and create a `ForwardByteParser` type in it.
⚠️ Do not forget to reference this module in `lib.rs` as a module with public visibility (`pub mod parsing;`).
Since a slice does not own its content, the type holding it must carry a lifetime parameter, for example:
```rust
pub struct ForwardByteParser<'a>(&'a [u8]);
```
Initializing a forward byte parser from an existing slice is straightforward:
```rust
impl<'a> ForwardByteParser<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self(data)
    }
}
```
Consuming a byte from the input
Consuming a byte from the input means returning the first byte, if there is one (the input may be empty), and keeping a slice with that first byte removed:
```rust
impl<'a> ForwardByteParser<'a> {
    pub fn u8(&mut self) -> Option<u8> {
        let (first, rest) = self.0.split_first()?;
        self.0 = rest;
        Some(*first)
    }
}
```
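The two snippets above can be assembled into a self-contained module to check their behavior. The helper `sum_bytes()` is hypothetical, only there to illustrate why an `Option` return value looks handy at first: draining the input becomes a plain `while let` loop that stops at the first `None`.

```rust
/// Parser over a borrowed byte slice; `'a` is the lifetime of the borrowed input
pub struct ForwardByteParser<'a>(&'a [u8]);

impl<'a> ForwardByteParser<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self(data)
    }

    pub fn u8(&mut self) -> Option<u8> {
        // `split_first()` yields None on an empty slice; `?` returns it early
        let (first, rest) = self.0.split_first()?;
        // Keep only the unread part, so `first` is never delivered twice
        self.0 = rest;
        Some(*first)
    }
}

/// Hypothetical helper: add up every byte of `data` using the parser
pub fn sum_bytes(data: &[u8]) -> u32 {
    let mut parser = ForwardByteParser::new(data);
    let mut total = 0;
    // Stops as soon as `u8()` returns None, i.e. when the input is exhausted
    while let Some(byte) = parser.u8() {
        total += u32::from(byte);
    }
    total
}
```

The lifetime `'a` ties the parser to the buffer it borrows: with `let data = vec![0xfd, 0x2f];`, a parser built with `ForwardByteParser::new(&data)` returns `Some(0xfd)`, then `Some(0x2f)`, then `None` on every further call, and `data` must stay alive as long as the parser is used.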
While returning an `Option<u8>` seems handy, it will not be very useful: when you need a byte from the input, failing to obtain it is an error that should be propagated to the caller.
✅ Create an `Error` type and a `Result` alias in the `parsing` module as described in the general principles. Modify your `u8()` method so that it returns an error if the input is empty.
This error variant (for example `NotEnoughBytes { requested: usize, available: usize }`) must be general enough to be reused in other methods. When displayed, the error returned by `u8()` on an empty input should print: `not enough bytes: 1 requested out of 0 available`.
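The general principles mentioned above are not reproduced here, so the following is only one possible sketch: a hand-written `Display` implementation (no external crate) producing the expected message, a `Result` alias fixing the error type to `parsing::Error`, and a `u8()` method that turns the empty-input case into `NotEnoughBytes`. If your general principles rely on the `thiserror` crate instead, an `#[error("not enough bytes: {requested} requested out of {available} available")]` attribute on the variant should produce the same text.

```rust
use std::fmt;

/// Errors that can occur while parsing (more variants will be added later)
#[derive(Debug)]
pub enum Error {
    /// The input ended before the requested number of bytes could be read
    NotEnoughBytes { requested: usize, available: usize },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::NotEnoughBytes { requested, available } => write!(
                f,
                "not enough bytes: {requested} requested out of {available} available"
            ),
        }
    }
}

impl std::error::Error for Error {}

/// Result type used throughout the `parsing` module
pub type Result<T> = std::result::Result<T, Error>;

pub struct ForwardByteParser<'a>(&'a [u8]);

impl<'a> ForwardByteParser<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self(data)
    }

    /// Consume and return one byte, or fail if the input is empty
    pub fn u8(&mut self) -> Result<u8> {
        let (first, rest) = self.0.split_first().ok_or(Error::NotEnoughBytes {
            requested: 1,
            available: 0,
        })?;
        self.0 = rest;
        Ok(*first)
    }
}
```

Implementing `std::error::Error` is not strictly required by the test below, but it lets this error be boxed or wrapped by higher-level error types later.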
Writing a unit test
Does the parser work as expected? Tests should be written along with the code.
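✅ Create a `tests/parsing.rs` file in your repository and include the following test. Files in `tests/` are compiled as separate crates that can only use the public API of your library, which is why the `parsing` module must be public.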
```rust
use net7212::parsing::{self, ForwardByteParser};

#[test]
fn forward_byte_parser_u8() {
    // Check that bytes are delivered in order
    let mut parser = ForwardByteParser::new(&[0x12, 0x23, 0x34]);
    assert_eq!(0x12, parser.u8().unwrap());
    assert_eq!(0x23, parser.u8().unwrap());
    assert_eq!(0x34, parser.u8().unwrap());
    assert!(matches!(
        parser.u8(),
        Err(parsing::Error::NotEnoughBytes {
            requested: 1,
            available: 0,
        })
    ));
}
```
Running the tests with `cargo test` should succeed.
💡 Notice that we use `matches!()` instead of `assert_eq!()` to compare the error, as the `parsing::Error` type does not implement `PartialEq` (and does not need to).
Adding more methods
Parsing a frame will require reading a 4-byte unsigned integer (`u32`) in little-endian format, or extracting an arbitrary number of bytes. You can add utility methods to your parser now, such as the following (more will be needed later):
```rust
impl<'a> ForwardByteParser<'a> {
    /// Return the number of bytes still unparsed
    pub fn len(&self) -> usize { todo!() }
    /// Check if the input is exhausted
    pub fn is_empty(&self) -> bool { todo!() }
    /// Extract `len` bytes as a slice
    pub fn slice(&mut self, len: usize) -> Result<&'a [u8]> { todo!() }
    /// Consume and return a u32 in little-endian format
    pub fn le_u32(&mut self) -> Result<u32> { todo!() }
}
```
Add tests for these methods as well, to make sure they behave as expected, including when more bytes are requested than are available. These tests will also serve as non-regression tests, letting you modify the bodies of these methods later without fear of breaking them.
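One possible implementation of these methods, with a few tests, is sketched below. It assumes that `Error` only has the `NotEnoughBytes` variant so far; the type is repeated in minimal form (without its `Display` implementation) so that the block compiles on its own. The tests are written as an in-module `#[cfg(test)]` block for self-containment; in the layout used here they would go in `tests/parsing.rs`, importing `net7212::parsing::{self, ForwardByteParser}` and matching on `parsing::Error`. The last test uses the zstd frame magic number `0xFD2FB528`, which appears in a compressed file as the bytes `28 B5 2F FD`.

```rust
/// Minimal copy of the parsing error so that this block compiles on its own
#[derive(Debug)]
pub enum Error {
    NotEnoughBytes { requested: usize, available: usize },
}

pub type Result<T> = std::result::Result<T, Error>;

pub struct ForwardByteParser<'a>(&'a [u8]);

impl<'a> ForwardByteParser<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        Self(data)
    }

    /// Return the number of bytes still unparsed
    pub fn len(&self) -> usize {
        self.0.len()
    }

    /// Check if the input is exhausted
    pub fn is_empty(&self) -> bool {
        self.0.is_empty()
    }

    /// Extract `len` bytes as a slice, leaving the input untouched on failure
    pub fn slice(&mut self, len: usize) -> Result<&'a [u8]> {
        if len > self.0.len() {
            return Err(Error::NotEnoughBytes {
                requested: len,
                available: self.0.len(),
            });
        }
        let (head, tail) = self.0.split_at(len);
        self.0 = tail;
        Ok(head)
    }

    /// Consume and return a u32 in little-endian format
    pub fn le_u32(&mut self) -> Result<u32> {
        let b = self.slice(4)?;
        // `slice(4)` succeeded, so `b` holds exactly 4 bytes
        Ok(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn len_and_is_empty() {
        let mut parser = ForwardByteParser::new(&[1, 2]);
        assert_eq!(parser.len(), 2);
        assert!(!parser.is_empty());
        parser.slice(2).unwrap();
        assert_eq!(parser.len(), 0);
        assert!(parser.is_empty());
    }

    #[test]
    fn slice_is_not_consumed_on_error() {
        let mut parser = ForwardByteParser::new(&[1, 2, 3]);
        assert!(matches!(
            parser.slice(4),
            Err(Error::NotEnoughBytes { requested: 4, available: 3 })
        ));
        assert_eq!(parser.slice(3).unwrap().to_vec(), vec![1u8, 2, 3]);
    }

    #[test]
    fn le_u32_reads_least_significant_byte_first() {
        let mut parser = ForwardByteParser::new(&[0x28, 0xb5, 0x2f, 0xfd, 0x00]);
        assert_eq!(parser.le_u32().unwrap(), 0xfd2f_b528);
        assert_eq!(parser.len(), 1);
        assert!(parser.le_u32().is_err());
    }
}
```

Checking the length before consuming anything, rather than consuming and then failing, is a design choice: it keeps the parser in a consistent state after an error, a property that the `slice_is_not_consumed_on_error` test pins down for later refactorings.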