Chapter 2: Tags and Character Classes
The simplest useful parser you can write is one with no special characters: it just matches a fixed string.
In nom, we call a simple collection of bytes a tag. Because
these are so common, there already exists a function called tag().
This function returns a parser for a given string.
Warning: nom has several different definitions of tag; make sure you use this one for the
moment!
extern crate nom;
pub use nom::bytes::complete::tag;
For example, code to parse the string "abc" could be represented as tag("abc").
If you have not programmed in a language where functions are values, the type signature of the tag function might be a surprise:
pub fn tag<T, Input, Error: ParseError<Input>>(
    tag: T
) -> impl Fn(Input) -> IResult<Input, Input, Error>
where
    Input: InputTake + Compare<T>,
    T: InputLength + Clone,
Or, for the case where Input and T are both &str, and simplifying slightly:
fn tag(tag: &str) -> (impl Fn(&str) -> IResult<&str, &str, Error>)
In other words, the function tag returns a function. The function it returns is a
parser, taking a &str and returning an IResult. Functions that create parsers and
return them are a common pattern in nom, so it is worth calling out.
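To make the "function that returns a function" idea concrete, here is a minimal sketch that uses only the standard library. The name my_tag and its String error type are hypothetical and chosen for illustration. This is not how nom implements tag; it only has the same shape: call it once to build a parser, then call that parser on some input.

```rust
/// Hypothetical stand-in for nom's `tag` (not nom's implementation).
/// Calling it builds a parser; the parser is the closure that is returned.
fn my_tag(expected: &str) -> impl Fn(&str) -> Result<(&str, &str), String> {
    // Own a copy so the returned closure does not borrow from the argument.
    let expected = expected.to_string();
    move |input: &str| match input.strip_prefix(expected.as_str()) {
        // Same ordering as nom: (remaining input, matched output).
        Some(rest) => Ok((rest, &input[..expected.len()])),
        None => Err(format!("expected {:?}", expected)),
    }
}

fn main() {
    // Step 1: create the parser. Nothing has been parsed yet.
    let parse_abc = my_tag("abc");

    // Step 2: run the parser on some input.
    assert_eq!(parse_abc("abcWorld"), Ok(("World", "abc")));
    assert!(parse_abc("defWorld").is_err());

    // Both steps on one line, mirroring the tag("abc")(input) form.
    assert_eq!(my_tag("abc")("abcWorld"), Ok(("World", "abc")));
}
```

Binding the parser to a name (parse_abc) and calling it later is equivalent to the one-line form my_tag("abc")(input).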
Below, we have implemented a function that uses tag.
extern crate nom;
pub use nom::bytes::complete::tag;
pub use nom::IResult;
use std::error::Error;

fn parse_input(input: &str) -> IResult<&str, &str> {
    // note that this is really creating a function, the parser for abc
    // vvvvv
    //        which is then called here, returning an IResult<&str, &str>
    //        vvvvv
    tag("abc")(input)
}

fn main() -> Result<(), Box<dyn Error>> {
    let (leftover_input, output) = parse_input("abcWorld")?;
    assert_eq!(leftover_input, "World");
    assert_eq!(output, "abc");

    assert!(parse_input("defWorld").is_err());
    Ok(())
}
If you'd like to, you can also match tags case-insensitively
with the tag_no_case function.
Character Classes
Tags are incredibly useful, but they are also incredibly restrictive. At the other end of nom's functionality are pre-written parsers that accept any of a group of characters, rather than only characters in a defined sequence.
Here is a selection of them:
alpha0: Recognizes zero or more lowercase and uppercase alphabetic characters: /[a-zA-Z]/. alpha1 does the same but returns at least one character
alphanumeric0: Recognizes zero or more numerical and alphabetic characters: /[0-9a-zA-Z]/. alphanumeric1 does the same but returns at least one character
digit0: Recognizes zero or more numerical characters: /[0-9]/. digit1 does the same but returns at least one character
multispace0: Recognizes zero or more spaces, tabs, carriage returns and line feeds. multispace1 does the same but returns at least one character
space0: Recognizes zero or more spaces and tabs. space1 does the same but returns at least one character
line_ending: Recognizes an end of line (both \n and \r\n)
newline: Matches a newline character \n
tab: Matches a tab character \t
We can use these in a parser function of our own:
extern crate nom;
pub use nom::IResult;
use std::error::Error;
pub use nom::character::complete::alpha0;

fn parser(input: &str) -> IResult<&str, &str> {
    alpha0(input)
}

fn main() -> Result<(), Box<dyn Error>> {
    let (remaining, letters) = parser("abc123")?;
    assert_eq!(remaining, "123");
    assert_eq!(letters, "abc");

    Ok(())
}
One important note is that, due to the type signature of these functions,
it is generally best to use them within a function that returns an IResult.
If you don't, some of the information around the type of the tag function must be
manually specified, which can lead to verbose code or confusing errors.