Chapter 2: Tags and Character Classes
The simplest useful parser you can write is one which has no special characters, it just matches a string.
In nom
, we call a simple collection of bytes a tag. Because
these are so common, there already exists a function called tag()
.
This function returns a parser for a given string.
Warning: nom
has multiple different definitions of tag
, make sure you use this one for the
moment!
extern crate nom;
pub use nom::bytes::complete::tag;
For example, code to parse the string "abc"
could be represented as tag("abc")
.
If you have not programmed in a language where functions are values, the type signature of them tag function might be a surprise:
pub fn tag<T, Input, Error: ParseError<Input>>(
tag: T
) -> impl Fn(Input) -> IResult<Input, Input, Error> where
Input: InputTake + Compare<T>,
T: InputLength + Clone,
Or, for the case where Input
and T
are both &str
, and simplifying slightly:
fn tag(tag: &str) -> (impl Fn(&str) -> IResult<&str, Error>)
In other words, this function tag
returns a function. The function it returns is a
parser, taking a &str
and returning an IResult
. Functions creating parsers and
returning them is a common pattern in Nom, so it is useful to call out.
Below, we have implemented a function that uses tag
.
extern crate nom; pub use nom::bytes::complete::tag; pub use nom::IResult; use std::error::Error; fn parse_input(input: &str) -> IResult<&str, &str> { // note that this is really creating a function, the parser for abc // vvvvv // which is then called here, returning an IResult<&str, &str> // vvvvv tag("abc")(input) } fn main() -> Result<(), Box<dyn Error>> { let (leftover_input, output) = parse_input("abcWorld")?; assert_eq!(leftover_input, "World"); assert_eq!(output, "abc"); assert!(parse_input("defWorld").is_err()); Ok(()) }
If you'd like to, you can also check tags without case-sensitivity
with the tag_no_case
function.
Character Classes
Tags are incredibly useful, but they are also incredibly restrictive. The other end of Nom's functionality is pre-written parsers that allow us to accept any of a group of characters, rather than just accepting characters in a defined sequence.
Here is a selection of them:
alpha0
: Recognizes zero or more lowercase and uppercase alphabetic characters:/[a-zA-Z]/
.alpha1
does the same but returns at least one characteralphanumeric0
: Recognizes zero or more numerical and alphabetic characters:/[0-9a-zA-Z]/
.alphanumeric1
does the same but returns at least one characterdigit0
: Recognizes zero or more numerical characters:/[0-9]/
.digit1
does the same but returns at least one charactermultispace0
: Recognizes zero or more spaces, tabs, carriage returns and line feeds.multispace1
does the same but returns at least one characterspace0
: Recognizes zero or more spaces and tabs.space1
does the same but returns at least one characterline_ending
: Recognizes an end of line (both\n
and\r\n
)newline
: Matches a newline character\n
tab
: Matches a tab character\t
We can use these in
extern crate nom; pub use nom::IResult; use std::error::Error; pub use nom::character::complete::alpha0; fn parser(input: &str) -> IResult<&str, &str> { alpha0(input) } fn main() -> Result<(), Box<dyn Error>> { let (remaining, letters) = parser("abc123")?; assert_eq!(remaining, "123"); assert_eq!(letters, "abc"); Ok(()) }
One important note is that, due to the type signature of these functions,
it is generally best to use them within a function that returns an IResult
.
If you don't, some of the information around the type of the tag
function must be
manually specified, which can lead to verbose code or confusing errors.