raesl.compile.scanner

Lexer for on-demand recognition of tokens.

Because the language allows unrestricted text in some places, lexing the whole input beforehand does not work in all cases. Instead, the scanner tries to match tokens on demand.

There is also overlap between tokens: a NONSPACE expression matches almost all other tokens as well, and a NAME expression matches all keywords. The rule applied here (by sorting the edges in the state machines) is that the more specific token wins over the more generic one. For example, if at some point both the OR_KW and the NONSPACE token may be used and the text is “or”, the OR_KW token is chosen.
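The specificity rule can be illustrated with a small stand-alone sketch. It is not the raesl implementation: the token table, the priority numbers and the `match_at` helper are hypothetical. It only shows the idea that, among the token types allowed at a position, candidates are tried from most to least specific and the first one that matches wins.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical token table; the real TOKENS table in raesl is not reproduced here.
TOKENS: Dict[str, re.Pattern] = {
    "OR_KW": re.compile(r"or(?![-_A-Za-z0-9])"),
    "NAME": re.compile(r"[A-Za-z][A-Za-z0-9]*(?:[-_][A-Za-z0-9]+)*"),
    "NONSPACE": re.compile(r"[^ \t\r\n]+"),
}

# Assumed priorities: a lower value means a more specific token, tried first.
PRIORITY: Dict[str, int] = {"OR_KW": 0, "NAME": 1, "NONSPACE": 2}


def match_at(text: str, pos: int, allowed: List[str]) -> Optional[Tuple[str, str]]:
    """Return (token type, matched text) of the most specific allowed token at pos."""
    for tok_type in sorted(allowed, key=PRIORITY.__getitem__):
        m = TOKENS[tok_type].match(text, pos)
        if m:
            return tok_type, m.group()
    return None


print(match_at("or b", 0, ["NONSPACE", "OR_KW"]))           # ('OR_KW', 'or')
print(match_at("order", 0, ["NONSPACE", "OR_KW", "NAME"]))  # ('NAME', 'order')
```

Because the keyword pattern refuses to match when a name character follows, “order” falls through to the more generic NAME token instead of being split into “or” + “der”.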

Module Contents

Classes

Token

Data of a matched token.

Lexer

On-demand scanner.

Functions

get_token_priority(→ int)

Priority of a token type. A higher value means a less specific token.

Attributes

SPACE_RE

_FIRST_NAME_PAT

_OTHER_NAME_PAT

_DOTTED_NAME_PAT

_KWORD_AVOID_PAT

_NAME_AVOID_PAT

comp

TOKENS

raesl.compile.scanner.SPACE_RE
raesl.compile.scanner._FIRST_NAME_PAT = '[A-Za-z][A-Za-z0-9]*(?:[-_][A-Za-z0-9]+)*'
raesl.compile.scanner._OTHER_NAME_PAT = '[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*'
raesl.compile.scanner._DOTTED_NAME_PAT
raesl.compile.scanner._KWORD_AVOID_PAT = '(?![-_A-Za-z0-9])'
raesl.compile.scanner._NAME_AVOID_PAT = '(?![A-Za-z0-9])'
raesl.compile.scanner.comp
raesl.compile.scanner.TOKENS: Dict[str, Pattern]
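The documented pattern strings can be exercised directly with Python's `re` module. The sketch below copies the four values listed above; how raesl actually combines them into the TOKENS table is not shown on this page, so the combinations `first_name`, `other_name` and the `or_kw` keyword regex are assumptions made for illustration.

```python
import re

# Pattern strings copied from the attribute values documented above.
_FIRST_NAME_PAT = '[A-Za-z][A-Za-z0-9]*(?:[-_][A-Za-z0-9]+)*'
_OTHER_NAME_PAT = '[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*'
_KWORD_AVOID_PAT = '(?![-_A-Za-z0-9])'
_NAME_AVOID_PAT = '(?![A-Za-z0-9])'

# Hypothetical combinations, for illustration only.
first_name = re.compile(_FIRST_NAME_PAT + _NAME_AVOID_PAT)
other_name = re.compile(_OTHER_NAME_PAT + _NAME_AVOID_PAT)
or_kw = re.compile('or' + _KWORD_AVOID_PAT)

print(bool(first_name.fullmatch('pump-1')))    # True
print(bool(first_name.fullmatch('1st-pump')))  # False: must start with a letter
print(bool(other_name.fullmatch('1st-pump')))  # True: may start with a digit
print(bool(first_name.fullmatch('a--b')))      # False: separators cannot repeat
print(bool(or_kw.match('or')))                 # True
print(bool(or_kw.match('or-gate')))            # False: keyword followed by '-'
```

The negative lookahead of _KWORD_AVOID_PAT also rejects `-` and `_`, so a keyword is never recognized as the prefix of a hyphenated or underscored name.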
raesl.compile.scanner.get_token_priority(tok_type: str) → int

Priority of a token type. A higher value means a less specific token.

Parameters:

tok_type – Name of the token type.

Returns:

Priority of the token.

class raesl.compile.scanner.Token(tok_type: str, tok_text: str, fname: str | None, offset: int, line_offset: int, line_num: int)

Data of a matched token.

Parameters:
  • tok_type – Type name of the token.

  • tok_text – Text of the token.

  • fname – Name of the file containing the text.

  • offset – Offset of the current position in the input text.

  • line_offset – Offset of the first character of the current line in the input text.

  • line_num – Line number of the current line.

get_location(offset: int = 0) → raesl.types.Location

Get this token’s Location.

__str__() → str

Return str(self).
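The Token fields are enough to derive a line and column for error messages. The following is a minimal stand-in, not the raesl class: the `linecol` method is hypothetical, and it assumes 1-based line numbers and 1-based columns computed as `offset - line_offset + 1`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Token:
    """Minimal stand-in for raesl.compile.scanner.Token (fields from the signature above)."""
    tok_type: str
    tok_text: str
    fname: Optional[str]
    offset: int       # offset of the token in the input text
    line_offset: int  # offset of the first character of the token's line
    line_num: int     # line number of that line

    def linecol(self) -> Tuple[int, int]:
        # Hypothetical helper; assumes 1-based lines and 1-based columns.
        return self.line_num, self.offset - self.line_offset + 1


text = "define type\n  pump is a real\n"
start = text.index("pump")
line_start = text.rindex("\n", 0, start) + 1
tok = Token("NAME", "pump", "example.esl", start, line_start, 2)
print(tok.linecol())  # (2, 3)
```

Storing `line_offset` alongside `offset` means the column is a single subtraction, with no need to rescan the text.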

class raesl.compile.scanner.Lexer(fname: str | None, text: str, offset: int, line_offset: int, line_num: int, doc_comments: List[Token])

On-demand scanner.

For debugging token matching, enable the PARSER_DEBUG flag near the top of the file. This also enables debug output in parser.parse_line, showing which line is being tried and which line match steppers are running.

Parameters:
  • fname – Name of the file containing the text, may be None.

  • text – Input text.

  • length – Length of the text.

  • offset – Offset of the current position in the text.

  • line_offset – Offset of the first character of the current line in the input text.

  • line_num – Line number of the current line.

  • doc_comments – Documentation comments found so far, shared between all scanners.

copy() → Lexer

Make a copy of self: a new scanner at the same position as self.

get_location() → raesl.types.Location

Get location information of the next token. Note that, since newlines are significant, this position may be at an unexpected place, for example at the end of a comment.

Returns:

Location information of the next token.

get_linecol() → Tuple[int, int]

Get line and column information of the next token. Note that, since newlines are significant, this position may be at an unexpected place, for example at the end of a comment.

Returns:

Line and column information of the next token.

find(tok_type: str) → Token | None

Try to find the requested token.

Parameters:

tok_type – Type name of the token.

Returns:

Found token, or None.
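Together, `copy()` and `find()` make speculative matching cheap: a parser can copy the scanner, try a token on the copy, and simply discard the copy if the attempt fails. The toy scanner below sketches that pattern. `MiniLexer`, its token table, and the choice to return plain strings instead of Token objects are all hypothetical, and it assumes that a successful find advances the scanner.

```python
import re
from typing import Dict, Optional

# Hypothetical token table for the sketch.
TOKENS: Dict[str, re.Pattern] = {
    "NAME": re.compile(r"[A-Za-z][A-Za-z0-9]*"),
    "IS_KW": re.compile(r"is(?![-_A-Za-z0-9])"),
    "SPACE": re.compile(r"[ \t]+"),
}


class MiniLexer:
    """Toy on-demand scanner: it only matches a token type when asked for it."""

    def __init__(self, text: str, offset: int = 0) -> None:
        self.text = text
        self.offset = offset

    def copy(self) -> "MiniLexer":
        # New scanner at the same position; advancing it leaves self untouched.
        return MiniLexer(self.text, self.offset)

    def find(self, tok_type: str) -> Optional[str]:
        # Skip spaces, then try only the requested token type at the current position.
        m = TOKENS["SPACE"].match(self.text, self.offset)
        if m:
            self.offset = m.end()
        m = TOKENS[tok_type].match(self.text, self.offset)
        if m is None:
            return None
        self.offset = m.end()  # consume the matched text
        return m.group()


lexer = MiniLexer("pump is a")
attempt = lexer.copy()
print(attempt.find("IS_KW"))  # None: "pump" is not the keyword
print(lexer.find("NAME"))     # pump
print(lexer.find("IS_KW"))    # is
```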

skip_white()

Skip white space, triple dots, newlines, and comments. Implements the following Graphviz diagram:

    digraph white {
        1 -> 1 [label="spc+"];
        1 -> 99 [label="eof"];
        1 -> 4 [label="#.*"];
        1 -> 5 [label="..."];
        1 -> 99 [label="other"];

        4 -> 99 [label="eof"];
        4 -> 99 [label="nl"];

        5 -> 5 [label="spc+"];
        5 -> 99 [label="eof"];
        5 -> 1 [label="nl"];
        5 -> 6 [label="#.*"];
        5 -> REV [label="..."];
        5 -> REV [label="other"];

        6 -> 99 [label="eof"];
        6 -> 1 [label="nl"];
    }

A jump to any state other than 99 consumes the recognized text. REV means the last “...” found was a false positive, and the position must be reverted to just before it.

Note that ‘\n’ is a significant token, so it is not skipped everywhere.
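The state machine above can be transcribed almost edge for edge into a loop over a `state` variable. The function below is a stand-alone sketch working on a string and an offset, not the raesl method: it assumes “spc” means spaces and tabs, and it does not model the doc-comment bookkeeping that the real scanner performs for comments.

```python
import re

SPC = re.compile(r"[ \t]+")       # assumption: "spc" means spaces and tabs
COMMENT = re.compile(r"#[^\n]*")  # "#.*" runs up to, not including, the newline


def skip_white(text: str, pos: int) -> int:
    """Return the position after skipping white space, comments and '...' continuations."""
    state = 1
    dots_pos = -1  # position just before the last '...' found
    while True:
        if state == 1:
            m = SPC.match(text, pos)
            if m:
                pos = m.end()       # 1 -> 1 on spc+
                continue
            if pos >= len(text):
                return pos          # 1 -> 99 on eof
            m = COMMENT.match(text, pos)
            if m:
                pos = m.end()       # 1 -> 4 on #.*
                state = 4
                continue
            if text.startswith("...", pos):
                dots_pos = pos      # 1 -> 5 on ...
                pos += 3
                state = 5
                continue
            return pos              # 1 -> 99 on other
        if state == 4:
            return pos              # 4 -> 99 on eof or nl: the newline is kept
        if state == 5:
            m = SPC.match(text, pos)
            if m:
                pos = m.end()       # 5 -> 5 on spc+
                continue
            if pos >= len(text):
                return pos          # 5 -> 99 on eof
            if text[pos] == "\n":
                pos += 1            # 5 -> 1 on nl: line continuation
                state = 1
                continue
            m = COMMENT.match(text, pos)
            if m:
                pos = m.end()       # 5 -> 6 on #.*
                state = 6
                continue
            return dots_pos         # 5 -> REV on '...' or other
        # State 6: after '... # comment', only eof or a newline can follow.
        if pos >= len(text):
            return pos              # 6 -> 99 on eof
        pos += 1                    # 6 -> 1 on nl
        state = 1
```

For example, `skip_white("a ...\n  b", 1)` continues past the line break and stops at `b`, whereas `skip_white("x ... y", 1)` treats the dots as a false positive and returns the position of the first dot.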

_save_doc_comment(hash_char_offset: int, nl_offset: int)

Inspect a comment and save it if it is a doc comment.

Parameters:
  • hash_char_offset – Offset of the ‘#’ character in the text.

  • nl_offset – Offset of the next ‘\n’ in the text; a negative value means no ‘\n’ was found.
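The two offsets bound the comment text, with a negative nl_offset meaning the comment runs to the end of the input. The sketch below shows only that slicing convention. This page does not document how a doc comment is recognized, so the `#<` marker, the `(offset, text)` storage format and the `save_doc_comment` function itself are hypothetical.

```python
from typing import List, Tuple

DOC_MARKER = "#<"  # hypothetical doc-comment marker; not documented on this page


def save_doc_comment(text: str, hash_char_offset: int, nl_offset: int,
                     doc_comments: List[Tuple[int, str]]) -> None:
    """Sketch: store the comment at hash_char_offset if it looks like a doc comment."""
    # A negative nl_offset means no newline was found: the comment runs to the end of the text.
    end = nl_offset if nl_offset >= 0 else len(text)
    comment = text[hash_char_offset:end]
    if comment.startswith(DOC_MARKER):
        doc_comments.append((hash_char_offset, comment[len(DOC_MARKER):].strip()))


found: List[Tuple[int, str]] = []
src = "x #< the pump\ny # plain"
save_doc_comment(src, 2, src.index("\n"), found)
save_doc_comment(src, src.index("# plain"), -1, found)
print(found)  # [(2, 'the pump')]
```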