:py:mod:`raesl.compile.scanner`
===============================

.. py:module:: raesl.compile.scanner

.. autoapi-nested-parse::

   Lexer for on-demand recognition of tokens.

   As the language has unrestrained text in its needs, lexing beforehand is
   not going to work in all cases. Instead, the scanner tries to match tokens
   on demand. There is also overlap in matching between tokens (a NONSPACE
   expression matches almost all other tokens as well, and a NAME expression
   matches all keywords). The rule applied here (by means of sorting edges in
   the state machines) is that specific wins over generic. For example, if at
   some point both the OR_KW and the NONSPACE token may be used, and the text
   is "or", the OR_KW token is chosen.

Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   raesl.compile.scanner.Token
   raesl.compile.scanner.Lexer

Functions
~~~~~~~~~

.. autoapisummary::

   raesl.compile.scanner.get_token_priority

Attributes
~~~~~~~~~~

.. autoapisummary::

   raesl.compile.scanner.SPACE_RE
   raesl.compile.scanner._FIRST_NAME_PAT
   raesl.compile.scanner._OTHER_NAME_PAT
   raesl.compile.scanner._DOTTED_NAME_PAT
   raesl.compile.scanner._KWORD_AVOID_PAT
   raesl.compile.scanner._NAME_AVOID_PAT
   raesl.compile.scanner.comp
   raesl.compile.scanner.TOKENS

.. py:data:: SPACE_RE

.. py:data:: _FIRST_NAME_PAT
   :value: '[A-Za-z][A-Za-z0-9]*(?:[-_][A-Za-z0-9]+)*'

.. py:data:: _OTHER_NAME_PAT
   :value: '[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*'

.. py:data:: _DOTTED_NAME_PAT

.. py:data:: _KWORD_AVOID_PAT
   :value: '(?![-_A-Za-z0-9])'

.. py:data:: _NAME_AVOID_PAT
   :value: '(?![A-Za-z0-9])'

.. py:data:: comp

.. py:data:: TOKENS
   :type: Dict[str, Pattern]

.. py:function:: get_token_priority(tok_type: str) -> int

   Priority of the tokens. A higher value is less specific.

   :param tok_type: Name of the token type.

   :returns: Priority of the token.

.. py:class:: Token(tok_type: str, tok_text: str, fname: Optional[str], offset: int, line_offset: int, line_num: int)

   Data of a matched token.

   :param tok_type: Type name of the token.
   :param tok_text: Text of the token.
   :param fname: Name of the file containing the text.
   :param offset: Offset of the current position in the input text.
   :param line_offset: Offset of the first character of the current line in the input text.
   :param line_num: Line number of the current line.

   .. py:method:: get_location(offset: int = 0) -> raesl.types.Location

      Get this token's Location.

   .. py:method:: __str__() -> str

      Return str(self).
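Because token patterns overlap, a single piece of text can match several token
types at once, and :py:func:`get_token_priority` is what breaks such ties.
Below is a minimal illustrative sketch, not code from the library itself; it
assumes that ``OR_KW`` and ``NONSPACE`` are registered token type names, as
stated in the module description above.

.. code-block:: python

   from raesl.compile import scanner

   # Both token types match the text "or".
   candidates = ["NONSPACE", "OR_KW"]

   # A higher priority value is less specific, so the most specific match
   # is the candidate with the *lowest* priority value.
   best = min(candidates, key=scanner.get_token_priority)
   print(best)  # expected: "OR_KW" -- the keyword wins over generic NONSPACE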
.. py:class:: Lexer(fname: Optional[str], text: str, offset: int, line_offset: int, line_num: int, doc_comments: List[Token])

   On-demand scanner.

   For debugging token matching, enable the PARSER_DEBUG flag near the top of
   the file. That also enables debug output in parser.parse_line, showing
   which line is being tried and which line-match steppers are running.

   Arguments:
     fname: Name of the file containing the text, may be None.
     text: Input text.
     length: Length of the text.
     offset: Offset of the current position in the text.
     line_offset: Offset of the first character of the current line in the
       input text.
     line_num: Line number of the current line.
     doc_comments: Documentation comments found so far, shared between all
       scanners.

   .. py:method:: copy() -> Lexer

      Make a copy of self: a new scanner at the same position as self.

   .. py:method:: get_location() -> raesl.types.Location

      Get location information of the next token. Note that such a position
      may be at an unexpected place since new-lines are significant. For
      example, it may be at the end of a comment.

      :returns: Location information of the next token.

   .. py:method:: get_linecol() -> Tuple[int, int]

      Get line and column information of the next token. Note that as
      new-lines are significant, such a position may be at an unexpected
      place, for example at the end of a comment.

      :returns: Line and column information of the next token.

   .. py:method:: find(tok_type: str) -> Optional[Token]

      Try to find the requested token.

      :param tok_type: Type name of the token.

      :returns: Found token, or None.

   .. py:method:: skip_white()

      Skip white space, triple dots, newlines, and comments.

      Implements the following Graphviz diagram::

         digraph white {
             1 -> 1   [label="spc+"]
             1 -> 99  [label="eof"]
             1 -> 4   [label="#.*"]
             1 -> 5   [label="..."]
             1 -> 99  [label="other"]
             4 -> 99  [label="eof"]
             4 -> 99  [label="nl"]
             5 -> 5   [label="spc+"]
             5 -> 99  [label="eof"]
             5 -> 1   [label="nl"]
             5 -> 6   [label="#.*"]
             5 -> REV [label="..."]
             5 -> REV [label="other"]
             6 -> 99  [label="eof"]
             6 -> 1   [label="nl"]
         }

      A jump to a non-99 location eats the recognized text; REV means the
      last found "..." was a false positive and must be reverted to just
      before that position. Note that a newline is a significant token, so
      it is not skipped everywhere.

   .. py:method:: _save_doc_comment(hash_char_offset: int, nl_offset: int)

      Inspect a comment and rescue doc-comments.

      Arguments:
        hash_char_offset: Offset of the '#' character in the text.
        nl_offset: Offset of the next newline in the text; a negative value
          means no newline was found.
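The sketch below shows one plausible way to drive the scanner by hand,
including backtracking via :py:meth:`Lexer.copy`. It is illustrative only:
the constructor arguments follow the signature documented above, the token
type names come from the module description, and the starting offsets and
line number are assumptions for scanning a fresh text from its beginning.

.. code-block:: python

   from raesl.compile.scanner import Lexer

   # Assumed starting state: beginning of the text, first line numbered 1.
   lex = Lexer(fname=None, text="or something", offset=0,
               line_offset=0, line_num=1, doc_comments=[])

   # copy() yields an independent scanner at the same position, so a failed
   # match attempt can be thrown away without disturbing the original.
   attempt = lex.copy()
   tok = attempt.find("OR_KW")
   if tok is not None:
       lex = attempt  # commit: the keyword matched, keep the advanced scanner
       print(tok.tok_type, repr(tok.tok_text))
   else:
       tok = lex.find("NONSPACE")  # fall back to the more generic token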