:py:mod:`raesl.compile.scanner`
===============================

.. py:module:: raesl.compile.scanner

.. autoapi-nested-parse::

   Lexer for on-demand recognition of tokens.

   As the language has unrestrained text in its needs, lexing beforehand is
   not going to work in all cases. Instead, the scanner tries to match tokens
   on demand. There is also overlap in matching between tokens (a NONSPACE
   expression matches almost all other tokens as well, and a NAME expression
   matches all keywords). The rule applied here (by means of sorting edges in
   the state machines) is that specific wins over generic. For example, if at
   some point both the OR_KW and the NONSPACE token may be used, and the text
   is "or", the OR_KW token is chosen.

Module Contents
---------------

Classes
~~~~~~~

.. autoapisummary::

   raesl.compile.scanner.Token
   raesl.compile.scanner.Lexer

Functions
~~~~~~~~~

.. autoapisummary::

   raesl.compile.scanner.get_token_priority

Attributes
~~~~~~~~~~

.. autoapisummary::

   raesl.compile.scanner.SPACE_RE
   raesl.compile.scanner._FIRST_NAME_PAT
   raesl.compile.scanner._OTHER_NAME_PAT
   raesl.compile.scanner._DOTTED_NAME_PAT
   raesl.compile.scanner._KWORD_AVOID_PAT
   raesl.compile.scanner._NAME_AVOID_PAT
   raesl.compile.scanner.comp
   raesl.compile.scanner.TOKENS

.. py:data:: SPACE_RE

.. py:data:: _FIRST_NAME_PAT
   :value: '[A-Za-z][A-Za-z0-9]*(?:[-_][A-Za-z0-9]+)*'

.. py:data:: _OTHER_NAME_PAT
   :value: '[A-Za-z0-9]+(?:[-_][A-Za-z0-9]+)*'

.. py:data:: _DOTTED_NAME_PAT

.. py:data:: _KWORD_AVOID_PAT
   :value: '(?![-_A-Za-z0-9])'

.. py:data:: _NAME_AVOID_PAT
   :value: '(?![A-Za-z0-9])'

.. py:data:: comp

.. py:data:: TOKENS
   :type: Dict[str, Pattern]

.. py:function:: get_token_priority(tok_type: str) -> int

   Priority of the tokens. A higher value is less specific.

   :param tok_type: Name of the token type.

   :returns: Priority of the token.

.. py:class:: Token(tok_type: str, tok_text: str, fname: Optional[str], offset: int, line_offset: int, line_num: int)

   Data of a matched token.

   :param tok_type: Type name of the token.
   :param tok_text: Text of the token.
   :param fname: Name of the file containing the text.
   :param offset: Offset of the current position in the input text.
   :param line_offset: Offset of the first character of the current line in the input text.
   :param line_num: Line number of the current line.

   .. py:method:: get_location(offset: int = 0) -> raesl.types.Location

      Get this token's Location.

   .. py:method:: __str__() -> str

      Return str(self).
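Because token patterns overlap, a single piece of text can match several token
types at once, and :py:func:`get_token_priority` is what breaks such ties.
Below is a minimal illustrative sketch, not code from the library itself; it
assumes that ``OR_KW`` and ``NONSPACE`` are registered token type names, as
stated in the module description above.

.. code-block:: python

   from raesl.compile import scanner

   # Both token types match the text "or".
   candidates = ["NONSPACE", "OR_KW"]

   # A higher priority value is less specific, so the most specific match
   # is the candidate with the *lowest* priority value.
   best = min(candidates, key=scanner.get_token_priority)
   print(best)  # expected: "OR_KW" -- the keyword wins over generic NONSPACE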
.. py:class:: Lexer(fname: Optional[str], text: str, offset: int, line_offset: int, line_num: int, doc_comments: List[Token])

   On-demand scanner.

   For debugging token matching, enable the PARSER_DEBUG flag near the top of
   the file. That also enables debug output in parser.parse_line, showing
   which line is being tried and which line-match steppers are running.

   Arguments:
     fname: Name of the file containing the text, may be None.
     text: Input text.
     length: Length of the text.
     offset: Offset of the current position in the text.
     line_offset: Offset of the first character of the current line in the
       input text.
     line_num: Line number of the current line.
     doc_comments: Documentation comments found so far, shared between all
       scanners.

   .. py:method:: copy() -> Lexer

      Make a copy of self: a new scanner at the same position as self.

   .. py:method:: get_location() -> raesl.types.Location

      Get location information of the next token. Note that such a position
      may be at an unexpected place since new-lines are significant. For
      example, it may be at the end of a comment.

      :returns: Location information of the next token.

   .. py:method:: get_linecol() -> Tuple[int, int]

      Get line and column information of the next token. Note that as
      new-lines are significant, such a position may be at an unexpected
      place, for example at the end of a comment.

      :returns: Line and column information of the next token.

   .. py:method:: find(tok_type: str) -> Optional[Token]

      Try to find the requested token.

      :param tok_type: Type name of the token.

      :returns: Found token, or None.

   .. py:method:: skip_white()

      Skip white space, triple dots, newlines, and comments.

      Implements the following Graphviz diagram::

         digraph white {
             1 -> 1   [label="spc+"]
             1 -> 99  [label="eof"]
             1 -> 4   [label="#.*"]
             1 -> 5   [label="..."]
             1 -> 99  [label="other"]
             4 -> 99  [label="eof"]
             4 -> 99  [label="nl"]
             5 -> 5   [label="spc+"]
             5 -> 99  [label="eof"]
             5 -> 1   [label="nl"]
             5 -> 6   [label="#.*"]
             5 -> REV [label="..."]
             5 -> REV [label="other"]
             6 -> 99  [label="eof"]
             6 -> 1   [label="nl"]
         }

      A jump to a non-99 location eats the recognized text; REV means the
      last found "..." was a false positive and must be reverted to just
      before that position. Note that a newline is a significant token, so
      it is not skipped everywhere.

   .. py:method:: _save_doc_comment(hash_char_offset: int, nl_offset: int)

      Inspect a comment and rescue doc-comments.

      Arguments:
        hash_char_offset: Offset of the '#' character in the text.
        nl_offset: Offset of the next newline in the text; a negative value
          means no newline was found.
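The sketch below shows one plausible way to drive the scanner by hand,
including backtracking via :py:meth:`Lexer.copy`. It is illustrative only:
the constructor arguments follow the signature documented above, the token
type names come from the module description, and the starting offsets and
line number are assumptions for scanning a fresh text from its beginning.

.. code-block:: python

   from raesl.compile.scanner import Lexer

   # Assumed starting state: beginning of the text, first line numbered 1.
   lex = Lexer(fname=None, text="or something", offset=0,
               line_offset=0, line_num=1, doc_comments=[])

   # copy() yields an independent scanner at the same position, so a failed
   # match attempt can be thrown away without disturbing the original.
   attempt = lex.copy()
   tok = attempt.find("OR_KW")
   if tok is not None:
       lex = attempt  # commit: the keyword matched, keep the advanced scanner
       print(tok.tok_type, repr(tok.tok_text))
   else:
       tok = lex.find("NONSPACE")  # fall back to the more generic token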