7.6. Scanner Scripting¶
7.6.1. Introduction¶
With the Scanner Scripting you can access a token stream of any source code file or even
a specific token at a certain position in the file. This can be useful to inspect
comments or look for code that has been removed via preprocessor macros (e. g.
#ifdef etc.) because this information is not present in the IR.
7.6.1.1. Accessing the Scanner¶
You have to import the bauhaus.scanner module when you want to use the Scanner (see Listing Importing the scanner part of bauhaus).
from bauhaus import scanner
7.6.1.2. Global functions of the Scanner¶
tokens(filename[, line])
Calling function tokens returns a generator function that yields all the tokens of filename. If you also specify line, only the tokens starting in that line are yielded.
token(filename, line, column)
Calling function token returns the token starting at the specified position or None if the position does not point to a token start.
7.6.1.3. Token¶
Objects of class Token represent source code tokens. They represent the source code verbatim, but are classified into categories like identifiers, literals, or keywords of the respective programming languages.
Accessing fields¶
The instance method fields returns a list containing the names of all fields of the Token.
The instance method field accesses a specific field of a Token and returns the field’s content.
Tokens offer the following fields: Pathname, Line, Column, End_Line, End_Column, Logical_Line, Logical_Column, Logical_End_Line, Logical_End_Column, Type, Name, and Value.
The field name is supplied as fieldname and the return type of the method depends on
the field type. For Line, Column, End_Line, End_Column, integer values are
returned indicating the location in the source code file. Similarly, the fields
Logical_Line, Logical_Column, Logical_End_Line, Logical_End_Column are
represented as integer values and indicate the logical source location of the token.
This is the location where the token is logically located, after any #line
directives have been processed. The field Type also is an integer value. For
Pathname, Name, and Value you get string values.
The instance methods prev and next return the previous and next tokens in the token stream respectively. If the end of the token stream of a file is reached by forward or backward iteration, then the exception StopIteration is raised.
The function __str__ returns the token value.
Token Types¶
The number which is stored in the field Type is actually a constant designating the type of the token, like identifier, integer literal, etc. The list of token types is subject to change, since for example new input languages might be added to the Axivion Suite . You can inspect the current list by using the Python online help:
from bauhaus import scanner
help(scanner)
In the section data all token type constants are given; their names are prefixed by “Sym_”.
Example Usage¶
The following example searches for occurrences of tokens of type Sym_Identifier having the value “x”, and prints out the corresponding lines.
from bauhaus import scanner
for token in scanner.tokens("file.c"):
if token.Type == scanner.Sym_Identifier and token.Value == "x":
print "identifier \"x\" at line %d" % token.Line