- Language of implementation: `C++20`, with `CMake` as the build system. The main CMake file is located at the same level as the `main.cpp` file.
- Input alphabet:
  - Letters: `'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'`
  - Digits: `'1', '2', '3', '4', '5', '6', '7', '8', '9', '0'`
  - Whitespace symbols: `' ', '\n', '\t', '\v', '\f', '\r'`
  - Operator symbols: `';', ':', '#', '{', '}', '[', ']', '(', ')', '*', '+', '-', '/', '&', '|', '!', '=', '<', '>', ',', '.', '^', '~', '"', '%'`
- Input format: C programming language [Link] with some limitations, such as:
  - No hexadecimal / octal constants like `0xadf` or `06615`.
  - Only two constant types: INT and DOUBLE.
  - No modulo operator and its derivatives.
  - No advanced compiler variables.
  - Strings are very basic.
  - No character chains.
  - Possibly some other limitations that I am not aware of at the moment.
- Tokens list:
  - `KEYWORD` - All C programming language keywords like `struct`, `int`, `volatile` [Link]. With two additions: the `false` and `true` boolean constant values.
  - `IDENTIFIER` - `[a-zA-Z][a-zA-Z0-9]*` that is not a keyword.
  - `LITERAL_INT` - `[0-9]+` stored internally as an integer.
  - `LITERAL_DOUBLE` - `[0-9]+\.[0-9]+` stored internally as a double.
  - `SPECIAL_CHARACTERS` - `[(){}\[\]#:;]` single-character token.
  - `STRINGS` - `\"[^"]*\"` see the "No character chains" limitation.
  - `OPERATOR` - [Link to all the operators implemented] no modulo-related operators.
  - `WHITE_SYMBOL` - `[ \n\t\v\f\r]+` used to make the output file have the same structure as the input file.
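The token patterns above can be checked quickly with `std::regex`. This is only an illustrative sketch: the project itself uses a DFA, not `std::regex`, and the names `TokenKind`, `is_keyword` and `classify` are hypothetical. The keyword table is a small subset, and `OPERATOR` is left out because the operator list is only available behind the link.

```cpp
#include <algorithm>
#include <array>
#include <regex>
#include <string>
#include <string_view>

// Hypothetical token kinds mirroring the list above (OPERATOR omitted).
enum class TokenKind {
    Keyword, Identifier, LiteralInt, LiteralDouble,
    SpecialCharacter, String, WhiteSymbol, Unknown
};

// Illustrative subset of the C keywords plus the two additions.
inline bool is_keyword(std::string_view lexeme) {
    static constexpr std::array<std::string_view, 5> keywords{
        "struct", "int", "volatile", "false", "true"};
    return std::find(keywords.begin(), keywords.end(), lexeme) != keywords.end();
}

// Classifies one complete lexeme; std::regex_match requires a full match.
inline TokenKind classify(const std::string& lexeme) {
    static const std::regex identifier{R"([a-zA-Z][a-zA-Z0-9]*)"};
    static const std::regex literal_int{R"([0-9]+)"};
    static const std::regex literal_double{R"([0-9]+\.[0-9]+)"};
    static const std::regex special{R"([(){}\[\]#:;])"};
    static const std::regex string_literal{R"("[^"]*")"};
    // Not a raw string: the regex receives the actual control characters.
    static const std::regex white{"[ \n\t\v\f\r]+"};

    // Keywords are checked first because they also match the identifier pattern.
    if (is_keyword(lexeme)) return TokenKind::Keyword;
    if (std::regex_match(lexeme, identifier)) return TokenKind::Identifier;
    if (std::regex_match(lexeme, literal_int)) return TokenKind::LiteralInt;
    if (std::regex_match(lexeme, literal_double)) return TokenKind::LiteralDouble;
    if (std::regex_match(lexeme, special)) return TokenKind::SpecialCharacter;
    if (std::regex_match(lexeme, string_literal)) return TokenKind::String;
    if (std::regex_match(lexeme, white)) return TokenKind::WhiteSymbol;
    return TokenKind::Unknown;
}
```

For example, `classify("3.14")` yields `TokenKind::LiteralDouble`, while `classify("0xadf")` yields `TokenKind::Unknown`, which matches the stated limitation on hexadecimal constants.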
- Transition diagram is located at: `./documentation/module/automata_table.xlsx`
  - All the states corresponding to a given token code are grouped together. Example: the `Operator` token is related to states 10 - 18.
  - `Blue` state is the initial state.
  - `Red` state is a non-accepting state. If the automaton finishes in this state, an error is thrown.
  - `Green` state is an accepting state. If the input stops at this state, the corresponding token code is returned and the input is formatted accordingly.
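The three state colours can be pictured with a very small DFA that only recognises `LITERAL_INT` and `LITERAL_DOUBLE`. This is a sketch under assumptions: the state numbers, `StateKind`, `TokenCode`, `next_state` and `run` are hypothetical and do not correspond to the real table in the spreadsheet. The initial state is also treated as non-accepting here, which the description above does not state explicitly.

```cpp
#include <array>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string_view>

enum class StateKind { Initial, NonAccepting, Accepting };  // Blue, Red, Green
enum class TokenCode { None, LiteralInt, LiteralDouble };

struct State {
    StateKind kind;
    TokenCode code;  // meaningful only for accepting states
};

// Hypothetical five-state automaton; state 4 is a dead (error) state.
constexpr std::array<State, 5> kStates{{
    {StateKind::Initial, TokenCode::None},            // 0: start
    {StateKind::Accepting, TokenCode::LiteralInt},    // 1: digits
    {StateKind::NonAccepting, TokenCode::None},       // 2: digits followed by '.'
    {StateKind::Accepting, TokenCode::LiteralDouble}, // 3: digits '.' digits
    {StateKind::NonAccepting, TokenCode::None},       // 4: dead state
}};
constexpr std::size_t kDead = 4;

inline std::size_t next_state(std::size_t state, char c) {
    const bool digit = std::isdigit(static_cast<unsigned char>(c)) != 0;
    switch (state) {
        case 0: return digit ? 1 : kDead;
        case 1: return digit ? 1 : (c == '.' ? 2 : kDead);
        case 2: return digit ? 3 : kDead;
        case 3: return digit ? 3 : kDead;
        default: return kDead;
    }
}

// Feeds the whole input to the automaton; throws if it stops outside a Green state.
inline TokenCode run(std::string_view input) {
    std::size_t state = 0;
    for (char c : input) state = next_state(state, c);
    if (kStates[state].kind != StateKind::Accepting) {
        throw std::runtime_error("automaton finished in a non-accepting state");
    }
    return kStates[state].code;
}
```

With this table, `run("123")` returns `TokenCode::LiteralInt`, and `run("1.")` throws because the automaton stops in the non-accepting state 2.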
- Compiler used: `gcc (MinGW-W64 x86_64-ucrt-posix-seh, built by Brecht Sanders) 12.2.0`
- An executable file is provided in the `./documentation/` directory if you do not feel able to compile this project on your own.
- How to compile on your own (run the commands in this order):
  - `./config/build_ios.sh` (requires setting the paths to the C and C++ compilers on your system in this file)
  - `make`
  - `make install`
- How to use the program: `Syntax_Highlighter.exe <path/to/input/file.c> <path/to/output/file.html> [;|\n]`
- Example output and input files are provided in the `./documentation/example_output/` and `./tests/input/c_programs/` directories, respectively.
- The only external library is `googletest`. It is used only to test the code and is not part of the program's own source code, as all the `CMakeLists.txt` files in the project show.
- If you have any problems or concerns regarding the project code, please contact the project owner or create a GitHub Issue and assign it to the owner.
- Inside the scanner object there is a DFA (see `./documentation/module/automata_table.xlsx`).
- Input from the provided input file is fed to the automaton. When an accepting state is reached, the current state supplies further instructions (a function pointer).
- The instructions describe how to handle the given data. For example, input that leads to a double token is stored in program memory as a double, not as a string.
- After a token is generated, its print method is run and the output is saved.
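The function-pointer step can be sketched as follows. Everything here is an assumption made for illustration: `Token`, `Action`, `make_int`, `make_double`, `make_identifier` and `emit` are hypothetical names, and the real print method produces the program's own output format rather than the plain text shown here.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <variant>

// Hypothetical token: the value keeps its native type instead of the raw text.
struct Token {
    std::string kind;
    std::variant<long long, double, std::string> value;

    // Stand-in for the token's print method; the output format is illustrative.
    void print(std::ostream& out) const {
        out << kind << '(';
        std::visit([&out](const auto& v) { out << v; }, value);
        out << ')';
    }
};

// An accepting state would carry one of these function pointers.
using Action = Token (*)(const std::string& lexeme);

inline Token make_int(const std::string& lexeme) {
    return {"LITERAL_INT", std::stoll(lexeme)};  // stored as an integer
}
inline Token make_double(const std::string& lexeme) {
    return {"LITERAL_DOUBLE", std::stod(lexeme)};  // stored as a double, not a string
}
inline Token make_identifier(const std::string& lexeme) {
    return {"IDENTIFIER", lexeme};
}

// Runs the action of the reached accepting state, then prints the token.
inline std::string emit(Action action, const std::string& lexeme) {
    assert(action != nullptr);
    const Token token = action(lexeme);
    std::ostringstream out;
    token.print(out);
    return out.str();
}
```

Calling `emit(&make_int, "42")` returns `LITERAL_INT(42)`. Keeping one plain function pointer per accepting state means the scanner loop does not need to know about individual token types.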
- Complexity: O(n), where n is the number of characters in the input file.
- Dominik Breksa, [email protected] - [Link to GitHub profile].