TGPD: The GLADOS Project Documentation
1. Other documentations
2. Introduction
The glados is a third year Epitech project where five people are asked to build a new programming language.
Who never had problems with a programming language ? Who never thought of building it’s own ? Who never imagined a language so litteral, that anyone could use it without technical knowledge ?
Introducing: the Franc C. The Franc C is a programming language with a litteral french syntax.
In this project, we built five different things.
- The Franc C language itself, defining it’s grammar and application
- A bytecode associated with the Franc C, used to run Franc C Programs
- A virtual machine reference, used to dictate how the bytecode should be executed.
- The Franc C Compiler, a tool used to compile Franc C Reference code into a Franc C Program
- The Franc C Virtual Machine, a tool used to run Franc C Program on any computer.
- The Larousse, a library of Franc C Reference code used to help Franc C programmers with some basic utilitarian functions
We chose to implement everything in the Haskell programming language, because it offers great parsing capabilities and perfect security. The functionnal paradigm is perfect for a compiler implementation as all cases must be treated, and it is easy to test with the HUnit testing library.
3. Franc C reference
3.1. Motives
- The Franc C was built to allow french people to program in their home language.
- The language is easily readable for a non-technical person.
- The focus is put on the syntax, not in the functionalities of the language.
3.2. Inspirations
We took inspiration for our syntax on a beautiful language, linotte ; and for our implementation on the C programming language.
Obviously, the french language is our biggest inspirator.
3.2.1. Linotte
The linotte programming language is a high-level, french-litteral-syntaxed programming language made for introducing programmation to children.
We saw it as a great inspiration for our language with the next features:
- Word delimitation at the end of functions, if and while conditionals.
- Use of some syntax like “read” instead of “execute” code
The linotte syntax was, however, not litteral enough to our liking.
3.2.2. C
The C programming language is a low-level language. We took inspiration from it in the implementation of our language. For example, the C programming language is from the imperative paradigm, and so is our language.
We took many things from this language:
- The imperative paradigm
- The int-defined booleans In the C programming language, all integers can be interpreted as booleans. A false is a value strictly equal to zero, and a true is every other values.
- The possibility of defining variables local to each function call
- The possibility of using recursive function calls
- The representation of a chararacter by it’s ascii value
- The function used to print a character “putchar”
- The existence of a standard library with some useful functions
- All of the operations like xor, addition, bitshift…
- The possibility for functions to return a value
- The infix notation of operations, with parenthesis
- The possibility to compute directly with any combination of operations in any part of the code where a value is required. For example, when passing a parameter, you could pass a computation, a variable value, or a mix of both
Many of these features are also present in other languages, but we took them from our knowledge of the C programming language.
3.2.3. French language
French is not a programming, but a spoken language.
We wanted a truly grammatically correct, complete, easily readable, and beautiful language.
Our choice was obviously french, as french syntax is way more precise than english’s.
We searched for dozen of grammar rules to inforce in our language.
3.3. Language Formal Syntax (BNF)
<header> ::= "Bonjour," <spaces> <main> ::= "En" <spaces> "sachant" <spaces> "que" <spaces> "les" <spaces> "variables" <spaces> "principales" <spaces> "sont" <spaces> ":" <spaces> [<variable_list>] "\;" <spaces> "pourrais-tu" <spaces> "s'il" <spaces> "te" <spaces> "plaît" <spaces> "commencer" <spaces> "la" <spaces> "lecture" <spaces> "ici" <spaces> "?" [<spaces>] <block> [<spaces>] "Merci d'avance," <spaces> "Cordialement," <spaces> <name> <name> <variable_definition> ::= "-" <spaces> <word> "," <spaces> "de" <spaces> "type" <spaces> <variable_type> "," <spaces> "valant" <spaces> <value> <variable_list> ::= <variable_definition> <spaces> [<variable_list>] <exclusif_or> ::= "ou" <spaces> "exclusif" | "xor" <arithmetic_operator> ::= "plus" | "moins" | ("divisé" | "multiplié") <spaces> "par" | "modulo" | "fois" <comparison_operator> ::= "est" | "est" <spaces> "égale" <spaces> "à" | "est" <spaces> "différent" <spaces> "de" | "est" <spaces> "inférieure" <spaces> "à" | "est" <spaces> "supérieure" <spaces> "à" | "est" <spaces> "inférieure" <spaces> "ou" <spaces> "égale" <spaces> "à" | "est" <spaces> "supérieure" <spaces> "ou" <spaces> "égale" <spaces> "à" <logical_operator> ::= "et" | "ou" <bitwise_operator> ::= "décalé" <spaces> "binairement" <spaces> "à" <spaces> "gauche" | "décalé" <spaces> "binairement" <spaces> "à" <spaces> "droite" | "et" <spaces> "binaire" | "ou" <spaces> "binaire" | <exclusif_or> <operators> ::= <bitwise_operator> | <logical_operator> | <comparison_operator> | <arithmetic_operator> <parenthesized_condition> ::= "(" <condition> ")" <condition> ::= <word> [<spaces> <operators> <spaces> (<condition> | <parenthesized_condition>)] | <value> [<spaces> <operators> <spaces> (<condition> | <parenthesized_condition>)] <parameter> ::= "-" <spaces> <word> "," <spaces> "de" <spaces> "type" <spaces> <variable_type> <parameter_list> ::= <parameter> <spaces> [<parameter_list>] <function_definition> ::= "J’aimerais" <spaces> "définir" <spaces> "le" <spaces> "bloc" <spaces> "répondant" <spaces> "au" <spaces> "nom" <spaces> "de" <spaces> <word> "," <spaces> "de" <spaces> "type" <spaces> "de" <spaces> "retour" <spaces> <function_type> "," <spaces> "nécessitant" <spaces> "comme" <spaces> "entrée" <spaces> ":" <spaces> [<parameter_list>] "\;" <spaces> "contenant" <spaces> "les" <spaces> "variables" <spaces> ":" <spaces> [<variable_list>] "\;" <spaces> "représenté" <spaces> "par" <spaces> "le" <spaces> "code" <spaces> ("suivant" | "ci-après" | "ci-dessous") "." [<spaces>] <block> <spaces> "Merci." <text> ::= <symbol> [<text>] <comment> ::= "*" <text> "*" <quoted_value> ::= "« " <text> " »" <block> ::= <statement_list> <statement_list> ::= <statement> [[<spaces>] <statement_list> [<spaces]] <statement> ::= <condition_statement> | <return_statement> | <loop_statement> | <display_statement> | <invoke_statement> | <assignment_statement> <return_statement> ::= "Enfin," <spaces> "renvoie" <spaces> <condition> "." | "Enfin," <spaces> "sors" <spaces> "du" <spaces> "bloc." <condition_definition> ::= "Si" <spaces> <condition> "," <spaces> "exécute" <spaces> "le" <spaces> "texte" <spaces> ":" <condition_statement> ::= <condition_definition> <space> <block> "Merci." | <condition_definition> <space> <block> "\;" <space> "sinon" <space> "exécute" <space> "le" <space> "texte" <space> ":" <block> "Merci." <loop_statement> ::= "Tant" <spaces> "que" <condition> <spaces> "exécute" <spaces> "le" <spaces> "code" <spaces> "ci-après" <spaces> ":" <spaces> <block> "Merci." <display_statement> ::= "Affiche" (<condition> | <quoted_value>) "." <invoke_params> ::= <condition> ["," <condition>] <invoke_statement> ::= "J'invoque" <spaces> "le" <spaces> "bloc" <spaces> <word> ["," <spaces> "et" <spaces> "j'assigne" <spaces> "la" <spaces> "valeur" <spaces> "de" <spaces> "retour" <spaces> "à" <spaces> "la" <spaces> "variable" <spaces> <word>] ["," <spaces> "avec les paramètres" <spaces> <invoke_params>] "." <assignment_statement> ::= "J'aimerais" <spaces> "que" <spaces> <word> <spaces> "prenne" <spaces> "la" <spaces> "valeur" <spaces> <condition> "." | <word> <spaces> "prend" <spaces> "la" <spaces> "valeur" <spaces> <condition> <variable_type> ::= "entier" <spaces> "naturel" | "booléen" <function_type> ::= "entier" <spaces> "naturel" | "booléen" | "nul" <name> ::= <word> "\n" <variables_list> ::= <variables> | <variables> <spaces> <variables_list> <variables> ::= <name> "," <spaces> "de" <spaces> "type" <spaces> <type> "," <spaces> "valant" <spaces> <value> <space> ::= "\t" | "\n" | "\r" | "\f" | "\v" <spaces> ::= <space> | <space> <spaces> <number> ::= 0 | 1 | 2 | 3 | ... <signed_number> ::= "-" <number> | "~" <number> | "non" <spaces> <number> <value> ::= <signed_number> | <number> | <bool> | <char_value> <word_char> ::= <printable> | <word_symbol> <printable> ::= "A" .. "Z" | "a" .. "z" | "0" .. "9" <word_symbol> ::= "!" | "\"" | "#" | "$" | "%" | "&" | "'" | "*" | "+" | "-" | "/" | ":" | "\;" | "<" | "=" | ">" | "?" | "@" | "[" | "\\" | "]" | "^" | "_" | "{" | "}" | "|" | "~" <symbol> ::= <word_symbol> | <space> | <printable> | "," | "\t" | "\n" | "(" | ")" <word> ::= <word_char> | <word_char> <word> <char_value> ::= "'" <symbol> "'" <bool> ::= "faux" | "fausse" | "vrai" | "vraie"
As the language is heavily based on the French language, it has shaped many of our syntax decisions :
- We made it mandatory to put spaces where needed in case of punctiation, as it is a french grammar rule. This works for (’:’, ’.’, ’«’, ’»’, ’,’).
- We decided to replace all operations symbols by their literal French
equivalent as we wanted this language to use as many french words as
possible in order to represent France. Therefore, operations such
as
+or*are being replaced byplusandmultiplié par. This forces to have at least one space before and after such operations as they are now depicted as words. - As it can become easily overwhelming to type out every single
operation using its own word representation, we created alternatives
to some operations to make them shorter. For example, the XOR
^operation can either be typed out asexclusif ouorxor. The multiply*operation can also either bemultiplié parorfois. - We forced every file of our language to begin with
Bonjour,(Helloin english) in order to fit with the French Culture as french citizens are widely known for their politeness. For the same reasons,whileandifstructures ends byMerci.(Thank you=in english) and the main function ends with =Merci d'avance, Cordialement, <Name>\n<FamilyName>(forThank you in advance, Best regards, <Name>\n<FamilyName>) as this function is where the program stops. - Words used to describe instructions such as
J'invoque(Invokein english) orAffiche(Displayin english) were what we thought was the best way to directly translate such common instructions in french. - As we wanted users to have a unique experience while using the language, instructions directly made by users are conjugated in the first person in order to simulate a conversation. Other instructions executed directly by the machine such as displaying are in imperative form in the second person.
- Every single instructions ends with a
.as every single sentence in french ends the same way. whileandifstructures do not exactly share the same syntax in order to differentiate them properly :iftakes a,at the end of the condition whereaswhiledoesn’t.- Comparison are in the feminine form (
supérieure,égale) since the French wordvariableis feminine.
3.3.1. Language logic
- Header
A file in
Franc Calways starts with: “Bonjour,” followed by an empty line.Politeness is very important, and we wanted
Franc Cto express a proper and classic language, not a crappy street language. We promise you will learn the “bien parlé” by usingFranc C - Function Declaration
In order to declare a function, you must specify:
- The name of the function
J'aimerais définir le bloc répondant au nom de <function_name>,
where is the name of the function
- The return type of the function
de type de retour entier naturel,
for the function to return an integer.
de type de retour booléen,
for the function to return a boolean.
de type de retour nul,
if the function does not return anything. (void function)
- The parameters of the function
A parameter, in Franc C, is defined the same way as other programming languages : Those are variables that must be passed when the function is invoked for it to use them.
necéssitant comme entrée : - <variable_name1>, de type <type> - <variable_name2>, de type <type> ... ;You can include as many parameter as you want as long as they do not share the same name.
<variable_name1>and<variable_name2>are the name of the variables which are both different.<type>is eitherbooléenif the parameter is abooleanorentier naturelif it is anint.
Parameters are obviously optional. If the function does not take any parameters, you must specify :
nécessitant comme entrée : ;
- Variables
Variables created and used inside a function, in Franc C, are defined in the function definition. Since Franc C is a very verbose language and uses a lot of words, creating variables at the top of the function makes it a very good way to clearly see them.
Such variables are being declared the following way :
contenant les variables : - <variable_name1>, de type <type>, valant <value> - <variable_name2>, de type <type>, valant <value> ;Just like parameters, variables must not share the same name. A parameter and a variable can not share the same name either.
<variable_name1>and<variable_name2>are the name of the variables which are both different.<type>is eitherbooléenif the variable is abooleanorentier naturelif it is anint.<value>is either a number or a boolean expressionvraivraiefauxorfausse. It can also be a character value like\n(which will be 10).
This forces every single created variable to have a default value.
If the function does not use any variable, you must specify :
contenant les variables : ;
- The name of the function
- Function Body
A function declaration ends with
représenté par le code <logical_connector>.
<logical_connector>is eitherci-après,suivant, orci-dessous.
What must follow this declaration is the body of the function, which is what will be executed when the function is invoked.
In Franc C, what can be executed inside a function is being refered as an
instruction.Those are the available instructions that you can execute inside a function :
- Assignement
You can, at any time, change the value of a variable the following way :
J'aimerais que <variable_name> prenne la valeur <computables>.
or
<variable_name> prends la valeur <computables>.
<variable_name>represents the name of an existing well-defined variable of the function.<computables>represents an expression composed of values with operators. (*Check the<condition>defintion on the BNF)
- If
Execute a block of code only if a specific condition was met. It can be defined the following way :
Si <computables>, exécute le texte : <more_instructions> Merci.<computables>represents an expression composed of values with operators. (*Check the<condition>defintion on the BNF)<more_instructions>represents all the instructions that will be executed ONLY if the condition was met.Merci.represents the end of the condition.
You can also execute a block of code if the condition wasn’t met the following way :
Si <computables>, exécute le texte : <more_instructions> ; sinon, exécute le texte : <more_instructions_2> Merci.<computables>represents an expression composed of values with operators. (*Check the<condition>defintion on the BNF)<more_instructions>represents all the instructions that will be executed ONLY if the condition was met.<more_instructions_2>represents all the instructions that will be executed ONLY if the condition was not met.Merci.represents the end of the condition.
- Invoke
Invoke another function inside a function, you can also store its result in a well-defined function variable.
J'invoque le bloc <function_name>.
<function_name>is the name of an existing defined function.
If the function that is being invoked takes parameters, they must be specified the following way :
J'invoque le bloc <function_name>, avec les paramètres <computables>, <computables>, <computables>.
<function_name>is the name of an existing defined function.<computables>represents an expression composed of values with operators. (*Check the<condition>defintion on the BNF)
You must not give to a function more parameters than it takes.
In order to store the result of a function, while still giving parameters :
J'invoque le bloc <function_name>, et j'assigne la valeur de retour à la variable <variable_name>, avec les paramètres <computables>, <computables>, <computables>.
<function_name>is the name of an existing defined function.<variable_name>represents the name of an existing well-defined variable of the function.<computables>represents an expression composed of values with operators. (*Check the<condition>defintion on the BNF)
However, if you do not want to give any parameter :
J'invoque le bloc <function_name>, et j'assigne la valeur de retour à la variable <variable_name>.
<function_name>is the name of an existing defined function.<variable_name>represents the name of an existing well-defined variable of the function
- Loop
The only type of loop that is currently implemented in
Franc Care thewhileloops.A loop allows to execute a block of code until a condition cannot be met anymore.
Tant que <computables> exécute le code ci-après : <more_instructions> Merci.<computables>represents an expression composed of values with operators. (*Check the<condition>defintion on the BNF)<more_instructions>represents all the instructions that will be executed ONLY if the condition was met.Merci.represents the end of the loop.
- Return
During the execution of the function, you can return at any time a value. Returning a value will cause the function execution to stop.
The only condition to return a value is that the function must not be of type “nul”.
Enfin, renvoie <computables>.
<computables>represents an expression composed of values with operators. (*Check the<condition>defintion on the BNF)
If the function does not return anything (
nulfunction), you can still stop the execution at any time.Enfin, sors du bloc.
This works only if the function is a
nulfunction. - Show statements
This instruction is used to display either a text, or the result of a computation in the screen.
To display a computation :
Affiche <computables>.
<computables>represents an expression composed of values with operators. (*Check the<condition>defintion on the BNF)
This will display the character corresponding to the result on the screen. (E.g):
Affiche 10.will display\n.However, if you want to display a text :
Affiche « Meow ».
This will display the text
Meowon the screen.
- End of Function
You must always end your Function Body with a
Merci.. - Special case of ’Main’ Function
The main function is where the program starts. Therefore, its defintion must be clearly different from the other functions.
Such functions are being defined the following way :
Bonjour, En sachant que les variables principales sont : <variables> ; pourrais-tu s'il te plaît commencer la lecture ici ? <instructions> Merci d'avance, Cordialement, <family_name> <name><variables>is an optional field, and represents the variables of the function. They are declared the same way as a normal function.<instructions>represents all the instructions that will be executed.Merci d'avance, Cordialement, <family_name> <name>.represents the end of the main function<family_name>represents the developer’s last name, and must always end with a newline character (’\n’).<name>represents the developer’s first name, and must also end with a newline character (’\n’).
A main function does not take any parameter. Therefore, no parameter definition must be done. The value being returned in the main function represent the exit status of the program. If no value was being returned, the program exists by default with a status code of
0.
4. Bytecode reference
The bytecode starts with a magic number, followed by an address describing the placement of the PC at the start of the program, followed by the program itself.
4.1. Formal syntax (BNF)
<bytecode> ::= <magic_number> <adress> <operation_list> <magic_number> ::= "E\fE\f" ; 0x45 0xc 0x45 0xc <adress> ::= <byte> <byte> <byte> <byte> <byte> <byte> <byte> <byte> <operation_list> ::= <operation> | <operation> <operation_list> <operation> ::= <not> | <lnot> | <neg> | <and> | <or> | <xor> | <shl> | <shr> | <land> | <lor> | <add> | <sub> | <mult> | <div> | <mod> | <gt> | <ge> | <lt> | <le> | <eq> | <diff> | <updz> | <push_stack> | <pop_stack> | <pop> | <call> | <ret> | <jmp> | <zjmp> | <aff> | <affs> | <push_value> | <push_label> <not> ::= "\003" <lnot> ::= "\004" <neg> ::= "\005" <and> ::= "\006" <or> ::= "\a" <land> ::= "\b" <lor> ::= "\t" <xor> ::= "\n" <shl> ::= "\v" <shr> ::= "\f" <add> ::= "\r" <sub> ::= "\016" <mult> ::= "\017" <div> ::= "\020" <mod> ::= "\021" <gt> ::= "\022" <ge> ::= "\023" <lt> ::= "\024" <le> ::= "\025" <eq> ::= "\026" <diff> ::= "\027" <updz> ::= "\030" <push_stack> ::= "\035" <arguments> <pop_stack> ::= "\036" <arguments> <pop> ::= "\037" <call> ::= """ <ret> ::= "#" <jmp> ::= "$" <zjmp> ::= "%" <aff> ::= "&" <affs> ::= "'" <arguments> <string> <push_value> ::= "Y" <arguments> <push_label> ::= "[" <arguments> <arguments> ::= <byte> <byte> <byte> <byte> <byte> <byte> <byte> <byte> <string> ::= <byte> | <byte> <string> <byte > = "\0" .. "ÿ"
4.2. Instructions
Since the virtual machine is stack-based, every instruction representing an operation will pop it’s operands, and push the result to the stack.
| mnemonic | description | bnf object |
|---|---|---|
| not | Not, invert the bits of the operand | not |
| lnot | Logical not, invert the boolean interpretation of the operand | lnot |
| neg | Negate, switch the sign of the operand | neg |
| and | And, performs a binary AND operation between two operands | and |
| or | Or, performs a binary OR operation between two operands | or |
| xor | Exclusive or, performs a binary XOR operation between two operands | xor |
| shl | Left bitshift, shifts the bits of the operand to the left, appending zeroes | shl |
| shr | Right bitshift, shifts the bits of the operand to the right, appending zeroes | shr |
| land | Logical and, boolean interpretation of the logical AND on two operands | land |
| lor | Logical or, boolean interpretation of the logical OR on two operands | lor |
| add | Addition, adds the value of two operands | add |
| sub | Substraction, subtract the value of two operands | sub |
| mult | Multiplication, multiplies the value of two operands | mult |
| div | Division, return the dividend of the euclidian division of operand 1 by operand 2 | div |
| mod | Modulo, return the rest of the euclidian division of operand 1 by operand 2 | mod |
| gt | Greater than, compares the two operands and return a result for boolean interpretation | gt |
| ge | Greater than or equals to, compares the two operands and return a result for boolean interpretation | ge |
| lt | Less than, compares the two operands and return a result for boolean interpretation | lt |
| le | Less than or equals to, compares the two operands and return a result for boolean interpretation | le |
| eq | Equals, compares the two operands and return a result for boolean interpretation | eq |
| neq | Different from, compares the two operands and return a result for boolean interpretation | diff |
| updz | Update zero flag, pops the value from the stack and set it as the new zero flag | updz |
| push | Push value, push a 64 bits value to the stack | pushvalue |
| push | Push relative address, push a 64 bits relative address to the stack, corresponding generally to a label | pushlabel |
| push | Push from stack pointer relative address, push to the stack the 8 bytes found at the stack pointer + the address given as operand | pushstack |
| pop | Pop empty, pop 8 bytes from the stack | pop |
| pop | Pop to stack pointer relative address, pop 8 bytes from the stack, and write them to the stack pointer + the address given as operand | popstack |
| call | Call, op 8 bytes from the stack, register the new stack pointer, push the PC and stack pointer to the callstack, and jump to the value of the 8 bytes popped from the stack | call |
| ret | Return, reset the stack to the stack pointer, pop the callstack and jump to the address from the callstack pop | ret |
| jmp | Jump, pop an address from the stack, and set the PC to it | jmp |
| zjmp | Zero jump, pop an address from the stack, and if the zero flag is set, set the PC to it | zjmp |
| aff | Show, pop 8 bytes from the stack, apply a modulo 256 to the result, and write a single ascii character corresponding to the result | aff |
| affs | Show string, the first operand is a 64 bits integer corresponding to the number of caracters, that we will call n. The next n bytes are the ascii values of the characters that will be wrote | affs |
5. Virtual Machine reference
5.1. Introduction
When defining an assembly language for the Franc C binary, we focused on the ease of development of the compiler. We just had one requirement that allowed an easy development of the virtual machine: our assembly shall be stack-based, drawing inspiration from the Java Virtual Machine (JVM).
5.1.1. Stack-Based vs Register-Based
| Aspect | Stack-Based (fcvm) | Register-Based (e.g., Lua VM) |
|---|---|---|
| Storage | Operand stack (LIFO) | Virtual registers |
| Instructions | Compact, implicit operands, prone to errors | Explicit operand addressing |
| Complexity | Simple to implement | More complex logic |
| Bytecode Size | Bigger overall (smaller instructions but way more quantiry) | Smaller overall |
We chose a stack-based approach for the following reasons:
- Simpler bytecode generation: The compiler does not have to deal with registers, knowing it does not have the capacity to remember data while processing multiple operations.
- Monotyped language: The fact that the Franc C is an integer only language allows for fixed-size parameter encoding. One of the main advantages of the register-based approach is to allow for type encoding.
5.2. Stack
The stack shoud be implemented as a contiguous array of bytes with the following characteristics:
- Data type: Array of bytes
- Element size: 8 bytes per element (64-bit signed integers)
- Alignment: All operations push/pop exactly 8 bytes
- Stack size: Always a multiple of 8 bytes
- Stack Management
To perform function calls, the VM must maintains the stack pointer. It points to the start of the current stack local context. it is modified by the instructions call and ret.
5.2.1. Initialization
By default, the stack should be initialized with 8 bytes representing the return value of the program. The default value is implementation-defined.
5.2.2. Basic Operations
| Operation | Description | Stack Effect |
|---|---|---|
| PUSH | Push 8 bytes onto the stack | SP unchanged, Counter += 1 |
| POP | Remove 8 bytes from the stack | SP unchanged, Counter -= 1 |
5.3. CallStack
The callstack manages function calls and returns, enabling the VM to: - Navigate between functions - Restore execution context after function returns - Maintain function-local state - Detect program termination
5.3.1. Memory Layout
The callstack can be implemented as a contiguous array of frame
records:
By default, the callstack can be initialized as an empty array.
Example implementation : - Data type: Array of tuples [(PC, SP)] -
PC (Program Counter): Return address (next instruction after CALL) -
SP (Stack Pointer): Stack state at call time
| Frame | PC | SP | Description |
|---|---|---|---|
| 0 | 0 | 0 | Entry point |
| 1 | 12 | 16 | First function call |
| 2 | 57 | 24 | Current function locale start |
5.3.2. Function call mechanism
When executing call <address>:
- Save current context: Push
(PC + 1, SP)onto the call stackPC + 1: Address of the instruction following CALLSP: Current stack pointer (preserves caller’s stack)
- Jump to function: Set
PC = <address>
When executing Ret:
- Pop frame: Retrieve
(saved_PC, saved_SP)from call stack - Restore context:
- Set
PC = saved_PC(return to caller) - Set
SP = saved_SP(restore stack state)
- Set
- Check termination: If call stack is empty after pop, terminate program
5.4. Zero flag
The Zero Flag is a single-bit register used for conditional branching
and control flow decisions.
It stores the result of comparison and test operations,
enabling the implementation of if and while.
- Data type:
unsigned char - Default value:
1
The Zero Flag follows this logic:
| Condition | Zero Flag Value |
|---|---|
| != 0 | 1 (true) |
| == 0 | 0 (false) |
5.5. End cases
The program terminates normally when: - A Ret instruction is executed
from the main function - The call stack becomes empty after the return
Version : 1.2
Last update : Novembre 2025
Author : Acacademie Franc C’aise
6. Franc C Compiler
The Franc C Compiler (FCC) is a toolchain that allows the compilation of one or multiple Franc C Reference code (.fr) files into a single bytecode executable by the FCVM program.
6.1. Parameters list and long descriptions
fcc FILESNAME [-o|--output FILENAME] [-n|--no-larousse] [-d|--debug]
6.1.1. FILESNAME
The name of the input files. They need to be valid .fr source files.
6.1.2. –output FILENAME
The name of the output bytecode file. Only considered without the –debug flag.
Without the presence of this flag, the default output file is a.fcp.
6.1.3. –no-larousse
By default, the FCC program compiles with the Larousse sources included. See the larousse section.
This flag allows compilation without any files from the Larousse library.
6.1.4. –debug
This flags allows a debug output. Instead of writing the bytecode to a file, a debug view of the assembly is wrote to the standard output (/dev/stdout on most GNU/linux machines).
6.2. Larousse
The Larousse is the standard library who accompanies the FCC program. If
the Larousse is to be included, the fcc program will search a folder
named larousse, and then list in all subdirectories of level-1 the
.fr source files. Those found source files will be added to the input
list, and compiled.
6.3. Functioning
The FCC program works with different steps.
6.3.1. Getting inputs
The inputs are retrived via the Larousse library and the user-provided files.
6.3.2. Lexing and parsing
Each file is lexed into a list of token, then parsed into an Abstract Syntax Tree (AST).
See the parsing reference for more information.
This step may fail. Each file being converted one after the other, if one fails, the remaining ones are not even considered and an error is returned.
6.3.3. Combination
Each AST representing each files given for compilation is then combined.
In this step, the presence of a single principal function is verified, and if no or multiple are found, an error is returned.
The result is a CAST (Combined Abstract Syntax Tree)
6.3.4. Compilation
The compilation step is where the CAST is converted to a Context datastructure, linked with an Assembly datastructure. It is not yet bytecode, just a structure representing the final output.
The context structure is important for the next step.
This step is really linear, there is no acknowledgement of the rest of the program when compiling a subpart.
6.3.5. Verifications
This step ensures some basic verifications.
It takes the context from the previous step, and analyses it to find any flaws.
For example, we check here if a function is invoked whilst it’s reference is non-existant.
6.3.6. Output
This step is the final one.
If the --debug flag is given, it will take the structured assembly and
write it down just as it is stored.
Elsewise, it is converted to bytecode in really two small steps. The first one is to convert labels to relative positions. The second one is to convert everything to it’s bytecode relation.
Finaly, the resulted bytecode is written to the designated output file.
6.3.7. Justifications
Regarding the lexing and parsing part, we decided so separate those to allow for easier syntaxic sugar implementation and code readability.
The combination part was added to allow for multiple input files to be
grouped together before the compilation process. It is easier to do it
like this than to try and combine multiple outputed bytecodes together,
like the ld program does when compiling a C program.
The compilation is linear, and therefore, we had to separate the verification process from this step, to allow for a really easy-to-read code. Having a linear compilation reduces the capacity of complex program logic, but since the Franc C is a really simple imperative language, it poses no problems to do it this way. We simply store some context in a datastructure for the verification step to look in a global point of view, instead of the linearly reduced point of view of the compilation.
Finaly, we send the assembly to out output step, that is in charge of converting it to the final bytecode, or to write it as a debug output.
6.4. Lexer & Parsing reference
We have chosen Haskell to implement the lexing and parsing of the language, as functional languages like Haskell make it particularly easy to express, abstract, and combine parsers. This allows the Franc C to be very verbose, which was definitely needed.
As we wanted to speed things up and release a first version as quickly
as possible, we decided to use a pre-existing parsing library
Megaparsec. The library offers fast parsing performance, is an
improved successor of Parsec, and provides high-quality error messages
that greatly simplify debugging and development.
The Lexer and Parser both refers to two different parts:
- The
Lexer, which usesMegaparsecreads the source code’s syntax, checks that it follows the correct grammar, then converts the text into a list of tokens that the parser can later use. For example, the codeSi x est égale à 0, exécute le textewould be transformed into the list as[If, Symbol "x", Operation Equals, Number 0, Then] - The
Parser, on the other hand, takes this list of tokens to transform it into an Abstract Syntax Tree (AST). It uses Haskell’s recursive principles to build that tree so that the compiler can later use it to generate the final executable library.
The Parser is only executed if the Lexer did not fail to lex the
targetted source code.
6.4.1. Safety measures
The safety of the lexing process is ensured by Megaparsec, which
provides robust error handling and syntax checking. During the lexing
part, Megaparsec ensures the input strictly follows the defined grammar.
If the syntax is incorrect, the lexer fails gracefully by producing a
clear and descriptive error message, preventing the program from
continuing with an invalid input. If the syntax is valid, the lexer
successfully returns a well-structured list of tokens that can safely be
passed to the parser.
Since the Lexer always produces a clean and valid list of tokens, the Parser does not include any additional safety checks on its input. However, it makes sure that no more than one main function was defined.
6.4.2. Error list
- Grammar Error : Triggered when Megaparsec fails to lex or parse the source code due to invalid syntax.
- Double Main Definition : Raised when more than one main function was defined in the program.
6.4.3. Known caveats
- Lexing Error Printing : We are aware that Megaparsec does not always display clear or accurate error messages in some cases. This issue is due to our current lexing implementation rather than Megaparsec itself, and should be fixed soon or later.
- Parsing Input : The parser currently does not validate its input and fully relies on the lexer to always produce correct and consistent tokens. While this works for now, it could lead to potential issues in the future.
6.5. Compilation reference
We are here referring to compilation as the compilation and the verification step. Thus, in this section, the compilation step will refer to both of these steps.
Given the simple nature of the logic of the Franc C language, we chose here to implement a simple compilation.
The compilation step do not use any external dependencies.
We simply store a context that is updated throughout the compilation process. This context contains :
- A list of variables, with their name, size, and position relative to the stack pointer.
- A label counter, used to suffix labels with a number to distinguish them.
- A list of function context, representing each definitions.
- A list of function context, representing each invoke.
The function context contains :
- The name of the function.
- A boolean value, used to describe if this function is either returning something or of the void type ; or describes if this functions is either assigning a value when invoking or not.
- A parameter count, that indicates the number of parameters in this function.
The fact that the entire language is only integer-based facilitates the concept of variable storage and parameter types. No types appear in the compilation process, except the void type for function definitions and invoke.
6.5.1. Function call parameters, return value and variables
When a function is called, it have parameters, a return value and some variables.
Here is a map of how they are stored in the stack:
┌─────────────────┐ │ Caller frame │ ├─────────────────┤ │ Parameters │ ← PUSHREL/POPREL access ├─────────────────┤ │ Return Value │ ← PUSHREL/POPREL access ├─────────────────┤ ← SP │ Local vars │ ← PUSHREL/POPREL access ├─────────────────┤ │ Temporaries │ ← Stack top └─────────────────┘
Local variables: Accessed via SP-relative offsets - PUSHREL 0: First
local - PUSHREL 8: Second local (8-byte offset) - POPREL N: Store to
local at offset N
Return value and paramers: Accessed via SP-relative offsets -
PUSHREL -8: Return value - PUSHREL -16: First parameter -
PUSHREL -24: Second parameter
6.5.2. Safety measures
During the compilation step, we check for user errors :
- No main function is given.
- Multiple main functions are given.
- A function is defined with the name of an already defined function.
- A function defines a parameter, or a variable, with the name of an already existing, local to the function, parameter or variable.
- A function that is not referenced is invoked.
- A function that takes n parameters is invoked with more or less than n parameters.
- A return value is given to a function of type void.
- An empty return is given to a function of type non-void.
- An assignement of the return of a void-typed function is made.
6.5.3. Intended undefined behaviours
- Integer over/underflows
We do not devine the result of an integer overflow, or an integer underflow. Our integers are defined solely as signed integers on 64 bits.
The minimum integer is defined as the maximum integer negated. In the C programming language, it is defined as being one less, but in our implementation of the negate sign “-”, we chose for parsing and lexing simplicity to use the negate solely as an operation. Take the example “-5”, we will not register it as the -5 number, but as the number 5 on whom we apply the negate operation. This explains the fact that our minimum integer is defined as the negated maximum integer.
The maximum integer is defined as 263 - 1, or 9’223’372’036’854’775’807.
- Known caveats
- Lack of static analysis
The FCC compilation step do not implement any static analysis.
These are the direct cause of some – sadly – not implemented behavious:
- We cannot check for direct division by zero. For example: (10 divisé par 0) will not result in an error at compile-time.
- We cannot produce pre-computations. For example: (10 plus 10) will not result in the value 20 at compile-time, but in the computation of 10 plus 10 at run-time. It is highly inefficient.
- Lack of static analysis
6.6. Readable assembly
When the --debug flag is passed, the output of the program is a
readable assembly written to the standard output.
6.6.1. Syntax of the readable assembly
There are two cases in the syntax :
- label, they are written in the left margin, like so:
label_name: - instructions, they are written tabulated with four spaces, like so:
mnemonic [args]
6.6.2. List of instructions
| Operation | Parameter | Representation |
|---|---|---|
| label | name | name: |
| binary not | not |
|
| binary and | and |
|
| binary or | or |
|
| logical not | lnot |
|
| logical and | land |
|
| logical or | lor |
|
| exclusive or | xor |
|
| left bitshift | shl |
|
| right bitshift | shr |
|
| addition | add |
|
| substraction | sub |
|
| multiplication | mult |
|
| division | div |
|
| modulo | mod |
|
| comparison greater than | gt |
|
| comparison greater or equal to | ge |
|
| comparison less than | lt |
|
| comparison less or equal to | le |
|
| comparison equals | eq |
|
| comparison different to | neq |
|
| update zero flag | updz |
|
| push value | value | push value |
| push relative address | address | push [address] |
| push label relative address | label-name | push %label-name |
| push from stack pointer relative address | address | push @address |
| pop to stack pointer relative address | address | pop @address |
| pop | pop |
|
| invoke a function | call |
|
| return | ret |
|
| jump to address | jmp |
|
| jump to address if zero flag | zjmp |
|
| print character | aff |
|
| print string | string | affs "string" |
7. Franc C Virtual Machine
7.1. Input Descriptions
7.1.1. Bytecode File Format
The fcvm executes Franc C bytecode files.
They have the following header structure:
┌─────────────────────────────────────┐ │ Magic Number (4 bytes) │ 0x45 0x0C 0x45 0x0C ├─────────────────────────────────────┤ │ Bytecode Instructions │ Variable length └─────────────────────────────────────┘
Magic Number: 0x45 0x0C 0x45 0x0C (4 bytes), as defined in the
Franc C bytecode reference. - Validates file
format at VM initialization - Invalid magic number triggers immediate
error
7.1.2. Command-Line Interface
./fcvm <bytecode_file>
Exit Codes:
| Code | Description |
|---|---|
0 |
Success (or custom return value = 0) |
1-255 |
Custom exit code (return value mod 256) |
84 |
Any error (runtime or load-time) |
7.2. Architecture Implementation
7.2.1. Memory Layout
The VM maintains three core structures:
- The operand stack
- The callstack
- Three specific registers
- Operand Stack
Stores temporary computation values, variables and function return values.
┌─────────────┬─────────────┬─────────────┐ │ Element 0 │ Element 1 │ Element 2 │ │ (8 bytes) │ (8 bytes) │ (8 bytes) │ └─────────────┴─────────────┴─────────────┘ ↑ Stack TopProperties: - Type:
[Word8](list of bytes) - Element size: 8 bytes (64-bit) - Initial state: One zero element[0x00 × 8]- Max size: 1,000,000 elements (8 MB) - Operations: PUSH (prepend), POP (drop first 8 bytes) - Call Stack
Manages function call frames.
┌─────────────────────────────────┐ │ Frame: (PC: 0, SP: 0) │ ← Entry ├─────────────────────────────────┤ │ Frame: (PC: 12, SP: 16) │ ├─────────────────────────────────┤ │ Frame: (PC: 57, SP: 24) │ ← Current └─────────────────────────────────┘
Properties: - Type:
[(PC, SP)](list of tuples) - Frame content: Return address (PC) + Stack pointer (SP) - Initial state: Empty[]- Max depth: 1,000,000 framesBehavior: -
CALL: Push(PC+1, SP), jump to function -RET: Pop frame, restore context, trim stack - Termination: Empty call stack afterRET→ program ends - Registers
Program Counter (PC) - Points to current instruction in bytecode - Initial: 8 (after 4-byte magic number) - Updates:
+1(normal),+9(with data), or jump targetStack Pointer (SP) - Marks base of current function’s stack frame - Used for local variable access via
PUSHREL=/=POPREL- Modified duringCALLandRETZero Flag (ZF) - Type:
Word8(0 or 1) - Initial: 1 (true) - Semantics:0if stack top = 0, else1- Set byZFLAG, tested byZJMP
7.2.2. Instruction Set Overview
The VM implements 40 opcodes across 7 categories:
7.2.3. Function Call Convention
Calling sequence:
1. Push function address 2. Execute CALL → Saves (PC+1, SP) to call stack → Jumps to function
Return sequence:
1. Execute RET → Restores PC and SP → Trims stack to SP position → If call stack empty: program terminates
Stack frame layout:
┌──────────────────┐ │ Caller frame │ ├──────────────────┤ ← SP │ Local function │ ← Stack top └──────────────────┘
7.3. Exceptions List
All errors exit with code 84 and write to stderr.
7.3.1. Load-Time Errors
Invalid Magic Number
Message: *** UNRECONIZED FILE FORMAT Trigger: First 4 bytes ≠ 0x45 0x0C 0x45 0x0C
File I/O Error
Message: Error with file Trigger: File not found, permission denied, or read error
7.3.2. Runtime Errors
Arithmetic Errors
Message: *** 0 CAN'T BE USE in this operation Trigger: DIV or MOD with divisor = 0
Stack & CallStack Overflow
Message: *** STACK OVERFLOW Trigger: Operand stack ≥ 1M elements OR call stack ≥ 1M frames Cause: Excessive PUSH without POP, or infinite recursion
Stack Underflow
Message: *** STACK ERROR Trigger: Operation requires elements when stack_size ≤ 1 Affects: All binary/unary ops, CALL, JMP, ZJMP, AFF
7.4. Known Caveats
7.4.1. Undefined Behaviors
Integer Overflow - 64-bit signed integers wrap silently - Example:
INT64_MAX + 1 → UNDEFINED - No error raised
Invalid Opcodes - Treated as NOP (PC += 1) - Corrupted bytecode may execute partially - No detection mechanism
7.4.2. Performance Limitations
List-Based Stack - Push/Pop: O(1) ✓ - SP-relative access: O(n) ✗ - Impact: Slow for deep stacks (>1000 elements)
String Operations - AFFS creates intermediate structures - Linear in
string length (acceptable)
7.4.3. Portability Issues
Platform Word Size - PC and SP use Haskell Int type - 32-bit
systems: Max ~2GB bytecode - 64-bit systems: Max ~8GB bytecode - Not a
practical concern for typical use
Document Version: 1.1
Last Updated: November 2025
Author : Acacademie Franc C’aise
8. Larousse
8.1. Introduction (motives)
As programming using Franc C can easily become overwhelming, we decided
to provide a set of already implemented functions, in a library called
Larousse, as a way to facilitate some tasks.
Larousse also serves as a way to showcase our language’s capabilities,
and demonstrate its verbose syntax, and acts as a test suite for the
Académie Franc C'Aise, which runs functional tests on it.
8.2. Functions
| Functions | Parameters | Description | Benchmark* |
|---|---|---|---|
additionner |
nombre, second_nombre |
Returns nombre + second_nombre. |
0.086s |
afficher_nombre |
nombre |
Displays in the standard output the number nombre. |
0.067s |
carré_de |
nombre |
Returns nombre * nombre. |
0.074s |
diviser |
nombre, second_nombre |
Returns nombre / second_nombre. |
0.060s |
douze |
nombre |
Returns nombre / 12 if nombre is multiple of 12, otherwise -1. |
0.069s |
est_en_majuscule |
nombre |
Returns 1 if nombre is an uppercase letter, otherwise 0. |
0.063s |
est_en_minuscule |
nombre |
Returns 1 if nombre is a lowercase letter, otherwise 0. |
0.060s |
est_négatif |
nombre |
Returns 1 if nombre is negative, otherwise 0. |
0.070s |
est_premier |
nombre |
Returns 1 if nombre is a prime number, otherwise 0. |
0.051s |
fizzbuzz |
nombre |
Displays « Fizz » if nombre is multiple of 3, « Buzz » if multiple of 5, and « FizzBuzz » if multiple of both, otherwise displays nombre. |
0.056s |
imprimer_peigne |
- | Displays in ascending order all the smallest numbers composed of three digits. | 0.220s |
imprimer_peigne_2 |
- | Displays in ascending order all the smallest numbers composed of four digits. | 8.481s |
imprimer_peigne_nombre |
nombre |
Displays in ascending order all the smallest numbers composed of nombre digits. |
Depends |
max |
nombre, second_nombre |
Returns nombre if nombre is superior than second_nombre, otherwise second_nombre. |
0.060s |
min |
nombre, second_nombre |
Returns second_nombre if nombre is superior than second_nombre, otherwise nombre. |
0.092s |
multiplier |
nombre, second_nombre |
Returns nombre * second_nombre. |
0.065s |
prochain_premier |
nombre |
Returns the first prime number greater than nombre. |
0.072s |
précedent |
nombre |
Returns nombre - 1. |
0.069s |
racine_carré_de |
nombre |
Returns the integer square root of nombre. |
0.070s |
soustraire |
nombre, second_nombre |
Returns nombre - second_nombre. |
0.090s |
suivant |
nombre |
Returns nombre + 1. |
0.065s |
(*) Benchmark specs:
CPU : Intel i9 14900K GPU : NVIDIA RTX 4070 TI SUPER RAM: 32 GB
8.3. Académie Franc C’Aise
Académie Franc C'Aise is the functional tester made in Python for
the Franc C, and uses Larousse standard library.
You will find in every single Larousse source code function: - The
same function written in C. - A main function in Franc C which calls
the function in Franc C with various parameter, in the test
folder. - A main function in C which calls the function in C with
the same parameters, in the test folder.
Académie Franc C'Aise compiles both the C and Franc C version
using gcc and fcvm. It then runs both versions, and check whether
their inputs are the same as a way to test the language. It also
displays how much time each step took (Compilation, Execution) for both
languages then finally shows the time difference between those.
Franc C is in most cases faster to compile than C. However, C is
almost always faster in execution time than Franc C.
At the end of the program, Académie Franc C'Aise displays how many
tests passed out of how many were run. It then exits with a failure if
not all tests passed.