TGPD: The GLADOS Project Documentation

1. Other documentations

2. Introduction

The glados is a third year Epitech project where five people are asked to build a new programming language.

Who never had problems with a programming language ? Who never thought of building it’s own ? Who never imagined a language so litteral, that anyone could use it without technical knowledge ?

Introducing: the Franc C. The Franc C is a programming language with a litteral french syntax.

In this project, we built five different things.

The Franc C language itself, defining it’s grammar and application
A bytecode associated with the Franc C, used to run Franc C Programs
A virtual machine reference, used to dictate how the bytecode should be executed.
The Franc C Compiler, a tool used to compile Franc C Reference code into a Franc C Program
The Franc C Virtual Machine, a tool used to run Franc C Program on any computer.
The Larousse, a library of Franc C Reference code used to help Franc C programmers with some basic utilitarian functions

We chose to implement everything in the Haskell programming language, because it offers great parsing capabilities and perfect security. The functionnal paradigm is perfect for a compiler implementation as all cases must be treated, and it is easy to test with the HUnit testing library.

3. Franc C reference

3.1. Motives

The Franc C was built to allow french people to program in their home language.
The language is easily readable for a non-technical person.
The focus is put on the syntax, not in the functionalities of the language.

3.2. Inspirations

We took inspiration for our syntax on a beautiful language, linotte ; and for our implementation on the C programming language.

Obviously, the french language is our biggest inspirator.

3.2.1. Linotte

The linotte programming language is a high-level, french-litteral-syntaxed programming language made for introducing programmation to children.

We saw it as a great inspiration for our language with the next features:

Word delimitation at the end of functions, if and while conditionals.
Use of some syntax like “read” instead of “execute” code

The linotte syntax was, however, not litteral enough to our liking.

3.2.2. C

The C programming language is a low-level language. We took inspiration from it in the implementation of our language. For example, the C programming language is from the imperative paradigm, and so is our language.

We took many things from this language:

The imperative paradigm
The int-defined booleans In the C programming language, all integers can be interpreted as booleans. A false is a value strictly equal to zero, and a true is every other values.
The possibility of defining variables local to each function call
The possibility of using recursive function calls
The representation of a chararacter by it’s ascii value
The function used to print a character “putchar”
The existence of a standard library with some useful functions
All of the operations like xor, addition, bitshift…
The possibility for functions to return a value
The infix notation of operations, with parenthesis
The possibility to compute directly with any combination of operations in any part of the code where a value is required. For example, when passing a parameter, you could pass a computation, a variable value, or a mix of both

Many of these features are also present in other languages, but we took them from our knowledge of the C programming language.

3.2.3. French language

French is not a programming, but a spoken language.

We wanted a truly grammatically correct, complete, easily readable, and beautiful language.

Our choice was obviously french, as french syntax is way more precise than english’s.

We searched for dozen of grammar rules to inforce in our language.

3.3. Language Formal Syntax (BNF)

<header> ::= "Bonjour," <spaces>

<main> ::= "En" <spaces> "sachant" <spaces> "que" <spaces> "les" <spaces> "variables" <spaces> "principales" <spaces> "sont" <spaces> ":" <spaces> [<variable_list>]
 "\;" <spaces> "pourrais-tu" <spaces> "s'il" <spaces> "te" <spaces> "plaît" <spaces> "commencer" <spaces> "la" <spaces> "lecture" <spaces> "ici" <spaces> "?" [<spaces>] <block> [<spaces>] "Merci d'avance," <spaces> "Cordialement," <spaces> <name> <name>
 
<variable_definition> ::= "-" <spaces> <word> "," <spaces> "de" <spaces> "type" <spaces> <variable_type> "," <spaces> "valant" <spaces> <value> 

<variable_list> ::= <variable_definition> <spaces> [<variable_list>]

<exclusif_or> ::= "ou" <spaces> "exclusif" | "xor"

<arithmetic_operator> ::= "plus" | "moins"
                        | ("divisé" | "multiplié") <spaces> "par"
                        | "modulo"
                        | "fois"
                     
<comparison_operator> ::= "est"
                        | "est" <spaces> "égale" <spaces> "à"
                        | "est" <spaces> "différent" <spaces> "de"
                        | "est" <spaces> "inférieure" <spaces> "à"
                        | "est" <spaces> "supérieure" <spaces> "à"
                        | "est" <spaces> "inférieure" <spaces> "ou" <spaces> "égale" <spaces> "à"
                        | "est" <spaces> "supérieure" <spaces> "ou" <spaces> "égale" <spaces> "à"
                     
<logical_operator> ::= "et" | "ou"

<bitwise_operator> ::= "décalé" <spaces> "binairement" <spaces> "à" <spaces> "gauche"
    | "décalé" <spaces> "binairement" <spaces> "à" <spaces> "droite"
    | "et" <spaces> "binaire"
    | "ou" <spaces> "binaire"
    | <exclusif_or>
 
<operators> ::= <bitwise_operator> | <logical_operator> | <comparison_operator> | <arithmetic_operator>

<parenthesized_condition> ::= "(" <condition> ")"

<condition> ::= <word> [<spaces> <operators> <spaces> (<condition> | <parenthesized_condition>)]
              | <value> [<spaces> <operators> <spaces> (<condition> | <parenthesized_condition>)]
             
<parameter> ::= "-" <spaces> <word> "," <spaces> "de" <spaces> "type" <spaces> <variable_type>

<parameter_list> ::= <parameter> <spaces> [<parameter_list>]

<function_definition> ::= "J’aimerais" <spaces> "définir" <spaces> "le" <spaces> "bloc" <spaces> "répondant" <spaces> "au" <spaces> "nom" <spaces> "de" <spaces> <word> "," <spaces> "de" <spaces> "type" <spaces> "de" <spaces> "retour" <spaces> <function_type> "," <spaces> "nécessitant" <spaces> "comme" <spaces> "entrée" <spaces> ":" <spaces> [<parameter_list>] "\;" <spaces> "contenant" <spaces> "les" <spaces> "variables" <spaces> ":" <spaces> [<variable_list>] "\;" <spaces> "représenté" <spaces> "par" <spaces> "le" <spaces> "code" <spaces> ("suivant" | "ci-après" | "ci-dessous") "." [<spaces>] <block> <spaces> "Merci."

<text> ::= <symbol> [<text>]

<comment> ::= "*" <text> "*"

<quoted_value> ::= "« " <text> " »"

<block> ::= <statement_list>

<statement_list> ::= <statement> [[<spaces>] <statement_list> [<spaces]]

<statement> ::= <condition_statement> | <return_statement> | <loop_statement> | <display_statement> | <invoke_statement> | <assignment_statement>

<return_statement> ::= "Enfin," <spaces> "renvoie" <spaces> <condition> "." | "Enfin," <spaces> "sors" <spaces> "du" <spaces> "bloc."

<condition_definition> ::= "Si" <spaces> <condition> "," <spaces> "exécute" <spaces> "le" <spaces> "texte" <spaces> ":"

<condition_statement> ::= <condition_definition> <space> <block> "Merci."
    | <condition_definition> <space> <block> "\;" <space> "sinon" <space> "exécute" <space> "le" <space> "texte" <space> ":" <block> "Merci."
 
<loop_statement> ::= "Tant" <spaces> "que" <condition> <spaces> "exécute" <spaces> "le" <spaces> "code" <spaces> "ci-après" <spaces> ":" <spaces> <block> "Merci."

<display_statement> ::= "Affiche" (<condition> | <quoted_value>) "."

<invoke_params> ::= <condition> ["," <condition>]

<invoke_statement> ::= "J'invoque" <spaces> "le" <spaces> "bloc" <spaces> <word>
    ["," <spaces> "et" <spaces> "j'assigne" <spaces> "la" <spaces> "valeur" <spaces> "de" <spaces> "retour" <spaces> "à" <spaces> "la" <spaces> "variable" <spaces> <word>]
    ["," <spaces> "avec les paramètres" <spaces> <invoke_params>] "."
 
<assignment_statement> ::= "J'aimerais" <spaces> "que" <spaces> <word> <spaces> "prenne" <spaces> "la" <spaces> "valeur" <spaces> <condition> "." |
    <word> <spaces> "prend" <spaces> "la" <spaces> "valeur" <spaces> <condition>
 
<variable_type> ::= "entier" <spaces> "naturel" | "booléen"

<function_type> ::= "entier" <spaces> "naturel" | "booléen" | "nul"

<name> ::= <word> "\n"

<variables_list> ::= <variables> | <variables> <spaces> <variables_list>

<variables> ::= <name> "," <spaces> "de" <spaces> "type" <spaces> <type> "," <spaces> "valant" <spaces> <value>

<space> ::= "\t" | "\n" | "\r" | "\f" | "\v"

<spaces> ::= <space> | <space> <spaces>

<number> ::= 0 | 1 | 2 | 3 | ...

<signed_number> ::= "-" <number> | "~" <number> | "non" <spaces> <number>

<value> ::= <signed_number> | <number> | <bool> | <char_value>

<word_char> ::= <printable> | <word_symbol>

<printable> ::= "A" .. "Z"
              | "a" .. "z"
              | "0" .. "9"
             
<word_symbol> ::= "!" | "\"" | "#" | "$" | "%" | "&" | "'" | "*" | "+" | "-" | "/" 
           | ":" | "\;" | "<" | "=" | ">" | "?" | "@"
           | "[" | "\\" | "]" | "^" | "_" | "{" | "}" | "|" | "~"
         
<symbol> ::= <word_symbol> | <space> | <printable> | "," | "\t" | "\n" | "(" | ")"

<word> ::= <word_char> | <word_char> <word>

<char_value> ::= "'" <symbol> "'"

<bool> ::= "faux" | "fausse" | "vrai" | "vraie"

As the language is heavily based on the French language, it has shaped many of our syntax decisions :

We made it mandatory to put spaces where needed in case of punctiation, as it is a french grammar rule. This works for (’:’, ’.’, ’«’, ’»’, ’,’).
We decided to replace all operations symbols by their literal French equivalent as we wanted this language to use as many french words as possible in order to represent France. Therefore, operations such as + or * are being replaced by plus and multiplié par. This forces to have at least one space before and after such operations as they are now depicted as words.
As it can become easily overwhelming to type out every single operation using its own word representation, we created alternatives to some operations to make them shorter. For example, the XOR ^ operation can either be typed out as exclusif ou or xor. The multiply * operation can also either be multiplié par or fois.
We forced every file of our language to begin with Bonjour, (Hello in english) in order to fit with the French Culture as french citizens are widely known for their politeness. For the same reasons, while and if structures ends by Merci. (Thank you=in english) and the main function ends with =Merci d'avance, Cordialement, <Name>\n<FamilyName> (for Thank you in advance, Best regards, <Name>\n<FamilyName>) as this function is where the program stops.
Words used to describe instructions such as J'invoque (Invoke in english) or Affiche (Display in english) were what we thought was the best way to directly translate such common instructions in french.
As we wanted users to have a unique experience while using the language, instructions directly made by users are conjugated in the first person in order to simulate a conversation. Other instructions executed directly by the machine such as displaying are in imperative form in the second person.
Every single instructions ends with a . as every single sentence in french ends the same way.
while and if structures do not exactly share the same syntax in order to differentiate them properly : if takes a , at the end of the condition whereas while doesn’t.
Comparison are in the feminine form (supérieure, égale) since the French word variable is feminine.

3.3.1. Language logic

Header

A file in Franc C always starts with: “Bonjour,” followed by an empty line.

Politeness is very important, and we wanted Franc C to express a proper and classic language, not a crappy street language. We promise you will learn the “bien parlé” by using Franc C
Function Declaration

In order to declare a function, you must specify:
1. The name of the function
```
J'aimerais définir le bloc répondant au nom de <function_name>,
```
  where is the name of the function
2. The return type of the function
```
de type de retour entier naturel,
```
  for the function to return an integer.
```
de type de retour booléen,
```
  for the function to return a boolean.
```
de type de retour nul,
```
  if the function does not return anything. (void function)
3. The parameters of the function
  A parameter, in Franc C, is defined the same way as other programming languages : Those are variables that must be passed when the function is invoked for it to use them.
```
necéssitant comme entrée :
    - <variable_name1>, de type <type>
    - <variable_name2>, de type <type>
    ...
;
```
  You can include as many parameter as you want as long as they do not share the same name.
  - <variable_name1> and <variable_name2> are the name of the variables which are both different.
  - <type> is either booléen if the parameter is a boolean or entier naturel if it is an int.
  Parameters are obviously optional. If the function does not take any parameters, you must specify :
```
nécessitant comme entrée : ;
```
4. Variables
  Variables created and used inside a function, in Franc C, are defined in the function definition. Since Franc C is a very verbose language and uses a lot of words, creating variables at the top of the function makes it a very good way to clearly see them.
  
  Such variables are being declared the following way :
```
contenant les variables :
    - <variable_name1>, de type <type>, valant <value>
    - <variable_name2>, de type <type>, valant <value>
;
```
  Just like parameters, variables must not share the same name. A parameter and a variable can not share the same name either.
  - <variable_name1> and <variable_name2> are the name of the variables which are both different.
  - <type> is either booléen if the variable is a boolean or entier naturel if it is an int.
  - <value> is either a number or a boolean expression vrai vraie faux or fausse. It can also be a character value like \n (which will be 10).
  This forces every single created variable to have a default value.
  
  If the function does not use any variable, you must specify :
```
contenant les variables : ;
```
Function Body
A function declaration ends with
```
représenté par le code <logical_connector>.
```
- <logical_connector> is either ci-après, suivant, or ci-dessous.
What must follow this declaration is the body of the function, which is what will be executed when the function is invoked.

In Franc C, what can be executed inside a function is being refered as an instruction.

Those are the available instructions that you can execute inside a function :
1. Assignement
  You can, at any time, change the value of a variable the following way :
```
J'aimerais que <variable_name> prenne la valeur <computables>.
```
  or
```
<variable_name> prends la valeur <computables>.
```
  - <variable_name> represents the name of an existing well-defined variable of the function.
  - <computables> represents an expression composed of values with operators. (*Check the <condition> defintion on the BNF)
2. If
  Execute a block of code only if a specific condition was met. It can be defined the following way :
```
Si <computables>, exécute le texte :
    <more_instructions>
Merci.
```
  - <computables> represents an expression composed of values with operators. (*Check the <condition> defintion on the BNF)
  - <more_instructions> represents all the instructions that will be executed ONLY if the condition was met.
  - Merci. represents the end of the condition.
  You can also execute a block of code if the condition wasn’t met the following way :
```
Si <computables>, exécute le texte :
    <more_instructions>
; sinon, exécute le texte :
    <more_instructions_2>
Merci.
```
  - <computables> represents an expression composed of values with operators. (*Check the <condition> defintion on the BNF)
  - <more_instructions> represents all the instructions that will be executed ONLY if the condition was met.
  - <more_instructions_2> represents all the instructions that will be executed ONLY if the condition was not met.
  - Merci. represents the end of the condition.
3. Invoke
  Invoke another function inside a function, you can also store its result in a well-defined function variable.
```
J'invoque le bloc <function_name>.
```
  - <function_name> is the name of an existing defined function.
  If the function that is being invoked takes parameters, they must be specified the following way :
```
J'invoque le bloc <function_name>, avec les paramètres <computables>, <computables>, <computables>.
```
  - <function_name> is the name of an existing defined function.
  - <computables> represents an expression composed of values with operators. (*Check the <condition> defintion on the BNF)
  You must not give to a function more parameters than it takes.
  
  In order to store the result of a function, while still giving parameters :
```
J'invoque le bloc <function_name>, et j'assigne la valeur de retour à la variable <variable_name>, avec les paramètres <computables>, <computables>, <computables>.
```
  - <function_name> is the name of an existing defined function.
  - <variable_name> represents the name of an existing well-defined variable of the function.
  - <computables> represents an expression composed of values with operators. (*Check the <condition> defintion on the BNF)
  However, if you do not want to give any parameter :
```
J'invoque le bloc <function_name>, et j'assigne la valeur de retour à la variable <variable_name>.
```
  - <function_name> is the name of an existing defined function.
  - <variable_name> represents the name of an existing well-defined variable of the function
4. Loop
  The only type of loop that is currently implemented in Franc C are the while loops.
  
  A loop allows to execute a block of code until a condition cannot be met anymore.
```
Tant que <computables> exécute le code ci-après :
    <more_instructions>
Merci.
```
  - <computables> represents an expression composed of values with operators. (*Check the <condition> defintion on the BNF)
  - <more_instructions> represents all the instructions that will be executed ONLY if the condition was met.
  - Merci. represents the end of the loop.
5. Return
  During the execution of the function, you can return at any time a value. Returning a value will cause the function execution to stop.
  
  The only condition to return a value is that the function must not be of type “nul”.
```
Enfin, renvoie <computables>.
```
  - <computables> represents an expression composed of values with operators. (*Check the <condition> defintion on the BNF)
  If the function does not return anything (nul function), you can still stop the execution at any time.
```
Enfin, sors du bloc.
```
  This works only if the function is a nul function.
6. Show statements
  This instruction is used to display either a text, or the result of a computation in the screen.
  
  To display a computation :
```
Affiche <computables>.
```
  - <computables> represents an expression composed of values with operators. (*Check the <condition> defintion on the BNF)
  This will display the character corresponding to the result on the screen. (E.g): Affiche 10. will display \n.
  
  However, if you want to display a text :
```
Affiche « Meow ».
```
  This will display the text Meow on the screen.
End of Function

You must always end your Function Body with a Merci..
Special case of ’Main’ Function
The main function is where the program starts. Therefore, its defintion must be clearly different from the other functions.

Such functions are being defined the following way :
```
Bonjour,

En sachant que les variables principales sont :
    <variables>
; pourrais-tu s'il te plaît commencer la lecture ici ?
    <instructions>
Merci d'avance,
Cordialement,
<family_name>
<name>
```
- <variables> is an optional field, and represents the variables of the function. They are declared the same way as a normal function.
- <instructions> represents all the instructions that will be executed.
- Merci d'avance, Cordialement, <family_name> <name>. represents the end of the main function
- <family_name> represents the developer’s last name, and must always end with a newline character (’\n’).
- <name> represents the developer’s first name, and must also end with a newline character (’\n’).
A main function does not take any parameter. Therefore, no parameter definition must be done. The value being returned in the main function represent the exit status of the program. If no value was being returned, the program exists by default with a status code of 0.

4. Bytecode reference

The bytecode starts with a magic number, followed by an address describing the placement of the PC at the start of the program, followed by the program itself.

4.1. Formal syntax (BNF)

<bytecode> ::= <magic_number> <adress> <operation_list>

<magic_number> ::= "E\fE\f" ; 0x45 0xc 0x45 0xc

<adress> ::= <byte> <byte> <byte> <byte> <byte> <byte> <byte> <byte>

<operation_list> ::= <operation> | <operation>  <operation_list>

<operation> ::= <not>
              | <lnot>
              | <neg>
              | <and>
              | <or>
              | <xor>
              | <shl>
              | <shr>
              | <land>
              | <lor>
              | <add>
              | <sub>
              | <mult>
              | <div>
              | <mod>
              | <gt>
              | <ge>
              | <lt>
              | <le>
              | <eq>
              | <diff>
              | <updz>
              | <push_stack>
              | <pop_stack>
              | <pop>
              | <call>
              | <ret>
              | <jmp>
              | <zjmp>
              | <aff>
              | <affs>
              | <push_value>
              | <push_label>
             
             
<not> ::= "\003"

<lnot> ::= "\004"

<neg> ::= "\005"

<and> ::= "\006"

<or> ::= "\a"

<land> ::= "\b"

<lor> ::= "\t"

<xor> ::= "\n"

<shl> ::= "\v"

<shr> ::= "\f"

<add> ::= "\r"

<sub> ::= "\016"

<mult> ::= "\017"

<div> ::= "\020"

<mod> ::= "\021"

<gt> ::= "\022"

<ge> ::= "\023"

<lt> ::= "\024"

<le> ::= "\025"

<eq> ::= "\026"

<diff> ::= "\027"

<updz> ::= "\030"

<push_stack> ::= "\035" <arguments>

<pop_stack> ::= "\036" <arguments>

<pop> ::= "\037"

<call> ::= """

<ret> ::= "#"

<jmp> ::= "$"

<zjmp> ::= "%"

<aff> ::= "&"

<affs> ::= "'" <arguments> <string>

<push_value> ::= "Y" <arguments>

<push_label> ::= "[" <arguments>

<arguments> ::= <byte> <byte> <byte> <byte> <byte> <byte> <byte> <byte>

<string> ::= <byte> | <byte> <string>

<byte > =  "\0" .. "ÿ"

4.2. Instructions

Since the virtual machine is stack-based, every instruction representing an operation will pop it’s operands, and push the result to the stack.

mnemonic	description	bnf object
not	Not, invert the bits of the operand	not
lnot	Logical not, invert the boolean interpretation of the operand	lnot
neg	Negate, switch the sign of the operand	neg
and	And, performs a binary AND operation between two operands	and
or	Or, performs a binary OR operation between two operands	or
xor	Exclusive or, performs a binary XOR operation between two operands	xor
shl	Left bitshift, shifts the bits of the operand to the left, appending zeroes	shl
shr	Right bitshift, shifts the bits of the operand to the right, appending zeroes	shr
land	Logical and, boolean interpretation of the logical AND on two operands	land
lor	Logical or, boolean interpretation of the logical OR on two operands	lor
add	Addition, adds the value of two operands	add
sub	Substraction, subtract the value of two operands	sub
mult	Multiplication, multiplies the value of two operands	mult
div	Division, return the dividend of the euclidian division of operand 1 by operand 2	div
mod	Modulo, return the rest of the euclidian division of operand 1 by operand 2	mod
gt	Greater than, compares the two operands and return a result for boolean interpretation	gt
ge	Greater than or equals to, compares the two operands and return a result for boolean interpretation	ge
lt	Less than, compares the two operands and return a result for boolean interpretation	lt
le	Less than or equals to, compares the two operands and return a result for boolean interpretation	le
eq	Equals, compares the two operands and return a result for boolean interpretation	eq
neq	Different from, compares the two operands and return a result for boolean interpretation	diff
updz	Update zero flag, pops the value from the stack and set it as the new zero flag	updz
push	Push value, push a 64 bits value to the stack	push_value
push	Push relative address, push a 64 bits relative address to the stack, corresponding generally to a label	push_label
push	Push from stack pointer relative address, push to the stack the 8 bytes found at the stack pointer + the address given as operand	push_stack
pop	Pop empty, pop 8 bytes from the stack	pop
pop	Pop to stack pointer relative address, pop 8 bytes from the stack, and write them to the stack pointer + the address given as operand	pop_stack
call	Call, op 8 bytes from the stack, register the new stack pointer, push the PC and stack pointer to the callstack, and jump to the value of the 8 bytes popped from the stack	call
ret	Return, reset the stack to the stack pointer, pop the callstack and jump to the address from the callstack pop	ret
jmp	Jump, pop an address from the stack, and set the PC to it	jmp
zjmp	Zero jump, pop an address from the stack, and if the zero flag is set, set the PC to it	zjmp
aff	Show, pop 8 bytes from the stack, apply a modulo 256 to the result, and write a single ascii character corresponding to the result	aff
affs	Show string, the first operand is a 64 bits integer corresponding to the number of caracters, that we will call n. The next n bytes are the ascii values of the characters that will be wrote	affs

5. Virtual Machine reference

5.1. Introduction

When defining an assembly language for the Franc C binary, we focused on the ease of development of the compiler. We just had one requirement that allowed an easy development of the virtual machine: our assembly shall be stack-based, drawing inspiration from the Java Virtual Machine (JVM).

5.1.1. Stack-Based vs Register-Based

Aspect	Stack-Based (fcvm)	Register-Based (e.g., Lua VM)
Storage	Operand stack (LIFO)	Virtual registers
Instructions	Compact, implicit operands, prone to errors	Explicit operand addressing
Complexity	Simple to implement	More complex logic
Bytecode Size	Bigger overall (smaller instructions but way more quantiry)	Smaller overall

We chose a stack-based approach for the following reasons:

Simpler bytecode generation: The compiler does not have to deal with registers, knowing it does not have the capacity to remember data while processing multiple operations.
Monotyped language: The fact that the Franc C is an integer only language allows for fixed-size parameter encoding. One of the main advantages of the register-based approach is to allow for type encoding.

5.2. Stack

The stack shoud be implemented as a contiguous array of bytes with the following characteristics:

Data type: Array of bytes
Element size: 8 bytes per element (64-bit signed integers)
Alignment: All operations push/pop exactly 8 bytes
Stack size: Always a multiple of 8 bytes

Stack Management

To perform function calls, the VM must maintains the stack pointer. It points to the start of the current stack local context. it is modified by the instructions call and ret.

5.2.1. Initialization

By default, the stack should be initialized with 8 bytes representing the return value of the program. The default value is implementation-defined.

5.2.2. Basic Operations

Operation	Description	Stack Effect
PUSH	Push 8 bytes onto the stack	SP unchanged, Counter += 1
POP	Remove 8 bytes from the stack	SP unchanged, Counter -= 1

5.3. CallStack

The callstack manages function calls and returns, enabling the VM to: - Navigate between functions - Restore execution context after function returns - Maintain function-local state - Detect program termination

5.3.1. Memory Layout

The callstack can be implemented as a contiguous array of frame records:
By default, the callstack can be initialized as an empty array.

Example implementation : - Data type: Array of tuples [(PC, SP)] - PC (Program Counter): Return address (next instruction after CALL) - SP (Stack Pointer): Stack state at call time

Frame	PC	SP	Description
0	0	0	Entry point
1	12	16	First function call
2	57	24	Current function locale start

5.3.2. Function call mechanism

When executing call <address>:

Save current context: Push (PC + 1, SP) onto the call stack
- PC + 1: Address of the instruction following CALL
- SP: Current stack pointer (preserves caller’s stack)
Jump to function: Set PC = <address>

When executing Ret:

Pop frame: Retrieve (saved_PC, saved_SP) from call stack
Restore context:
- Set PC = saved_PC (return to caller)
- Set SP = saved_SP (restore stack state)
Check termination: If call stack is empty after pop, terminate program

5.4. Zero flag

The Zero Flag is a single-bit register used for conditional branching and control flow decisions.
It stores the result of comparison and test operations,
enabling the implementation of if and while.

Data type: unsigned char
Default value: 1

The Zero Flag follows this logic:

Condition	Zero Flag Value
!= 0	`1` (true)
== 0	`0` (false)

5.5. End cases

The program terminates normally when: - A Ret instruction is executed from the main function - The call stack becomes empty after the return

Version : 1.2
Last update : Novembre 2025
Author : Acacademie Franc C’aise

6. Franc C Compiler

The Franc C Compiler (FCC) is a toolchain that allows the compilation of one or multiple Franc C Reference code (.fr) files into a single bytecode executable by the FCVM program.

6.1. Parameters list and long descriptions

fcc FILESNAME [-o|--output FILENAME] [-n|--no-larousse] [-d|--debug]

6.1.1. FILESNAME

The name of the input files. They need to be valid .fr source files.

6.1.2. –output FILENAME

The name of the output bytecode file. Only considered without the –debug flag.

Without the presence of this flag, the default output file is a.fcp.

6.1.3. –no-larousse

By default, the FCC program compiles with the Larousse sources included. See the larousse section.

This flag allows compilation without any files from the Larousse library.

6.1.4. –debug

This flags allows a debug output. Instead of writing the bytecode to a file, a debug view of the assembly is wrote to the standard output (/dev/stdout on most GNU/linux machines).

6.2. Larousse

The Larousse is the standard library who accompanies the FCC program. If the Larousse is to be included, the fcc program will search a folder named larousse, and then list in all subdirectories of level-1 the .fr source files. Those found source files will be added to the input list, and compiled.

6.3. Functioning

The FCC program works with different steps.

6.3.1. Getting inputs

The inputs are retrived via the Larousse library and the user-provided files.

6.3.2. Lexing and parsing

Each file is lexed into a list of token, then parsed into an Abstract Syntax Tree (AST).

See the parsing reference for more information.

This step may fail. Each file being converted one after the other, if one fails, the remaining ones are not even considered and an error is returned.

6.3.3. Combination

Each AST representing each files given for compilation is then combined.

In this step, the presence of a single principal function is verified, and if no or multiple are found, an error is returned.

The result is a CAST (Combined Abstract Syntax Tree)

6.3.4. Compilation

The compilation step is where the CAST is converted to a Context datastructure, linked with an Assembly datastructure. It is not yet bytecode, just a structure representing the final output.

The context structure is important for the next step.

This step is really linear, there is no acknowledgement of the rest of the program when compiling a subpart.

6.3.5. Verifications

This step ensures some basic verifications.

It takes the context from the previous step, and analyses it to find any flaws.

For example, we check here if a function is invoked whilst it’s reference is non-existant.

6.3.6. Output

This step is the final one.

If the --debug flag is given, it will take the structured assembly and write it down just as it is stored.

Elsewise, it is converted to bytecode in really two small steps. The first one is to convert labels to relative positions. The second one is to convert everything to it’s bytecode relation.

Finaly, the resulted bytecode is written to the designated output file.

6.3.7. Justifications

Regarding the lexing and parsing part, we decided so separate those to allow for easier syntaxic sugar implementation and code readability.

The combination part was added to allow for multiple input files to be grouped together before the compilation process. It is easier to do it like this than to try and combine multiple outputed bytecodes together, like the ld program does when compiling a C program.

The compilation is linear, and therefore, we had to separate the verification process from this step, to allow for a really easy-to-read code. Having a linear compilation reduces the capacity of complex program logic, but since the Franc C is a really simple imperative language, it poses no problems to do it this way. We simply store some context in a datastructure for the verification step to look in a global point of view, instead of the linearly reduced point of view of the compilation.

Finaly, we send the assembly to out output step, that is in charge of converting it to the final bytecode, or to write it as a debug output.

6.4. Lexer & Parsing reference

We have chosen Haskell to implement the lexing and parsing of the language, as functional languages like Haskell make it particularly easy to express, abstract, and combine parsers. This allows the Franc C to be very verbose, which was definitely needed.

As we wanted to speed things up and release a first version as quickly as possible, we decided to use a pre-existing parsing library Megaparsec. The library offers fast parsing performance, is an improved successor of Parsec, and provides high-quality error messages that greatly simplify debugging and development.

The Lexer and Parser both refers to two different parts:

The Lexer, which uses Megaparsec reads the source code’s syntax, checks that it follows the correct grammar, then converts the text into a list of tokens that the parser can later use. For example, the code Si x est égale à 0, exécute le texte would be transformed into the list as [If, Symbol "x", Operation Equals, Number 0, Then]
The Parser, on the other hand, takes this list of tokens to transform it into an Abstract Syntax Tree (AST). It uses Haskell’s recursive principles to build that tree so that the compiler can later use it to generate the final executable library.

The Parser is only executed if the Lexer did not fail to lex the targetted source code.

6.4.1. Safety measures

The safety of the lexing process is ensured by Megaparsec, which provides robust error handling and syntax checking. During the lexing part, Megaparsec ensures the input strictly follows the defined grammar. If the syntax is incorrect, the lexer fails gracefully by producing a clear and descriptive error message, preventing the program from continuing with an invalid input. If the syntax is valid, the lexer successfully returns a well-structured list of tokens that can safely be passed to the parser.

Since the Lexer always produces a clean and valid list of tokens, the Parser does not include any additional safety checks on its input. However, it makes sure that no more than one main function was defined.

6.4.2. Error list

Grammar Error : Triggered when Megaparsec fails to lex or parse the source code due to invalid syntax.
Double Main Definition : Raised when more than one main function was defined in the program.

6.4.3. Known caveats

Lexing Error Printing : We are aware that Megaparsec does not always display clear or accurate error messages in some cases. This issue is due to our current lexing implementation rather than Megaparsec itself, and should be fixed soon or later.
Parsing Input : The parser currently does not validate its input and fully relies on the lexer to always produce correct and consistent tokens. While this works for now, it could lead to potential issues in the future.

6.5. Compilation reference

We are here referring to compilation as the compilation and the verification step. Thus, in this section, the compilation step will refer to both of these steps.

Given the simple nature of the logic of the Franc C language, we chose here to implement a simple compilation.

The compilation step do not use any external dependencies.

We simply store a context that is updated throughout the compilation process. This context contains :

A list of variables, with their name, size, and position relative to the stack pointer.
A label counter, used to suffix labels with a number to distinguish them.
A list of function context, representing each definitions.
A list of function context, representing each invoke.

The function context contains :

The name of the function.
A boolean value, used to describe if this function is either returning something or of the void type ; or describes if this functions is either assigning a value when invoking or not.
A parameter count, that indicates the number of parameters in this function.

The fact that the entire language is only integer-based facilitates the concept of variable storage and parameter types. No types appear in the compilation process, except the void type for function definitions and invoke.

6.5.1. Function call parameters, return value and variables

When a function is called, it have parameters, a return value and some variables.

Here is a map of how they are stored in the stack:

┌─────────────────┐
│  Caller frame   │
├─────────────────┤
│  Parameters     │  ← PUSHREL/POPREL access
├─────────────────┤
│  Return Value   │  ← PUSHREL/POPREL access
├─────────────────┤  ← SP
│  Local vars     │  ← PUSHREL/POPREL access
├─────────────────┤
│  Temporaries    │  ← Stack top
└─────────────────┘

Local variables: Accessed via SP-relative offsets - PUSHREL 0: First local - PUSHREL 8: Second local (8-byte offset) - POPREL N: Store to local at offset N

Return value and paramers: Accessed via SP-relative offsets - PUSHREL -8: Return value - PUSHREL -16: First parameter - PUSHREL -24: Second parameter

6.5.2. Safety measures

During the compilation step, we check for user errors :

No main function is given.
Multiple main functions are given.
A function is defined with the name of an already defined function.
A function defines a parameter, or a variable, with the name of an already existing, local to the function, parameter or variable.
A function that is not referenced is invoked.
A function that takes n parameters is invoked with more or less than n parameters.
A return value is given to a function of type void.
An empty return is given to a function of type non-void.
An assignement of the return of a void-typed function is made.

6.5.3. Intended undefined behaviours

Integer over/underflows

We do not devine the result of an integer overflow, or an integer underflow. Our integers are defined solely as signed integers on 64 bits.

The minimum integer is defined as the maximum integer negated. In the C programming language, it is defined as being one less, but in our implementation of the negate sign “-”, we chose for parsing and lexing simplicity to use the negate solely as an operation. Take the example “-5”, we will not register it as the -5 number, but as the number 5 on whom we apply the negate operation. This explains the fact that our minimum integer is defined as the negated maximum integer.

The maximum integer is defined as 2⁶³ - 1, or 9’223’372’036’854’775’807.
Known caveats
1. Lack of static analysis
  The FCC compilation step do not implement any static analysis.
  
  These are the direct cause of some – sadly – not implemented behavious:
  - We cannot check for direct division by zero. For example: (10 divisé par 0) will not result in an error at compile-time.
  - We cannot produce pre-computations. For example: (10 plus 10) will not result in the value 20 at compile-time, but in the computation of 10 plus 10 at run-time. It is highly inefficient.

6.6. Readable assembly

When the --debug flag is passed, the output of the program is a readable assembly written to the standard output.

6.6.1. Syntax of the readable assembly

There are two cases in the syntax :

label, they are written in the left margin, like so: label_name:
instructions, they are written tabulated with four spaces, like so: mnemonic [args]

6.6.2. List of instructions

Operation	Parameter	Representation
label	name	`name:`
binary not		`not`
binary and		`and`
binary or		`or`
logical not		`lnot`
logical and		`land`
logical or		`lor`
exclusive or		`xor`
left bitshift		`shl`
right bitshift		`shr`
addition		`add`
substraction		`sub`
multiplication		`mult`
division		`div`
modulo		`mod`
comparison greater than		`gt`
comparison greater or equal to		`ge`
comparison less than		`lt`
comparison less or equal to		`le`
comparison equals		`eq`
comparison different to		`neq`
update zero flag		`updz`
push value	value	`push value`
push relative address	address	`push [address]`
push label relative address	label-name	`push %label-name`
push from stack pointer relative address	address	`push @address`
pop to stack pointer relative address	address	`pop @address`
pop		`pop`
invoke a function		`call`
return		`ret`
jump to address		`jmp`
jump to address if zero flag		`zjmp`
print character		`aff`
print string	string	`affs "string"`

7. Franc C Virtual Machine

7.1. Input Descriptions

7.1.1. Bytecode File Format

The fcvm executes Franc C bytecode files.

They have the following header structure:

┌─────────────────────────────────────┐
│ Magic Number (4 bytes)              │  0x45 0x0C 0x45 0x0C
├─────────────────────────────────────┤
│ Bytecode Instructions               │  Variable length
└─────────────────────────────────────┘

Magic Number: 0x45 0x0C 0x45 0x0C (4 bytes), as defined in the Franc C bytecode reference. - Validates file format at VM initialization - Invalid magic number triggers immediate error

7.1.2. Command-Line Interface

./fcvm <bytecode_file>

Exit Codes:

Code	Description
`0`	Success (or custom return value = 0)
`1-255`	Custom exit code (return value mod 256)
`84`	Any error (runtime or load-time)

7.2. Architecture Implementation

7.2.1. Memory Layout

The VM maintains three core structures:

The operand stack
The callstack
Three specific registers

Operand Stack

Stores temporary computation values, variables and function return values.

┌─────────────┬─────────────┬─────────────┐
│  Element 0  │  Element 1  │  Element 2  │
│  (8 bytes)  │  (8 bytes)  │  (8 bytes)  │
└─────────────┴─────────────┴─────────────┘
      ↑
    Stack Top

Properties: - Type: [Word8] (list of bytes) - Element size: 8 bytes (64-bit) - Initial state: One zero element [0x00 × 8] - Max size: 1,000,000 elements (8 MB) - Operations: PUSH (prepend), POP (drop first 8 bytes)

Call Stack

Manages function call frames.

┌─────────────────────────────────┐
│  Frame: (PC: 0, SP: 0)          │  ← Entry
├─────────────────────────────────┤
│  Frame: (PC: 12, SP: 16)        │
├─────────────────────────────────┤
│  Frame: (PC: 57, SP: 24)        │  ← Current
└─────────────────────────────────┘

Properties: - Type: [(PC, SP)] (list of tuples) - Frame content: Return address (PC) + Stack pointer (SP) - Initial state: Empty [] - Max depth: 1,000,000 frames

Behavior: - CALL: Push (PC+1, SP), jump to function - RET: Pop frame, restore context, trim stack - Termination: Empty call stack after RET → program ends

Registers

Program Counter (PC) - Points to current instruction in bytecode - Initial: 8 (after 4-byte magic number) - Updates: +1 (normal), +9 (with data), or jump target

Stack Pointer (SP) - Marks base of current function’s stack frame - Used for local variable access via PUSHREL=/=POPREL - Modified during CALL and RET

Zero Flag (ZF) - Type: Word8 (0 or 1) - Initial: 1 (true) - Semantics: 0 if stack top = 0, else 1 - Set by ZFLAG, tested by ZJMP

7.2.2. Instruction Set Overview

The VM implements 40 opcodes across 7 categories:

Bytecode Definition.

7.2.3. Function Call Convention

Calling sequence:

1. Push function address
2. Execute CALL
   → Saves (PC+1, SP) to call stack
   → Jumps to function

Return sequence:

1. Execute RET
   → Restores PC and SP
   → Trims stack to SP position
   → If call stack empty: program terminates

Stack frame layout:

┌──────────────────┐
│  Caller frame    │
├──────────────────┤  ← SP
│  Local function  │  ← Stack top
└──────────────────┘

7.3. Exceptions List

All errors exit with code 84 and write to stderr.

7.3.1. Load-Time Errors

Invalid Magic Number

Message: *** UNRECONIZED FILE FORMAT
Trigger: First 4 bytes ≠ 0x45 0x0C 0x45 0x0C

File I/O Error

Message: Error with file
Trigger: File not found, permission denied, or read error

7.3.2. Runtime Errors

Arithmetic Errors

Message: *** 0 CAN'T BE USE in this operation
Trigger: DIV or MOD with divisor = 0

Stack & CallStack Overflow

Message: *** STACK OVERFLOW
Trigger: Operand stack ≥ 1M elements OR call stack ≥ 1M frames
Cause:   Excessive PUSH without POP, or infinite recursion

Stack Underflow

Message: *** STACK ERROR
Trigger: Operation requires elements when stack_size ≤ 1
Affects: All binary/unary ops, CALL, JMP, ZJMP, AFF

7.4. Known Caveats

7.4.1. Undefined Behaviors

Integer Overflow - 64-bit signed integers wrap silently - Example: INT64_MAX + 1 → UNDEFINED - No error raised

Invalid Opcodes - Treated as NOP (PC += 1) - Corrupted bytecode may execute partially - No detection mechanism

7.4.2. Performance Limitations

List-Based Stack - Push/Pop: O(1) ✓ - SP-relative access: O(n) ✗ - Impact: Slow for deep stacks (>1000 elements)

String Operations - AFFS creates intermediate structures - Linear in string length (acceptable)

7.4.3. Portability Issues

Platform Word Size - PC and SP use Haskell Int type - 32-bit systems: Max ~2GB bytecode - 64-bit systems: Max ~8GB bytecode - Not a practical concern for typical use

Document Version: 1.1
Last Updated: November 2025
Author : Acacademie Franc C’aise

8. Larousse

8.1. Introduction (motives)

As programming using Franc C can easily become overwhelming, we decided to provide a set of already implemented functions, in a library called Larousse, as a way to facilitate some tasks.

Larousse also serves as a way to showcase our language’s capabilities, and demonstrate its verbose syntax, and acts as a test suite for the Académie Franc C'Aise, which runs functional tests on it.

8.2. Functions

Functions	Parameters	Description	Benchmark*
`additionner`	`nombre`, `second_nombre`	Returns `nombre + second_nombre`.	0.086s
`afficher_nombre`	`nombre`	Displays in the standard output the number `nombre`.	0.067s
`carré_de`	`nombre`	Returns `nombre * nombre`.	0.074s
`diviser`	`nombre`, `second_nombre`	Returns `nombre / second_nombre`.	0.060s
`douze`	`nombre`	Returns `nombre / 12` if `nombre` is multiple of 12, otherwise `-1`.	0.069s
`est_en_majuscule`	`nombre`	Returns `1` if `nombre` is an uppercase letter, otherwise `0`.	0.063s
`est_en_minuscule`	`nombre`	Returns `1` if `nombre` is a lowercase letter, otherwise `0`.	0.060s
`est_négatif`	`nombre`	Returns `1` if `nombre` is negative, otherwise `0`.	0.070s
`est_premier`	`nombre`	Returns `1` if `nombre` is a prime number, otherwise `0`.	0.051s
`fizzbuzz`	`nombre`	Displays « Fizz » if `nombre` is multiple of 3, « Buzz » if multiple of 5, and « FizzBuzz » if multiple of both, otherwise displays `nombre`.	0.056s
`imprimer_peigne`	-	Displays in ascending order all the smallest numbers composed of three digits.	0.220s
`imprimer_peigne_2`	-	Displays in ascending order all the smallest numbers composed of four digits.	8.481s
`imprimer_peigne_nombre`	`nombre`	Displays in ascending order all the smallest numbers composed of `nombre` digits.	Depends
`max`	`nombre`, `second_nombre`	Returns `nombre` if `nombre` is superior than `second_nombre`, otherwise `second_nombre`.	0.060s
`min`	`nombre`, `second_nombre`	Returns `second_nombre` if `nombre` is superior than `second_nombre`, otherwise `nombre`.	0.092s
`multiplier`	`nombre`, `second_nombre`	Returns `nombre * second_nombre`.	0.065s
`prochain_premier`	`nombre`	Returns the first prime number greater than `nombre`.	0.072s
`précedent`	`nombre`	Returns `nombre - 1`.	0.069s
`racine_carré_de`	`nombre`	Returns the integer square root of `nombre`.	0.070s
`soustraire`	`nombre`, `second_nombre`	Returns `nombre - second_nombre`.	0.090s
`suivant`	`nombre`	Returns `nombre + 1`.	0.065s

(*) Benchmark specs:

CPU : Intel i9 14900K
GPU : NVIDIA RTX 4070 TI SUPER
RAM: 32 GB

8.3. Académie Franc C’Aise

Académie Franc C'Aise is the functional tester made in Python for the Franc C, and uses Larousse standard library.

You will find in every single Larousse source code function: - The same function written in C. - A main function in Franc C which calls the function in Franc C with various parameter, in the test folder. - A main function in C which calls the function in C with the same parameters, in the test folder.

Académie Franc C'Aise compiles both the C and Franc C version using gcc and fcvm. It then runs both versions, and check whether their inputs are the same as a way to test the language. It also displays how much time each step took (Compilation, Execution) for both languages then finally shows the time difference between those.

Franc C is in most cases faster to compile than C. However, C is almost always faster in execution time than Franc C.

At the end of the program, Académie Franc C'Aise displays how many tests passed out of how many were run. It then exits with a failure if not all tests passed.