Lexical structure
Tokens
Each source file is divided into tokens, starting from the beginning of the file.
Identifiers (id) are case sensitive. Some of them are reserved; see section Reserved words below.
id :: alpha alphanum*
alpha :: "a".."z" | "A".."Z" | "_"
alphanum :: alpha | digit
digit :: "0".."9"
Numeric literals (int and float) are entered in base 10. A floating point literal has a fractional part, introduced by a dot, an integer exponent, introduced by the letter e or E, or both. If the exponent is present, the numeric value before the exponent is multiplied by 10**e, where e is the numeric value of the exponent.
int :: digit+
float :: digit+ exponent | digit* "." digit+ [ exponent ]
exponent :: ("e" | "E") ["+" | "-"] digit+
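The int and float productions can be transcribed into regular expressions. The following is a minimal sketch in Python (not Alore); the names classify_number and float_value are hypothetical helpers introduced only for illustration, and the sketch is not taken from any Alore implementation.

```python
import re

# Regular expressions transcribed from the int, float and exponent productions.
EXPONENT = r"[eE][+-]?[0-9]+"
INT_RE = re.compile(r"[0-9]+")
FLOAT_RE = re.compile(rf"(?:[0-9]+{EXPONENT}|[0-9]*\.[0-9]+(?:{EXPONENT})?)")


def classify_number(text):
    """Return 'int', 'float' or None for a complete candidate literal."""
    if INT_RE.fullmatch(text):
        return "int"
    if FLOAT_RE.fullmatch(text):
        return "float"
    return None


def float_value(text):
    """Compute the value before the exponent multiplied by 10**exponent."""
    mantissa, _, exponent = text.lower().partition("e")
    value = float(mantissa)
    if exponent:
        value *= 10.0 ** int(exponent)
    return value
```

Under this reading of the grammar, 1e5 and .5 are float literals, while 1. is not a numeric literal because at least one digit must follow the dot.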
String literals (str) are entered within single or double quotes. The surrounding quotes are not part of the string value. Literal double quotes in double-quoted strings and literal single quotes in single-quoted strings must be duplicated.
str :: <"> (<any character except ", CR or LF> | <"> <">)* <">
     | <'> (<any character except ', CR or LF> | <'> <'>)* <'>
A sequence of the form \uHHHH within a string literal, where each H is a hexadecimal digit (0..9, a..f or A..F), is mapped to the character whose code is the given hexadecimal number. Backslash characters are not special within string literals unless immediately followed by "u" and a 4-digit hexadecimal number.
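The quote-doubling and \uHHHH rules can be illustrated with a small decoder. This is a sketch in Python (not Alore); decode_string_literal is a hypothetical helper, and it assumes its argument is already a well-formed str token, so it does not check for stray quotes or line breaks inside the body.

```python
import re

# A backslash is special only when followed by "u" and four hex digits.
UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_string_literal(literal):
    """Return the value of a str token, given its source text with quotes."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        raise ValueError("not a quoted string literal")
    quote = literal[0]
    body = literal[1:-1]
    # A doubled quote of the enclosing kind stands for one literal quote.
    body = body.replace(quote * 2, quote)
    # \uHHHH maps to the character with that code; other backslashes are kept.
    return UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), body)
```

For example, the single-quoted literal 'it''s' decodes to it's, and a backslash followed by n stays as two separate characters because only the \uHHHH form is special.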
Various non-alphanumeric operator and punctuator tokens are defined:
opsym :: "+" | "-" | "*" | "/" | "**" | ":" | "==" | "!=" | "<" | ">" | ">=" | "<="
punct :: "(" | ")" | "[" | "]" | "," | "=" | "+=" | "-=" | "*=" | "/=" | "**=" | "::"
Newlines and semicolons can be used as statement separators (br). They are interchangeable. Repeated statement separators behave identically to a single statement separator.
br :: (newline | ";")+
newline :: <CR> <LF> | <LF> | <CR>
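The br production maps directly onto a regular expression. The sketch below is in Python (not Alore); split_statements is a hypothetical helper, and it assumes the input contains no string literals or comments, since a semicolon inside either of those is not a separator.

```python
import re

# br :: (newline | ";")+ with newline :: CR LF | LF | CR
BR_RE = re.compile(r"(?:\r\n|\n|\r|;)+")


def split_statements(source):
    """Split source text on br tokens, dropping empty pieces."""
    pieces = (piece.strip(" \t") for piece in BR_RE.split(source))
    return [piece for piece in pieces if piece]
```

Because a run of separators acts as a single separator, the text "a = 1;;" followed by two line breaks and "b = 2" yields exactly two statements.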
Whitespace and comments are ignored before and after tokens. Whitespace characters are optional, except between a token ending with an alphanumeric character and another token starting with an alphanumeric character, in which case they are required. Finally, there must be no whitespace characters before the initial-comment and utf8-bom tokens.
whitespace :: " " | <TAB>
comment :: "--" <any character except CR or LF>*
An initial source line starting with #! is interpreted as a comment:
initial-comment :: "#!" <any character except CR or LF>*
The special utf8-bom token may be present at the start of UTF-8 encoded files:
utf8-bom :: <EF> <BB> <BF>
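A lexer might skip these two file-initial tokens before ordinary tokenization begins. The following Python sketch (not Alore) shows one way to do that on raw bytes; skip_file_prefix is a hypothetical helper, and it assumes that when both tokens are present the utf8-bom comes first, which the text above does not state explicitly.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def skip_file_prefix(data):
    """Return the offset just past an optional utf8-bom and initial-comment."""
    pos = 0
    # Assumption: the BOM, if present, precedes the initial comment.
    if data.startswith(UTF8_BOM):
        pos = len(UTF8_BOM)
    # No whitespace may precede "#!", so it must start exactly at pos.
    if data.startswith(b"#!", pos):
        while pos < len(data) and data[pos] not in (0x0A, 0x0D):
            pos += 1
    return pos
```

The returned offset points at the line break that ends the initial comment, or at the first ordinary byte if neither token is present.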
Joining lines
Newlines after the following tokens are interpreted as whitespace, not as statement separators:
+ - * / ** div mod and or : to is == != < <= >= ( [ = += -= *= /= **= ,
This can be used to divide long lines into multiple shorter lines.
A br token after a > token is ignored in expressions, but not in other contexts. This rule allows a type annotation to end with a '>' token, even when followed by a br token.
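The line-joining rules amount to a decision made for each newline based on the preceding token. The sketch below is in Python (not Alore); newline_is_separator and its in_expression flag are hypothetical, with the flag standing in for whatever context a real parser would track to tell expressions apart from type annotations.

```python
# Tokens after which a newline is treated as whitespace.
CONTINUATION_TOKENS = frozenset(
    "+ - * / ** div mod and or : to is == != < <= >= ( [ = += -= *= /= **= ,".split()
)


def newline_is_separator(previous_token, in_expression=False):
    """Decide whether a newline after previous_token ends the statement."""
    if previous_token in CONTINUATION_TOKENS:
        return False
    # ">" joins lines only inside expressions, so that a type annotation
    # ending in ">" can still be followed by a statement separator.
    if previous_token == ">" and in_expression:
        return False
    return True
```

Note that ">" is deliberately absent from the token set and is handled separately through the context flag.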
Reserved words
The following words are reserved and cannot be used as identifiers (i.e. as names of global or local definitions, as member names or as module name components):
and as bind break case class const def div dynamic elif else encoding end except finally for if implements import in interface is mod module nil not or private raise repeat return self super switch to try until var while
Restricted names
Module name components and names of global definitions starting with two underscores (__) are reserved for internal use by the implementation. The implementation may freely define such names for any purpose; user programs that are meant to remain portable across different Alore implementations should not depend on the presence or absence of such names.
Additionally, it is recommended that the following names not be used as the first component of a module name, since they are reserved for use in future releases of Alore:
alore argparse compiler crypt csv email fileutil ftp getpass httpserver json locale logging process queue readline serialize smtp sqlite ssl stack subprocess tempfile timezone traceback udp unicode url xml xmlrpc xmltree
It is likely that some of these names will never be used in any future Alore release. Future Alore releases may remove some names from this list; these changes are retroactively applied to all earlier Alore versions as well.
Source file encodings
Alore source files may be encoded in ASCII, UTF-8 or ISO-8859-1 (Latin 1). See section Encoding declaration for information on specifying the encoding of a source file.
All 7-bit character codes except LF and CR (10 and 13, respectively) can be used in comments and string literals, including null characters, independent of the source file encoding. Quotes, however, may have to be doubled within string literals.
In an ISO-8859-1 encoded source file, all character codes in the range from 128 to 255, inclusive, can be used in comments and string literals. Similarly, in a UTF-8 encoded source file, all valid UTF-8 sequences for code points between 128 and 65535, inclusive, can be used in comments and string literals. Any character code between 0 and 65535 can be entered in a string literal using the \uHHHH form, independent of the source file encoding.
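Because \uHHHH works in every encoding, any string value in the 0..65535 range can be written using 7-bit characters only. The Python sketch below (not Alore) illustrates this; to_ascii_literal is a hypothetical helper, and escaping every backslash as \u005c is a design choice made here so that the output cannot be misread as an unintended \uHHHH sequence.

```python
def to_ascii_literal(value):
    """Render value as a double-quoted literal using only 7-bit characters."""
    parts = []
    for ch in value:
        code = ord(ch)
        if code > 0xFFFF:
            raise ValueError("code points above 65535 have no \\uHHHH form")
        if ch == '"':
            # Literal double quotes must be doubled in double-quoted strings.
            parts.append('""')
        elif ch == "\\" or code in (0x0A, 0x0D) or code > 0x7F:
            # CR, LF, backslash and non-ASCII characters become \uHHHH.
            parts.append("\\u%04x" % code)
        else:
            parts.append(ch)
    return '"' + "".join(parts) + '"'
```

The result is valid in an ASCII, ISO-8859-1 or UTF-8 encoded source file alike, since it contains no bytes above 127 and no raw line breaks.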