Lexical structure
Tokens
Each source file is divided into tokens, starting from the beginning of the file.
Identifiers (id) are case sensitive. Some of them are reserved; see section Reserved words below.
id :: alpha alphanum*
alpha :: "a".."z" | "A".."Z" | "_"
alphanum :: alpha | digit
digit :: "0".."9"
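As an illustration, the identifier production can be approximated with a regular expression. This is only a sketch in Python, not part of Alore: it assumes that a single alphabetic character such as x is a valid identifier, and the names ID_RE and is_identifier are hypothetical. It does not exclude reserved words.

```python
import re

# alpha is a-z, A-Z or "_"; alphanum additionally allows the digits 0-9.
# Assumption: zero or more alphanum characters may follow the first alpha.
ID_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_identifier(text):
    """Return True if the whole of text has the shape of an identifier.

    Matching is case sensitive, and reserved words are not filtered out.
    """
    return ID_RE.fullmatch(text) is not None
```

For example, is_identifier("_count2") holds, while is_identifier("2count") does not, since an identifier cannot start with a digit.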
Numeric literals (int and float) are entered in base 10. Floating point literals can optionally have a fractional part, separated with a dot, and an integer exponent, separated with the letter e. If the exponent is present, the numeric value before the exponent is multiplied by 10**e, where e is the numeric value of the exponent.
int :: digit+
float :: digit+ exponent | digit* "." digit+ [ exponent ]
exponent :: ("e" | "E") ["+" | "-"] digit+
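The numeric productions can be sketched in the same way. The following Python fragment is an illustration only; INT_RE, FLOAT_RE and classify_number are hypothetical names. Note that, following the grammar, a literal such as 1. (a dot with no digits after it) is neither an int nor a float.

```python
import re

EXPONENT = r"[eE][+-]?[0-9]+"
INT_RE = re.compile(r"[0-9]+")
# Either digits followed by an exponent, or an optional integer part,
# a dot, at least one fractional digit and an optional exponent.
FLOAT_RE = re.compile(rf"[0-9]+{EXPONENT}|[0-9]*\.[0-9]+(?:{EXPONENT})?")


def classify_number(text):
    """Return "int", "float" or None for a candidate numeric literal."""
    if INT_RE.fullmatch(text):
        return "int"
    if FLOAT_RE.fullmatch(text):
        return "float"
    return None
```

Under this sketch, 42 is classified as an int, while 1e3, .5 and 1.5E-2 are classified as floats.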
String literals (str) are entered within double quotes. The surrounding quotes are not part of the string value. Literal double quotes can be entered in string literals by duplicating them.
str :: <"> (<any character except ", CR or LF> | <"> <">)* <">
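The quote-doubling rule can be illustrated with a small Python sketch that validates a string literal and computes its value. The names STR_RE and string_value are hypothetical and serve only as an example.

```python
import re

# An opening quote, then any mix of ordinary characters (anything but
# ", CR and LF) and doubled quotes, then a closing quote.
STR_RE = re.compile(r'"(?:[^"\r\n]|"")*"')


def string_value(literal):
    """Return the value denoted by a string literal given as source text."""
    if STR_RE.fullmatch(literal) is None:
        raise ValueError("not a valid string literal")
    # Drop the surrounding quotes, then collapse each doubled quote.
    return literal[1:-1].replace('""', '"')
```

For instance, the source text "say ""hi""" denotes the ten-character value say "hi".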
Various non-alphanumeric operator and punctuator tokens are defined:
opsym :: "+" | "-" | "*" | "/" | "**" | ":" | "==" | "!=" | "<" | ">" | ">=" | "<="
punct :: "(" | ")" | "[" | "]" | "," | "=" | "+=" | "-=" | "*=" | "/=" | "**=" | "::"
Newlines and semicolons can be used as statement separators (br). They are interchangeable. Repeated statement separators behave identically to a single statement separator.
br :: (newline | ";")+
newline :: <CR> <LF> | <LF> | <CR>
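The equivalence of newlines, semicolons and repeated separators can be shown with a deliberately naive Python sketch. It assumes the input contains no string literals or comments (which could themselves contain semicolons) and ignores the line joining rules; BR_RE and split_statements are hypothetical names.

```python
import re

# One br token: a run of CR LF, LF, CR or ";" in any combination.
BR_RE = re.compile(r"(?:\r\n|\n|\r|;)+")


def split_statements(source):
    """Split source text at statement separators, dropping empty pieces."""
    pieces = (piece.strip(" \t") for piece in BR_RE.split(source))
    return [piece for piece in pieces if piece]
```

With this sketch, the inputs "a = 1; b = 2" and "a = 1\n\n\nb = 2" both yield the same two statements.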
Whitespace and comments are ignored before and after tokens. Whitespace characters are optional, except between a token ending with an alphanumeric character and another token starting with an alphanumeric character, in which case they are required. Finally, there must be no whitespace characters before the initial-comment and utf8-bom tokens.
whitespace :: " " | <TAB>
comment :: "--" <any character except CR or LF>*
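A comment runs from -- to the end of the line, but the same two characters inside a string literal are part of the string. The Python sketch below illustrates one way to tell the two apart by scanning string literals and comments in a single pass; the names are hypothetical and the function handles one line at a time.

```python
import re

# Match either a complete string literal or a comment, whichever starts first.
STRING_OR_COMMENT = re.compile(r'"(?:[^"\r\n]|"")*"|--[^\r\n]*')


def strip_comment(line):
    """Remove a trailing comment from one source line, keeping strings intact."""
    def keep_strings(match):
        text = match.group(0)
        return text if text.startswith('"') else ""
    return STRING_OR_COMMENT.sub(keep_strings, line)
```

Any whitespace preceding the comment is left in place, which is harmless because whitespace is ignored between tokens.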
An initial source line starting with #! is interpreted as a comment:
initial-comment :: "#!" <any character except CR or LF>*
The special utf8-bom token may be present at the start of UTF-8 encoded files:
utf8-bom :: <EF> <BB> <BF>
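A tool that reads Alore source files might skip these two optional tokens before ordinary tokenization begins. The Python sketch below works on raw bytes and assumes that, when both are present, the utf8-bom token precedes the initial comment; this ordering and the name skip_prologue are assumptions, not statements of this reference.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def skip_prologue(data):
    """Return the offset of the first byte after an optional BOM and #! line.

    Neither token may be preceded by whitespace, so both are only looked
    for at the very start of the file (the #! line directly after the BOM).
    The line terminator that ends the #! line is not consumed.
    """
    pos = 0
    if data.startswith(UTF8_BOM):
        pos = len(UTF8_BOM)
    if data.startswith(b"#!", pos):
        while pos < len(data) and data[pos] not in (0x0D, 0x0A):
            pos += 1
    return pos
```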
Joining lines
Newlines after the following tokens are interpreted as whitespace, not as statement separators:
+ - * / ** div mod and or : to is == != < <= > >= ( [ = += -= *= /= **= ,
This can be used to divide long lines into multiple shorter lines.
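The joining rule can be sketched in Python as a check on the last token of a line. This is a simplified illustration with hypothetical names: it assumes comments have already been removed and that no line ends inside a string literal. The :: punctuator is handled separately because it ends with a colon but is not in the list above.

```python
import re

JOIN_SYMBOLS = ("+", "-", "*", "/", "**", ":", "==", "!=", "<", "<=", ">",
                ">=", "(", "[", "=", "+=", "-=", "*=", "/=", "**=", ",")
JOIN_WORDS = frozenset(["div", "mod", "and", "or", "to", "is"])


def continues(line):
    """Return True if a newline after this line acts as whitespace."""
    text = line.rstrip(" \t")
    word = re.search(r"[A-Za-z_0-9]+$", text)
    if word:
        # The line ends with a word: only the listed keywords join lines.
        return word.group(0) in JOIN_WORDS
    if text.endswith("::"):
        return False
    return text.endswith(JOIN_SYMBOLS)


def join_lines(lines):
    """Merge physical lines into logical lines according to continues()."""
    logical, pending = [], ""
    for line in lines:
        piece = line.strip(" \t")
        pending = pending + " " + piece if pending else piece
        if pending and not continues(pending):
            logical.append(pending)
            pending = ""
    if pending:
        logical.append(pending)
    return logical
```

For example, the three physical lines "total = a +", "b *" and "c" are merged into the single logical line total = a + b * c.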
Reserved words
The following words are reserved and cannot be used as identifiers (i.e. as names of global or local definitions, as member names or as module name components):
and bind break case class const def div dynamic elif else encoding end except finally for if implements import in interface is mod module nil not or private raise repeat return self super switch to try until var while
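A tool that generates or checks Alore identifiers might test candidate names against this list. The following Python sketch uses hypothetical names; since identifiers are case sensitive, the comparison is exact.

```python
RESERVED_WORDS = frozenset("""
    and bind break case class const def div dynamic elif else encoding end
    except finally for if implements import in interface is mod module nil
    not or private raise repeat return self super switch to try until var
    while
""".split())


def is_reserved(name):
    """Return True if name is a reserved word (case-sensitive comparison)."""
    return name in RESERVED_WORDS
```

Under this check, while is reserved, whereas While and whilst are not.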
Restricted names
Module name components and names of global definitions starting with two underscores (__) are reserved for internal use by the implementation. The implementation may freely define such names for any purpose. To remain portable across different Alore implementations, user programs should not depend on the presence or absence of such names.
Additionally, it is recommended that the following names not be used as the first component of a module name, since they are reserved for use in future releases of Alore:
compiler crypt email ftp ftpserver httpserver locale postgres process queue serialize smtp sqlite ssl stack timezone udp unicode xml xmltree
It is likely that some of these names will never be used in any future Alore release. Future Alore releases may remove some names from this list; these changes are retroactively applied to all earlier Alore versions as well.
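The two naming restrictions above could be checked mechanically, for example in a lint tool. The Python sketch below is an illustration with hypothetical names; it takes a module name as a list of components so that it makes no assumption about how the components are written in source code. The check for global definition names starting with two underscores is not included.

```python
FUTURE_FIRST_COMPONENTS = frozenset("""
    compiler crypt email ftp ftpserver httpserver locale postgres process
    queue serialize smtp sqlite ssl stack timezone udp unicode xml xmltree
""".split())


def module_name_warnings(components):
    """Return a list of warnings for a module name given as components."""
    warnings = []
    for part in components:
        if part.startswith("__"):
            warnings.append(part + " is reserved for the implementation")
    # Only the first component is checked against the future-use list.
    if components and components[0] in FUTURE_FIRST_COMPONENTS:
        warnings.append(components[0] + " may be used by a future release")
    return warnings
```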
Encoding
Alore source files may be encoded in ASCII, UTF-8 or ISO-8859-1 (Latin 1). See section Encoding declaration for information on specifying the encoding of a source file.
All 7-bit character codes except CR and LF (13 and 10, respectively) can be used in comments and string literals, including null characters, independent of the source file encoding. Double quotes, however, must be doubled within string literals.
In an ISO-8859-1 encoded source file, all character codes in the range from 128 to 255, inclusive, can be used in comments and string literals. Similarly, in a UTF-8 encoded source file, all valid UTF-8 sequences for code points between 128 and 65535, inclusive, can be used in comments and string literals.
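The character ranges above can be summarized in a short Python sketch. It is a simplification: it checks a whole byte sequence rather than only the contents of comments and string literals, it does not treat CR and LF specially, and the encoding labels "ascii", "latin1" and "utf8" as well as the function name are assumptions made for this example.

```python
def chars_allowed(data, encoding):
    """Return True if every character in data is permitted for encoding."""
    if encoding == "ascii":
        # Only 7-bit character codes are available.
        return all(byte < 128 for byte in data)
    if encoding == "latin1":
        # Every byte value from 0 to 255 is a character code.
        return True
    if encoding == "utf8":
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            return False  # not a valid UTF-8 sequence
        # Code points above 65535 are outside the permitted range.
        return all(ord(ch) <= 0xFFFF for ch in text)
    raise ValueError("unknown encoding: " + encoding)
```

Under this sketch, a UTF-8 encoded character such as U+20AC is accepted, while U+1F600, which lies above 65535, is rejected.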