Lexical structure

Tokens

Each source file is divided into tokens, starting from the beginning of the file.

Identifiers (id) are case sensitive. Some of them are reserved; see section Reserved words below.

id :: alpha alphanum*
alpha :: "a".."z" | "A".."Z" | "_"
alphanum :: alpha | digit
digit :: "0".."9"
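
As an illustration only, the identifier productions can be sketched as a regular expression in Python. The names ID_RE and is_identifier_shape are hypothetical and not part of Alore. The sketch treats the trailing alphanumeric characters as optional, so that a single letter such as x has the shape of an identifier, and it does not check for reserved words.

```python
import re

# alpha is a letter or underscore; alphanum additionally allows ASCII digits.
# Assumption: zero or more alphanum characters may follow the first alpha.
ID_RE = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def is_identifier_shape(text: str) -> bool:
    """Return True if text has the lexical shape of an id token.

    Reserved words are not excluded here; that is a separate check.
    """
    return ID_RE.fullmatch(text) is not None
```

Because matching is case sensitive, foo and Foo both have identifier shape but are distinct identifiers.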

Numeric literals (int and float) are entered in base 10. A floating point literal has a fractional part, separated with a dot, or an integer exponent, separated with the letter e or E, or both. If the exponent is present, the numeric value before the exponent is multiplied by 10**e, where e is the numeric value of the exponent.

int :: digit+
float :: digit+ exponent | digit* "." digit+ [ exponent ]
exponent :: ("e" | "E") ["+" | "-"] digit+
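
The numeric productions above can be sketched in Python as follows. This is a transcription of the grammar for illustration, not the implementation's scanner; the names INT_RE, FLOAT_RE and classify_number are hypothetical.

```python
import re

# exponent :: ("e" | "E") ["+" | "-"] digit+
EXPONENT = r"[eE][+-]?[0-9]+"

# int :: digit+
INT_RE = re.compile(r"[0-9]+")

# float :: digit+ exponent | digit* "." digit+ [ exponent ]
FLOAT_RE = re.compile(rf"[0-9]+{EXPONENT}|[0-9]*\.[0-9]+(?:{EXPONENT})?")


def classify_number(text: str) -> str:
    """Classify text as 'int', 'float' or 'invalid' per the productions."""
    if INT_RE.fullmatch(text):
        return "int"
    if FLOAT_RE.fullmatch(text):
        return "float"
    return "invalid"
```

Under these productions, 42 is an int, while 1e3, .5 and 1.5e-3 are floats. A literal such as 1. is rejected, because at least one digit must follow the dot.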

String literals (str) are entered within double quotes. The surrounding quotes are not part of the string value. Literal double quotes can be entered in string literals by duplicating them.

str :: <"> (<any character except ", CR or LF> | <"> <">)* <">
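
A minimal Python sketch of how the value of a string literal could be recovered from its source text is given below. The function name string_value is hypothetical; the sketch only follows the str production and the quote-doubling rule.

```python
import re

# str :: <"> (<any character except ", CR or LF> | <"> <">)* <">
STR_RE = re.compile(r'"(?:[^"\r\n]|"")*"')


def string_value(literal: str) -> str:
    """Return the value denoted by a string literal.

    The surrounding quotes are dropped and each doubled quote inside
    the literal stands for one literal double quote.
    """
    if STR_RE.fullmatch(literal) is None:
        raise ValueError("not a well-formed string literal")
    return literal[1:-1].replace('""', '"')
```

For example, the literal "say ""hi""" denotes the value say "hi", and the literal "" denotes the empty string.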

Various non-alphanumeric operator and punctuator tokens are defined:

opsym :: "+" | "-" | "*" | "/" | "**" | ":" | "==" | "!=" | "<" | ">" | ">=" | "<="
punct :: "(" | ")" | "[" | "]" | "," | "=" | "+=" | "-=" | "*=" | "/=" | "**=" | "::"

Newlines and semicolons can be used as statement separators (br). They are interchangeable. Repeated statement separators behave identically to a single statement separator.

br :: (newline | ";")+
newline :: <CR> <LF> | <LF> | <CR>
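
The equivalence of newlines and semicolons, and of repeated separators, can be illustrated with the following Python sketch. It is deliberately simplified: it does not recognize string literals, comments or joined lines, so a semicolon inside a string literal would be split incorrectly. The name split_statements is hypothetical.

```python
import re

# br :: (newline | ";")+  with  newline :: <CR> <LF> | <LF> | <CR>
BR_RE = re.compile(r"(?:\r\n|\n|\r|;)+")


def split_statements(source: str) -> list[str]:
    """Split source text at statement separators.

    A run of separators counts as one separator; pieces that contain
    only whitespace are dropped. Simplification: string literals and
    comments are not taken into account.
    """
    return [piece for piece in BR_RE.split(source) if piece.strip(" \t")]
```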

Whitespace and comments are ignored before and after tokens. Whitespace characters are optional, except between a token ending with an alphanumeric character and another token starting with an alphanumeric character, in which case they are required. Finally, there must be no whitespace characters before the initial-comment and utf8-bom tokens.

whitespace :: " " | <TAB>
comment :: "--" <any character except CR or LF>*
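
One way to locate a comment on a single source line is sketched below in Python. The sketch assumes that -- inside a string literal does not start a comment, and relies on the fact that a doubled quote toggles the in-string state twice. The name strip_comment is hypothetical.

```python
def strip_comment(line: str) -> str:
    """Return line without its trailing comment, if it has one.

    The line must not contain CR or LF. Assumption: "--" inside a
    string literal is part of the string, not the start of a comment.
    """
    in_string = False
    i = 0
    while i < len(line):
        if line[i] == '"':
            # A doubled quote flips the state twice, leaving it unchanged.
            in_string = not in_string
        elif not in_string and line.startswith("--", i):
            return line[:i]
        i += 1
    return line
```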

An initial source line starting with #! is interpreted as a comment:

initial-comment :: "#!" <any character except CR or LF>*

The special utf8-bom token may be present at the start of UTF-8 encoded files:

utf8-bom :: <EF> <BB> <BF>
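
The handling of these two special tokens at the start of a file can be sketched in Python as below. The sketch assumes that, when both are present, utf8-bom comes before initial-comment; the text above does not state the order. The names UTF8_BOM and skip_file_prefix are hypothetical.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def skip_file_prefix(data: bytes) -> int:
    """Return the offset of the first byte after the optional prefix.

    The prefix is an optional utf8-bom followed by an optional
    initial-comment (assumed order). No whitespace may precede either
    token, so a leading space disables both.
    """
    pos = 0
    if data.startswith(UTF8_BOM):
        pos = len(UTF8_BOM)
    if data.startswith(b"#!", pos):
        # The initial comment extends up to, but not including, CR or LF.
        while pos < len(data) and data[pos] not in (0x0D, 0x0A):
            pos += 1
    return pos
```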

Joining lines

Newlines after the following tokens are interpreted as whitespace, not as statement separators:

+ - * / ** div mod and or : to is == != < <= > >= ( [ = += -= *= /= **= ,

This can be used to divide long lines into multiple shorter lines.
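
The joining rule can be sketched in Python as a set lookup on the last token of a line. The names JOINING_TOKENS and newline_joins are hypothetical, and the sketch assumes the line has already been split into tokens.

```python
# Tokens after which a newline is treated as whitespace.
JOINING_TOKENS = frozenset(
    "+ - * / ** div mod and or : to is == != < <= > >= "
    "( [ = += -= *= /= **= ,".split()
)


def newline_joins(last_token: str) -> bool:
    """Return True if a newline after last_token does not end the statement."""
    return last_token in JOINING_TOKENS
```

For example, a line ending with + or with a comma continues on the next line, while a line ending with ) or with an identifier does not.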

Reserved words

The following words are reserved and cannot be used as identifiers (i.e. as names of global or local definitions, as member names or as module name components):

and
bind
break    
case     
class    
const    
def      
div
dynamic
elif     
else     
encoding 
end      
except
finally  
for      
if
implements
import   
in
interface
is       
mod      
module   
nil      
not      
or       
private  
raise   
repeat  
return  
self    
super   
switch  
to      
try     
until   
var     
while
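
A check against this list can be sketched in Python as follows. The names RESERVED_WORDS and is_reserved are hypothetical. The sketch assumes that reserved words are matched case sensitively, in the same way as identifiers.

```python
# The reserved words listed above, in the same order.
RESERVED_WORDS = frozenset("""
and bind break case class const def div dynamic elif else encoding end
except finally for if implements import in interface is mod module nil
not or private raise repeat return self super switch to try until var while
""".split())


def is_reserved(word: str) -> bool:
    """Return True if word is reserved and so cannot be an identifier."""
    # Assumption: the comparison is case sensitive.
    return word in RESERVED_WORDS
```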

Restricted names

Module name components and names of global definitions starting with two underscores (__) are reserved for internal use by the implementation. The implementation may freely define such names for any purpose. To remain portable across different Alore implementations, user programs should not depend on the presence or absence of such names.

Additionally, it is recommended that the following names not be used as the first component of a module name, since they are reserved for use in future releases of Alore:

compiler
crypt
email
ftp
ftpserver
httpserver
locale
postgres
process
queue
serialize
smtp
sqlite
ssl
stack
timezone
udp
unicode
xml
xmltree

It is likely that some of these names will never be used in any future Alore release. Future Alore releases may remove some names from this list; these changes are retroactively applied to all earlier Alore versions as well.
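
A tool that warns about restricted module names might look like the following Python sketch. The names FUTURE_MODULE_NAMES and check_module_name are hypothetical, and the module name is passed as a list of its components so that the sketch does not depend on how components are written in source code. Names of global definitions are not covered.

```python
# First components reserved for future releases, as listed above.
FUTURE_MODULE_NAMES = frozenset("""
compiler crypt email ftp ftpserver httpserver locale postgres process queue
serialize smtp sqlite ssl stack timezone udp unicode xml xmltree
""".split())


def check_module_name(components: list[str]) -> list[str]:
    """Return warnings for a module name given as a list of components."""
    warnings = []
    for component in components:
        if component.startswith("__"):
            warnings.append(component + ": reserved for the implementation")
    if components and components[0] in FUTURE_MODULE_NAMES:
        warnings.append(components[0] + ": reserved for future releases")
    return warnings
```

Only the first component is checked against the list of future names, so a name such as xml is flagged as a first component but not as a later one.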

Encoding

Alore source files may be encoded in ASCII, UTF-8 or ISO-8859-1 (Latin 1). See section Encoding declaration for information on specifying the encoding of a source file.

All 7-bit character codes except CR and LF (13 and 10, respectively) can be used in comments and string literals, including null characters, independent of the source file encoding. Double quotes, however, must be doubled within string literals.

In an ISO-8859-1 encoded source file, all character codes in range from 128 to 255, inclusive, can be used in comments and string literals. Similarly in a UTF-8 encoded source file, all valid UTF-8 sequences for code points between 128 and 65535, inclusive, can be used in comments and string literals.
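
The character ranges described in this section can be summarized with the following Python sketch. The function name and the encoding labels "ascii", "latin1" and "utf8" are hypothetical. The sketch works on code points after decoding and additionally excludes the surrogate range, on the assumption that surrogates do not form valid UTF-8 sequences. The doubling of double quotes in string literals is a separate rule and is not checked here.

```python
def allowed_in_comment_or_string(code_point: int, encoding: str) -> bool:
    """Return True if code_point may appear in a comment or string literal.

    encoding is one of "ascii", "latin1" or "utf8" (labels are an
    assumption of this sketch).
    """
    if code_point < 0 or code_point in (10, 13):
        # LF (10) and CR (13) end comments and cannot occur in strings.
        return False
    if code_point < 128:
        # All other 7-bit codes, including the null character, are allowed.
        return True
    if encoding == "latin1":
        return code_point <= 255
    if encoding == "utf8":
        # Assumption: surrogates are excluded as invalid UTF-8.
        return code_point <= 65535 and not 0xD800 <= code_point <= 0xDFFF
    return False
```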