Posts Tagged ‘syntax’

building a programming language, part 2

Friday, September 4th, 2009

Following up on my original post, here is version 1.0 “final” of the basic structure and syntax of the language. It is still unnamed, although someone suggested I should integrate ‘two’ and ‘hash’ in there. Oh well.

The spirit remains, but there have been a few changes in the language as a whole. The entire codeblock-as-type feature has been dropped until it can be further perfected syntactically, and I’ve taken a more pragmatic approach in general; things need to get done, after all.

Built-in types

boolean
A boolean type, true or false.

integer
Pretty obvious: integer numbers, 64-bit signed.

float
Less obvious, implemented as a C-style double.

string
A string type, most types can be cast to this.

list
Simple, LISP-style, linked-list.

array
Variable-length, numerically indexed arrays.

hash
Variable-length hashmap, /[[:alnum:]_]+?/ keys.

function
Standard first-class citizen functions.

class
First-class citizen classes.

dynamic
Special untyped variable that can contain a value of *any* type at any time. This is the default type.

Examples of primitive types

boolean b = true;
integer x = 5;
float j = 5.4E-4;
string s = 'test string';
list l = list(1, 2, 3, 4, 5);
array m = array('a', 'b', 'c', 'd', 'e');
hash z = hash(a: 1, b: 2, c: 3, d: 4, test_key: 666);
 
function f = function(a, b, c, d) uses (m) {
  sum(a, b, c, d, m[1]);
};
 
class c = class(
  private integer attr_name: 5,
  public constructor function myConstructor: function() {
    this.attr_name *= 10;
  }
);
dynamic y = 20;
p = 20;

A few side notes:

  • Keys in hash-maps are unquoted, because they are properly constrained.
  • Functions are always closures, but require explicit declaration of the enclosed variables.
  • Classes syntactically resemble hash-maps, but have modifier keywords, like private, protected, constructor and destructor.
  • The variables y and p are for all intents and purposes identical in content and type; untyped variables *are* dynamic variables.
  • All composite types (like lists, arrays and classes) have a constructor-function, which is an actual function that can be called with eval()-esque intent.

In-depth look into the composite types

Lists

Lists are singly-linked lists, and the functions list(), first() and rest() are the primitive operators on lists.

x = list(1, 2, 3, 4, 5);
first(x);
  => 1
rest(x);
  => list(3, 4, 5)

first() and rest() are modifying functions: they pop from the list whatever they return, so after the two calls above the list x contains only the value (2). Trying to pop from an empty list returns ‘null’. While-loops can be constructed around this, but lists are iterable, so the built-in foreach() will do just fine.
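The popping behaviour of first() and rest() can be mimicked in Python. This is only a rough sketch: the LinkedList class and its to_python() helper are hypothetical names, and returning an empty list from rest() on a single-element list is an assumption the post does not settle.

```python
class LinkedList:
    """Singly linked list whose first() and rest() remove what they return."""

    def __init__(self, *values):
        # Build the chain back to front; each node is a (value, next) pair.
        self._head = None
        for value in reversed(values):
            self._head = (value, self._head)

    def first(self):
        """Pop and return the head value, or None (the post's null) if empty."""
        if self._head is None:
            return None
        value, self._head = self._head
        return value

    def rest(self):
        """Pop and return everything after the head; the head itself stays."""
        if self._head is None:
            return None
        value, tail = self._head
        self._head = (value, None)
        result = LinkedList()
        result._head = tail
        return result

    def to_python(self):
        """Hypothetical helper: copy the values into a plain Python list."""
        out, node = [], self._head
        while node is not None:
            out.append(node[0])
            node = node[1]
        return out


x = LinkedList(1, 2, 3, 4, 5)
print(x.first())             # 1
print(x.rest().to_python())  # [3, 4, 5]
print(x.to_python())         # [2]
```

After both calls only the value 2 is left in x, matching the description above.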

foreach(list(1, 2, 3, 4, 5) as li) { ... }

Arrays

Arrays are numerically indexed, sparse data structures. You can have a value at index 1 and at index 5; nothing stops you from breaking sequence. By default an array can contain values of any type (so it’s implicitly an array of dynamic values), but this behaviour can be changed easily.

integer array a = array(1, 2, 3, 4, 5, 6, 7, 8, 9);
a[9] = 10;
a[1] = null;
a;
  => array(1, 3, 4, 5, 6, 7, 8, 9, 10);
 
a[1];
  => null;

Array a is explicitly an array of integer values, and so cannot be assigned anything else. Doing so will result in a run-time error. (Type-checking is always done at run time.)
Unsetting array indices is done by setting the value to null. As you can see when evaluating the array, index 1 no longer exists. Evaluating a nonexistent index of an array produces null as well. There is no difference between nonexistent indices and values set to null.
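To make these rules concrete, here is a small Python sketch of a sparse, optionally typed array. The SparseArray class, its element_type parameter and its values() method are hypothetical names chosen for illustration, and backing the array with a dictionary is an assumption, not the planned implementation.

```python
class SparseArray:
    """Sparse, numerically indexed array with an optional run-time element type."""

    def __init__(self, *values, element_type=None):
        self._element_type = element_type  # None means 'dynamic'
        self._items = {}                   # index -> value, gaps allowed
        for index, value in enumerate(values):
            self[index] = value

    def __setitem__(self, index, value):
        if value is None:
            # Assigning null unsets the index entirely.
            self._items.pop(index, None)
            return
        if self._element_type is not None and type(value) is not self._element_type:
            raise TypeError("expected " + self._element_type.__name__)
        self._items[index] = value

    def __getitem__(self, index):
        # A missing index and an index set to null are indistinguishable.
        return self._items.get(index)

    def values(self):
        """Return the stored values in index order, skipping unset indices."""
        return [self._items[i] for i in sorted(self._items)]


a = SparseArray(1, 2, 3, 4, 5, 6, 7, 8, 9, element_type=int)
a[9] = 10
a[1] = None
print(a.values())  # [1, 3, 4, 5, 6, 7, 8, 9, 10]
print(a[1])        # None
```

Assigning a string to this array raises a TypeError at run time, which corresponds to the run-time error described above.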

Hash-maps

Similar to arrays in syntax, hash-maps are variable in length and implicitly of type ‘dynamic’. This can be changed in the same way as for arrays.

integer hash m = hash(a: 1, b: 2, c: 3, d: 4);
x = map(function (val, key) { ... }, m);

This introduces the function map, a higher-order function provided by the language runtime, although it could easily be constructed from lower-level operations.
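As a sketch of how map could be built from lower-level operations, here is a Python version. The name hash_map is hypothetical, and returning a new hash with the same keys is an assumption; the post does not say what map returns when given a hash.

```python
def hash_map(func, mapping):
    """Call func(val, key) for every entry and collect the results per key."""
    return {key: func(value, key) for key, value in mapping.items()}


m = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
x = hash_map(lambda val, key: val * 10, m)
print(x)  # {'a': 10, 'b': 20, 'c': 30, 'd': 40}
```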

Functions

Being first-class citizens, functions get assigned to variables, and nothing else. All functions are closures, but require a list of variables to enclose. This keeps GC simpler and requires the developer to think about what he or she wants to enclose.

square = function(x) { return x^2; }; // f(x) = x^2;
 
deriv = function(f, dx) {
  return function(x) uses (f, dx) {
    return (f(x + dx) - f(x)) / dx;
  };
};
 
d_square = deriv(square, 0.001); // Approximates f'(x) = 2x;

This example shows the language being used to create a function that returns an approximation of the derivative of another function.
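For comparison, the same derivative example can be written in Python. This is a direct translation rather than part of the language spec; note that Python captures f and dx implicitly and uses ** for exponentiation. The inner name approx is just a label for this sketch.

```python
def square(x):
    return x ** 2  # f(x) = x^2


def deriv(f, dx):
    # The inner function closes over f and dx, like 'uses (f, dx)' above.
    def approx(x):
        return (f(x + dx) - f(x)) / dx
    return approx


d_square = deriv(square, 0.001)  # approximates f'(x) = 2x
print(d_square(3.0))             # roughly 6.001
```

The result is slightly above 6 because the forward difference of x^2 equals 2x + dx.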

Classes

Classes are a bit different from the usual OOP approach, and are halfway between normal OOP classes and prototype-based languages. Classes are first-class citizens, and can be modified at run time. The usual implied trust that a class will remain the same during its entire lifetime therefore no longer exists, although there are methods of regaining that trust.

p = class(
  private integer x,
  private integer y,
 
  public constructor function myConstructor: function() {
    this.x = 5; this.y = 5;
  },
 
  public function toString: function() {
    return "x: " ~ this.x ~ ", y: " ~ this.y;
  }
);
 
object = new p;
object.toString();
  => string "x: 5, y: 5"

Looks reasonable enough: create a class, make an object, execute a method, the usual. Inheritance also works mostly the same as in PHP and Java.

k = class() extends p; // Empty class extending class p.
 
z = interface(public function toList);
t = class(
  public function toList: function() {
    return list(this.x, this.y);
  }
) extends p implements z;

t is a class implementing interface z and extending class p. It’s a bit ugly with the modifiers at the bottom, but it works fine. Because classes are first-class citizens, and not the global structures we know from Java and C++, for example, you can hide classes in a context. This effectively uses the code-scope mechanism to implement something similar to namespaces: some classes can be accessed directly from the entire project, while others are internal to a function, a class or any other scope-separating block.
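Python happens to treat classes as ordinary values too, so the scoping idea can be sketched there. The function name make_point_string and the class Point are hypothetical and only serve as an illustration.

```python
def make_point_string():
    # Point only exists inside this function; code outside cannot name it.
    class Point:
        def __init__(self):
            self.x = 5
            self.y = 5

        def to_string(self):
            return "x: " + str(self.x) + ", y: " + str(self.y)

    return Point().to_string()


print(make_point_string())  # x: 5, y: 5
```

The class is created each time the function runs and is visible only within it, which gives a namespace-like effect from nothing more than ordinary scoping.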

The role/mixin/trait feature described in the first post about this programming language has been dropped until I get my head around it well enough to see the implications of adding it.

That about sums it up for the built-in types for version 1.0 of the language specification, though I’m sure there are flaws in here that need resolving before it can be implemented properly. Criticism, as always, is appreciated, save for spelling errors and such; I am already aware my English skill needs upgrading :-) For now, I can start working on the grammar/syntactical parser of the interpreter. After that comes a symbol table, and after that I’ll be sure to post an update here.

building a programming language, part 1

Monday, June 29th, 2009

As a follow-up from a school assignment (where I had to build a full parser for a very simple language) I decided to construct a language of my own, just for fun.

Originally, I was a C programmer, my part-time job entails PHP, and I have a fondness for languages like Scheme, Lua and Javascript. Drawing inspiration from all these languages, I have (for now) decided on a syntax like this:

Variables

int x = 42;
dynamic j = 'x';
y = 'x';

j and y are both of the same type, ‘dynamic’; in the case of y, this is implied. Dynamic variables can contain any type of content; all others are strictly typed.

Built-in types

int x = 42;
float j = 4.5;
string s = "hello world";
array l = (1, 2, 3, 4, 'a', 'b', 'c', 'd');
map m = [a: 1, b: 2, c: 3, d: 4];
code c = { a = 5; a++; doSomething(a); };
function f = function() {};
class c = class [];

Statements are first class citizens of the language, albeit not for the right reasons yet.

Functions

string s = "Amount ";
 
f = function(int x, function y, z) uses (s) {
  x += y(z);
  return s ~ x;
}

This example shows literal strings as a language construct, and first-class citizenship of functions. All functions are closures, but require (like in PHP) the variables captured by the closure to be listed explicitly. String concatenation is done through the ‘~’ operator.
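Python closures capture variables implicitly, but explicit capture can be approximated by passing the captured values to a factory function. The sketch below mirrors the example above; make_f is a hypothetical helper whose parameter list plays the role of uses (s).

```python
def make_f(s):
    # Only 's' is handed to the inner function, mirroring 'uses (s)'.
    def f(x, y, z):
        x += y(z)
        return s + str(x)  # '~' concatenation, with an explicit cast to string
    return f


f = make_f("Amount ")
print(f(40, lambda n: n * 2, 1))  # Amount 42
```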

Classes

surface = class [
  private x : 0,
  private int y,
 
  public constructor makeNew : function() {
    this->x = 5;
    this->y = 7;
  },
]

Classes, in my opinion, are more like hash-tables than functions, and the syntax reflects this. ‘constructor’ is a special keyword indicating that the method defined is the constructor of said object. As shown, the name of this method can be anything. I haven’t quite decided on overloading and its implications, so I’ve left it at that. As with functions, classes are first-class citizens of the language. That means they can be assigned to variables, copied and have operations performed on them. An example:

// Class with two attributes, implied public, of undefined type
a = class [ x, y ];
 
// 'b' is an empty class that extends 'a'
b = class [] extends a; 
 
c = interface [ public int x, public function getName ];
d = class []; // 'd' is an empty class
d implements c; // which now implements 'c'

Operators like ‘extends’ and ‘implements’ can be applied to a class during its entire lifetime. Instances of those classes made before the operations do not change with them.

/* Classes can be extended with anonymous classes,
 * same for interfaces.
 * Not really useful, but syntactically it's valid
 * sidenote: newlines are regarded as whitespace */
a = class [ x, y ] extends class [ 
     function getName() { return 'name'; }, int c ]
     implements interface [ function getType ];
 
x = class [ function getThing(a, b) { return a + b; } ];
 
a has x;     // 'a' now has a method 'getThing'
b extends a; // but 'b' does not.

This is a trick I picked up from Perl 6: class traits. I believe this will be very beneficial in bridging traditional classes and prototype-based languages by allowing class extending without adding to the inheritance chain.
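The idea of extending a class without touching its inheritance chain can be approximated in Python by copying methods from one class onto another. The has() helper below is hypothetical and only loosely models the ‘has’ operator. Unlike in the example above, Python subclasses of A would also see the copied method.

```python
def has(target, trait):
    """Copy the trait's public methods onto target, leaving its bases alone."""
    for name, member in vars(trait).items():
        if callable(member) and not name.startswith('_'):
            setattr(target, name, member)


class A:
    pass


class X:
    def get_thing(self, a, b):
        return a + b


has(A, X)
print(A().get_thing(2, 3))  # 5
print(X in A.__mro__)       # False
```

The second line of output shows that X never becomes an ancestor of A, which is the point of the trait approach.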

Operators

/* All common symbolic operators (+, -, &&, extends) have
 * function-like counterparts. Internally, the symbolic ops
 * are just syntactic sugar for the function-like ops */
print(2 + 2);
print(add(2, 2));
 
print(true && false);
print(and(true, false));
 
// Variable amount of parameters when it makes sense
print(and(true, true, true, true, true, true, false));
print(subtract(40, 3, 4, 6, 2, 13));
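A Python sketch of such function-like counterparts, including the variable number of parameters. The names add, subtract and and_ are chosen for this illustration (and is a reserved word in Python), and folding subtract from left to right is an assumption about the intended semantics.

```python
from functools import reduce
import operator


def add(first, *rest):
    """Sum all arguments, left to right."""
    return reduce(operator.add, rest, first)


def subtract(first, *rest):
    """Subtract every following argument from the first one."""
    return reduce(operator.sub, rest, first)


def and_(*args):
    """True only if every argument is truthy."""
    return all(args)


print(add(2, 2))                                         # 4
print(and_(True, False))                                 # False
print(and_(True, True, True, True, True, True, False))   # False
print(subtract(40, 3, 4, 6, 2, 13))                      # 12
```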

The function keyword gets a function-like counterpart as well:

/* Symbol quoting stolen from Lisp,
 * symbols a, b and c are not evaluated */
function a = _function(('a, 'b, 'c), (x, y, z), {
  return add(a, b, c) * subtract(x, y, z); });

As you can see, this is exceedingly ugly. Neither concise nor elegant, the syntax has grown to represent what I want to avoid in general. Either I find a way to resolve this syntactically, or I will have to remove the operators-as-functions feature from the language; I haven’t decided yet. The entire concept of code as a first-class citizen of the language (like in Smalltalk) was created out of necessity to make this work.

Conclusion (for now, anyway)

Although I have started on a parser for the basic language, features like operators-as-functions and first-class citizenship of codeblocks need to be resolved so that they add to the whole of the language instead of looking like I just cobbled it all together. Also, I still need to name it… ‘project cwx’ sounds increasingly cheesy.