I’ve had some free time lately, and for the past few days I’ve had a project to keep me occupied. It all started in a group chat when someone brought up the idea of building his own programming language in which as much as possible is symbols, primarily various types of brackets. Every command is in some sort of brackets (sort of like Lisp), and everything not in some sort of bracket is a comment. An example program (provided by the author) is this:
To break it down, the program does the following:
(#A) reads A from standard in and requires A be a number
&(A>12) represents an if statement, where the condition is A>12
*(A-=3) means that if the condition (A>12) is true, then subtract 3 from A
!(A+=2) means that if the condition is false (i.e. A <= 12), then add 2 to A
(A>>) means print A to standard out
This inspired me to build my own programming language. I was trying to figure out what the most ridiculous way to draw out control structures was, and after thinking for a while, I decided to require the programmer to use text to represent the control flow of the program by drawing everything with arrows. Thus, I called the language Arrow (mostly because that was the first name I could think of that I couldn’t find an existing language for). It is available on my GitHub for anyone who wants to read more about it or maybe try using it. You can look in “primesieve.arw” and “string.arw” for some examples of functions. There are two basic control structures: if and loop.
    /--< false condition
    |   body
    |   more body
    \-->
In that case, if the given condition holds, then the body is skipped over, as indicated by the arrow. In other words, the condition you write is the case in which a conventional if would be false. Here’s a basic example:
    bool even
    even = true
    /--< x % 2 == 0
    |   even = false
    \-->
In that case, the variable “even” ends up true exactly when x is even. How this works is that if x modulo 2 is zero (i.e. x is even), then the body of the if statement is skipped and even retains its original value of true. However, if x is odd, then the body of the if statement is not skipped and even is set to false.
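For readers more used to conventional languages, here is a rough Python equivalent of that snippet. It is only a sketch of the semantics described above, not anything taken from the Arrow implementation, and the function name `is_even` is made up for illustration.

```python
def is_even(x: int) -> bool:
    even = True
    # Arrow's "/--< cond" jumps past the body when cond holds,
    # so the body runs only when the condition is false.
    if not (x % 2 == 0):
        even = False
    return even
```

The `not` is the part that tends to trip people up: the condition written next to the arrow is the one that skips the body.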
    /-->
    |   loop body
    |   more loop body
    \--< continue condition
The loop functions as a standard do-while. I couldn’t find a way to satisfactorily represent a standard while with arrows, so I gave up and just did do-while (you can always skip over it with an if statement if the condition is false if you want a real while loop). The arrow indicates that if the continue condition is true, go back to the top. A basic example of this:
    int counter
    counter = 0
    int base
    base = 2
    /-->
    |   base = base * 2
    |   counter = counter + 1
    \--< counter < 5
That code multiplies base by two on each iteration, looping until the counter reaches five. Because the body runs five times and base starts at 2, base ends at 64 (two to the sixth power).
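Python has no do-while, so a rough equivalent needs an explicit break. The sketch below mirrors the Arrow loop above under that assumption; the wrapper function `run_loop` is hypothetical and exists only so the result can be returned.

```python
def run_loop():
    counter = 0
    base = 2
    while True:
        # The body always runs at least once, as in a do-while.
        base = base * 2
        counter = counter + 1
        # "\--< counter < 5" goes back to the top while the condition holds.
        if not (counter < 5):
            break
    return base, counter


print(run_loop())  # prints (64, 5)
```

The body runs five times, so base is doubled five times starting from 2.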
I also couldn’t be satisfied with a programming language without adding functions. Functions also use a syntax that involves visualizing the control flow with text. An important nuance here is that functions must only have one return at the very end. Also, functions cannot be void (though I may implement that in a future update if time permits, and you can ignore the return value of a function). A basic example:
    function
    /--> int five()
    |
    ^ 5
As you might imagine, that’s a function that returns five. The key idea here is how the return is specified (and code body goes above the return). The return (^) points back towards the function name. An important thing to note here is that you can only return from a function in one place (which must be the last instruction in a function body). Similar to how there are no break/continue statements, this requires the control flow to be visually obvious and have no jumping around, keeping the flow visible with the arrows.
Putting this together
With these frameworks, programs can be put together. A simple example of a loop inside a function is a function that calculates an integer square root (or the ceiling if the number is not a perfect square):
    function
    /--> int squareRoot(int x)
    |   int candidate
    |   candidate = 0
    |   /--> //repeat the loop until the candidate squared is greater than or equal to the input
    |   |   candidate = candidate + 1
    |   \--< candidate * candidate < x
    ^ candidate
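To check what that function returns for a few inputs, here is a rough Python translation. It is a sketch that follows the Arrow code line by line (the name `square_root` is just the Python spelling), not output from the Arrow interpreter.

```python
def square_root(x: int) -> int:
    candidate = 0
    while True:
        # do-while body: always runs at least once
        candidate = candidate + 1
        # continue while candidate squared is still below the input
        if not (candidate * candidate < x):
            break
    return candidate


print(square_root(9), square_root(10))  # prints 3 4
```

One consequence of the do-while shape is that the body runs before the first check, so an input of 0 yields 1 rather than 0.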
You can get more complicated, since control structures can be nested. To make this more fun, here’s a function that detects if a number is prime:
    function
    /--> bool isPrime(int x)
    |   bool result
    |   result = true
    |   /--< x < 3
    |   |   int factor
    |   |   int ceiling
    |   |   ceiling = squareRoot(x)
    |   |   factor = 2
    |   |   /-->
    |   |   |   /--< x % factor != 0
    |   |   |   |   result = false //if the factor divides the input, then the input is not prime
    |   |   |   \-->
    |   |   |   factor = factor + 1
    |   |   \--< factor < ceiling + 1 and result //test the ceiling itself too (otherwise 9 would pass), and bail out as soon as we see a factor
    |   \-->
    ^ result
That function (which uses the square root function from earlier) loops from 2 to the square root of a number and sees if any of those values divide the number. If so, it returns false, and if not, returns true.
In the process of developing this, I built a fairly powerful lexer framework that can, given a language spec built with objects, perform the lexical analysis to generate tokens automatically. It’s definitely not the most efficient tool (it uses greedy matching and a ton of string operations), but it was a good exercise in object-oriented design and writing reusable tools. The basic building blocks are as follows, and everything except “fixed string” recursively nests on other token types.
- Fixed string
- Fixed sequence
- Multiple options
- Repetition (of another token type, with a minimum count)
Given rules using these elements (as well as a few other rules on how to actually get a useful result, such as what type to attach to each token and when to combine underlying tokens), the lexer can generate the tokens for a program. A few simple examples:
- Digit is multiple options of the fixed strings “0”, “1”, … , “9”
- Unsigned integer is repeated digit at least one time
- Identifier is sequence of letter followed by repeated alphanumerics zero or more times
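To make the idea concrete, here is a minimal Python sketch of rules built from nested objects with greedy matching. It is an illustration under my own assumptions, not the actual lexer: the class names (`Fixed`, `Seq`, `Options`, `Repeat`) and the `match` method are hypothetical, it only reports where a match ends rather than producing typed tokens, and `letter` covers lowercase ASCII only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


class Rule:
    """match(text, pos) returns the end index of a match, or None."""

    def match(self, text: str, pos: int) -> Optional[int]:
        raise NotImplementedError


@dataclass
class Fixed(Rule):
    literal: str

    def match(self, text, pos):
        if text.startswith(self.literal, pos):
            return pos + len(self.literal)
        return None


@dataclass
class Seq(Rule):
    parts: Sequence[Rule]

    def match(self, text, pos):
        # Every part must match, one after the other.
        for part in self.parts:
            pos = part.match(text, pos)
            if pos is None:
                return None
        return pos


@dataclass
class Options(Rule):
    choices: Sequence[Rule]

    def match(self, text, pos):
        # Greedy: keep the longest match among the alternatives.
        ends = [c.match(text, pos) for c in self.choices]
        ends = [e for e in ends if e is not None]
        return max(ends) if ends else None


@dataclass
class Repeat(Rule):
    inner: Rule
    minimum: int = 0

    def match(self, text, pos):
        count = 0
        while True:
            end = self.inner.match(text, pos)
            if end is None or end == pos:  # stop on failure or empty match
                break
            pos, count = end, count + 1
        return pos if count >= self.minimum else None


digit = Options([Fixed(c) for c in "0123456789"])
letter = Options([Fixed(c) for c in "abcdefghijklmnopqrstuvwxyz"])
unsigned_integer = Repeat(digit, minimum=1)
identifier = Seq([letter, Repeat(Options([letter, digit]))])

print(unsigned_integer.match("123abc", 0))  # prints 3
print(identifier.match("x1y2 = 5", 0))      # prints 4
```

Everything except `Fixed` holds other rules, which is what lets a small set of building blocks describe a whole token grammar.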
Given that I am a very strong believer in strong typing, I created a type system for Arrow. The three base types are int, char, and bool, and arrays of any dimension can be created from those types. Everything is copied/passed by value (and arrays are deep copied). Strings are just arrays of characters, with the length representing the length of the string. Every variable starts out uninitialized, and attempts to read an uninitialized variable will result in an error. Arrays also can be either initialized or uninitialized. To initialize an array, dimensions must be specified in the declaration (for example, declaring an int array x with a dimension of five would initialize an array of length five). Once an array is initialized, each cell is an uninitialized value.
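As a rough illustration of those rules, here is a small Python sketch of cells that raise an error when read before being written, plus deep copying for pass-by-value. The names `Cell`, `UninitializedError`, `make_array`, and `pass_by_value` are all hypothetical; this shows one way the behavior could be modeled, not how the Arrow interpreter actually stores values.

```python
import copy


class UninitializedError(Exception):
    """Raised when a value is read before it has been written."""


class Cell:
    def __init__(self):
        self.initialized = False
        self.value = None

    def set(self, value):
        self.value = value
        self.initialized = True

    def get(self):
        if not self.initialized:
            raise UninitializedError("read of uninitialized value")
        return self.value


def make_array(length: int):
    # The array itself is initialized, but each cell is not yet.
    return [Cell() for _ in range(length)]


def pass_by_value(array):
    # Deep copy, so writes by the callee never reach the caller's cells.
    return copy.deepcopy(array)


original = make_array(5)
original[0].set(7)
duplicate = pass_by_value(original)
duplicate[0].set(9)
print(original[0].get(), duplicate[0].get())  # prints 7 9
```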
Strings are also a part of the language, with a string being an array of chars. C uses null-terminated strings because a C array is just a pointer to the head of the array. Arrays are a well-defined type in Arrow, so the length of the array determines the length of the string.
Scoping and static semantics
Using mostly what I learned in Programming Language Concepts class, I was able to implement static semantics fairly easily. I designed the call stack (and the static symbol stack) to have both a static link and a dynamic link, with the static link being used for scoping and dynamic link being used for a return location. I also was able to implement static type synthesis on the parse tree using techniques I learned in that class. Currently there are no global variables, but with how the scoping system is implemented I could add those with minimal effort.
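For anyone who has not seen static and dynamic links before, here is a minimal Python sketch of the general idea. It is not the Arrow implementation: the `Frame` class, its field names, and the choice to give an if-body its own frame are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Frame:
    name: str
    static_link: Optional["Frame"] = None   # enclosing scope, used to resolve names
    dynamic_link: Optional["Frame"] = None  # where control came from, used on return
    variables: Dict[str, int] = field(default_factory=dict)

    def lookup(self, identifier: str) -> int:
        # Name resolution walks the static chain only, never the dynamic one.
        frame = self
        while frame is not None:
            if identifier in frame.variables:
                return frame.variables[identifier]
            frame = frame.static_link
        raise NameError(identifier)


# isPrime's function frame, with a nested scope for the body of its if.
is_prime = Frame("isPrime", variables={"x": 9})
if_body = Frame("if-body", static_link=is_prime, dynamic_link=is_prime,
                variables={"factor": 2})

# squareRoot is called from inside the if body: it returns there (dynamic
# link) but cannot see the caller's variables (no static link to it).
square_root = Frame("squareRoot", static_link=None, dynamic_link=if_body,
                    variables={"x": 9, "candidate": 0})

print(if_body.lookup("x"))            # prints 9
print(square_root.dynamic_link.name)  # prints if-body
```

In this model, adding global variables would amount to pointing each function frame's static link at one shared outermost frame.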
Though it almost came as an afterthought, input/output were added to the language. Output is pretty simple, using the print keyword (note that it’s not a function, but instead a language construct). Multiple things can be printed by separating them with commas. Input is a bit more tricky, and currently input can only be read as integers or strings. To receive input, use the input keyword, then specify a type (for example, “input int” or “input char”). Input commands can be used in expressions like a regular function call, but the argument given is a type and not a variable.
The overall high-level design was taken from what I learned in Compiler Design class, in particular having a separate lexer, parser, parse tree, and backend (interpreter).
I may have accidentally created something useful
I intended the Arrow language mostly to be a joke as well as maybe something to keep me occupied. However, as I was developing it I realized that it may actually have some real value. Certainly it’s not a great tool for developing real-world applications, but the visualized control flow may be a useful tool for teaching control structures or helping visualize algorithms. The control flow graph is almost built into your source code here. The lack of any GOTO-like statements (throw, early return, break, continue) also means that the control flow has to be painfully obvious and can only be expressed with control structures that keep it directly visible.
If anyone creates something useful, I’d love to see it!