SILVER

Developing a High-Level, Human-Readable Computer Language

I created a computer language that aims to be easy to write and understand, relying only on a person's English language skills and basic problem solving. It has applications both in education and for people who are not proficient in programming.
Kevin Paul
Grade 11

Problem

Learning Curve Difficulty

Modern computer languages can be hard to navigate and even harder to learn. The learning curve consists of both learning novel syntax and learning to think like a computer. Solving problems with a computer is fundamentally different from everyday problem solving, and for those who don't already have strong mathematical or scientific backgrounds, that can be a challenge to understand. This difficulty is exacerbated by the cryptic syntax that even scripting languages (which a programmer would consider "trivial") can have. It is for these reasons that many people are turned away from programming, even though, had they taken the time to learn it, they could have been talented problem solvers, engineers, or entrepreneurs.

 

Convenience

Many scientists, mathematicians, or those with an idea or solution in their head need a way to express it. For anyone who has learned a computer language, the way to express it is obvious: write a program. But those who never learned to program, or who struggle with computers, can be turned away from a powerful tool for trivial reasons. There is therefore a disconnect between people with novel ideas and the tools they need to bring them to fruition.

If these individuals had a language that 1) they could easily understand, 2) they could easily write, and 3) offered rapid prototyping and easy debugging, they could bring their ideas to life. Instead of struggling for days to debug a cryptic language like C++, or spending hours reading the documentation for Python libraries, they could jump into a text editor, quickly write high-level imperative code, and get what's in their head out faster.

 

Education

Currently, teaching a large group of students who have never seen programming before is difficult. Students may lose interest quickly because of all the overhead and boilerplate they have to wrap their heads around. Without any of that, students might quickly become interested once they realize the power that knowing how to code holds. Students learn best when what you're teaching them is right in their hands. If they have an idea for a project, they're free to explore programming on their own time and really develop an interest. I am convinced that there are many students who find their Computer Science class a chore, when they would otherwise be extremely talented programmers had they been introduced to it in the right way.

All of that relies on having an easy-to-read, easy-to-write, and fast computer language. So I built one!

 

 

Method

What I Built

In order to create an abstract language that a computer can execute, we need to build an interpreter. So, that's what I did.

I used a tool called ANTLR and the C# language to create a grammar, a lexer, and a parser: all the ingredients for an interpreter. Creating a language like this dips not only into the field of Computer Science, but also into Linguistics. Creating a "grammar" basically means specifying how lines of code should be constructed syntactically. Essentially, it defines the formula for creating a valid sentence in the language.

Here is a part of the file that defines the grammar of the language. Note that the grammar is very long and complicated in its entirety:

Now, this file might look confusing to some; however, all one needs to know is that it is an exact specification of valid syntactic construction in my language. This is unusual because, in a regular spoken language, there is often no way to concisely express all valid constructions. There are always exceptions, rules that follow intuition rather than logic, and totally abstract ideas. However, since we're dealing with a computer and not humans, we have to be as precise as possible.
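To make the idea of a grammar concrete, here is a minimal sketch written in Python rather than ANTLR and C#. It is not taken from the project's grammar file: the single rule and every name in it (TOKEN_PATTERN, tokenize, parse_assignment) are hypothetical and far simpler than the real thing. It only shows how one rule can accept some lines and reject others.

```python
import re

# Hypothetical token kinds for a toy rule; this is not the project's grammar.
TOKEN_PATTERN = re.compile(
    r"\s*(?:(?P<NUMBER>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<OP>[=+]))"
)


def tokenize(line):
    """Lexer: split a line into (kind, text) pairs, or raise on bad input."""
    tokens = []
    position = 0
    line = line.rstrip()
    while position < len(line):
        match = TOKEN_PATTERN.match(line, position)
        if match is None:
            raise SyntaxError(f"unexpected character at column {position}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        position = match.end()
    return tokens


def parse_assignment(line):
    """Parser for the toy rule: assignment := NAME '=' NUMBER ('+' NUMBER)*

    Returns (variable_name, value) if the line follows the rule.
    """
    tokens = tokenize(line)
    if len(tokens) < 3 or tokens[0][0] != "NAME" or tokens[1] != ("OP", "="):
        raise SyntaxError("expected: NAME = NUMBER")
    total = 0
    expect_number = True
    for kind, text in tokens[2:]:
        if expect_number and kind == "NUMBER":
            total += int(text)
        elif not expect_number and (kind, text) == ("OP", "+"):
            pass  # a '+' between two numbers is allowed by the rule
        else:
            raise SyntaxError(f"unexpected token {text!r}")
        expect_number = not expect_number
    if expect_number:
        raise SyntaxError("line ends with a dangling '+'")
    return tokens[0][1], total


print(parse_assignment("sum = 1 + 2 + 3"))  # ('sum', 6)
```

A line such as "sum = 1 +" does not match the rule, so the sketch raises a SyntaxError instead of guessing what was meant. That precision is what the grammar file provides for the whole language.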

Novel Features

This language breaks almost every rule of conventional language creation. Why? Because I think these rules limit the innovation in this space.

For example, a very well-known rule in language design is that it is important to have only one way to express something. Having more than one way is generally considered bad practice because it can cause ambiguity or confusion. However, I think such a convention should be disregarded.

So, this language supports something I'm calling "aliasing", which will be explained in detail further down in this document.

Why did I choose the C# language?

I chose C# mainly because it's my language of choice. I also chose it because, similar to my language, development in it is rapid. However, this does present a potential drawback: speed. C# is fairly well-optimized; however, the overhead of parsing and interpreting limits my language's speed. In the future, if speed becomes a priority, I could transition to a typically faster alternative such as C++, which ANTLR also supports, or Rust.

What Does the Language Support?

Currently, the language is not yet complete; however, it supports:

  • Function/Coroutine creation
  • Loops
    • For loop (including novel syntactic variations of it)
      • ' for x in [1, 3, 5, 7, 9] ' ...
    • While/Until loop
      • ' until condition is true ' ...  
        ' while condition is true ' ...
    • Repeat loop
      • ' do 1000 times ' ...
    • From loop
      • ' from 0 to 1000 use variable ' ...
  • If / Else Statements
  • Primitive error handling & traceback
  • Custom Number object that can support much larger integers than the fixed-width integer types of languages like C
  • Variable assignment
    • Object types include:
      • String
      • List
      • Boolean
      • Number
  • Number & Boolean Comparison
  • String, Number, and List mathematics
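As an illustration of how a number type can go beyond a fixed-width integer, here is a minimal sketch in Python. It is not the project's Number object, whose implementation is not shown in this document; the function name add_big and the choice to store numbers as decimal strings are assumptions made for the example. It adds two numbers digit by digit, the way one would on paper, so the size of the result is limited only by memory.

```python
def add_big(a, b):
    """Add two non-negative integers given as decimal strings."""
    digits = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    # Walk both numbers from the last digit to the first, carrying as needed.
    while i >= 0 or j >= 0 or carry:
        digit_a = int(a[i]) if i >= 0 else 0
        digit_b = int(b[j]) if j >= 0 else 0
        carry, digit = divmod(digit_a + digit_b + carry, 10)
        digits.append(str(digit))
        i -= 1
        j -= 1
    return "".join(reversed(digits))


# 18446744073709551615 is the largest value of an unsigned 64-bit integer.
print(add_big("18446744073709551615", "1"))  # 18446744073709551616
```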

To give a sense of its current scope, the language is suitable for basic programs like mathematical calculations, data manipulation on a large scale, text input and output, and algorithms like ciphers with loops.

Some Example Code

Say, for instance, we took this simple C program, which just outputs the words "Hello World!" to the console.

#include <stdio.h>

int main(void)
{
    printf("Hello World!\n");
}

And say we wrote that same program in my language:

say <- "Hello World!"

That's it; that's all you need. That's not too impressive, however, so let's do something more complex.

Take this C program, which will iterate from 1 to 100, raise each number to the power of 2, and print the sum.

#include <math.h>
#include <stdio.h>

int main()
{
    int sum = 0;
    for (int i = 1; i <= 100; i++)
    {
        sum += pow(i, 2);
    }
    printf("%d", sum);
}

If you're not a programmer, this is essentially meaningless garbage which, somehow, when put through a computer, spits out 338350. If you are not a programmer yourself and don't understand what this program is trying to do, you're in the same boat as a lot of people! Even people who have tried learning a computer language but have given up might have a hard time understanding such a program.

Now, the same program but in my language would look like this:

sum = 0
using every i from 1 to 100
    sum = sum + i to the power of 2
end

print <- sum

Easier to read, right? It doesn't need any boilerplate like curly braces, 'int main()', or semicolons, and the syntax is a lot less cryptic.

This type of construction is called a "for loop" and is one of the most important constructions in computer programming. What it does is define a variable, step it from a start value to an end value, and run the code inside the loop once for each value. It is essentially like sigma notation in mathematics.
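For readers who want to check the number, the same loop can be sketched in Python. The function name sum_of_squares is made up for this example. The loop is compared against the well-known closed form for the sum of i^2 from i = 1 to n, which is n(n + 1)(2n + 1) / 6.

```python
def sum_of_squares(start, end):
    """Add up i squared for every whole number i from start to end, inclusive."""
    total = 0
    for i in range(start, end + 1):  # range() stops before its end, so add 1
        total = total + i ** 2
    return total


n = 100
closed_form = n * (n + 1) * (2 * n + 1) // 6  # sigma formula for squares

print(sum_of_squares(1, n))                 # 338350
print(sum_of_squares(1, n) == closed_form)  # True
```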

Target Audience

As mentioned previously, I thought that having more than one way to express your code would be a useful feature for this language. Most languages (either intentionally or unintentionally) have a "target audience". For example, Java and C++ are geared towards programmers who are looking to build something robust, fast, and portable. They allow extremely concise code, but with the drawback of being cryptic as a result. Languages like Pascal, Python, or BASIC have very easy-to-understand syntax and are therefore very often used as first languages for beginners to learn. These days, it's mostly Python that is taught, and Pascal and BASIC are not really used in modern software.

Because Python is so widely understood, we see it appear in a lot of modern software; however, it doesn't scale well in certain applications. Python is really good for web servers thanks to those who have put time into developing fast frameworks. But in GUI applications, game development, and embedded software, Python's overhead starts to get the better of it, and noticeable drops in speed will be seen.

So, what does this mean for my language? Well, I'm nowhere near experienced enough to optimize such a high-level language to the point where it is feasible in the kinds of applications where even Python slows down. However, I can tackle the other problem, syntax, through aliasing.

Aliasing

You may have noticed that in the Example code section I used the function 'print' to give the output to the console. However, in the prior example I actually invoked the function called 'say'. These two are functionally identical; they're two aliases for the same function. In fact, I have given three aliases to this one function: `print`, `say`, and `output`.
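One simple way an interpreter can give several names to one function is a lookup table in which every alias points at the same function object. The sketch below is in Python and is only an assumption about how this could be done; the names write_line, BUILTINS, and call_builtin are hypothetical and do not come from the project's C# code.

```python
def write_line(text):
    """The single underlying operation: show text on the console."""
    print(text)
    return text


# Hypothetical lookup table: three names, one function object.
BUILTINS = {
    "print": write_line,
    "say": write_line,
    "output": write_line,
}


def call_builtin(name, argument):
    """Look up a function by whichever alias was typed, then call it."""
    if name not in BUILTINS:
        raise NameError(f"unknown function: {name}")
    return BUILTINS[name](argument)


call_builtin("say", "Hello World!")     # prints Hello World!
call_builtin("output", "Hello World!")  # prints Hello World!
```

Because all three entries refer to the same function, there is nothing to keep in sync: changing write_line changes every alias at once.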

This allows those who are less versed in programming, or those who just want to write more comprehensible code, to choose more verbose options. In fact, few mainstream programming languages have a function called `say` (Perl and Raku are exceptions), but it does describe exactly what the function does, right? Languages will often opt for more technical terms like `print` (which means "print to the console", as any competent programmer would know, but what if a beginner thought this function would make their printer spin up and print something?).

There are a lot of aliases available in this language:

A screenshot of the "Aliases.yml" file in the language implementation.

This is the file that defines what the valid aliases are. It can be edited very easily, just by adding a line with the desired alias.

As you can see, when I raised a variable to the power of two in the earlier example code, this is the file that told my language that "to the power of" was a valid construction. The classic caret ` ^ ` symbol is also available for those who want to save some typing. Aliases can be used interchangeably in the code, and one can use as many different aliases as one wants within one file.

There is also some alternative syntax not defined in this file. For example, in the example code, we invoked a function with the arrow `<-` operator, something I invented. However, the classic way to invoke a function, with parentheses `()`, is also valid. So,

say("Hello!")
say <- "Hello!"

are functionally equivalent.
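The relationship between the two forms can be sketched as a small text rewrite in Python. This is an illustration only, not how the interpreter handles the arrow; the function name arrows_to_calls is hypothetical, and the sketch assumes that `<-` never appears inside a string on the line. Chained arrows are nested from right to left, matching examples such as `print <- isPalindrome <- 1553551` in the code examples section.

```python
def arrows_to_calls(line):
    """Rewrite 'f <- g <- x' as 'f(g(x))'.

    Assumes '<-' never appears inside a string literal on the line.
    """
    parts = [part.strip() for part in line.split("<-")]
    expression = parts[-1]  # the rightmost part is the argument
    # Wrap the argument in each function name, working right to left.
    for function_name in reversed(parts[:-1]):
        expression = f"{function_name}({expression})"
    return expression


print(arrows_to_calls('say <- "Hello!"'))                   # say("Hello!")
print(arrows_to_calls("print <- isPalindrome <- 1553551"))  # print(isPalindrome(1553551))
```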

I think that this sort of functionality is extremely powerful in a language, and widens its target audience. If you're teaching someone to code, you can use extremely verbose syntax that is easier to understand. However, if you're a versed programmer, you can use the classic syntax you'd see in Python or Java, as most of it is supported by the language.

 

Analysis

Languages Other than English

Designing a language that is meant to "read like English" leaves out a huge portion of people who don't speak English, or aren't proficient enough to learn programming and English at once. This, in fact, is a problem all over the field of computer science, as most programming languages take their keywords from only one natural language: English.

However, the file that specifies keyword aliasing presents a unique opportunity. Instead of writing keywords in English, we can replace all of them with keywords in French, Italian, Spanish, German, or a handful of other languages that share similar grammatical structures with English. I say that because the alias file controls only the keywords, not the grammatical structure. So, in order to support a language like Turkish, Russian, or Mandarin Chinese, we'd also have to change the grammar file, reshuffling the order of keywords.

 

More Code Examples

Palindrome checking function

function isPalindrome(phrase)
    return (phrase as string).reverse() == phrase as string
end

print <- isPalindrome <- 1553551                => true
print <- isPalindrome <- "racecar"              => true
print <- isPalindrome <- "big blue building"    => false

'From' loops - list of even squares

list_of_squares = new list

from 0 to 100 counting by 2 use num
    list_of_squares.add(num squared)
end

print <- list_of_squares                    => [0, 4, 16, 36, 64, 100, 144, 196, 256, 324, 400, 484, 576, 676, 784, ... 9604, 10000]

Asking for user input

name = ask <- "What's your name? "
print <- "your name backwards is: " + (name.reverse())

// When Executed :

What's your name?
> Kevin
your name backwards is: niveK

Until Loops

x = 1

until x is greater than 1000
    x = x * 2
end

print <- "x is " + (x as string)      => "x is 1024"
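For readers who know a conventional language, an 'until' loop can be read as a 'while' loop with the condition negated. The Python sketch below mirrors the example above under that reading. It assumes the condition is checked before each pass through the loop, which this document does not state explicitly; the function name is made up for the example.

```python
def double_until_greater_than(limit):
    """Mirror of the 'until' example: keep doubling x until it exceeds limit."""
    x = 1
    while not (x > limit):  # "until x is greater than limit"
        x = x * 2
    return x


print("x is " + str(double_until_greater_than(1000)))  # x is 1024
```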

 

Conclusion

In conclusion, I think that a computer language like this can have a wide impact on the Computer Science space. It is designed to be accessible to many more people than conventional languages are. Essentially, it opens up software design to those who might not have access to all the literature necessary to learn a traditional language.

 

Future Ambitions

Dialects

As individuals get their bearings with the language, they will inevitably develop preferences for certain syntax over others. This undermines the founding motivation for the language, as it makes code harder to read for individuals who have different preferences. Just as in any spoken language, different dialects emerge that may be harder for speakers of another dialect to understand.

Therefore, I propose a solution: personal dialects. This will be a file/program that keeps track of an individual's preferences in syntax. As the programmer types, a simple AI can pick up on their preferences and automatically fill out their preference file (or they could do it by hand, if desired). Then, when someone sends them a program in another dialect, they can run it through their dialect converter, and all the syntax will be converted to their own preferences instead of the author's. They can then understand the program more easily, as it will be in their "own words". This essentially eliminates the "confusion" that having multiple ways to express yourself creates. I would not have come up with this had I followed the "convention" that advises against multiple ways of expression for that very reason.
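A dialect converter does not exist yet, but its core step can be sketched in Python. Everything here is hypothetical: the ALIAS_GROUPS table stands in for the alias file and lists only aliases named in this document, and a reader's preference file is reduced to a plain set of spellings. The sketch also rewrites text inside strings, which a real converter working on parsed code would avoid, and it leaves out the part where preferences are learned automatically.

```python
import re

# Hypothetical groups of interchangeable spellings named in this document.
ALIAS_GROUPS = [
    ["print", "say", "output"],
    ["^", "to the power of"],
]


def convert_dialect(source, preferences):
    """Rewrite source so each alias group uses the reader's preferred spelling.

    preferences is a set of spellings the reader likes, at most one per group.
    """
    for group in ALIAS_GROUPS:
        preferred = next((name for name in group if name in preferences), None)
        if preferred is None:
            continue  # no stated preference: leave the author's choice alone
        for name in group:
            if name == preferred:
                continue
            if name[0].isalnum():
                # \b keeps "say" from matching inside a word like "essay".
                pattern = r"\b" + re.escape(name) + r"\b"
            else:
                pattern = re.escape(name)  # symbols have no word boundary
            source = re.sub(pattern, lambda _match: preferred, source)
    return source


author_code = "print <- 3 ^ 2"
print(convert_dialect(author_code, {"say", "to the power of"}))
# say <- 3 to the power of 2
```

Running the result back through the converter with the preferences {"print", "^"} restores the author's version, so the two dialects stay interchangeable.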

Mathematics

When I set out to build this language, there were many goals I wanted to hit. I wanted to create a good language for education, but also a language useful for other things. I did not want to compromise the utility of the language for its simplicity/ease. I think there is a point where this language can be as useful as it is easy to learn. Therefore, I want to add support for complex mathematics.

    1. Algebra & Equation support

          Very few general-purpose languages support algebraic expressions with unknowns. This is because a "variable" must be given a value before it is used. Unknowns are therefore not really consistent with the rest of a language's syntax. However, if I introduce a new code block type (like for each, if/else, etc.) that allows algebraic expressions within it and follows totally different rules inside, it can exist in parallel with the conventional programming workflow.

    2. Built-in graphing

           This would be a built-in function that, when called, displays a window with your graph. I am not aware of many general-purpose languages that support graphing this easily; usually one has to download and import a library to do it. Having it available out of the box would be very useful to those doing any kind of math, data analysis, or timing. It wouldn't have a whole lot of configuration options (think of a TI-83 graphing calculator); however, it would add a lot of substance to the mathematical side of the language.

    3. Complex math (Calculus, Linear Algebra, etc.)

          This is currently the most far-fetched idea; however, it's not impossible to implement. There would be a lot of roadblocks to implementing integration, infinite series, and other calculus natively. Instead, it is possible to call the WolframAlpha API (which can already do all these things) from the language implementation itself. This would be worth more than just going to the WolframAlpha website yourself, because all of the variables, loops, and other code in your program would be able to integrate with your math calculations. That's something you can't do on the WolframAlpha website.

 

Acknowledgement

I would like to thank my mentor for this project, Ms. Mia MacTavish, for helping me flesh out the ideas of my project in the final days before the fair, as well as providing constructive criticism during its ongoing development.
 
I would also like to thank Dr. Moshe Renert and Mr. Kevin Cammack for guiding me as to what I should do with my project and where I should take it. They saw how great the idea was and encouraged me to pursue it.
 
I would especially like to thank Dr. Dave Carlgren for nurturing the idea of this project in its early days. He has since moved to China to teach; however, I continue developing the ideas we discussed two years ago. He gave me so many ideas for this project and showed me its potential, and I am very grateful for that. Without him, this project would be one-tenth of what it is now (and it's still nowhere near complete!).