Quenya – Beyond syntax

Edit – I have been looking at other alternatives to writing my own parser/language, although I did learn a lot when trying to write my own. Nemerle seems like the most likely candidate to me, given how extensible and C#-like it is. Among mainstream languages, I am also beginning to favor F#. Thanks to everyone in the comments section who pointed out various options to me.

I guess it’s finally time to let people know what I’ve been working on (and thinking about) for the past six months or so.

The driving force behind this new project is my increasing frustration with existing languages. While most of them have been crafted masterfully, there’s always something in each that isn’t available and prevents me from writing the most elegant or readable code for that compiler. With C#, my list of gripes is long and would eat up a whole post. Let’s just say that while I love C#, there are many features I wish it had. And most of the time, it’s not something that can’t be added, it’s just not in the compiler because

  • The feature probably wasn’t considered or was cut because it wasn’t an important use-case
  • The development team couldn’t add it in due to time constraints (that’s fair)
  • Or they believe it’s not something that the language needs (these are far more frustrating, because you know they’re never going to be in the language)

Again, I state these without mentioning what it is I want, which is indeed the point of this post. :)

Without further ado, I present a proposed implementation for my (.NET) compiler which I have dubbed Quenya for the time-being. I intended this compiler to be syntax agnostic (or at the very least, extremely extensible) and therefore the bulk of the idea revolves around a way for users to re-define source code and for the compiler to be able to parse it.

Before we begin, I would like to point you to this awesome post by Luke Hoban, who is one of my favorite blog authors ever. I have used the source code for his monadic parser to build Quenya’s parsing logic.

To allow Quenya to parse something, it is not enough to provide it with source code. Quenya needs a grammar/parser (or a reference to one) and the source to parse. The parser is then expected to return a set of known AST nodes when called upon to parse the source code (linked to that grammar/parser). The rest of the compiler phases then proceed as usual and the compiler spits out IL. The figure below shows an overview of the parsing process in Quenya.

quenya parsing overview

Quenya parsing overview

But it’s probably more cumbersome to write an entire parser each time you want to add or modify a single feature of the base grammar you’re using right? That’s where the monadic parser written in C# comes in. To illustrate, let’s take a simple example. We’ll write a simple parser that can parse words (alphabets) given a string and then extend it to have more features without rewriting the entire parser.

// A simpleton parser, one that doesn't understand the . (period) and that will fail if it encounters one.
public abstract class SimpleSentenceParser: CharParsers
{
    // Defines the set of all whitespace characters
	private Parser _whitespaceChars;
    public virtual Parser WhitespaceChars
    {
        get
        {
            return _whitespaceChars ??
                   (_whitespaceChars = Char(' ').OR(Char('\t').OR(Char('\n')).OR(Char('\r'))));
        }
    }
 
    // A whitespace is 0 or more repetitions of a valid Whitespace character
	private Parser _whitespace;
    public virtual Parser Whitespace
    {
        get
        {
            return _whitespace ??
                   (_whitespace = Rep(WhitespaceChars));
        }
    }
 
    // A word is multiple (alphabetic) characters that are not whitespaces
	private Parser _word;
    public virtual Parser Word
    {
        get
        {
            return _word ??
                   (_word = from w in Whitespace
                            from c in Char(char.IsLetter)
                            from word in Rep(Char(char.IsLetter))
                            select new WordTerm(word.Aggregate(new string(new[] {c}), (acc, ch) => acc + ch)));
        }
    }
 
    // A sentence is a set of words - simplistic, but will work for single sentences
	private Parser _sentence;
    public virtual Parser Sentence
    {
        get
        {
            return _sentence ??
                   (_sentence = from words in Rep(Word)
                                select new SentenceTerm(words));
        }
    }
 
    // Set of all sentences in the input (start symbol for this grammar)
	private Parser _all;
    public virtual Parser All
    {
        get
        {
            return _all ??
                   (_all = from sentence in Sentence
                           select (Term) sentence);
        }
    }
}
 
// A concrete implementation of the above parser that can accept string input
public class SimpleSentenceParser: SimpleSentenceParser
{
    public override Parser AnyChar // Pick one character at a time
    {
        get
        {
            return input => input.Length > 0 ? 
				new Result(input[0], input.Substring(1)) : 
				null;
        }
    }
}

Note that the code above is quite readable (as a grammar) due to the use of C# query expressions. Now let’s look at a parser that “extends” the capabilities of this parser.

// A slightly more intelligent parser, it can identify sentences properly
// Note that this parser does NOT redefine what a word means. It only redefines a sentence.
public abstract class BetterSentenceParser: SimpleSentenceParser
{
    private Parser _whitespaceChars;
    public override Parser WhitespaceChars // Adds to existing whitespace characters
    {
        get
        {
            return _whitespaceChars ??
                (_whitespaceChars = base.WhitespaceChars.OR(Char(',')).OR(Char(';')));
        }
    }
 
    // New, defines sentence terminators to allow this parser to understand them
	private Parser _sentenceTerminators;
    public virtual Parser SentenceTerminators
    {
        get
        {
            return _sentenceTerminators ??
                   (_sentenceTerminators = Char('.'));
        }
    }
 
    // Redefines what a sentence means and takes sentence terminators
	// into account
	private Parser _sentence;
    public override Parser Sentence
    {
        get
        {
            return _sentence ??
                   (_sentence = from words in Rep(Word)
                                from terminator in SentenceTerminators
                                select new SentenceTerm(words));
        }
    }
 
    // This now returns multiple sentences
	private Parser _all;
    public override Parser All // Actually selects multiple sentences
    {
        get
        {
            return _all ??
                   (_all = from sentences in Rep(Sentence)
                           select (Term)new ParagraphTerm(sentences));
        }
    }
}
 
// A concrete implementation of a parser that can accept string input
public class BetterSentenceParser : BetterSentenceParser
{
    public override Parser AnyChar
    {
        get
        {
            return input => input.Length > 0 ? new Result(input[0], input.Substring(1)) : null;
        }
    }
}

Ah, that looks fairly simple! The observant among you no doubt noticed that this parser extends (Sentence) and modifies existing behavior (WhitespaceChars) without having to do all the work by itself. Using the concrete version of these two parsers, we can now parse stuff!

internal class Program
{
    private const string SimpleText = @"Lorem ipsum dolor sit amet";
 
    private const string ComplexText =
        @"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus enim lorem," +
        @" tincidunt ac sollicitudin fringilla, vulputate a dolor. Sed eu viverra ligula.";
 
    private static void Main(string[] args)
    {
        // Will parse SimpleText into multiple words and one sentence
		{
			var parser = new SimpleSentenceParser();
			var result = parser.All (SimpleText);
		}
 
        // Will parse ComplexText partially, fail when it encounters the '.'
		{
			var parser = new SimpleSentenceParser();
			var result = parser.All (ComplexText);
		}
 
        // Will parse ComplexText into multiple words and multiple sentences
		{
			var parser = new ComplexSentenceParser();
			var result = parser.All (ComplexText);
		}
    }
}

Let’s now extend this concept given a fully functional C# parser definition. The user will now be able to “modify” the syntax of the language by defining his own parser that changes certain productions and/or terminals in the grammar. Add in a mechanism to link a given source file to a given grammar/parser (attribute on the parser = file extension of source) and you have Quenya, a language that encourages people to define and modify it.

Granted, this is probably tougher than just learning one language, but considering that the code you write using your DSL or modified base language will be much, much more suited to your purposes (if you use it wisely) and that most of your code will be written in the straight-up base grammar for your language makes this worth it (to me, at least – I’d be happy to hear your thought/comments). Also, people who don’t care about language extensibility can just use a vanilla grammar for the language of their choice and continue doing what they’ve been doing.

The next post on Quenya will deal with the implications of a language like this and how it can be used (or abused) in a real context.

Download the source code here: QuenyaParsing

Site under reconstruction

My site recently got hacked into, and I am in the process of restoring all the old pages, posts and comments (thank heavens for WordPress and plugins). If something is broken, please contact me so I can restore it.

Please bear with me while I get the site back up.

In the meantime, you can reach me at ananth _at_ ananthonline _dot_ net

Thank you for your patience.

Ananth

Downloads page is working again

I have fixed the Brahma | Downloads page, and it is back up. Thanks to everyone who pointed that out, and sorry it took so long for me to get around to it!

Brahma.OpenCL works and is available to check out!

Ok, I finally got around to cleaning up the Brahma code (just a little, mind you – there’s loads more to do!) and checking it in! Yes, that means you can try it out now! Because there are no releases, you’re going to have to check out the code with the instructions from here. The source contains a few samples I showed off at the Twin Cities Code Camp last weekend, so that’s the place to start poking around. I would love to see questions, suggestions and offers for help with samples Smile.

 

I’m hoping to be able to put in more work on Brahma in a week or so, so stay tuned!

New Brahma syntax/usage

Earlier versions of Brahma used a LINQ statement exactly like a query. You could “transform” data, but weren’t allowed any control over where it went in the output. You could also just “select” one value, no more and no less. Another limitation was; when operating on two differently sized data-parallel arrays (Phew! That’s a mouthful!) you had to make a choice about the size of the output, the “kernel” would be called for all elements in the output array. It was also hard to loop, do conditionals and other kinds of logic that are almost indispensable in complex algorithms.

With OpenCL, kernel code has finally become expressive enough to truly perform general purpose computations. Brahma allows you (albeit in a limited fashion due to current language restrictions – of course, if we could write multiline lambdas and the compiler would give me expression trees of those, I wouldn’t have any of these “workarounds”. Oh, well.) to do this:

  1. provider.CompileKernel<Buffer<int32>, Buffer<int32>>((d1, d2) =>
  2. from x in Loop.For(() => 0, i => i < _width, i => i + 1)
  3. from y in Loop.For(() => 0, i => i < _height, i => i + 1)
  4. let idx = y * _width + x
  5. select new[]
  6. {
  7. result1[idx] <= (d1[idx] + d2[idx]),
  8. result2[idx] <= (d1[idx] - d2[idx])
  9. });

Note that DataParallelArray<T> is now called Buffer<T> – isn’t that a lot easier? The kernel above is two nested loops that calculate something off two 2D images and write the results to 2 output buffers called result1 and result2. (I know the example doesn’t make sense, it’s just there to show usage!)

The observant among you probably noticed a few things

  • I AM selecting one output item for each inner loop iteration (= _width * _height items) – Well, yes. But I’m writing two results each time (I can write none if I don’t want to).
  • Why am I comparing result[idx] and (d1[idx]….)? Nope. Brahma overloads the <= operator and in this case you read it as “is assigned”. So in Brahma parlance, you’re saying: result[idx] “is assigned” d1[idx]….
  • result1 and result2 are NOT lambda parameters. While Brahma will allow you to use only “side-effects”, you can also use other overloads to provide parameters that you assign to. This comes in handy when you want to use different output buffers but compile the query only once.

And in that sense, you are also free to pass non-buffer arguments to the kernel (and write to them from one – depending on your compute device you may have memory restrictions, though). For example, a structure containing control values can now be “passed” in per run!

Here’s one more (idiotic) sample that uses the current thread id instead of running nested loops per kernel launch. Ok, so I’m still thinking about the DeviceMethods name.

  1. provider.CompileKernel<Buffer<int32>, Buffer<int32>>((d1, d2) =>
  2. from x in DeviceMethods.GlobalID(0) // ID (and coordinates) of the thread
  3. from y in DeviceMethods.GlobalID(1)
  4. let idx = y * _width + x
  5. select new[]
  6. {
  7. result1[idx] <= (d1[idx] + d2[idx]),
  8. result2[idx] <= (d1[idx] - d2[idx])
  9. });

I think the new kernel syntax/usage is neat, functional and compact. What do you think? I’m still working on several bits of this design … so this is when you can be heard! Please tell me what you think!

TidePowerd

I’m at the nVidia GPU conference and I just learned of (and met) a company called TidePowerd, who seem to have taken the IL decompilation idea (that Brahma 1.0 was based on) into a commercial product. This, to me is good because

  1. It looks like someone thought many-core/GPGPU developer tools for .NET was an idea worthy of commercialization (a drum I’ve been banging for many years now)
  2. It’s nice to see the seed of the idea that IL can be effectively transformed to SIMD/SIMT type code be realized in a fuller fashion.

Personally, I’d be REALLY interested in seeing what TidePowerd does and how the industry responds to their offerings.

OpenCL.Net published!

I just published my version of .NET bindings for OpenCL over on Codeplex last night. Why another, you ask? There’s already so many out there … Cloo, OpenCL.NET from hoopoe and another OpenCL.Net over at Sourceforge (and more?). Well, …

  • Every API out there has an object-oriented version of the API that’s easily usable from .NET. Sure, it’s easy … but looks very different from the raw OpenCL API if I wanted to use it, right? There’s reason #1 right there. I wanted a set of .NET bindings that look, feel and act like the C-style API so I (read: the users) could port existing samples or read books/tutorials and understand what’s going on.
  • Which brings us to our second question: Don’t all of the mentioned bindings also have the C-style API exposed? Sure, they do. But some of them mark the exposed API as unsafe, use of which would force users to mark their own projects as unsafe too. Secondly, every set of bindings I saw had raw translations of the API transliterating <anything>* to IntPtr (and others along those lines) meaning we have to use their OO abstractions to get anything done easily. I wanted something more .NET friendly at the API level, however.

So I guess the reason I wrote my own bindings is: I wanted to strike the middle ground between OO + .NET friendly and raw API translation. You can find this over at Codeplex. The API isn’t complete yet, but some things I’m particularly happy about are:

  1. No unsafe code (yet).
  2. Everything is .NET friendly AND you can use it as you would the raw API (overloads).
  3. As little explicit marshaling as possible, in fact all the exposed methods that invoke the extern’d methods in OpenCL.dll are only one call/line.

Initially, I started with these bindings for my project Brahma, but decided it was a large enough effort to put up separately. I hope people find it useful alongside the other tools for OpenCL for .NET that are already available. I hope to have the API completed and released soon. Stay tuned.

Parsing command line arguments with C# & LINQ

Whenever I sit down to write an application (mostly a console application), I find myself wishing I had some code I could just drop in to parse command line arguments with. A friend of mine has authored the excellent ConsoleFX library, and it’s really good if you want to do heavy processing, validations and complex command line argument structures with help. Most of the time, however, I don’t.

So I spent the better part of my evening writing something that I hope I will be able to carry around as a code snippet small enough to use anytime, anywhere in an elegant fashion.

  1. #region CommandLine
  2. static class CommandLine
  3. {
  4. public class Switch // Class that encapsulates switch data. Not meant for direct use.
  5. {
  6. public Switch(string name, Action<IEnumerable<string>> handler, string shortForm = null)
  7. {
  8. Name = name;
  9. Handler = handler;
  10. ShortForm = shortForm;
  11. }
  12. public string Name
  13. {
  14. get;
  15. private set;
  16. }
  17. public string ShortForm
  18. {
  19. get; private set;
  20. }
  21. public Action<IEnumerable<string>> Handler
  22. {
  23. get; private set;
  24. }
  25. public int InvokeHandler(string[] values)
  26. {
  27. Handler(values);
  28. return 1;
  29. }
  30. }
  31. // The regex that extracts names and comma-separated values for switches
  32. // in the form (<switch>[:value,value,value])+
  33. private static readonly Regex _regex =
  34. new Regex(@"s*(?<switch>(?<name>w+)(:s*(?<value>w+(s*,s*w+)*))?)",
  35. RegexOptions.Compiled | RegexOptions.CultureInvariant |
  36. RegexOptions.ExplicitCapture | RegexOptions.IgnoreCase);
  37. private const string NameGroup = "name"; // Names of capture groups
  38. private const string ValueGroup = "value";
  39. public static void Process(this string[] args, Action printUsage, params Switch[] switches)
  40. {
  41. // Run through all matches in the concatenated argument list and if any
  42. // of the switches match, get the values and invoke the handler we were
  43. // given. We do a Sum() here for 2 reasons; a) To actually run the handlers
  44. // and b) see if any were invoked at all (each returns 1 if invoked).
  45. // If none were invoked, we simply invoke the printUsage handler.
  46. if ((from Match match in _regex.Matches(string.Join(" ", args))
  47. from arg in switches
  48. where match.Success &&
  49. ((string.Compare(match.Groups[NameGroup].Value, arg.Name, true) == 0) ||
  50. (string.Compare(match.Groups[NameGroup].Value, arg.ShortForm, true) == 0))
  51. select arg.InvokeHandler(match.Groups[ValueGroup].Value.Split(','))).Sum() == 0)
  52. printUsage(); // We didn't find any switches
  53. }
  54. }
  55. #endregion

All of the magic happens inside the Process method, where the query tries to match the arguments against the ones we want to check for. The .Sum() aggregate method does two things.

  1. It enumerates the elements of the IQueryable, actually calling the handlers in the process. Nothing happens till then!
  2. Each of the InvokeHandler calls returns 1, so the result of the .Sum() is 0 if there were no matches (which is when we call the supplied printUsage handler).

Neat, huh? Using it is even simpler!

  1. args.Process(
  2. () => Console.WriteLine("Usage is switch0:value switch:value switch2"),
  3. new CommandLine.Switch("switch0",
  4. val => Console.WriteLine("switch 0 with value {0}",
  5. string.Join(" ", val))),
  6. new CommandLine.Switch("switch1",
  7. val => Console.WriteLine("switch 1 with value {0}",
  8. string.Join(" ", val)), "s1"),
  9. new CommandLine.Switch("switch2",
  10. val => Console.WriteLine("switch 2 with value {0}",
  11. string.Join(" ", val))));
  12. }

Any handlers you provide will be called for each switch or it’s shortform(if found, of course). Now get started writing those fantastic console applications.

Obligatory disclaimer and License: The code above is for you to do whatever you want to. Even if it blows you up (like Dr. Manhattan blows Rorschach up), don’t come back from the dead to haunt me. I would, however appreciate it if you kept the comment attributing that code to me.

Double Edit: Modified the Regex so it can accept quoted parameters (even filenames).

Download the code snippet from the code snippets page.

Website upgrade

I recently had to upgrade the WordPress installation on my website, and decided to do a bunch of housecleaning besides.

I’d been looking for a good content management system that allows me to keep a blog, wiki(s) and forums (across different subdomains – brahma, blog, etc.) and I couldn’t find anything easy to use and powerful at the same time. I’m a big stickler for themes and I hate having to manage and re-theme different applications to look the same.

After a lot of searching, I’ve decided to use WordPress as my CMS and am going to be trying out these wonderful plugins:

  1. WordPress Wiki – This looks interesting, although I have no idea how I’m going to integrate this into a subdomain.domain.net/wiki URL structure. If anyone has any ideas, I’ll be glad to hear them.
  2. Simple Press – This is the other thing I wanted on my website, and I’m quite happy WordPress has a plugin for it.

Unfortunately, to do a lot of this, I might have to shuffle some URLs and pages around. I will do my best to keep the site as functional as possible (perhaps even resorting to the evil redirection trick), but if you can’t find anything or see a problem with the website, please contact me so I can fix the problem. Thanks!

ATI Stream SDK on non-AMD machines

It seems the ATI stream SDK installer doesn’t install anything but the OpenCL profiler on machines with no AMD/ATI hardware. Running the MSI’s from “C:C:ATISUPPORTstreamsdk_2-0-0_XXXXPackagesApps” folder seems to install OpenCL (CPU only, of course) support for me.

Note: I originally found this solution on geeks3d.com.

– Edit —
ATI/AMD have since fixed this, and everything installs with the latest version of the Stream SDK.

Recent comments