Highlighting PowerShell Core

I use a patched version of Aram Kocharyan’s Crayon Syntax Highlighter as a syntax highlighter plugin for WordPress, currently version 5.3.2 running on PHP 7.3.14. I wanted the plugin to highlight PowerShell Core scripts in the same way that the PSReadLine module does.

Configuring the theme

I created a new Crayon theme, Solarized Dark PS, that maps the PSReadLine colours to the plugin elements as follows:

PSReadLine Color option    Plugin element
-----------------------    --------------
Comment                    COMMENT
Keyword                    KEYWORD
Command and Member         STATEMENT
Parameter                  RESERVED
Operator                   OPERATOR
DefaultToken               IDENTIFIER and Unhighlighted
Type                       TYPE
Number                     CONSTANT
String                     STRING
Variable                   VARIABLE

Configuring the language

The language grammar for PowerShell supplied with Crayon was as follows (for those elements which reference the default grammar, I have added the default as a following comment):

Crayon’s language grammar is limited in what it can express. The regular expressions for each element are combined to form a single regular expression with alternative patterns:

(?:(regex1)|(regex2)| ... |(regexn))

That means that the code to be highlighted can only be analysed into mutually exclusive captures: each piece of text is assigned to at most one element. PHP 7.3 uses Perl Compatible Regular Expressions 2 (PCRE2), in which lookbehind assertions must be of fixed length, so the ability to detect context is limited.
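The combination can be sketched in a few lines. This is only an illustration: it uses Python's re module as a stand-in for PCRE2 (both insist on fixed-length lookbehind), and the three element patterns are simplified assumptions of mine, not Crayon's grammar.

```python
import re

# Hypothetical, simplified element patterns (not Crayon's real grammar).
ELEMENTS = [
    ("COMMENT", r"#.*"),
    ("STRING", r"'[^']*'"),
    ("VARIABLE", r"\$\w+"),
]

# Build (?:(regex1)|(regex2)|...|(regexn)): one capture group per element.
COMBINED = re.compile("(?:" + "|".join(f"({p})" for _, p in ELEMENTS) + ")")

def tokenize(code):
    """Return (element, text) pairs; each match belongs to exactly one element."""
    return [(ELEMENTS[m.lastindex - 1][0], m.group())
            for m in COMBINED.finditer(code)]

# The '#' inside the string is consumed by STRING, so it cannot also
# start a COMMENT: the captures are mutually exclusive.
tokens = tokenize("$name = 'a # b' # note")

# A variable-length lookbehind is rejected at compile time.
try:
    re.compile(r"(?<=\s+)-\w+")
    variable_lookbehind_ok = True
except re.error:
    variable_lookbehind_ok = False
```

Here tokens holds a VARIABLE, a STRING and a COMMENT in that order, and variable_lookbehind_ok ends up False.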

I considered the supplied grammar to be lacking in certain respects, so I replaced it with the following:

I match block comments (<# … #>) before end of line comments (# …), and allow the former to span multiple lines by using [\s\S]*? rather than .*?.
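The difference between the two quantified patterns is easy to check. The sketch below again uses Python's re as a stand-in for PCRE2, and the surrounding delimiters in the patterns are my assumption about what a block comment rule might look like; by default, '.' does not match a newline in either engine.

```python
import re

source = "<# first line\n   second line #>\nGet-Item # trailing"

# '.' stops at the newline, so the closing '#>' is never reached.
with_dot = re.findall(r"<#.*?#>", source)

# [\s\S] matches any character, newlines included.
with_any = re.findall(r"<#[\s\S]*?#>", source)

# Block comment alternative listed first, end of line comment second.
comments = [m.group() for m in re.finditer(r"(<#[\s\S]*?#>)|(#.*)", source)]
```

with_dot is empty, with_any contains the whole two-line comment, and comments contains the block comment followed by "# trailing".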

I also allow here-strings (@" … "@ or @' … '@) to span multiple lines.
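To see why a dedicated here-string rule helps, compare a scan with and without one. Both patterns below are assumptions for illustration (Python's re standing in for PCRE2), not the rules from my grammar; only the double-quoted form is shown.

```python
import re

source = '$text = @"\nHe said "hi"\n"@'

# Assumed patterns: a multi-line here-string and a single-line plain string.
HERE_STRING = r'@"[\s\S]*?"@'
PLAIN_STRING = r'"[^"\n]*"'

# Without a here-string rule, only the embedded quotes are picked up.
plain_only = re.findall(PLAIN_STRING, source)

# With it, the whole here-string is one token, embedded quotes included.
combined = [m.group()
            for m in re.finditer(f"({HERE_STRING})|({PLAIN_STRING})", source)]
```

plain_only finds just the inner "hi", whereas combined yields the entire here-string as a single match.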

I assume that with the exception of $$, $? and $^, variables have the form $(\w[\w?]*:)?(\w[\w?]*); commands, functions and members have the form ([A-Z_][\w]*)(-[A-Z_][\w]*)*; and parameters have the form -[A-Z_?][\w?-]+:*?. I also assume that parameters are preceded by a space.
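These forms can be tried out directly. In this sketch the patterns are copied from the paragraph above, but three things are my own assumptions: Python's re stands in for PCRE2, matching is case-insensitive, and "preceded by a space" is expressed as the fixed-length lookbehind (?<=\s).

```python
import re

I = re.IGNORECASE  # assumption: names are matched case-insensitively

# $$, $? and $^ are handled as a special case ahead of the general form.
VARIABLE = re.compile(r"\$(?:[$?^]|(\w[\w?]*:)?(\w[\w?]*))", I)
COMMAND = re.compile(r"([A-Z_][\w]*)(-[A-Z_][\w]*)*", I)
# (?<=\s) is a one-character lookbehind, so it satisfies the
# fixed-length restriction.
PARAMETER = re.compile(r"(?<=\s)-[A-Z_?][\w?-]+:*?", I)

scoped = VARIABLE.fullmatch("$env:PATH").groups()
special = [bool(VARIABLE.fullmatch(v)) for v in ("$$", "$?", "$^")]
command = COMMAND.fullmatch("Get-ChildItem").group()
parameters = PARAMETER.findall("Get-ChildItem -Path . -Recurse")
```

Note that -ChildItem is not reported as a parameter, because it is preceded by a letter rather than whitespace.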

After comments and strings, I first check for variables. I then check for operators that are words (for example, -eq), parameters, keywords, identifiers defined by function, members, types, commands and then other operators except + and -, which appear in certain literal real numbers.
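The order matters because the first alternative that matches at a given position wins. A reduced sketch, with hypothetical patterns much simpler than the real grammar and Python's re standing in for PCRE2, shows what happens to -eq if parameters are checked before word operators:

```python
import re

# Hypothetical, heavily reduced patterns for illustration only.
VARIABLE = r"\$\w+"
WORD_OPERATOR = r"(?<=\s)-(?:eq|ne|gt|lt|and|or|not)\b"
PARAMETER = r"(?<=\s)-[A-Z_?][\w?-]+"

def classify(code, ordered):
    """Combine the (name, pattern) pairs in the given order and scan the code."""
    regex = re.compile(
        "|".join(f"(?P<{name}>{pattern})" for name, pattern in ordered),
        re.IGNORECASE,
    )
    return [(m.lastgroup, m.group()) for m in regex.finditer(code)]

code = "$a -eq $b -Verbose"

operators_first = classify(code, [
    ("VARIABLE", VARIABLE), ("OPERATOR", WORD_OPERATOR), ("RESERVED", PARAMETER),
])
parameters_first = classify(code, [
    ("VARIABLE", VARIABLE), ("RESERVED", PARAMETER), ("OPERATOR", WORD_OPERATOR),
])
```

With operators first, -eq is tagged OPERATOR and -Verbose is tagged RESERVED; with parameters first, -eq is swallowed by the parameter pattern and tagged RESERVED as well.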