I use a patched version of Aram Kocharyan’s Crayon Syntax Highlighter as the syntax highlighter plugin for WordPress (currently version 5.3.2, running on PHP 7.3.14). I wanted the plugin to highlight PowerShell Core scripts in the same way that the `PSReadLine` module does.
Configuring the theme
I created a new Crayon theme, Solarized Dark PS, that maps the `PSReadLine` colours to the plugin elements as follows:
PSReadLine Color option | Plugin element |
---|---|
Comment | COMMENT |
Keyword | KEYWORD |
Command and Member | STATEMENT |
Parameter | RESERVED |
Operator | OPERATOR |
DefaultToken | IDENTIFIER and Unhighlighted |
Type | TYPE |
Number | CONSTANT |
String | STRING |
Variable | VARIABLE |
Configuring the language
The language grammar for PowerShell supplied with Crayon was as follows (for those elements which reference the default grammar, I have added the default as a following comment):
```
COMMENT ((?<!`)#.*?$)|((?<!`)<#.*?(?<!`)#>)
HERESTRING:STRING ((?<!`)(@\".*?^\s*\"@))|((?<!`)(@\'.*?^\s*\'@))
STRING ((?<!`)".*?(?<!`)")|((?<!`)'.*?')
FUNCTIONS:RESERVED (\b(?alt:reserved.txt)\b)|((?-i)[A-Z]\w+-[A-Z]\w+(?i))
STATEMENT \b(?alt:statement.txt)\b
TYPE \b(?alt:type.txt)\b
ENTITY (?default)
# ENTITY (\b[a-z_]\w*\b(?=\s*\([^\)]*\)))|((?<!\.)(\b[a-z_]\w*\b)(?=[^}=|,.:;"'\)]*{))|(\b[a-z_]\w+\b\s+(?=\b[a-z_]\w+\b))
VARIABLE \$[A-Za-z_]\w*\b
IDENTIFIER (?default)
# IDENTIFIER \b[A-Za-z_]\w*\b
CONSTANT -\w+\b
OPERATOR (?default)
# OPERATOR (?alt:operator.txt)
SYMBOL (?default)
# SYMBOL &[^;]+;|(?alt:symbol.txt)
```
Crayon’s language grammar is limited in what it can express. The regular expressions for each element are combined to form a single regular expression with alternative patterns:
(?:(regex1)|(regex2)| ... |(regexn))
That means that the code to be highlighted can only be divided into mutually exclusive captures. PHP 7.3 uses Perl Compatible Regular Expressions 2 (PCRE2), in which each alternative of a lookbehind assertion must have a fixed length, so the ability to detect context is limited.
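The combination step can be sketched in Python, using the `re` module as a stand-in for PCRE2. The element names and patterns below are simplified, hypothetical examples rather than Crayon's actual grammar or code; the point is only that each match belongs to exactly one element.

```python
import re

# Hypothetical, heavily simplified element patterns (not Crayon's real grammar).
ELEMENTS = [
    ("COMMENT", r"#.*?$"),
    ("STRING", r'"[^"]*"'),
    ("VARIABLE", r"\$\w+"),
    ("IDENTIFIER", r"\b[A-Za-z_]\w*\b"),
]

# One capture group per element, joined as alternatives: (?:(r1)|(r2)|...|(rn))
COMBINED = re.compile(
    "(?:" + "|".join(f"({pattern})" for _, pattern in ELEMENTS) + ")",
    re.MULTILINE,
)


def tokenize(code):
    """Return (element, text) pairs; each match belongs to exactly one element."""
    tokens = []
    for match in COMBINED.finditer(code):
        # None of the element patterns contains a nested capture group, so
        # lastindex is the number of the one top-level group that matched.
        name = ELEMENTS[match.lastindex - 1][0]
        tokens.append((name, match.group()))
    return tokens


if __name__ == "__main__":
    for token in tokenize('$name = "a $b" # note'):
        print(token)
```

In this sketch the `$b` inside the string is consumed by the STRING capture and never reported as a VARIABLE, which is the kind of mutual exclusivity described above.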
I considered the supplied grammar to be lacking in certain respects, so I replaced it with the following:
```
COMMENT <(?<!`)#[\s\S]*?(?<!`)#>|(?<!`)(#.*?$)
HERESTRING:STRING (?<!`)(@\"[\s\S]*?^\s*\"@)|(?<!`)(@\'[\s\S]*?^\s*\'@)
STRING (?<!`)".*?(?<!`)"|(?<!`)'.*?'
VARIABLE \$([$?^]|(\w[\w?]*:)?\w[\w?]*)|@(\w[\w?]*:)?\w[\w?]*
WORDOPERATOR:OPERATOR (?<![\w?-])-(?alt:word-operator.txt)(?![\w?-])
RESERVED (?<=\s)-[A-Z_?][\w?-]+:*?
KEYWORD \b(?alt:statement.txt)(?!-)\b
FUNDEF:IDENTIFIER (?<=[^-]function\s|[^-]filter\s)\s*[A-Z_][\w-]*
MEMBER:STATEMENT (?<=\.)[A-Z_][\w-]*
TYPE \b(?alt:type.txt)\b
STATEMENT \b[A-Z_][\w-]*\b
OPERATOR (?alt:operator.txt)
INT:CONSTANT (0x[\dA-F]+|\d+)(ul|us|uy|[lnsuy])?([kmgtp]b)?
CONSTANT (\d*\.\d+(e[+\-]?\d+)?|\d+e[+\-]?\d+)d?([kmgtp]b)?
ARITH:OPERATOR [+-]
```
I match block comments (`<# … #>`) before end-of-line comments (`# …`), and allow the former to span multiple lines, by using `[\s\S]*?` rather than `.*?`.
I also allow here-strings (`@" … "@` or `@' … '@`) to span multiple lines.
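The difference between `.*?` and `[\s\S]*?` can be sketched with Python's `re` module, assuming (as Python does by default) that `.` does not match a newline. The patterns are simplified versions of the COMMENT and HERESTRING lines, without the backtick lookbehinds, and the sample PowerShell text is made up for illustration.

```python
import re

COMMENT_SOURCE = "<# first line\nsecond line #>\nGet-Date # trailing\n"
HERE_SOURCE = '$text = @"\nsay "hi"\n"@\n'

# '.' stops at a newline, so the block alternative cannot span two lines.
dot_comment = re.compile(r"<#.*?#>|#.*?$", re.MULTILINE)
# [\s\S] matches any character, including a newline.
any_comment = re.compile(r"<#[\s\S]*?#>|#.*?$", re.MULTILINE)
# Here-string: the closing "@ must start a line (optionally after whitespace).
here_string = re.compile(r'@"[\s\S]*?^\s*"@', re.MULTILINE)

if __name__ == "__main__":
    print(dot_comment.findall(COMMENT_SOURCE))
    print(any_comment.findall(COMMENT_SOURCE))
    print(here_string.findall(HERE_SOURCE))
```

With the `.*?` version, the block comment is broken up into end-of-line comments starting at each `#`; with `[\s\S]*?` it is captured whole. The here-string pattern likewise skips the quotes inside the body because they are not followed by `@` at the start of a line.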
I assume that, with the exception of `$$`, `$?` and `$^`, variables have the form `$(\w[\w?]*:)?(\w[\w?]*)`; commands, functions and members have the form `([A-Z_][\w]*)(-[A-Z_][\w]*)*`; and parameters have the form `-[A-Z_?][\w?-]+:*?`. I also assume that parameters are preceded by whitespace.
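These assumptions can be tried out with a short Python sketch. The two patterns are copied from the VARIABLE line (first alternative only, without the `@` splat form) and the RESERVED line of the replacement grammar; the helper names and sample commands are hypothetical, and the case-insensitive flag is an assumption about how the grammar is applied.

```python
import re

# Copied from the VARIABLE line (without the @splat alternative).
VARIABLE = re.compile(r"\$([$?^]|(\w[\w?]*:)?\w[\w?]*)")
# Copied from the RESERVED line; IGNORECASE is an assumption made here.
PARAMETER = re.compile(r"(?<=\s)-[A-Z_?][\w?-]+:*?", re.IGNORECASE)


def variables(code):
    """Return every substring that the VARIABLE pattern matches."""
    return [m.group() for m in VARIABLE.finditer(code)]


def parameters(code):
    """Return every substring that the PARAMETER pattern matches."""
    return [m.group() for m in PARAMETER.finditer(code)]


if __name__ == "__main__":
    print(variables("$env:PATH; $$; $?; $^; $count"))
    print(parameters("Get-ChildItem -Path . -Recurse"))
    print(parameters("$x-y"))  # no preceding whitespace, so nothing matches
```

The hyphen in `Get-ChildItem` is not treated as the start of a parameter because it is not preceded by whitespace, which is the purpose of the lookbehind.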
After comments and strings, I first check for variables. I then check for operators that are words (for example, `-eq`), parameters, keywords, identifiers defined by `function` or `filter`, members, types, commands and then other operators except `+` and `-`. Those two are matched last, after the numeric literals, because they can appear in the exponent of certain real-number literals.
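Why the order matters can be sketched in Python, assuming that the first alternative able to match at a given position wins, as in a plain regular-expression alternation. The word-operator and keyword lists below are hypothetical stand-ins for `word-operator.txt` and `statement.txt`, and only five of the elements are included.

```python
import re

# Simplified element patterns; the word lists are hypothetical stand-ins.
ELEMENTS = {
    "WORDOPERATOR": r"(?<![\w?-])-(?:eq|ne|like)(?![\w?-])",
    "PARAMETER": r"(?<=\s)-[A-Z_?][\w?-]+",
    "KEYWORD": r"\b(?:function|if|else)(?!-)\b",
    "COMMAND": r"\b[A-Z_][\w-]*\b",
    "ARITH": r"[+-]",
}

GRAMMAR_ORDER = ["WORDOPERATOR", "PARAMETER", "KEYWORD", "COMMAND", "ARITH"]


def build(order):
    """Join the named elements into one alternation, in the given order."""
    parts = (f"(?P<{name}>{ELEMENTS[name]})" for name in order)
    return re.compile("|".join(parts), re.IGNORECASE)


def classify(regex, code):
    """Return (element, text) pairs for every match in the code."""
    return [(m.lastgroup, m.group()) for m in regex.finditer(code)]


if __name__ == "__main__":
    code = "if (1 -eq 2) { Get-Item -Path . }"
    print(classify(build(GRAMMAR_ORDER), code))
    print(classify(build(list(reversed(GRAMMAR_ORDER))), code))
```

In the grammar's order, `if`, `-eq`, `Get-Item` and `-Path` are each assigned to the intended element. With the order reversed, the general patterns win: `if` becomes a command, and `-eq` and `-Path` are split into an arithmetic `-` followed by a command.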