I use a patched version of Aram Kocharyan’s Crayon Syntax Highlighter as a syntax highlighter plugin for WordPress, currently version 5.3.2 running on PHP 7.3.14. I wanted the plugin to highlight PowerShell Core scripts in the same way that the PSReadLine module does.
Configuring the theme
I created a new Crayon theme, Solarized Dark PS, that maps the PSReadLine colours to the plugin elements as follows:
| PSReadLine Color option | Plugin element |
|---|---|
| Comment | COMMENT |
| Keyword | KEYWORD |
| Command and Member | STATEMENT |
| Parameter | RESERVED |
| Operator | OPERATOR |
| DefaultToken | IDENTIFIER and Unhighlighted |
| Type | TYPE |
| Number | CONSTANT |
| String | STRING |
| Variable | VARIABLE |
Configuring the language
The language grammar for PowerShell supplied with Crayon was as follows (where an element references the default grammar, I have added the default as a comment on the following line):

    COMMENT ((?<!`)#.*?$)|((?<!`)<#.*?(?<!`)#>)
    HERESTRING:STRING ((?<!`)(@\".*?^\s*\"@))|((?<!`)(@\'.*?^\s*\'@))
    STRING ((?<!`)".*?(?<!`)")|((?<!`)'.*?')
    FUNCTIONS:RESERVED (\b(?alt:reserved.txt)\b)|((?-i)[A-Z]\w+-[A-Z]\w+(?i))
    STATEMENT \b(?alt:statement.txt)\b
    TYPE \b(?alt:type.txt)\b
    ENTITY (?default)
    # ENTITY (\b[a-z_]\w*\b(?=\s*\([^\)]*\)))|((?<!\.)(\b[a-z_]\w*\b)(?=[^}=|,.:;"'\)]*{))|(\b[a-z_]\w+\b\s+(?=\b[a-z_]\w+\b))
    VARIABLE \$[A-Za-z_]\w*\b
    IDENTIFIER (?default)
    # IDENTIFIER \b[A-Za-z_]\w*\b
    CONSTANT -\w+\b
    OPERATOR (?default)
    # OPERATOR (?alt:operator.txt)
    SYMBOL (?default)
    # SYMBOL &[^;]+;|(?alt:symbol.txt)

Crayon’s language grammar is limited in what it can express. The regular expressions for each element are combined to form a single regular expression with alternative patterns:
(?:(regex1)|(regex2)| ... |(regexn))
That means that the code to be highlighted can only be analysed into mutually exclusive captures. PHP 7.3 uses Perl Compatible Regular Expressions 2 (PCRE2), in which each alternative of a lookbehind assertion must match a fixed length, so the ability to detect context is limited.
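The effect of combining the element patterns can be sketched in Python. This is only an illustration, not Crayon's PHP code: the four element patterns are simplified stand-ins and the `tokenize` helper is hypothetical. It shows that each piece of matched text lands in exactly one capture, so a `#` inside a string is never treated as the start of a comment.

```python
import re

# Hypothetical, simplified element patterns (not Crayon's real grammar).
# They are joined in this order into one alternation.
ELEMENTS = [
    ("COMMENT", r"#.*"),
    ("STRING", r'"[^"]*"'),
    ("VARIABLE", r"\$\w+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
]

# Build (?:(regex1)|(regex2)|...|(regexn)), using named groups so that
# the element responsible for each match can be reported.
combined = re.compile(
    "(?:" + "|".join(f"(?P<{name}>{rx})" for name, rx in ELEMENTS) + ")"
)


def tokenize(code):
    """Return (element, text) pairs; each match belongs to exactly one element."""
    return [(m.lastgroup, m.group()) for m in combined.finditer(code)]


tokens = tokenize('$name = "a # b" # note')
for element, text in tokens:
    print(element, repr(text))
```

Here the first `#` is consumed as part of the STRING capture, and only the second `#` starts a COMMENT.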
I considered the supplied grammar to be lacking in certain respects, so I replaced it with the following:

    COMMENT <(?<!`)#[\s\S]*?(?<!`)#>|(?<!`)(#.*?$)
    HERESTRING:STRING (?<!`)(@\"[\s\S]*?^\s*\"@)|(?<!`)(@\'[\s\S]*?^\s*\'@)
    STRING (?<!`)".*?(?<!`)"|(?<!`)'.*?'
    VARIABLE \$([$?^]|(\w[\w?]*:)?\w[\w?]*)|@(\w[\w?]*:)?\w[\w?]*
    WORDOPERATOR:OPERATOR (?<![\w?-])-(?alt:word-operator.txt)(?![\w?-])
    RESERVED (?<=\s)-[A-Z_?][\w?-]+:*?
    KEYWORD \b(?alt:statement.txt)(?!-)\b
    FUNDEF:IDENTIFIER (?<=[^-]function\s|[^-]filter\s)\s*[A-Z_][\w-]*
    MEMBER:STATEMENT (?<=\.)[A-Z_][\w-]*
    TYPE \b(?alt:type.txt)\b
    STATEMENT \b[A-Z_][\w-]*\b
    OPERATOR (?alt:operator.txt)
    INT:CONSTANT (0x[\dA-F]+|\d+)(ul|us|uy|[lnsuy])?([kmgtp]b)?
    CONSTANT (\d*\.\d+(e[+\-]?\d+)?|\d+e[+\-]?\d+)d?([kmgtp]b)?
    ARITH:OPERATOR [+-]

I match block comments (<# … #>) before end of line comments (# …), and allow the former to span multiple lines, by using [\s\S]*? rather than .*?.
I also allow here-strings (@" … "@ or @' … '@) to span multiple lines.
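The difference between `.*?` and `[\s\S]*?` can be checked with a short Python sketch. It assumes an engine in which `.` does not match a line break unless a dot-matches-all flag is set (whether Crayon sets such a flag is not confirmed here), and it leaves out the backtick lookbehinds for brevity. The sample script is made up for the test.

```python
import re

# Made-up sample: a two-line block comment, a here-string and a line comment.
SCRIPT = (
    "<# first line\n"
    "second line #>\n"
    '$text = @"\n'
    "line one\n"
    "line two\n"
    '"@\n'
    "Get-Item # trailing\n"
)

# '.' stops at a line break, so this cannot match a multi-line block comment.
dot_block = re.compile(r"<#.*?#>", re.MULTILINE)

# [\s\S] matches any character, including a line break.
any_block = re.compile(r"<#[\s\S]*?#>|#.*?$", re.MULTILINE)

# A here-string ends with "@ at the start of a line (after optional whitespace).
here_string = re.compile(r'@"[\s\S]*?^\s*"@', re.MULTILINE)

dot_matches = dot_block.findall(SCRIPT)
comment_matches = any_block.findall(SCRIPT)
here_matches = here_string.findall(SCRIPT)

print(dot_matches)
print(comment_matches)
print(here_matches)
```

With `.*?` the block comment is missed entirely, whereas the `[\s\S]*?` form finds the block comment, the trailing line comment and the whole here-string.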
I assume that with the exception of $$, $? and $^, variables have the form $(\w[\w?]*:)?(\w[\w?]*); commands, functions and members have the form ([A-Z_][\w]*)(-[A-Z_][\w]*)*; and parameters have the form -[A-Z_?][\w?-]+:*?. I also assume that parameters are preceded by a space.
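These assumed forms can be tried against a few names with the Python sketch below. The variable pattern is transcribed from the grammar (which also accepts the `@` prefix), and the command pattern from the form given above. Case-insensitive matching and the helper names `is_variable` and `is_command` are assumptions made for the illustration.

```python
import re

# Variable pattern as in the VARIABLE line of the grammar.
VARIABLE = re.compile(r"\$([$?^]|(\w[\w?]*:)?\w[\w?]*)|@(\w[\w?]*:)?\w[\w?]*")

# Command, function and member form; IGNORECASE is an assumption about
# how the highlighter applies the grammar.
COMMAND = re.compile(r"([A-Z_][\w]*)(-[A-Z_][\w]*)*", re.IGNORECASE)


def is_variable(text):
    """True if the whole of text has the assumed variable form."""
    return VARIABLE.fullmatch(text) is not None


def is_command(text):
    """True if the whole of text has the assumed command form."""
    return COMMAND.fullmatch(text) is not None


for name in ["$$", "$?", "$^", "$name", "$env:PATH", "@splat", "$-x"]:
    print(name, is_variable(name))

for name in ["Get-ChildItem", "ConvertTo-Json", "Get-", "9lives"]:
    print(name, is_command(name))
```

Only `$-x`, `Get-` and `9lives` are rejected. Everything else in the two lists fits the assumed forms.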
After comments and strings, I check for variables first. I then check, in order, for operators that are words (for example, -eq), parameters, keywords, identifiers defined by function or filter, members, types and commands, followed by the other operators and numeric literals. The + and - operators are left until last because they can appear inside certain literal real numbers.
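Why the order matters for `-` can be seen in the following Python sketch. It is a cut-down stand-in for the grammar, not the grammar itself: the word-operator list is a hypothetical subset of word-operator.txt, the KEYWORD, TYPE, OPERATOR and INT elements are omitted, and the `tokenize` helper is invented for the illustration. Case-insensitive matching is also an assumption.

```python
import re

# Cut-down element patterns, listed in the order described in the text.
PATTERNS = {
    "VARIABLE": r"\$(?:[$?^]|(?:\w[\w?]*:)?\w[\w?]*)",
    # Hypothetical subset of word-operator.txt.
    "WORDOPERATOR": r"(?<![\w?-])-(?:eq|ne|gt|lt)(?![\w?-])",
    "RESERVED": r"(?<=\s)-[A-Z_?][\w?-]+",
    "STATEMENT": r"\b[A-Z_][\w-]*\b",
    "CONSTANT": r"(?:\d*\.\d+(?:e[+\-]?\d+)?|\d+e[+\-]?\d+)d?(?:[kmgtp]b)?",
    "ARITH": r"[+-]",
}

GOOD_ORDER = ["VARIABLE", "WORDOPERATOR", "RESERVED", "STATEMENT", "CONSTANT", "ARITH"]
# Same elements with ARITH moved to the front, to show what goes wrong.
BAD_ORDER = ["ARITH"] + GOOD_ORDER[:-1]


def tokenize(code, order):
    """Combine the patterns in the given order and return (element, text) pairs."""
    combined = re.compile(
        "|".join(f"(?P<{name}>{PATTERNS[name]})" for name in order),
        re.IGNORECASE,
    )
    return [(m.lastgroup, m.group()) for m in combined.finditer(code)]


CODE = "Get-Item -Path $p | Where-Object Length -gt ($a - 2.5e-3)"

good = tokenize(CODE, GOOD_ORDER)
bad = tokenize(CODE, BAD_ORDER)

print(good)
print(bad)
```

In the order described, `-gt` is a word operator, `-Path` a parameter, the lone `-` an arithmetic operator, and `2.5e-3` a single constant. With ARITH first, the leading `-` of `-gt` and `-Path` is taken as an arithmetic operator and the rest falls through to STATEMENT.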