Perl-compatible Regular Expressions
Log collection and processing is one of the most important features in Wazuh, allowing users to know the real-time status of the ThreatLockDown agent's operating system and its running applications. Support for PCRE regex, alongside the existing OSRegex and OSMatch regex, opens up a range of possibilities and improves log comprehension and interpretation.
This section briefly describes the features of this type of regex, how to enable it in rules and decoders, and some use cases applied to the default ruleset.
Advantages
Quantifiers
In addition to the already known * and + quantifiers, PCRE incorporates:
?
Matches zero or one time. Example: the regex https? matches http and https.
{n}
Matches exactly n times. Example: the regex \d{4} matches 1000 and any year in the second or third millennium.
{n,}
Matches n or more times. Example: \d{2,} matches 12, 123, 1234, and so on.
{n,m}
Matches between n and m times. Example: \d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3} matches any IPv4 address in dotted-decimal notation.
All quantifiers can be used and combined with groups, expressions, and literals.
Example: (\d{1,3}\.?){4} is a shorter equivalent of \d{1,3}\.?\d{1,3}\.?\d{1,3}\.?\d{1,3}\.?
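The quantifiers above can be tried out with a short script. This is only a sketch: it uses Python's standard `re` module as a stand-in for PCRE2 (an assumption made for illustration; the quantifier syntax shown here is the same in both), and the helper `fully_matches` and the sample strings are hypothetical.

```python
import re

# Assumption: Python's re module stands in for PCRE2; the quantifier
# syntax used below is written the same way in both engines.
def fully_matches(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# ? : zero or one occurrence of the preceding item
optional_s = [fully_matches(r"https?", s) for s in ("http", "https", "httpss")]

# {n} : exactly n occurrences
four_digits = [fully_matches(r"\d{4}", s) for s in ("1000", "2024", "123")]

# {n,} : n or more occurrences
two_or_more = [fully_matches(r"\d{2,}", s) for s in ("1", "12", "1234")]

# {n,m} : between n and m occurrences (the octet values are not range-checked)
ipv4_shape = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
ipv4_results = [
    fully_matches(ipv4_shape, s)
    for s in ("192.168.1.10", "999.999.999.999", "1.2.3")
]

# A quantified group and its hand-expanded form give the same verdicts
grouped = r"(\d{1,3}\.?){4}"
expanded = r"\d{1,3}\.?" * 4
samples = ("192.168.1.10", "10.0.0.1.", "1234", "1.2.3")
same_verdicts = all(
    fully_matches(grouped, s) == fully_matches(expanded, s) for s in samples
)

print(optional_s, four_digits, two_or_more, ipv4_results, same_verdicts)
```

Note that the IPv4 pattern only checks the shape of the address, so a string such as 999.999.999.999 is accepted as well.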
Case sensitivity
Unlike OSRegex and OSMatch, which are case insensitive, PCRE regex is case sensitive by default. This can be changed with the (?i) option.
Example: post matches the regex (?i)POST|GET|PUT but not POST|GET|PUT.
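A minimal sketch of this behavior, again using Python's `re` module as an assumed stand-in for PCRE2 (the `(?i)` inline option is written the same way in both). The helper `matches` and the test words are hypothetical.

```python
import re

METHODS = "POST|GET|PUT"

# Default behavior: case sensitive
case_sensitive = re.compile(METHODS)
# Same alternation with the inline (?i) option placed at the start
case_insensitive = re.compile("(?i)" + METHODS)

def matches(regex: re.Pattern, text: str) -> bool:
    """Return True when the regex matches anywhere in the text."""
    return regex.search(text) is not None

# Map each word to (case-sensitive verdict, case-insensitive verdict)
results = {
    word: (matches(case_sensitive, word), matches(case_insensitive, word))
    for word in ("POST", "post", "Get")
}

for word, verdicts in results.items():
    print(word, verdicts)
```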
Groups within groups
PCRE provides ease and flexibility in data extraction. Unlike OSRegex, it allows groups within groups.
For example, in the following log, the regular expression from=<(.*?@(.*?))> extracts the email address (john@email-dom.com) and the domain (email-dom.com) into separate fields.
Sep 29 17:11:02 ramp sendmail[21549]: v8TLB2x7021549: from=<john@email-dom.com>, size=909, class=0, nrcpts=1, msgid=<201709292111.v8TLB1Nj021545@email.com>, proto=ESMTP, daemon=MTA, relay=[2001:0db8:85a3:0000:0000:8a2e:0370:7334]
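The nested capture can be reproduced outside Wazuh with a few lines of Python. This is a sketch that assumes Python's `re` module behaves like PCRE2 for this pattern; the variable names are hypothetical and the log line is a shortened copy of the sample above.

```python
import re

# Outer group captures the full address, inner group captures the domain
NESTED = re.compile(r"from=<(.*?@(.*?))>")

# Shortened copy of the sendmail sample log
log = (
    "Sep 29 17:11:02 ramp sendmail[21549]: v8TLB2x7021549: "
    "from=<john@email-dom.com>, size=909, class=0, nrcpts=1"
)

match = NESTED.search(log)
email = match.group(1) if match else None   # group 1: outer parentheses
domain = match.group(2) if match else None  # group 2: inner parentheses

print(email, domain)
```

Groups are numbered by the position of their opening parenthesis, which is why the outer group comes first.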
Groups comparing: backreferences
Backreferences match the same text as previously matched by a capturing group.
Groups can be referenced in the order they are declared with a backslash followed by the group number.
For example, in the following log, the regular expression ^(\d+\.\d+\.\d+\.\d+) \1 only matches if both IP addresses at the beginning of the log are equal.
10.10.10.11 10.10.10.11 - - [10/Apr/2017:13:18:05 -0700] "GET /injection/%0d%0aSet-Cookie HTTP/1.1" 404 271 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:22.0) Gecko/20100101 Firefox/22.0"
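The effect of the backreference can be checked with a small script. It is a sketch under the assumption that Python's `re` module treats `\1` like PCRE2 does; `log_equal` is a shortened copy of the sample log and `log_different` is a hypothetical variant with a different second address.

```python
import re

# \1 must repeat exactly the text captured by the first group
SAME_IP_TWICE = re.compile(r"^(\d+\.\d+\.\d+\.\d+) \1")

# Shortened copy of the sample log: both addresses are equal
log_equal = '10.10.10.11 10.10.10.11 - - [10/Apr/2017:13:18:05 -0700] "GET /injection/%0d%0aSet-Cookie HTTP/1.1" 404 271'
# Hypothetical variant: the second address differs in the last octet
log_different = '10.10.10.11 10.10.10.12 - - [10/Apr/2017:13:18:05 -0700] "GET /injection/%0d%0aSet-Cookie HTTP/1.1" 404 271'

equal_match = SAME_IP_TWICE.search(log_equal)
different_match = SAME_IP_TWICE.search(log_different)

print(equal_match is not None, different_match is not None)
```

Because the pattern does not anchor the end of the second address, a first address of 10.10.10.1 followed by 10.10.10.11 would also match; adding a trailing space after \1 is one way to tighten it.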
Character classes (character set)
In addition to character types such as \w, which matches a word character, and \d, which matches a decimal digit, a custom set of characters can be specified with [].
Ranges of letters and numbers can also be specified. For example, [a-zA-Z0-5] includes the digits from 0 to 5 and the entire alphabet in upper and lower case.
Example: the regex \d+[-\/]\d+[-\/]\d+ matches a date regardless of whether its parts are separated by hyphens or slashes.
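The following sketch exercises both kinds of character class. It assumes Python's `re` module as a stand-in for PCRE2 (bracket expressions and ranges are written the same way in both), and all names and sample strings are hypothetical.

```python
import re

# Custom set: a hyphen or a slash between the numeric parts
DATE_SHAPE = re.compile(r"\d+[-\/]\d+[-\/]\d+")
date_samples = ("2017-04-10", "10/04/2017", "10-04/2017", "10.04.2017")
date_results = [DATE_SHAPE.fullmatch(s) is not None for s in date_samples]

# Ranges: lower case letters, upper case letters, and the digits 0 to 5
LETTERS_AND_0_TO_5 = re.compile(r"[a-zA-Z0-5]+")
class_samples = ("Wazuh", "abc015", "abc6", "a_b")
class_results = [
    LETTERS_AND_0_TO_5.fullmatch(s) is not None for s in class_samples
]

# Pitfall: a single A-z range also covers the ASCII punctuation that
# sits between Z and a, such as the underscore
underscore_in_A_to_z = re.fullmatch(r"[A-z]", "_") is not None

print(date_results, class_results, underscore_in_A_to_z)
```

The date pattern accepts mixed separators too, because each bracket expression is evaluated independently.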
Configuring PCRE
PCRE can be enabled in rules and decoders with the type="pcre2" attribute. The same attribute also selects the other regex types: type="osregex" for OSRegex and type="osmatch" for OSMatch.
Decoders
The following is a simple example of data extraction with PCRE. Consider this log from a program called example_pcre2:
Dec 25 20:45:02 MyHost example_pcre2[12345]: User 'admin' change email to 'admin@suspicious-domain.com'
With PCRE in a decoder, it is possible to extract the user, the email address, and the email domain:
<decoder name="example_pcre2">
  <program_name>^example_pcre2$</program_name>
</decoder>

<decoder name="example_pcre2">
  <parent>example_pcre2</parent>
  <regex type="pcre2">User '(.*?)' change email to '(.*?@(.*?))'</regex>
  <order>user, email, domain</order>
</decoder>
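To see which text each field would receive, the decoder's regex can be run against the sample log in Python. This is only an illustration of how capture groups line up with the names listed in the order element, not a description of how Wazuh implements decoders; it assumes Python's `re` module behaves like PCRE2 for this pattern, and `extract_fields` is a hypothetical helper.

```python
import re

# Same expression as in the decoder's <regex type="pcre2"> element
DECODER_REGEX = re.compile(r"User '(.*?)' change email to '(.*?@(.*?))'")
# Field names in the same sequence as the decoder's <order> element
ORDER = ("user", "email", "domain")

log = (
    "Dec 25 20:45:02 MyHost example_pcre2[12345]: "
    "User 'admin' change email to 'admin@suspicious-domain.com'"
)

def extract_fields(message: str) -> dict:
    """Pair each capture group with the field name at the same position."""
    match = DECODER_REGEX.search(message)
    if match is None:
        return {}
    return dict(zip(ORDER, match.groups()))

fields = extract_fields(log)
print(fields)
```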
Rules
Use case: Accurate PAM user alerts
Linux Pluggable Authentication Modules (PAM) is a key component that provides authentication support for applications and services in UNIX-like systems, most of which are case sensitive. By default, some false positive alerts related to usernames may be generated; that is, the rules do not differentiate between the users FOO and foo. This can be avoided by using PCRE case sensitivity, so that they are handled as different users. The following custom rule generates an alert when the user foo logs in to the system via SSH.
<rule id="100002" level="5">
  <if_sid>5501</if_sid>
  <description>foo user logged in.</description>
  <user type="pcre2">foo</user>
</rule>
The wazuh-logtest output shows the triggered alert: