Rules
Basic Functionality
How processors process log messages is defined via configurable rules. Each rule contains a filter that is used to select log messages. Other parameters within the rules define how certain log messages should be transformed. Those parameters depend on the processor for which they were created.
Rule Files
Rules are defined as YAML objects or JSON objects. Rules can be distributed over different files or multiple rules can reside within one file. Each file contains multiple YAML documents or a JSON array of JSON objects. The YAML format is preferred, since it is a superset of JSON and has better readability.
Depending on the filter, a rule can trigger for different types of messages.
Further details can be found in the section for processors.
filter: 'command: execute'  # A comment
labeler:
  label:
    action:
    - execute
description: '...'
filter: 'command: "execute something"'
labeler:
  label:
    action:
    - execute
description: '...'
---
filter: 'command: "terminate something"'
labeler:
  label:
    action:
    - execute
description: '...'
{
  "filter": "command: execute",
  "labeler": {
    "label": {
      "action": ["execute"]
    }
  },
  "description": "..."
}
[
  {
    "filter": "command: execute",
    "labeler": {
      "label": {
        "action": ["execute"]
      }
    },
    "description": "..."
  },
  {
    "filter": "command: execute",
    "labeler": {
      "label": {
        "action": ["execute"]
      }
    },
    "description": "..."
  }
]
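As a rough illustration of how a JSON rule file could be read, the following Python sketch accepts either a single rule object or an array of rule objects and always returns a list. The function name load_json_rules is hypothetical and is not part of the documented tool; the only requirement checked is that each rule contains a filter.

```python
import json


def load_json_rules(text):
    """Parse JSON rule text and always return a list of rule dicts (hypothetical helper)."""
    parsed = json.loads(text)
    # A file may hold a single rule object or an array of rule objects.
    rules = parsed if isinstance(parsed, list) else [parsed]
    for rule in rules:
        if "filter" not in rule:
            raise ValueError("every rule needs a filter")
    return rules


single = '{"filter": "command: execute", "labeler": {"label": {"action": ["execute"]}}, "description": "..."}'
multiple = '[{"filter": "command: execute"}, {"filter": "command: terminate"}]'

print(len(load_json_rules(single)))    # 1
print(len(load_json_rules(multiple)))  # 2
```

Note that a missing comma between two keys, an easy mistake when writing rules by hand, makes json.loads raise an error instead of returning a partial rule.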
Log message field value access
All rules reference fields or field values of log messages.
This is done via dot notation.
To reference a nested field inside the log event, give the whole path from the event root
to the desired field.
In the following example, the field "information" is referenced with the
notation more.nested.information.
To access a specific item inside a list of the event, extend the dotted
notation with an index.
In the following example, the list element "lists" is accessed with the
notation more.nested.sometimes.1.
To get more than one element, slice the list with the pattern
start:stop:step_size, e.g. more.nested.sometimes.0:2, which returns
["inside", "lists"].
This slicing is based on Python's native
list slicing.
{
  "some": "data",
  "more": {
    "nested": {
      "information": "is here",
      "sometimes": ["inside", "lists", "of", "elements"]
    }
  }
}
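The lookup described above can be sketched in a few lines of Python. This is only an illustration of the notation, not the tool's actual implementation; the helper name resolve_dotted_field is hypothetical.

```python
def resolve_dotted_field(event, dotted_path):
    """Follow a dotted path through nested dicts and lists (hypothetical helper)."""
    current = event
    for part in dotted_path.split("."):
        if isinstance(current, list):
            if ":" in part:
                # start:stop:step_size; empty parts mean "not set", as in Python slices
                bounds = [int(p) if p else None for p in part.split(":")]
                current = current[slice(*bounds)]
            else:
                current = current[int(part)]
        else:
            current = current[part]
    return current


event = {
    "some": "data",
    "more": {
        "nested": {
            "information": "is here",
            "sometimes": ["inside", "lists", "of", "elements"],
        }
    },
}

print(resolve_dotted_field(event, "more.nested.information"))    # is here
print(resolve_dotted_field(event, "more.nested.sometimes.1"))    # lists
print(resolve_dotted_field(event, "more.nested.sometimes.0:2"))  # ['inside', 'lists']
```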
Warning
The dotted field notation is available in all processors. However, the use of indices to access list
elements is not available in the Clusterer, the Labeler, and the
Pseudonymizer.
Filter
The filters are based on the Lucene query language, but contain some additional enhancements.
It is possible to filter for keys and values in log messages.
Dot notation is used to access subfields in log messages.
A filter for {'field': {'subfield': 'value'}} can be specified by
field.subfield: value.
If a key is given without a value, the filter checks for the existence of the key.
The filter filter: field.subfield would therefore match any value of subfield in
{'field': {'subfield': 'value'}}.
The special key * can be used to always match on any input.
Thus, the filter filter: * would match any input document.
The filter in the following example would match log messages in which the field ip_address has the
value 192.168.0.1.
All subsequent transformations defined by this rule would therefore be applied only
to log messages that match this criterion.
This example is not complete, since rules are specific to processors and require additional options.
{ "filter": "ip_address: 192.168.0.1" }
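To make the matching semantics concrete, here is a simplified Python sketch that evaluates a single filter expression of the form key: value, a bare key, or *. It is an assumption-laden illustration: the function names are hypothetical, values are compared as strings, and neither escaping nor operators are handled.

```python
def field_value(event, dotted_key):
    """Return (found, value) for a dotted key in a nested dict."""
    current = event
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return False, None
        current = current[part]
    return True, current


def matches(event, filter_expression):
    """Evaluate one 'key: value', 'key' or '*' expression (simplified sketch)."""
    if filter_expression.strip() == "*":
        return True  # the special key * matches any input
    key, separator, expected = filter_expression.partition(":")
    found, value = field_value(event, key.strip())
    if not separator:
        return found  # key without a value: existence check
    return found and str(value) == expected.strip()


print(matches({"ip_address": "192.168.0.1"}, "ip_address: 192.168.0.1"))  # True
print(matches({"ip_address": "10.0.0.1"}, "ip_address: 192.168.0.1"))     # False
print(matches({"field": {"subfield": "value"}}, "field.subfield"))        # True
print(matches({}, "*"))                                                   # True
```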
It is possible to use filters with field names that contain whitespace or special symbols
of the Lucene syntax. However, these characters have to be escaped.
The filter filter: 'field.a subfield(test): value' must be escaped as
filter: 'field.a\ subfield\(test\): value'.
Other references to this field do not require such escaping.
This is only necessary for the filter.
If the file is in JSON format, it is necessary to escape twice: once for
the filter itself and once for JSON.
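The double escaping for JSON can be checked with Python's standard json module. The sketch below assumes that the filter is escaped with one backslash before the whitespace and before each special symbol; serializing it to JSON then doubles every backslash.

```python
import json

# Filter as the filter parser should see it. The exact escaped form is an
# assumption: one backslash before the whitespace and each special symbol.
escaped_filter = r"field.a\ subfield\(test\): value"

# Serializing to JSON adds the second level of escaping.
as_json = json.dumps({"filter": escaped_filter})
print(as_json)  # {"filter": "field.a\\ subfield\\(test\\): value"}

# Reading the JSON back yields the singly escaped filter again.
assert json.loads(as_json)["filter"] == escaped_filter
```

In a YAML rule with single quotes, the backslashes are taken literally, so only the first level of escaping is needed there.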
Operators
A subset of Lucene query operators is supported:
NOT: Condition is not true.
AND: Connects two conditions. Both conditions must be true.
OR: Connects two conditions. At least one of them must be true.
In the following example log messages are filtered for which event_id: 1 is true and
ip_address: 192.168.0.1 is false.
This example is not complete, since rules are specific to processors and require additional options.
{ "filter": "event_id: 1 AND NOT ip_address: 192.168.0.1" }
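The way the three operators combine conditions can be illustrated with small Python predicates. This is a sketch and not the tool's filter parser: the helper names are hypothetical, values are compared as strings, and NOT is assumed to apply only to the condition that directly follows it.

```python
def equals(key, expected):
    """Condition that holds if the event has the key with the expected value."""
    return lambda event: key in event and str(event[key]) == expected


def negate(condition):
    """NOT: the condition is not true."""
    return lambda event: not condition(event)


def both(left, right):
    """AND: both conditions must be true."""
    return lambda event: left(event) and right(event)


def either(left, right):
    """OR: at least one condition must be true."""
    return lambda event: left(event) or right(event)


# event_id: 1 AND NOT ip_address: 192.168.0.1
rule_filter = both(equals("event_id", "1"), negate(equals("ip_address", "192.168.0.1")))

print(rule_filter({"event_id": 1, "ip_address": "10.0.0.5"}))     # True
print(rule_filter({"event_id": 1, "ip_address": "192.168.0.1"}))  # False
print(rule_filter({"event_id": 2, "ip_address": "10.0.0.5"}))     # False
```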
RegEx-Filter
It is possible to use regular expressions to match values.
To be recognized as a regular expression, the value in the filter has to start with
/, as in the following example.
filter: 'ip_address: /192\.168\.0\..*/'
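The following Python sketch shows how such a pattern behaves when applied to field values. It rests on two assumptions that the text above does not confirm: that the pattern is delimited by a leading and a trailing /, as in the example, and that it has to match the whole value. The function name value_matches is hypothetical.

```python
import re


def value_matches(value, filter_value):
    """Compare a field value against a filter value (simplified sketch)."""
    is_regex = (
        len(filter_value) >= 2
        and filter_value.startswith("/")
        and filter_value.endswith("/")
    )
    if is_regex:
        # Assumption: the pattern has to match the whole value.
        return re.fullmatch(filter_value[1:-1], value) is not None
    return value == filter_value


pattern = r"/192\.168\.0\..*/"
print(value_matches("192.168.0.42", pattern))  # True
print(value_matches("192.168.1.42", pattern))  # False
print(value_matches("192x168.0.42", pattern))  # False, the dots are escaped
```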
[Deprecated, but still functional] Instead of marking the value with /, the field with the regex pattern can be added to the optional field
regex_fields in the rule definition.
In the following example the field ip_address is defined as a regex field.
The filter would match log messages in which the value of ip_address starts with
192.168.0. (including the trailing dot).
This example is not complete, since rules are specific to processors and
require additional options.
filter: 'ip_address: "192\.168\.0\..*"'
regex_fields:
- ip_address
RuleTree
For performance reasons, all rules of a processor are aggregated into a rule tree on startup. Instead of evaluating all rules independently for each log message, the message is checked against the rule tree. Each node in the rule tree represents a condition that has to be met, while the leaves represent changes that the processor should apply. If no condition is met, the processor just passes the log event on to the next processor.
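The idea can be sketched with a small tree in Python. This is a conceptual illustration under simplifying assumptions and not the tool's actual data structure: every filter is treated as an AND-chain of (key, value) conditions, rules are represented only by their names, and all class and function names are hypothetical.

```python
class Node:
    """One condition in the tree; rules attached here apply once it is met."""

    def __init__(self, condition=None):
        self.condition = condition  # (key, value), or None for the root
        self.children = {}
        self.rules = []


def add_rule(root, conditions, rule_name):
    """Insert a rule whose filter is an AND-chain of (key, value) conditions."""
    node = root
    for condition in conditions:
        # Rules that share a prefix of conditions share the same nodes.
        node = node.children.setdefault(condition, Node(condition))
    node.rules.append(rule_name)


def matching_rules(node, event):
    """Collect the rules of every node whose chain of conditions holds."""
    matched = list(node.rules)
    for (key, value), child in node.children.items():
        if event.get(key) == value:
            matched.extend(matching_rules(child, event))
    return matched


root = Node()
add_rule(root, [("category", "auth"), ("message", "login")], "label_login")
add_rule(root, [("category", "auth"), ("message", "logout")], "label_logout")
add_rule(root, [("category", "network")], "label_network")

print(matching_rules(root, {"category": "auth", "message": "login"}))  # ['label_login']
print(matching_rules(root, {"category": "dns"}))                       # []
```

In this sketch the condition on category is checked once for both auth rules, and an event that matches no condition yields an empty list, which corresponds to passing the event on unchanged.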
Rule Tree Configuration
To further improve the performance, it is possible to prioritize specific nodes of the rule tree, so that broader conditions are placed higher up in the tree and more specific conditions are moved further down. The following JSON gives an example of such a rule tree configuration. This configuration will lead to the prioritization of category and message in the rule tree.
{
  "priority_dict": {
    "category": "01",
    "message": "02"
  },
  "tag_map": {
    "check_field_name": "check-tag"
  }
}
A path to a rule tree configuration can be set in any processor configuration under the key
tree_config.
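One way to picture the effect of priority_dict is as a sort order for the conditions of a rule. The Python sketch below is an assumption about the intent, not a description of the tool's internals: it orders (key, value) conditions so that prioritized fields come first and all other fields follow in their original order. The function name order_conditions and the sentinel "~" are chosen for this sketch only, and tag_map is not interpreted here.

```python
import json

tree_config = json.loads("""
{
  "priority_dict": {"category": "01", "message": "02"},
  "tag_map": {"check_field_name": "check-tag"}
}
""")


def order_conditions(conditions, priority_dict):
    """Sort (key, value) conditions so that prioritized keys come first."""
    # Keys without an entry sort after all prioritized keys; "~" is an
    # arbitrary high sentinel chosen for this sketch.
    return sorted(conditions, key=lambda condition: priority_dict.get(condition[0], "~"))


conditions = [("message", "login"), ("user", "root"), ("category", "auth")]
ordered = order_conditions(conditions, tree_config["priority_dict"])
print(ordered)
# [('category', 'auth'), ('message', 'login'), ('user', 'root')]
```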