Executing Logprep

To execute Logprep, run the following command in the root directory of the project:

logprep run $CONFIG

Where $CONFIG is the path or a URL to a configuration file (see Configuration).

To get help on the different parameters use:

logprep --help

Running logprep-ng

Logprep-ng is the next generation of logprep and is still in an experimental state. To execute logprep-ng, use the following command:

logprep-ng run $CONFIG

Where $CONFIG is the path or a URL to a configuration file (see Configuration). This command starts the logprep-ng processing pipeline with the specified configuration and options.

Common usage examples:

# Run with default configuration
logprep-ng run

# Run with custom configuration file
logprep-ng run /path/to/config.yml

Available options can be viewed using:

logprep-ng run --help

Event Generation

Logprep has the additional functionality of generating events and sending them to two different targets: it can send events to Kafka, loading events from Kafka or reading them from file, and it can send events to an HTTP endpoint as POST requests.

The following sections describe the usage of these event generators.

Kafka

The Kafka generator is a load tester that generates events based on templated sample files stored in a dataset directory and sends them to specified Kafka topics. The event generation process is identical to that of the Http generator.

The dataset directory containing the sample files must follow this structure:

| - Test-Logs-Directory
| | - Test-Logs-Class-1-Directory
| | | - config.yaml
| | | - Test-Logs-1.jsonl
| | | - Test-Logs-2.jsonl
| | - Test-Logs-Class-2-Directory
| | | - config.yaml
| | | - Test-Logs-A.jsonl
| | | - Test-Logs-B.jsonl

While the jsonl event files can have arbitrary names, the config.yaml must be named exactly that and must follow this schema:

Example configuration file for the Kafka event generator
target: example_topic
timestamps:
  - key: TIMESTAMP_FIELD_1
    format: "%Y%m%d"
  - key: TIMESTAMP_FIELD_1
    format: "%H%M%S"
    time_shift: "+0200"  # Optional, sets time shift in hours and minutes, if needed ([+-]HHMM)

To learn more about the Kafka event generator, run:

logprep generate kafka --help

Http

The Http generator creates events based on templated sample files which are stored inside a dataset directory.

The dataset directory containing the sample files must follow this structure:

| - Test-Logs-Directory
| | - Test-Logs-Class-1-Directory
| | | - config.yaml
| | | - Test-Logs-1.jsonl
| | | - Test-Logs-2.jsonl
| | - Test-Logs-Class-2-Directory
| | | - config.yaml
| | | - Test-Logs-A.jsonl
| | | - Test-Logs-B.jsonl

While the jsonl event files can have arbitrary names, the config.yaml must be named exactly that and must follow this schema:

Example configuration file for the http event generator
target: /endpoint/logsource/path
timestamps:
  - key: TIMESTAMP_FIELD_1
    format: "%Y%m%d"
  - key: TIMESTAMP_FIELD_1
    format: "%H%M%S"
    time_shift: "+0200"  # Optional, sets time shift in hours and minutes, if needed ([+-]HHMM)
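The time_shift offset and the strftime formats above can be sketched as follows. This is an illustrative reading of the [+-]HHMM semantics, not Logprep's actual implementation:

```python
from datetime import datetime, timedelta, timezone

def apply_time_shift(now: datetime, time_shift: str) -> datetime:
    """Shift a timestamp by a [+-]HHMM offset, e.g. '+0200' = 2 hours forward."""
    sign = 1 if time_shift[0] == "+" else -1
    hours, minutes = int(time_shift[1:3]), int(time_shift[3:5])
    return now + sign * timedelta(hours=hours, minutes=minutes)

now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
shifted = apply_time_shift(now, "+0200")
print(shifted.strftime("%H%M%S"))  # 140000
```

With this reading, a TIMESTAMP_FIELD_1 value formatted with "%H%M%S" and a time_shift of "+0200" would be rendered two hours ahead of the generation time.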

To learn more about the http event generator, run:

logprep generate http --help

Pseudonymization Tools

Logprep provides tools to pseudonymize and depseudonymize values. This can be useful for testing and debugging, and it can also be used to depseudonymize values that were pseudonymized by the Logprep Pseudonymizer Processor.

These tools pseudonymize given strings using the same method as Logprep and provide functionality to depseudonymize values using a pair of keys.

generate keys

logprep pseudo generate -f analyst 1024
logprep pseudo generate -f depseudo 2048

This will generate four files used for pseudonymization in the next step. The depseudo key has to be longer than the analyst key due to the hash padding involved in the procedure.

  • get help with logprep pseudo generate --help
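Why the depseudo key must be larger can be illustrated with padding arithmetic. Assuming RSA-OAEP with SHA-256, a common padding choice (Logprep's exact scheme is not specified here), the maximum plaintext per encryption is k - 2*hLen - 2 bytes:

```python
def max_oaep_plaintext(key_bits: int, hash_len: int = 32) -> int:
    """Maximum plaintext bytes under RSA-OAEP: k - 2*hLen - 2 (SHA-256: hLen=32)."""
    k = key_bits // 8  # modulus size in bytes
    return k - 2 * hash_len - 2

print(max_oaep_plaintext(1024))  # 62
print(max_oaep_plaintext(2048))  # 190
```

Under this (assumed) scheme, a 1024-bit key can wrap at most 62 bytes per encryption, while a 2048-bit key can wrap 190 bytes, which is one plausible reading of why the outer depseudo key needs the larger size.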

pseudonymize

logprep pseudo pseudonymize analyst.crt depseudo.crt mystring

This will pseudonymize the provided string using the analyst and depseudo keys.

  • get help with logprep pseudo pseudonymize --help

depseudonymize

logprep pseudo depseudonymize analyst depseudo <output from above>

This will depseudonymize the provided string using the analyst and depseudo keys.

  • get help with logprep pseudo depseudonymize --help

Restart Behavior

Logprep reacts to failures during pipeline execution by restarting up to 5 times (the default). This restart count can be configured in the configuration file with the parameter restart_count. If the restart count is set to a negative number, the restart count is infinite and Logprep will restart the pipelines immediately after a failure. On Logprep start, a random timeout seed between 100 and 1000 milliseconds is calculated. This seed is then doubled after each restart and is used as the sleep period between pipeline restart attempts.

If the pipeline restart succeeds, the restart count is reset to 0.

Exit Codes

class logprep.util.defaults.EXITCODES

Exit codes for logprep.

SUCCESS = 0

Successful execution.

ERROR = 1

General unspecified error.

CONFIGURATION_ERROR = 2

An error in the configuration.

PIPELINE_ERROR = 3

An error during pipeline processing.

ERROR_OUTPUT_NOT_REACHABLE = 4

The configured error output is not reachable.
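A supervising script can branch on these codes. The enum below mirrors the documented values for illustration; it is not an import from Logprep itself, and which codes are worth retrying is an assumption:

```python
from enum import IntEnum

class EXITCODES(IntEnum):
    """Mirror of the documented logprep.util.defaults.EXITCODES values."""
    SUCCESS = 0
    ERROR = 1
    CONFIGURATION_ERROR = 2
    PIPELINE_ERROR = 3
    ERROR_OUTPUT_NOT_REACHABLE = 4

def should_retry(code: int) -> bool:
    # Assumption: a configuration error will not fix itself, while pipeline
    # and error-output failures may be transient and worth retrying.
    return code in (EXITCODES.PIPELINE_ERROR, EXITCODES.ERROR_OUTPUT_NOT_REACHABLE)

print(should_retry(EXITCODES.PIPELINE_ERROR))       # True
print(should_retry(EXITCODES.CONFIGURATION_ERROR))  # False
```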

Healthchecks

Logprep provides a health endpoint that can be used to check the health of all components. The ASGI app for the healthcheck endpoint is implemented in logprep.metrics.exporter.make_patched_asgi_app and is recreated on every restart of Logprep (e.g. after a configuration change) or on creation of the first pipeline process. The healthcheck endpoint is available at /health if metrics are enabled and can be accessed via HTTP GET.

  • On success, the healthcheck endpoint will return a 200 status code and a payload OK.

  • On failure, the healthcheck endpoint will return a 503 status code and a payload FAIL.

Healthchecks are implemented in components via the health() method. Make sure to call super().health() in newly implemented health checks. The health is checked for the first time after the first pipeline process is started, and then every 5 seconds. You can configure the healthcheck timeout at the component level with the parameter health_timeout; the default value is 1 second.

Healthchecks are used in the provided helm charts as default for readiness probes.
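Since the endpoint returns 200/OK on success and 503/FAIL on failure, a readiness probe wired to it might look like the following sketch. The port and timing values here are assumptions, not the helm chart's actual defaults:

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8000        # assumption: the configured metrics port
  periodSeconds: 5    # matches the documented 5-second check interval
  timeoutSeconds: 1   # matches the default health_timeout of 1 second
```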

Event Generation Guide

Prerequisites

Before running either the HTTP or Kafka event generation process, ensure that the required environment is set up as described in Docker Compose Example Deployment.

Start the required environment with the following command:

export PROMETHEUS_MULTIPROC_DIR="/tmp/logprep"
mkdir -p $PROMETHEUS_MULTIPROC_DIR
docker compose -f examples/compose/docker-compose.yml up -d

HTTP Event Generation

To start an example pipeline for HTTP event generation, execute the following steps:

  1. Run the pipeline:

logprep run ./examples/exampledata/config/http_pipeline.yml

  2. Generate and send events to the HTTP endpoint:

logprep generate http --target-url http://localhost:9000/ --input-dir ./examples/exampledata/input_logdata --events 10000

When executed, the console should display output similar to the following:

"Number of failed events": 0,
"Number of successful events": 10000,
"Requests Connection Errors": 0,
"Requests Timeouts": 0,
"Requests http status 200": 20,
"Requests total": 20

The HTTP 200 status indicates that the generated data was successfully transferred. Since no batch size was specified, the default batch size was used, resulting in 20 batches being sent.
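The request count follows directly from the batch arithmetic; the batch size of 500 below is inferred from this run's output, not a documented default:

```python
import math

events = 10000
requests_sent = 20
batch_size = events // requests_sent   # 500 events per request in this run
print(batch_size)                      # 500
print(math.ceil(events / batch_size))  # 20 requests total
```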

Advanced Usage of the HTTP Generator

Below are examples of how to invoke the HTTP generator with different options.

The --verify option enables or disables SSL verification for the HTTP request. It also allows you to specify a path to a certificate for verification.

logprep generate http --verify False --target-url http://localhost:9000/ --input-dir ./examples/exampledata/input_logdata --events 10000

The --shuffle option enables shuffling of events before batching, ensuring a randomized event order.

logprep generate http --shuffle True --target-url http://localhost:9000/ --input-dir ./examples/exampledata/input_logdata --events 10000

The --thread-count option specifies the number of threads to use for parallel event generation.

logprep generate http --thread-count 2 --target-url http://localhost:9000/ --input-dir ./examples/exampledata/input_logdata --events 10000

The --replace-timestamp option determines whether the timestamps of example events should be replaced during generation. The default is True.

logprep generate http --replace-timestamp False --target-url http://localhost:9000/ --input-dir ./examples/exampledata/input_logdata --events 10000

The --tags option allows setting a tag for the generated events, which can be useful for categorization or filtering.

logprep generate http --tag loglevel --target-url http://localhost:9000/ --input-dir ./examples/exampledata/input_logdata --events 10000

The --timeout option specifies the HTTP request timeout duration (in seconds), controlling how long the generator waits for a response.

logprep generate http --timeout 2 --target-url http://localhost:9000/ --input-dir ./examples/exampledata/input_logdata --events 10000

The --loglevel option sets the logging level for displayed logs.

logprep generate http --loglevel DEBUG --target-url http://localhost:9000/ --input-dir ./examples/exampledata/input_logdata --events 10000

Kafka Event Generation

To generate events and send them to Kafka, follow these steps:

  1. (optional) Run the logprep pipeline to check if processing from kafka works as expected:

logprep run ./examples/exampledata/config/pipeline.yml

  2. Generate and send events to Kafka:

logprep generate kafka --input-dir ./examples/exampledata/input_logdata/ --batch-size 1000 --events 10000 --output-config '{"bootstrap.servers": "127.0.0.1:9092"}'

  3. When executed, the console should display output similar to the following:

"Is the producer healthy": true,
"Number of processed batches": 10,
"Number of successful events": 10000

This confirms that the Kafka producer is healthy and that all events have been successfully processed.

Additional Examples of Invoking the ConfluentKafka Generator

The options --shuffle, --thread-count, --replace-timestamp, --tags, and --loglevel can be used in the same way as for the http generator, as shown in Advanced Usage of the HTTP Generator.

Here is an example of a more extensive output configuration for the ConfluentKafka generator.

logprep generate kafka --output-config '{"bootstrap.servers": "127.0.0.1:9092", "enable.ssl.certificate.verification" : "true"}' --input-dir ./examples/exampledata/input_logdata/ --batch-size 1000 --events 10000

For a full list of available options, refer to the ConfluentKafka documentation.

The --send-timeout option determines the maximum wait time for an answer from the broker on polling.

logprep generate kafka --send-timeout 2 --input-dir ./examples/exampledata/input_logdata/ --output-config '{"bootstrap.servers": "127.0.0.1:9092"}' --batch-size 1000 --events 10000