MATOMO - Import Command Example

From Wiki.IT-Arts.net


MATOMO PYTHON EXAMPLE COMMAND

##### MATOMO PYTHON COMMAND
# e.g.:
python3 /var/www/html/matomo/misc/log-analytics/import_logs.py --url='http://matomo.lanv' --idsite=1 --recorders=4 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --accept-invalid-ssl-certificate /var/log/nginx/<WEBSITE.COM>.access.log.*.gz
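
When the import is launched from a scheduler or a wrapper script, the same command can be assembled as an argument list rather than a shell string. The sketch below is only an illustration: the helper name `build_import_command` and the `example.access.log.*.gz` pattern are hypothetical, the script path and URL are the ones from the example above, and `--dry-run` (listed in the help output) is switched on by default so that a first test inserts nothing into Matomo.

```python
import glob
import subprocess
import sys

# Path to the importer, taken from the example command above.
IMPORT_SCRIPT = "/var/www/html/matomo/misc/log-analytics/import_logs.py"


def build_import_command(matomo_url, site_id, log_files, recorders=4, dry_run=True):
    """Return the argv list for import_logs.py (hypothetical helper)."""
    cmd = [
        sys.executable, IMPORT_SCRIPT,
        f"--url={matomo_url}",
        f"--idsite={site_id}",
        f"--recorders={recorders}",
        "--enable-http-errors",
        "--enable-http-redirects",
        "--enable-static",
        "--enable-bots",
    ]
    if dry_run:
        cmd.append("--dry-run")
    # Log files are positional arguments, so they go last.
    cmd.extend(log_files)
    return cmd


if __name__ == "__main__":
    # Assumed placeholder pattern; adapt it to the real log file name.
    logs = sorted(glob.glob("/var/log/nginx/example.access.log.*.gz"))
    if logs:
        subprocess.run(build_import_command("http://matomo.lanv", 1, logs), check=True)
    else:
        print("no log files matched")
```

Passing `dry_run=False` produces the real import command once the trial run looks correct.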


HELP OUTPUT

###########################################################################################
#####:~# python3 /var/www/html/matomo/misc/log-analytics/import_logs.py --help
usage: import_logs.py [-h] [--auth-user AUTH_USER] [--auth-password AUTH_PASSWORD] [--debug] [--debug-tracker] [--debug-request-limit DEBUG_REQUEST_LIMIT] --url MATOMO_URL [--api-url MATOMO_API_URL]
                      [--tracker-endpoint-path MATOMO_TRACKER_ENDPOINT_PATH] [--dry-run] [--show-progress] [--show-progress-delay SHOW_PROGRESS_DELAY] [--add-sites-new-hosts] [--idsite SITE_ID]
                      [--idsite-fallback SITE_ID_FALLBACK] [--config CONFIG_FILE] [--login LOGIN] [--password PASSWORD] [--token-auth MATOMO_TOKEN_AUTH] [--hostname HOSTNAMES] [--exclude-path EXCLUDED_PATHS]
                      [--exclude-path-from EXCLUDE_PATH_FROM] [--include-path INCLUDED_PATHS] [--include-path-from INCLUDE_PATH_FROM] [--useragent-exclude EXCLUDED_USERAGENTS] [--enable-static] [--enable-bots]
                      [--enable-http-errors] [--enable-http-redirects] [--enable-reverse-dns] [--strip-query-string] [--query-string-delimiter QUERY_STRING_DELIMITER] [--log-format-name LOG_FORMAT_NAME]
                      [--log-format-regex LOG_FORMAT_REGEX] [--log-date-format LOG_DATE_FORMAT] [--log-hostname LOG_HOSTNAME] [--skip SKIP] [--recorders RECORDERS]
                      [--recorder-max-payload-size RECORDER_MAX_PAYLOAD_SIZE] [--replay-tracking] [--replay-tracking-expected-tracker-file REPLAY_TRACKING_EXPECTED_TRACKER_FILE] [--output OUTPUT]
                      [--encoding ENCODING] [--disable-bulk-tracking] [--debug-force-one-hit-every-Ns FORCE_ONE_ACTION_INTERVAL] [--force-lowercase-path] [--enable-testmode]
                      [--download-extensions DOWNLOAD_EXTENSIONS] [--add-download-extensions EXTRA_DOWNLOAD_EXTENSIONS] [--w3c-map-field KEY=VAL] [--w3c-time-taken-millisecs] [--w3c-fields W3C_FIELDS]
                      [--w3c-field-regex KEY=VAL] [--title-category-delimiter TITLE_CATEGORY_DELIMITER] [--dump-log-regex] [--ignore-groups REGEX_GROUPS_TO_IGNORE] [--regex-group-to-visit-cvar KEY=VAL]
                      [--regex-group-to-page-cvar KEY=VAL] [--track-http-method TRACK_HTTP_METHOD] [--retry-max-attempts MAX_ATTEMPTS] [--retry-delay DELAY_AFTER_FAILURE] [--request-timeout REQUEST_TIMEOUT]
                      [--include-host INCLUDE_HOST] [--exclude-host EXCLUDE_HOST] [--exclude-older-than EXCLUDE_OLDER_THAN] [--exclude-newer-than EXCLUDE_NEWER_THAN] [--add-to-date SECONDS_TO_ADD_TO_DATE]
                      [--request-suffix REQUEST_SUFFIX] [--accept-invalid-ssl-certificate] [--php-binary PHP_BINARY]
                      file [file ...]

Import HTTP access logs to Matomo. log_file is the path to a server access log file (uncompressed, .gz, .bz2, or specify - to read from stdin). You may also import many log files at once (for example set
log_file to *.log or *.log.gz). By default, the script will try to produce clean reports and will exclude bots, static files, discard http error and redirects, etc. This is customizable, see below.

positional arguments:
  file

optional arguments:
  -h, --help            show this help message and exit
  --auth-user AUTH_USER
                        Basic auth user
  --auth-password AUTH_PASSWORD
                        Basic auth password
  --debug, -d           Enable debug output (specify multiple times for more verbose)
  --debug-tracker       Appends &debug=1 to tracker requests and prints out the result so the tracker can be debugged. If using the log importer results in errors with the tracker or improperly recorded
                        visits, this option can be used to find out what the tracker is doing wrong. To see debug tracker output, you must also set the [Tracker] debug_on_demand INI config to 1 in your
                        Matomo's config.ini.php file.
  --debug-request-limit DEBUG_REQUEST_LIMIT
                        Debug option that will exit after N requests are parsed. Can be used w/ --debug-tracker to limit the output of a large log file.
  --url MATOMO_URL      REQUIRED Your Matomo server URL, eg. https://example.com/matomo/ or https://analytics.example.net
  --api-url MATOMO_API_URL
                        This URL will be used to send API requests (use it if your tracker URL differs from UI/API url), eg. https://other-example.com/matomo/ or https://analytics-api.example.net
  --tracker-endpoint-path MATOMO_TRACKER_ENDPOINT_PATH
                        The tracker endpoint path to use when tracking. Defaults to /piwik.php.
  --dry-run             Perform a trial run with no tracking data being inserted into Matomo
  --show-progress       Print a progress report every X seconds (default: 1, use --show-progress-delay to override)
  --show-progress-delay SHOW_PROGRESS_DELAY
                        Change the default progress delay
  --add-sites-new-hosts
                        When a hostname is found in the log file, but not matched to any website in Matomo, automatically create a new website in Matomo with this hostname to import the logs
  --idsite SITE_ID      When specified, data in the specified log files will be tracked for this Matomo site ID. The script will not auto-detect the website based on the log line hostname (new websites will
                        not be automatically created).
  --idsite-fallback SITE_ID_FALLBACK
                        Default Matomo site ID to use if the hostname doesn't match any known Website's URL. New websites will not be automatically created. Used only if --add-sites-new-hosts or --idsite are
                        not set
  --config CONFIG_FILE  This is only used when --login and --password is not used. Matomo will read the configuration file (default: /var/www/html/matomo/config/config.ini.php) to fetch the Super User
                        token_auth from the config file.
  --login LOGIN         You can manually specify the Matomo Super User login
  --password PASSWORD   You can manually specify the Matomo Super User password
  --token-auth MATOMO_TOKEN_AUTH
                        Matomo user token_auth, the token_auth is found in Matomo > Settings > API. You must use a token_auth that has at least 'admin' or 'super user' permission. If you use a token_auth for a
                        non admin user, your users' IP addresses will not be tracked properly.
  --hostname HOSTNAMES  Accepted hostname (requests with other hostnames will be excluded). You may use the star character * Example: --hostname=*domain.com Can be specified multiple times
  --exclude-path EXCLUDED_PATHS
                        Any URL path matching this exclude-path will not be imported in Matomo. You must use the star character *. Example: --exclude-path=*/admin/* Can be specified multiple times.
  --exclude-path-from EXCLUDE_PATH_FROM
                        Each line from this file is a path to exclude. Each path must contain the character * to match a string. (see: --exclude-path)
  --include-path INCLUDED_PATHS
                        Paths to include. Can be specified multiple times. If not specified, all paths are included.
  --include-path-from INCLUDE_PATH_FROM
                        Each line from this file is a path to include
  --useragent-exclude EXCLUDED_USERAGENTS
                        User agents to exclude (in addition to the standard excluded user agents). Can be specified multiple times
  --enable-static       Track static files (images, css, js, ico, ttf, etc.)
  --enable-bots         Track bots. All bot visits will have a Custom Variable set with name='Bot' and value='$Bot_user_agent_here$'
  --enable-http-errors  Track HTTP errors (status code 4xx or 5xx)
  --enable-http-redirects
                        Track HTTP redirects (status code 3xx except 304)
  --enable-reverse-dns  Enable reverse DNS, used to generate the 'Providers' report in Matomo. Disabled by default, as it impacts performance
  --strip-query-string  Strip the query string from the URL
  --query-string-delimiter QUERY_STRING_DELIMITER
                        The query string delimiter (default: ?)
  --log-format-name LOG_FORMAT_NAME
                        Access log format to detect (supported are: amazon_cloudfront, common, common_complete, common_vhost, elb, gandi, haproxy, icecast2, iis, incapsula_w3c, ncsa_extended, nginx_json, ovh,
                        s3, shoutcast, traefik_json, w3c_extended). When not specified, the log format will be autodetected by trying all supported log formats.
  --log-format-regex LOG_FORMAT_REGEX
                        Regular expression used to parse log entries. Regexes must contain named groups for different log fields. Recognized fields include: date, path, query_string, ip, user_agent, referrer,
                        status, length, host, userid, generation_time_milli, event_action, event_name, timezone, session_time. For an example of a supported Regex, see the source code of this file. Overrides
                        --log-format-name.
  --log-date-format LOG_DATE_FORMAT
                        Format string used to parse dates. You can specify any format that can also be specified to the strptime python function.
  --log-hostname LOG_HOSTNAME
                        Force this hostname for a log format that doesn't include it. All hits will seem to come to this host
  --skip SKIP           Skip the n first lines to start parsing/importing data at a given line for the specified log file
  --recorders RECORDERS
                        Number of simultaneous recorders (default: 1). It should be set to the number of CPU cores in your server. You can also experiment with higher values which may increase performance
                        until a certain point
  --recorder-max-payload-size RECORDER_MAX_PAYLOAD_SIZE
                        Maximum number of log entries to record in one tracking request (default: 200).
  --replay-tracking     Replay piwik.php requests found in custom logs (only piwik.php requests expected). See https://matomo.org/faq/how-to/faq_17033/
  --replay-tracking-expected-tracker-file REPLAY_TRACKING_EXPECTED_TRACKER_FILE
                        The expected suffix for tracking request paths. Only logs whose paths end with this will be imported. By default requests to the piwik.php file or the matomo.php file will be imported.
  --output OUTPUT       Redirect output (stdout and stderr) to the specified file
  --encoding ENCODING   Log files encoding (default: utf8)
  --disable-bulk-tracking
                        Disables use of bulk tracking so recorders record one hit at a time.
  --debug-force-one-hit-every-Ns FORCE_ONE_ACTION_INTERVAL
                        Debug option that will force each recorder to record one hit every N secs.
  --force-lowercase-path
                        Make URL path lowercase so paths with the same letters but different cases are treated the same.
  --enable-testmode     If set, it will try to get the token_auth from the matomo_tests directory
  --download-extensions DOWNLOAD_EXTENSIONS
                        By default Matomo tracks as Downloads the most popular file extensions. If you set this parameter (format: pdf,doc,...) then files with an extension found in the list will be imported
                        as Downloads, other file extensions downloads will be skipped.
  --add-download-extensions EXTRA_DOWNLOAD_EXTENSIONS
                        Add extensions that should be treated as downloads. See --download-extensions for more info.
  --w3c-map-field KEY=VAL
                        Map a custom log entry field in your W3C log to a default one. Use this option to load custom log files that use the W3C extended log format such as those from the Advanced Logging W3C
                        module. Used as, eg, --w3c-map-field my-date=date. Recognized default fields include: date, time, cs-uri-stem, cs-uri-query, c-ip, cs(User-Agent), cs(Referer), sc-status, sc-bytes,
                        cs-host, cs-method, cs-username, time-taken. Formats that extend the W3C extended log format (like the cloudfront RTMP log format) may define more fields that can be mapped.
  --w3c-time-taken-millisecs
                        If set, interprets the time-taken W3C log field as a number of milliseconds. This must be set for importing IIS logs.
  --w3c-fields W3C_FIELDS
                        Specify the '#Fields:' line for a log file in the W3C Extended log file format. Use this option if your log file doesn't contain the '#Fields:' line which is required for parsing. This
                        option must be used in conjunction with --log-format-name=w3c_extended. Example: --w3c-fields='#Fields: date time c-ip ...'
  --w3c-field-regex KEY=VAL
                        Specify a regex for a field in your W3C extended log file. You can use this option to parse fields the importer does not natively recognize and then use one of the
                        --regex-group-to-XXX-cvar options to track the field in a custom variable. For example, specifying --w3c-field-regex=sc-win32-status=(?P<win32_status>\S+)
                        --regex-group-to-page-cvar="win32_status=Windows Status Code" will track the sc-win32-status IIS field in the 'Windows Status Code' custom variable. Regexes must contain a named group.
  --title-category-delimiter TITLE_CATEGORY_DELIMITER
                        If --enable-http-errors is used, errors are shown in the page titles report. If you have changed General.action_title_category_delimiter in your Matomo configuration, you need to set
                        this option to the same value in order to get a pretty page titles report.
  --dump-log-regex      Prints out the regex string used to parse log lines and exits. Can be useful for using formats in newer versions of the script in older versions of the script. The output regex can be
                        used with the --log-format-regex option.
  --ignore-groups REGEX_GROUPS_TO_IGNORE
                        Comma separated list of regex groups to ignore when parsing log lines. Can be used to, for example, disable normal user id tracking. See documentation for --log-format-regex for list of
                        available regex groups.
  --regex-group-to-visit-cvar KEY=VAL
                        Track an attribute through a custom variable with visit scope instead of through Matomo's normal approach. For example, to track usernames as a custom variable instead of through the
                        uid tracking parameter, supply --regex-group-to-visit-cvar="userid=User Name". This will track usernames in a custom variable named 'User Name'. The list of available regex groups can
                        be found in the documentation for --log-format-regex (additional regex groups you may have defined in --log-format-regex can also be used).
  --regex-group-to-page-cvar KEY=VAL
                        Track an attribute through a custom variable with page scope instead of through Matomo's normal approach. For example, to track usernames as a custom variable instead of through the uid
                        tracking parameter, supply --regex-group-to-page-cvar="userid=User Name". This will track usernames in a custom variable named 'User Name'. The list of available regex groups can be
                        found in the documentation for --log-format-regex (additional regex groups you may have defined in --log-format-regex can also be used).
  --track-http-method TRACK_HTTP_METHOD
                        Enables tracking of http method as custom page variable if method group is available in log format.
  --retry-max-attempts MAX_ATTEMPTS
                        The maximum number of times to retry a failed tracking request.
  --retry-delay DELAY_AFTER_FAILURE
                        The number of seconds to wait before retrying a failed tracking request.
  --request-timeout REQUEST_TIMEOUT
                        The maximum number of seconds to wait before terminating an HTTP request to Matomo.
  --include-host INCLUDE_HOST
                        Only import logs from the specified host(s).
  --exclude-host EXCLUDE_HOST
                        Only import logs that are not from the specified host(s).
  --exclude-older-than EXCLUDE_OLDER_THAN
                        Ignore logs older than the specified date. Exclusive. Date format must be YYYY-MM-DD hh:mm:ss +/-0000. The timezone offset is required.
  --exclude-newer-than EXCLUDE_NEWER_THAN
                        Ignore logs newer than the specified date. Exclusive. Date format must be YYYY-MM-DD hh:mm:ss +/-0000. The timezone offset is required.
  --add-to-date SECONDS_TO_ADD_TO_DATE
                        A number of seconds to add to each date value in the log file.
  --request-suffix REQUEST_SUFFIX
                        Extra parameters to append to tracker and API requests.
  --accept-invalid-ssl-certificate
                        Do not verify the SSL / TLS certificate when contacting the Matomo server.
  --php-binary PHP_BINARY
                        Specify the PHP binary to use.

About Matomo Server Log Analytics: https://matomo.org/log-analytics/
Found a bug? Please create a ticket in https://github.com/matomo-org/matomo-log-analytics/
Please send your suggestions or successful user story to hello@matomo.org
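
The `--log-format-regex` and `--log-date-format` options listed above can be tried out in isolation before starting a long import. The following sketch is not the importer's own pattern: the regex and the sample log line are invented for this page, and only the group names (`ip`, `date`, `timezone`, `path`, `status`, `length`) come from the list of recognized fields in the help text.

```python
import re
from datetime import datetime

# Illustrative pattern for a "common"-style access log line (not the importer's own regex).
LOG_REGEX = re.compile(
    r'(?P<ip>\S+) \S+ \S+ '
    r'\[(?P<date>[^ \]]+) (?P<timezone>[+-]\d{4})\] '
    r'"\S+ (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<length>\S+)'
)

# strptime format matching the date part of the sample line below.
DATE_FORMAT = "%d/%b/%Y:%H:%M:%S"

# Made-up sample line using a documentation IP address.
sample = '203.0.113.7 - - [10/Oct/2023:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326'

match = LOG_REGEX.match(sample)
fields = match.groupdict()
parsed_date = datetime.strptime(fields["date"], DATE_FORMAT)

print(fields["ip"], fields["path"], fields["status"], parsed_date.isoformat())
```

If the groups and the date parse as expected, the same pattern and format string would presumably be passed to the importer through `--log-format-regex` and `--log-date-format`.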

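`--exclude-older-than` and `--exclude-newer-than` expect the form `YYYY-MM-DD hh:mm:ss +/-0000`, with a mandatory timezone offset. Below is a small sketch of producing such a value from a timezone-aware `datetime`. The helper name `matomo_date_bound` is hypothetical and the seven-day window is only an example.

```python
from datetime import datetime, timedelta, timezone


def matomo_date_bound(moment):
    """Format an aware datetime as 'YYYY-MM-DD hh:mm:ss +0000' (hypothetical helper)."""
    if moment.tzinfo is None:
        raise ValueError("a timezone-aware datetime is required")
    return moment.strftime("%Y-%m-%d %H:%M:%S %z")


# Example: build an option that drops hits older than seven days.
cutoff = datetime.now(timezone.utc) - timedelta(days=7)
print(f"--exclude-older-than={matomo_date_bound(cutoff)!r}")
```

Rejecting naive datetimes up front avoids silently emitting a value without the offset that the help text says is required.
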

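The star patterns accepted by `--exclude-path` can be previewed with shell-style wildcard matching. This sketch assumes the importer treats `*` the way Python's `fnmatch` module does, which the help text does not state. The helper `is_excluded` and the sample paths are invented for illustration.

```python
from fnmatch import fnmatch

# Pattern taken from the --exclude-path example in the help text.
EXCLUDED_PATHS = ["*/admin/*"]


def is_excluded(path, patterns=EXCLUDED_PATHS):
    """Return True if the path matches any exclude pattern (assumed fnmatch semantics)."""
    return any(fnmatch(path, pattern) for pattern in patterns)


for path in ["/admin/users", "/shop/admin/login", "/blog/post-1"]:
    print(path, "excluded" if is_excluded(path) else "kept")
```

A `--dry-run` import with the same `--exclude-path` values remains the authoritative check, since it exercises the importer's actual matching.
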
LINKS