MATOMO - Import Command Example
From Wiki.IT-Arts.net
MATOMO PYTHON EXAMPLE COMMAND
##### MATOMO PYTHON COMMAND
# e.g.:
python3 /var/www/html/matomo/misc/log-analytics/import_logs.py \
  --idsite=1 --recorders=4 --url='http://matomo.lanv' \
  --enable-http-errors --enable-http-redirects --enable-static --enable-bots \
  --accept-invalid-ssl-certificate \
  /var/log/nginx/<WEBSITE.COM>.access.log.*.gz
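When the import is driven from a Python script instead of a shell, the log file glob has to be expanded by the script itself, because `subprocess.run()` with an argument list does not go through a shell. The sketch below builds the same argument list as the example command above. The helper name `build_import_cmd` and its `dry_run`/`since` parameters are hypothetical; `--dry-run` and `--exclude-older-than` are taken from the help output, and the path and URL are simply copied from the example.

```python
import glob
import sys

# Values copied from the example command; adjust them to your own install.
IMPORT_LOGS = "/var/www/html/matomo/misc/log-analytics/import_logs.py"
MATOMO_URL = "http://matomo.lanv"


def build_import_cmd(log_pattern, idsite=1, recorders=4, dry_run=False, since=None):
    """Build the import_logs.py argument list (hypothetical helper).

    subprocess.run() with a list does not use a shell, so the log glob is
    expanded here; sorted() only gives a stable, alphabetical file order.
    """
    files = sorted(glob.glob(log_pattern))
    if not files:
        raise FileNotFoundError(f"no log files match {log_pattern!r}")
    cmd = [
        sys.executable, IMPORT_LOGS,  # run the importer with this same interpreter
        f"--idsite={idsite}",
        f"--recorders={recorders}",
        f"--url={MATOMO_URL}",
        "--enable-http-errors",
        "--enable-http-redirects",
        "--enable-static",
        "--enable-bots",
        "--accept-invalid-ssl-certificate",
    ]
    if dry_run:
        cmd.append("--dry-run")  # parse the logs but insert no tracking data
    if since is not None:
        # --exclude-older-than expects "YYYY-MM-DD hh:mm:ss +/-0000" and the
        # offset is required, so a naive datetime is rejected, not guessed.
        if since.utcoffset() is None:
            raise ValueError("since must be timezone-aware")
        cmd.append("--exclude-older-than=" + since.strftime("%Y-%m-%d %H:%M:%S %z"))
    return cmd + files
```

For example, `subprocess.run(build_import_cmd("/var/log/nginx/<WEBSITE.COM>.access.log.*.gz", dry_run=True), check=True)` performs a trial run first; drop `dry_run=True` once the output looks right. Because no shell is involved, the space inside the `--exclude-older-than` value needs no extra quoting.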
MAN FILE
python3 /var/www/html/matomo/misc/log-analytics/import_logs.py --help

usage: import_logs.py [-h] [--auth-user AUTH_USER] [--auth-password AUTH_PASSWORD]
                      [--debug] [--debug-tracker] [--debug-request-limit DEBUG_REQUEST_LIMIT]
                      --url MATOMO_URL [--api-url MATOMO_API_URL]
                      [--tracker-endpoint-path MATOMO_TRACKER_ENDPOINT_PATH]
                      [--dry-run] [--show-progress] [--show-progress-delay SHOW_PROGRESS_DELAY]
                      [--add-sites-new-hosts] [--idsite SITE_ID] [--idsite-fallback SITE_ID_FALLBACK]
                      [--config CONFIG_FILE] [--login LOGIN] [--password PASSWORD]
                      [--token-auth MATOMO_TOKEN_AUTH] [--hostname HOSTNAMES]
                      [--exclude-path EXCLUDED_PATHS] [--exclude-path-from EXCLUDE_PATH_FROM]
                      [--include-path INCLUDED_PATHS] [--include-path-from INCLUDE_PATH_FROM]
                      [--useragent-exclude EXCLUDED_USERAGENTS] [--enable-static] [--enable-bots]
                      [--enable-http-errors] [--enable-http-redirects] [--enable-reverse-dns]
                      [--strip-query-string] [--query-string-delimiter QUERY_STRING_DELIMITER]
                      [--log-format-name LOG_FORMAT_NAME] [--log-format-regex LOG_FORMAT_REGEX]
                      [--log-date-format LOG_DATE_FORMAT] [--log-hostname LOG_HOSTNAME]
                      [--skip SKIP] [--recorders RECORDERS]
                      [--recorder-max-payload-size RECORDER_MAX_PAYLOAD_SIZE]
                      [--replay-tracking]
                      [--replay-tracking-expected-tracker-file REPLAY_TRACKING_EXPECTED_TRACKER_FILE]
                      [--output OUTPUT] [--encoding ENCODING] [--disable-bulk-tracking]
                      [--debug-force-one-hit-every-Ns FORCE_ONE_ACTION_INTERVAL]
                      [--force-lowercase-path] [--enable-testmode]
                      [--download-extensions DOWNLOAD_EXTENSIONS]
                      [--add-download-extensions EXTRA_DOWNLOAD_EXTENSIONS]
                      [--w3c-map-field KEY=VAL] [--w3c-time-taken-millisecs]
                      [--w3c-fields W3C_FIELDS] [--w3c-field-regex KEY=VAL]
                      [--title-category-delimiter TITLE_CATEGORY_DELIMITER] [--dump-log-regex]
                      [--ignore-groups REGEX_GROUPS_TO_IGNORE]
                      [--regex-group-to-visit-cvar KEY=VAL] [--regex-group-to-page-cvar KEY=VAL]
                      [--track-http-method TRACK_HTTP_METHOD]
                      [--retry-max-attempts MAX_ATTEMPTS] [--retry-delay DELAY_AFTER_FAILURE]
                      [--request-timeout REQUEST_TIMEOUT] [--include-host INCLUDE_HOST]
                      [--exclude-host EXCLUDE_HOST] [--exclude-older-than EXCLUDE_OLDER_THAN]
                      [--exclude-newer-than EXCLUDE_NEWER_THAN]
                      [--add-to-date SECONDS_TO_ADD_TO_DATE] [--request-suffix REQUEST_SUFFIX]
                      [--accept-invalid-ssl-certificate] [--php-binary PHP_BINARY]
                      file [file ...]

Import HTTP access logs to Matomo. log_file is the path to a server access log file (uncompressed, .gz, .bz2, or specify - to read from stdin). You may also import many log files at once (for example set log_file to *.log or *.log.gz). By default, the script will try to produce clean reports and will exclude bots, static files, discard http error and redirects, etc. This is customizable, see below.

positional arguments:
  file

optional arguments:
  -h, --help
        show this help message and exit
  --auth-user AUTH_USER
        Basic auth user
  --auth-password AUTH_PASSWORD
        Basic auth password
  --debug, -d
        Enable debug output (specify multiple times for more verbose)
  --debug-tracker
        Appends &debug=1 to tracker requests and prints out the result so the tracker can be debugged. If using the log importer results in errors with the tracker or improperly recorded visits, this option can be used to find out what the tracker is doing wrong. To see debug tracker output, you must also set the [Tracker] debug_on_demand INI config to 1 in your Matomo's config.ini.php file.
  --debug-request-limit DEBUG_REQUEST_LIMIT
        Debug option that will exit after N requests are parsed. Can be used w/ --debug-tracker to limit the output of a large log file.
  --url MATOMO_URL
        REQUIRED Your Matomo server URL, eg. https://example.com/matomo/ or https://analytics.example.net
  --api-url MATOMO_API_URL
        This URL will be used to send API requests (use it if your tracker URL differs from UI/API url), eg. https://other-example.com/matomo/ or https://analytics-api.example.net
  --tracker-endpoint-path MATOMO_TRACKER_ENDPOINT_PATH
        The tracker endpoint path to use when tracking. Defaults to /piwik.php.
  --dry-run
        Perform a trial run with no tracking data being inserted into Matomo
  --show-progress
        Print a progress report X seconds (default: 1, use --show-progress-delay to override)
  --show-progress-delay SHOW_PROGRESS_DELAY
        Change the default progress delay
  --add-sites-new-hosts
        When a hostname is found in the log file, but not matched to any website in Matomo, automatically create a new website in Matomo with this hostname to import the logs
  --idsite SITE_ID
        When specified, data in the specified log files will be tracked for this Matomo site ID. The script will not auto-detect the website based on the log line hostname (new websites will not be automatically created).
  --idsite-fallback SITE_ID_FALLBACK
        Default Matomo site ID to use if the hostname doesn't match any known Website's URL. New websites will not be automatically created. Used only if --add-sites-new-hosts or --idsite are not set
  --config CONFIG_FILE
        This is only used when --login and --password is not used. Matomo will read the configuration file (default: /var/www/html/matomo/config/config.ini.php) to fetch the Super User token_auth from the config file.
  --login LOGIN
        You can manually specify the Matomo Super User login
  --password PASSWORD
        You can manually specify the Matomo Super User password
  --token-auth MATOMO_TOKEN_AUTH
        Matomo user token_auth, the token_auth is found in Matomo > Settings > API. You must use a token_auth that has at least 'admin' or 'super user' permission. If you use a token_auth for a non admin user, your users' IP addresses will not be tracked properly.
  --hostname HOSTNAMES
        Accepted hostname (requests with other hostnames will be excluded). You may use the star character * Example: --hostname=*domain.com Can be specified multiple times
  --exclude-path EXCLUDED_PATHS
        Any URL path matching this exclude-path will not be imported in Matomo. You must use the star character *. Example: --exclude-path=*/admin/* Can be specified multiple times.
  --exclude-path-from EXCLUDE_PATH_FROM
        Each line from this file is a path to exclude. Each path must contain the character * to match a string. (see: --exclude-path)
  --include-path INCLUDED_PATHS
        Paths to include. Can be specified multiple times. If not specified, all paths are included.
  --include-path-from INCLUDE_PATH_FROM
        Each line from this file is a path to include
  --useragent-exclude EXCLUDED_USERAGENTS
        User agents to exclude (in addition to the standard excluded user agents). Can be specified multiple times
  --enable-static
        Track static files (images, css, js, ico, ttf, etc.)
  --enable-bots
        Track bots. All bot visits will have a Custom Variable set with name='Bot' and value='$Bot_user_agent_here$'
  --enable-http-errors
        Track HTTP errors (status code 4xx or 5xx)
  --enable-http-redirects
        Track HTTP redirects (status code 3xx except 304)
  --enable-reverse-dns
        Enable reverse DNS, used to generate the 'Providers' report in Matomo. Disabled by default, as it impacts performance
  --strip-query-string
        Strip the query string from the URL
  --query-string-delimiter QUERY_STRING_DELIMITER
        The query string delimiter (default: ?)
  --log-format-name LOG_FORMAT_NAME
        Access log format to detect (supported are: amazon_cloudfront, common, common_complete, common_vhost, elb, gandi, haproxy, icecast2, iis, incapsula_w3c, ncsa_extended, nginx_json, ovh, s3, shoutcast, traefik_json, w3c_extended). When not specified, the log format will be autodetected by trying all supported log formats.
  --log-format-regex LOG_FORMAT_REGEX
        Regular expression used to parse log entries. Regexes must contain named groups for different log fields. Recognized fields include: date, path, query_string, ip, user_agent, referrer, status, length, host, userid, generation_time_milli, event_action, event_name, timezone, session_time. For an example of a supported Regex, see the source code of this file. Overrides --log-format-name.
  --log-date-format LOG_DATE_FORMAT
        Format string used to parse dates. You can specify any format that can also be specified to the strptime python function.
  --log-hostname LOG_HOSTNAME
        Force this hostname for a log format that doesn't include it. All hits will seem to come to this host
  --skip SKIP
        Skip the n first lines to start parsing/importing data at a given line for the specified log file
  --recorders RECORDERS
        Number of simultaneous recorders (default: 1). It should be set to the number of CPU cores in your server. You can also experiment with higher values which may increase performance until a certain point
  --recorder-max-payload-size RECORDER_MAX_PAYLOAD_SIZE
        Maximum number of log entries to record in one tracking request (default: 200).
  --replay-tracking
        Replay piwik.php requests found in custom logs (only piwik.php requests expected). See https://matomo.org/faq/how-to/faq_17033/
  --replay-tracking-expected-tracker-file REPLAY_TRACKING_EXPECTED_TRACKER_FILE
        The expected suffix for tracking request paths. Only logs whose paths end with this will be imported. By default requests to the piwik.php file or the matomo.php file will be imported.
  --output OUTPUT
        Redirect output (stdout and stderr) to the specified file
  --encoding ENCODING
        Log files encoding (default: utf8)
  --disable-bulk-tracking
        Disables use of bulk tracking so recorders record one hit at a time.
  --debug-force-one-hit-every-Ns FORCE_ONE_ACTION_INTERVAL
        Debug option that will force each recorder to record one hit every N secs.
  --force-lowercase-path
        Make URL path lowercase so paths with the same letters but different cases are treated the same.
  --enable-testmode
        If set, it will try to get the token_auth from the matomo_tests directory
  --download-extensions DOWNLOAD_EXTENSIONS
        By default Matomo tracks as Downloads the most popular file extensions. If you set this parameter (format: pdf,doc,...) then files with an extension found in the list will be imported as Downloads, other file extensions downloads will be skipped.
  --add-download-extensions EXTRA_DOWNLOAD_EXTENSIONS
        Add extensions that should be treated as downloads. See --download-extensions for more info.
  --w3c-map-field KEY=VAL
        Map a custom log entry field in your W3C log to a default one. Use this option to load custom log files that use the W3C extended log format such as those from the Advanced Logging W3C module. Used as, eg, --w3c-map-field my-date=date. Recognized default fields include: date, time, cs-uri-stem, cs-uri-query, c-ip, cs(User-Agent), cs(Referer), sc-status, sc-bytes, cs-host, cs-method, cs-username, time-taken. Formats that extend the W3C extended log format (like the cloudfront RTMP log format) may define more fields that can be mapped.
  --w3c-time-taken-millisecs
        If set, interprets the time-taken W3C log field as a number of milliseconds. This must be set for importing IIS logs.
  --w3c-fields W3C_FIELDS
        Specify the '#Fields:' line for a log file in the W3C Extended log file format. Use this option if your log file doesn't contain the '#Fields:' line which is required for parsing. This option must be used in conjunction with --log-format-name=w3c_extended. Example: --w3c-fields='#Fields: date time c-ip ...'
  --w3c-field-regex KEY=VAL
        Specify a regex for a field in your W3C extended log file. You can use this option to parse fields the importer does not natively recognize and then use one of the --regex-group-to-XXX-cvar options to track the field in a custom variable. For example, specifying --w3c-field-regex=sc-win32-status=(?P<win32_status>\S+) --regex-group-to-page-cvar="win32_status=Windows Status Code" will track the sc-win32-status IIS field in the 'Windows Status Code' custom variable. Regexes must contain a named group.
  --title-category-delimiter TITLE_CATEGORY_DELIMITER
        If --enable-http-errors is used, errors are shown in the page titles report. If you have changed General.action_title_category_delimiter in your Matomo configuration, you need to set this option to the same value in order to get a pretty page titles report.
  --dump-log-regex
        Prints out the regex string used to parse log lines and exits. Can be useful for using formats in newer versions of the script in older versions of the script. The output regex can be used with the --log-format-regex option.
  --ignore-groups REGEX_GROUPS_TO_IGNORE
        Comma separated list of regex groups to ignore when parsing log lines. Can be used to, for example, disable normal user id tracking. See documentation for --log-format-regex for list of available regex groups.
  --regex-group-to-visit-cvar KEY=VAL
        Track an attribute through a custom variable with visit scope instead of through Matomo's normal approach. For example, to track usernames as a custom variable instead of through the uid tracking parameter, supply --regex-group-to-visit-cvar="userid=User Name". This will track usernames in a custom variable named 'User Name'. The list of available regex groups can be found in the documentation for --log-format-regex (additional regex groups you may have defined in --log-format-regex can also be used).
  --regex-group-to-page-cvar KEY=VAL
        Track an attribute through a custom variable with page scope instead of through Matomo's normal approach. For example, to track usernames as a custom variable instead of through the uid tracking parameter, supply --regex-group-to-page-cvar="userid=User Name". This will track usernames in a custom variable named 'User Name'. The list of available regex groups can be found in the documentation for --log-format-regex (additional regex groups you may have defined in --log-format-regex can also be used).
  --track-http-method TRACK_HTTP_METHOD
        Enables tracking of http method as custom page variable if method group is available in log format.
  --retry-max-attempts MAX_ATTEMPTS
        The maximum number of times to retry a failed tracking request.
  --retry-delay DELAY_AFTER_FAILURE
        The number of seconds to wait before retrying a failed tracking request.
  --request-timeout REQUEST_TIMEOUT
        The maximum number of seconds to wait before terminating an HTTP request to Matomo.
  --include-host INCLUDE_HOST
        Only import logs from the specified host(s).
  --exclude-host EXCLUDE_HOST
        Only import logs that are not from the specified host(s).
  --exclude-older-than EXCLUDE_OLDER_THAN
        Ignore logs older than the specified date. Exclusive. Date format must be YYYY-MM-DD hh:mm:ss +/-0000. The timezone offset is required.
  --exclude-newer-than EXCLUDE_NEWER_THAN
        Ignore logs newer than the specified date. Exclusive. Date format must be YYYY-MM-DD hh:mm:ss +/-0000. The timezone offset is required.
  --add-to-date SECONDS_TO_ADD_TO_DATE
        A number of seconds to add to each date value in the log file.
  --request-suffix REQUEST_SUFFIX
        Extra parameters to append to tracker and API requests.
  --accept-invalid-ssl-certificate
        Do not verify the SSL / TLS certificate when contacting the Matomo server.
  --php-binary PHP_BINARY
        Specify the PHP binary to use.
About Matomo Server Log Analytics: https://matomo.org/log-analytics/
Found a bug? Please create a ticket in https://github.com/matomo-org/matomo-log-analytics/
Please send your suggestions or successful user story to hello@matomo.org
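The --log-format-regex and --log-date-format options in the help output above take a Python regular expression with named groups and a strptime() format string. The sketch below shows how the two fit together. LOG_FORMAT_REGEX is a hypothetical regex for an nginx/Apache "combined" log line, written only for illustration. It is not the importer's built-in regex, which --dump-log-regex prints. parse_line is an illustrative helper. The group names come from the "Recognized fields" list in the help text.

```python
import re
from datetime import datetime

# Hypothetical regex for a "combined" access log line (illustration only,
# NOT the importer's built-in regex). Group names follow the fields listed
# under --log-format-regex.
LOG_FORMAT_REGEX = (
    r'(?P<ip>\S+) \S+ (?P<userid>\S+) '
    r'\[(?P<date>[^\]\s]+) (?P<timezone>[+-]\d{4})\] '
    r'"\S+ (?P<path>[^"?\s]*)(?:\?(?P<query_string>[^"\s]*))? [^"]*" '
    r'(?P<status>\d{3}) (?P<length>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

# strptime() format for the "date" group only; the UTC offset is captured
# separately by the "timezone" group, so it is not part of this string.
LOG_DATE_FORMAT = "%d/%b/%Y:%H:%M:%S"


def parse_line(line):
    """Return the named groups of one log line plus the parsed date, or None."""
    match = re.match(LOG_FORMAT_REGEX, line)
    if match is None:
        return None
    fields = match.groupdict()  # groups that did not match (e.g. no query string) are None
    fields["parsed_date"] = datetime.strptime(fields["date"], LOG_DATE_FORMAT)
    return fields
```

Once a regex like this matches your own log lines, it could be passed as `--log-format-regex='...'` together with `--log-date-format='%d/%b/%Y:%H:%M:%S'`. Checking it against a few real lines in Python first is cheaper than debugging a failed import.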
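The help text says --exclude-path and --hostname patterns use the star character *. This sketch assumes these patterns behave like shell-style wildcards as implemented by Python's fnmatch module. That is an assumption; check import_logs.py in your Matomo version to confirm. The sketch shows which paths and hosts such patterns would match. The helpers is_excluded and is_accepted_host and the pattern lists are hypothetical.

```python
from fnmatch import fnmatchcase

# Assumption: * patterns work like shell-style wildcards (Python's fnmatch).
# fnmatchcase() keeps the comparison case-sensitive on every OS.
EXCLUDE_PATHS = ["*/admin/*", "*.php"]  # example --exclude-path values
HOSTNAMES = ["*domain.com"]             # the --hostname example from the help text


def is_excluded(path, patterns=EXCLUDE_PATHS):
    """True if the URL path matches any exclude pattern."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


def is_accepted_host(host, patterns=HOSTNAMES):
    """True if the hostname matches any accepted-hostname pattern."""
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

Under this assumption there are two pitfalls. First, `*/admin/*` does not match the bare path `/admin` because it needs the trailing slash. Second, `*domain.com` also accepts `notmydomain.com`. Passing `--hostname=domain.com --hostname=*.domain.com` would restrict the import to the domain and its subdomains.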
LINKS
- https://matomo.org/faq/on-premise/installing-matomo/
- https://github.com/matomo-org/matomo-log-analytics/#readme
- https://matomo.org/faq/general/how-do-i-run-the-log-file-importer-script-with-default-options/
- https://github.com/matomo-org/matomo-log-analytics/issues/344
- https://github.com/matomo-org/matomo-nginx
- https://www.linuxcapable.com/how-to-install-matomo-with-lemp-on-ubuntu-linux/
- https://matomo.org/faq/how-to-install/faq_98/
- https://www.restack.io/docs/matomo-knowledge-matomo-error-logs-guide
- https://github.com/matomo-org/matomo-log-analytics/issues/264