MATOMO - Updating From Nginx Logs without duplicates
From Wiki.IT-Arts.net
Log Deduplication Problem
Log Import: Avoid Importing Duplicates
The community edition of Matomo has no built-in way to avoid importing duplicate log entries.
As a workaround, import the data from the log files and use the importer's --exclude-older-than option to skip entries that predate the previous run.
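According to the importer's help text, the value passed to --exclude-older-than must have the form YYYY-MM-DD hh:mm:ss +/-0000, with the timezone offset included. The following is a minimal sketch of how such a value could be produced and checked before use. The helper names matomo_timestamp and is_valid_timestamp are hypothetical and are not part of Matomo.

```shell
#!/bin/sh
# Sketch only: the helper names below are hypothetical, not part of Matomo.
# Only the date format itself is taken from the importer's help text.

# Print the current time as "YYYY-MM-DD hh:mm:ss +/-0000".
matomo_timestamp() {
    date +"%Y-%m-%d %H:%M:%S %z"
}

# Return 0 if $1 matches the expected format, non-zero otherwise.
# This checks the shape of the string, not whether the date is a real one.
is_valid_timestamp() {
    printf '%s\n' "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [+-][0-9]{4}$'
}

NOW=$(matomo_timestamp)
if is_valid_timestamp "$NOW"; then
    # Show the option as it would appear on the importer's command line.
    echo "--exclude-older-than=\"$NOW\""
else
    echo "unexpected timestamp format: $NOW" >&2
fi
```

Checking the stored value before each run is a cheap way to catch an empty or truncated timestamp file before it reaches the importer.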
Bash Script Example
This quick-and-dirty script stores the timestamp of each run in a file. On the next run, it passes that timestamp to the bundled importer, /var/www/html/matomo/misc/log-analytics/import_logs.py, so that log entries older than the previous check are excluded.
#!/bin/bash
#
# --exclude-older-than EXCLUDE_OLDER_THAN
#     Ignore logs older than the specified date. Exclusive.
#     Date format must be YYYY-MM-DD hh:mm:ss +/-0000.
#     The timezone offset is required.
#
# To print the date in this format on Linux: date +"%Y-%m-%d %H:%M:%S %z"

# VARIABLES
SLEEP_TIME=1
TIMESTAMP_FILE="/root/last_run_timestamp_for_matomo.nfo"
LOG_PATH="/var/log/matomo-archive.log"

# GET TIMESTAMP OF LAST CHECK FROM FILE
TIMESTAMP=$(cat "$TIMESTAMP_FILE")
echo "$TIMESTAMP"

# GET CURRENT TIMESTAMP (taken before the logs are copied, so nothing written during the run is skipped next time)
NEW_TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S %z")

# CUSTOM PYTHON IMPORT COMMAND
CUSTOM_COMMAND="python3 /var/www/html/matomo/misc/log-analytics/import_logs.py --accept-invalid-ssl-certificate --url=http://matomo.lan --recorders=6 --enable-http-errors --enable-http-redirects --enable-static --enable-bots --debug-tracker"

# RSYNC REVERSE PROXY LOGS IN /TMP
logger "##### MATOMO SCRIPT : Beginning script"
logger "##### MATOMO SCRIPT : Beginning rsync reverse proxy logs..."
rsync -arvz -e "ssh -p 22" matomo@reverse-proxy.lan:/var/log/nginx/*example.org.access.log* /tmp/ >> "$LOG_PATH"
rsync -arvz -e "ssh -p 22" matomo@reverse-proxy.lan:/var/log/nginx/*example.org.access.log*.1 /tmp/ >> "$LOG_PATH"
rsync -arvz -e "ssh -p 22" matomo@reverse-proxy.lan:/var/log/nginx/*example2.com.access.log* /tmp/ >> "$LOG_PATH"
rsync -arvz -e "ssh -p 22" matomo@reverse-proxy.lan:/var/log/nginx/*example2.com.access.log*.1 /tmp/ >> "$LOG_PATH"

# IMPORTING LOGS
logger "##### MATOMO SCRIPT : Beginning import Matomo.IT-Arts.net"
# $CUSTOM_COMMAND is left unquoted on purpose so that it splits into the command and its arguments
$CUSTOM_COMMAND --exclude-older-than="$TIMESTAMP" --idsite=1 /tmp/matomo.example.org.access.log* >> "$LOG_PATH"
sleep "$SLEEP_TIME"

# AND SO ON...
# ...
# ...
# ...

# ARCHIVING
logger "##### MATOMO SCRIPT : Beginning archiving"
cd /var/www/html/matomo && php console core:archive --force-all-websites --url='http://matomo.lan' >> "$LOG_PATH"

# UPDATE TIMESTAMP
logger "##### MATOMO SCRIPT : Updating timestamp in $TIMESTAMP_FILE"
echo "$NEW_TIMESTAMP" > "$TIMESTAMP_FILE"
logger "##### MATOMO SCRIPT : New timestamp : $NEW_TIMESTAMP"
logger "##### MATOMO SCRIPT : End of script"

exit 0
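The script above reads the timestamp file with cat and rewrites it at the end of every run, so two cases are worth handling: the very first run, when the file does not exist yet, and a failed import, after which the old timestamp should probably be kept. The following sketch shows one possible way to cover both. The function names, the fallback date and the default state file path are hypothetical. It also assumes that the import command reports failure through a non-zero exit status, which should be checked against the importer actually in use.

```shell
#!/bin/sh
# Sketch only: names, paths and the fallback date are assumptions for illustration.
STATE_FILE="${STATE_FILE:-/tmp/last_run_timestamp_for_matomo.nfo}"
# Assumed "import everything" value used when no previous run is recorded.
FALLBACK_TIMESTAMP="1970-01-01 00:00:00 +0000"

# Print the stored timestamp, or the fallback if the file is missing or empty.
read_last_timestamp() {
    if [ -s "$1" ]; then
        cat "$1"
    else
        printf '%s\n' "$FALLBACK_TIMESTAMP"
    fi
}

# Usage: run_and_record STATE_FILE NEW_TIMESTAMP COMMAND [ARGS...]
# Runs COMMAND and writes NEW_TIMESTAMP to STATE_FILE only if COMMAND exits 0.
run_and_record() {
    rr_state_file=$1
    rr_new_timestamp=$2
    shift 2
    if "$@"; then
        printf '%s\n' "$rr_new_timestamp" > "$rr_state_file"
    else
        echo "command failed; keeping previous timestamp" >&2
        return 1
    fi
}

# Harmless demonstration: only reports what a real run would use.
echo "would import entries newer than: $(read_last_timestamp "$STATE_FILE")"
```

In the script above, the cat call would become TIMESTAMP=$(read_last_timestamp "$TIMESTAMP_FILE"), and the import line could be wrapped as run_and_record "$TIMESTAMP_FILE" "$NEW_TIMESTAMP" $CUSTOM_COMMAND --exclude-older-than="$TIMESTAMP" --idsite=1 followed by the log files. With several sites imported in sequence, the timestamp should only be recorded once all imports have succeeded.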
Links
- https://matomo.org/faq/on-premise/installing-matomo/
- https://github.com/matomo-org/matomo-log-analytics/#readme
- https://matomo.org/faq/general/how-do-i-run-the-log-file-importer-script-with-default-options/
- https://github.com/matomo-org/matomo-log-analytics/issues/344
- https://github.com/matomo-org/matomo-nginx
- https://www.linuxcapable.com/how-to-install-matomo-with-lemp-on-ubuntu-linux/
- https://matomo.org/faq/how-to-install/faq_98/
- https://www.restack.io/docs/matomo-knowledge-matomo-error-logs-guide
- https://github.com/matomo-org/matomo-log-analytics/issues/264