Webanalytics
Matomo Analysis of the use of the Search Index
To get an overview of how the Search Index is used (e.g. the number of visits in a given period), an analysis via Matomo can be activated, which installs Matomo. A cron job can be set up to analyze the access logs once a day, and/or a Matomo Tag Manager connection can be set up in SIDRE frontends to track usage directly in the frontend. The results of the analysis can then be viewed in the Matomo UI in the browser:
- http://192.168.98.115:8923/matomo (replace the host and port of the OERSI Vagrant VM with your own)
Requirements
MariaDB and Nginx must be installed. Both are already covered by the other roles in sidre.
Configuration
- matomo_install: set to true to install Matomo
- matomo_public: set to true for Matomo access at the public SIDRE URL
- matomo_superuser_name: username of the admin
- matomo_superuser_password: password of the admin
- matomo_superuser_mail: e-mail address of the admin
- matomo_dbname: name of the database for Matomo
- matomo_dbuser: user for access to the Matomo database
- matomo_dbpassword: password of the database user
- search_index_frontend_analytics_matomo_tracking: set to "true" to activate Matomo Tag Manager tracking in the SIDRE frontend (see below)
- search_index_frontend_analytics_matomo_container_url: URL of the Matomo Tag Manager container to be used in the SIDRE frontend (see below)
Nginx Access Log Analysis
GET requests to the Search Index that are logged in the Nginx access log can be analyzed. A cron job is set up to analyze the access log once a day. The results are stored in the Matomo database and can be viewed in the Matomo UI.
Details of Access-Log-Analysis for SIDRE Frontend requests
- A call to a detail page is a single request and appears as a single */_doc/* entry in the Matomo page view analysis.
- An initial call to the search page consists of multiple (8) requests to */_msearch and therefore causes multiple entries in the Matomo page view analysis.
- Using the search field or the author search field triggers multiple requests to */_msearch (the number depends on typing), because the underlying data is requested while typing. This also causes multiple entries in the Matomo page view analysis.
- Using a filter is a single request to */_msearch and therefore causes just a single entry in the Matomo page view analysis.
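To make the counting rules above concrete, the following is a minimal sketch that classifies access log lines by request path. It is not how the Matomo log analysis itself works. It only illustrates why one frontend action can show up as one or several entries. The function names, the regular expression, and the sample log lines (including the index name example_index) are assumptions made for illustration.

```python
import re
from collections import Counter

# Matches the request part of a combined-format access log line,
# e.g. "GET /some/path HTTP/1.1" (assumed log format).
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def classify(path):
    """Map a request path to the kind of frontend action it likely belongs to."""
    path = path.split("?", 1)[0]  # ignore the query string
    if "/_doc/" in path:
        return "detail_page"      # one request per detail page call
    if path.endswith("/_msearch"):
        return "search_request"   # several of these per search page call
    return "other"


def count_requests(log_lines):
    """Count classified requests in an iterable of access log lines."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            counts[classify(match.group("path"))] += 1
    return counts


# Hypothetical sample lines, made up for this sketch.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Jan/2024:10:00:00 +0000] "GET /api/search/example_index/_doc/abc123 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [01/Jan/2024:10:00:05 +0000] "GET /api/search/example_index/_msearch HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [01/Jan/2024:10:00:06 +0000] "GET /api/search/example_index/_msearch?x=1 HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [01/Jan/2024:10:00:07 +0000] "GET /favicon.ico HTTP/1.1" 200 100 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(count_requests(SAMPLE_LINES))
```

When reading page view numbers in Matomo, a count of */_msearch entries should therefore not be interpreted as the number of searches performed by users.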
Matomo Tag Manager Analysis
Each SIDRE frontend can be configured to use the Matomo Tag Manager to analyze frontend usage. To do this, the Tag Manager container must be included in the frontend page. The container can then be configured to track page views, clicks on specific elements, etc.
A separate container must be created manually in Matomo for each frontend. It is recommended to use a separate website and container for each frontend so that the data can be evaluated separately.
Activation example
- Install Matomo and ensure public access.
- Create a new website in Matomo for your frontend.
- Configure your Matomo Tag Manager container - see also here.
  - Navigate to “Tag Manager” in the Matomo UI and select your website.
  - Create a new trigger “History Change” with the “History Change” trigger type selected under the “User Engagement” section.
  - Ensure a “Pageview” trigger exists (default).
  - Ensure a tag “Matomo Analytics” exists with tracking type “Pageview” (default).
  - Edit the tag “Matomo Analytics”:
    - Set the Custom Title to {{PageTitle}}.
    - Set the Custom URL to {{PageUrl}}.
    - Under the option “Execute this tag when any of these triggers are triggered”, select the “History Change” and “Pageview” triggers.
    - Open “Advanced Settings” and set “Delay” to 400 (to ensure the correct page title is tracked, otherwise the page title may be incorrect).
- Set your container URL in the SIDRE frontend configuration via search_index_frontend_analytics_matomo_container_url (e.g. https://<SIDRE_DOMAIN><SIDRE_BASE_PATH>/matomo/js/container_<CONTAINER_ID>.js) and activate tracking via search_index_frontend_analytics_matomo_tracking: "true".
- Use the Preview/Debug mode to test and ensure that your triggers and tag are working as expected.
- Publish a new version of the container.
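The container URL pattern from the steps above can be sketched as a small helper that also produces the two frontend variables. This is only an illustration under assumptions: the function names are hypothetical, and the domain, base path and container ID in the usage example are made-up placeholder values.

```python
def container_url(domain, base_path, container_id):
    """Build a container URL following the pattern
    https://<SIDRE_DOMAIN><SIDRE_BASE_PATH>/matomo/js/container_<CONTAINER_ID>.js
    """
    base_path = base_path.rstrip("/")
    if base_path and not base_path.startswith("/"):
        base_path = "/" + base_path
    return f"https://{domain}{base_path}/matomo/js/container_{container_id}.js"


def frontend_tracking_vars(domain, base_path, container_id):
    """Return the two frontend variables described in the Configuration section."""
    return {
        "search_index_frontend_analytics_matomo_tracking": "true",
        "search_index_frontend_analytics_matomo_container_url": container_url(
            domain, base_path, container_id
        ),
    }


if __name__ == "__main__":
    # Placeholder values, not taken from any real installation.
    for name, value in frontend_tracking_vars(
        "sidre.example.org", "/resources", "AbCd1234"
    ).items():
        print(f'{name}: "{value}"')
```

The container ID itself is shown in the Matomo Tag Manager UI for the container you created, so it has to be looked up there rather than derived.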
Elasticsearch request logging & analysis
The Elasticsearch requests to the public (read-only) endpoint (https://<SIDRE_DOMAIN><SIDRE_BASE_PATH>/api/search/) can be logged, including the path, body, etc. of the request and the number of results. Logged requests are persisted per date in (SIDRE-internal) indices named search_index_backend_elasticsearch_request_log-${DATE} (date format YYYY-MM-DD) and are cleaned up automatically after a period of time. All requests can be accessed via the alias search_index_backend_elasticsearch_request_log.
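As a small illustration of the per-date naming scheme, the following sketch derives index names from dates. The helper names are hypothetical. Whether an index for a given day still exists depends on the configured cleanup period.

```python
from datetime import date, timedelta

# Alias name as given in this documentation.
ALIAS = "search_index_backend_elasticsearch_request_log"


def daily_index_name(day):
    """Return the per-date index name, using the date format YYYY-MM-DD."""
    return f"{ALIAS}-{day.isoformat()}"


def indices_for_range(start, end):
    """Return the index names for every day from start to end (inclusive)."""
    days = (end - start).days
    return [daily_index_name(start + timedelta(days=i)) for i in range(days + 1)]


if __name__ == "__main__":
    print(daily_index_name(date(2024, 1, 5)))
    print(indices_for_range(date(2024, 2, 28), date(2024, 3, 1)))
```

Querying a single daily index instead of the alias restricts a search to the requests logged on that day.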
Additionally, a script runs once a day and analyzes the logged requests of the previous day. It extracts, for example, search terms from the requests (which may appear in several different locations). The result is stored in the same index and record, in the field requestAnalysis. This can be used to visualize request usage such as search terms (via Elasticsearch aggregations, or possibly a Kibana visualization).
Configuration
- This feature can be activated via the feature toggle search_index_backend_features_log_elasticsearch_requests: "true".
- The cleanup can be configured via search_index_backend_log_elasticsearch_requests_cleanup_age.
Example: top 100 search terms
GET search_index_backend_elasticsearch_request_log/_search
{
  "size": 0,
  "aggs": {
    "search": {
      "terms": {
        "field": "requestAnalysis.searchTerms.keyword",
        "size": 100
      }
    }
  }
}
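The same query can also be built and evaluated programmatically. The sketch below constructs the request body and reads the term buckets from a response, relying on the standard response layout of an Elasticsearch terms aggregation (aggregations.<name>.buckets with key and doc_count). The function names are hypothetical, sending the request is left out, and the sample response is made up purely to show the structure.

```python
import json


def top_terms_query(size=100):
    """Build the body of the terms aggregation shown above."""
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "aggs": {
            "search": {
                "terms": {
                    "field": "requestAnalysis.searchTerms.keyword",
                    "size": size,
                }
            }
        },
    }


def extract_top_terms(response):
    """Return (term, count) pairs from a terms aggregation response."""
    buckets = response.get("aggregations", {}).get("search", {}).get("buckets", [])
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Made-up response fragment for illustration only, not real usage data.
SAMPLE_RESPONSE = {
    "aggregations": {
        "search": {
            "buckets": [
                {"key": "example term a", "doc_count": 3},
                {"key": "example term b", "doc_count": 1},
            ]
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(top_terms_query(), indent=2))
    print(extract_top_terms(SAMPLE_RESPONSE))
```

The body returned by top_terms_query can be sent to search_index_backend_elasticsearch_request_log/_search with any HTTP or Elasticsearch client.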