Internationalization
Multilingual Search
We implemented a feature that includes translations of SIDRE keywords into other languages in the search. It can be activated via configuration in sidre-setup. When it is active, a search for a German keyword, for example, also returns the results for the corresponding English/Spanish/French/… keyword.
A tool (internationalization-tool) provides the translations of the keywords, based on Wikidata translations. From these translations, a synonym file for Elasticsearch is generated, which can be included in the configuration. An example of such a synonym file can be found here.
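To illustrate how keyword translations can be turned into an Elasticsearch synonym file, the following Python sketch writes one line of comma-separated equivalent terms per keyword (the Solr synonym format, which Elasticsearch's synonym filters accept). It assumes the translations are already available as a dictionary; the input shape, the function names build_synonym_lines and write_synonym_file, and the assumption that the generated file uses this format are all hypothetical. The actual internationalization-tool and its Wikidata lookup may work differently.

```python
from typing import Dict, Iterable, List


def build_synonym_lines(translations: Dict[str, Iterable[str]]) -> List[str]:
    """Turn {keyword: translations} into Solr-style equivalence lines.

    Hypothetical helper: each output line lists a keyword and its
    translations as equivalent terms, e.g. "a, b, c".
    """
    lines = []
    for keyword in sorted(translations):
        terms: List[str] = []
        seen = set()
        for term in [keyword, *translations[keyword]]:
            term = term.strip()
            # Commas separate terms and "=>" marks explicit mappings in this
            # format, so terms containing them are skipped rather than escaped.
            if not term or "," in term or "=>" in term:
                continue
            # Drop case-insensitive duplicates (e.g. identical labels in several languages).
            if term.lower() in seen:
                continue
            seen.add(term.lower())
            terms.append(term)
        # A line with a single term would not add any synonyms.
        if len(terms) > 1:
            lines.append(", ".join(terms))
    return lines


def write_synonym_file(path: str, translations: Dict[str, Iterable[str]]) -> None:
    """Write the synonym lines as UTF-8 text, one equivalence group per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in build_synonym_lines(translations):
            f.write(line + "\n")


if __name__ == "__main__":
    # Hypothetical sample data; real translations would come from Wikidata labels.
    sample = {
        "Klimawandel": ["climate change", "cambio climático", "changement climatique"],
        "Ozean": ["ocean", "océano", "océan", "Ozean"],
    }
    for line in build_synonym_lines(sample):
        print(line)
```

With the sample data, the first line reads "Klimawandel, climate change, cambio climático, changement climatique", and the duplicate "Ozean" label is dropped from the second line. Skipping problematic terms instead of escaping them keeps the sketch short; a production generator would need a deliberate policy for such labels.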
The feature can be enabled or disabled with a toggle, the Ansible variable search_index_features_use_synonyms. If you change this setting, the setup automatically adjusts the mapping and reindexes the index. The synonym file is placed in the SIDRE configuration directory (usually ~/conf) under the filename synonym.txt, and the setup links it into the Elasticsearch configuration directory.
For testing purposes in the first phase of feature development, the example file can be used instead: set elasticsearch_synonyms_file: "synonyms-example.txt" and Ansible will copy it. The following configuration therefore enables the feature using the example file:
search_index_features_use_synonyms: true
elasticsearch_synonyms_file: "synonyms-example.txt"
For now, the internationalization-tool has to be activated separately. The following configuration enables the feature using automatic keyword translations from the internationalization-tool:
search_index_features_use_synonyms: true
internationalization_tool_install: "{{ search_index_features_use_synonyms }}"
To configure the name of the metadata field that contains the keywords, use the Ansible variable internationalization_tool_search_index_keyword_fieldname. The default is:
internationalization_tool_search_index_keyword_fieldname: "keywords.keyword"
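The tool presumably reads the distinct keyword values from this field. One common way to do that in Elasticsearch is a terms aggregation on a keyword field; the Python sketch below only builds such a request body and extracts the values from a response. Whether the internationalization-tool actually queries the index this way is an assumption, and the aggregation name all_keywords, the helper names, and the default size are hypothetical.

```python
import json


def keyword_aggregation_body(fieldname: str = "keywords.keyword", size: int = 10000) -> dict:
    """Build a search body that returns distinct values of `fieldname`.

    Hypothetical sketch: "size": 0 suppresses the hits themselves, and the
    terms aggregation returns up to `size` distinct keyword values.
    """
    return {
        "size": 0,
        "aggs": {
            "all_keywords": {  # hypothetical aggregation name
                "terms": {"field": fieldname, "size": size}
            }
        },
    }


def extract_keywords(response: dict) -> list:
    """Pull the keyword strings out of an Elasticsearch search response."""
    buckets = response["aggregations"]["all_keywords"]["buckets"]
    return [bucket["key"] for bucket in buckets]


if __name__ == "__main__":
    # The body would be sent as POST /<index>/_search.
    print(json.dumps(keyword_aggregation_body(), indent=2))
```

For very large keyword vocabularies, a composite aggregation with pagination would avoid the cluster's bucket limits that a single terms aggregation can run into.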
The synonyms are applied when Elasticsearch executes a search, not during indexing. Therefore, when the synonym file changes, no reindexing is needed, but the search analyzers must be reloaded before searches take the changes into account. This can be done with the script reload-search_analyzer.sh.
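Elasticsearch can only reload a synonym filter without reindexing if the filter is marked updateable, and such a filter may only be used in a search analyzer; the reload itself is done with the _reload_search_analyzers index API. The Python sketch below builds index settings of that kind and sends the reload request using only the standard library. The filter, analyzer and field names, the host URL, and the index name are hypothetical, and it is an assumption that SIDRE's mapping and reload-search_analyzer.sh look like this.

```python
import json
import urllib.request


def synonym_index_settings(field: str = "keywords", synonyms_path: str = "synonym.txt") -> dict:
    """Index body with a search-time-only synonym analyzer (hypothetical names)."""
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "sidre_synonyms": {
                        "type": "synonym_graph",
                        # Resolved relative to the Elasticsearch config directory.
                        "synonyms_path": synonyms_path,
                        # Required for _reload_search_analyzers; the filter may
                        # then only be used in a search analyzer.
                        "updateable": True,
                    }
                },
                "analyzer": {
                    "sidre_synonym_search": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "sidre_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                field: {
                    "type": "text",
                    "analyzer": "standard",  # indexing: no synonyms
                    "search_analyzer": "sidre_synonym_search",  # search: synonyms
                }
            }
        },
    }


def reload_search_analyzers(base_url: str, index: str) -> dict:
    """POST /<index>/_reload_search_analyzers and return the parsed JSON reply."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/{index}/_reload_search_analyzers",
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    print(json.dumps(synonym_index_settings(), indent=2))
    # Against a running cluster (hypothetical URL and index name):
    # print(reload_search_analyzers("http://localhost:9200", "sidre"))
```

Keeping the synonyms out of the index analyzer is what makes a reload sufficient: the indexed tokens never contain synonyms, so only the query side has to change. A secured cluster would additionally need authentication headers on the reload request.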