Additional information
The server-side search system consists of two main components:
- Dr.Explain Search Backend Controller: handles HTTP requests from browsers and other clients and communicates with the Apache Solr search engine (see below).
- Apache Solr: a search engine with custom libraries and dictionaries for lemmatization and synonym support.
System requirements:
- A server with 4 GB RAM and 3 GB disk space.
- Docker Compose installed.
- A web server (e.g. nginx or Apache).
- A directory with exported HTML files.
You can host the HTML files on one server and install the search engine on another. In this case you need to:
- Tell the search engine how to load the HTML files.
- Forward HTTP requests containing /drexsearch/ to the search engine.
To load HTML files via URL, modify the docker-compose.yaml file: under drexplain-search-backend-controller, remove the volumes section and, in the environment section, specify URLs instead of paths:
environment:
- DREX_CONTENT_ROOT_URL_EN=https://your-website.com/docs-en/
- DREX_CONTENT_ROOT_URL_ES=https://your-website.com/docs-es/
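The settings list notes that content root URLs must be escaped according to RFC 3986 and use punycode for non-ASCII domain names. As a rough alternative to the browser console, the following Python sketch prepares such a URL with the standard library only. The function name escape_content_root_url is made up for this illustration. The sketch ignores credentials embedded in the URL and leaves the query string untouched, and Python's built-in "idna" codec may differ from a browser's result for some unusual characters, so treat the output as a starting point rather than a guaranteed match.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escape_content_root_url(url: str) -> str:
    """Percent-encode the path and convert a non-ASCII host name to punycode.

    Hypothetical helper for illustration; it is not part of Dr.Explain.
    """
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"not an absolute URL: {url!r}")
    # The built-in "idna" codec converts non-ASCII labels to punycode (xn--...).
    host = parts.hostname.encode("idna").decode("ascii")
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    # Keep "/" and existing "%XX" escapes; escape spaces and other characters.
    path = quote(parts.path, safe="/%")
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


print(escape_content_root_url("https://your-website.com/manual/some path/"))
```

The printed value can then be used for a DREX_CONTENT_ROOT_URL_... variable.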
To forward HTTP search requests, you have two options, depending on whether you can modify the configuration of the web server where your HTML files are hosted.
Option 1: the web server that hosts the HTML files forwards requests containing /drexsearch/ to the search engine
In the ports section of docker-compose.yaml, specify the ports without the "127.0.0.1:" part:
ports:
# WARNING: this will expose the service to all network interfaces.
# Make sure you have proper firewall rules in place.
- "8082:8082"
In your web server configuration, replace "127.0.0.1" with the address of the server where your search engine is hosted.
Note: communication between your web server and the search engine uses plain HTTP. Make sure that both servers are connected to the same local or otherwise secure network (e.g. via VPN).
Restart the search engine:
docker compose down
docker compose up -d
Option 2: a web server on the search engine's machine forwards requests containing /drexsearch/ to the search engine
For this approach, install a web server on the same machine as the search engine, configure it to forward requests containing /drexsearch/ to the search engine, and specify the URL of this web server in the HTML export settings of your Dr.Explain project:
- Select the "Search engine and HTML files are hosted on different servers" checkbox.
- Specify the URL of your search engine server in the "Search engine server URL" input field.
- Export your Dr.Explain project to HTML and publish the exported files to your web server.
The search functionality is highly customizable via the settings (see below). To use the settings:
- Create a file named .env and place it in the same folder as docker-compose.yaml.
- Copy and paste the necessary lines into this .env file.
- Edit docker-compose.yaml by adding an env_file section under drexplain-search-backend-controller:
drexplain-search-backend-controller:
  env_file: ".env"
  image: ghcr.io/indigobyte/dsb-controller:latest
  ...
Alternatively, instead of creating an .env file, you can specify the required settings in the environment section of docker-compose.yaml:
drexplain-search-backend-controller:
  environment:
    - DSBC_JAVA_MAX_HEAP_SIZE=150m
    - SEARCH_OCCURRENCE_BOOST_TITLE=1
Full list of settings (in a copy-and-paste friendly format):
# This is the port to which you must redirect all incoming /drexsearch/SOMETHING HTTP requests,
# e.g. https://your-website.com/any/number/of/subfolders/drexsearch/SOMETHING to http://127.0.0.1:THIS_PORT/SOMETHING
DSBC_PORT=8082
# Specify here the URLs or mounted paths of HTML files exported by Dr.Explain. For example, if some topic is available via
# https://your-website.com/manual/some-topic.html, then write DREX_CONTENT_ROOT_URL_1=https://your-website.com/manual/
# Make sure that all URLs are properly escaped according to the RFC 3986 standard and that punycode is used for non-ASCII domain
# names. For example, you can open the console in the browser's web developer tools and run this command in it
# (replace the URL with your own):
# new URL("https://your-website.com/manual/some path with whitespaces and other special characters/").href
# Use the URL you get.
DREX_CONTENT_ROOT_URL_1=https://your-website.com/docs/
DREX_CONTENT_ROOT_URL_2=https://your-website.com/docs-es/
DREX_CONTENT_ROOT_URL_3=https://your-website.com/docs-de/
# Path to logs inside the Dr.Explain Search Backend Controller container. This is where DSB_LOG_EXT_PATH
# mounts to inside the container.
DSB_LOG_INT_PATH=/var/logs/dsbc
# Supported levels (lower to higher): TRACE, DEBUG, INFO, WARN, ERROR, FATAL. Setting level to DEBUG or TRACE
# is not recommended and can lead to performance degradation.
# Must be the same as or higher than DSBC_LOG_LEVEL_ROOT.
DSBC_LOG_LEVEL_LOGIC=INFO
# Supported levels (lower to higher): TRACE, DEBUG, INFO, WARN, ERROR, FATAL. Setting level to DEBUG or TRACE
# is not recommended and can lead to performance degradation.
DSBC_LOG_LEVEL_ROOT=INFO
# Maximum size of a single log file in bytes before it's rotated.
DSBC_LOG_MAX_SIZE=1048576
# Maximum number of rotated versions of the same log file to retain.
DSBC_LOG_MAX_HISTORY=100
# Max size of Java heap for Dr.Explain Search Backend Controller application
DSBC_JAVA_MAX_HEAP_SIZE=128m
DSBC_JAVA_MAX_METASPACE_SIZE=10m
DSBC_JAVA_MAX_DIRECT_MEMORY_SIZE=2m
DSBC_JAVA_RESERVED_CODE_CACHE_SIZE=8m
DSBC_JAVA_THREAD_STACK_SIZE=512k
# Port of the plain HTTP proxy that the Dr.Explain Search Backend Controller uses to communicate with the search engine
# and the outside world, e.g. when downloading files to index from a website with HTML files exported from Dr.Explain
# when using DREX_CONTENT_ROOT_URL_XXX environment variables (see above). Setting HTTP_PROXY_PORT to 0 disables the HTTP
# proxy and makes Dr.Explain Search Backend Controller ignore all other HTTP_PROXY_XXX settings (i.e. all HTTP requests
# will be sent directly to the corresponding hosts).
HTTP_PROXY_PORT=0
# HTTP proxy host. Ignored if HTTP_PROXY_PORT is 0. You might need additional configuration to make
# host.docker.internal work on Linux. Refer to Docker's documentation.
HTTP_PROXY_HOST=host.docker.internal
# Set this to true to monitor HTTPS requests via a MitM proxy.
HTTP_PROXY_ALLOW_SELF_SIGNED_CERTIFICATES=false
# These settings make Dr.Explain Search Backend Controller return detailed information about errors that happen during
# processing of HTTP requests (when possible).
DSBC_REST_PRINT_PROBLEM_DETAILS=false
DSBC_REST_SERVER_ERROR_INCLUDE_BINDING_ERRORS=never
DSBC_REST_SERVER_ERROR_INCLUDE_EXCEPTION=false
DSBC_REST_SERVER_ERROR_INCLUDE_MESSAGE=never
DSBC_REST_SERVER_ERROR_INCLUDE_STACKTRACE=never
# Search engine server connection settings.
DSBC_SEARCH_SERVER_SCHEME=http
DSBC_SEARCH_SERVER_HOST=solr
DSBC_SEARCH_SERVER_PORT=8983
# Search engine credentials. The search engine is not accessible outside of Docker Compose stack, so it's safe to use
# the default Solr credentials. After updating credentials you must delete old data via "docker compose down -v"
# and restart Dr.Explain Search Backend.
DSBC_SEARCH_SERVER_USER=solr
DSBC_SEARCH_SERVER_PASSWORD=SolrRocks
# This is the prefix used to distinguish search indexes created by Dr.Explain Search Backend from all other indexes.
# When Dr.Explain Search Backend updates indexes, it deletes all old indexes whose names start with this prefix.
# If you use the same search engine for multiple Dr.Explain Search Backend instances, set a unique value for each
# instance so that they won't interfere with each other's search indexes.
DSBC_SEARCH_INDEX_PREFIX=drex-
# Maximum length of the highlighted fragment (search snippets), in characters. Setting this to 0 will display
# the entire text of the topic in the snippets.
SEARCH_HIGHLIGHT_FRAGMENT_SIZE=100
# If the topic's description does not contain the search query (e.g. it's only in the title), then search
# snippets will contain the first SEARCH_DESCRIPTION_PREVIEW_MAX_LENGTH characters of the topic's description instead.
SEARCH_DESCRIPTION_PREVIEW_MAX_LENGTH=200
# Multiplier applied to the score of search hits in topic's title.
SEARCH_OCCURRENCE_BOOST_TITLE=2
# Multiplier applied to the score of search hits in topic's description.
SEARCH_OCCURRENCE_BOOST_DESCRIPTION=1
# Multiplier applied to the score of partial search hits (i.e. when the word from search query is contained inside
# another word, e.g. when searching for "doc", word "paddock" is a partial search hit).
SEARCH_OCCURRENCE_BOOST_PARTIAL=0.5
# Coefficient applied to exact search hits, i.e. when text in topic contains exact words from the search query
# in any order (case-insensitive).
SEARCH_OCCURRENCE_BOOST_EXACT=5
# Coefficient applied to exact phrase hits, i.e. when text in topic contains exact words from the search query
# in the same order, without other words between them (case-insensitive).
SEARCH_OCCURRENCE_BOOST_EXACT_PHRASE=10
# Search hits with score below minimum will not be displayed
SEARCH_MIN_SCORE=0.0
# Maximum number of search results returned at once. Values larger than 20 might result in higher memory consumption
# and slower response times.
SEARCH_MAX_PAGE_SIZE=20
# Specifies the delay in seconds between polling attempts
# to verify search index initialization following a creation command.
SEARCH_INDEX_CREATION_CHECK_DELAY_BETWEEN_ATTEMPTS_SECONDS=1
# This is the number of attempts to check whether search index was initialized
SEARCH_INDEX_CREATION_CHECK_MAX_ATTEMPTS=10
# Used for debugging only. Setting this to false makes search engine return responses in XML instead of binary format. Do not use in production.
SEARCH_SOLR_BINARY_REQUEST_WRITER=true
# When indexing topics, search engine will commit (write to disk) changes every so often. This is the time between
# commits in milliseconds. Setting it too high or too low might increase memory or CPU consumption.
SEARCH_SOLR_COMMIT_TIME_MILLIS=5000
# Used for debugging. If true, search engine will return detailed information about search query and search hits.
SEARCH_DEBUG_ENABLED=false
# Maximum number of characters to copy from another field (e.g. description) to create a copy of the field.
SEARCH_SOLR_COPY_FIELD_MAX_CHARS=200
# Only the first N characters of the document will be searched when creating the search snippet.
SEARCH_HIGHLIGHT_MAX_DOCUMENT_SIZE_IN_CHARS_TO_HIGHLIGHT=200000
# "Minimum should match" (mm) parameter for the Solr DisMax query parser.
# Controls how many query terms must match for a document to be returned.
# Format: "X<-Y%" means: if there are X or fewer terms, all must match; if there are more than X terms,
# up to Y% of them (rounded down) may be missing.
# Example: "2<-25%" means: with 1 or 2 terms, all must match; with more than 2 terms, all but 25% must match
# (e.g. 3 of 4 terms).
SEARCH_DIS_MAX_MM=2<-25%
# Time the Dr.Explain Search Backend Controller will wait after startup before checking whether the search index
# needs to be updated. It's advised to set this value to at least 5 seconds in order to let the search engine
# fully initialize and become ready for incoming requests.
SCHEDULED_SEARCH_CHECK_CONTENT_UPDATE_INITIAL_DELAY_SECONDS=10
# Dr.Explain Search Backend Controller will periodically check whether the search index needs updating. This is the time
# between when the previous check ends and the next check begins.
SCHEDULED_SEARCH_CHECK_CONTENT_UPDATE_FIXED_DELAY_SECONDS=300
# Set this to false to stop Dr.Explain Search Backend Controller from checking whether the search index needs updating.
SCHEDULED_SEARCH_CHECK_CONTENT_UPDATE_ENABLED=true
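To get a feel for what a SEARCH_DIS_MAX_MM value such as 2<-25% does for queries of different lengths, the following Python sketch computes the number of terms that must match. It follows the semantics documented for Solr's mm parameter, where a negative percentage is the share of terms that may be missing, rounded down. The function min_should_match is a hypothetical helper that handles only the single "X<-Y%" form; Solr accepts more forms than this sketch covers.

```python
def min_should_match(term_count: int, mm: str = "2<-25%") -> int:
    """Return how many of term_count query terms must match.

    Illustrative only: supports the "X<-Y%" form and nothing else.
    """
    threshold_text, _, value = mm.partition("<")
    threshold = int(threshold_text)
    if term_count <= threshold:
        # At or below the threshold, every term is required.
        return term_count
    if not (value.startswith("-") and value.endswith("%")):
        raise ValueError("this sketch only handles the 'X<-Y%' form")
    percent = int(value[1:-1])
    # A negative percentage is the share of terms that may be missing (rounded down).
    may_be_missing = term_count * percent // 100
    return term_count - may_be_missing


for n in (1, 2, 3, 4, 8):
    print(n, "terms ->", min_should_match(n), "must match")
```

With the default value, a four-word query needs three matching words and an eight-word query needs six.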
| Search query token | Example | Explanation |
|---|---|---|
| + | Bob likes +apples | The following term must be present in the topic. Searching for Bob likes +apples will ignore topics without the word "apples". |
| - | Bob likes -apples | The following term must not be present in the topic. Searching for Bob likes -apples will ignore topics with the word "apples". |
| ( and ) | Bob likes & (apples \| pears) | Used to nest logical operators, as in (a \| b) & c. Searching for Bob likes & (apples \| pears) is equivalent to running two search queries, Bob likes & apples and Bob likes & pears, and merging their results. |
| " | Bob "likes apples" | Searches for topics containing the exact words. Searching for Bob "likes apples" will include topics containing the exact words "likes" and "apples". |
| & and \| | Bob likes & (apples \| pears) | Logical "and" (&) and "or" (\|). Searching for Bob likes & (apples \| pears) is equivalent to running two search queries, Bob likes & apples and Bob likes & pears, and merging their results. |
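The equivalence described for parentheses, & and | can be illustrated with plain set operations. The Python sketch below is a toy model and not how the search engine works internally: the topics are invented, each topic is reduced to a set of lowercase words, "Bob likes" is assumed to require both words, and lemmatization, scoring and partial matches are ignored.

```python
# Hypothetical topics; each one is modelled as a set of lowercase words.
TOPICS = {
    "topic-1": {"bob", "likes", "apples"},
    "topic-2": {"bob", "likes", "pears"},
    "topic-3": {"bob", "likes", "plums"},
    "topic-4": {"alice", "likes", "apples"},
}


def having(word: str) -> set[str]:
    """Names of the topics that contain the given word."""
    return {name for name, words in TOPICS.items() if word in words}


bob_likes = having("bob") & having("likes")

# Bob likes & (apples | pears)
nested = bob_likes & (having("apples") | having("pears"))
# (Bob likes & apples) merged with (Bob likes & pears)
merged = (bob_likes & having("apples")) | (bob_likes & having("pears"))
print(sorted(nested), nested == merged)

# Bob likes -apples: drop every topic that contains "apples".
without_apples = bob_likes - having("apples")
print(sorted(without_apples))
```

In this model both forms of the query select topic-1 and topic-2, while excluding "apples" leaves topic-2 and topic-3.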
To update the search engine to the latest version, follow these steps:
- Connect to the server where the search engine is installed via SSH.
- Navigate to the directory containing docker-compose.yaml (usually /opt/drexplain-search):
  cd /opt/drexplain-search
- Stop the search engine and remove all associated resources:
  docker compose down -v --rmi all
- Review the latest documentation for setup instructions. Check for any changes that may require updates to your docker-compose.yaml file and apply them accordingly.