Custom Token Filters
Token filters modify, add, or remove the tokens produced by a tokenizer. Common token filters include lowercasing, stop-word removal, and stemming.
Example: Lowercase and Stop Filter
Let’s create a custom analyzer that includes both lowercase and stop filters.
PUT /custom_filter_example
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "stop"]
        }
      }
    }
  }
}
Analyzing text:
GET /custom_filter_example/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "Elasticsearch is a powerful search engine"
}
Output:
{
  "tokens": [
    { "token": "elasticsearch", "start_offset": 0, "end_offset": 13, "type": "<ALPHANUM>", "position": 0 },
    { "token": "powerful", "start_offset": 19, "end_offset": 27, "type": "<ALPHANUM>", "position": 3 },
    { "token": "search", "start_offset": 28, "end_offset": 34, "type": "<ALPHANUM>", "position": 4 },
    { "token": "engine", "start_offset": 35, "end_offset": 41, "type": "<ALPHANUM>", "position": 5 }
  ]
}
In this example:
- The text is tokenized using the standard tokenizer.
- Tokens are converted to lowercase.
- Stop words ("is", "a") are removed by the stop filter, which defaults to the English stop word list. Note that the removed tokens leave gaps in the position sequence of the remaining tokens.
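If the default English stop word list does not fit your data, the stop filter accepts a custom list via its stopwords parameter. As a sketch (the index name custom_stop_example and the filter name my_stop are illustrative), you can define a named filter in the analysis settings and reference it from the analyzer:

```json
PUT /custom_stop_example
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop": {
          "type": "stop",
          "stopwords": ["is", "a", "the"]
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "my_stop"]
        }
      }
    }
  }
}
```

Because my_stop runs after lowercase, the stop word list only needs lowercase entries.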
Full Text Search with Analyzer and Tokenizer
Elasticsearch is renowned for its powerful full-text search capabilities. At the heart of this functionality are analyzers and tokenizers, which play a crucial role in how text is processed and indexed. This guide will help you understand how analyzers and tokenizers work in Elasticsearch, with detailed examples and outputs to make these concepts easy to grasp.