220_Token_normalization/10_Lowercasing.asciidoc · 缠中说禅/elasticsearch-definitive-guide

[[lowercase-token-filter]]
=== In That Case

The most frequently used token filter is the `lowercase` filter, which does
exactly what you would expect; it transforms ((("tokens", "normalizing", "lowercase filter")))((("lowercase token filter")))each token into its lowercase
form:

[source,js]
--------------------------------------------------
GET /_analyze?tokenizer=standard&filters=lowercase
The QUICK Brown FOX! <1>
--------------------------------------------------
<1> Emits tokens `the`, `quick`, `brown`, `fox`

It doesn't matter whether users search for `fox` or `FOX`, as long as the same
analysis process is applied at index time and at search time. The `lowercase`
filter will transform a query for `FOX` into a query for `fox`, which is the
same token that we have stored in our inverted index.
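
To make the matching concrete, here is a minimal Python sketch of a toy inverted index that runs the same analysis on documents and on queries. It is an illustration only, not how Elasticsearch is implemented: the regular expression merely approximates the `standard` tokenizer for plain ASCII text, and the names `analyze`, `build_index`, and `search` are hypothetical.

```python
import re
from collections import defaultdict


def analyze(text):
    """Split text into word tokens, then lowercase each one.

    The regex is only a rough stand-in for the `standard` tokenizer (assumption).
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_index(docs):
    """Build a toy inverted index: token -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):  # index-time analysis
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every analyzed query token."""
    tokens = analyze(query)  # search-time analysis, same chain as at index time
    if not tokens:
        return set()
    return set.intersection(*(index.get(t, set()) for t in tokens))


docs = {1: "The QUICK Brown FOX!", 2: "A lazy dog"}
index = build_index(docs)
```

In this sketch, `search(index, "FOX")` and `search(index, "fox")` both match document 1, because each query is reduced to the token `fox` before the lookup. The original `FOX` is never stored as a key in the index.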

To use token filters as part of the analysis process, we ((("analyzers", "using token filters")))((("token filters", "using with analyzers")))can create a `custom`
analyzer:

[source,js]
--------------------------------------------------
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_lowercaser": {
          "tokenizer": "standard",
          "filter":  [ "lowercase" ]
        }
      }
    }
  }
}
--------------------------------------------------

And we can test it out with the `analyze` API:

[source,js]
--------------------------------------------------
GET /my_index/_analyze?analyzer=my_lowercaser
The QUICK Brown FOX! <1>
--------------------------------------------------
<1> Emits tokens `the`, `quick`, `brown`, `fox`
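
The shape of the `my_lowercaser` definition, one tokenizer followed by an ordered list of token filters, can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than Elasticsearch's implementation: `standard_tokenizer` is a regex approximation that is only reasonable for plain ASCII text, and `lowercase_filter` and `make_analyzer` are hypothetical helper names.

```python
import re


def standard_tokenizer(text):
    # Rough approximation of the `standard` tokenizer for ASCII text (assumption).
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    # Mirrors the effect of the `lowercase` token filter on each token.
    return [token.lower() for token in tokens]


def make_analyzer(tokenizer, filters):
    """Return a function that tokenizes, then applies each filter in order."""
    def analyzer(text):
        tokens = tokenizer(text)
        for token_filter in filters:
            tokens = token_filter(tokens)
        return tokens
    return analyzer


# Same structure as the settings above: one tokenizer, one filter.
my_lowercaser = make_analyzer(standard_tokenizer, [lowercase_filter])
```

Calling `my_lowercaser("The QUICK Brown FOX!")` yields the same four tokens as the callout above. Passing a longer filter list to `make_analyzer` applies the filters in the order given.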

Committed on 2014-11-23 10:24 +08:00. Edited 220_Token_normalization/10_Lowercasing.asciidoc with Atlas code editor
