Our final topic in this chapter is about an internal aspect of Elasticsearch. While we don’t demonstrate any new techniques here, doc values are an important topic that we will refer to repeatedly, and is something that you should be aware of.
When you sort on a field, Elasticsearch needs access to the value of that field for every document that matches the query. The inverted index, which performs very well when searching, is not the ideal structure for sorting on field values:
When searching, we need to be able to map a term to a list of documents.
When sorting, we need to map a document to its terms. In other words, we need to ``uninvert'' the inverted index.
This uninverted'' structure is often called a
column-store'' in other systems.
Essentially, it stores all the values for a single field together in a single
column of data, which makes it very efficient for operations like sorting.
In Elasticsearch, this column-store is known as doc values, and is enabled by default. Doc values are created at index-time: when a field is indexed, Elasticsearch adds the tokens to the inverted index for search. But it also extracts the terms and adds them to the columnar doc values.
Doc values are used in several places in Elasticsearch:
Sorting on a field
Aggregations on a field
Certain filters (for example, geolocation filters)
Scripts that refer to fields
Because doc values are serialized to disk, we can leverage the OS to help keep access fast. When the "working set" is smaller than the available memory on a node, the OS will naturally keep all the doc values hot in memory, leading to very fast access. When the "working set" is much larger than available memory, the OS will naturally start to page doc-values on/off disk without running into the dreaded OutOfMemory exception.
We’ll talk about doc values in much greater depth later. For now, all you need to know is that sorting (and some other operations) happen on a parallel data structure which is built at index-time.
此处可能存在不合适展示的内容,页面不予展示。您可通过相关编辑功能自查并修改。
如您确认内容无涉及 不当用语 / 纯广告导流 / 暴力 / 低俗色情 / 侵权 / 盗版 / 虚假 / 无价值内容或违法国家有关法律法规的内容,可点击提交进行申诉,我们将尽快为您处理。