Elasticsearch Interview Questions & Answers

An index is similar to a table in a relational database. The difference is that a relational database always stores the actual values, whereas storing them is optional in Elasticsearch: an index can hold the actual values, the analyzed values, or both.

A document is similar to a row in a relational database. The difference is that each document in an index can have a different structure (fields), but a field shared by several documents must have the same data type in all of them.

A field can hold multiple values in a document (an array), and the same field can be indexed in more than one way, for example as both full text and an exact value. Fields can contain other documents too.
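As a rough illustration of the points above, the sketch below models two documents for the same index as plain Python dictionaries. The field names and the `conflicting_fields` helper are hypothetical and exist only for this example; the helper is not part of Elasticsearch, it merely checks the "same data type for common fields" rule on the client side.

```python
import json

# Two hypothetical documents destined for the same index.
doc_a = {
    "title": "Intro to search",
    "tags": ["search", "lucene"],        # one field holding several values
    "author": {"name": "Ann", "id": 7},  # a field that contains another document
}
doc_b = {
    "title": "Scaling out",
    "views": 120,                        # a field that doc_a does not have
}

def json_type(value):
    """Return a coarse JSON type name for a Python value."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"

def conflicting_fields(first, second):
    """Return the shared top-level fields whose JSON types differ."""
    shared = first.keys() & second.keys()
    return sorted(f for f in shared if json_type(first[f]) != json_type(second[f]))

print(json.dumps(doc_a))
print(conflicting_fields(doc_a, doc_b))
```

Here the two documents differ in structure but agree on the type of their only common field, `title`, so no conflict is reported.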

  1. Yes, Elasticsearch can have mappings, which can be used to enforce a schema on documents.
  1. A document type can be seen as the document schema / mapping definition, which holds the mapping of all the fields in the document along with their data types.
  1. The process of storing data in an index is called indexing in Elasticsearch. Data in Elasticsearch is divided into write-once, read-many segments. Whenever an update is attempted, a new version of the document is written to the index.
  1. Each instance of Elasticsearch is called a node. Multiple nodes can work together to form an Elasticsearch cluster.
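To make the idea of a mapping more concrete, here is a minimal sketch of a mapping definition expressed as a Python dictionary. It assumes the typeless mapping format used by recent Elasticsearch versions (7.x and later); older versions nest the properties under a type name. The index name `articles` and all field names are hypothetical.

```python
import json

# Hypothetical mapping for an index named "articles".
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},       # analyzed full-text field
            "status": {"type": "keyword"},   # exact-value field, not analyzed
            "views": {"type": "integer"},
            "published": {"type": "date"},
        }
    }
}

def field_types(body):
    """Flatten a mapping body into {field name: data type}."""
    props = body["mappings"]["properties"]
    return {name: spec["type"] for name, spec in props.items()}

# This JSON would be sent as the body of a create-index request
# such as PUT /articles.
print(json.dumps(mapping_body, indent=2))
print(field_types(mapping_body))
```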

Due to resource limitations such as RAM and vCPU, applications that need to scale out run multiple instances of Elasticsearch on separate machines. Data in an index can be divided into multiple partitions, each handled by a separate node (instance) of Elasticsearch. Each such partition is called a shard. By default an Elasticsearch index has 5 primary shards in versions before 7.0, and 1 primary shard from 7.0 onward.
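The way a document ends up on a particular shard can be sketched as a hash of its routing value (by default the document id) taken modulo the number of primary shards. The snippet below is only an illustration of that idea: `zlib.crc32` is a stand-in hash chosen for this example, so the shard numbers it produces will not match those of a real cluster, and `pick_shard` is a hypothetical helper name.

```python
import zlib

def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document id) to a shard number.

    zlib.crc32 is a stand-in hash for this sketch; Elasticsearch uses its
    own hash function internally, so real shard numbers will differ.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards

doc_ids = ["1", "2", "3", "42", "user-7"]
placement = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}
print(placement)
```

Because the result depends on the number of primary shards, that number is normally fixed when the index is created.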

By default, each primary shard in Elasticsearch has one replica, so every shard exists as 2 copies in total. These additional copies are called replicas. They serve the purpose of high availability and fault tolerance.
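The shard and replica counts are index settings. The sketch below shows a hypothetical settings body using the `number_of_shards` and `number_of_replicas` settings, plus a small helper (an assumption of this example, not an Elasticsearch API) that works out how many shard copies the cluster has to place.

```python
import json

# Hypothetical settings body for creating an index.
settings_body = {
    "settings": {
        "number_of_shards": 5,    # primary shards
        "number_of_replicas": 1,  # replica copies per primary shard
    }
}

def total_shard_copies(body):
    """Return the number of primaries plus all of their replicas."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])

print(json.dumps(settings_body))
print(total_shard_copies(settings_body))  # 5 * (1 + 1) = 10
```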

While indexing data in Elasticsearch, data is transformed internally by the analyzer defined for the index, and then indexed. An analyzer is built from a tokenizer and filters. The following types of analyzers are available in Elasticsearch 1.x.

  • STANDARD ANALYZER
  • SIMPLE ANALYZER
  • WHITESPACE ANALYZER
  • STOP ANALYZER
  • KEYWORD ANALYZER
  • PATTERN ANALYZER
  • LANGUAGE ANALYZERS
  • SNOWBALL ANALYZER
  • CUSTOM ANALYZER

A tokenizer breaks the field values of a document down into a stream of tokens. Inverted indexes are created and updated using these tokens.
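The analyzer pipeline described above can be imitated in a few lines of Python. This is a toy sketch, not the implementation of any built-in analyzer: the regular expression, the stop-word list, and all function names are assumptions made for the example.

```python
import re

STOP_WORDS = {"a", "an", "is", "of", "the"}  # tiny illustrative stop list

def tokenize(text):
    """Tokenizer step: split the text into word tokens."""
    return re.findall(r"\w+", text)

def lowercase_filter(tokens):
    """Token filter: lowercase every token."""
    return [t.lower() for t in tokens]

def stop_filter(tokens):
    """Token filter: drop tokens found in the stop list."""
    return [t for t in tokens if t not in STOP_WORDS]

def analyze(text):
    """Analyzer: a tokenizer followed by a chain of token filters."""
    tokens = tokenize(text)
    for token_filter in (lowercase_filter, stop_filter):
        tokens = token_filter(tokens)
    return tokens

print(analyze("The Quick brown Fox is quick"))
# ['quick', 'brown', 'fox', 'quick']
```

The order of the filters matters: lowercasing first lets the stop filter catch "The" as well as "the".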

  1. After data is processed by the tokenizer, it is processed by token filters before indexing. Separately, the query side of Elasticsearch 1.x offers the following filters, which restrict the documents a search matches rather than transforming tokens.
    • AND FILTER
    • BOOL FILTER
    • EXISTS FILTER
    • GEO BOUNDING BOX FILTER
    • GEO DISTANCE FILTER
    • GEO DISTANCE RANGE FILTER
    • GEO POLYGON FILTER
    • GEOSHAPE FILTER
    • GEOHASH CELL FILTER
    • HAS CHILD FILTER
    • HAS PARENT FILTER
    • IDS FILTER
    • INDICES FILTER
    • LIMIT FILTER
    • MATCH ALL FILTER
    • MISSING FILTER
    • NESTED FILTER
    • NOT FILTER
    • OR FILTER
    • PREFIX FILTER
    • QUERY FILTER
    • RANGE FILTER
    • REGEXP FILTER
    • SCRIPT FILTER
    • TERM FILTER
    • TERMS FILTER
    • TYPE FILTER
  1. Elasticsearch provides its own JSON-based query language, called Query DSL, which is built on top of the query capabilities of Apache Lucene.
  1. Elasticsearch is a search engine based on Lucene. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and was originally released as open source under the terms of the Apache License.
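A Query DSL request is a JSON document. The sketch below builds one as a Python dictionary: a `bool` query that combines a scored `match` clause with `term` and `range` clauses in filter context. It assumes the syntax of Elasticsearch 2.x and later (1.x expressed the same idea with the filters listed above), and the field names and values are hypothetical.

```python
import json

# Hypothetical search body: full-text match on "title", restricted by
# two filters that do not affect scoring.
query_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"title": "search engine"}},
            ],
            "filter": [
                {"term": {"status": "published"}},
                {"range": {"views": {"gte": 100}}},
            ],
        }
    }
}

def filter_kinds(body):
    """List the query types used in the filter context of a bool query."""
    return [next(iter(clause)) for clause in body["query"]["bool"]["filter"]]

print(json.dumps(query_body, indent=2))
print(filter_kinds(query_body))  # ['term', 'range']
```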

The following operations can be performed on documents:

  • INDEXING A DOCUMENT USING ELASTICSEARCH.
  • FETCHING DOCUMENTS USING ELASTICSEARCH.
  • UPDATING DOCUMENTS USING ELASTICSEARCH.
  • DELETING DOCUMENTS USING ELASTICSEARCH.
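The four operations above map onto HTTP requests. The sketch below only builds the requests with the standard library and prints them; nothing is sent. It assumes a node listening on `http://localhost:9200`, a hypothetical index named `articles`, and the `_doc` / `_update` endpoint layout of Elasticsearch 7.x and later (older versions include a type name in the path).

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"  # assumed address of a local node
INDEX = "articles"                  # hypothetical index name

def build_request(method, path, body=None):
    """Build (but do not send) an HTTP request with an optional JSON body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)

crud_requests = [
    # Index (create or replace) document 1.
    build_request("PUT", f"/{INDEX}/_doc/1", {"title": "Intro to search"}),
    # Fetch document 1.
    build_request("GET", f"/{INDEX}/_doc/1"),
    # Partially update document 1.
    build_request("POST", f"/{INDEX}/_update/1", {"doc": {"title": "Intro"}}),
    # Delete document 1.
    build_request("DELETE", f"/{INDEX}/_doc/1"),
]

for request in crud_requests:
    print(request.get_method(), request.full_url)
```

Passing one of these objects to `urllib.request.urlopen` would send it to a running node.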

Perform basic operations with Elasticsearch.

The inverted index is the heart of a search engine. It is a hashmap-like data structure that directs users from a word to the documents or web pages that contain it. Its main goal is to provide quick searches across millions of documents by finding the documents in which the search terms occur.

Books usually have an inverted index of this kind at the back: starting from a word, we can find the pages on which the word appears.

Consider the following statements

  • javainuse is a good website
  • javainuse is one of the good websites.

For indexing purposes, the above texts are tokenized into separate terms, and all the unique terms are stored in the index together with information such as which documents each term appears in and at what position.

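So the inverted index for the text above maps each unique term to the documents that contain it. For example, the terms javainuse, is and good point to both documents, website points only to the first, and websites only to the second.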

When you search for the term website OR websites, the query is executed against the inverted index: the terms are looked up, and the documents in which they appear are quickly identified.
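The two example statements can be turned into a small inverted index with a few lines of Python. This is a simplified sketch of the concept, not how Lucene stores its index: tokenization is a plain regular expression, positions are counted from 0, and the function names are assumptions of the example.

```python
import re
from collections import defaultdict

documents = {
    1: "javainuse is a good website",
    2: "javainuse is one of the good websites.",
}

def build_inverted_index(docs):
    """Map each term to {document id: [positions of the term]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, term in enumerate(re.findall(r"\w+", text.lower())):
            index[term].setdefault(doc_id, []).append(position)
    return index

def search_any(index, terms):
    """OR search: ids of documents containing at least one of the terms."""
    hits = set()
    for term in terms:
        hits.update(index.get(term, {}))
    return sorted(hits)

index = build_inverted_index(documents)
print(dict(index["good"]))                         # {1: [3], 2: [5]}
print(search_any(index, ["website", "websites"]))  # [1, 2]
```

Looking a term up is a single dictionary access, which is why the search does not need to scan the text of every document.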

  1. A cluster is a collection of one or more nodes (servers) that together hold your entire data and provide federated indexing and search capabilities across all nodes. A cluster is identified by a unique name, which by default is “elasticsearch”. This name is important because a node can only be part of a cluster if the node is set up to join the cluster by its name.