Say I have the sentence "This is a new city".
That depends on your tokenizer. By default Elasticsearch uses the standard analyzer, whose Standard Tokenizer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. Its lowercase token filter then lowercases each term.
That means your sentence will be tokenized as: this, is, a, new, city. You can create a custom tokenizer if you need different behaviour.
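To get a feel for the output, here is a rough Python approximation of the standard analyzer. It is only a sketch: the regex word split stands in for the Unicode Text Segmentation algorithm and will differ on punctuation, apostrophes, non-Latin scripts and so on. The function name is made up for illustration. For authoritative results, ask Elasticsearch itself through the `_analyze` API with `"analyzer": "standard"`.

```python
import re


def approximate_standard_analyzer(text: str) -> list[str]:
    # Hypothetical helper: split on runs of word characters.
    # This is a crude stand-in for Unicode Text Segmentation (UAX #29),
    # not what Lucene's StandardTokenizer actually runs.
    tokens = re.findall(r"\w+", text)
    # The standard analyzer lowercases terms with its lowercase token filter.
    return [token.lower() for token in tokens]


print(approximate_standard_analyzer("This is a new city"))
# ['this', 'is', 'a', 'new', 'city']
```

Note that "This" becomes "this" because of the lowercasing step, not because of the tokenizer.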
Documents are indexed when you put them into Elasticsearch.
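Conceptually, indexing runs each document's text through the analyzer and records which documents contain each term, an inverted index. The toy Python model below illustrates only that idea. It is an assumption-laden sketch, not how Lucene stores segments on disk, and `analyze` and `build_inverted_index` are hypothetical names.

```python
import re
from collections import defaultdict


def analyze(text: str) -> list[str]:
    # Same crude approximation of the standard analyzer: word split + lowercase.
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    # Map each term to the set of document ids that contain it.
    postings: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            postings[term].add(doc_id)
    # Keep each postings list sorted by document id.
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {1: "This is a new city", 2: "The city is old"}
index = build_inverted_index(docs)
print(index["city"])  # [1, 2]
print(index["new"])   # [1]
```

A search for "city" then only has to look up one entry in this map instead of scanning every document.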
The data is kept on the file system: https://www.elastic.co/blog/found-dive-into-elasticsearch-storage
Here is a blog post about the internals: https://www.elastic.co/blog/found-elasticsearch-from-the-bottom-up