I have a Dataset whose type I don't know in advance, nor the number of its properties or their types.
At runtime, I obtain a DatasetSchema for that Dataset that contains the names, types and some flags for the properties.
For geometry properties, I have their GeoJSON representation stored as a string, plus some flags (isGeoShape, isGeoPoint) that indicate the ES property type.
I'm also using NetTopologySuite, so I could parse those GeoJSON strings into actual Geometry objects if needed, but I'd rather skip this extra parsing and use the GeoJSON strings directly.
class DatasetSchema {
List<DatasetField> Fields;
}
class DatasetField {
string Name;
Type DataType;
bool isGeoShape;
bool isGeoPoint;
}
Questions:
How can I create such an ES index, with an unknown/dynamic mapping schema that includes those geometry properties, using the NEST high-level client?
How can I bulk index those documents, including the geometry properties, with the NEST high-level client using the Bulk or BulkAll APIs?
I saw here and here that bulk indexing can be done with a BulkDescriptor:
dynamic obj = new System.Dynamic.ExpandoObject();
// ….
var descriptor = new BulkDescriptor();
foreach (var doc in values)
{
descriptor.Index<object>(i => i
.Index("abc")
.Id((Id)doc.Id)
.Document((object)doc));
}
client.Bulk(descriptor);
Still, I'm curious how the geometry types should be handled.
Thank you very much! Any thoughts or suggestions are welcome!
Dynamic templates are a good fit for your use case: they let you control how Elasticsearch maps fields of a schema you don't know in advance.
You can use the match parameter to choose the field type based on the field name. If an instance of DatasetField has IsGeoPoint set to true, prefix the Elasticsearch field name with GeoPoint and configure a dynamic template that creates a geo_point field for every name starting with GeoPoint (and do the same with GeoShape for geo_shape). The index creation request body looks like this:
{
"mappings": {
"dynamic_templates": [{
"geo_shape": {
"match": "GeoShape*",
"mapping": {
"type": "geo_shape"
}
}
}, {
"geo_point": {
"match": "GeoPoint*",
"mapping": {
"type": "geo_point"
}
}
}
]
}
}
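Because the templates only look at the field name, the one rule your indexing code has to follow is to apply the right prefix. A minimal sketch of that rule as a standalone helper (FieldNaming and ToEsFieldName are hypothetical names, not part of NEST):

```csharp
using System;

// Mirrors the DatasetField shape used in this answer (only the flags needed here).
public class DatasetField
{
    public string Name { get; set; } = "";
    public bool IsGeoShape { get; set; }
    public bool IsGeoPoint { get; set; }
}

public static class FieldNaming
{
    // These prefixes must match the "match" patterns of the dynamic templates
    // ("GeoShape*" and "GeoPoint*").
    public const string GeoShapePrefix = "GeoShape";
    public const string GeoPointPrefix = "GeoPoint";

    // Returns the name under which the field is stored in Elasticsearch.
    public static string ToEsFieldName(DatasetField field)
    {
        if (field.IsGeoShape && field.IsGeoPoint)
            throw new ArgumentException($"Field '{field.Name}' cannot be both geo_shape and geo_point.");
        if (field.IsGeoShape) return GeoShapePrefix + field.Name;
        if (field.IsGeoPoint) return GeoPointPrefix + field.Name;
        // Non-geo fields keep their name and fall back to Elasticsearch's default dynamic mapping.
        return field.Name;
    }
}
```

One caveat of this design: a non-geo field whose original name already starts with GeoShape or GeoPoint would also be caught by a template, so you may want to rename or reject such fields.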
Here is a sample C# app showing it in action:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Nest;
class Program
{
static async Task Main(string[] args)
{
string indexName = "my_index";
var connectionSettings = new ConnectionSettings(new Uri("http://localhost:9200"));
connectionSettings.DefaultIndex(indexName);
var elasticClient = new ElasticClient(connectionSettings);
await elasticClient.Indices.DeleteAsync(indexName);
//create index mapping with two dynamic templates;
//based on the field name prefix, elasticsearch will map the field to a specific type
var indexResponse = await elasticClient.Indices.CreateAsync(indexName, d => d
.Map(map => map
.DynamicTemplates(dt => dt
.DynamicTemplate("geo_shape", gs => gs.Match("GeoShape*").Mapping(m => m.GeoShape(s => s)))
.DynamicTemplate("geo_point", gs => gs.Match("GeoPoint*").Mapping(m => m.GeoPoint(p => p)))
)));
//some sample data matching your schema
var data = new List<DatasetField>
{
new () { Name = "Field1", IsGeoPoint = true },
new () { Name = "Field2", IsGeoShape = true },
};
var document = new EsDocument();
foreach (var datasetField in data)
{
//if the field is of type geo shape, prefix the field name with GeoShape;
//the geo_shape dynamic template will match the field name and create a geo_shape field for it
if (datasetField.IsGeoShape)
{
document.Add($"GeoShape{datasetField.Name}", new PointGeoShape(new GeoCoordinate(0, 0)));
}
//if the field is of type geo point, prefix the field name with GeoPoint;
//the geo_point dynamic template will match the field name and create a geo_point field for it
if (datasetField.IsGeoPoint)
{
document.Add($"GeoPoint{datasetField.Name}", new GeoLocation(0, 0));
}
}
var response = await elasticClient.IndexDocumentAsync(document);
}
//this class is just an alias for Dictionary<string, object>
class EsDocument : Dictionary<string,object>{}
class DatasetField
{
public string Name { get; set; }
public bool IsGeoShape { get; set; }
public bool IsGeoPoint { get; set; }
}
}
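The sample uses NEST's PointGeoShape and GeoLocation types. If you want to index the GeoJSON strings from your schema without converting them through NetTopologySuite, each one has to end up in the document as a JSON object: geo_shape reads a plain string value as WKT, so a quoted GeoJSON string would be rejected. A minimal sketch using only System.Text.Json (SourceBuilder and BuildSource are hypothetical names), assuming the field names already carry the GeoShape/GeoPoint prefixes. Whether geo_point accepts a GeoJSON Point object depends on your Elasticsearch version; if yours does not, convert those values to a [lon, lat] array instead.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Text.Json;

public static class SourceBuilder
{
    // values: Elasticsearch field name -> value; geo fields hold GeoJSON strings.
    // geoFields: names whose string value is GeoJSON and must be written as a nested object.
    public static string BuildSource(IReadOnlyDictionary<string, object> values, ISet<string> geoFields)
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartObject();
            foreach (var (name, value) in values)
            {
                writer.WritePropertyName(name);
                if (geoFields.Contains(name) && value is string geoJson)
                {
                    // Lightweight JSON parse (no geometry parsing): validates the text
                    // and re-emits it as an object rather than a quoted string.
                    using var doc = JsonDocument.Parse(geoJson);
                    doc.RootElement.WriteTo(writer);
                }
                else
                {
                    // Ordinary values are serialized using their runtime type.
                    JsonSerializer.Serialize(writer, value);
                }
            }
            writer.WriteEndObject();
        } // disposing the writer flushes it into the stream
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

Since the result is already raw JSON, it is meant for the low-level client exposed as elasticClient.LowLevel (for example wrapped in PostData.String) rather than for NEST's typed serializer.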
After the document is indexed, the index has the following Elasticsearch mapping:
{
"my_index": {
"mappings": {
"dynamic_templates": [{
"geo_shape": {
"match": "GeoShape*",
"mapping": {
"type": "geo_shape"
}
}
}, {
"geo_point": {
"match": "GeoPoint*",
"mapping": {
"type": "geo_point"
}
}
}
],
"properties": {
"GeoPointField1": {
"type": "geo_point"
},
"GeoShapeField2": {
"type": "geo_shape"
}
}
}
}
}
For bulk indexing documents, the easiest option is the IndexManyAsync extension method:
await elasticClient.IndexManyAsync(new List<EsDocument>());
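IndexManyAsync serializes the documents through NEST. If your documents are already JSON strings (for example with GeoJSON geometries embedded as objects), one alternative is to build the newline-delimited _bulk body yourself. A minimal sketch, with hypothetical names BulkBody and Build, assuming each source string is a complete JSON object:

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.Json;

public static class BulkBody
{
    // Builds an NDJSON _bulk body: an action line followed by a source line per document.
    public static string Build(string index, IEnumerable<(string Id, string SourceJson)> docs)
    {
        var sb = new StringBuilder();
        foreach (var (id, sourceJson) in docs)
        {
            var action = new Dictionary<string, object>
            {
                ["index"] = new Dictionary<string, string> { ["_index"] = index, ["_id"] = id }
            };
            sb.Append(JsonSerializer.Serialize(action)).Append('\n');

            // Each source must fit on one line; re-serializing the parsed element
            // writes it compactly, dropping any newlines or indentation.
            using var doc = JsonDocument.Parse(sourceJson);
            sb.Append(JsonSerializer.Serialize(doc.RootElement)).Append('\n');
        }
        // _bulk requires the body to end with a newline, which the loop guarantees.
        return sb.ToString();
    }
}
```

The resulting string could then be sent through the low-level client, e.g. elasticClient.LowLevel.BulkAsync<StringResponse>(PostData.String(body)). Check the response for "errors": true, because _bulk reports per-item failures inside an otherwise successful response.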
Also have a look at this blog post, which describes indexing multiple documents in detail (see the "Multiple documents" section).
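UPDATE: to add a new dynamic template to a mapping that already has dynamic templates, fetch the existing ones first, because a put-mapping request overwrites the whole list of dynamic templates: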
var map = (await elasticClient.Indices.GetMappingAsync<EsDocument>()).Indices["your_index_name"];
var dynamicTemplates = map.Mappings.DynamicTemplates;
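//add the new template; the name "geo_shape_extra" and the "ExtraShape*" pattern are only examples
dynamicTemplates.Add("geo_shape_extra", new DynamicTemplate { Match = "ExtraShape*", Mapping = new GeoShapeProperty() });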
await elasticClient.Indices.PutMappingAsync(new PutMappingRequest("your_index_name") { DynamicTemplates = dynamicTemplates });