RemoteSolrException: ERROR: [doc=2] unknown field 'firstName'

I wrote a Spring project that uses SolrInputDocument to add data from database tables. I use the doc.addField() method to add the data I retrieved from MySQL (only a few of the calls are shown):

    doc.addField("actorId", a.getId());
    doc.addField("firstName", a.getFirstName());

When I try to add these values to the Solr index, I get the following error:

Exception in thread "main" org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: ERROR: [doc=2] unknown field 'firstName'
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:68)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:54)

Could you help me find out where I have to declare the fields "id" and "firstName" so that Solr knows I am using them when adding data?

Solution

When a RemoteSolrException is raised with the message ERROR: [doc=2] unknown field ..., it means that the field you are trying to insert is not defined in the schema of your index (core or collection).

You really should read the Solr documentation, because most of Solr's information retrieval (IR) logic lives in the design of the schema. I suggest starting with the Solr guide "Overview of Documents, Fields, and Schema Design".

Still, I'll try to give you some guidance on the points that were hardest for me to understand.

First of all, you have to recognise the difference between Solr running as a standalone server and Solr running in SolrCloud mode. The former is a single server that keeps the configuration of each index (called a core) locally on disk. The latter is a cluster in which several Solr instances behave as a single server (distributed search, sharding, replicas, fault tolerance, etc.) and the configuration is stored in a ZooKeeper ensemble.

I strongly suggest starting with a standalone configuration. Differences aside, a standalone server keeps its configuration easily accessible on your disk and has all the IR features present in SolrCloud.

You should also recognise the difference between an index that uses managed-schema and one that uses schema.xml:

managed-schema is the name for the schema file Solr uses by default to support making Schema changes at runtime via the Schema API, or Schemaless Mode features.

schema.xml is the traditional name for a schema file which can be edited manually by users who use the ClassicIndexSchemaFactory.

The important thing to understand here is that Solr lets you define a class of fields by name pattern, for example all fields whose name ends with _s (string) or _i (integer). Solr calls these Dynamic Fields.

In the default managed-schema configuration (often loosely called schemaless), the most important field types are ready to use (e.g. strings, integers, booleans, dates, currency, text_general, etc.). This lets you load your data immediately: all you have to do is add the correct suffix to the end of each field name:

productName becomes productName_s
manufacturer becomes manufacturer_s
quantity becomes quantity_i
dateInvoice becomes dateInvoice_dt
price becomes price_c
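
The renaming above can be sketched as a small helper that picks the suffix from the Java type of the value. This is only an illustration: DynamicFieldNames and withSuffix are hypothetical names, not part of SolrJ, and the suffixes assume the dynamic fields shipped with Solr's default configuration, so check them against your own schema. Only a few types are covered.

```java
// Hypothetical helper (not part of SolrJ): appends a dynamic-field suffix
// chosen from the Java type of the value.
final class DynamicFieldNames {

    private DynamicFieldNames() {}

    // Assumption: the index defines the *_s, *_i, *_l and *_b dynamic fields,
    // as Solr's default configuration does.
    static String withSuffix(String name, Object value) {
        if (value instanceof String)  return name + "_s"; // string
        if (value instanceof Integer) return name + "_i"; // integer
        if (value instanceof Long)    return name + "_l"; // long
        if (value instanceof Boolean) return name + "_b"; // boolean
        // Null values and any other type are rejected rather than guessed.
        throw new IllegalArgumentException("No suffix known for field " + name);
    }

    public static void main(String[] args) {
        System.out.println(withSuffix("productName", "Widget")); // productName_s
        System.out.println(withSuffix("quantity", 3));           // quantity_i
    }
}
```

The returned name is what you would pass as the first argument of doc.addField().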

Dynamic fields are available in both schemaless and traditional schema mode.

So why this difference? Well, apart from historical reasons, I think the Solr engineers were trying to make it easier for users to load their data into Solr indexes. But once you start writing your own custom schema.xml, you finally have access to the IR power that made Solr and the Lucene engine so famous, and one of the best open source full-text search servers around.

Very likely you are already using schemaless mode in your index, so just change your field name to firstName_s and try to load your data again.

Regarding the id field: in schemaless mode, id is a special field used as the primary key. It is a kind of "reserved name", so you don't have to add any suffix.

The id field has type string.
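
Putting this together for the fields in the question, here is a minimal sketch of the names I would send to Solr. A plain Map stands in for SolrInputDocument so the snippet runs without SolrJ. ActorFields and forActor are hypothetical names, and I am assuming that the actor's id is meant to be the document's primary key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for building a SolrInputDocument: collects the
// field names and values that would be passed to doc.addField(name, value).
final class ActorFields {

    private ActorFields() {}

    static Map<String, Object> forActor(int actorId, String firstName) {
        Map<String, Object> fields = new LinkedHashMap<>();
        // "id" is the reserved key field: no suffix, and it is a string.
        fields.put("id", String.valueOf(actorId));
        // Every other field takes the suffix of a matching dynamic field.
        fields.put("firstName_s", firstName);
        return fields;
    }

    public static void main(String[] args) {
        // With SolrJ, each entry would become doc.addField(key, value).
        forActor(2, "Penelope").forEach((name, value) ->
                System.out.println(name + " = " + value));
    }
}
```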