Wrong Java to Solr type mapping with Schemaless Collection

I'm using SolrJ to index POJOs to Solr, and a String property that holds a numeric value is being mapped to the org.apache.solr.schema.TrieLongField type, which in turn causes a BindingException when I try to retrieve the document from Solr.

My class is annotated with @Field on the setters and I'm adding the document with client.addBean(object).

The following code can reproduce this issue:

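import org.apache.solr.client.solrj.beans.Field;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;

public class SolrIndexTest {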
    @Field
    public Long longField;
    @Field
    public String stringField;

    public static void main(String[] args) {
        //test core created with the following command
        //sudo su - solr -c  "/opt/solr/bin/solr create -c test -n data_driven_schema_configs"

        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/test").build();
        client.setParser(new XMLResponseParser());

        SolrIndexTest obj1 = new SolrIndexTest();
        obj1.longField = 1L;
        obj1.stringField = "1"; // 1st doc: numeric value
        SolrIndexTest obj2 = new SolrIndexTest();
        obj2.longField = 2L;
        obj2.stringField = "Text string"; // 2nd doc: text value

        try {
            client.addBean(obj1);
            client.commit();
        } catch (Exception e) {
            e.printStackTrace();
        }
        try {
            client.addBean(obj2); // This line will throw a BindingException
            client.commit();
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Solution

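When you run a Solr collection in schemaless mode, the field type (double, integer, string, etc.) is determined in one of two ways: by a suffix on the field name that matches a dynamic field rule, or by guessing from the field value. Parsers for Boolean, Integer, Long, Float, Double, and Date are currently available for the guessing step, but there is none for String, which is why the value "1" in your first document is mapped to a long field. The Solr documentation describes schemaless mode as follows: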

Schemaless Mode is a set of Solr features that, when used together, allow users to rapidly construct an effective schema by simply indexing sample data, without having to manually edit the schema. These Solr features, all controlled via solrconfig.xml, are:

Managed schema: Schema modifications are made at runtime through Solr APIs, which requires the use of schemaFactory that supports these changes - see Schema Factory Definition in SolrConfig for more details.

Field value class guessing: Previously unseen fields are run through a cascading set of value-based parsers, which guess the Java class of field values - parsers for Boolean, Integer, Long, Float, Double, and Date are currently available.

Automatic schema field addition, based on field value class(es): Previously unseen fields are added to the schema, based on field value Java classes, which are mapped to schema field types - see Solr Field Types.
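To see why the string "1" ends up as a long, the cascade described above can be imitated in plain Java. This is only a simplified sketch of the idea, not Solr's actual update processor code: the class TypeGuessSketch and the method guessType are made up for illustration, the Integer, Float, and Date parsers are left out for brevity, and the order in which the parsers run is an assumption.

```java
public class TypeGuessSketch {

    // Hypothetical helper (not a Solr API): tries each parser in turn
    // and reports the first Java class that accepts the value.
    static String guessType(String value) {
        if (value.equalsIgnoreCase("true") || value.equalsIgnoreCase("false")) {
            return "Boolean";
        }
        try {
            Long.parseLong(value);
            return "Long";
        } catch (NumberFormatException ignored) {
            // not a long, fall through to the next parser
        }
        try {
            Double.parseDouble(value);
            return "Double";
        } catch (NumberFormatException ignored) {
            // not a double either
        }
        // No parser accepted the value, so it stays a plain string.
        return "String";
    }

    public static void main(String[] args) {
        System.out.println(guessType("1"));           // prints "Long"
        System.out.println(guessType("Text string")); // prints "String"
    }
}
```

Because the field is added to the schema when it is first seen, the guess made for the first document ("1") fixes the type of stringField, and the second document's "Text string" no longer fits.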

In short, if you want your field types to be mapped correctly, just add the appropriate suffix to the field name:

@Field
public Long longField_l; // _l stands for long
@Field
public String stringField_s; // _s stands for string

And you'll see the expected result:

<doc>
    <long name="longField_l">1</long>
    <str name="stringField_s">1</str>
</doc>
<doc>
    <long name="longField_l">2</long>
    <str name="stringField_s">Text string</str>
</doc>

If you open the managed-schema file, you'll find near the end the list of dynamic fields used to map the types. Here are a few of them:

<dynamicField name="*_i" type="int" indexed="true" stored="true"/>
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
<dynamicField name="*_l" type="long" indexed="true" stored="true"/>
<dynamicField name="*_t" type="text_general" indexed="true" stored="true"/>
<dynamicField name="*_b" type="boolean" indexed="true" stored="true"/>
<dynamicField name="*_f" type="float" indexed="true" stored="true"/>
<dynamicField name="*_d" type="double" indexed="true" stored="true"/>
<dynamicField name="*_p" type="location" indexed="true" stored="true"/>
<dynamicField name="*_c" type="currency" indexed="true" stored="true"/>
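How these rules pick a type from the field name can be illustrated with a small lookup in plain Java. This is a rough sketch under stated assumptions, not Solr's matching code: the class DynamicFieldSketch and the method resolveType are hypothetical, only the suffix patterns listed above are covered, and the first matching suffix wins (real schemas can contain overlapping patterns that need more careful handling).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicFieldSketch {

    // Suffixes taken from the dynamicField entries above (leading "*" dropped).
    static final Map<String, String> SUFFIX_TO_TYPE = new LinkedHashMap<>();
    static {
        SUFFIX_TO_TYPE.put("_i", "int");
        SUFFIX_TO_TYPE.put("_s", "string");
        SUFFIX_TO_TYPE.put("_l", "long");
        SUFFIX_TO_TYPE.put("_t", "text_general");
        SUFFIX_TO_TYPE.put("_b", "boolean");
        SUFFIX_TO_TYPE.put("_f", "float");
        SUFFIX_TO_TYPE.put("_d", "double");
        SUFFIX_TO_TYPE.put("_p", "location");
        SUFFIX_TO_TYPE.put("_c", "currency");
    }

    // Hypothetical helper: returns the type of the first suffix that matches,
    // or null when no rule applies and value-based guessing would take over.
    static String resolveType(String fieldName) {
        for (Map.Entry<String, String> rule : SUFFIX_TO_TYPE.entrySet()) {
            if (fieldName.endsWith(rule.getKey())) {
                return rule.getValue();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolveType("stringField_s")); // prints "string"
        System.out.println(resolveType("longField_l"));   // prints "long"
        System.out.println(resolveType("stringField"));   // prints "null"
    }
}
```

With the suffix in place, the type no longer depends on what the first indexed value happens to look like.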