Can't generate dependent java classes from avro subjects


In my simple Maven application I have three Avro schema files:

ReportDetails.avsc

{
  "type": "record",
  "name": "ReportDetails",
  "namespace": "com.vl.model.avro",
  "fields": [
    {"name": "detailId", "type": "string"},
    {"name": "detailName", "type": "string"}
  ]
}

Employee.avsc

{
  "fields": [
    { "name": "employeeId", "type": "string"},
    { "name": "position", "type": "string" },
    { "name": "department", "type": "int" },
    {"name": "employeeName", "type": "string"}
  ],
  "name": "Employee",
  "namespace": "com.vl.model.avro",
  "type": "record"
}

Report.avsc

{
  "type": "record",
  "name": "Report",
  "namespace": "com.vl.model.avro",
  "fields": [
    {"name": "reportId", "type": "string"}
    , {"name": "employee", "type": ["null", "com.vl.model.avro.Employee"], "default": null}
    , {"name": "details", "type": {"type": "array", "items": "com.vl.model.avro.ReportDetails"}}
  ]
}

The avro-maven-plugin configuration is:

            <plugin>
                <groupId>org.apache.avro</groupId>
                <artifactId>avro-maven-plugin</artifactId>
                <version>1.11.3</version>
                <executions>
                    <execution>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>schema</goal>
                        </goals>
                        <configuration>
                            <sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
                            <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                            <enableDecimalLogicalType>true</enableDecimalLogicalType>
                            <stringType>String</stringType>
                            <fieldVisibility>PRIVATE</fieldVisibility>
                            <includes>
                                <include>ReportDetails.avsc</include>
                                <include>Employee.avsc</include>
                                <include>Report.avsc</include>
                            </includes>
                        </configuration>
                    </execution>
                </executions>
            </plugin>

The first stage issue

The build fails with:

[INFO] --- avro:1.11.3:schema (default) @ spring-cloud-stream-kafka-streaming-example ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  2.693 s
[INFO] Finished at: 2024-02-01T14:43:20+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.avro:avro-maven-plugin:1.11.3:schema (default) on project spring-cloud-stream-kafka-streaming-example: Execution default of goal org.apache.avro:avro-maven-plugin:1.11.3:schema failed: Undefined name: "com.vl.model.avro.Employee" -> [Help 1]

To make it work, I updated Report.avsc (Employee.avsc and ReportDetails.avsc are unchanged):

{
  "type": "record",
  "name": "Report",
  "namespace": "com.vl.model.avro",
  "fields": [
    {"name": "reportId", "type": "string"}
    , {"name": "employee", "type": ["null", {"type": "record", "name": "Employee", "fields": []}], "default": null}
    , {"name": "details", "type": {"type": "array", "items": {"type": "record", "name": "ReportDetails", "fields": []}}}
  ]
}

This appears to fix the avro:1.11.3:schema generation issue: the build succeeds and generates exactly the classes I need.

Next step: sharing the schemas

The Avro schemas have to be registered in a schema registry service so that they can be shared between different microservices. To cover this need I've configured kafka-schema-registry-maven-plugin (7.5.1), which can upload and download schemas.

So the plugin configuration is:

            <plugin>
                <groupId>io.confluent</groupId>
                <artifactId>kafka-schema-registry-maven-plugin</artifactId>
                <version>7.5.1</version>
                <executions>
                    <execution>
                        <id>avro-resources</id>
                        <phase>generate-sources</phase>
                        <goals>
                            <goal>download</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <schemaRegistryUrls>
                        <param>http://localhost:8081</param>
                    </schemaRegistryUrls>
                    <outputDirectory>src/main/avro</outputDirectory>
                    <subjectPatterns>
                        <param>^com.vl.model.*$</param>
                    </subjectPatterns>
                    <versions>
                        <param>latest</param>
                    </versions>

                    <subjects>
                        <com.vl.model.ReportDetails>src/main/resources/avro/ReportDetails.avsc</com.vl.model.ReportDetails>
                        <com.vl.model.Employee>src/main/resources/avro/Employee.avsc</com.vl.model.Employee>
                        <com.vl.model.Report>src/main/resources/avro/Report.avsc</com.vl.model.Report>
                    </subjects>
                    <schemaTypes>
                        <com.vl.model.ReportDetails>AVRO</com.vl.model.ReportDetails>
                        <com.vl.model.Employee>AVRO</com.vl.model.Employee>
                        <com.vl.model.Report>AVRO</com.vl.model.Report>
                    </schemaTypes>
                    <references>
                        <com.vl.model.Report>
                            <reference>
                                <name>details</name>
                                <subject>com.vl.model.ReportDetails</subject>
                            </reference>
                            <reference>
                                <name>employee</name>
                                <subject>com.vl.model.Employee</subject>
                            </reference>
                        </com.vl.model.Report>
                    </references>
                </configuration>
            </plugin>
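
A note on the references above: as far as I can tell from Confluent's schema-references documentation, the `<name>` of an Avro reference is meant to be the fully qualified name of the referenced type, not the name of the field that uses it. A hedged sketch of the same block under that assumption (subjects unchanged; not verified against this project):

```xml
<references>
    <com.vl.model.Report>
        <reference>
            <!-- assumption: name is the fully qualified Avro type name, not the field name -->
            <name>com.vl.model.avro.ReportDetails</name>
            <subject>com.vl.model.ReportDetails</subject>
        </reference>
        <reference>
            <!-- assumption: same convention for the Employee reference -->
            <name>com.vl.model.avro.Employee</name>
            <subject>com.vl.model.Employee</subject>
        </reference>
    </com.vl.model.Report>
</references>
```

With references declared, Report.avsc should be able to refer to the two types by name only, without defining them inline.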

Subjects registration issue

Registering the schemas with mvn schema-registry:register leads to a different issue:

[INFO] --- schema-registry:7.5.1:register (default-cli) @ spring-cloud-stream-kafka-streaming-example ---
[INFO] Registered subject(com.vl.model.Overtime) with id 3 version 1
[INFO] Registered subject(com.vl.model.Absence) with id 4 version 1
[INFO] Registered subject(com.vl.model.Employee) with id 5 version 1
[INFO] Registered subject(com.vl.model.ReportDetails) with id 6 version 1
[INFO] Registered subject(com.vl.model.Attendance) with id 7 version 1
[ERROR] Could not parse Avro schema
org.apache.avro.SchemaParseException: Can't redefine: com.vl.model.avro.Employee
    at org.apache.avro.Schema$Names.put (Schema.java:1550)
    at org.apache.avro.Schema$Names.add (Schema.java:1544)
    at org.apache.avro.Schema.parse (Schema.java:1665)
    at org.apache.avro.Schema.parse (Schema.java:1765)
    at org.apache.avro.Schema.parse (Schema.java:1678)
    at org.apache.avro.Schema$Parser.parse (Schema.java:1433)
    at org.apache.avro.Schema$Parser.parse (Schema.java:1421)
    at io.confluent.kafka.schemaregistry.avro.AvroSchema.<init> (AvroSchema.java:120)
    at io.confluent.kafka.schemaregistry.avro.AvroSchemaProvider.parseSchemaOrElseThrow (AvroSchemaProvider.java:54)
    at io.confluent.kafka.schemaregistry.SchemaProvider.parseSchema (SchemaProvider.java:114)
    at io.confluent.kafka.schemaregistry.SchemaProvider.parseSchema (SchemaProvider.java:123)
    at io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient.parseSchema (CachedSchemaRegistryClient.java:286)
    at io.confluent.kafka.schemaregistry.client.SchemaRegistryClient.parseSchema (SchemaRegistryClient.java:61)
    at io.confluent.kafka.schemaregistry.maven.UploadSchemaRegistryMojo.processSubject (UploadSchemaRegistryMojo.java:120)
    at io.confluent.kafka.schemaregistry.maven.UploadSchemaRegistryMojo.execute (UploadSchemaRegistryMojo.java:92)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo (DefaultBuildPluginManager.java:126)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute2 (MojoExecutor.java:328)
    at org.apache.maven.lifecycle.internal.MojoExecutor.doExecute (MojoExecutor.java:316)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:212)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:174)
    at org.apache.maven.lifecycle.internal.MojoExecutor.access$000 (MojoExecutor.java:75)
    at org.apache.maven.lifecycle.internal.MojoExecutor$1.run (MojoExecutor.java:162)
    at org.apache.maven.plugin.DefaultMojosExecutionStrategy.execute (DefaultMojosExecutionStrategy.java:39)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute (MojoExecutor.java:159)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:105)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject (LifecycleModuleBuilder.java:73)
    at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build (SingleThreadedBuilder.java:53)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute (LifecycleStarter.java:118)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:261)
    at org.apache.maven.DefaultMaven.doExecute (DefaultMaven.java:173)
    at org.apache.maven.DefaultMaven.execute (DefaultMaven.java:101)
    at org.apache.maven.cli.MavenCli.execute (MavenCli.java:906)
    at org.apache.maven.cli.MavenCli.doMain (MavenCli.java:283)
    at org.apache.maven.cli.MavenCli.main (MavenCli.java:206)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0 (Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke (NativeMethodAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke (DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke (Method.java:568)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced (Launcher.java:283)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch (Launcher.java:226)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode (Launcher.java:407)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main (Launcher.java:348)
[ERROR] Schema for com.vl.model.Report could not be parsed.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE

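This fails because a named type cannot be defined twice: Report.avsc now declares "type": "record", "name": "Employee" (and "ReportDetails") inline, while the referenced subjects already define the same names.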

Subject registration solution

To resolve this last issue, I updated Report.avsc again:

{
  "type": "record",
  "name": "Report",
  "namespace": "com.vl.model.avro",
  "fields": [
    {"name": "reportId", "type": "string"}
    , {"name": "employee", "type": {"type": "Employee", "java-class": "com.vl.model.avro.Employee"}}
    , {"name": "details", "type": {"type": "array", "items": "com.vl.model.avro.ReportDetails"}}
  ]
}

With this change, Report is registered in the schema registry.

A different part is now broken

The Avro schemas are definitely published to the schema registry, but neither the local nor the remote (downloaded) schemas can be used to generate Java classes. The files pulled from the schema registry differ slightly from the local ones. The error for the remote schema:

[INFO] --- avro:1.11.3:schema (default) @ spring-cloud-stream-kafka-streaming-example ---
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  2.753 s
[INFO] Finished at: 2024-02-01T16:59:40+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.avro:avro-maven-plugin:1.11.3:schema (default) on project spring-cloud-stream-kafka-streaming-example: Execution default of goal org.apache.avro:avro-maven-plugin:1.11.3:schema failed: "Employee" is not a defined name. The type of the "employee" field must be a defined name or a {"type": ...} expression. -> [Help 1]

For the local schema, the error is: Execution default of goal org.apache.avro:avro-maven-plugin:1.11.3:schema failed: Type not supported: Employee

The Report.avsc downloaded from the schema registry (formatted here for readability) looks like this:

{
  "type": "record",
  "name": "Report",
  "namespace": "com.vl.model.avro",
  "fields": [
    {
      "name": "reportId",
      "type": "string"
    },
    {
      "name": "employee",
      "type": "Employee"
    },
    {
      "name": "details",
      "type": {
        "type": "array",
        "items": "ReportDetails"
      }
    }
  ]
}

(The com.vl.model.avro namespace is missing before type names such as Employee and ReportDetails.)

Questions

As you can see, resolving one issue produces another. I would be happy to get any of the following:

  1. A solution for any of the described issues that has no side effects.
  2. An idea involving different Maven plugins that would cover my needs.
  3. A scenario that I have not checked yet.

P.S.

I didn't remove the references between the avsc files because I'm sure that would cause serialization issues at Kafka communication time.

P.S. 2. The solution should generate the Java data classes as part of the Maven build. The Java classes will be used for publishing Kafka messages.


Solution

  • It seems the schema pulled from the schema registry is correct (which is logical; otherwise nobody would use it in their projects):

    {
      "type": "record",
      "name": "Report",
      "namespace": "com.vl.model.avro",
      "fields": [
        {
          "name": "reportId",
          "type": "string"
        },
        {
          "name": "employee",
          "type": "Employee"
        },
        {
          "name": "details",
          "type": {
            "type": "array",
            "items": "ReportDetails"
          }
        }
      ]
    }
    

    So the issue is probably in the plugin configuration. Checking the avro-maven-plugin documentation, I realised that my configuration was missing something important:

                <plugin>
                    <groupId>org.apache.avro</groupId>
                    <artifactId>avro-maven-plugin</artifactId>
                    <version>1.11.3</version>
                    <executions>
                        <execution>
                            <phase>generate-sources</phase>
                            <goals>
                                <goal>schema</goal>
                            </goals>
                            <configuration>
                                <sourceDirectory>${project.basedir}/src/main/resources/avro</sourceDirectory>
                                <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
                                <enableDecimalLogicalType>true</enableDecimalLogicalType>
                                <stringType>String</stringType>
                                <fieldVisibility>PRIVATE</fieldVisibility>
                                <imports>
                                    <import>${project.basedir}/src/main/resources/avro/Employee.avsc</import>
                                    <import>${project.basedir}/src/main/resources/avro/ReportDetails.avsc</import>
                                    <import>${project.basedir}/src/main/resources/avro/Report.avsc</import>
                                </imports>
                                <includes>
                                    <include>ReportDetails.avsc</include>
                                    <include>Employee.avsc</include>
                                    <include>Report.avsc</include>
                                </includes>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
    

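    So the imports section was missing. Files listed under imports are parsed first, in order, so the names they define are already known when Report.avsc is parsed; this is why Employee.avsc and ReportDetails.avsc come before Report.avsc.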

    P.S. To make this work for the pulled schemas, the imports section (and the includes section) should list the files from the folder they were downloaded to.
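
    For the pulled schemas, the configuration could look like the sketch below. It assumes the download goal wrote the files to src/main/avro (the outputDirectory configured for kafka-schema-registry-maven-plugin in the question). The file names are placeholders, so substitute whatever names the download actually produced.

```xml
<configuration>
    <!-- assumption: schemas were downloaded into src/main/avro -->
    <sourceDirectory>${project.basedir}/src/main/avro</sourceDirectory>
    <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
    <enableDecimalLogicalType>true</enableDecimalLogicalType>
    <stringType>String</stringType>
    <fieldVisibility>PRIVATE</fieldVisibility>
    <imports>
        <!-- dependencies first, then the schema that uses them; file names are hypothetical -->
        <import>${project.basedir}/src/main/avro/Employee.avsc</import>
        <import>${project.basedir}/src/main/avro/ReportDetails.avsc</import>
        <import>${project.basedir}/src/main/avro/Report.avsc</import>
    </imports>
    <includes>
        <include>ReportDetails.avsc</include>
        <include>Employee.avsc</include>
        <include>Report.avsc</include>
    </includes>
</configuration>
```

    This fragment replaces only the configuration element of the avro-maven-plugin execution; the rest of the plugin declaration stays as shown above.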