Search code examples
spring-bootspring-batchresume

Restarting a failed Spring (Boot) Batch job lauched by command line


I know that there is a lot of questions and posts about this topic (besides the own Spring documentation), but I admit that I still didn't manage to figure out how do job restarts work.

First of all, I'm using Spring Boot for creating my batch program. The relevant parts of my pom.xml follow:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.2.1.RELEASE</version>
        <relativePath /> <!-- lookup parent from repository -->
    </parent>

    ...

    <dependencies>
        <!-- Spring -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-batch</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-configuration-processor</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
            <exclusions>
                <exclusion>
                    <groupId>org.junit.vintage</groupId>
                    <artifactId>junit-vintage-engine</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.springframework.batch</groupId>
            <artifactId>spring-batch-test</artifactId>
            <scope>test</scope>
        </dependency>

        <!-- Databases -->
        <dependency>
            <groupId>org.postgresql</groupId>
            <artifactId>postgresql</artifactId>
        </dependency>

        <!-- Misc -->
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <optional>true</optional>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>23.0</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>

The main class is annotated with @EnableBatchProcessing:

@SpringBootApplication
@EnableBatchProcessing
public class Migrator {

    public static void main( String[] args ) {
        SpringApplication.run( Migrator.class, args );
    }

}

And the Job, properly said, is configured as follows:


@Configuration
public class EnigmaPartitionedJobConfig {

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    ...

    @Bean
    public Job configureJob( @Qualifier( "constraintsTurnOffStep" ) Step constraintsTurnOffStep, 
                             @Qualifier( "constraintsTurnOnStep" ) Step constraintsTurnOnStep,
                             @Qualifier( "partitionerStep" ) Step partitionerStep ) {
        return jobBuilderFactory
                        .get( "migratorPartitionedJob" )
                        .incrementer( new RunIdIncrementer() )
                        .start( constraintsTurnOffStep )
                        .next( partitionerStep )
                        .next( constraintsTurnOnStep )
                        .build();
    }

    ...

}

So, the job is composed of three steps. The first and last ones are very simple. They just turn off and on, respectivelly, some constraints in the target database. The middle step is the core business. It reads an input file desbribing the source database, and applies a table based partition. That is, a copy of the migration step is run on a separate thread for each table of the source database:

    @Bean( "partitionerStep" )
    public Step partitionerStep( @Qualifier( "migrationAndAnonymizationStep" ) Step migrationAndAnonymizationStep, TaskExecutor taskExecutor ) {
        return stepBuilderFactory
                    .get( "partitionerStep" )
                    .partitioner( "migrationAndAnonymizationStep", new MultiTablePartitioner( this.model.getEntities() ) )
                    .step( migrationAndAnonymizationStep )
                    .taskExecutor( taskExecutor )
                    .gridSize( this.model.getEntitiesCount() )
                    .build();
    }

Finally, the migration step is configured as follows:

    @Bean( "migrationAndAnonymizationStep" )
    public Step migrationAndAnonymizationStep( MigrationAndAnonymizationReader reader, 
                                               MigrationAndAnonymizationProcessor processor, 
                                               MigrationAndAnonymizationWriter writer,  
                                               TaskExecutor taskExecutor ) {

        return stepBuilderFactory
                    .get( "migrationAndAnonymizationStep" )
                    .<Map<String, Object>, Map<String, Object>>chunk( 50 )
                    .reader( reader )
                    .processor( processor )
                    .writer( writer )
                    .taskExecutor( taskExecutor )
                    .throttleLimit( 1 )
                    .build();
    }

MigrationAndAnonymizationReader is basically a JdbcCursorItemReader with some configuration due to the partitioning, whereas MigrationAndAnonymizationWriter is basically a subclass of FlatFileItemWriter with some minor initialization, just like MigrationAndAnonymizationReader.

Job executions are run using command line:

$ java -jar migrator-0.0.1-SNAPSHOT.jar --spring.profiles.active=[list of profiles] --migrator.source-username=XXX --migrator.source-password=XXX --migrator.target-username=XXX --migrator.target-password=XXX --spring.datasource.username=XXX --spring.datasource.password=XXX --migrator.model-path=[path to a specific input file]

My test for checking the restart functionality consists in running the job using a command like the previous one and, during the execution (it usually takes 2 mins to process the test database I'm using), I kill the process. Then, I execute the same command line and my expectation was that the job would restart, not executing finished steps and steps that didn't finish, start executing from the point they were before the failure. However, every time I attempt this test what I see is the job executing completely from the begining.

So, what am I missing here? What action, implementation or configuration is Spring Boot/Batch expecting from me?


Solution

  • Spring Batch starts a new job because of this line:

    .incrementer( new RunIdIncrementer() )
    

    This generates a unique id for every run.

    If you want to restart a job you have to make sure that the parameters for you pass to the job are the same. So you cannot use RunIdIncrementer