Search code examples
postgresqltransactionsjboss

Idle transactions mybatis jboss 6.4 postgres 9.6


Some version information:

  • Jboss 6.4
  • Postgres 9.6
  • mybatis-3 CDI
  • Postgres Driver 42.2.20 JDBC 4

I'm having a problem that is causing pretty catastrophic behavior in my system. From my debugging I've been able to deduce that an idle transaction appears to be locking a table in my database, causing the application to freeze (certain locks aren't being released). I've been able to stop the freezing my setting timeouts in mybatis but I cannot figure out what is causing the idle transaction in the first place. The good news is that its always the same UPDATE statement that appears to be blocked. However, I can't narrow down what query/trans curring and I'm seeing behavior that I understand.

Here is the query that always seems to lock up (Some names were changed but this query normally works):

<update id="updateHealth">
        UPDATE table d
        SET fk_table_health_status_id = 
            (SELECT pk_table_health_status_id 
            FROM stat_table_health_status sdhs
            WHERE sdhs.table_health_status = #{health})
            <if test="health == 'Failed' || health == 'Other Failed'">
            ,last_failure = NOW(),
            failure_count = failure_count + 1</if>
        WHERE d.name = #{name}
    </update>

Repository file used to call the query:

@Stateless
public class Repository implements IRepository {

    /** The device mapper. */
    // CHECKSTYLE:OFF - SuppressWarnings - Bean is auto-generated by the iBatis mapper.
    @SuppressWarnings("cdi-ambiguous-dependency")
    @Inject
    @Mapper
    private IMapper mapper;
    
    // more queries...
    @Override
    public void updateHealth(String name, String health) {
        mapper.updateHealth(name, health);
    }
    // more queries...
}

This query doens't always freeze but it occurs pretty frequently. When it does freeze I see there is an idle transaction that lingers until either I shutdown the server or the one of the timeouts I've set fires.

Here are my configurations for Mybatis and Jboss:

Mybatis:

<configuration>
    <settings>
        <setting name="autoMappingBehavior" value="PARTIAL"/>
    </settings>
    <!-- Used to configure MyBatis environment to apply your SQL Maps to the database. -->
    <environments default="development">
        <environment id="development">
            <transactionManager type="MANAGED" />
            <dataSource type="JNDI">
                <property name="data_source" value="${jdbc.url}" />
            </dataSource>
        </environment>
    </environments>
    <!-- The mappers configures the location of the xml mappers containing the CRUD functions ex against the database. -->
    <mappers>
        <package name="com/my/mapper/package"/>
    </mappers>
</configuration>

Jboss DataSource:

<subsystem xmlns="urn:jboss:domain:datasources:1.2">
            <datasources>
                <datasource jndi-name="java:jboss/datasources/PostgresDS" pool-name="PostgresDS" enabled="true" use-java-context="true">
                    <connection-url>jdbc:postgresql://localhost:5432/mydb</connection-url>
                    <driver>postgresql</driver>
                    <pool>
                        <min-pool-size>2</min-pool-size>
                        <max-pool-size>20</max-pool-size>
                        <prefill>true</prefill>
                    </pool>
                    <security>
                        <user-name>user</user-name>
                        <password>super_secret_password</password>
                    </security>
                    <validation>
                        <check-valid-connection-sql>SELECT 1</check-valid-connection-sql>
                        <validate-on-match>false</validate-on-match>
                        <background-validation>false</background-validation>
                        <use-fast-fail>false</use-fast-fail>
                    </validation>
                </datasource>

JBoss Transaction:

<subsystem xmlns="urn:jboss:domain:transactions:1.5">
            <core-environment>
                <process-id>
                    <uuid/>
                </process-id>
            </core-environment>
            <recovery-environment socket-binding="txn-recovery-environment" status-socket-binding="txn-status-manager"/>
            <coordinator-environment default-timeout="1800"/>
        </subsystem>

I've enable lock monitoring in postgres and I see logs like this regularly:

2021-10-27 13:07:15 EDT LOG:  process 15612 still waiting for ShareLock on transaction 10150 after 1005.830 ms
2021-10-27 13:07:15 EDT DETAIL:  Process holding the lock: 13404. Wait queue: 15612.
2021-10-27 13:07:15 EDT CONTEXT:  while updating tuple (1,24) in relation "dyn_device"
2021-10-27 13:07:15 EDT STATEMENT:  UPDATE dyn d
            SET fk_health_status_id = 
                (SELECT pk_health_status_id 
                FROM stat_health_status sdhs
                WHERE sdhs.health_status = $1)
                 
            WHERE d.name = $2

What's confusing about this to me is that there isn't a single query in my system that should take longer then a few milliseconds. There are no crazy transactions that make 30 different queries with REST calls in between. The database is full of mostly simple CRUD operations. Yet this log makes is seem like a transaction is taking longer then a second... is the JBOSS transaction subsystem not committing a transaction? If that's the case how do I force it to commit? I've been using queries from Postgres Lock Monitoring to help debug this issue. Every time the system freezes I see my transaction not being granted a lock and another transaction stuck in "idle in transaction" status. The query also returns the queries/transactions that appear to be causing the idle transaction. However, the idle queries being shown appear to be random and have nothing to do with the table that the UPDATE is accessing. I've seen INSERTs to completely unrelated tables as well as SELECT to different tables as well. I also enabled trace logging for the Transaction subsystem. It shows a transaction being committed just after my query that is locking up begins. I can post that if it will help but its alot to sort though.

With that said does anyone have an idea what could be causing this issue? How does a transaction actually get to the "idle in transaction" state? Does anyone have an idea what what else I can do to debug this? Could application synchronization be playing a part? I've tried synchronizing all queries to the table (switched to @Singleton) being locked but that didn't seem to work.


Solution

  • So I discovered what the problem was. The issue really wasn't the database's fault or even the queries that were being used. It turns out that our system was using the same Transaction subsystem for both it our Data Source (Postgres Database) and our JMS messaging system. When a JMS message was sent, it created a transaction and every transactional based action that followed during the life cycle of that tread/transaction would be treated as part of that original transaction. Which includes all of our database calls.....

    This explains why a query as simple as insert into a message log was touching all of our relations in the database. The debug queries only showed me the first query/statement sent to the database, not all of the others that were used during the life cycle of the JMS message. There were several ways to fix this but my team opted for the easiest which was preventing the Data Source from using the JBoss provided Transaction Manager.

    <datasource jndi-name="java:jboss/datasources/PostgresDS" jta="false" pool-name="PostgresDS" enabled="true" use-java-context="true">