Search code examples
postgresqlamazon-ec2thingsboard

Thingsboard AWS server freezes


I want to run a self managed Thingsboard on AWS (t2.micro).

I have installed Thingsboard CE on a t2.micro AWS instance running Ubuntu 20.04 sever.
I followed the aws setup and Ubuntu install guides(postgresql + built in queue service).

I also set up haproxy using this guide.

I was able to successfully log in to my Thingsboard. I only changed the passwords and checked the basic functionalities, but didn't create any new dashboards or made any modifications.

After this I left the computer on, running Thingsboard. Next day I could not reach Thingsboard and although the AWS instance was running I could not ssh into it anymore. After stopping and starting(reboot didn't work) the instance everything was ok (could ssh and Thingsboard was reachable).

I can reproduce this failure just by leaving the instance on, it seems that after serveral hours (5-8 hrs) Thingsboard(or something else not sure) fails which freezes the whole computer.

I have checked two things:

  1. I checked CPU utiization on AWS monitoring. It seems that after some hours there is a big jump in CPU load and then it drops back to almost zero. While Thingsboard is running, it is constant.See printscreen from AWS monitoring

  2. I checked the Thingsboard logs (in /var/log/thingsboard): There are some errors, but unfortunately most of the things are not enough for me guess what could be a problem with the fresh installation. Here are some lines from the log:

2021-11-12 00:21:59,626 [http-nio-0.0.0.0-8080-exec-13] INFO  o.a.coyote.http11.Http11Processor - Error parsing HTTP request header
     Note: further occurrences of HTTP request parsing errors will be logged at DEBUG level.
    java.lang.IllegalArgumentException: Invalid character found in method name

[0x160x030x010x00{0x010x000x00w0x030x030x170xb80xb80xe50xef0x000xb50x0a&0x930x020x00:0xde0xd70xa00xab0xb 70x8bU0xc00x92r0x9330x10O0x8c<o0xf70xf90x000x000x1a0xc0/0xc0+0xc00x110xc00x070xc00x130xc00x090xc00x140xc00x0a0x000x050x00/0x0050xc00x120x000x0a0x010x000x0040x000x050x000x050x010x000 x000x000x000x000x0a0x000x080x000x060x000x170x000x180x000x190x000x0b0x000x020x010x000x000x0d0x000x100x000x0e0x040x010x040x030x020x010x020x030x040x010x050x010x060x010xff0x010x000x010x00...].

  HTTP method names must be tokens
            at org.apache.coyote.http11.Http11InputBuffer.parseRequestLine(Http11InputBuffer.java:417)
            at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:261)
            at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
            at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:893)
            at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1707)
            at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
            at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
            at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
            at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
            at java.base/java.lang.Thread.run(Thread.java:829)
2021-11-12 00:22:01,486 [sql-queue-2-ts-4-thread-1] WARN  com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate

connection org.postgresql.jdbc.PgConnection@4393afd0 (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2021-11-12 00:22:01,487 [sql-queue-2-ts latest-8-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@75b9496b (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2021-11-12 00:22:01,487 [sql-queue-0-ts latest-6-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@31849eec (This connection has been closed.). Possibly consider using a shorter maxLifetime value. 2021-11-12 00:22:01,487 [sql-queue-0-ts-2-thread-1] WARN com.zaxxer.hikari.pool.PoolBase - HikariPool-1 - Failed to validate connection org.postgresql.jdbc.PgConnection@725fafe3 (This connection has been closed.). Possibly consider using a shorter maxLifetime value.

Some more:

2021-11-12 00:23:46,205 [sql-log-1-thread-1] INFO  o.t.s.dao.sql.TbSqlBlockingQueue - Queue-2 [TS Latest] queueSize [9] totalAdded [0] totalSaved [0] totalFailed [0]
2021-11-12 00:23:47,741 [sql-queue-0-ts-2-thread-1] WARN  o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-2-ts-4-thread-1] WARN  o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-2-ts latest-8-thread-1] WARN  o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:47,742 [sql-queue-0-ts latest-6-thread-1] WARN  o.h.e.jdbc.spi.SqlExceptionHelper - SQL Error: 0, SQLState: 08003
2021-11-12 00:23:48,022 [sql-queue-0-ts-2-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 634223ms.
2021-11-12 00:23:48,058 [sql-queue-0-ts-2-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,022 [sql-queue-0-ts latest-6-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 634223ms.
2021-11-12 00:23:48,059 [sql-queue-0-ts latest-6-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,022 [sql-queue-2-ts latest-8-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 624177ms.
2021-11-12 00:23:48,059 [sql-queue-2-ts latest-8-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.
2021-11-12 00:23:48,023 [sql-queue-2-ts-4-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - HikariPool-1 - Connection is not available, request timed out after 627819ms.
2021-11-12 00:23:48,059 [sql-queue-2-ts-4-thread-1] ERROR o.h.e.jdbc.spi.SqlExceptionHelper - This connection has been closed.

At the last:

2021-11-12 00:33:10,919 [sql-queue-0-ts latest-6-thread-1] ERROR o.t.s.dao.sql.TbSqlBlockingQueue - [TS Latest] Failed to save 1 entities
org.springframework.transaction.CannotCreateTransactionException: Could not open JPA EntityManager for transaction; nested exception is org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
        at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:448)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.startTransaction(AbstractPlatformTransactionManager.java:400)
        at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:373)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:574)
        at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:361)
        at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:118)
        at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
        at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:750)
        at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:692)
        at org.thingsboard.server.dao.sqlts.insert.latest.psql.PsqlLatestInsertTsRepository$$EnhancerBySpringCGLIB$$381b448c.saveOrUpdate(<generated>)
        at org.thingsboard.server.dao.sqlts.SqlTimeseriesLatestDao.lambda$init$3(SqlTimeseriesLatestDao.java:133)
        at org.thingsboard.server.dao.sql.TbSqlBlockingQueue.lambda$init$2(TbSqlBlockingQueue.java:71)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.hibernate.exception.JDBCConnectionException: Unable to acquire JDBC Connection
        at org.hibernate.exception.internal.SQLExceptionTypeDelegate.convert(SQLExceptionTypeDelegate.java:48)
        at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:42)
        at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:113)
        at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:99)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:111)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getPhysicalConnection(LogicalConnectionManagedImpl.java:138)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.getConnectionForTransactionManagement(LogicalConnectionManagedImpl.java:276)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.begin(LogicalConnectionManagedImpl.java:284)
        at org.hibernate.resource.transaction.backend.jdbc.internal.JdbcResourceLocalTransactionCoordinatorImpl$TransactionDriverControlImpl.begin(JdbcResourceLocalTransactionCoordinatorImpl.java:246)
        at org.hibernate.engine.transaction.internal.TransactionImpl.begin(TransactionImpl.java:83)
        at org.springframework.orm.jpa.vendor.HibernateJpaDialect.beginTransaction(HibernateJpaDialect.java:184)
        at org.springframework.orm.jpa.JpaTransactionManager.doBegin(JpaTransactionManager.java:402)
        ... 16 common frames omitted
Caused by: java.sql.SQLTransientConnectionException: HikariPool-1 - Connection is not available, request timed out after 634223ms.
        at com.zaxxer.hikari.pool.HikariPool.createTimeoutException(HikariPool.java:695)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:197)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:162)
        at com.zaxxer.hikari.HikariDataSource.getConnection(HikariDataSource.java:128)
        at org.hibernate.engine.jdbc.connections.internal.DatasourceConnectionProviderImpl.getConnection(DatasourceConnectionProviderImpl.java:122)
        at org.hibernate.internal.NonContextualJdbcConnectionAccess.obtainConnection(NonContextualJdbcConnectionAccess.java:38)
        at org.hibernate.resource.jdbc.internal.LogicalConnectionManagedImpl.acquireConnectionIfNeeded(LogicalConnectionManagedImpl.java:108)
        ... 23 common frames omitted
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
        at org.postgresql.jdbc.PgConnection.checkClosed(PgConnection.java:877)
        at org.postgresql.jdbc.PgConnection.setNetworkTimeout(PgConnection.java:1610)
        at com.zaxxer.hikari.pool.PoolBase.setNetworkTimeout(PoolBase.java:560)
        at com.zaxxer.hikari.pool.PoolBase.isConnectionAlive(PoolBase.java:173)
        at com.zaxxer.hikari.pool.HikariPool.getConnection(HikariPool.java:186)
        ... 28 common frames omitted

What is interesting that the timestaps on the CPU load going to max don't precisely correlate with the error messages in the log. I apologise for the long error messages, but right now I don't know what could be the root cause.

I haven't tried to reinstall the whole computer yet.

My question would be, how should I proceed? Does anyone have ever faced similar issues? What logs/services/etc. should I check to grasp the root cause?

Should I try using a machine with more resources? Should I try other database and queue service?

In the current form this Thingsboard instance is not stable even for tests.

Edit: Sorry I could not format properly the first part of the error code. Edit2: First link was wrong.


Solution

  • After I increased the RAM to 4GB (from 1GB) the ThingsBoard server is up without problem. There are no sporadic freezes anyomore. As there were no other provable suggestions for the problem and now my system works without a problem I consider the question answered.