Search code examples
sqlpostgresqltimeoutdatagrip

PostgreSQL timeouts on local server with complex query and fairly large dataset


Background

I have a local server of PostgreSQL on my machine that contains a fair amount of data on car and motorcycle insurance that I need to prepare for statistical analysis. I'm running a fairly complicated query involving several joins on 3 different tables, one of which has ~100m rows and the other two of which have ~30k or so rows. All tables have < 15 columns. Note that this query is actually wrapped in a COPY TO statement, because I want to output the result to a CSV file for later import and analysis in R.

My machine is a 64bit Windows 10 Pro machine with 6 cores and 32gb of RAM. I run my Postgres queries in DataGrip (because I like the GUI better than PGAdmin). I should say at the outset that I'm a RDBMS novice in general. Postgres version is 13.3.

The Problem

This big query in question, which is ~100 lines of code, takes a long time to run, which in a vacuum is fine -- I have the time to wait. I've tested out the query on a small dummy dataset so I know it works. But when I ran it on the "real" data, it ran for two hours exactly and then spat out this error:

[Date] completed in 2 h 0 m 0 s 15 ms
[Date] Error unmarshaling return header; nested exception is:
[Date]  java.net.SocketException: Connection reset

Now, I've looked on SO and in Google for mentions of this error message, and I've found them. But in every case I've seen, users were encountering this in Postgres servers out there on the web (e.g. Amazon Redshift) and not on their local machines, like mine (this is an example).

This is perhaps a bit vague. I'm happy to post an anonymized version of the query for inspection (lord knows it could probably be optimized to run faster), but again, I'm pretty sure the query itself isn't the issue here. (I grant that I may be wrong about this.)

What I'm thinking

My sense from reading other similar posts is that this has to do with some kind of timeout on my server side, but I'm not sure why that would happen if a query is still running (i.e. if the server is still being asked to do stuff).

Any ideas?


Solution

  • After clarification from OP it became clear that DataGrip sets statement_timeout to value that is not enough to complete queries.

    Solution

    Manually turn off statement timeout on top of your script:

    SET statement_timeout = 0
    

    Notes

    I don't DataGrip at hand but I suppose that this session parameter should be configurable for database connections.

    Check documentation/forums.

    Maybe that will work: https://intellij-support.jetbrains.com/hc/en-us/community/posts/360010315600-Setting-a-session-variable-for-a-Postgres-connection-