Tags: amazon-ec2, proxy, apache2, apache-superset, dremio

Apache2 server and Superset: 502 Proxy Error (error reading from remote server) while dashboards are loading


Short introduction

I have Apache Superset and an Apache2 server running on the same EC2 instance. Apache2 acts as a reverse proxy: it accepts HTTPS requests and forwards them to Apache Superset. Apache Superset is run using gunicorn.

Problem

Requests to the Apache Dremio data engine can take some time (< 60 seconds). When I access Superset dashboards through the proxy, using the DNS name with SSL, some dashboard parts (requests) fail with the following error:

Proxy Error
The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request
Reason: Error reading from remote server

Strangely, these errors can appear within a matter of seconds, even though the default value of ProxyTimeout is quite high.

The problem doesn't occur if Superset is accessed directly by IP address.

Error message in apache2/error.log:

(20014) Internal error (specific information not available): [client 10.4.26.3:6969] AH01102: error reading status line from remote server localhost:8088, referer: ...

What I tried to solve the problem

The problem could be with the proxy server timeout or with the Superset web server dropping some connections. My Apache2 config:

<VirtualHost *:443>
  ProxyPreserveHost On
  ProxyRequests Off
  ServerName dash.domain.com
  ServerAlias dash.domain.com

  SSLEngine on
  SSLCertificateFile /etc/ssl/private/cert.crt
  SSLCertificateChainFile /etc/ssl/certs/cert2.crt
  SSLCertificateKeyFile /etc/ssl/private/key.key

  ProxyPass / http://localhost:8088/ connectiontimeout=3600 timeout=3600
  ProxyPassReverse / http://localhost:8088/

  # things tried
  # SetEnv force-proxy-request-1.0 1
  # SetEnv proxy-nokeepalive 1
  # SetEnv proxy-initial-not-pooled 1
  # ProxyTimeout 3600
  # TimeOut 3600
</VirtualHost>
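
To tell the two suspects apart (a proxy timeout versus the backend dropping connections), one option is to bypass Apache2 and send a burst of concurrent requests straight to gunicorn. The script below is only a rough sketch: the `probe` and `fetch` helpers are hypothetical names, and the `/health` path, request count and concurrency are assumptions to be replaced with whatever slow endpoint the dashboards actually hit. If connection errors show up here too, the proxy timeouts are probably not the cause.

```python
import concurrent.futures
import urllib.error
import urllib.request
from collections import Counter

# Assumption: gunicorn listens on localhost:8088, as in the ProxyPass line above.
# The /health path is an assumption; substitute a slow endpoint if you have one.
BACKEND_URL = "http://localhost:8088/health"


def fetch(url, timeout=90):
    """Return the HTTP status code as a string, or the exception name on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return str(response.status)
    except urllib.error.HTTPError as exc:
        # The backend answered, but with an error status (4xx/5xx).
        return str(exc.code)
    except Exception as exc:
        # No usable response at all, e.g. the connection was reset or refused.
        return type(exc).__name__


def probe(url, requests=50, concurrency=10, fetch_fn=fetch):
    """Send `requests` GETs with `concurrency` threads and tally the outcomes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch_fn, [url] * requests))


# Example (run on the EC2 instance itself):
#   print(probe(BACKEND_URL))
```

A tally dominated by exception names rather than status codes would suggest that the backend itself is closing connections under load.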

Things tested (none of them worked):

  1. Timeout and ProxyTimeout
  2. connectiontimeout and timeout (as seen above)
  3. Keepalive=On for ProxyPass
  4. different SetEnv
  5. superset_config.py -> ENABLE_PROXY_FIX, SUPERSET_WEBSERVER_TIMEOUT
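
For reference, item 5 corresponds to something like the following in superset_config.py. This is a minimal sketch; the timeout value is an assumption chosen to sit above the slowest Dremio query, not a recommended setting.

```python
# superset_config.py (sketch; values are assumptions)

# Make Superset honor the X-Forwarded-* headers set by the reverse proxy.
ENABLE_PROXY_FIX = True

# Timeout in seconds that Superset applies to web requests; kept above the
# slowest expected Dremio query (< 60 seconds in this setup).
SUPERSET_WEBSERVER_TIMEOUT = 120
```

As noted in the list, neither setting made the 502 errors go away in this case.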

In addition, a similar proxy setup was built using nginx; the error was similar to the one described here.

Any help or ideas would be appreciated. Thank you very much!

Useful information

Apache Superset version: 0.37.2

Apache Dremio version: 4.1.0

Apache2 server version: 2.4.29

EC2 instance type: t3.medium

OS version: Ubuntu 18.04


Solution

  • The problem was caused by dying gunicorn async workers. The charts were sending too many requests and the workers were not able to handle them. Changing the worker type from async to sync (the gunicorn default) solved the proxy problem.

    I still don't know why direct access by IP address did not produce the 502 proxy error.

    Sorry for not including information about gunicorn in the question.

    P.S. The worker type recommended in the Superset docs is async, but in my case sync workers were the better solution. In theory, sync workers are slower than async ones (in the Superset context).
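
    A gunicorn config file along these lines captures the change. It is only a sketch: the file name, worker count and timeout are assumptions, not the exact values used here.

```python
# gunicorn.conf.py (hypothetical file name; values are assumptions)

# Same address that the Apache2 ProxyPass line points at.
bind = "localhost:8088"

# Sync is gunicorn's default worker class; each worker handles one request at a time.
worker_class = "sync"

# Assumption: a handful of workers so that parallel chart requests can be served.
workers = 4

# Seconds a worker may stay silent before gunicorn kills and restarts it;
# kept above the slowest Dremio query (< 60 seconds).
timeout = 120
```

    It would be passed to gunicorn with the `-c gunicorn.conf.py` option, alongside the Superset application target already in use.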