I am currently connecting my EC2 server to RDS via the following:
    self.conn = MySQLdb.connect(
        host=settings.DATABASES['default']['HOST'],
        port=3306,
        user=settings.DATABASES['default']['USER'],
        passwd=settings.DATABASES['default']['PASSWORD'],
        db=settings.DATABASES['default']['NAME'])
This connects via TCP and is much, much slower for me than connecting to MySQL on my own machine through a socket. How would I connect an EC2 instance to an RDS database via a socket connection, so that it is much faster than TCP/IP for long-running scripts? (The difference for me is that an update script takes 10 hours instead of one.)
Short answer: You can't.
Aside: all connections to MySQL on a Linux server use "sockets," of course, whether they are Internet (TCP) sockets or IPC/Unix domain sockets. But in this question, as in common MySQL parlance, "socket" refers to an IPC socket connection, using a special file such as /tmp/mysql.sock, though the specific path to the socket file varies by Linux distribution.
A Unix domain socket or IPC socket (inter-process communication socket) is a data communications endpoint for exchanging data between processes executing within the same host operating system.
So, you can't use the MySQL "socket" connection mechanism, because the RDS server is not on the same machine. The same holds true, of course, any time the MySQL server is on a different machine.
On a local machine, the performance difference between an IPC socket connection and a TCP socket connection (from/to the same machine) is negligible. There is no disagreement that TCP connections have more overhead than IPC, simply because of the TCP/IP wrapper and checksums, the three-way handshake, and so on, but these are tiny fractions of a millisecond of difference that will be entirely lost on the casual observer.
To conclude that TCP connections are "slower" than IPC connections, and particularly by a factor of 10, is not correct. The quotes around "slower" reflect my conclusion that you have not yet defined "slower" with sufficient precision: Slow to connect? Slow to transfer large amounts of data (bandwidth/throughput issue)? Slower to return from each query?
Take note of the Fallacies of Distributed Computing, particularly this one:
Latency is zero.
I suspect your primary performance issue is going to be found in the fact that your code is not optimal for non-zero latency. The latency between systems in EC2 (including RDS) within a region should be under 1 millisecond, but that's still many hundreds of times the round-trip latency on a local machine (which is not technically zero but could easily be just a handful of microseconds).
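To see how a sub-millisecond round trip can still add hours, here is a rough back-of-the-envelope sketch. The statement count, the two latency figures, and the batch size are hypothetical numbers chosen for illustration, not measurements of your workload, and the helper `latency_overhead_seconds` is a name made up for this example. It assumes one network round trip per batch of statements and ignores server-side execution time.

```python
import math


def latency_overhead_seconds(n_statements, rtt_seconds, batch_size=1):
    """Time spent purely waiting on network round trips.

    Assumes one round trip per batch of statements and ignores
    server-side execution time, which is the same in every case.
    """
    round_trips = math.ceil(n_statements / batch_size)
    return round_trips * rtt_seconds


# Hypothetical workload: 30 million single-row statements.
N = 30_000_000

# Assumed round-trip times: ~10 microseconds locally, ~1 ms from EC2 to RDS.
local = latency_overhead_seconds(N, 0.00001)
remote = latency_overhead_seconds(N, 0.001)
# Same remote latency, but sending 1000 rows per round trip.
batched = latency_overhead_seconds(N, 0.001, batch_size=1000)

print(f"local:   {local / 3600:.2f} h")
print(f"remote:  {remote / 3600:.2f} h")
print(f"batched: {batched / 3600:.4f} h")
```

Under these assumed numbers the waiting time grows from minutes to hours purely because of the per-statement round trip, and shrinks again once many rows share one round trip. If your script issues one statement per row, batching (for example with multi-row statements or `cursor.executemany`) may be worth testing.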
Testing your code locally, using a TCP connection (using the host 127.0.0.1 and port 3306) instead of the IPC socket should illustrate whether there's really a significant difference or whether the problem is somewhere else... possibly inefficient use of the connections, or unnecessarily repeated disconnect/reconnect, though it's difficult to speculate further without a clearer understanding of what you mean by "slow."
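A minimal sketch of that local test, assuming a MySQL server on your own machine that listens on both 127.0.0.1:3306 and an IPC socket file. The helper names `mean_round_trip_seconds` and `compare_local_transports` are made up for this example, and the default socket path is only a guess that you should replace with the path your distribution uses. It times a trivial `SELECT 1` so that the result reflects per-query overhead rather than query cost.

```python
import time


def mean_round_trip_seconds(conn, n=1000):
    """Average wall-clock time of a trivial query over an open connection."""
    cursor = conn.cursor()
    start = time.perf_counter()
    for _ in range(n):
        cursor.execute("SELECT 1")
        cursor.fetchall()
    elapsed = time.perf_counter() - start
    cursor.close()
    return elapsed / n


def compare_local_transports(user, passwd, db, socket_path="/tmp/mysql.sock"):
    """Time the same query over TCP and over the IPC socket on one machine."""
    import MySQLdb  # imported here so the timing helper works without it

    # host="127.0.0.1" forces TCP; unix_socket selects the IPC socket file.
    tcp = MySQLdb.connect(host="127.0.0.1", port=3306,
                          user=user, passwd=passwd, db=db)
    ipc = MySQLdb.connect(unix_socket=socket_path,
                          user=user, passwd=passwd, db=db)
    try:
        return {
            "tcp": mean_round_trip_seconds(tcp),
            "ipc": mean_round_trip_seconds(ipc),
        }
    finally:
        tcp.close()
        ipc.close()
```

Calling `compare_local_transports(user, passwd, db)` with your own credentials returns the two averages in seconds. Running `mean_round_trip_seconds` against the RDS connection from the EC2 instance gives the third number to compare, which should show whether per-query latency accounts for the slowdown.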