Tags: cassandra, thrift, phpcassa

TFramedTransport Error on PHPCassa + Cassandra


We're deleting a very large number of records in Cassandra and get the following error. We get the same error when we insert a very large number of records:

    Error performing remove on 10.130.279.40:9160: exception 'TTransportException' with message 'TSocket: timed out reading 4 bytes from 10.130.279.40:9160' in /home/zonefiles/php/thrift/transport/TSocket.php:268
    Stack trace:
    #0 /home/zonefiles/php/thrift/transport/TTransport.php(87): TSocket->read(4)
    #1 /home/zonefiles/php/thrift/transport/TFramedTransport.php(135): TTransport->readAll(4)
    #2 /home/zonefiles/php/thrift/transport/TFramedTransport.php(102): TFramedTransport->readFrame()
    #3 [internal function]: TFramedTransport->read(8192)
    #4 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(691): thrift_protocol_read_binary(Object(TBinaryProtocolAccelerated), 'cassandra_Cassa...', false)
    #5 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(664): CassandraClient->recv_remove()
    #6 [internal function]: CassandraClient->remove('CUSTOMERSERVICE...', Object(cassandra_ColumnPath), 1301555573936295, 1)
    #7 /home/zonefiles/php/connection.php(230): call_user_func_array(Array, Array)
    #8 /home/zonefiles/php/columnfamily.php(582): ConnectionPool->call('remove', 'CUSTOMERSERVICE...', Object(cassandra_ColumnPath), 1301555573936295, 1)
    #9 /home/zonefiles/php/delete.php(34): ColumnFamily->remove('CUSTOMERSERVICE...')
    #10 {main}
    Error connecting to 10.130.279.40:9160: exception 'TTransportException' with message 'TSocket: timed out reading 4 bytes from 10.130.279.40:9160' in /home/zonefiles/php/thrift/transport/TSocket.php:268
    Stack trace:
    #0 /home/zonefiles/php/thrift/transport/TTransport.php(87): TSocket->read(4)
    #1 /home/zonefiles/php/thrift/transport/TFramedTransport.php(135): TTransport->readAll(4)
    #2 /home/zonefiles/php/thrift/transport/TFramedTransport.php(102): TFramedTransport->readFrame()
    #3 [internal function]: TFramedTransport->read(8192)
    #4 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(1015): thrift_protocol_read_binary(Object(TBinaryProtocolAccelerated), 'cassandra_Cassa...', false)
    #5 /home/zonefiles/php/thrift/packages/cassandra/Cassandra.php(992): CassandraClient->recv_describe_version()
    #6 /home/zonefiles/php/connection.php(63): CassandraClient->describe_version()
    #7 /home/zonefiles/php/connection.php(163): ConnectionWrapper->__construct('CDTMain1', '10.130.279.40:9...', NULL, true, 5000, 5000)
    #8 /home/zonefiles/php/connection.php(254): ConnectionPool->make_conn()
    #9 /home/zonefiles/php/connection.php(241): ConnectionPool->handle_conn_failure(Object(ConnectionWrapper), 'remove', Object(TTransportException), 1)
    #10 /home/zonefiles/php/columnfamily.php(582): ConnectionPool->call('remove', 'CUSTOMERSERVICE...', Object(cassandra_ColumnPath), 1301555573936295, 1)
    #11 /home/zonefiles/php/delete.php(34): ColumnFamily->remove('CUSTOMERSERVICE...')
    #12 {main}

Here is the PHP script that triggers the error:

    <?php
    set_time_limit(2000);
    require 'connection.php';
    require 'columnfamily.php';

    $servers = array();
    $servers[0]['host'] = 'private ip';
    $servers[0]['port'] = '9160';
    $conn = new Connection('Server11', $servers);
    $urlFamily = new ColumnFamily($conn, 'Domain'); // ColumnFamily

    $start = microtime(true);

    // Maximum number of rows to scan
    $limit = 100000000;

    // Fetch rows with keys from '' through 'zzzzzzzzzzzzzzz', up to $limit rows
    $rows = $urlFamily->get_range('', 'zzzzzzzzzzzzzzz', $limit);

    $num = 0;      // rows scanned
    $delCount = 0; // rows deleted

    foreach ($rows as $key => $columns) {
        // Remove every row whose key contains ' .net'
        if (strpos($key, ' .net') !== false) {
            //echo 'deleting ' . $key . "\n";
            $urlFamily->remove($key);
            $delCount++;
        }
        $num++;
        if ($num >= $limit) break;
        // Print progress every 100,000 rows
        if ($num % 100000 == 0) echo $num . "\n";
    }

    $end = microtime(true);

    echo $num . " total\n";
    echo $delCount . ' deleted in ' . ($end - $start) . " seconds\n";
    echo $delCount / ($end - $start) . " deleted per second\n";

    ?>

We are running PHP 5.3.5 on Fedora 14 Laughlin and Thrift 0.5.0.

One theory is that this is caused by Cassandra not being able to process the commands fast enough. Do you agree or disagree? Have you seen this before?

If you recommend deleting a different way (e.g. TRUNCATE), how do we still prevent this issue when we do other heavy work with Cassandra, such as bulk inserts?
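If the theory above is right and the nodes simply can't keep up, one low-tech thing to try is pacing the client. The following is a minimal sketch under that assumption, not a phpcassa feature: `throttled_delete()`, `$batchSize` and `$pauseMicros` are hypothetical names, and the `$delete` callable stands in for a real call such as `$urlFamily->remove($key)`. It issues a fixed number of deletes, then sleeps before continuing, so requests are not sent back-to-back for the whole run.

```php
<?php
// Hypothetical sketch: pace a long run of deletes from the client side.
// $keys may be an array or any Traversable; $delete stands in for a real
// call such as $urlFamily->remove($key).
function throttled_delete($keys, $delete, $batchSize, $pauseMicros)
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be at least 1');
    }
    $done = 0;
    foreach ($keys as $key) {
        call_user_func($delete, $key);
        $done++;
        // After every full batch, pause so the cluster gets time to catch up.
        if ($done % $batchSize === 0) {
            usleep($pauseMicros);
        }
    }
    return $done; // number of deletes issued
}
```

The batch size and pause would have to be tuned against what the cluster actually sustains, and the same pacing could wrap bulk inserts. Whether it helps at all depends on the timeouts really being caused by load.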


Solution

  • Are those just log messages, or is an exception actually being raised? phpcassa calls error_log() every time it catches an exception like this, before retrying the operation on a different connection. So keep an eye on the stack traces that get logged, but don't worry too much about them: a logged trace does not by itself mean the operation failed.

    Those are client-side socket timeouts, which means the call took longer than the default timeout of 5 seconds (consistent with the two 5000 millisecond arguments passed to ConnectionWrapper->__construct() in the second trace). Why they happen in the first place depends largely on how Cassandra is behaving under this load, so monitoring the Cassandra nodes is probably the best place to start.
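To make the log-and-retry behavior described above concrete, here is a minimal sketch of the general pattern. It is hypothetical and is not phpcassa's actual `ConnectionPool` code; `call_with_failover()`, `$hosts` and `$maxRetries` are illustrative names. Each failure is logged with `error_log()` and the operation is retried on the next host; the caller only sees an exception once every attempt has failed.

```php
<?php
// Hypothetical sketch of a log-and-retry wrapper; not phpcassa's real code.
// $op is any callable that takes a host string and returns a result,
// or throws an Exception (for example on a socket timeout).
function call_with_failover($op, array $hosts, $maxRetries)
{
    if (count($hosts) === 0) {
        throw new InvalidArgumentException('At least one host is required');
    }
    $maxRetries = max(0, (int) $maxRetries);
    $lastError = null;
    // One initial attempt plus up to $maxRetries retries.
    for ($attempt = 0; $attempt <= $maxRetries; $attempt++) {
        // Move to the next host on each attempt, wrapping around the list.
        $host = $hosts[$attempt % count($hosts)];
        try {
            return call_user_func($op, $host);
        } catch (Exception $e) {
            // Log the failure, much like the traces in the question, then retry.
            error_log("Error performing operation on $host: " . $e->getMessage());
            $lastError = $e;
        }
    }
    // Every attempt failed: only now does the caller see the exception.
    throw $lastError;
}
```

Under this pattern, a logged trace such as the ones in the question only matters if the exception eventually propagates to the calling script (here, delete.php).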