Search code examples
phpmembase

Membase started working very slowly


Few days ago Membase started working very slow. We are using local moxi proxy on each server(we have 4 servers). And we have Membase cluster with 2 servers(10GBx2) with replication. Now we have about 15M records at cluster(30M with replicas).Load on cluster about 2000-6000ops and 100% memory resident.

Membase: 1.7.1 System: Linux CentOS 2.6.18-238.19.1.el5 #1 SMP Fri Jul 15 07:31:24 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux

Here is timings from moxi proxy:

stats proxy timings
STAT 11311:default:connect 100+100=23 37.70% ******************
STAT 11311:default:connect 200+100=30 86.89% ************************
STAT 11311:default:connect 300+100=7  98.36% *****
STAT 11311:default:connect 400+100=1 100.00% 
STAT 11311:default:connect 500+100=0 100.00% 
STAT 11311:default:reserved     100+100    =6848    1.64% *
STAT 11311:default:reserved     200+100    =41143  11.49% *******
STAT 11311:default:reserved     300+100    =102670 36.08% *******************
STAT 11311:default:reserved     400+100    =127986 66.73% ************************
STAT 11311:default:reserved     500+100    =53582  79.56% **********
STAT 11311:default:reserved     600+100    =31544  87.11% *****
STAT 11311:default:reserved     700+100    =13188  90.27% **
STAT 11311:default:reserved     800+100    =3630   91.14% 
STAT 11311:default:reserved     900+100    =1629   91.53% 
STAT 11311:default:reserved    1000+100    =1253   91.83% 
STAT 11311:default:reserved    1100+100    =1135   92.10% 
STAT 11311:default:reserved    1200+100    =967    92.33% 
STAT 11311:default:reserved    1300+100    =553    92.46% 
STAT 11311:default:reserved    1400+100    =354    92.55% 
STAT 11311:default:reserved    1500+100    =265    92.61% 
STAT 11311:default:reserved    1600+100    =226    92.67% 
STAT 11311:default:reserved    1700+100    =252    92.73% 
STAT 11311:default:reserved    1800+100    =197    92.77% 
STAT 11311:default:reserved    1900+100    =132    92.81% 
STAT 11311:default:reserved    2000+100    =73     92.82% 
STAT 11311:default:reserved    2100+200    =242    92.88% 
STAT 11311:default:reserved    2300+400    =307    92.96% 
STAT 11311:default:reserved    2700+800    =256    93.02% 
STAT 11311:default:reserved    3500+1600   =244    93.08% 
STAT 11311:default:reserved    5100+3200   =241    93.13% 
STAT 11311:default:reserved    8300+6400   =90     93.15% 
STAT 11311:default:reserved   14700+12800  =13     93.16% 
STAT 11311:default:reserved   27500+25600  =256    93.22% 
STAT 11311:default:reserved   53100+51200  =5      93.22% 
STAT 11311:default:reserved  104300+102400 =17107  97.32% ***
STAT 11311:default:reserved  206700+204800 =8553   99.36% *
STAT 11311:default:reserved  411500+409600 =2358   99.93% 
STAT 11311:default:reserved  821100+819200 =255    99.99% 
STAT 11311:default:reserved 1640300+1638400=37    100.00% 
STAT 11311:default:reserved 3278700+3276800=3     100.00% 
STAT 11311:default:reserved 6555500+6553600=0     100.00% 
END

Membase stats:

STAT accepting_conns 1
STAT auth_cmds 174777
STAT auth_errors 0
STAT bucket_active_conns 1
STAT bucket_conns 304
STAT bytes_read 6765632401898
STAT bytes_written 5857686469455
STAT cas_badval 827124
STAT cas_hits 116425346
STAT cas_misses 97243268
STAT cmd_flush 0
STAT cmd_get 10962483669
STAT cmd_set 1406835206
STAT connection_structures 684
STAT conn_yields 2180210
STAT curr_connections 313
STAT curr_items 7636008
STAT curr_items_tot 15386636
STAT daemon_connections 10
STAT decr_hits 0
STAT decr_misses 0
STAT delete_hits 311792
STAT delete_misses 9
STAT ep_bg_fetched 0
STAT ep_commit_num 3320450
STAT ep_commit_time 0
STAT ep_commit_time_total 9492892
STAT ep_data_age 84
STAT ep_data_age_highwat 1398
STAT ep_db_cleaner_status complete
STAT ep_dbinit 2
STAT ep_dbname /data/membase/data/default-data/default
STAT ep_dbshards 4
STAT ep_db_strategy multiMTVBDB
STAT ep_diskqueue_drain 1647838030
STAT ep_diskqueue_fill 1576916461
STAT ep_diskqueue_items 9919
STAT ep_diskqueue_memory 872872
STAT ep_diskqueue_pending 13765304
STAT ep_expired 244175343
STAT ep_flush_all false
STAT ep_flush_duration 132
STAT ep_flush_duration_highwat 768
STAT ep_flush_duration_total 9517040
STAT ep_flusher_state running
STAT ep_flusher_todo 1457
STAT ep_flush_preempts 0
STAT ep_io_num_read 3570394
STAT ep_io_num_write 1430459297
STAT ep_io_read_bytes 647400207
STAT ep_io_write_bytes 3645986635220
STAT ep_item_begin_failed 0
STAT ep_item_commit_failed 0
STAT ep_item_flush_expired 195564028
STAT ep_item_flush_failed 0
STAT ep_items_rm_from_checkpoints 689034281
STAT ep_kv_size 5046455536
STAT ep_latency_arith_cmd 33757761
STAT ep_latency_get_cmd 18446744071823123379
STAT ep_latency_store_cmd 1655074486
STAT ep_max_data_size 10485760000
STAT ep_max_txn_size 1000
STAT ep_mem_high_wat 7864320000
STAT ep_mem_low_wat 6291456000
STAT ep_min_data_age 0
STAT ep_num_active_non_resident 0
STAT ep_num_checkpoint_remover_runs 2386690
STAT ep_num_eject_failures 0
STAT ep_num_eject_replicas 0
STAT ep_num_expiry_pager_runs 3316
STAT ep_num_non_resident 0
STAT ep_num_not_my_vbuckets 42220981
STAT ep_num_pager_runs 0
STAT ep_num_value_ejects 0
STAT ep_onlineupdate false
STAT ep_onlineupdate_revert_add 0
STAT ep_onlineupdate_revert_delete 0
STAT ep_onlineupdate_revert_update 0
STAT ep_oom_errors 0
STAT ep_overhead 123317215
STAT ep_pending_ops 0
STAT ep_pending_ops_max 0
STAT ep_pending_ops_max_duration 0
STAT ep_pending_ops_total 0
STAT ep_queue_age_cap 900
STAT ep_queue_size 8521
STAT ep_storage_age 84
STAT ep_storage_age_highwat 29808
STAT ep_storage_type featured
STAT ep_store_max_concurrency 10
STAT ep_store_max_readers 9
STAT ep_store_max_readwrite 1
STAT ep_tap_bg_fetched 0
STAT ep_tap_bg_fetch_requeued 0
STAT ep_tap_keepalive 300
STAT ep_tmp_oom_errors 0
STAT ep_too_old 1553154
STAT ep_too_young 0
STAT ep_total_cache_size 78231164348
STAT ep_total_del_items 196040440
STAT ep_total_enqueued 1576916461
STAT ep_total_new_items 208010695
STAT ep_total_persisted 1626499737
STAT ep_uncommitted_items 998
STAT ep_value_size 3107006691
STAT ep_vb_total 1024
STAT ep_vbucket_del 45
STAT ep_vbucket_del_avg_walltime 3567
STAT ep_vbucket_del_fail 0
STAT ep_vbucket_del_max_walltime 22153
STAT ep_vbucket_del_total_walltime 160519
STAT ep_version 1.7.1_2_g4648391
STAT ep_warmed_up 3569370
STAT ep_warmup_dups 0
STAT ep_warmup_oom 0
STAT ep_warmup_thread complete
STAT ep_warmup_time 1675261980
STAT ep_warmup true
STAT get_hits 7075301263
STAT get_misses 3887182406
STAT incr_hits 33740914
STAT incr_misses 14178
STAT libevent 2.0.11-stable
STAT limit_maxbytes 67108864
STAT listen_disabled_num 0
STAT mem_used 5169772751
STAT pid 4225
STAT pointer_size 64
STAT rejected_conns 0
STAT rusage_system 274452.680850
STAT rusage_user 527816.140750
STAT tap_checkpoint_end_received 10177375
STAT tap_checkpoint_end_sent 10176851
STAT tap_checkpoint_start_received 10177881
STAT tap_checkpoint_start_sent 10177361
STAT tap_connect_received 11827
STAT tap_delete_received 101873556
STAT tap_delete_sent 101820996
STAT tap_mutation_received 1182343906
STAT tap_mutation_sent 1194330191
STAT tap_opaque_received 35863
STAT tap_opaque_sent 34956
STAT threads 4
STAT time 1331297508
STAT total_connections 174956
STAT uptime 11938092
STAT vb_active_curr_items 7636008
STAT vb_active_eject 0
STAT vb_active_ht_memory 50561024
STAT vb_active_itm_memory 2194686368
STAT vb_active_num 512
STAT vb_active_num_non_resident 0
STAT vb_active_ops_create 104054142
STAT vb_active_ops_delete 98202339
STAT vb_active_ops_reject 0
STAT vb_active_ops_update 625882155
STAT vb_active_perc_mem_resident 100
STAT vb_active_queue_age 915152000
STAT vb_active_queue_drain 839960787
STAT vb_active_queue_fill 839965441
STAT vb_active_queue_memory 409552
STAT vb_active_queue_pending 5710163
STAT vb_active_queue_size 4654
STAT vb_dead_num 0
STAT vb_pending_curr_items 0
STAT vb_pending_eject 0
STAT vb_pending_ht_memory 0
STAT vb_pending_itm_memory 0
STAT vb_pending_num 0
STAT vb_pending_num_non_resident 0
STAT vb_pending_ops_create 0
STAT vb_pending_ops_delete 0
STAT vb_pending_ops_reject 0
STAT vb_pending_ops_update 0
STAT vb_pending_perc_mem_resident 0
STAT vb_pending_queue_age 0
STAT vb_pending_queue_drain 0
STAT vb_pending_queue_fill 0
STAT vb_pending_queue_memory 0
STAT vb_pending_queue_pending 0
STAT vb_pending_queue_size 0
STAT vb_replica_curr_items 7750628
STAT vb_replica_eject 0
STAT vb_replica_ht_memory 50561024
STAT vb_replica_itm_memory 2212229588
STAT vb_replica_num 512
STAT vb_replica_num_non_resident 0
STAT vb_replica_ops_create 103956553
STAT vb_replica_ops_delete 97838101
STAT vb_replica_ops_reject 0
STAT vb_replica_ops_update 596566447
STAT vb_replica_perc_mem_resident 100
STAT vb_replica_queue_age 1028068000
STAT vb_replica_queue_drain 807877243
STAT vb_replica_queue_fill 736951020
STAT vb_replica_queue_memory 463320
STAT vb_replica_queue_pending 8055141
STAT vb_replica_queue_size 5265
STAT version 1.4.4_461_gf99c147

Membase stats(Couchbase because we upgraded to 1.8.0)

arith_cmd (974447 total)
    1us - 2us     : (  0.00%)      1 
    2us - 4us     : (  0.01%)     53 
    4us - 8us     : (  0.03%)    201 
    8us - 16us    : (  1.03%)   9773 #
    16us - 32us   : ( 92.06%) 887081 #####################################################################################################################################################################################
    32us - 64us   : ( 97.53%)  53314 ##########
    64us - 128us  : ( 99.82%)  22276 ####
    128us - 256us : (100.00%)   1723 
    256us - 512us : (100.00%)     11 
    512us - 1ms   : (100.00%)      9 
    1ms - 2ms     : (100.00%)      5 
 data_age (20636834 total)
    0 - 1s        : (  0.87%)  179419 #
    1s - 2s       : (  6.83%) 1229526 ###########
    2s - 4s       : ( 16.33%) 1962033 ##################
    4s - 7s       : ( 26.81%) 2161109 ####################
    7s - 10s      : ( 33.92%) 1468459 ##############
    10s - 16s     : ( 43.93%) 2064852 ###################
    16s - 23s     : ( 52.64%) 1798684 #################
    23s - 34s     : ( 63.39%) 2218079 #####################
    34s - 49s     : ( 73.93%) 2174966 ####################
    49s - 1m      : ( 84.78%) 2237841 #####################
    1m - 1m       : ( 93.53%) 1807563 #################
    1m - 2m       : ( 97.64%)  848159 ########
    2m - 3m       : ( 98.96%)  271698 ##
    3m - 4m       : ( 99.67%)  145530 #
    4m - 6m       : ( 99.91%)   50132 
    6m - 9m       : ( 99.95%)    8945 
    9m - 12m      : (100.00%)    8831 
    12m - 17m     : (100.00%)    1008 
 disk_commit (29439 total)
    0 - 1s        : ( 27.51%) 8098 #######################################################
    1s - 2s       : ( 28.23%)  214 #
    2s - 4s       : ( 43.74%) 4565 ###############################
    4s - 7s       : ( 62.51%) 5524 #####################################
    7s - 10s      : ( 83.00%) 6032 #########################################
    10s - 16s     : ( 96.00%) 3827 ##########################
    16s - 23s     : ( 99.78%) 1115 #######
    23s - 34s     : (100.00%)   63 
    34s - 49s     : (100.00%)    1 
 disk_del (3864702 total)
    1us - 2us     : (  0.00%)       2 
    2us - 4us     : ( 23.32%)  901064 ##############################################
    4us - 8us     : ( 68.06%) 1729127 ########################################################################################
    8us - 16us    : ( 95.18%) 1048267 #####################################################
    16us - 32us   : ( 98.97%)  146376 #######
    32us - 64us   : ( 99.34%)   14471 
    64us - 128us  : ( 99.98%)   24694 #
    128us - 256us : (100.00%)     615 
    256us - 512us : (100.00%)      34 
    512us - 1ms   : (100.00%)      27 
    1ms - 2ms     : (100.00%)      24 
    2ms - 4ms     : (100.00%)       1 
 disk_insert (3720767 total)
    2us - 4us     : ( 16.68%)  620801 #################################
    4us - 8us     : ( 46.24%) 1099803 ##########################################################
    8us - 16us    : ( 94.96%) 1812778 ################################################################################################
    16us - 32us   : ( 98.76%)  141193 #######
    32us - 64us   : ( 99.19%)   15989 
    64us - 128us  : ( 99.98%)   29502 #
    128us - 256us : (100.00%)     619 
    256us - 512us : (100.00%)      34 
    512us - 1ms   : (100.00%)      31 
    1ms - 2ms     : (100.00%)      16 
    2ms - 4ms     : (100.00%)       1 
 disk_invalid_item_del (1 total)
    1us - 2us     : (100.00%) 1 ############################################################################################################################################################################################################
 disk_invalid_vbtable_del (294 total)
    512us - 1ms   : ( 81.29%) 239 ####################################################################################################################################################################
    1ms - 2ms     : (100.00%)  55 #####################################
 disk_update (16891367 total)
    2us - 4us     : (  0.00%)     617 
    4us - 8us     : ( 35.44%) 5985479 ######################################################################
    8us - 16us    : ( 88.98%) 9043092 ##########################################################################################################
    16us - 32us   : ( 95.49%) 1099896 ############
    32us - 64us   : ( 97.40%)  323792 ###
    64us - 128us  : ( 99.49%)  351597 ####
    128us - 256us : ( 99.96%)   79771 
    256us - 512us : (100.00%)    6809 
    512us - 1ms   : (100.00%)     205 
    1ms - 2ms     : (100.00%)     108 
    2ms - 4ms     : (100.00%)       1 
 get_cmd (271990315 total)
    0 - 1us       : (  0.00%)      3098 
    1us - 2us     : (  0.42%)   1134954 
    2us - 4us     : ( 26.85%)  71886528 ###################################################
    4us - 8us     : ( 96.55%) 189592751 ########################################################################################################################################
    8us - 16us    : ( 99.21%)   7231257 #####
    16us - 32us   : ( 99.54%)    885000 
    32us - 64us   : ( 99.56%)     67815 
    64us - 128us  : (100.00%)   1177254 
    128us - 256us : (100.00%)      8664 
    256us - 512us : (100.00%)      1087 
    512us - 1ms   : (100.00%)      1486 
    1ms - 2ms     : (100.00%)       402 
    2ms - 4ms     : (100.00%)        15 
    4ms - 8ms     : (100.00%)         2 
    8ms - 16ms    : (100.00%)         2 
 set_vb_cmd (7680 total)
    0 - 1us       : (  0.08%)    6 
    1us - 2us     : ( 11.60%)  885 #######################
    2us - 4us     : ( 90.59%) 6066 ##############################################################################################################################################################
    4us - 8us     : ( 98.80%)  631 ################
    8us - 16us    : ( 99.67%)   67 #
    16us - 32us   : ( 99.82%)   11 
    64us - 128us  : ( 99.96%)   11 
    128us - 256us : ( 99.99%)    2 
    512us - 1ms   : (100.00%)    1 
 storage_age (20636834 total)
    0 - 1s        : (  0.39%)   79479 
    1s - 2s       : (  1.73%)  278232 ##
    2s - 4s       : (  2.91%)  242634 ##
    4s - 7s       : (  4.18%)  263057 ##
    7s - 10s      : (  5.25%)  219060 ##
    10s - 16s     : (  7.25%)  412722 ###
    16s - 23s     : (  9.97%)  562678 #####
    23s - 34s     : ( 15.01%) 1039406 #########
    34s - 49s     : ( 22.71%) 1589602 ###############
    49s - 1m      : ( 32.58%) 2036916 ###################
    1m - 1m       : ( 43.13%) 2176751 ####################
    1m - 2m       : ( 51.79%) 1786380 #################
    2m - 3m       : ( 59.80%) 1653388 ###############
    3m - 4m       : ( 69.64%) 2031245 ###################
    4m - 6m       : ( 81.27%) 2399569 #######################
    6m - 9m       : ( 94.47%) 2724318 ##########################
    9m - 12m      : ( 99.52%) 1042119 #########
    12m - 17m     : ( 99.73%)   44332 
    17m - 24m     : ( 99.90%)   34353 
    24m - 34m     : (100.00%)   20552 
    34m - 48m     : (100.00%)      41 
 store_cmd (32950985 total)
    1us - 2us     : (  0.00%)      296 
    2us - 4us     : (  0.25%)    83421 
    4us - 8us     : (  2.63%)   782380 ####
    8us - 16us    : ( 91.01%) 29122181 ##############################################################################################################################################################################
    16us - 32us   : ( 98.48%)  2461695 ##############
    32us - 64us   : ( 98.77%)    95381 
    64us - 128us  : ( 99.96%)   394021 ##
    128us - 256us : (100.00%)    10725 
    256us - 512us : (100.00%)      538 
    512us - 1ms   : (100.00%)      260 
    1ms - 2ms     : (100.00%)       77 
    2ms - 4ms     : (100.00%)        9 
    8ms - 16ms    : (100.00%)        1 
 tap_mutation (15722391 total)
    1us - 2us     : (  0.00%)       4 
    2us - 4us     : (  0.18%)   28274 
    4us - 8us     : ( 34.16%) 5342711 ###################################################################
    8us - 16us    : ( 92.32%) 9144157 ###################################################################################################################
    16us - 32us   : ( 98.76%) 1012068 ############
    32us - 64us   : ( 99.04%)   44704 
    64us - 128us  : ( 99.98%)  147357 #
    128us - 256us : (100.00%)    2927 
    256us - 512us : (100.00%)      93 
    512us - 1ms   : (100.00%)      73 
    1ms - 2ms     : (100.00%)      20 
    2ms - 4ms     : (100.00%)       3 

Solution

  • Problem found.

    Problem was in Coushbase/Membase cluster list. On our cluster we have two network interfaces local(1Tbit) and global(100Mbit). All backend servers have local Moxi proxy that used to connect to Couchbase. In Moxi cluster config Couchbase servers have local IPs, but when moxi connect to Couchbase a get server-list Couchbase return global IPs and because global network interfaces heavy loaded we get issue with respons time.

    So there is few variants of fix:

    I think Couchbase need to separate connections by interface, or even better to make this behavior configurable. Also this behavior should be added to docs, because it not obvious.