I am using google-api-ruby-client
for Streaming Data Into BigQuery. so whenever there is a request. it is pushed into Redis as a queue & then a new Sidekiq worker tries to insert into bigquery. I think its involves opening a new HTTPS connection to bigquery every insert.
the way, I have it setup is: Events post every 1 second or when the batch size reaches 1MB (one megabyte), whichever occurs first. This is per worker, so the Biquery API may receive tens of HTTP posts per second over multiple HTTPS connections.
This is done using the provided API client by Google.
Now the Question -- For Streaming inserts, what is the better approach:-
google-api-ruby-client
I think it's pretty much too early to speak about these optimizations. Also other context is missing like if you exhausted the kernel's TCP connections or not. Or how many connections are in TIME_WAIT state and so on.
Otherwise this would need lots of other context and deep level of understanding how this works in order to optimize something here.
Maximum row size: 1 MB
HTTP request size limit: 10 MB
Maximum rows per second: 100,000 rows per second, per table.
Maximum rows per request: 500
Maximum bytes per second: 100 MB per second, per table
I will try to give also context to better understand the complex situation when ports are exhausted:
Let's say on a machine you have a pool of 30,000 ports and 500 new connections per second (typical):
1 second goes by you now have 29500
10 seconds go by you now have 25000
30 seconds go by you now have 15000
at 59 seconds you get to 500,
at 60 you get back 500 and stay at using 29500 and that keeps rolling at
29500. Everyone is happy.
Now say that you're seeing an average of 550 connections a second. Suddenly there aren't any available ports to use.
So, your first option is to bump up the range of allowed local ports; easy enough, but even if you open it up as much as you can and go from 1025 to 65535, that's still only 64000 ports; with your 60 second TCP_TIMEWAIT_LEN, you can sustain an average of 1000 connections a second. Still no persistent connections are in use.
This port exhaust is better discussed here: http://www.gossamer-threads.com/lists/nanog/users/158655