I have two CSV files, one containing 500k+ customer records. I am attempting to convert each row to a customer object and POST it to an API that I am also responsible for.
This approach has the obvious problem of firing off 500k+ HTTP calls and hitting the server's maximum number of HTTP connections.
I have had two suggestions thrown at me: opening a WebSocket, or using Spring Batch. Is this a good use case for opening a WebSocket and sending messages rather than opening multiple HTTP connections? Or is it better to go the more traditional route of using Spring Batch?
Since it appears to be your own server, you should just make a server route that accepts multiple records at a time; then you can batch things into far fewer API calls.
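For illustration, here's a minimal Spring sketch of what such a batch route could look like. The `/api/customers/batch` path, the `Customer` type, and the `CustomerService` are placeholders for whatever already exists in your API, not anything from your code:

```java
import java.util.List;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class CustomerBatchController {

    // Placeholder for whatever service currently persists a single customer.
    private final CustomerService customerService;

    public CustomerBatchController(CustomerService customerService) {
        this.customerService = customerService;
    }

    // Accepts a whole chunk of customers per POST instead of one record per call.
    @PostMapping("/api/customers/batch")
    public ResponseEntity<Void> createBatch(@RequestBody List<Customer> customers) {
        customerService.saveAll(customers); // persist the whole chunk in one go
        return ResponseEntity.noContent().build();
    }
}
```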
If it really is 500k records you need to send, you will probably still want to split them across multiple requests, but you could at least send them 10k at a time and manage your connections so that no more than 5-10 requests are in flight at any given time. Your server is unlikely to be able to process more than that at once anyway, and this keeps your client from running out of network resources.
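A sketch of that client-side pattern using the JDK 11+ `HttpClient`, with the 10k chunk size and 5-request cap from above. The URL is made up, and serializing each chunk to JSON (e.g. with Jackson) is left to the caller:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;

public class BatchUploader {

    private static final int CHUNK_SIZE = 10_000; // records per request
    private static final int MAX_IN_FLIGHT = 5;   // cap on concurrent requests

    private final HttpClient client = HttpClient.newHttpClient();
    private final Semaphore inFlight = new Semaphore(MAX_IN_FLIGHT);

    /** Split the full record list into chunks of CHUNK_SIZE. */
    static <T> List<List<T>> chunk(List<T> records) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += CHUNK_SIZE) {
            chunks.add(records.subList(i, Math.min(i + CHUNK_SIZE, records.size())));
        }
        return chunks;
    }

    /** POST each pre-serialized JSON chunk, never more than MAX_IN_FLIGHT at once. */
    public void upload(List<String> jsonChunks) throws InterruptedException {
        for (String body : jsonChunks) {
            inFlight.acquire(); // block until one of the in-flight slots frees up
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://example.com/api/customers/batch"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            client.sendAsync(request, HttpResponse.BodyHandlers.discarding())
                    .whenComplete((response, error) -> inFlight.release());
        }
        inFlight.acquire(MAX_IN_FLIGHT); // wait for the last requests to finish
        inFlight.release(MAX_IN_FLIGHT);
    }
}
```

With 500k records that's only 50 requests total, and the semaphore guarantees the client never has more than 5 connections open at once.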
Or, if you want to do it more like a file upload, you could send all 500k records' worth of data, have your server handle it like a file upload, and then process it once the upload succeeds.
In fact, you may want to just upload the CSV and let the server process it directly.
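A rough sketch of that route, assuming Spring's multipart support; the path is invented and the actual row parsing/persistence is omitted (that processing step is where something like Spring Batch would slot in):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import org.springframework.web.multipart.MultipartFile;

@RestController
public class CustomerImportController {

    // Hypothetical upload route: the client sends the raw CSV once,
    // and the server parses and persists it on its own schedule.
    @PostMapping("/api/customers/import")
    public ResponseEntity<String> importCsv(@RequestParam("file") MultipartFile file) throws Exception {
        int rows = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(file.getInputStream(), StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                // Parse the row into a customer and hand it off for processing
                // (omitted here; in practice a background job or Spring Batch step).
                rows++;
            }
        }
        // 202 Accepted: the upload succeeded and processing continues server-side.
        return ResponseEntity.accepted().body("Queued " + rows + " rows for processing");
    }
}
```

This turns the whole transfer into a single request, and the server can then chew through the file at whatever rate its database can handle.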
While a WebSocket connection would let you reuse the same connection for multiple requests (which is a good thing), you still don't want to be sending 500k individual records. The overhead of that many separate requests alone will be inefficient whether they travel over a WebSocket or as HTTP requests. Instead, you really want to batch the records and send a large chunk of data per request.