java sockets linear-regression bandwidth

Calculating the bandwidth by sending several packets through linear regression

I implemented a TCP client-server model to test my bandwidth to the server: the client sends a number of packets of different sizes, measures the RTT of each one, and then calculates the bandwidth through linear regression. Here is the server code:

import java.io.*;
import java.net.*;

public class Server implements Runnable {

    ServerSocket welcomeSocket;
    String clientSentence;
    Thread thread;
    Socket connectionSocket;
    BufferedReader inFromClient;
    DataOutputStream outToClient;

    public Server() throws IOException {
        welcomeSocket = new ServerSocket(6588);
        connectionSocket = welcomeSocket.accept();

        inFromClient = new BufferedReader(new InputStreamReader(connectionSocket.getInputStream()));
        outToClient = new DataOutputStream(connectionSocket.getOutputStream());

        thread = new Thread(this);
        thread.start();
    }

    @Override
    public void run() {
        // echo every line received from the client back to it
        while (true) {
            try {
                clientSentence = inFromClient.readLine();
                if (clientSentence == null) {
                    break; // client closed the connection; stop instead of spinning
                }
                System.out.println("Received: " + clientSentence);
                outToClient.writeBytes(clientSentence + '\n');
            } catch (IOException e) {
                e.printStackTrace();
                break; // the connection is broken; don't loop on the same error
            }
        }
    }

    public static void main(String[] args) throws IOException {
        new Server();
    }
}

And this is the method in the Client class that returns an array with the RTT of each packet:

public int[] getResponseTime() throws UnknownHostException, IOException {
    timeArray = new int[sizes.length];
    for (int i = 0; i < sizes.length; i++) {
        sentence = StringUtils.leftPad("", sizes[i], '*');
        long start = System.nanoTime();
        outToServer.writeBytes(sentence + '\n');
        modifiedSentence = inFromServer.readLine();
        long end = System.nanoTime();
        System.out.println("FROM SERVER: " + modifiedSentence);
        timeArray[i] = (int) (end - start);
        simpleReg.addData(timeArray[i] * Math.pow(10, -9), sizes[i] * 2); // each char is 2 bytes
    }
    return timeArray;
}

When I get the slope, it gives me a bandwidth in the kilobyte range, even though both machines are on the same network and the bandwidth should be much higher. What am I doing wrong?

Solution

Are you required to use linear regression, or could you use a different estimator? I am actually not sure linear regression is the best approach here. Out of curiosity, do you know of any sources that suggest using it in this kind of situation?

Note that the initial BW measurements in particular are much smaller than the real maximal goodput (due to TCP slow start), so it is important to use an estimator that copes with large, wrong outliers. In previous work I used the harmonic mean to monitor the bandwidth over a longer period of time, and it worked pretty well (also on links with a large bandwidth). The advantage of the harmonic mean over other means is that, while still very easy to compute, it mitigates the impact of large outliers, so the estimate is not as easily skewed. The flip side is that it is pulled down strongly by very small values, which is another reason to skip the first few (slow-start) measurements, as described below.
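To make the outlier behaviour concrete, here is a minimal sketch that compares the arithmetic and harmonic means on two small arrays. The sample values are made up for illustration (they are not measurements): the first array contains one large outlier, the second one tiny value such as an early slow-start sample.

```java
// Compares the arithmetic and the harmonic mean on hypothetical bandwidth samples (MB/s).
public class MeanComparison {

    public static double arithmeticMean(double[] samples) {
        double sum = 0;
        for (double s : samples) sum += s;
        return sum / samples.length;
    }

    public static double harmonicMean(double[] samples) {
        double reciprocalSum = 0;
        for (double s : samples) reciprocalSum += 1.0 / s; // samples must be > 0
        return samples.length / reciprocalSum;
    }

    public static void main(String[] args) {
        double[] largeOutlier = {90, 95, 100, 105, 1000}; // one value far too high
        double[] smallOutlier = {1, 90, 95, 100, 105};    // one value far too low

        System.out.printf("large outlier: arithmetic %.1f, harmonic %.1f%n",
                arithmeticMean(largeOutlier), harmonicMean(largeOutlier));
        System.out.printf("small outlier: arithmetic %.1f, harmonic %.1f%n",
                arithmeticMean(smallOutlier), harmonicMean(smallOutlier));
    }
}
```

With these numbers the arithmetic mean jumps to 278.0 because of the single large value, while the harmonic mean stays near 118.6. In the second case, however, the harmonic mean collapses to about 4.8 (arithmetic: 78.2), which shows why very small initial samples have to be kept out of a harmonic-mean estimate.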

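Given a series of bandwidth measurements R_i, where i = 0, 1, 2, ..., n-1, the harmonic mean is R_total = n / (1/R_0 + 1/R_1 + ... + 1/R_(n-1)). Because n/R_total is exactly the sum of those reciprocals, the mean can be updated incrementally when the next measurement R_n arrives, without storing the old values:

R_total(new) = (n+1) / ((n / R_total(old)) + (1/R_n))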
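As a quick sanity check, the following sketch folds samples in one at a time with the update rule (n+1)/((n/R_total) + (1/R_n)) and compares the result with the direct formula n / (sum of 1/R_i). The class name and sample values are hypothetical. The first sample needs a special case, because with n = 0 and no previous mean the expression n/R_total would be 0/0.

```java
// Incremental harmonic mean vs. the direct formula, on hypothetical samples.
public class IncrementalHarmonicMean {

    private double h = 0; // current harmonic mean of the samples seen so far
    private int n = 0;    // number of samples folded in so far

    /** Folds a new measurement r (must be > 0) into the running harmonic mean. */
    public double add(double r) {
        if (n == 0) {
            h = r;                          // first sample: the mean is the sample itself
        } else {
            h = (n + 1) / (n / h + 1 / r);  // n / h == sum of the previous reciprocals
        }
        n++;
        return h;
    }

    /** Direct (batch) harmonic mean, for comparison. */
    public static double direct(double[] rs) {
        double reciprocalSum = 0;
        for (double r : rs) reciprocalSum += 1 / r;
        return rs.length / reciprocalSum;
    }

    public static void main(String[] args) {
        double[] samples = {80, 95, 110, 100, 90};
        IncrementalHarmonicMean est = new IncrementalHarmonicMean();
        double incremental = 0;
        for (double r : samples) incremental = est.add(r);
        System.out.println(incremental + " vs " + direct(samples));
    }
}
```

Both numbers agree up to floating-point rounding, so the running version can be used inside a read loop without keeping a list of all past measurements.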

It is also good practice to skip the first few measurements (how many depends on how often you measure), e.g., the first five values R_0 to R_4, since there may be initial bursts caused by setup work in the different layers, and the connection is in the slow-start phase anyway.

Here is an example implementation in Java. The measurement here is done through a file download, but it can easily be applied to your environment as well by using your echo server instead:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class Estimator
{
    private static double R; // harmonic mean of all bandwidth measurements (MB/s)
    private static int n = 0; // number of measurements
    private static int skips = 5; // skip measurements for the first 5 socket.read() operations

    // size in bytes
    // start/end in ns
    public static double harmonicMean(long start, long end, double size){
        // check if we need to skip this initial value, since it might skew our estimate
        if(skips-- > 0) return 0;

        // get current value of R in MB/s (1 MB = 1024*1024 bytes here)
        double curR = (size/(1024*1024))/((end - start)*1e-9);
        System.out.println(curR);
        if(n == 0) {
            // initial value
            R = curR;
        } else {
            // use harmonic mean
            R = (n+1)/((n/R)+(1/curR));
        }

        n++;

        return R;
    }

    public static void main(String[] args)
    {
        // temporary buffer to hold bytes
        byte[] buffer = new byte[1024*1024*10]; // 10MB buffer - just in case ...

        // try-with-resources closes the socket (and its streams) in all cases
        try (Socket socket = new Socket("yourserver.com", 80)) {
            // measurement done through file download from server
            // prepare request: HTTP requires CRLF line endings, and "Connection: close"
            // makes the server close the connection after the response, so read() returns -1
            OutputStream os = socket.getOutputStream();
            InputStream is = socket.getInputStream();
            String request = "GET /test_blob HTTP/1.1\r\n" // a test file, e.g., 1MB big
                    + "Host: yourserver.com\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            os.write(request.getBytes(StandardCharsets.US_ASCII));
            os.flush();

            // prepare measurement
            long start, end;
            int bytes;
            double totalBytes = 0;
            start = System.nanoTime();
            while((bytes = is.read(buffer)) != -1) {
                // socket.read() occurred -> calculate harmonic mean
                end = System.nanoTime();
                totalBytes += bytes;
                harmonicMean(start, end, totalBytes);
            }
        }
        catch(IOException e){
            e.printStackTrace();
        }
        System.out.println(R+" MB/s");
    }
}
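For the echo-server setup from the question, a minimal sketch of the same idea could look like the code below. It is an illustration under stated assumptions, not a tested measurement tool: the names `measure`, `startLocalEchoServer`, `rounds` and `skip` are hypothetical, and each sample is taken as 2 * size / RTT on the assumption that every line crosses a symmetric link twice (request, then echo). The message is sent in one `write` call with TCP_NODELAY enabled, and the local stand-in echo server only exists so the sketch runs on its own; loopback numbers say nothing about the real network.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EchoBandwidth {

    /**
     * Sends `rounds` lines of `size` ASCII bytes to a line-based echo server and returns the
     * harmonic mean of the per-round throughput samples in bytes/s. Assumption: each line
     * crosses a symmetric link twice, so one sample is 2 * size / RTT. The first `skip`
     * rounds are discarded.
     */
    public static double measure(String host, int port, int size, int rounds, int skip)
            throws IOException {
        byte[] payload = new byte[size + 1];
        Arrays.fill(payload, (byte) '*');
        payload[size] = (byte) '\n';

        try (Socket socket = new Socket(host, port)) {
            socket.setTcpNoDelay(true); // disable Nagle so the message tail is not held back
            OutputStream out = socket.getOutputStream();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));

            double h = 0; // running harmonic mean of the kept samples
            int n = 0;    // number of samples folded into h
            for (int i = 0; i < rounds; i++) {
                long start = System.nanoTime();
                out.write(payload); // the whole message in one write call
                out.flush();
                String echo = in.readLine();
                long end = System.nanoTime();
                if (echo == null || echo.length() != size) {
                    throw new IOException("unexpected echo from server");
                }
                if (i < skip) {
                    continue; // warm-up rounds (slow start and other initial effects)
                }
                double sample = 2.0 * size / ((end - start) * 1e-9);
                h = (n == 0) ? sample : (n + 1) / (n / h + 1 / sample);
                n++;
            }
            return h;
        }
    }

    /** Throwaway single-connection echo server on an ephemeral loopback port (demo only). */
    public static ServerSocket startLocalEchoServer() throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread worker = new Thread(() -> {
            try (Socket s = server.accept();
                 BufferedReader r = new BufferedReader(
                         new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
                 OutputStream w = s.getOutputStream()) {
                String line;
                while ((line = r.readLine()) != null) { // null once the client disconnects
                    w.write((line + "\n").getBytes(StandardCharsets.US_ASCII));
                    w.flush();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        worker.setDaemon(true);
        worker.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Replace the local stand-in with the host/port of your own echo server.
        try (ServerSocket server = startLocalEchoServer()) {
            String host = server.getInetAddress().getHostAddress();
            double bytesPerSecond = measure(host, server.getLocalPort(), 1 << 20, 20, 5);
            System.out.printf("%.1f MB/s%n", bytesPerSecond / (1024 * 1024));
        }
    }
}
```

Against a real server you would call `measure` with that server's address and a message size large enough for TCP to ramp up; whether 2 * size / RTT is a fair sample depends on the link being roughly symmetric.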

Additionally, for the sake of completeness: as I already mentioned in the comments, the test messages/files must be big enough for TCP to reach the full goodput potential of the link.

Please also note that this is a simplified way to estimate the bandwidth. In this example we start measuring (taking the first timestamp) when the request is sent, which means we include the link propagation delay and the server processing delay; this in turn lowers the overall estimate. Anyway, since you seem to be on a local network, I expect the sum of these delays to be rather small, so they should not skew the final estimate too much.
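The effect of such a fixed delay, and why larger transfers help, can be worked out with simple arithmetic: if S bytes would need S/B seconds on a link with goodput B and a fixed delay d is added on top, a start-to-end timer reports S / (S/B + d). The sketch below evaluates this for an assumed B = 100 MiB/s and an assumed d = 1 ms; both numbers are hypothetical, chosen only to illustrate the trend.

```java
// Worked example: measured = S / (S / B + d) for a fixed extra delay d.
public class DelayBias {

    /** sizeMiB: data transferred, rateMiBps: true goodput B, delaySec: fixed extra delay d. */
    public static double measuredRate(double sizeMiB, double rateMiBps, double delaySec) {
        double transferSec = sizeMiB / rateMiBps;   // time the data alone would need
        return sizeMiB / (transferSec + delaySec);  // what a start-to-end timer reports
    }

    public static void main(String[] args) {
        double b = 100;    // assumed true goodput in MiB/s
        double d = 0.001;  // assumed 1 ms of propagation + processing delay
        for (double size : new double[] {0.01, 1, 100}) {
            System.out.printf("%8.2f MiB -> %6.2f MiB/s%n", size, measuredRate(size, b, d));
        }
    }
}
```

Under these assumptions, a 0.01 MiB (about 10 KiB) transfer appears to run at roughly 9.09 MiB/s, 1 MiB at about 90.91 MiB/s, and 100 MiB at about 99.90 MiB/s: the same small delay dominates short transfers but almost vanishes for large ones.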

I wrote a small blog post about measuring TCP connection metrics at the application layer, where everything is described in more detail (though the code examples there are in C).