Java Proxy Discovering Bot


I have written a class, ProxyFinder, which picks random IPs, pings them, and, if they respond, attempts to open an HTTP proxy connection through common proxy ports.

Currently, it just connects to random IPs. This is relatively fast, discovering a few proxies an hour. However, I would like to check whether I have already connected to an IP. First I tried keeping them in a list, but that used over 10 GB of RAM. The code below includes another method I tried, which writes the data to a cache using a RandomAccessFile, but searching the entire file for each connection becomes incredibly slow as the file grows.

I am storing the data in as small a format as possible: just four bytes per IP. Even so, the full range is 4 * 256 * 256 * 256 * 256 bytes = 16 GB of raw RAM, or a 16 GB file to search each time I want to test another IP.
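
As a rough check on the arithmetic above, the sketch below computes the raw size of storing every IPv4 address at four bytes each and, for comparison, the size of a bitmap that keeps only one bit per address. The class and method names are made up for illustration and are not part of ProxyFinder.

```java
// Hypothetical helper, not part of ProxyFinder: back-of-the-envelope storage sizes.
class StorageEstimate {
    // 256 * 256 * 256 * 256 possible IPv4 addresses.
    static final long ADDRESS_COUNT = 1L << 32;

    // Bytes needed to store every address as four raw bytes.
    static long rawBytes() {
        return 4L * ADDRESS_COUNT;
    }

    // Bytes needed for a bitmap that holds one bit per address.
    static long bitmapBytes() {
        return ADDRESS_COUNT / 8;
    }

    public static void main(String[] args) {
        System.out.println("4 bytes per address: " + (rawBytes() >> 30) + " GiB");
        System.out.println("1 bit per address:   " + (bitmapBytes() >> 20) + " MiB");
    }
}
```

The four-byte layout comes to 16 GiB, while one bit per address comes to 512 MiB, which is small enough to keep in memory on many machines.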

I also tried creating a separate thread to generate IPs, check them against the file, and add them to a queue that the probe threads could pull from. It could not keep up with the probe threads either.

How can I quickly check if I have already connected to an IP or not, without being incredibly slow or using ridiculous amounts of memory?

package net;

import java.io.File;
import java.io.RandomAccessFile;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/**
 *
 * @author Colby
 */
public class ProxyFinder {

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) throws Exception {

        // Common proxy ports (3128 was listed twice, so each address was probed on it twice).
        int[] ports = {
            1080, 3128, 8080
        };

        System.out.println("Starting network probe");

        AtomicInteger counter = new AtomicInteger();
        for (int i = 0; i < 500; i++) {
            new Thread(() -> {

                do {
                    try {
                        byte[] addrBytes = randomAddress();//could be getNextAddress also
                        if (addrBytes == null) {
                            break;
                        }

                        InetAddress addr = InetAddress.getByAddress(addrBytes);
                        if (ping(addr)) {
                            float percent = (float) ((counter.get() / (256f * 256f * 256f * 256f)) * 100F);
                            if (counter.incrementAndGet() % 10000 == 0) {
                                System.out.println("Searching " + percent + "% network search");
                            }

                            for (int port : ports) {
                                try {
                                    Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(addr, port));

                                    HttpURLConnection con = (HttpURLConnection) new URL("http://google.com").openConnection(proxy);

                                    con.setConnectTimeout(1000);
                                    con.setReadTimeout(1000);
                                    con.setRequestMethod("GET");
                                    con.setRequestProperty("User-Agent", "Mozilla/5.0");

                                    con.getContent();
                                    con.disconnect();

                                    System.out.println("Proxy found!" + addr.getHostAddress() + ":" + port + "  Found at " + percent + "% network search");

                                } catch (Exception e) {
                                    // Not a usable HTTP proxy on this port (timeout, refusal, bad response); try the next port.
                                }
                            }

                            //
                            //System.out.println("Ping response: --" + addr.getHostAddress() + "-- Attempt: " + counter.get() + " Percent: " + percent + "%");
                        } else {
                            //System.out.println("Ping response failed: " + addr.getHostAddress() + " attempt " + counter.incrementAndGet());
                        }

                    } catch (Exception e) {
                        //e.printStackTrace();
                    }

                } while (true);

            }).start();
        }
    }

    private static RandomAccessFile cache;

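    // synchronized: the shared RandomAccessFile is not safe for concurrent seek/read/write.
    private static synchronized byte[] getNextAddress() throws Exception {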
        if (cache == null) {
            cache = new RandomAccessFile(File.createTempFile("abc", ".tmp"), "rw");
        }

        byte[] check;
        checkFile:
        {
            byte[] addr = new byte[4];
            do {
                check = randomAddress();
                inner:
                {
                    cache.seek(0);
                    while (cache.length() - cache.getFilePointer() > 0) {
                        cache.readFully(addr);
                        if (Arrays.equals(check, addr)) {
                            break inner;
                        }
                    }
                    cache.write(check);
                    break checkFile;
                }

            } while (true);
        }
        return check;
    }

    private static byte[] randomAddress() {
        return new byte[]{(byte) (Math.random() * 256), (byte) (Math.random() * 256), (byte) (Math.random() * 256), (byte) (Math.random() * 256)};
    }

    private static boolean ping(InetAddress addr) throws Exception {
        return addr.isReachable(500);
    }
}

Also, in case anyone is wondering, I've had this running for 12 hours now. It has discovered about 50 proxies and pinged about 2.09664E-4% of the IP range, which is about 1.2 million IPs. Not bad for the bandwidth allocated (0.5 Mbps).

EDIT: I am starting to think that the overhead of storing and checking all of these IPs might be even greater than the cost of simply connecting to many duplicates near the end of searching the IP range.
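
To put a number on that trade-off, here is a minimal sketch that estimates how many random probes land on an address that was already tried. It assumes addresses are drawn uniformly from all 2^32 values, as randomAddress() does; the class and method names are hypothetical, and the 1.2 million figure is the one quoted above.

```java
// Hypothetical helper: how many repeats to expect when sampling addresses at random.
class DuplicateEstimate {
    // Expected number of draws that hit an already-drawn address after `draws`
    // uniform random picks from `space` possible addresses.
    static double expectedDuplicates(long draws, double space) {
        // Expected distinct addresses = space * (1 - (1 - 1/space)^draws),
        // written with log1p/expm1 to stay accurate when draws is much smaller than space.
        double distinct = -space * Math.expm1(draws * Math.log1p(-1.0 / space));
        return draws - distinct;
    }

    public static void main(String[] args) {
        double space = Math.pow(2, 32);
        System.out.printf("after 1.2 million probes: %.0f expected repeats%n",
                expectedDuplicates(1_200_000L, space));
        System.out.printf("after 2^32 probes: %.1f%% of probes were repeats%n",
                100 * expectedDuplicates(1L << 32, space) / space);
    }
}
```

Under that assumption, repeats are rare early on (roughly 170 in the first 1.2 million probes) but grow to roughly 37% of all probes by the time 2^32 probes have been made, so the cost of duplicates only matters late in the search.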


Solution

  • I ported code from an answer to another question to fit this problem: Java - Mapping multi-dimensional arrays to single

    The answer to that question explains in depth how the following code works. If anyone would like to post a more in-depth answer on this thread, I will accept it.

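    import java.util.BitSet;

    // BitSet only accepts indices 0..Integer.MAX_VALUE, so the 2^32 addresses are
    // split across two sets, selected by the top bit of the first octet.
    // Fully populated, the two sets take up to about 512 MB of heap in total.
    static final BitSet[] sets = { new BitSet(), new BitSet() };

    // Maps the low 31 bits of the address to a non-negative index.
    // The & masks undo Java's signed bytes (for example, (byte) 200 is -56).
    static int pos(int i, int j, int k, int m) {
        return ((i & 0x7F) << 24) | ((j & 0xFF) << 16) | ((k & 0xFF) << 8) | (m & 0xFF);
    }

    // BitSet is not thread-safe, so access from the probe threads is synchronized.
    static synchronized boolean get(byte[] addr) {
        return sets[(addr[0] & 0x80) >>> 7].get(pos(addr[0], addr[1], addr[2], addr[3]));
    }

    static synchronized void set(byte[] addr, boolean flag) {
        sets[(addr[0] & 0x80) >>> 7].set(pos(addr[0], addr[1], addr[2], addr[3]), flag);
    }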
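
java.util.BitSet is not thread-safe and only accepts non-negative int indices, so with 500 probe threads one possible variant is to keep the same one-bit-per-address idea in an AtomicLongArray instead. The sketch below is an illustration under that assumption, not the code from the linked answer; SeenAddresses, toIndex, and markIfNew are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch: a lock-free "have I already tried this address?" bitmap.
class SeenAddresses {
    private final AtomicLongArray words;

    // capacity = number of distinct addresses to track. 1L << 32 covers all of IPv4
    // and allocates 2^26 longs (512 MiB), so the JVM heap must be sized accordingly.
    SeenAddresses(long capacity) {
        words = new AtomicLongArray((int) ((capacity + 63) / 64));
    }

    // Packs four address bytes into an unsigned 32-bit value held in a long.
    // The & 0xFFL masks undo Java's signed bytes.
    static long toIndex(byte[] addr) {
        return ((addr[0] & 0xFFL) << 24) | ((addr[1] & 0xFFL) << 16)
             | ((addr[2] & 0xFFL) << 8) | (addr[3] & 0xFFL);
    }

    // Atomically sets the bit for this address.
    // Returns true only for the first caller to mark it, false for every repeat.
    boolean markIfNew(byte[] addr) {
        long index = toIndex(addr);
        long mask = 1L << (int) (index & 63);
        long previous = words.getAndAccumulate((int) (index >>> 6), mask, (old, bit) -> old | bit);
        return (previous & mask) == 0;
    }
}
```

In the probe loop, a thread could call markIfNew(addrBytes) right after randomAddress() and skip the address whenever it returns false. Because the check and the update happen in a single atomic step, two threads that draw the same address at the same time cannot both be told it is new.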