Search code examples
javaalgorithmcollectionsdata-retrieval

Java: Filter collection and retrieve data by multiple fields


I have a class:

public class Address {
    private String country;
    private String state;
    private String city;
}

And there are a list of Person objects. Person class looks like:

public class Person {
    private String country;
    private String state;
    private String city;
    //other fields
}

I need to filter Person objects and get the most suitable one. Address object can have at least one not null field. Person object can have none, partially or all mentioned fields initialized.

Here is one of the possible input examples:

Three Person objects:
a. PersonA: country = 'A'
b. PersonB: country = 'A', state = 'B'
c. PersonC: country = 'A', state = 'B', city = 'C'

Address object:
a. Address: country = 'A', state = 'B'

Expected result after filtering is PersonB. And in case when there're only PersonA and PersonC objects, then PersonA is more preferable.

I'd like to show how I tried to do this, but in fact it is pure brute force algorithm and I dislike it. Algorithm complexity increases with newly added field. I also thought about using guava filter by predicate, but had no idea what predicate should be.

What is preferable algorithm for such filtering if there's any besides brute force?


Solution

  • To avoid brute-force you would need to index you persons by address. For a good search you would definitely need a country (guess it or default it somehow, otherwise results would anyway be too inaccurate).

    The index would be a number, first 3 digits for a country, next 3 digits for a state and last 4 digits for a city. In such a case in an int you would be able to store 213 countries (only 206 as of 2016), with up to 999 states and 9999 cities.

    It gives us ability to use hashCode and a TreeSet to index your Person instances and look them up partially by address in a O(log(n)) manner without touching their fields. Fields would be touched on a TreeSet construction, and you would need to add some extra logic for a modification of a Person to keep the index intact.

    The index is calculated sequntially for each part, starting from country

        import java.util.HashMap;
        import java.util.Map;
    
        public class PartialAddressSearch {
    
            private final static Map<String, AddressPartHolder> COUNTRY_MAP = new HashMap<>(200);
    
            private static class AddressPartHolder {
                int id;
                Map<String, AddressPartHolder> subPartMap;
    
                public AddressPartHolder(int id, Map<String, AddressPartHolder> subPartMap) {
                    this.id = id;
                    this.subPartMap = subPartMap;
                }
            }
    
            public static int getCountryStateCityHashCode(String country, String state, String city) {
                if (country != null && country.length() != 0) {
                    int result = 0;
                    AddressPartHolder countryHolder = COUNTRY_MAP.get(country);
                    if (countryHolder == null) {
                        countryHolder = new AddressPartHolder(COUNTRY_MAP.size() + 1, new HashMap<>());
                        COUNTRY_MAP.put(country, countryHolder);
                    }
                    result += countryHolder.id * 10000000;
    
                    if (state != null) {
                        AddressPartHolder stateHolder = countryHolder.subPartMap.get(state);
                        if (stateHolder == null) {
                            stateHolder = new AddressPartHolder(countryHolder.subPartMap.size() + 1, new HashMap<>());
                            countryHolder.subPartMap.put(state, stateHolder);
                        }
                        result += stateHolder.id * 10000;
    
                        if (city != null && city.length() != 0) {
                            AddressPartHolder cityHolder = stateHolder.subPartMap.get(city);
                            if (cityHolder == null) {
                                cityHolder = new AddressPartHolder(stateHolder.subPartMap.size() + 1, null);
                                stateHolder.subPartMap.put(city, cityHolder);
                            }
                            result += cityHolder.id;
                        }
                    }
    
                    return result;
                } else {
                    throw new IllegalArgumentException("Non-empty country is expected");
                }
        }
    

    For your Person and Address classes you define hashCode and compareTo basing on a natural order of int:

        public class Person implements Comparable {
            private String country;
            private String state;
            private String city;
    
            @Override
            public boolean equals(Object o) {
                 //it's important but I removed it for readability
            }
    
            @Override
            public int hashCode() {
                return getCountryStateCityHashCode(country, state, city);
            }
    
            @Override
            public int compareTo(Object o) {
                //could be further improved by storing hashcode in a field to avoid re-calculation on sorting
                return hashCode() - o.hashCode();
            }
    
        }
    
        public class Address implements Comparable {
            private String country;
            private String state;
            private String city;
    
    
            @Override
            public boolean equals(Object o) {
                 //removed for readability
            }
    
            @Override
            public int hashCode() {
                return getCountryStateCityHashCode(country, state, city);
            }
    
            @Override
            public int compareTo(Object o) {
                //could be further improved by storing hashcode in a field to avoid re-calculation on sorting
                return hashCode() - o.hashCode();
            }
    
        }
    
        public class AddressPersonAdapter extends Person {
            private final Address delegate;
    
            public AddressPersonAdapter(Address delegate) {
                this.delegate = delegate;
            }
    
            @Override
            public boolean equals(Object o) {
                return delegate.equals(o);
            }
    
            @Override
            public int hashCode() {
                return delegate.hashCode();
            }
        }
    

    After that your filtering code would shrink to filling the index and calculating floor for your partial Address:

        TreeSet<Person> personSetByAddress = new TreeSet<>();
        Person personA = new Person();
        personA.setCountry("A");
        personSetByAddress.add(personA);
        Person personB = new Person();
        personB.setCountry("A");
        personB.setState("B");
        personSetByAddress.add(personB);
        Person personC = new Person();
        personC.setCountry("A");
        personC.setState("B");
        personC.setCity("C");
        personSetByAddress.add(personC);
    
        Address addressAB = new Address();
        addressAB.setCountry("A");
        addressAB.setState("B");
    
        System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
    
        Yields:
        Person{hashCode=10010000, country='A', state='B', city='null'}
    

    And if you won't have PersonB:

        TreeSet<Person> personSetByAddress = new TreeSet<>();
        Person personA = new Person();
        personA.setCountry("A");
        personSetByAddress.add(personA);
        Person personC = new Person();
        personC.setCountry("A");
        personC.setState("B");
        personC.setCity("C");
        personSetByAddress.add(personC);
    
        Address addressAB = new Address();
        addressAB.setCountry("A");
        addressAB.setState("B");
    
        System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
    
        Yields:
        Person{hashCode=10000000, country='A', state='null', city='null'}
    

    EDIT:

    Corner-case which will require additional validation would be absence of a greater(or lesser if we need ceiling) element within the same country. E.g.:

        TreeSet<Person> personSetByAddress = new TreeSet<>();
        Person personA = new Person();
        personA.setCountry("D");
        personSetByAddress.add(personA);
        Person personC = new Person();
        personC.setCountry("A");
        personC.setState("B");
        personC.setCity("C");
        personSetByAddress.add(personC);
    
        Address addressAB = new Address();
        addressAB.setCountry("A");
        addressAB.setState("B");
    
        System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
    
        Yields:
        Person{hashCode=10000000, country='D', state='null', city='null'}
    

    I.e. we fall out to the closest country. To fix this, we would need to check that the country digit is still the same. We can do it by sub-classing the TreeSet and adding this check there:

     //we need this class to allow flooring just by id
     public class IntegerPersonAdapter extends Person {
        private Integer id;
        public IntegerPersonAdapter(Integer id) {
            this.id = id;
        }
    
        @Override
        public boolean equals(Object o) {
            return id.equals(o);
        }
    
        @Override
        public int hashCode() {
            return id.hashCode();
        }
    
        @Override
        public int compareTo(Object o) {
            return id.hashCode() - o.hashCode();
        }
    
        @Override
        public String toString() {
            return id.toString();
        }
    }
    
    public class StrictCountryTreeSet extends TreeSet<Person> {
    
        @Override
        public Person floor(Person e) {
            Person candidate = super.floor(e);
            if (candidate != null) {
                //we check if the country is the same
                int candidateCode = candidate.hashCode();
                int eCode = e.hashCode();
                if (candidateCode == eCode) {
                    return candidate;
                } else {
                    int countryCandidate = candidateCode / 10000000;
                    if (countryCandidate == (eCode / 10000000)) {
                        //we check if the state is the same
                        int stateCandidate = candidateCode / 10000;
                        if (stateCandidate == (eCode / 10000)) {
                            //we check if is a state
                            if (candidateCode % 10 == 0) {
                                return candidate;
                            } else { //since it's not exact match we haven't found a city - we need to get someone just from state
                                return this.floor(new IntegerPersonAdapter(stateCandidate * 10000));
                            }
    
                        } else if (stateCandidate % 10 == 0) { //we check if it's a country already
                            return candidate;
                        } else {
                            return this.floor(new IntegerPersonAdapter(countryCandidate * 10000000));
                        }
                    }
                }
            }
            return null;
        }
    

    Now our test case would yield null after we intialize a StrictCountryTreeSet:

        TreeSet<Person> personSetByAddress = new StrictCountryTreeSet();
        Person personA = new Person();
        personA.setCountry("D");
        personSetByAddress.add(personA);
        Person personC = new Person();
        personC.setCountry("A");
        personC.setState("B");
        personC.setCity("C");
        personSetByAddress.add(personC);
    
        Address addressAB = new Address();
        addressAB.setCountry("A");
        addressAB.setState("B");
    
        System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
    
        Yields:
        null
    

    And a test for a different state would yield null as well:

        TreeSet<Person> personSetByAddress = new StrictCountryTreeSet();
        Person personD = new Person();
        personD.setCountry("D");
        personSetByAddress.add(personD);
    
        Person personE = new Person();
        personE.setCountry("A");
        personE.setState("E");
        personSetByAddress.add(personE);
    
        Person personC = new Person();
        personC.setCountry("A");
        personC.setState("B");
        personC.setCity("C");
        personSetByAddress.add(personC);
    
        Address addressA = new Address();
        addressA.setCountry("A");
    
        Address addressAB = new Address();
        addressAB.setCountry("A");
        addressAB.setState("B");
    
        Address addressABC = new Address();
        addressABC.setCountry("A");
        addressABC.setState("B");
        addressABC.setCity("C");
    
        System.out.println(personSetByAddress.floor(new AddressPersonAdapter(addressAB)));
    
        Yields:
        null
    

    Please note, that in this case you would need to store hashCode results within the Address and Person clasess to avoid it re-calculation.