
Write SQL-like queries for java streams


I have a stream of Java objects and I would like to query them using SQL-like syntax with reasonable performance (comparable to querying a regular table without any indexes in an RDBMS, i.e. a one-time full table scan).

I like the Stream API (map/filter/etc.), but the query itself would be an input, so I can't hard-code it in Java.

Is it possible to do this without inserting the incoming data into a "real" database (and then removing it later to save space)?
I was thinking about using an in-memory database like H2 or SQLite, but then I would still have to insert the data, and they are not really designed for streaming.

Are there any existing libraries/solutions for something like I'm trying to do?

class A {
    private String name;

    /* ... */
}
Stream<A> myStream /* = ... */ ;

Stream<Integer> result = query(myStream, "select count(*) as number_of_x from :myStream where name = 'x'",
    (rs, i) -> rs.getInt("number_of_x"));

/* result.toList() will contain one element at the end */

Solution

  • I have a stream of java objects and I would like to query them

    What you want doesn't make a lot of sense.

    Streams are iterators, not containers of data. See the API documentation:

    No storage. A stream is not a data structure that stores elements; instead, it conveys elements from a source such as a data structure, an array, a generator function, or an I/O channel, through a pipeline of computational operations.

    So streams aren't a means of storing data.

    And once a stream is consumed, it can't be used anymore. You can't query a stream like a database.

    A stream is an internal iterator that can be executed only once.
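
    The single-use behaviour can be seen in a minimal sketch (the class and method names are illustrative, not from the question). A second operation on the same stream throws an IllegalStateException, so running another query means re-creating the stream from its source, here via a Supplier:

    ```java
    import java.util.List;
    import java.util.function.Supplier;
    import java.util.stream.Stream;

    // Hypothetical demo class; the names are illustrative only.
    class StreamReuseDemo {

        // Returns true if touching an already consumed stream fails.
        static boolean secondUseFails() {
            Stream<String> names = Stream.of("x", "y", "x");
            names.filter("x"::equals).count();      // the terminal operation consumes the stream
            try {
                names.filter("y"::equals).count();  // the same stream instance is used again
                return false;
            } catch (IllegalStateException e) {
                return true;
            }
        }

        // Re-creating the stream from its source for every query avoids the problem.
        static long countMatching(Supplier<Stream<String>> source, String value) {
            return source.get().filter(value::equals).count();
        }

        public static void main(String[] args) {
            List<String> data = List.of("x", "y", "x");
            System.out.println(secondUseFails());
            System.out.println(countMatching(data::stream, "x"));
            System.out.println(countMatching(data::stream, "y"));
        }
    }
    ```

    This only works when the source can be replayed (a collection, a file, and so on); a one-shot source such as a network channel would have to be buffered first.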

    Update

    If you're interested in implementing a parser that translates SQL-like queries into Predicates and Functions to be applied to a stream, then sure, you can try. For very simple queries, it's definitely doable.

    But it's not a trivial task. Even for simple queries (similar to the one in the question), a fully fledged parser would require a lot of effort both to implement and to test. I doubt it would pay off.

    Here's a very, very dumb illustration which makes use of the Reflection API and regular expressions.

    The demo parser below is not capable of doing much; a proper implementation would be far more complex.

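    import java.lang.reflect.Field;
    import java.util.function.Predicate;
    import java.util.regex.MatchResult;
    import java.util.regex.Pattern;
    import java.util.stream.Stream;

    public class QueryParser {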
        
        public static <T> long getAsCount(String query, Stream<T> stream, Class<T> tClass) { // overloaded versions for primitive streams
            
            StreamOperation<T> operation = StreamOperation.fromQuery(query, tClass);
            
            return stream
                .filter(operation.getPredicate())
                .count();
        }
    
        private static class StreamOperation<T> {
            
            public static final Pattern WHERE = Pattern.compile("(?<=(?i)where).*?(?=(?i)group)|(?<=(?i)where).*?(?=$)");
            
            private final Predicate<T> predicate;
            // more properties

            private StreamOperation(Predicate<T> predicate) {
                this.predicate = predicate;
            }
            
            public static <T> StreamOperation<T> fromQuery(String query, Class<T> tClass) {
                
                Predicate<T> where = WHERE.matcher(query).results()
                    .map(MatchResult::group)
                    .findFirst()
                    .map(conditions -> parseConditions(conditions, tClass))
                    .orElse(t -> true);
                
                // working on other properties
                
                StreamOperation<T> so = new StreamOperation<>(where);
                
                return so;
            }
            
            public Predicate<T> getPredicate() {
                return predicate;
            }
            
            public static <T> Predicate<T> parseConditions(String conditions, Class<T> tClass) {
                
                String[] or = conditions.split("(?i)\\bor\\b"); // split by OR as a whole word, so values like 'Carol' stay intact
                
                Predicate<T> orPredicate = t -> false; // base predicate for OR
                
                for (String jointCondition: or) {
                    String[] and = jointCondition.split("(?i)\\band\\b"); // split by AND as a whole word
                    
                    Predicate<T> andPredicate = t -> true; // base predicate for AND
                    
                    for (String condition: and) {
                        Predicate<T> next = null;
                        // parse each condition
                        try {
                            next = Conditions.parseCondition(condition, tClass);
                        } catch (NoSuchFieldException e) {
                            e.printStackTrace();
                            throw new IllegalArgumentException("Invalid condition or type:\n"
                                + condition + " for Type " + tClass.getCanonicalName());
                        }
                        andPredicate = andPredicate.and(next); // join conditions together using Predicate.and()
                    }
                    orPredicate = orPredicate.or(andPredicate); // join conditions together using Predicate.or()
                }
                
                return orPredicate;
            }
        }
        
        private static class Conditions {
            
            public static <T> Predicate<T> parseCondition(String conditions, Class<T> tClass) throws NoSuchFieldException {
                
                // TODO add logic for other conditions
                // Only the equality comparison is implemented for demo purposes
                
                String[] equals = conditions.split("=");
                
                Field field = tClass.getDeclaredField(equals[0].strip());
                field.setAccessible(true);
                
                return field.getType().isPrimitive() ? // assumption that boolean is also represented as numeric value 0 or 1
                    compareAsNumericType(field, equals) : compareAsString(field, equals);
                
            }
            
            public static <T> Predicate<T> compareAsNumericType(Field field, String[] equals) {
        
                return t -> {
                    try {
                        return field.getDouble(t) == Double.parseDouble(equals[1].strip());
                    } catch (IllegalAccessException e) {
                        e.printStackTrace();
                        return false;
                    }
                };
            }
        
            public static <T> Predicate<T> compareAsString(Field field, String[] equals) {
            
                return t -> {
                    try {
                        // the parsed literal is the receiver, so a null field value yields false instead of an NPE
                        return equals[1].strip().replace("'", "").equals(field.get(t));
                    } catch (IllegalAccessException e) {
                        e.printStackTrace();
                        return false;
                    }
                };
            }
        }
        
        // TODO implement methods for retrieving other results
    
    //    public static <T, R> List<R> getAsList(String query, Stream<T> stream, Class<T> tClass) { // overloaded versions for primitive streams
    //
    //        StreamOperation<T> operation = StreamOperation.fromQuery(query, tClass);
    //
    //        return stream
    //            .filter(operation.getPredicate())
    //            .map(operation.getMapper()) // not implemented
    //            .toList();
    //    }
    }
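
    The commented-out getAsList relies on a getMapper() that the demo leaves unimplemented. As a rough sketch only, and assuming the select clause names a single field, such a mapper could be built with reflection in the same way as the conditions. The names ProjectionSketch, fieldMapper, and Person are hypothetical:

    ```java
    import java.lang.reflect.Field;
    import java.util.List;
    import java.util.function.Function;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    // Hypothetical sketch; not part of the demo parser.
    class ProjectionSketch {

        // Builds a mapper that reads one declared field by name (assumption: one field per query).
        static <T> Function<T, Object> fieldMapper(String fieldName, Class<T> tClass) {
            try {
                Field field = tClass.getDeclaredField(fieldName.strip());
                field.setAccessible(true);
                return t -> {
                    try {
                        return field.get(t);
                    } catch (IllegalAccessException e) {
                        throw new IllegalStateException(e);
                    }
                };
            } catch (NoSuchFieldException e) {
                throw new IllegalArgumentException(
                    "No field '" + fieldName + "' in " + tClass.getName(), e);
            }
        }

        // Stand-in element type used only for this example.
        static class Person {
            private final int id;
            private final String name;

            Person(int id, String name) {
                this.id = id;
                this.name = name;
            }
        }

        // Roughly what "select name" would do once the field name has been parsed.
        static List<Object> selectNames() {
            return Stream.of(new Person(100, "Alise"), new Person(90, "Bob"))
                .map(fieldMapper("name", Person.class))
                .collect(Collectors.toList());
        }

        public static void main(String[] args) {
            System.out.println(selectNames());
        }
    }
    ```

    The mapper returns Object because the field type is only known at runtime; a real implementation would also need to handle several columns, aliases, and aggregate functions.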
    

    A dummy class for testing:

    public class A {
        private int id;
        private String name;
    
        // getters, constructor
    }
    

    main()

    public static void main(String[] args) {
        String query = "SELECT count(*) as number_of_x from :myStream WHERE name = 'Alise' AND id = 100";
        
        Stream<A> stream = Stream.of(
            new A(100, "Alise"),
            new A(90, "Bob"),
            new A(100, "Carol")
        );
        
        System.out.println(QueryParser.getAsCount(query, stream, A.class));
    }
    

    Output:

    1
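
    The way parseConditions folds the parsed conditions can also be shown in isolation. The sketch below (class and method names are hypothetical) treats the WHERE clause as an OR of AND-groups: each AND-group starts from a predicate that is always true, and the OR starts from a predicate that is always false, so the starting values never change the result:

    ```java
    import java.util.List;
    import java.util.function.Predicate;

    // Hypothetical sketch of the OR-of-ANDs folding used by the demo parser.
    class PredicateFoldSketch {

        // Inner lists are AND-ed together; the resulting groups are OR-ed.
        static <T> Predicate<T> anyGroupOfAll(List<List<Predicate<T>>> groups) {
            Predicate<T> or = t -> false;           // neutral element for OR
            for (List<Predicate<T>> group : groups) {
                Predicate<T> and = t -> true;       // neutral element for AND
                for (Predicate<T> condition : group) {
                    and = and.and(condition);
                }
                or = or.or(and);
            }
            return or;
        }

        // Evaluates (positive AND even) OR (equal to -1) for the given value.
        static boolean matches(int value) {
            Predicate<Integer> positive = n -> n > 0;
            Predicate<Integer> even = n -> n % 2 == 0;
            Predicate<Integer> minusOne = n -> n == -1;
            List<List<Predicate<Integer>>> groups =
                List.of(List.of(positive, even), List.of(minusOne));
            return anyGroupOfAll(groups).test(value);
        }

        public static void main(String[] args) {
            System.out.println(matches(4));
            System.out.println(matches(3));
            System.out.println(matches(-1));
        }
    }
    ```

    Splitting by OR first and by AND second gives AND the higher precedence, as in SQL, but parentheses in the query are not supported by this approach.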