Search code examples
javasqlderby

Apache Derby: Column reference 'xx' is invalid, or is part of an invalid expression.


Using apache derby ( 10.12.1.1), I created the following tables:

CREATE TABLE ROWCONTENT(
ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1) PRIMARY KEY,
MD5SUM CHAR(32) UNIQUE,
CONTENT CLOB,
CONTIG VARCHAR(20),
START INT,
STOP INT,
REF VARCHAR(50) NOT NULL
);

CREATE TABLE VCF(
ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1) PRIMARY KEY,
NAME VARCHAR(255)
);

CREATE TABLE VCFROW(
ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1) PRIMARY KEY,
VCF_ID INTEGER CONSTRAINT row2vcf REFERENCES VCF,
ROW_ID INTEGER CONSTRAINT row2content REFERENCES ROWCONTENT
);

but when I try a SELECT using the following statement

SELECT
 VCF.ID,VCF.NAME, COUNT(VCFROW.ID)  as "COUNT_VARIANTS"
FROM
  VCF,VCFROW,ROWCONTENT
WHERE
  VCFROW.VCF_ID=VCF.ID AND
  VCFROW.ROW_ID = ROWCONTENT.ID AND
  ROWCONTENT.CONTIG IS NOT NULL
GROUP BY VCF.ID

I get the following exception:

java.sql.SQLSyntaxErrorException: Column reference 'VCF.NAME' is invalid, or is part of an invalid expression.  For a SELECT list with a GROUP BY, the columns and expressions being selected may only contain valid grouping expressions and valid aggregate expressions.
    at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
    at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
    at org.apache.derby.impl.jdbc.EmbedStatement.executeQuery(Unknown Source)

Do you know why ? Thanks.


Solution

  • First use explicit JOIN:

    ...
    FROM VCF
    JOIN VCFROW 
      ON VCFROW.VCF_ID=VCF.ID
    JOIN ROWCONTENT
      ON VCFROW.ROW_ID = ROWCONTENT.ID
    WHERE ROWCONTENT.CONTIG IS NOT NULL
    GROUP BY VCF.ID;
    

    You could also add table aliases so you don't have to write entire table names.


    Second VCF.NAME is not part of GROUP BY and it isn't wrapped with aggregation function.

    Depending on needs use:

    GROUP BY VCF.ID,VCF.NAME
    -- or
    SELECT VCF.ID, MAX(VCF.NAME) AS NAME, COUNT(VCFROW.ID) AS COUNT_VARIANTS
    

    Related: Group by clause in mySQL and postgreSQL, why the error in postgreSQL?

    In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause