Using apache derby ( 10.12.1.1), I created the following tables:
CREATE TABLE ROWCONTENT(
ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1) PRIMARY KEY,
MD5SUM CHAR(32) UNIQUE,
CONTENT CLOB,
CONTIG VARCHAR(20),
START INT,
STOP INT,
REF VARCHAR(50) NOT NULL
);
CREATE TABLE VCF(
ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1) PRIMARY KEY,
NAME VARCHAR(255)
);
CREATE TABLE VCFROW(
ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY (START WITH 1, INCREMENT BY 1) PRIMARY KEY,
VCF_ID INTEGER CONSTRAINT row2vcf REFERENCES VCF,
ROW_ID INTEGER CONSTRAINT row2content REFERENCES ROWCONTENT
);
but when I try a SELECT using the following statement
SELECT
VCF.ID,VCF.NAME, COUNT(VCFROW.ID) as "COUNT_VARIANTS"
FROM
VCF,VCFROW,ROWCONTENT
WHERE
VCFROW.VCF_ID=VCF.ID AND
VCFROW.ROW_ID = ROWCONTENT.ID AND
ROWCONTENT.CONTIG IS NOT NULL
GROUP BY VCF.ID
I get the following exception:
java.sql.SQLSyntaxErrorException: Column reference 'VCF.NAME' is invalid, or is part of an invalid expression. For a SELECT list with a GROUP BY, the columns and expressions being selected may only contain valid grouping expressions and valid aggregate expressions.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.generateCsSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.wrapInSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.TransactionResourceImpl.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.ConnectionChild.handleException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.execute(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedStatement.executeQuery(Unknown Source)
Do you know why ? Thanks.
First use explicit JOIN
:
...
FROM VCF
JOIN VCFROW
ON VCFROW.VCF_ID=VCF.ID
JOIN ROWCONTENT
ON VCFROW.ROW_ID = ROWCONTENT.ID
WHERE ROWCONTENT.CONTIG IS NOT NULL
GROUP BY VCF.ID;
You could also add table aliases so you don't have to write entire table names.
Second VCF.NAME
is not part of GROUP BY
and it isn't wrapped with aggregation function.
Depending on needs use:
GROUP BY VCF.ID,VCF.NAME
-- or
SELECT VCF.ID, MAX(VCF.NAME) AS NAME, COUNT(VCFROW.ID) AS COUNT_VARIANTS
Related: Group by clause in mySQL and postgreSQL, why the error in postgreSQL?
In standard SQL, a query that includes a GROUP BY clause cannot refer to nonaggregated columns in the select list that are not named in the GROUP BY clause