I'm having trouble constrsucting a query and getting it to run on impala. I created the following working query to join two tables:
SELECT *
FROM illuminavariant as vcf, ensembl_genes as ens
WHERE vcf.filter = "PASS"
AND vcf.qual > 100
AND vcf.chromosome = ens.chromosome
AND vcf.position BETWEEN ens.start AND ens.stop
Now I'm trying to write a query that finds all variants WHERE vcf.filter = "PASS" and vcf.qual > 100, but do not have a match by chromosome and position.
I've tried this:
SELECT *
FROM p7dev.illumina_test, p7dev.ensembl_test
WHERE NOT EXISTS(
SELECT *
FROM p7dev.illumina_test as vcf, p7dev.ensembl_test as ens
WHERE vcf.chromosome = ens.chromosome
AND vcf.position BETWEEN ens.start AND ens.stop
)
But that's not returning any results. I think a WITH clause might do the trick, but I'd really appreciate if anybody could help me wrap my head around the logic of how that would work. Thanks so much!
Since you're looking for variants that are not correlated to any ensemble, it seems strange that you would form a cross join of variants and ensembles from which to filter out rows. If that's really what you want, though, then this ought to do it:
SELECT *
FROM illuminavariant as vcf, ensembl_genes as ens
WHERE vcf.filter = "PASS"
AND vcf.qual > 100
AND (
vcf.chromosome != ens.chromosome
OR vcf.position < ens.start
OR vcf.position > ens.stop
)
That just negates the condition correlating variant rows with ensemble rows.
I suspect what you really want is more like this, though:
SELECT vcf.*
FROM
illuminavariant as vcf
LEFT JOIN ensembl_genes as ens
ON vcf.chromosome = ens.chromosome
AND vcf.position BETWEEN ens.start AND ens.stop
WHERE
vcf.filter = "PASS"
AND vcf.qual > 100
AND ens.chromosome IS NULL
That performs the same join as your first query, but as a left join. The rows that actually represent matches are then filtered out by the ens.chromosome IS NULL
condition. It returns only the columns of the variant table, as the whole point is to find variants that have no corresponding row in the ensemble table.