Search code examples
sql-serversubqueryoutputquery-optimization

How to select multiple items in a subquery SQL Server


I have a query which select multiple items using a subquery as:

SELECT DISTINCT A2P.aid, P.pid 
FROM sub_aminer_author2paper A2P, sub_aminer_paper P
WHERE DATALENGTH(P.p_abstract_SWR) > 0
AND P.pid IN (SELECT pid 
              FROM sub_aminer_author2paper
              WHERE p_year BETWEEN 2005 AND 2014 
              AND aid = 677
             )
AND A2P.aid = 677
ORDER BY A2P.aid  

It gives the output as:

aid pid  
677 812229  
677 812486  
677 818273  
677 975105  
677 1129619  
677 1626166  
677 1924898  
677 2014164  
677 2070844   

I want to have same output if I use multiple values for aid as aid IN (SELECT aid FROM Authors) and A2P.aid IN (SELECT aid FROM Authors). But if I use to execute this query:

SELECT DISTINCT A2P.aid, P.pid 
FROM sub_aminer_author2paper A2P, sub_aminer_paper P
WHERE DATALENGTH(P.p_abstract_SWR) > 0
AND P.pid IN (SELECT pid 
              FROM sub_aminer_author2paper
              WHERE p_year BETWEEN 2005 AND 2014 
              AND aid IN (677, 1359)
             )
AND A2P.aid IN (677, 1359)
ORDER BY A2P.aid  

It multiplies the output for each number of aid's as output for two aid's (for example) should be 125 rows but it gives 250 rows (125 * 2) i.e. 125 rows for aid i.e. 677 and 125 rows for aid i.e. 1359. Similarly, output for three aid's (for example) should be 191 rows but it gives 573 rows (191 * 3) i.e. 191 rows for each aid.

How can I modify this query?


Solution

  • By using A's comments the query can be modified as:

    SELECT A2P.aid, P.pid
    FROM sub_aminer_author2paper A2P 
    JOIN sub_aminer_paper P ON P.pid = A2P.pid 
    WHERE DATALENGTH(P.p_abstract_SWR) > 0 
    AND P.p_year BETWEEN 2005 AND 2014 
    AND A2P.aid IN (SELECT aid FROM Authors)
    ORDER BY A2P.aid