Search code examples
sqlhadoophiveapache-pig

how to subtract count values in hive in same table same column


hi the screenshot I uploaded is table with first column post_id, score, answerCount, CommentCount hi I am stuck on a hive problem I am very noob in sql and hive I am working on stack overflow dataset, I am trying to find percentage of questions answered. what I did is I counted all the questions and counted all the questions which has been answered but I am stuck on how to subtract them

select AnswerCount
> from posts
> LEFT JOIN posts
> ON AnswerCount = AnswerCount
> WHERE AnswerCount IS NULL; 

I want the result to be count of all - count of question answered some of the answerCounts are null I did this to count answers with

`select AnswerCount
>from posts
>where AnswerCount > 0;`

here is the schema

post_id score AnswerCount CommentCount
385106    2       NULL        0
385107    2        0          2
385108   14       NULL        4
385109   -2       NULL        3
385110    8       NULL        5
385113   -8       NULL        2
385114   16       NULL        0
385116   30        2          6
385118   -2       NULL        0

Solution

  • Updated my answer to clean it up.

    This checked out:

        SELECT 
        CAST(( SELECT COUNT(ua.post_id) FROM posts ua 
        WHERE ua.AnswerCount IS NOT NULL) AS DECIMAL(3,2)) / 
        CAST(COUNT(t.post_id) AS DECIMAL(3,2))
          FROM posts t 
    

    The query contains a sub query which selects the COUNT() of posts where AnswerCount IS NULL, it divides that by the total number of posts. The rest is to CAST the integers to DECIMAL since a factional result will be reported as 0 if left as an int.