
Invalid scalar projection in PIG


I have data loaded in Pig with the following columns:

keyword, campaign_id, date, time, display_site, was_clicked, cpc, country, placement

I'm trying to find the keywords with a high CTR (click-through rate), i.e. the fraction of a keyword's records where was_clicked is 1.

Why does the following code give me an "Invalid scalar projection" error?

  grouped = GROUP data BY keyword;
  by_keyword = FOREACH grouped
  {
      clicked = FILTER data BY was_clicked == 1;
      total = COUNT(data.keyword);
      GENERATE group, ((double)COUNT(clicked) / total) AS ctr;
  }

The error I'm getting:

37,632 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: 
<line 59, column 33> Invalid scalar projection: clicked : A column needs to be projected from a relation for it to be used as a scalar
Details at logfile: /home/cloudera/pig_1486224821223.log

Any help would be appreciated.

Edit: this is how the data is loaded:

data = LOAD '/user/cloudera/pig_demo/ad_data.txt' AS (keyword:chararray,campaign_id:chararray,
      date:chararray, time:chararray,display_site:chararray, was_clicked:int, 
      cpc:int, country:chararray, placement:chararray);

A sample record:

tablet  C6  5/1/2013    3:47:10 movienet.example.com    0   102 USA TOP
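When debugging this kind of parse error, it can help to check which aliases Pig sees and what their schemas are. The sketch below reuses the LOAD statement from the question and adds Pig's DESCRIBE diagnostic operator; nothing else is assumed. After GROUP, the original tuples sit in a bag field that is also named data, and that bag is what the nested FILTER and COUNT inside the FOREACH block refer to.

```pig
-- Sketch: inspect the schemas Pig infers (LOAD copied from the question).
data = LOAD '/user/cloudera/pig_demo/ad_data.txt' AS (keyword:chararray, campaign_id:chararray,
    date:chararray, time:chararray, display_site:chararray, was_clicked:int,
    cpc:int, country:chararray, placement:chararray);
DESCRIBE data;     -- flat schema of each loaded tuple

grouped = GROUP data BY keyword;
DESCRIBE grouped;  -- 'group' holds the key; 'data' is now a bag of the original tuples
```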

Solution

  • Tested with Pig version 0.15.

    Input file data.txt:

    tablet  C6  5/1/2013    3:47:10 movienet.example.com    0   102 USA TOP
    tablet  C6  5/1/2013    3:47:10 movienet.example.com    0   102 USA TOP
    tablet  C6  5/1/2013    3:47:10 movienet.example.com    0   102 USA TOP
    tablet  C6  5/1/2013    3:47:10 movienet.example.com    1   102 USA TOP
    

    Script:

    data = LOAD '/path/data.txt' AS (keyword:chararray,campaign_id:chararray,
      date:chararray, time:chararray,display_site:chararray, was_clicked:int, 
      cpc:int, country:chararray, placement:chararray);
    grouped = GROUP data BY keyword;
    by_keyword = FOREACH grouped 
    {
      clicked = FILTER data BY was_clicked == 1;
      total = COUNT(data.keyword);
      GENERATE group, ((double)COUNT(clicked) / total) AS ctr;
    }
    DUMP by_keyword;
    

    gives the correct result (one click out of four records):

    (tablet,0.25)
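If was_clicked only ever holds 0 or 1 (an assumption based on the sample data, not something the question states), the same CTR can be computed without a nested block: SUM over the bag counts the clicks and COUNT counts the records in each group. The sketch below also orders the result so the highest-CTR keywords come first, which matches the goal stated in the question; the alias by_ctr is a hypothetical name chosen for illustration. On the four-line data.txt from the answer it should likewise produce (tablet,0.25).

```pig
-- Sketch, assuming was_clicked is always 0 or 1.
data = LOAD '/path/data.txt' AS (keyword:chararray, campaign_id:chararray,
    date:chararray, time:chararray, display_site:chararray, was_clicked:int,
    cpc:int, country:chararray, placement:chararray);
grouped = GROUP data BY keyword;

-- With 0/1 values, SUM gives the number of clicks and COUNT the number of records.
by_keyword = FOREACH grouped GENERATE
    group AS keyword,
    (double)SUM(data.was_clicked) / COUNT(data) AS ctr;

-- Hypothetical alias: keywords sorted from highest to lowest CTR.
by_ctr = ORDER by_keyword BY ctr DESC;
DUMP by_ctr;
```

The nested FILTER version from the answer stays correct even if was_clicked takes other values; the SUM form is shorter but relies on the 0/1 encoding.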