Search code examples
google-bigquerygdelt

Counting number of events (protests) from GDELT database


My goal is to get the monthly number of protests in Mexico reported between the years 2004 and 2020. I am using Google BigQuery to get this data from the GDELT database.

My problem is that I am getting different results when running the same query on different tables.

select
    GlobalEventID
    ,MonthYear
    ,ActionGeo_Long
    ,ActionGeo_Lat
from
    gdelt-bq.full.events_partitioned -- Returns 34650 records
    --gdelt-bq.gdeltv2.events_partitioned -- Returns 93551 records
where
    _PARTITIONTIME >= TIMESTAMP('2004-01-01')
    and _PARTITIONTIME <= TIMESTAMP('2020-12-31')
    and EventRootCode = '14'
    and ActionGeo_CountryCode = 'MX'
;

Can you tell me which table I should use and why the query results differ from each other?


Solution

  • According to the GDELT documentation, gdeltv2 contains more events, and is more up to date for recent years. However they may not have finished backpopulating it to 1979.

    This query shows only 20340 of the 93563 event IDs existing in both tables, so for such a large time range you may get best results by using the v1 table before 2015, and the v2 table from 2015 onwards.

    SELECT COUNT(*)
    FROM gdelt-bq.gdeltv2.events_partitioned g2
    JOIN gdelt-bq.full.events_partitioned g1 ON g1.GlobalEventID = g2.GlobalEventID
    WHERE g2._PARTITIONTIME >= TIMESTAMP('2004-01-01')
    AND   g2._PARTITIONTIME <= TIMESTAMP('2020-12-31')
    AND   g2.EventRootCode = '14'
    AND   g2.ActionGeo_CountryCode = 'MX'
    AND   g1._PARTITIONTIME >= TIMESTAMP('2004-01-01')
    AND   g1._PARTITIONTIME <= TIMESTAMP('2020-12-31')
    AND   g1.EventRootCode = '14'
    AND   g1.ActionGeo_CountryCode = 'MX'