Search code examples
google-bigquerygdelt

Counting Number of Protest from GDELT database


Unfortunately I have no experience with BigQuery or programming in general, but I need data from GDELT for my thesis and can't access it through the analysis tool. That's why I created this query to determine the number of protests in Iran in September 2022. However, the number I get as a result is much too high (compared e.g., ACLED). Do I need to refine my query so that I only get the count of individual events?

SELECT COUNT(\*) FROM gdelt-bq.gdeltv2.events_partitioned where \_PARTITIONTIME \>= TIMESTAMP('2022-09-01') and \_PARTITIONTIME \<= TIMESTAMP('2022-09-30') and EventRootCode = '14' and ActionGeo_CountryCode = 'IR' ;

(As another test: ACLED lists >10.600 demonstrations in the USA from 24.05-22.08.2020, with my query I receive 123.500 as result)


Solution

  • Your query is counting all mentions of protests, but GDELT records the same event multiple times from different sources, inflating the count.

    How to refine your query?

    To get the number of unique protests in Iran for September 2022, modify your query to count distinct events instead of all mentions.

    Improved query:

    SELECT COUNT(DISTINCT GLOBALEVENTID) AS unique_protests
    FROM `gdelt-bq.gdeltv2.events_partitioned`
    WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2022-09-01') AND TIMESTAMP('2022-09-30')
    AND EventRootCode = '14' 
    AND ActionGeo_CountryCode = 'IR';
    

    Want even more accuracy?

    You can filter based on confidence scores:

    • NumMentions > X → Only count events reported multiple times.
    • AvgTone < 0 → Focus on negative-toned events (protests usually have a negative tone).