Search code examples
mysqlsqljoinduplicatesaggregate-functions

How to get rid of duplicates with multiple-table joins and aggregate function in MySQL?


When joining three tables, I get the desired values, but when I join more, the values begin to duplicate.

I understand what's going on (each currency is recorded in each category), but I don't know how to write a query to get rid of it.

Here are examples of tables: volunteers categories volunteer_category currencies volunteer_currency

When i write a query:

SELECT DISTINCT v.id, v.name, v.type, v.logo, v.description, JSON_ARRAYAGG(c.category) AS categories 
    FROM volunteers v 
      INNER JOIN volunteer_category vc ON v.id = vc.volunteer_id 
      INNER JOIN categories c ON vc.category_id = c.id
    GROUP BY v.id

The result looks good:

id: 1,
name: "Serhiy Prytula",
type: "Charity foundation of",
logo: "https://scontent.fdnk6-2.fna.fbcdn.net/v/t39.30808-1/294616431_145621521453151_1955718552327534606_n.jpg?stp=cp0_dst-jpg_p40x40&_nc_cat=107&ccb=1-7&_nc_sid=1eb0c7&_nc_ohc=fxJFxGLTc_QAX8_9ZS9&_nc_ht=scontent.fdnk6-2.fna&oh=00_AfAjiJWqlm46HddMDALnVOqeuHmqoCGI6pgiM6lAdLKKxg&oe=636D0B6E",
description: "Owner is a well-known Ukrainian TV presenter, actor, stand-up comedian, author and co-producer of several TV shows since 1998. Former resident of Comedy Club Ukraine. Serhiy Prytula has been helping the army as a private volunteer since Russia began its occupation of Crimea and Donbas. At first, he helped with equipment, armor, and food. Subsequently, he began to provide units with high-precision equipment, drones and equipment for command posts.",
categories: "["vehicles", "surveillance", "communication", "weapons", "mechanical_repair", "training", "humanitarian_aid", "housing"]"
},

But when I add currency to the query:

SELECT DISTINCT v.id, v.name, v.type, v.logo, v.description, JSON_ARRAYAGG(c.category) AS categories, JSON_ARRAYAGG(cur.code) AS currencies 
    FROM volunteers v 
      INNER JOIN volunteer_category vc ON v.id = vc.volunteer_id 
      INNER JOIN categories c ON vc.category_id = c.id
      INNER JOIN volunteer_currency vcur ON v.id = vcur.volunteer_id
      INNER JOIN currencies cur ON vcur.currency_id = cur.id
    GROUP BY v.id

The result is:

{
id: 1,
name: "Serhiy Prytula",
type: "Charity foundation of",
logo: "https://scontent.fdnk6-2.fna.fbcdn.net/v/t39.30808-1/294616431_145621521453151_1955718552327534606_n.jpg?stp=cp0_dst-jpg_p40x40&_nc_cat=107&ccb=1-7&_nc_sid=1eb0c7&_nc_ohc=fxJFxGLTc_QAX8_9ZS9&_nc_ht=scontent.fdnk6-2.fna&oh=00_AfAjiJWqlm46HddMDALnVOqeuHmqoCGI6pgiM6lAdLKKxg&oe=636D0B6E",
description: "Owner is a well-known Ukrainian TV presenter, actor, stand-up comedian, author and co-producer of several TV shows since 1998. Former resident of Comedy Club Ukraine. Serhiy Prytula has been helping the army as a private volunteer since Russia began its occupation of Crimea and Donbas. At first, he helped with equipment, armor, and food. Subsequently, he began to provide units with high-precision equipment, drones and equipment for command posts.",
categories: "["vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "vehicles", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "surveillance", "communication", "communication", "communication", "communication", "communication", "communication", "communication", "communication", "communication", "communication", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "weapons", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "mechanical_repair", "training", "training", "training", "training", "training", "training", "training", "training", "training", "training", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "humanitarian_aid", "housing", "housing", "housing", "housing", "housing", "housing", "housing", "housing", "housing", "housing"]",
currencies: "["uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth", "uah", "eur", "usd", "pln", "czk", "gbp", "chf", "btc", "usdt", "eth"]"
}

I want to remove duplicates and optimize the query because I will be joining more tables.

I also tried using JSON_ARRAYAGG in INNER JOIN but don't know how to unpack json in ON clause

INNER JOIN (SELECT volunteer_id, JSON_ARRAYAGG(currency_id) AS currency_id FROM volunteer_currency GROUP BY volunteer_id) AS vcur ON v.id = vcur.volunteer_id
INNER JOIN currencies cur ON ???

Solution

  • You already know whats happening. You join all categories and all currencies to a volunteer, so for a volunteer with two categories and two currencies you get four rows of unrelated category-currency pairs. You then aggregate your data per volunteer, but that data is already flawed.

    Instead, join the category array and the currency array to a volunteer:

    SELECT
      v.id, v.name, v.type, v.logo, v.description,
      cat.categories,
      cur.currencies
    FROM volunteers v 
    INNER JOIN
    (
      SELECT vc.volunteer_id, JSON_ARRAYAGG(c.category) AS categories
      FROM volunteer_category vc
      INNER JOIN categories c ON c.id = vc.category_id
      GROUP BY vc.volunteer_id
    ) cat ON cat.volunteer_id = v.id
    INNER JOIN
    (
      SELECT vc.volunteer_id, JSON_ARRAYAGG(c.code) AS currencies 
      FROM volunteer_currency vc
      INNER JOIN currencies c ON c.id = vc.currency_id
      GROUP BY vc.volunteer_id
    ) cur ON cur.volunteer_id = v.id
    ORDER BY v.id;
    

    And if there can be volunteers without a catergory or without a currency and you still want to show them, change your inner joins to left outer joins.