Search code examples
mysqlsqlhierarchical-datarecursive-query

Retrieving Tree in Hierarchical data in MySQL


I have stored some hierarchical data of categories where each category is related to others, the trick is a single category can have multiple parents (Maximum 3, Minumum 0).

The table structures are:

category table id - Primary Key

name - Name of the Category

ref_id - Reference ID that is being used for relationship

id name ref_id
1 everything -1
2 computing 0
3 artificial intelligence 1
4 data science 2
5 machine learning (ML) 3
6 programming 4
7 web technologies 5
8 programming languages 7
9 content technologies 8
10 operating systems 9
11 algorithms 10
12 software development systems 102

category_relation table

id child_ref_id parent_ref_id
1 0 -1
2 1 0
3 2 0
4 3 1
5 3 2
6 4 102
7 5 0
8 7 4
9 8 0
10 9 0
11 10 0
12 10 4
13 102 0

relationship diagram

as you can see in the diagram, the relationship is pretty complicated, algorithms has two parents computing and programming, similarly machine learning (ML) also has two parents artificial intelligence and data science

How can I retrieve all the children of a specific category, e.g. computing, I need to retrieve all the children till the third level, i.e. programming languages and algorithms.

MySQL dump of the database: https://github.com/codersrb/multi-parent-hierarchy/blob/main/taxonomy.sql


Solution

  • Assuming the data structure is fixed with a good PK, in MySQL 8.x you can do:

    with recursive
    n (id, name, ref_id, lvl) as (
      select id, name, ref_id, 1 from category where id = 2 -- starting node
     union all
      select c.id, c.name, c.ref_id, n.lvl + 1
      from n
      join category_relation r on r.parent_ref_id = n.ref_id
      join category c on c.ref_id = r.child_ref_id
    )
    select * from n where lvl <= 3
    

    Result:

     id   name                                    ref_id  lvl 
     ---- --------------------------------------- ------- --- 
     2    computing                               0       1   
     3    artificial intelligence                 1       2   
     4    data science                            2       2   
     7    web technologies                        5       2   
     9    content technologies                    8       2   
     10   operating systems                       9       2   
     11   algorithms                              10      2   
     62   information science                     61      2   
     103  software / systems development          102     2   
     165  scientific computing                    165     2   
     296  image processing                        316     2   
     297  text processing                         317     2   
     301  Google                                  321     2   
     322  computer vision                         343     2   
     5    machine learning (ML)                   3       3   
     5    machine learning (ML)                   3       3   
     6    programming                             4       3   
     18   models                                  17      3   
     21   classification                          20      3   
     27   data preparation                        26      3   
     28   data analysis                           27      3   
     29   imbalanced datasets                     28      3   
     50   visualization                           49      3   
     61   information retrieval                   60      3   
     68   k-means                                 67      3   
     71   Random Forest algorithm                 70      3   
     104  project management                      103     3   
     105  software development methodologies      104     3   
     107  web development                         106     3   
     113  kNN model                               112     3   
     132  CRISP-DM methodology                    131     3   
     143  data                                    142     3   
     153  SMOTE                                   153     3   
     154  MSMOTE                                  154     3   
     157  backward feature elimination            157     3   
     158  forward feature selection               158     3   
     176  deep feature synthesis (DFS)            177     3   
     196  unsupervised learning                   197     3   
     210  mean-shift                              211     3   
     212  DBSCAN                                  213     3   
     246  naïve Bayes algorithm                   247     3   
     248  decision tree algorithm                 249     3   
     249  support vector machine (SVM) algorithm  250     3   
     251  neural networks                         252     3   
     252  artificial neural networks (ANN)        253     3   
     281  deep learning                           300     3   
     281  deep learning                           300     3   
     285  image classification                    304     3   
     285  image classification                    304     3   
     286  natural language processing (NLP)       305     3   
     286  natural language processing (NLP)       305     3   
     288  text representation                     307     3   
     294  visual recognition                      314     3   
     295  optical character recognition (OCR)     315     3   
     295  optical character recognition (OCR)     315     3   
     296  image processing                        316     3   
     298  machine translation (MT)                318     3   
     299  speech recognition                      319     3   
     300  TensorFlow                              320     3   
     302  R                                       322     3   
     304  Android                                 324     3   
     322  computer vision                         343     3   
     323  object detection                        344     3   
     324  instance segmentation                   345     3   
     325  edge detection                          346     3   
     326  image filters                           347     3   
     327  feature maps                            348     3   
     328  stride                                  349     3   
     329  padding                                 350     3   
     335  text preprocessing                      356     3   
     336  tokenization                            357     3   
     337  case normalization                      358     3   
     338  removing punctuation                    359     3   
     339  stop words                              360     3   
     340  stemming                                361     3   
     341  lemmatization                           362     3   
     342  Porter algorithm                        363     3   
     350  word2vec                                371     3   
     351  Skip-gram                               372     3   
     364  convnets                                385     3   
     404  multiplicative update algorithm         716     3   
    

    If you want to remove duplicates you can use DISTINCT. For example:

    with recursive
    n (id, name, ref_id, lvl) as (
      select id, name, ref_id, 1 from category where id = 2 -- starting node
     union all
      select c.id, c.name, c.ref_id, n.lvl + 1
      from n
      join category_relation r on r.parent_ref_id = n.ref_id
      join category c on c.ref_id = r.child_ref_id
    )
    select distinct * from n where lvl <= 3
    

    See running example at DB Fiddle.