Search code examples
sqlpostgresqlrelationshipdistinct-on

Use DISTINCT ON with empty n:n relations


I'm a new user of PostgreSQL, trying to use DISTINCT ON but I can't reach my goal.

Here's a brief sketch of my database :

  • files with versioning
  • fields with model (for form generation purpose)
  • n:n relations between files' versions and fields

enter image description here

I would like to retrieve a whole set of fields for a specified file's version.
My problem is that we could have (and we'll) empty values, ie. missing FileVersion_Field relations. I'll try to give you an example bellow :

FileVersion                            Field                                
+----------------+---------+---------+ +----------+-------+---------------+ 
| id_fileversion | id_file | version | | id_field | value | id_fieldmodel | 
+----------------+---------+---------+ +----------+-------+---------------+ 
| 1              | 1       | 1       | | 1        | Smith | 1             | 
| 2              | 1       | 2       | | 2        | 20    | 2             | 
+----------------+---------+---------+ | 3        | 25    | 2             | 
                                       +----------+-------+---------------+

FileVersion_Field             FieldModel
+----------------+----------+ +---------------+------+
| id_fileversion | id_field | | id_fieldmodel | type |
+----------------+----------+ +---------------+------+
| 1              | 1        | | 1             | Name |
| 1              | 2        | | 2             | Age  |
| 2              | 3        | +---------------+------+
+----------------+----------+ 

In this example, I would like to get these results:

-- id_file=1 & version=1
Name | Smith
Age  | 20
-- id_file=1 & version=2
Name | 
Age  | 25

Here's what I've tried, which doesn't work :

SELECT DISTINCT ON(FieldModel.id_fieldmodel) *
FROM File
LEFT JOIN FileVersion ON File.id_file = FileVersion.id_file
LEFT JOIN FileVersion_Field ON FileVersion.id_fileversion = FileVersion_Field.id_fileversion
LEFT JOIN Field ON FileVersion_Field.id_field = Field.id_field
RIGHT JOIN FieldModel ON (Field.id_fieldmodel = FieldModel.id_fieldmodel OR FieldModel.id_fieldmodel IS NULL)
WHERE (FieldModel.id_fieldmodel IS NOT NULL AND FileVersion.version = 2 AND File.id_file = 1)
   OR (Field.id_fieldmodel IS NULL)
ORDER BY FieldModel.id_fieldmodel;

-- Sample Structure
CREATE TABLE File (
  id_file integer PRIMARY KEY);
CREATE TABLE FieldModel (
  id_fieldmodel integer PRIMARY KEY, type varchar(50));
CREATE TABLE FileVersion (
  id_fileversion integer PRIMARY KEY, 
  id_file integer, version integer, 
  CONSTRAINT fk_fileversion_file FOREIGN KEY(id_file) REFERENCES File(id_file));
CREATE TABLE Field (
  id_field integer PRIMARY KEY, 
  id_fieldmodel integer, 
  value varchar(255), 
  CONSTRAINT fk_field_fieldmodel FOREIGN KEY(id_fieldmodel) REFERENCES FieldModel(id_fieldmodel));
CREATE TABLE FileVersion_Field (
  id_fileversion integer, 
  id_field integer, 
  PRIMARY KEY(id_fileversion, id_field), 
  CONSTRAINT fk_fileversionfield_fileversion FOREIGN KEY(id_fileversion) REFERENCES FileVersion(id_fileversion), 
  CONSTRAINT fk_fileversionfield_field FOREIGN KEY(id_field) REFERENCES Field(id_field));

-- Sample Data
INSERT INTO File (id_file) VALUES (1);
INSERT INTO FileVersion (id_fileversion, id_file, version) VALUES (1, 1, 1), (2, 1, 2);
INSERT INTO FieldModel (id_fieldmodel, type) VALUES (1, 'Name'), (2, 'Age');
INSERT INTO Field (id_field, id_fieldmodel, value) VALUES (1, 1, 'Smith'), (2, 2, '20'), (3, 2, '25');
INSERT INTO FileVersion_Field (id_fileversion, id_field) VALUES (1, 1), (1, 2), (2, 3);

Solution

  • 7 years later, time to exorcize my daemons!

    I just needed to change my way of thinking.

    1. First, we need the list of all used FieldModel for a File, whatever the version:
    SELECT DISTINCT(fm.id_fieldmodel), fm.type
    FROM FieldModel fm
    LEFT JOIN Field f ON fm.id_fieldmodel = f.id_fieldmodel
    LEFT JOIN FileVersion_Field fvf ON f.id_field = fvf.id_field
    LEFT JOIN FileVersion fv ON fv.id_fileversion = fvf.id_fileversion
    WHERE fv.id_file = 1;
    
    --  id_fieldmodel | type
    -- ---------------+------
    --              1 | Name
    --              2 | Age
    
    1. Now, we need the list of Field for the same File, but this time with a specified version:
    SELECT f.id_fieldmodel, f.value
    FROM FileVersion_Field fvv
    JOIN FileVersion fv ON fv.id_fileversion = fvv.id_fileversion
    JOIN Field f ON f.id_field = fvv.id_field
    WHERE fv.id_file = 1 AND fv.version = 2;
    
    --  id_fieldmodel | value
    -- ---------------+-------
    --              2 | 25
    
    1. All that remains is to use a LEFT JOIN on both computed tables, by allowing NULL values in the fields:
    SELECT fm.type, f.value
    FROM (
      SELECT DISTINCT(fm.id_fieldmodel), fm.type
      FROM FieldModel fm
      LEFT JOIN Field f ON fm.id_fieldmodel = f.id_fieldmodel
      LEFT JOIN FileVersion_Field fvf ON f.id_field = fvf.id_field
      LEFT JOIN FileVersion fv ON fv.id_fileversion = fvf.id_fileversion
      WHERE fv.id_file = 1
    ) fm 
    LEFT JOIN (
      SELECT f.id_fieldmodel, f.value
      FROM FileVersion_Field fvv
      JOIN FileVersion fv ON fv.id_fileversion = fvv.id_fileversion
      JOIN Field f ON f.id_field = fvv.id_field
      WHERE fv.id_file = 1 AND fv.version = 2
    ) f ON (f.id_fieldmodel = fm.id_fieldmodel OR f.id_fieldmodel IS NULL);
    
    --  type | value
    -- ------+-------
    --  Name |
    --  Age  | 25