Search code examples
elixirecto

How to represent this query in Elixir's Ecto Query format?


I'm having trouble "translating" the following SQL query into a valid Ecto query:

SELECT *
FROM file_modules 
WHERE file_id =
  (SELECT f.file_id
  FROM files AS f
  LEFT JOIN file_modules AS fm ON f.file_id = fm.file_id
  WHERE f.storage_id = '20:1:0:86d:1591:c89c:512:de52' AND fm.name = 'bruteforce'
  ORDER BY f.version DESC
  LIMIT 1)

I believe the way to go is to split this query into two. Here's what I've tried:

q1 = 
  File
  |> where([f], f.storage_id == ^storage_id)
  |> join(:left, [f], fm in FileModule, f.file_id == fm.file_id)
  |> where([..., fm], fm.name == ^module_name)
  |> order_by([..., fm], desc: fm.version)
  |> select([fm], fm.file_id)
  |> limit(1)

# q1 works and returns the expected file_id

q2 =
  q1
  |> subquery()
  |> where([fm, fm2], fm.file_id == fm2.file_id)  # Here's where I'm stuck
  |> preload([..., m], [modules: m])

q1 has the following Ecto.Query result:

#Ecto.Query<from f0 in Helix.Software.Model.File,
 left_join: f1 in Helix.Software.Model.FileModule, on: f0.file_id == f1.file_id,
 where: f0.storage_id == ^(storage_id),
 where: f1.name == ^:bruteforce, order_by: [desc: f1.version], limit: 1,
 select: f0.file_id>

My problem seems to be with q2. How should I format it in order to use the results from q1 as input?

Thanks in advance.


I believe this question is agnostic to my underlying schema, but here it is:

schema "files" do
  field :file_id, ID,
    primary_key: true

  field :name, :string
  field :path, :string
  field :software_type, Constant
  field :file_size, :integer
  field :storage_id, Storage.ID

  field :crypto_version, :integer

  field :full_path, :string

  belongs_to :type, SoftwareType,
    foreign_key: :software_type,
    references: :software_type,
    define_field: false
  belongs_to :storage, Storage,
    foreign_key: :storage_id,
    references: :storage_id,
    define_field: false

  has_many :modules, FileModule,
    foreign_key: :file_id,
    references: :file_id,
    on_replace: :delete

  timestamps()
end



schema "file_modules" do
  field :file_id, File.ID,
    primary_key: true
  field :name, Constant,
    primary_key: true
  field :version, :integer

  belongs_to :file, File,
    foreign_key: :file_id,
    references: :file_id,
    define_field: false,
    on_replace: :update
end

As mentioned by Jablanović on the comments, it's possible that the ORM abstraction fails sometimes. In that case, would it be possible to use the raw sql above while:

1) casting the File.ID and Storage.ID types, in order to avoid ugly string concatenation?

2) After returning the result, preloading or saving this result into the schema?

The interface I have in mind is something like:

q = "SELECT * FROM files WHERE file_id = $1", ^file_id
file = Repo.get(q) |> Repo.preload()

file.file_id  # Returns file id

As per the examples of Ecto.Adapters.SQL.query/4, it looks like I can achieve 1. How about 2?

My end goal would be to use the nested SQL query above, while casting file_id and storage_id safely, and using the %File{} schema correctly, with a loaded association at file.modules.

Note that a stored procedure or view would give me a better interface, which I could cast correctly using fragment, but I feel I'd still have problems with "preloading" that data into the schema.


Solution

  • Subquerys are currently not allowed in where clauses; the documentation recommends using a JOIN instead.

    I haven't tested it but the following query should produce the same results as your original SQL:

    q1 = ... # Same as in the question.
    q2 =
      FileModule
      |> join(:left, [fm], fid in subquery(q1), fm.file_id == fid)
      |> select([fm], fm)