Search code examples
postgresqlbusiness-logicdata-layersapplication-layer

Using Postgres' external procedural languages over application code


I am trying to figure out the advantages and disadvantages of using non-plpgsql procedural languages (PL/Python, PL/Perl, PL/v8, etc.) to implement data manipulation logic on the database level instead of going up to model level/ORM of the application framework that interacts with the database (Rails, Entity Framework, Django, etc.) and implementing it there.

To give a concrete example, say, I have a table that contains Mustache templates, and I want to have them "rendered" somehow. Table definition:

create table templates (
  id serial primary key,
  content text not null,
  data jsonb not null
);

Usually I would go the model code and add and extra method to render the template. Example in Rails:

class Template < ApplicationRecord
  def rendered
    Mustache.render(content, data)
  end
end

However, I could also write a PL/Python function that would do just that but on the database level:

create or replace function fn_mustache(template text, data jsonb)
returns text 
language plpython3u
as $$
  import chevron
  import json
  return chevron.render(template, json.loads(data))
$$;

create view v_templates as 
  select id, content, data, fn_mustache(content, data) as rendered
    from templates;

This yields virtually the same result functionality-wise. This example is very basic, yet the idea is to use PL/Python (or others) to manipulate the data in a more advanced manner than PL/pgsql can allow for. That is, PL/pgsql does not have the same amount of libraries that any generic programming language provides today (in the example am relying on implementations of Mustache templating system which would not be practical to implement in PL/pgsql in this case). I obviously would not use PL/Python for any sort of networking or other OS-level features, but for operations exclusively on data this seems like a decent approach (change my mind).

Points that I can observe so far:

  • PL/Python is an "untrusted" language which I guess makes it by definition more dangerous to write a function in since you have access to syscalls; at least it feels like the cost of messing up a PL/Python function is higher than that of the mistake on the application layer, since the former is executed in the context of the database
  • Database approach is more extensible since I am working on the level that is the closest to the data, i.e. I am not scattering the presentation logic across multiple "tiers" (ORM and DB in this case). This means that if I need some other external service interested in interacting with the data, I can plug it directly into the database, bypassing the application layer.
  • Implementing this on model level just seems much simpler in execution
  • Supporting the application code variant seems easier as well since there are less concepts to keep in mind

What are the other advantages and disadvantages of these two approaches? (e.g. performance, maintainability)


Solution

  • You are wondering whether to have application logic inside the database or not. This is to a great extent a matter of taste. In the days of yore, the approach to implement application logic in database functions was more popular, but today it is usually frowned upon.

    Extreme positions in this debate are

    • The application is implemented in the database to the extent that the database functions produce the HTML code that is sent to the client.

    • The database is just a dumb collection of tables with no triggers or constraints beyond a primary key, and the application tries to maintain data integrity.

    The best solution is typically somewhere in the middle, but where is largely a matter of taste. You see that this is a typical opinion-based question. However, let me supply some arguments that help you make a decision.

    Points speaking against application logic in the database:

    • It makes it more difficult to port to another database.

    • It is more complicated to develop and debug database functions than client code. For example, you won't have as advanced debugging tools.

    • That database machine has to perform not only the normal database workload, but also the application code workload. But databases are harder to scale than application servers (you can't just spin up a second database to handle part of the workload).

    • PostgreSQL-specific: all database functions run inside a single database transaction, so you cannot implement functionality that requires more complicated transaction management.

    Points speaking for application logic in the database:

    • It becomes easier to port to another application server or client programming language.

    • Less data has to be transferred between client and server, which can make processing more efficient.

    • The software stack becomes shorter and the overall software architecture simpler.

    My personal opinion is that anything that has to do with basic data integrity should be implemented in the database:

    • Have foreign keys and check constraints in the database. The application will of course also respect these rules (no point in triggering a database error), but it is good for data integrity to have a safety net.

    • If you have to keep redundant information in the database, use triggers to make sure that all copies of a datum are kept synchronized. This implicitly makes use of transactional atomicity.

    • Anything that is more complicated is best done in the application. Be wary of database functions that are very long or complicated. Exceptions can be made for performance reasons: perhaps some complicated report could not easily be written in pure SQL, and shipping all the raw data to the client is prohibitively expensive.