Search code examples
mysqllaravelquery-optimizationdatabase-performance

Should I use another column to show whether LONGTEXT contains data?


I have a doubt (probably very basic) when it comes to know if a field has been stored with information in the database.

I'm working with Laravel and MySQL as DB engine.

The situation is as follows:

I have a field in the database that stores information (in this case it is a LONGTEXT field with a large amount of information in it). This field stores information in an automated way (by means of a CRON).

When listing the information related to the records of that table, I need to know if the field in question contains information or not.

At first I had thought of including another field (column) in the same table that tells me if the field is empty or not. Although I consider that this would be a correct way to do it, on the other hand I think that I could save this column by simply checking if the field in question is empty or not. However, I'm not sure if this would be the right way to do it and if this could affect the performance of the application (I don't know how MySQL does exactly this comparison or if it could be optimised by making use of the new field).

I hope I have explained myself correctly. Schematically, the options are:

Option 1:

  • Have a single field (very large amount of information).
  • When obtaining the list with the records, check in the corresponding search if the field in question contains information.

Option 2:

  • Have two fields: one of them contains the information and the other is a boolean that indicates if the first one contains information.
  • When obtaining the list of records, look at the boolean.

The aim of the question is to use good practices as well as to optimise both the search and minimise the impact on the performance of the process.

Thank you very much in advance.


Solution

  • It takes extra work for MySQL to retrieve the contents of a LONGTEXT or any other BLOB / CLOB column. You'll incur that work even if your query says.

    SELECT id FROM tbl WHERE longtext IS NOT NULL   /* slow! */
    

    or

    SELECT id FROM tbl WHERE CHAR_LENGTH(longtext) >= 1   /* slow! */
    

    So, yes, you should also use another column to indicate whether the LONGTEXT column is populated if you need to run a lot of queries like that.

    You could consider using a generated -- virtual --- colum like this for the purpose.

    textlen BIGINT GENERATED ALWAYS AS (CHAR_LENGTH(longtext)) STORED
    

    The generated column will get its value at the time you INSERT or UPDATE the row. Then WHERE textlen >= 1 will be fast. You can even put an index on it.

    Go for the length rather than the Boolean value. It doesn't take significantly more space, it gives you slightly more information, and it gives you a nice way to sanity-check your logic during testing.