Search code examples
mysqlsqlsequelpro

How do I get the first 160 characters from a SQL query and remove HTML?


I've looked around to see if I could find a solution that directly addresses my question, but I can't seem to find it.

I'm querying my old DB and creating a new table that I can import into my new DB. I want to create a new field named news_releases_summary that is derived from news_releases_body, but only with the first 160 characters (for SEO meta desc) and ideally no HTML.

My current query looks like this. The function I'm having problems with is LTRIM RTRIM - I'm not even sure it's the right function for this job? * Note that I also need to remove \n\t because the final output will not validate as JSON if I don't do this.

SELECT  
FROM_UNIXTIME(exp_channel_titles.entry_date,'%m/%e/%Y %h:%i %p') AS entry_date,
exp_channel_titles.status, 
exp_channel_titles.title, 
exp_channel_data.field_id_66 AS news_releases_subtitle, 
exp_channel_data.field_id_65 AS news_releases_contact, 
REPLACE(replace(exp_channel_data.field_id_67,char(10),''),char(13),'') AS news_releases_body
LTRIM(RTRIM(REPLACE(replace(exp_channel_data.field_id_67,char(10),''),char(13),''))) AS news_releases_summary
FROM exp_channel_data INNER JOIN exp_channel_titles 
ON exp_channel_data.entry_id = exp_channel_titles.entry_id 
WHERE exp_channel_data.channel_id = '21' 
ORDER BY exp_channel_titles.entry_date

UPDATE

This is where I am right now, but I'm getting an error "FUNCTION ee_2.strip_tags does not exist". Should I be including strip_tags function directly in the query? I'm using Sequel Pro on a Mac.

CREATE FUNCTION `strip_tags`($str text) RETURNS text
BEGIN
    DECLARE $start, $end INT DEFAULT 1;
    LOOP
        SET $start = LOCATE("<", $str, $start);
        IF (!$start) THEN RETURN $str; END IF;
        SET $end = LOCATE(">", $str, $start);
        IF (!$end) THEN SET $end = $start; END IF;
        SET $str = INSERT($str, $start, $end - $start + 1, "");
    END LOOP;
END;

SELECT  
FROM_UNIXTIME(exp_channel_titles.entry_date,'%m/%e/%Y %h:%i %p') AS entry_date,
exp_channel_titles.status, 
exp_channel_titles.title, 
exp_channel_data.field_id_66 AS news_releases_subtitle, 
exp_channel_data.field_id_65 AS news_releases_contact, 
REPLACE(REPLACE(exp_channel_data.field_id_67,char(10),''),char(13),'') AS news_releases_body,
LEFT(strip_tags(REPLACE(REPLACE(exp_channel_data.field_id_67,char(10),''),char(13),'')),160)     AS news_releases_summary 
FROM exp_channel_data INNER JOIN exp_channel_titles 
ON exp_channel_data.entry_id = exp_channel_titles.entry_id 
WHERE exp_channel_data.channel_id = '21' 
ORDER BY exp_channel_titles.entry_date

Solution

  • You should be able to use nested replace calls to strip undesired newline characters; there are questions on stack exchange that deal with this already, though they use the literal \n and \r rather than using the char function.

    For stripping HTML entities there's not an easy way of doing it without rolling your function to do the business. Thankfully, someone has done that for you as well.

    Finally, if you only want 160 characters in the result, use the left function to limit the result to the leftmost 160 characters.