mysql text summarization

How to select part of a text in MySQL?


I have a column stored as LONGTEXT in MySQL that holds rich text. Currently I read the whole text and then trim it in JavaScript to the first 100 characters, taking care not to split a word in the middle.

This doesn't seem like the best way to do it, though. I want to select a summary directly in the query, and I also want to be careful not to include characters that belong to HTML tags.

The function below seems to work fine for stripping the HTML.

SET GLOBAL log_bin_trust_function_creators=1;
DROP FUNCTION IF EXISTS fnStripTags;
DELIMITER |
CREATE FUNCTION fnStripTags( Dirty varchar(4000) )
RETURNS varchar(4000)
DETERMINISTIC 
BEGIN
  DECLARE iStart, iEnd, iLength int;
  WHILE Locate( '<', Dirty ) > 0 And Locate( '>', Dirty, Locate( '<', Dirty )) > 0 DO
    BEGIN
      SET iStart = Locate( '<', Dirty ), iEnd = Locate( '>', Dirty, Locate('<', Dirty ));
      SET iLength = ( iEnd - iStart) + 1;
      IF iLength > 0 THEN
        BEGIN
          SET Dirty = Insert( Dirty, iStart, iLength, '');
        END;
      END IF;
    END;
  END WHILE;
  RETURN Dirty;
END;
|
DELIMITER ; 

Solution

  • Part of the solution is to select the text with the tags stripped.

    This MySQL function works like PHP's strip_tags:

    DROP FUNCTION IF EXISTS htmlStrip;
    DELIMITER |
    CREATE FUNCTION htmlStrip(pmXml LONGTEXT) RETURNS LONGTEXT
    DETERMINISTIC
    htmlStrip:
    BEGIN
        DECLARE vStart INTEGER;
        DECLARE vEnd INTEGER;
        DECLARE vResult LONGTEXT;
        DECLARE vCount1 INTEGER;
        DECLARE vCount2 INTEGER;

        SET vResult := pmXml;
        -- Reject input whose '<' and '>' counts differ (unbalanced tags).
        SET vCount1 := LENGTH(vResult) - LENGTH(REPLACE(vResult, '<', ''));
        SET vCount2 := LENGTH(vResult) - LENGTH(REPLACE(vResult, '>', ''));
        IF vCount1 <> vCount2 THEN
            RETURN 'Input Error';
        END IF;

        -- Remove one tag per pass. Searching for '>' only after the '<'
        -- avoids looping forever when a '>' appears before the first '<'.
        WHILE LOCATE('<', vResult) > 0 AND LOCATE('>', vResult, LOCATE('<', vResult)) > 0 DO
            SET vStart := LOCATE('<', vResult);
            SET vEnd := LOCATE('>', vResult, vStart);
            SET vResult := REPLACE(vResult, SUBSTRING(vResult, vStart, vEnd - vStart + 1), '');
        END WHILE;
        RETURN vResult;
    END;
    |
    DELIMITER ;
    
    SELECT htmlStrip('<html>hello<body> how r u?</body></html>') AS Result;
    
    Result
    --------
    hello how r u?
    

    So to get the summary you need to combine SUBSTRING with this strip_tags-style function.
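
    As a rough sketch of that combination, the query below strips the tags first and then cuts the plain text to at most 100 characters without splitting a word. The table name `articles` and the columns `id` and `body` are hypothetical placeholders, not names from the question; the 100-character limit comes from the question. It assumes words are separated by single spaces.

```sql
-- Hypothetical table/columns: articles(id, body). Adjust to your schema.
SELECT
    id,
    CASE
        -- Short text: return it unchanged.
        WHEN CHAR_LENGTH(plain) <= 100 THEN plain
        -- No space in the first 101 characters: fall back to a hard cut.
        WHEN LOCATE(' ', REVERSE(LEFT(plain, 101))) = 0 THEN LEFT(plain, 100)
        -- Otherwise cut just before the last space within the first 101
        -- characters, so a word that ends exactly at character 100 is kept.
        ELSE LEFT(plain, 101 - LOCATE(' ', REVERSE(LEFT(plain, 101))))
    END AS summary
FROM (
    SELECT id, htmlStrip(body) AS plain
    FROM articles
) AS t;
```

    Note that htmlStrip returns 'Input Error' when the '<' and '>' counts differ, so such rows would show that string as their summary. The function only removes tags; HTML entities such as &nbsp; are left in the text as they are.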