
How to prevent DeepL from translating df [TEXT] placeholders in HTML text?


I have many data frame (df) placeholders such as [COMPANY] in my HTML text file, and I want to exclude them when DeepL translates the text. I use the DeepL Java library with the API, and I am not allowed to change the data frame format.

Any idea how to exclude df [TEXT] placeholders from translation?

Example text:

Dear client,

Please find enclosed [EVENT] for the order you wish to execute for your account [ACCOUNT_NAME_TEXT].


Kind regards,

[COMPANY_NAME]

HTML file:

<!DOCTYPE html>
<html>
    <head>
    </head>
    <body>
        <p>Dear client,</p>
        <p>Please find enclosed the Events for the order you wish to execute for your account [ACCOUNT_NAME_TEXT].</p>
        <p>&#160;</p>
        <p>Kind regards,</p>
        <p>[COMPANY_NAME]</p>
    </body>
</html>

Solution

  • For now, I solved it by wrapping each df [TEXT] in an ignore tag before translating and converting it back to the original format afterwards. See the methods below; they may help someone with the same requirement.

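      // Imports required by this snippet (DeepL Java library):
      import com.deepl.api.DeepLException;
      import com.deepl.api.Formality;
      import com.deepl.api.SentenceSplittingMode;
      import com.deepl.api.TextResult;
      import com.deepl.api.TextTranslationOptions;
      import java.util.ArrayList;
      import java.util.List;

      // Custom tag that wraps each placeholder so DeepL leaves its content untranslated.
      private static final String IGNORE_TAG_NAME = "loveIgnoreTag";
      private static final String BEGIN_IGNORE_TAG = "<" + IGNORE_TAG_NAME + ">";
      private static final String END_IGNORE_TAG = "</" + IGNORE_TAG_NAME + ">";

      // `translator` is assumed to be a com.deepl.api.Translator field of the enclosing class.
      public String translate( String source, String target, String text )
              throws DeepLException, InterruptedException
      {
          // https://www.deepl.com/docs-api/xml/ignored-tags/
          List<String> ignoreTags = new ArrayList<>( );
          ignoreTags.add( IGNORE_TAG_NAME );

          // Turn [PLACEHOLDER] into <loveIgnoreTag>PLACEHOLDER</loveIgnoreTag>.
          text = parseToIgnoreTag( text );

          TextTranslationOptions translationOptions = new TextTranslationOptions( )
                  .setTagHandling( "xml" )
                  .setFormality( Formality.PreferMore )
                  .setPreserveFormatting( true )
                  .setIgnoreTags( ignoreTags )
                  .setSentenceSplittingMode( SentenceSplittingMode.All );

          TextResult result = translator.translateText( text, source, target, translationOptions );

          // Restore the original [PLACEHOLDER] form in the translated text.
          return parseToDataFrame( result.getText( ) );
      }

      // Replaces every "[" and "]" in the text, so it assumes brackets occur only in placeholders.
      private String parseToIgnoreTag( String text )
      {
          return text.replace( "[", BEGIN_IGNORE_TAG ).replace( "]", END_IGNORE_TAG );
      }

      private String parseToDataFrame( String result )
      {
          return result.replace( BEGIN_IGNORE_TAG, "[" ).replace( END_IGNORE_TAG, "]" );
      }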
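
The replace-based helpers above convert every square bracket in the text, so a stray "[" or "]" that is not part of a placeholder would also be turned into a tag. A possible refinement, shown here only as a sketch, is to match whole placeholders with a regular expression. The class name PlaceholderGuard and the pattern are hypothetical: the pattern assumes placeholders consist of upper-case letters, digits and underscores, as in the example text, and would need adjusting for other formats. The sketch does not call DeepL; it only covers the wrapping and unwrapping steps.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the DeepL library.
class PlaceholderGuard {
    // Must match the tag name passed to setIgnoreTags.
    static final String TAG = "loveIgnoreTag";

    // Assumption: a placeholder looks like [COMPANY_NAME] (upper-case, digits, underscores).
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\[([A-Z][A-Z0-9_]*)\\]");

    // Matches a placeholder that was wrapped by wrap().
    private static final Pattern WRAPPED =
            Pattern.compile("<" + TAG + ">([A-Z][A-Z0-9_]*)</" + TAG + ">");

    // [NAME] -> <loveIgnoreTag>NAME</loveIgnoreTag>; other brackets are left alone.
    static String wrap(String text) {
        return PLACEHOLDER.matcher(text).replaceAll("<" + TAG + ">$1</" + TAG + ">");
    }

    // <loveIgnoreTag>NAME</loveIgnoreTag> -> [NAME]
    static String unwrap(String text) {
        return WRAPPED.matcher(text).replaceAll("[$1]");
    }

    public static void main(String[] args) {
        String html = "<p>See note [1] for account [ACCOUNT_NAME_TEXT].</p>";
        String wrapped = wrap(html);
        System.out.println(wrapped);          // only [ACCOUNT_NAME_TEXT] is wrapped
        System.out.println(unwrap(wrapped));  // identical to the input
    }
}
```

In the translate method, wrap would be applied to the text before the DeepL call and unwrap to the translated result, in place of the two replace-based helpers.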