Search code examples
ms-wordmergefield

Truncating a mergefield in Microsoft Word for mailmerge


I'm looking to create a formula and apply to a mergefield in Word 2013. I have no access to the db and only given merge fields. My goal is truncate/shorten certain mergefields. E.g. Full Name to just initial, Age [27 years] to short age [27].

Access and excel have the formula 'left' which i've tried to use with no success. There seems to be a low more options available for numbers.

{=left({ MERGEFIELD First_Name },1)}

However this gives a syntax error. Is there a list of formulas that work for mergefield?

Outcome

'Steven' -> 'S'

'27 Years' -> '27'


Solution

  • The short answer to your question is that there is nothing in the Word field language that can reliably do string manipulation such as left(), mid() and so on. The {=} field only has numeric functions such as SUM, ABS, PRODUCT and so on. There are unreliable approaches, and in some cases they may be reliable enough for your requirement, but that really depends on how sure you can be that the data source will always contain values formatted as you expect.

    As a simple example, let's take the "27 years" thing.

    If every value in the relevant data source column is in the same general format, which I will describe as "something that Word recognises as an individual number, followed by an alpha string", then you can in fact use

    { SET dat { MERGEFIELD age } }{ =dat }
    

    Notice that in that case, if you are merging to a new document, { =dat } fields will remain in the output, and updating those fields will cause errors. You can avoid that by nesting either the { =dat } field or the all the fields in a QUOTE field:

    { QUOTE "{ SET dat { MERGEFIELD age } }{ =dat }" }
    
    { SET dat { MERGEFIELD age } }{ QUOTE { =dat } }
    

    However, if your data source field could contain a value such as

    4 years 2 months
    

    then this will not work because in that case { =dat } will evaluate to 6, not 2. Word will also evaluate anything that looks like an { = } field expression, e.g. if your data source contains

    SUM(23,25)
    

    then { =dat } will evaluate to 48. There are further oddities that I will not describe now.

    The simplest unreliable approach to extracting the first letter from a field is to use a large number of IF fields to test for every possible initial letter, e.g.

    { IF "{ MERGEFIELD First_Name }" = "A*" "A" }{ IF "{ MERGEFIELD First_Name }" = "B*" "B" } etc.
    

    If you don't need to distinguish between lower and upper case you can use

    { IF "{ MERGEFIELD First_Name \*Upper }" = "A*" "A" } etc.
    

    That's OK if you know (for example) that the names can only start with A-Z,a-z (and you could obviously test for 0-9 etc. as well. But what if it could start with any Unicode letter? Not sure that inserting thousands of IF fields is a reliable approach.

    There is a similarly "unreliable" - and resource-consuming - way to use functions such as left, mid etc., as long as you are using recent versions of Windows Word (not Mac Word).

    What you can do is create a completely empty Access/Jet database .mdb (let's say it is at c:\i\i.mdb, then insert a DATABASE field nested in a QUOTE field like this

    { QUOTE { DATABASE \d "c:\\i\\i.mdb" \s "SELECT left('{ MERGEFIELD First_Name }',1)" } }
    

    Normally, a DATABASE field inserts a Word table (unless the data source has more columns than a Word table can contain), but when you only insert a single value with no headings, Word does not put the value in a cell. Unfortunately, these days Word does add a paragraph mark, but nesting the DATABASE field inside a QUOTE field appears to remove that again.

    So why is that "unreliable"? Well, the main reason is if the First_Name field contains any quotation marks (certainly single-quotation marks, and OTTOMH I think double quotation marks) then the query that Word sends to Jet will look like this like this

    SELECT left('a name containing a ' mark'),1)
    

    and Jet will return a syntax error.

    There are other problems with the DATABASE field approach, including

    • Word restricts the SELECT statement to 255 characters (I think). If your data source filed causes the SLEECT statement length to exceed that, Jet will return an error.
    • You have to put the database somewhere. If you are just using this merge yourself, that may not be a problem, but if you have to distribute the Word document etc. for others to use, you also have to ensure they have the .mdb and that it's at the specified location.
    • Word sometimes gets confused between a Mail Merge data source and a data source introduced via a DATABASE field.
    • Even one DATABASE field will execute a query for every record in the data source. If you use this technique in several places, a very large number of queries will be issued. That could cause problems.

    As far as "single letter extraction" is concerned, there is another approach, rather similar to the DATABASE one, that uses an external .XML file and a set of INCLUDETEXT fields to specify a node in the file and return its content. But there are also similar difficulties. I may modify this Answer to describe that approach at some point, but as far as I know it has never been used in a real-world scenario.

    So what if you need something more reliable? Well, there are several approaches, but all of them suffer from shortcomings of one kind or another. The main approaches I know are:

    1. use Word VBA and the OpenDataSource method to open the data source. That allows you to specify a query in the SQL dialect understood by the data source.
    2. use a Query/View defined in an intermediate database to extract the data items you need, and use that Query/View as your data source
    3. Use Word VBA's MailMerge Events to manipulate the data for each record in the data source as Word processes the mailmerge
    4. use a manual intermediate step
    5. (more drastic) ditch Word MailMerge and find another approach altogether, e.g. create a .docx using .NET, the relevant database provider, and the Office Open XML SDK

    If you are creating this merge for use by other people, two side-effects of all those approaches is that the overall process becomes more complicated or unfamiliar for the user, and in particular, they may not be able to use Word's facilities for data source record filtering and so on. Another issue that some people encounter is that if your database contains long text fields/memo fields longer than 255 characters, they have a tendency to be truncated by Jet whenever you do something much more complicated than the default "SELECT * FROM TABLE"

    (1) requires that you can write a suitable query to get the columns you need from your data source. Because the query is executed using OLE DB you don't actually need to create any permanent objects in your database. So it may be a viable approach as long as the backend database allows you to execute external queries. But Word also imposes a 255 or 511 character limit on the query, so if you have to manipulate a lot of fields or the functions you need are complicated, you may find that you exceed the character limit quite quickly.

    (2) is rather similar to (1) but may allow you to specify a much more complex query. For example, if your data source is a Jet .accdb, you may be able to create your own .accdb and define a query in that that accesses the tables in the .accdb that you are not allowed to modify. You might either used "linked tables" to achieve that, or in certain cases you can specify the locations of the underlying tables/queries in the SQL.

    (3) means that you use VBA to intercept Word as it processes each data source record. I leave you to research that. You have to control the process from VBA to ensure that the MailMerge events are invoked. There have been reports of various unreliabilities. VBA can only access the first 255 characters of any memo fields.

    (4), e.g. you create an Excel workbook and use it to query the database. In that case you may be able to issue a much longer SQL query than you can in Word, and you may be able to create new Excel columns that manipulate the data using excel formulas. (I have never tried that, though). Then use that as your data source.

    Finally, a web search should reveal a list of functions recognised by Word's "=" field, but recent Microsoft documentation tends to omit the IF() function. The ISO29500 documents on the .docx standard omit it as well, but I think that was not the intention and may be fixed in a future version of the standard. The functions are:

    ABS, AND, AVERAGE, COUNT, DEFINED, FALSE, IF, INT, MIN, MAX, MOD, NOT, OR, PRODUCT, ROUND, SUM, TRUE.