I have a script which assembles an xml document via string manipulation (which I wrote before I discovered the XML Suite).
When certain characters are included, such as £, – (en dash) and — (em dash) (I suspect all non-ASCII characters), they're replaced with the Unicode replacement character � (U+FFFD).
This only happens when there is an XML header at the start of the document, i.e. <?xml. Making any change at all to this header fixes the problem and writes what I would expect to the file. My assumption is that AppleScript is trying to parse the string as XML, but I want it to pass through as a plain string.
I'm writing in JXA, but have included the AppleScript equivalent, as I think the issue is with OSA and there are likely more AppleScript users!
Edit: OK, this is more of an encoding issue, I guess. Reading the file as UTF-8 (which the XML I'm generating should be) results in the replacement character, but Western or Mac Roman displays the characters correctly. UTF-8 definitely supports these characters though, so I'm not sure of the best way to move forward.
Edit 2: Just to be clear, I think what's happening is that the non-ASCII characters are being encoded in something other than UTF-8, which makes my XML output invalid. How can I get AppleScript or JXA to encode non-ASCII characters as UTF-8?
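To illustrate the suspicion in the edits, here is a minimal sketch in plain JavaScript (for Node, not JXA, since it relies on TextEncoder and TextDecoder). It assumes the file ends up holding £ as the single byte 0xA3, which is its Mac Roman code; that is a guess consistent with Mac Roman displaying the file correctly, not something I have confirmed byte by byte.

```javascript
// Sketch only: shows why a non-UTF-8 byte reads back as U+FFFD.
// Assumption: "£" was written as the single byte 0xA3,
// which is its code in Mac Roman (and in Latin-1).
const legacyBytes = new Uint8Array([0xa3]);

// The same character encoded as UTF-8 takes two bytes.
const utf8Bytes = new TextEncoder().encode("£");

// A non-fatal UTF-8 decoder substitutes U+FFFD for invalid sequences.
const decoder = new TextDecoder("utf-8");
const fromLegacy = decoder.decode(legacyBytes);
const fromUtf8 = decoder.decode(utf8Bytes);

console.log(Array.from(utf8Bytes, (b) => b.toString(16)).join(" ")); // c2 a3
console.log(fromLegacy === "\uFFFD"); // true
console.log(fromUtf8 === "£"); // true
```

A lone 0xA3 is a continuation byte with no lead byte, so it is not valid UTF-8 on its own, which would explain the replacement character when the file is read as UTF-8.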
AppleScript
set dt to path to desktop as text
set filePath to dt & "test1.txt"
-- Sample text, mirroring the JXA example below
set text1 to "<?xml £"
writeTextToFile(text1, filePath, true)
-- using the example handler from the Mac Automation Scripting Guide
on writeTextToFile(theText, theFile, overwriteExistingContent)
try
-- Convert the file to a string
set theFile to theFile as string
-- Open the file for writing
set theOpenedFile to open for access file theFile with write permission
-- Clear the file if content should be overwritten
if overwriteExistingContent is true then set eof of theOpenedFile to 0
-- Write the new content to the file
write theText to theOpenedFile starting at eof
-- Close the file
close access theOpenedFile
-- Return a boolean indicating that writing was successful
return true
-- Handle a write error
on error
-- Close the file
try
close access file theFile
end try
-- Return a boolean indicating that writing failed
return false
end try
end writeTextToFile
JavaScript for Automation
var app = Application.currentApplication()
app.includeStandardAdditions = true
function writeTextToFile(text, file, overwriteExistingContent) {
try {
// Convert the file to a string
var fileString = file.toString()
// Open the file for writing
var openedFile = app.openForAccess(Path(fileString), { writePermission: true })
// Clear the file if content should be overwritten
if (overwriteExistingContent) {
app.setEof(openedFile, { to: 0 })
}
// Write the new content to the file
app.write(text, { to: openedFile, startingAt: app.getEof(openedFile) })
// Close the file
app.closeAccess(openedFile)
// Return a boolean indicating that writing was successful
return true
}
catch(error) {
try {
// Close the file
app.closeAccess(file)
}
catch(error) {
// Report the error if closing failed
console.log(`Couldn't close file: ${error}`)
}
// Return a boolean indicating that writing failed
return false
}
}
var text = "<?xml £"
var file = Path("/Users/benfrearson/Desktop/text.txt")
writeTextToFile(text, file, true)
In AppleScript, you’d use write theText to theFile as «class utf8» to write UTF-8-encoded text. You can’t do that in JXA, as there’s no way to write raw AE codes.
I generally recommend against JXA as it’s 1. buggy and crippled, and 2. abandoned. If you like JavaScript in general you’re far better off with Node. For application automation you’re best sticking to AppleScript: while it’s a crappy language and also moribund, at least it speaks Apple events right and has half-decent documentation and community support.
If you must use JXA, the only workaround is to write your UTF-8 file via the Cocoa APIs instead. That said, generating XML via string-mashing is evil and bug-prone anyway, so you’d probably do as well to take the opportunity to rewrite your code to use a proper XML API. (Again, with Node you’re spoiled for choice, and the hardest part will be figuring out which NPM libraries are robust and easy to use and which are junk. With AS/JXA, it’s either System Events’ XML Suite, which is slow, or Cocoa’s XML APIs, which are complex.)
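For the Cocoa route, a rough sketch might look like the following. It is JXA-only (run under osascript -l JavaScript, where $ and ObjC are the Objective-C bridge), and writeUtf8File is a hypothetical helper name chosen for illustration, not part of any API. Treat it as a starting point rather than a tested drop-in.

```javascript
// JXA only: `$` and `ObjC` exist under `osascript -l JavaScript`, not in Node.
// writeUtf8File is a hypothetical helper name used for illustration.
function writeUtf8File(text, posixPath) {
  ObjC.import("Foundation");
  // Bridge the JavaScript string to an NSString.
  var nsText = $.NSString.alloc.initWithUTF8String(text);
  // Write the string atomically, encoded as UTF-8.
  // The last argument is the NSError out-parameter, ignored here.
  return nsText.writeToFileAtomicallyEncodingError(
    posixPath,
    true,
    $.NSUTF8StringEncoding,
    null
  );
}
```

Calling writeUtf8File("<?xml £", "/path/to/out.xml") with a full POSIX path should return true on success. Passing null for the error argument keeps the sketch short, at the cost of not learning why a failed write failed.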