I have an AutoHotkey script that looks up a word in a bilingual dictionary when I double-click any word on a webpage. If I double-click something like "l'homme", the clipboard receives "l'" as well as "homme". I want the script to strip out everything up to and including the apostrophe.
I can't get AutoHotkey to match the apostrophe. The sample script below prints the character codes of the first four characters on the clipboard. If I double-click "l'homme" on this page, it prints 108,8217,104,111. The second character is clearly not the ASCII code for an apostrophe (39). I suspect it has something to do with the HTML representation of an apostrophe, but I haven't been able to get to the bottom of it. I've tried the `HTML` sub-command of AutoHotkey's `Transform` without any luck.
I've tried both the Unicode and non-Unicode versions of AutoHotkey, and I've saved the script as UTF-8.
```autohotkey
#Persistent
return

OnClipboardChange:
; debugging info: show the character codes of the first four characters
c1 := Asc(SubStr(clipboard, 1, 1))
c2 := Asc(SubStr(clipboard, 2, 1))
c3 := Asc(SubStr(clipboard, 3, 1))
c4 := Asc(SubStr(clipboard, 4, 1))
MsgBox 0, info, char1: %c1% `nchar2: %c2% `nchar3: %c3% `nchar4: %c4%

; the line below is what I want to use, but it doesn't find a match
stripToApostrophe := RegExReplace(clipboard, ".*’")
return
```
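There is the standard straight apostrophe ' (U+0027, character code 39), and there is the "curly" typographic apostrophe ’ (U+2019, character code 8217, which is the value your debug output shows).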
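To confirm which two characters are involved, here is a quick check in Python. It is used only because it exposes Unicode names directly; this is not AutoHotkey code. It prints the code point and the official Unicode name of each apostrophe.

```python
import unicodedata

straight = "'"      # the ASCII apostrophe
curly = "\u2019"    # the character the question's debug MsgBox reported as 8217

for ch in (straight, curly):
    # decimal code point, hex code point, Unicode character name
    print(ord(ch), hex(ord(ch)), unicodedata.name(ch))
# 39 0x27 APOSTROPHE
# 8217 0x2019 RIGHT SINGLE QUOTATION MARK
```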
Your regex might have to be
.*['’]
to cover both cases.
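Maybe you'd like to make it non-greedy, too, if a word can have more than one apostrophe and you only want to remove up to the first one. Because `RegExReplace` replaces every match by default, also anchor the pattern to the start of the string. Otherwise the non-greedy pattern just matches again after the first apostrophe:
^.*?['’]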
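Here is a small sketch of the greedy vs. non-greedy difference. It is written in Python because, for these simple patterns, `re.sub` behaves like AutoHotkey's `RegExReplace`: both replace every match unless told otherwise. The two-apostrophe word is a made-up example.

```python
import re

word = "qu'aujourd\u2019hui"  # hypothetical word: one straight and one curly apostrophe

# Greedy: .* runs to the LAST apostrophe.
greedy = re.sub(r".*['\u2019]", "", word)
# Non-greedy but unanchored: removes "qu'", then matches again and removes "aujourd’".
unanchored_lazy = re.sub(r".*?['\u2019]", "", word)
# Non-greedy and anchored with ^: can only match once, at the start of the string.
anchored_lazy = re.sub(r"^.*?['\u2019]", "", word)

print(greedy)           # hui
print(unanchored_lazy)  # hui
print(anchored_lazy)    # aujourd’hui
```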
EDIT:
Interesting. I tried this:
```autohotkey
w1 := "l’homme"   ; curly apostrophe
w2 := "l'homme"   ; straight apostrophe
c1 := Asc(SubStr(w1, 2, 1))
c2 := Asc(SubStr(w2, 2, 1))
v1 := RegExReplace(w1, ".*?['’]")
v2 := RegExReplace(w2, ".*?['’]")
MsgBox 0, info, %c1% - %c2% - %v1% - %v2%
return
```
and got back `146 - 39 - homme - homme`. I'm editing in Notepad. Is it possible that, although we think we're typing character 8217 into our regex, the file actually stores it as 146 once it's pasted in?
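One plausible explanation for the 146 assumes that Notepad saved the file as "ANSI", which is Windows-1252 on Western-European systems. In that code page ’ is the single byte 0x92 (146), while in UTF-8 it is three bytes. The Python sketch below shows both encodings and what the UTF-8 bytes look like if they are misread as Windows-1252. Whether AutoHotkey actually reads your script that way depends on the build and on whether the file has a BOM, so treat this as a hypothesis to check.

```python
curly = "\u2019"  # ’ RIGHT SINGLE QUOTATION MARK

cp1252_bytes = curly.encode("cp1252")  # "ANSI" save: one byte
utf8_bytes = curly.encode("utf-8")     # UTF-8 save: three bytes

print(list(cp1252_bytes))  # [146]
print(list(utf8_bytes))    # [226, 128, 153]

# If the UTF-8 bytes are read back as Windows-1252, one character becomes three,
# so a literal ’ in the regex would no longer be the character on the clipboard.
print(utf8_bytes.decode("cp1252"))  # â€™
```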
EDIT:
Apparently Unicode support is only available in AutoHotkey_L. Using it, I believe the correct regex should be either
".*?[\x{0027}\x{0092}\x{2019}]"
or
".*?(" Chr(0x0027) "|" Chr(0x0092) "|" Chr(0x2019) ")"
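The \x{0092} entry presumably covers the case where byte 146 is decoded as Latin-1 (ISO-8859-1) rather than Windows-1252. That decoding yields the control character U+0092 instead of U+2019. As a sketch under that assumption, this Python check confirms that a class containing all three code points strips the prefix in each case. The pattern mirrors the first regex above.

```python
import re

# Python spelling of the class [\x{0027}\x{0092}\x{2019}]
apostrophe_prefix = re.compile(r".*?[\u0027\u0092\u2019]")

samples = [
    "l'homme",                        # straight apostrophe, U+0027
    "l\u2019homme",                   # curly apostrophe, U+2019
    b"l\x92homme".decode("latin-1"),  # byte 146 read as Latin-1 -> U+0092
]
stripped = [apostrophe_prefix.sub("", s) for s in samples]
print(stripped)  # ['homme', 'homme', 'homme']
```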