Unicode search and replace in Word — decimal and hex
I’ve got a document where the superscripts have been put in using actual superscripted numbers from elsewhere in the Unicode character set, not as Word superscripts. For example, if I highlight one of the characters and hit ‘Alt-x’, I don’t get what I might expect.
The Unicode value for ‘6’ is 0036, whereas highlighting the superscript ‘⁶’ and hitting ‘Alt-x’ gives 2076. 2076 is the hex Unicode value for a small, raised 6. There are special characters for all the digits (and some, or maybe all, letters too).
The full set is:
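|Digit||Expected (hex)||Expected (dec)||Actual (hex)||Actual (dec)|
|0||0030||48||2070||8304|
|1||0031||49||00B9||185|
|2||0032||50||00B2||178|
|3||0033||51||00B3||179|
|4||0034||52||2074||8308|
|5||0035||53||2075||8309|
|6||0036||54||2076||8310|
|7||0037||55||2077||8311|
|8||0038||56||2078||8312|
|9||0039||57||2079||8313|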
Older fonts used to have special characters for superscript 1, 2 and 3 for doing powers and a few footnotes and things on screens that were not WYSIWYG and could not actually raise the character (think a VT100 or similar). At some point the rest of the character set was included, hence the non-contiguous numbering. It’s rather like how real metal typefaces would have to have separately designed superscript characters. And from a design point of view, a number designed to be used in superscript may well look better than a ‘normal’ character raised and shrunk.
So I’m not complaining about the existence of the numbers, but I am combing through a document checking to see if the footnotes and references are consecutively numbered, and I can’t search for the cross references/citations, so it’s making the job tedious and error-prone.
It’s relatively easy to get Word to search for a Unicode character, except that the codes Alt-x reports are clearly not decimal (00B9, say). Alt-x provides the code in hex (or inserts the character after you type its hex code), but Word’s search expects a decimal Unicode value. Well done Microsoft! The decimal Unicode (and ASCII) value for ‘6’ is 54. If I highlight ‘6’ in a Word doc and press Alt-x to find its code I get 0036, but I have to use ^u0054 to search for ‘6’. How stupid is that? (This search works with or without using wildcards.) That is why my table above has decimal values as well (I used HEX2DEC() in Excel).
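For anyone without Excel to hand, the hex-to-decimal step can be sketched in a few lines of Python. This is only an illustration: the function name `word_search_code` is made up here, and padding the decimal value to four digits is an assumption that simply mirrors the ^u0054 form used above.

```python
def word_search_code(alt_x_hex: str) -> str:
    """Convert the hex code shown by Alt-x (e.g. '2076') into a ^u search code.

    Hypothetical helper: Word's search wants the decimal value, so parse the
    string as base 16 and print it as base 10. The zero-padding to four digits
    is an assumption copied from the ^u0054 example in the text.
    """
    return f"^u{int(alt_x_hex, 16):04d}"


if __name__ == "__main__":
    # An ordinary '6', then the superscript 6 and superscript 1
    for code in ("0036", "2076", "00B9"):
        print(code, "->", word_search_code(code))
```

This does the same job as HEX2DEC() in Excel, with the ^u prefix added so the result can be pasted straight into the ‘Find what’ box.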
So now I’m going to search for, say, ^u8313 and replace with superscripted 9. Perhaps this could be more automated, but there are just the 10 possible digits, so it’s easy enough to do 10 replacements.
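The ten find/replace pairs can be listed mechanically rather than worked out by hand. The following is a rough sketch in Python, not something Word runs itself; `SUPERSCRIPTS` and `replacement_plan` are names invented for this illustration, and the four-digit padding again just follows the ^u0054 form used earlier.

```python
# The ten Unicode superscript digits, in order 0-9.
# Note that 1, 2 and 3 live at 00B9, 00B2 and 00B3, not in the 207x block.
SUPERSCRIPTS = "⁰¹²³⁴⁵⁶⁷⁸⁹"


def replacement_plan():
    """Return (find_code, plain_digit) pairs for the ten manual replacements.

    Hypothetical helper: ord() gives the decimal code point, which is what
    the ^u search notation expects.
    """
    return [(f"^u{ord(ch):04d}", str(digit))
            for digit, ch in enumerate(SUPERSCRIPTS)]


if __name__ == "__main__":
    for find_code, digit in replacement_plan():
        print(f"Find {find_code}, replace with a superscripted {digit}")
```

Each pair is one pass through the Replace dialog: the code goes in ‘Find what’, and the plain digit goes in ‘Replace with’ with superscript formatting applied.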
To use a Unicode code in the ‘Replace with’ box, the simplest thing is to enter the character into the document (or a scratch space), then copy it from the doc into the ‘Replace with’ window; the ^u notation will not work in the replace window. An ungenerous soul would say search and replace in Word is broken by design. I’d never say such a thing. Although see here. Things like this can be automated using Word macros, but that seems like a pretty heavy tool for what should be a routine task. The process would be much simplified and hit the 80:20 rule if the behaviour of Alt-x and search were harmonised, and if the ^u notation could be used in the ‘Replace with’ box.