Unicode search and replace in Word — decimal and hex

I’ve got a document where the superscripts have been put in using actual superscripted numbers from elsewhere in the Unicode character set, not as Word superscripts. For example, if I highlight one of the characters and hit ‘Alt-x’, I don’t get what I might expect.

The Unicode value for ‘6’ is 0036, whereas highlighting the raised 6 in my document and hitting ‘Alt-x’ gives 2076. 2076 is the hex Unicode value for a raised, little 6. There are special characters for all the digits (and for some, or maybe all, letters too).

The full set is:

Digit  Expected (hex)  Expected (dec)  Actual (hex)  Actual (dec)
0      30              48              2070          8304
1      31              49              00B9          185
2      32              50              00B2          178
3      33              51              00B3          179
4      34              52              2074          8308
5      35              53              2075          8309
6      36              54              2076          8310
7      37              55              2077          8311
8      38              56              2078          8312
9      39              57              2079          8313
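As a cross-check on the table, the same values can be produced with a few lines of Python instead of HEX2DEC() in Excel. This is only a sketch: the names SUPERSCRIPTS and superscript_table are made up for illustration, and the hex codes are zero-padded to four digits, the way Alt-x shows them.

```python
# Superscript digits 0-9, written as escapes so the code points are explicit.
# 1, 2 and 3 sit in the Latin-1 range; the rest are in U+2070..U+2079.
SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"


def superscript_table():
    """Return rows of (digit, plain hex, plain dec, superscript hex, superscript dec)."""
    rows = []
    for digit, sup in enumerate(SUPERSCRIPTS):
        plain = ord(str(digit))   # code point of the ordinary digit
        raised = ord(sup)         # code point of the superscript digit
        rows.append((digit, f"{plain:04X}", plain, f"{raised:04X}", raised))
    return rows


if __name__ == "__main__":
    for row in superscript_table():
        print(*row)
```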

Older fonts used to have special characters for superscript 1, 2 and 3 for doing powers and a few footnotes and things on screens that were not WYSIWYG and could not actually raise the character (think a VT100 or similar). At some point the rest of the superscript digits were added elsewhere in the character set, hence the non-contiguous numbering. It’s rather like how real metal typefaces would have to have separately designed superscript characters. And from a design point of view, a number designed to be used in superscript may well look better than a ‘normal’ character raised and shrunk.

So I’m not complaining about the existence of these characters, but I am combing through a document checking to see if the footnotes and references are contiguously numbered, and I can’t search for the cross references/citations, which makes the job tedious and error-prone.
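If the document text can be saved out as plain text, the numbering check itself could in principle be done outside Word. The following is a rough sketch under some big assumptions: the text contains each reference marker once, in order, and has no other superscript digits (no m² and the like). The function names are hypothetical. Note that Python's int() does not accept superscript digits directly, hence the translation table.

```python
import re

# Superscript digits 0-9 and a table mapping them back to ordinary digits.
SUPERSCRIPTS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
TO_PLAIN = str.maketrans(SUPERSCRIPTS, "0123456789")


def superscript_numbers(text):
    """Return every run of superscript digits in text as an int, in order of appearance."""
    runs = re.findall(f"[{SUPERSCRIPTS}]+", text)
    return [int(run.translate(TO_PLAIN)) for run in runs]


def first_gap(numbers):
    """Return (expected, found) where the numbering first departs from 1, 2, 3..., or None."""
    for expected, found in enumerate(numbers, start=1):
        if found != expected:
            return expected, found
    return None


if __name__ == "__main__":
    sample = "First claim\u00b9 and second claim\u00b2, then a jump\u2074."
    print(superscript_numbers(sample))
    print(first_gap(superscript_numbers(sample)))
```

This only reports the first break in the sequence; it does nothing about fixing the document, which still has to happen in Word.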

It’s relatively easy to get Word to search for a Unicode character. Except these codes are clearly not decimal (00B9, say). Alt-x provides the code in hex (or inserts the character after typing its hex code), but Word searches for it using a decimal Unicode value. Well done Microsoft! The decimal Unicode (and ASCII) value for ‘6’ is 54. If I highlight ‘6’ in a Word doc and press Alt-x to find the code for it I get 0036, but I have to use ^u0054 to search for ‘6’. How stupid is that? (This search works with or without using wildcards.) That is why my table above has decimal values as well (I used HEX2DEC() in Excel).
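The hex-to-decimal step can also be sketched in Python rather than Excel. The function name word_find_code is made up for illustration; it takes the hex code as Alt-x reports it and returns the ^u search string. The zero-padding to four digits simply mirrors the ^u0054 example above, and I have not checked whether Word actually needs it.

```python
def word_find_code(alt_x_hex):
    """Turn a hex code as shown by Alt-x (e.g. '2076') into a ^u search string."""
    value = int(alt_x_hex, 16)   # hex text -> integer code point
    return f"^u{value:04d}"      # decimal, padded to match the ^u0054 example


if __name__ == "__main__":
    # Hex codes of the ten superscript digits, taken from the table above.
    for code in ["2070", "00B9", "00B2", "00B3", "2074",
                 "2075", "2076", "2077", "2078", "2079"]:
        print(code, word_find_code(code))
```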

So now I’m going to search for, say, ^u8313 and replace with superscripted 9. Perhaps this could be more automated, but there are just the 10 possible digits, so it’s easy enough to do 10 replacements.

Windows search and replace dialogue, searching for Unicode decimal code and replacing with formatted text.

Press Ctrl-H to bring up this dialogue.

To use a Unicode code in the ‘Replace with’ box, the simplest thing is to enter the character into the document (or a scratch space), then copy it from the doc into the ‘Replace with’ window; the ^u notation will not work in the replace window. An ungenerous soul would say search and replace in Word is broken by design. I’d never say such a thing. Although see here. Things like this can be automated using Word macros, but that seems like a pretty heavy tool for what should be a routine task. The process would be much simplified and hit the 80:20 rule if the behaviour of Alt-x and search were harmonised, and if the ^u notation could be used in the ‘Replace with’ box.

Just my 2¢

About Darren

I'm a scientist by training, based in Australia.

2 responses to “Unicode search and replace in Word — decimal and hex”

  1. Cole G Frankel says :

    awesome, super interesting. Thank you!! I love using Word F&R, and I’ve just started to scratch the surface on character codes. All started when I saw someone use ^013 to stand in for paragraph marks in wildcard mode…
