
# Unicode search and replace in Word — decimal and hex

I’ve got a document where the superscripts have been put in using actual superscripted numbers from elsewhere in the Unicode character set, not as Word superscripts. For example, if I highlight one of the characters and hit ‘Alt-x’, I don’t get what I might expect.

The Unicode value for ‘6’ is 0036, whereas highlighting a superscript ⁶ and hitting ‘Alt-x’ gives 2076, the hex Unicode value for a raised, little 6. There are special characters for all the digits (and some or maybe all letters, too).

The full set is:

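| Digit | Expected (hex) | Expected (dec) | Actual (hex) | Actual (dec) |
|---|---|---|---|---|
| 0 | 30 | 48 | 2070 | 8304 |
| 1 | 31 | 49 | 00B9 | 185 |
| 2 | 32 | 50 | 00B2 | 178 |
| 3 | 33 | 51 | 00B3 | 179 |
| 4 | 34 | 52 | 2074 | 8308 |
| 5 | 35 | 53 | 2075 | 8309 |
| 6 | 36 | 54 | 2076 | 8310 |
| 7 | 37 | 55 | 2077 | 8311 |
| 8 | 38 | 56 | 2078 | 8312 |
| 9 | 39 | 57 | 2079 | 8313 |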

Older fonts used to have special characters for superscript 1, 2 and 3 for doing powers and a few footnotes and things on screens that were not WYSIWYG and could not actually raise the character (think a VT100 or similar). At some point the rest of the superscript digits were added elsewhere in the character set, hence the non-contiguous numbering. It’s rather like how real metal typefaces had to have separately designed superscript characters. And from a design point of view, a number designed to be used in superscript may well look better than a ‘normal’ character raised and shrunk. So I’m not complaining about the existence of these characters. But I am combing through a document checking whether the footnotes and references are numbered contiguously, and I can’t search for the cross references/citations by typing ordinary digits, so the job is tedious and error-prone.

It’s relatively easy to get Word to search for a Unicode character, except that these codes are clearly not decimal (00B9, say). Alt-x gives the code in hex (or inserts the character after you type its hex code), but Word searches using a decimal Unicode value. Well done, Microsoft! The decimal Unicode (and ASCII) value for ‘6’ is 54. If I highlight ‘6’ in a Word doc and press Alt-x to find its code I get 0036, but I have to use ^u0054 to search for ‘6’. How stupid is that? (This search works with or without wildcards.) That is why my table above has decimal values as well (I used HEX2DEC() in Excel).
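The hex-to-decimal step can be scripted too. Here is a minimal Python 3 sketch that does the same job as HEX2DEC(). It turns each Alt-x hex code from the table into a ^u search code, zero-padded to four digits to match the ^u0054 form above (the padding is my choice). It also asks Python's `unicodedata` module to confirm which digit each character represents. The function names are illustrative only.

```python
import unicodedata

# Hex code points for superscript 0-9, as reported by Alt-x (see the table above)
SUPERSCRIPT_HEX = ["2070", "00B9", "00B2", "00B3", "2074",
                   "2075", "2076", "2077", "2078", "2079"]


def word_search_code(hex_code):
    """Turn an Alt-x hex code into a decimal ^u code for Word's Find box."""
    # int(..., 16) does the same job as Excel's HEX2DEC()
    return "^u{:04d}".format(int(hex_code, 16))


def superscript_table():
    """Rows of (digit, hex code, ^u search code, Unicode character name)."""
    rows = []
    for digit, hex_code in enumerate(SUPERSCRIPT_HEX):
        char = chr(int(hex_code, 16))
        # Sanity check: Unicode itself records which digit this character is
        assert unicodedata.digit(char) == digit
        rows.append((digit, hex_code, word_search_code(hex_code),
                     unicodedata.name(char)))
    return rows


for row in superscript_table():
    print(*row)
```

The first line printed is `0 2070 ^u8304 SUPERSCRIPT ZERO`, which matches the first row of the table.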

So now I’m going to search for, say, ^u8313 and replace it with an ordinary 9 formatted as superscript. Perhaps this could be more automated, but there are only 10 possible digits, so it’s easy enough to do 10 replacements.

Press Ctrl-H to bring up the Find and Replace dialogue.

To put a Unicode character in the ‘Replace with’ box, the simplest thing is to enter the character into the document (or a scratch space), then copy it from the doc into the ‘Replace with’ box; the ^u notation does not work there. An ungenerous soul would say search and replace in Word is broken by design. I’d never say such a thing. Although see here. Things like this can be automated using Word macros, but that seems a pretty heavy tool for what should be a routine task. The process would be much simpler, and would satisfy the 80:20 rule, if the behaviour of Alt-x and search were harmonised and the ^u notation could be used in the ‘Replace with’ box.
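The underlying job, checking that footnote markers run in sequence, can also be done outside Word. Here is a minimal Python 3 sketch. It assumes the document has first been saved as plain text, and the function names are hypothetical. It converts each run of superscript digits to an ordinary number with `str.translate`, then lists every place where the numbering does not go up by exactly one.

```python
import re

# Superscript 0-9 in order, from the table above
SUPERSCRIPT_DIGITS = "\u2070\u00b9\u00b2\u00b3\u2074\u2075\u2076\u2077\u2078\u2079"
TO_PLAIN = str.maketrans(SUPERSCRIPT_DIGITS, "0123456789")
# One footnote marker = a run of one or more superscript digits, e.g. ¹²
MARKER = re.compile("[" + SUPERSCRIPT_DIGITS + "]+")


def plain_digits(text):
    """Replace every superscript digit with its ordinary counterpart."""
    return text.translate(TO_PLAIN)


def footnote_numbers(text):
    """All superscript markers in the text, in order, as integers."""
    return [int(plain_digits(marker)) for marker in MARKER.findall(text)]


def out_of_sequence(numbers):
    """(previous, current) pairs where the numbering does not step by one."""
    return [(a, b) for a, b in zip(numbers, numbers[1:]) if b != a + 1]


sample = "First claim\u00b9, second\u00b2, then a gap\u2074 and\u00b9\u2070."
print(footnote_numbers(sample))                   # [1, 2, 4, 10]
print(out_of_sequence(footnote_numbers(sample)))  # [(2, 4), (4, 10)]
```

Ordinary digits that are merely *formatted* as superscript in Word lose that formatting in a plain-text export. This check therefore only catches the special Unicode characters.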

Just my 2¢

# Simple use of Word regular expressions — remove commas from numbers

Note to self:

Our house style says we should use nonbreaking spaces in numbers of five or more digits, not commas. That is, a number like 13,456 should be 13 456.

If I get a document with lots of commas in, this is what I do in Word:

Wildcard replace
(1) Turn off track changes (sad but true: a long-standing bug Micro$oft have no interest in fixing).

(2) Search for numbers of the form XX,XXX. The regular expression for this is: ([0-9]{2}),([0-9]{3})

(3) Replace with the first half, a nonbreaking space, then the second half, that is: \1^s\2

(4) Step through in case of accidental fits to the pattern.

(5) Turn track changes back on if appropriate.

Yes, you cannot track changes while doing this.

So what’s going on? The part of the match inside the first set of parentheses is stored in variable 1, the second in variable 2. You get at these values by prepending a backslash. It’s a bit like the $ sign in a bash shell, if that helps, or % on the Windows/DOS command line: it returns the value of the symbol. With ‘Use wildcards’ checked in the Find and Replace box, [0-9] finds any digit, and [0-9]{2} finds exactly two digits in a row. Wrapping that in parentheses makes sure the match is stored in variable 1. Then I want to match a comma, but it’s not in parentheses, so I’m not storing it. Then I want to match exactly three digits in a row and put them into variable 2.

Ctrl-H brings up this dialogue box in Word. Note what’s checked.

OK. Now, I want to replace that with the value of variable 1 (\1) then a nonbreaking space (^s) then the value of variable 2 (\2). And that’s it.
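As a rough cross-check, here is the same search and replace written with Python's `re` module. This is Python's regular expression syntax, not Word's wildcards, although this particular pattern happens to read the same in both. Word's ^s becomes the no-break space character U+00A0. The function name is illustrative.

```python
import re

NBSP = "\u00a0"  # the no-break space that Word's ^s inserts

# Two digits, a comma, three digits; groups 1 and 2 are kept, the comma is not
PATTERN = re.compile(r"([0-9]{2}),([0-9]{3})")


def space_thousands(text):
    """Replace the comma in XX,XXX with a no-break space."""
    return PATTERN.sub(r"\1" + NBSP + r"\2", text)


print(repr(space_thousands("13,456 and 2,500")))
# '13\xa0456 and 2,500'  (four-digit numbers are left alone)
print(repr(space_thousands("1,234,567")))
# '1,234\xa0567'  (an accidental fit: why step 4 says to step through)
```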

Wordy mac wordface.

# Manipulate multiple selections in Word — GUI and macro

Note to self: Well, it’s not flexible, but it’s useful. Styling documents in Word — that is, making sure they are rigorously styled rather than formatted in an ad hoc way — is a pain.

I wanted to search for a specific bit of text and format it in a specific way. Problem: something was embedded in the Word doc such that highlighting the text and selecting the style did not work. I had to select ‘Clear All’ in the Styles pane before I could apply the style I wanted.

Now, can you make multiple selections via the Search tool and then process them all at once? Yes.

(1) Press Ctrl-H to open the search tool, and select just the Find tab. Here I am going to convert every ‘the’ to Heading 3.

(2) Type ‘the’ in ‘Find what:’ and under ‘Find in’ select ‘Main Document’ (this is the core of the trick — for whatever reason).

(3) When you click on ‘Main Document’ you’ll get a message telling you how many instances were found and they’ll be highlighted on the screen.

(4) Press ‘Esc’.

(5) You can now operate on all the selections at once. In this case, I’ve converted them all to Heading 3 styles, but it could be anything else you like.

Here is a macro that will do the same:

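```vba
Sub TestSearch()
'
' TestSearch Macro
' Search for some text and process it when it is found; repeat until all instances are done.
'
    ' Start from the top so each instance is visited exactly once
    Selection.HomeKey Unit:=wdStory
    Selection.Find.ClearFormatting
    With Selection.Find
        .Text = "the"
        .Replacement.Text = ""
        .Forward = True
        ' wdFindStop, not wdFindContinue: with wrap-around on, Find never
        ' reports "not found", so the loop below would run forever
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute
    While Selection.Find.Found
        ' The equivalent of 'Clear All' in the Styles pane...
        Selection.ClearFormatting
        ' ...then apply the style wanted (Heading 3, as in the GUI example)
        Selection.Style = ActiveDocument.Styles(wdStyleHeading3)
        ' Move past this instance before searching again
        Selection.Collapse Direction:=wdCollapseEnd
        Selection.Find.Execute
    Wend
End Sub
```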


FWIW

# Give me a moment

The Macquarie Dictionary is excellent. But it reports what people actually do, not what experts think they should do. This is good for linguists and Scrabble players, bad if you want a simple and strong rule to follow when editing something.

The dictionary has a strong bias towards being descriptive, whereas many users come from a more prescriptive angle. So, for example, if enough people use (say) ‘fortuitous’ to mean ‘lucky’ when it ‘ought’ to mean ‘accidental’, then that is a common understanding of the word. That makes it effectively one of its meanings, so now it does mean ‘lucky’, even though we already have ‘fortunate’.

What that means is that editors don’t use Macquarie blindly. Since it mirrors usage, it is fairly inconsistent (especially in things like whether to hyphenate prefixes and compound words). So we use consistent rules for a lot of things and don’t slavishly follow the dictionary.

Here you go:

What does momentarily mean?

1. in a moment (I’ll be there momentarily)
2. for a moment (I’ll only be there momentarily)
3. at any moment (my arrival could occur momentarily)
4. every moment, or moment to moment (the noise is increasing momentarily)

Well…

The first is considered a US usage, but all four are in Macquarie.

Oxford Australia says ‘for a moment’, plus notes ‘in a moment’ (‘instantly’) as a US usage.

Webster (a US dictionary) puts ‘for a moment’ and ‘instantly’ on an equal footing.

Frankly, I think the word should be banned.

Just say ‘soon’, ‘in a moment’, ‘for a moment’ etc, and never use the damned word.

That’s all from the bunker.

# Weird Word behaviour part N+1

Styles in tables act funny. Here is a table (with the few codes that Word lets me see visible):

The bottom of the third data row shows an empty line that I don’t want. Here is what happens when I delete that line:

Huh! The line changes to a different size and the bullet point goes away, and I cannot get it back! I cannot insert a bullet using the bullet menu on the toolbar/ribbon/whatever it’s called this week, but I can insert one using the shortcut key combination (Ctrl+Shift+L). But the formatting is wrong, and when I make it right, the bullet goes away!

But that is a hint: the problem probably lies in an interaction between some aspect of the style the document specifies for bullets in tables and Word’s bulleting code. So what I did was copy the second (working) bullet point, so the list had two identical bullet points. Then I cut the text of the third point (the one I wanted) and pasted it into a brand new document as plain text. I deleted everything that was not text I wanted, then pasted that back into the third bullet point in the table as plain text (which means it takes on the formatting around it), so I had a third bullet that looked like:

• theirspending on furniture broken down by number of legs and fabric typecosts

Then I was able to delete the unwanted line, then delete ‘their’ and ‘costs’ to leave what I wanted. It’s all about getting rid of Word’s hidden codes that don’t do what they should. Now I’ve got:

which is what I wanted.

# Word bugs: Search and replace using wildcards

Wildcard search and replace interacts badly with track changes. Even very simple examples.

Very simple: I want to find instances like 7,9 and replace them with 7 9, where the space is non-breaking (nbsp). The dialog looks like this:

So this is supposed to find a three-character pattern, digit comma digit, and replace it with digit nbsp digit. It works with track changes off, but with track changes on, it fails.

For example, it should take ‘2,3’ and replace it with ‘2 3’, but instead it gives ‘23 ’ (that is, the second digit and the nonbreaking space are transposed).
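For reference, here is what the replacement should produce, sketched with Python's `re` module rather than in Word. It uses the same pattern and groups, and ^s again stands for the no-break space U+00A0. Laid out like this, the bad result is easy to see: the no-break space belongs between the digits, not after them.

```python
import re

NBSP = "\u00a0"  # Word's ^s


def nbsp_between_digits(text):
    # Word wildcards: find ([0-9]),([0-9]) and replace with \1^s\2
    return re.sub(r"([0-9]),([0-9])", r"\1" + NBSP + r"\2", text)


expected = nbsp_between_digits("2,3")
tracked_changes_result = "23" + NBSP  # what Word gave with track changes on

print(repr(expected))                      # '2\xa03'
print(expected == tracked_changes_result)  # False
```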

Word disappoints again.

# $$math$$ versus \[math\] in LaTeX

Let’s say you want to display some maths in LaTeX. You don’t want to use equation and related environments.

Inline maths is enclosed between dollar signs, and is formatted as part of the sentence.

A displayed equation is enclosed between pairs of dollars or between \[ and \]. But $$…$$ and \[…\] are not the same. For one thing, \[…\] responds to the documentclass option fleqn, which aligns the equations to the left rather than centring them. Thus, we get this:

Here ‘Inline’ is indented because it starts a paragraph. The first two displayed equations use $$…$$ and are centred; the last uses \[…\] and is indented by a length called \mathindent.

Here is the LaTeX code:

```latex
\documentclass[fleqn]{article}
\begin{document}
Inline maths: a simple equation might be $x+y=1$.
The solutions to the quadratic equation
$$ax^2 + bx + c = 0$$
are given by
$$x=\frac{-b \pm \sqrt{b^2-4ac}}{2a},$$
which is to say
\[x=\frac{-b \pm \sqrt{b^2-4ac}}{2a}.\]
\end{document}
```


Notice the option ‘fleqn’ has been given to the article class, but $$…$$ did not obey it. \[…\] did, but not all the way to the left. If you want left-aligned maths to be hard left, you need to set the length \mathindent to zero in the preamble:

```latex
\documentclass[fleqn]{article}
\setlength{\mathindent}{0pt}
```


and this gives

Anyway.

# Comparing PDFs; diff-pdf, pdftotext, diff…

The command-line tool diff-pdf (https://vslavik.github.io/diff-pdf/) is very handy. It basically superimposes two PDFs to make the differences show up, so it’s not a textual comparison as such. Here is the command line:

```
$ diff-pdf --view test.pdf test_mod.pdf
```

The black bits are common to both files; the red is in one version and the cyan in the other. If the differences result in new lines being inserted, the whole page turns blue/red, since the lines no longer match up:

So it is a very good way of isolating minor changes (say, between consecutive proofs of the same document) and of checking whether two files are actually identical (though conventional diff can tell you whether two binary files differ or not). It’s less good for comparing and deciding which version is ‘better’, since the result can look a bit messy.

Using conventional diff:

```
$ diff test.pdf test_mod.pdf
Binary files test.pdf and test_mod.pdf differ
```
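If all you need is the yes/no answer that conventional diff gives for binary files, Python's standard library can do the same. Below is a minimal sketch with illustrative function names. `filecmp.cmp` with `shallow=False` compares the actual bytes. A SHA-256 fingerprint is a handy way to record exactly which proof was checked.

```python
import filecmp
import hashlib


def files_identical(path_a, path_b):
    """True if the two files contain exactly the same bytes."""
    # shallow=False: compare contents, not just size/modification-time signatures
    return filecmp.cmp(path_a, path_b, shallow=False)


def sha256_of(path):
    """Hex SHA-256 digest of a file, read in chunks so big PDFs are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For the two files above, `files_identical("test.pdf", "test_mod.pdf")` would return `False`. Two PDFs that look the same on screen can still differ byte for byte (embedded dates and IDs, for instance), which is where diff-pdf earns its keep.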


Running pdftotext on both PDFs, then conventional diff on the results, is pretty useful too. Here is the output from just such a test:

```
$ pdftotext.exe test.pdf
$ pdftotext.exe test_mod.pdf
$ diff test.txt test_mod.txt
1,3c1,5
< This is a very basic look at using METAFONT and gnuplot to make figures
< for use in LATEX. I am using Linux, but the same process ought to work for
< other LATEX environments; indeed, that ought to be one of its strengths.
---
> This is an extremely basic look at using METAFONT and gnuplot to make
> figures for use in LATEX. I am using Cygwin, but the same process ought to work
> for other LATEX environments; indeed, that ought to be one of its strengths.
> Here is some text added to make the line wrap and offset compared to the other
> document.
36c38
< see.
---
> see. Also, this document is really just to show use of diff-pdf.
42d43
< 5. If you like at this point you can try:
45c46,47
< $ gftodvi test2.600gf
---
> 5. If you like at this point you can try:
> $ gftodvi test2.600gf
78a81,84
> 1
> 2
>
> Figure 1: Here is my pointless plot.
83,85d88
< 2
<
< Figure 1: Here is my pointless plot.
```

No conclusion. Just noting that these tools are handy.

Ende.

# Plotly on Cygwin; the absolute basics

This has to be about the absolute basics. I don’t know anything else.

Plotly (https://plot.ly/) is an interactive, online graphing tool. It can be called from JavaScript, Python, whatever. This post is about getting it to work through Python on Cygwin. It mostly follows the instructions on the Plotly website.

(1) Installed via pip. What’s pip? A Python package manager. I ran the Cygwin setup.exe program and made sure that Python was installed (in my case it was 2.7), then installed the matching pip2.7 (Cygwin package python2-pip) and all its dependencies.

(2) Opened a Cygwin terminal (not an X terminal, just mintty) and typed:

```
$ pip install plotly
```


and watched some magic occur.

(3) Went to the Plotly website and created my (free) account. Went to my account settings and selected ‘API keys’. Could not see key — just looked like a row of dots! But hitting ‘Regenerate key’ gave me a new, visible one. Copied that text and noted my username.

(4) In Cygwin (note: $ is the Cygwin prompt, >>> is the Python prompt) typed:

```
$ python
Python 2.7.13 (default, Mar 13 2017, 20:56:15)
[GCC 5.4.0] on Cygwin

>>> import plotly

>>> quit()
```


This set up the info needed for the local Plotly/Python installation to talk to the website where the graph will appear.
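Step (5) below checks the result by eye. That check can also be scripted. Here is a minimal sketch, assuming the credentials end up as JSON in ~/.plotly/.credentials with the field names shown in step (5). The function name is hypothetical, not part of Plotly. It uses only the standard library and should run under Python 2.7 or 3.

```python
import json
import os

DEFAULT_PATH = os.path.join(os.path.expanduser("~"), ".plotly", ".credentials")


def load_plotly_credentials(path=DEFAULT_PATH):
    """Read the credentials file and check the essential fields are filled in."""
    with open(path) as f:
        creds = json.load(f)
    # An empty username or api_key suggests the set-up step did not take
    missing = [key for key in ("username", "api_key") if not creds.get(key)]
    if missing:
        raise ValueError("credentials file has empty fields: " + ", ".join(missing))
    return creds
```

Calling `load_plotly_credentials()["username"]` should then give the account name (DarrenG2 here).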

(5) Checked that this had worked. Back at the Cygwin prompt, in my home directory, typed:

```
$ cat .plotly/.credentials
{
    "username": "DarrenG2",
    "stream_ids": [],
    "api_key": "<<your key here>>",
    "proxy_username": "",
    "proxy_password": ""
}
```

(6) OK, looked good. Now I tested it by grabbing an example from the Plotly website. Created a file ‘plotly_example.py’ and pasted in some code copied from the website:

```python
import plotly.plotly as py
from plotly.graph_objs import *

# Two line traces sharing the same x values
trace0 = Scatter(
    x=[1, 2, 3, 4],
    y=[10, 15, 13, 17]
)
trace1 = Scatter(
    x=[1, 2, 3, 4],
    y=[16, 5, 11, 9]
)
data = Data([trace0, trace1])

# Upload to the Plotly account as 'basic-line' and open it in a browser
py.plot(data, filename = 'basic-line')
```

(7) Then saved and ran the script:

```
$ python2.7.exe plotly_example.py
High five! You successfuly sent some data to your account on plotly. View your plot in your browser at https://plot.ly/~DarrenG2/0 or inside your plot.ly account where it is named 'basic-line'
```


Looked good, though they’ve spelled ‘successfully’ unsuccessfully.

(8) But where was the graph? Well, I was working in a basic terminal window. It sent the graph to the web, but then tried to open it in the default browser, links, which is a text browser. So all I got was a blank screen (typed ‘q’ to quit links). There are a couple of ways to see the graph. One is just to paste the given URL into Edge, Chrome or Firefox. Another is to tell Cygwin to look elsewhere for its browser…

(9) Edited my .bash_profile file in my Cygwin home directory and added these two lines:

```
BROWSER=/cygdrive/c/Users/darren/AppData/Local/Mozilla\ Firefox/firefox.exe
export BROWSER
```


This set up the environment variable BROWSER and pointed it at the firefox.exe file (non-admin install, so in an unusual place). I also ran these two lines in the terminal window to save me closing and reopening it.

(10) Repeated step (7) and — lo and behold! — a Firefox window popped up and the graph was in it!

Plotly graph in Firefox, after running the script in Cygwin.

Now, mastery of Plotly and Python is a much bigger project, but at least this offers the beginnings. Note also that the graph can be edited interactively within the webpage where it appears.

Plots away!

# I don’t exactly love Microsoft Word

Man, I hate the revision pain pane. When Word does this weird fail (as shown below) it’s not always while the pane is open, but far too often it is. And this document is a paltry 7000 words, hardly a vast epic. The program really should be able to cope.

Word is not responding. Yes, that’s the least of it.

So: How do I stop the revision pane from ever opening? I know if you show markup (simple or all) the pane pops up less often — but it still pops up sometimes. I’d like to permanently disable it, root it out from the program and never ever see it again under any circumstances.

Any thoughts?

Word word word.