Comparing Word files using diff

Note to self.

When you have to do something 1 or 2 times, a GUI is fine. When you have to repeat it 100 times, a GUI is less fine.

I have a very specific use-case. I want to compare many pairs of Word files. I only want to compare the text, and the changes will be very small. I don’t want to have to manually run the Word compare documents tools on all hundred pairs of files.

First, I converted all to text, using a command like this for each one:

soffice --convert-to "txt:Text (encoded):UTF8" "filename(17Jun20).docx"

I could not use

soffice --convert-to "txt:Text (encoded):UTF8" *.docx

because the filenames have special characters in them. But a script can be made.
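
A minimal sketch of such a script, assuming a POSIX-style shell and that soffice is on the PATH. It feeds soffice one quoted filename at a time, so brackets and spaces in the names are passed through untouched. The function name convert_all is made up for this note; --headless and --outdir are standard soffice options.

```shell
#!/bin/sh
# Convert every .docx in a directory to UTF-8 text, one file at a time.
# convert_all is a made-up helper name; soffice must be on the PATH.
convert_all() {
    dir=${1:-.}
    for f in "$dir"/*.docx; do
        [ -e "$f" ] || continue   # the glob matched nothing
        soffice --headless --convert-to "txt:Text (encoded):UTF8" \
            --outdir "$dir" "$f"
    done
}

# Usage: convert_all /path/to/folder
```

The .txt files land beside the .docx files, which is what the later steps expect.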

The trick is that one of the pair of files was back-converted from PDF, so there are hard returns and soft returns all over the place, so even though the texts look quite similar to the eye, the line breaking and other issues can be quite different. I want to make sure diff, which works line-by-line, won’t produce lots of output just because line-breaking has changed and one doc has long wrapped lines and the other has a return at the end of each visible line, or something. I decided to turn each text file into a single column of words.

In this script, $1 is a 5 digit number given on the command line of the script. It identifies the file. Some of the character codes might render funny in the blog.

# Replace all white space with newline
sed -E -e 's/[[:blank:]]+/\n/g' "input-$1-file.txt" > "$1_f1.txt"
# Replace all ^M (carriage return) with newline; \r needs GNU sed
sed -i 's/\r/\n/g' "$1_f1.txt"
# various hyphens; breaking, nonbreaking, en rules, etc; I am not looking for them, so
# Replace all - with newline, etc
sed -i "s/-/\n/g" "$1_f1.txt"
# Assumption: the original non-ASCII characters did not survive the blog's encoding.
# The set below (U+2010 to U+2014 plus soft hyphen U+00AD) is a guess at the usual
# dash-like characters; adjust to suit. Needs bash and a UTF-8 locale.
dashes=$'\u2010\u2011\u2012\u2013\u2014\u00ad'
sed -i "s/[$dashes]/\n/g" "$1_f1.txt"
# Remove all empty lines
sed -i '/^\s*$/d' "$1_f1.txt"

Putting odd characters in sed or tr commands can be done several ways. Things that render OK (modern terminal emulators can cope with en dashes, etc, for example) can be middle-button pasted into the script. Others (things that look like <200b> in Vim, for example) you enter by Ctrl+V u200d (or whatever) (in Vim). That is, Ctrl+V then u then the code that you saw in the angle brackets. ^M is inserted by holding down Ctrl then hitting V then M then releasing Ctrl.

The script turns the file into a single long column, one word/character group per line. The various sed commands just put in line breaks in place of various characters that I want removed from the comparison. This must be tuned to the project in question. I then remove all the empty lines.

I then do the same to the other file in the pair and run diff.

diff $1_?1.txt > $1_diff.txt
paste -d '\t' $1_?1.txt > $1_paste.txt
wc $1_diff.txt

I also paste the two into a single file side-by-side, and word count the diff result. If the wc output is zero, they are identical. If not, I look at the others and isolate the differences. Of the 100-odd files, I was able to eliminate more than half immediately.
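
The whole comparison can be sketched as a pair of shell functions. This is a simplified stand-in rather than the script above: it uses tr instead of the sed chain, only handles white space, carriage returns and the ASCII hyphen, and the names words_per_line and same_words are made up for this note.

```shell
#!/bin/sh
# One word per line: split on white space, carriage returns and ASCII
# hyphens, then drop empty lines. Extend the tr set to suit the project.
words_per_line() {
    tr -s ' \t\r-' '\n' < "$1" | sed '/^$/d'
}

# Exit status 0 when two text files contain the same sequence of words.
same_words() {
    a=$(mktemp) && b=$(mktemp) || return 2
    words_per_line "$1" > "$a"
    words_per_line "$2" > "$b"
    diff "$a" "$b" > /dev/null
    status=$?
    rm -f "$a" "$b"
    return "$status"
}

# Usage: same_words first.txt second.txt && echo identical
```

Because it works through the exit status, it is easy to put in a loop over all the pairs and list only the ones that need a closer look.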

That’s a result.

Lynx with OpenSSL (https) on Windows

Grabbed an installer — the one using ‘old ssl’:

https://invisible-island.net/lynx/#http_install

Downloaded:

lynx-oldssl-setup.exe

This oldssl version needs OpenSSL 1.0.2 (LTS), new version needs 1.1.0 (not LTS) and the world has moved on to 1.1.1, which Lynx says is a no-no. Need to find the correct version(s) of SSL — that’s 1.0.2 or 1.1.0. (1.0.2 in my case.)

Lynx page says:

“Old” (version 1.0.2)

You will need these DLLs, either in Lynx’s directory or in your system32 directory:

  • libeay32.dll

  • ssleay32.dll

OK, I have various software installed on my computer. What if I already have what I need but Lynx is not finding it? (Why would it?)

Went to C:\ in Explorer and searched for ssleay32.dll, and lots came up! It’s in places like LibreOffice’s install folder, GIMP, some antivirus software, the mingw tree on Cygwin, some Windows components, and so on. Right-clicked on one and looked at Properties and then Details to get the version. Did the same for the other DLL (libeay32.dll, from the same folder where I found ssleay32.dll, to make sure they were compatible), and when I had found 1.0.2X (X = some letter) versions of both, copied them into my Downloads folder, where I could leave them and tell Lynx about them.
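
The same hunt can be sketched from a Cygwin prompt instead of Explorer. This assumes GNU grep and find, and that each DLL carries a plain-text version string such as "OpenSSL 1.0.2u" (OpenSSL builds normally do, but I have not checked every copy). The function names are made up for this note.

```shell
#!/bin/sh
# Print the OpenSSL version string embedded in a DLL, if there is one.
# openssl_version_of is a made-up helper name; -a treats binary as text.
openssl_version_of() {
    grep -a -o -m 1 -E 'OpenSSL [0-9]+\.[0-9]+\.[0-9]+[a-z]*' "$1" | head -n 1
}

# List every ssleay32.dll and libeay32.dll under a directory, with version.
list_openssl_dlls() {
    find "$1" \( -iname 'ssleay32.dll' -o -iname 'libeay32.dll' \) \
        -type f 2> /dev/null |
    while IFS= read -r dll; do
        printf '%s\t%s\n' "$(openssl_version_of "$dll")" "$dll"
    done
}

# Usage on Cygwin: list_openssl_dlls /cygdrive/c
```

Pick a pair from the same folder so the two versions match, as above.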

OK, now tried to install Lynx, old ssl version:

Right-click on installer and run as administrator, and at the correct point in the install process, point it at the relevant DLLs. Seems to work but then says it could not copy msvcr120.dll. OK, is that where the other DLLs were found? Nope. Find it on my system and copy it to where I put the other DLLs. Run installer again.

OK, that seems to work!

News/NNTP: Yep.

Gopher: Yep

https: yep, SSL seems to be working.

One of the neat things about Lynx is you can jump around from old protocols to new in the same tool. Click on a link in a Usenet post to a gopher page and then jump to an HTTP URL.

OK, so that’s how to get Lynx with SSL to work on Windows. Just search your own machine for the requisite DLLs.

Caveat: I do not know about licensing issues. I would guess that using DLLs from an open source project would be safe. OpenSSL itself is FOSS. I dunno.

As usual, I dunno.

 

Booting Linux, FreeDOS and ReactOS from GRUB2

I have a Compaq E500, an old Pentium III, 32-bit x86-compatible machine just for messing around with. It has USB and FDD and CD and serial and parallel and even TV output. Very flexible. Plus the Ethernet card is Intel E100, so drivers are no problem. It currently runs:

  • SliTaz GNU/Linux (very good on old hardware, and with very small (50 MB) install media). It is kept up to date, but kernel is getting old; more details in a separate post.
  • ReactOS 0.4.13 (needs some drivers at present to really get it to work anywhere near its capabilities).
  • FreeDOS 1.3 RC3.

All I want to discuss here is how I set up GRUB2 to boot them all, both for my own records and in case anyone else finds it useful. I am not going into each install procedure in detail; this is about the boot setup. Where I say SliTaz, you can insert the name of your preferred distro. I am using SliTaz because it is snappy on 20-year-old hardware. You will probably use something bigger on more modern hardware.

It is helpful to install them in the correct order. FreeDOS will only (AFAIK) install onto the first partition, and will write its bootloader to the MBR, right at the front of the disk. This means if you have already installed GRUB it will get overwritten and you’ll only be able to boot FreeDOS. This is not really a problem, but adds extra steps (see below).

Here is one possible procedure (this assumes we’re using the first HDD, which will be /dev/sda in Linux parlance):

  1. Boot a Linux live disc — I used the SliTaz live CD. Run gparted (a nice and easy-to-use graphical partitioning tool).  Create 4 partitions, with the one at the front of the disk (sda1) for FreeDOS. ReactOS will go on sda2 and SliTaz in sda3. The last one (sda4) is Linux swap. sda1 and sda2 are formatted FAT32, sda3 is an ext file system. I chose an msdos partition table. May be good to mark the bootable flag on the FreeDOS partition. Do not install Linux yet.
  2. Boot the FreeDOS 1.3RC3 live disc and install to sda1. I prefer a minimal install, with just the base system plus a few networking tools, then using FDNPKG (FreeDOS’s answer to apt or yum) to install what I want later. More detail in later posts! It will install its bootloader to the MBR, but we don’t care because we’re booting from discs for now.
  3. Boot the ReactOS disc and choose sda2. Let it install, but write the boot loader to the partition (sda2) not to the MBR. TBH I cannot exactly remember the menu options. The ReactOS bootloader, FreeLoader, can possibly also work as your boot manager, but that’s not what I’m writing about here. Instead, we install it to the front of the partition and load it from GRUB2.
  4. Boot the SliTaz live disc again and install it to sda3 using sda4 as swap. (Details elsewhere). Install the bootloader, GRUB2, and reboot. GRUB will only give SliTaz as a boot option, but that’s ok. Boot SliTaz (or whatever Linux you’re using).
  5. Open a root terminal and set up GRUB2: The main issue is the 40_custom file. Here is one that works for me:
$ cat /etc/grub.d/40_custom 
#!/bin/sh 
exec tail -n +3 $0 
# This file provides an easy way to add custom menu entries. Simply type the 
# menu entries you want to add after this comment. Be careful not to change 
# the 'exec tail' line above. 
# 

menuentry 'ReactOS 0.4.13' { 
  load_video 
  insmod gzio 
  insmod part_msdos 
  insmod fat
  set root='(hd0,msdos2)'
# Below is from ReactOS wiki 
  chainloader +1 
  parttool (hd0,2) boot+ 
  multiboot /freeldr.sys 
} 

menuentry 'FreeDOS1.3RC3' { 
  load_video 
  insmod gzio 
  insmod part_msdos 
  insmod fat 
  set root='(hd0,msdos1)' 
  parttool (hd0,1) boot+ 
  chainloader +1 
}

What’s going on?

I don’t know if all the insmod lines are needed, but this works so I am not messing with it. They install drivers (modules) that deal with msdos partitions and fat file systems. I suspect the video and gzio lines are not needed, at least for FreeDOS (FreeDOS entry was mostly copied from the ReactOS one). ReactOS probably needs them; it has a splash screen and all.

The parttool line sets the bootable flag on the specified partition. Don’t know if this is needed if the flag is set (as it was for FreeDOS when I used gparted), but no harm done.

The root line defines the root of the file system in GRUB notation.
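
As a rough guide to that notation: GRUB counts disks from 0 and partitions from 1, and the msdos prefix names the partition table type. The sketch below turns a Linux device name into the GRUB form. It assumes the BIOS sees the disks in the same order as Linux does and that the disk has an msdos partition table; dev_to_grub is a made-up helper name, not a GRUB tool.

```shell
#!/bin/sh
# Translate /dev/sdXN into GRUB2 notation, e.g. /dev/sda2 -> (hd0,msdos2).
# Assumes an msdos partition table and matching disk order (see above).
dev_to_grub() {
    name=${1#/dev/}                      # e.g. sda2
    case $name in
        sd[a-z][0-9]*) ;;
        *) return 1 ;;                   # not an sdXN name
    esac
    part=${name#sd?}                     # partition number, e.g. 2
    case $part in
        *[!0-9]*) return 1 ;;
    esac
    letter=${name#sd}                    # e.g. a2
    letter=${letter%"$part"}             # e.g. a
    letters=abcdefghijklmnopqrstuvwxyz
    before=${letters%%"$letter"*}        # letters that come before it
    printf '(hd%d,msdos%d)\n' "${#before}" "$part"
}

# Usage: dev_to_grub /dev/sda1
```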

GRUB2 finds the Linux install automatically, so that is not in here.

We then update grub (as root):

# grub-mkconfig -o /boot/grub/grub.cfg

and can set the default entry by reading the man page for grub-set-default.
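
A small sketch for picking the default, assuming grub.cfg has no submenus (they would throw the numbering off). grub-set-default takes either an entry number or its title, and it only has an effect when GRUB_DEFAULT=saved is set in /etc/default/grub before grub.cfg is regenerated. list_grub_entries is a made-up helper name.

```shell
#!/bin/sh
# Print the top-level menuentry titles from a grub.cfg, numbered from 0,
# which is how GRUB counts them. list_grub_entries is a made-up name.
list_grub_entries() {
    grep -E "^menuentry " "$1" |
        sed -E "s/^menuentry ['\"]([^'\"]*)['\"].*/\1/" |
        awk '{ printf "%d\t%s\n", NR - 1, $0 }'
}

# Usage (as root):
#   list_grub_entries /boot/grub/grub.cfg
#   grub-set-default 'FreeDOS1.3RC3'
```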

Something else.

Jugurthine war and Conspiracy of Catiline by Sallust

Sallust lived through some tumultuous times for the Roman republic. He was born in the era of the rise of military dictatorship under Marius and Sulla, was a young man during the conspiracy of which he speaks, served with Gaius Julius Caesar and saw the apotheosis of that dictator and its ending with assassination. Having seen that, he decided maybe he ought to leave politics and start writing history. Did not live to see Octavian become the first Princeps, however.

Penguin cover

The book is well-judged. Sallust is mostly accurate (we’re told in the introductions) and though the editor (S. A. Handford) chips in to tell us when he’s not, the notes never overwhelm or make the reader lose their way.

The Jugurthine war took place in north Africa a generation or two after the Romans destroyed Carthage for the last time. The puppet states they set up in the resulting power vacuum came to lack leadership, and a charismatic figure who was a few deaths away from the throne of Numidia decided to self-actualise by killing a couple of cousins. The Romans could not let this stand, and spent 5 or 6 years chasing him around the desert before installing an even more puppety ruler. In a sense, Numidia became an example to other states — don’t mess with our arrangements! If we put a king on the throne, leave him there!

In some ways, the most important aspect of the war was that it gave opportunities to Marius and Sulla, both of whom played crucial roles that led to other opportunities that led to both of them becoming, effectively, military dictators (Marius first, then his arch-enemy Sulla) and hastening the end of the republican era of Rome.

The Conspiracy of Catiline is briefer and more schematic. It touches on many famous lives: Cato the Younger, Cicero, Caesar, Pompey, Crassus and more. It portrays Catiline as reckless, undisciplined and foolhardy, though brave, fomenting a revolution for private gain, largely in the hope that his many debts would be forgotten when the wealthy aristocracy was overthrown. One never gets the sense that Catiline had much chance of success, though that could just be the knowledge that he did not succeed.

He reminds me a little of Lenin, who, as has been noted, did not become a dictator to protect the revolution, but made a revolution so he could become dictator. The main difference is that Lenin succeeded.

history

Choose application to open PDF on Cygwin

Note to self.

This might work elsewhere (Linux as well as Cygwin…?)

For some reason, when I wanted to look at LaTeX documentation, PDFs were opening in vim.

For example, if I type:

$ texdoc hyperref

I see the resulting file in gvim.

I suspect at some point I wanted to edit a PDF directly, and accidentally changed the setting somehow. I can only guess.

So, how to use (say) xpdf?

First, look at the current configuration:

$ cat ~/.config/mimeapps.list 
[Added Associations]
text/html=konqueror.exe.desktop;

[Default Applications]
application/pdf=gvim.desktop
text/html=konqueror.exe.desktop;

Yeah, it says gvim.

OK, I want to verify that I have the .desktop file I need:

$ find /usr/share/ -name "*pdf*desktop" 2> /dev/null 
/usr/share/applications/okularApplication_pdf.desktop
/usr/share/applications/qpdfview.desktop
/usr/share/applications/xpdf.desktop

OK, I have some choices here. There’s also:

$ find /usr/share/ -name "*atril*desktop" 2> /dev/null 
/usr/share/applications/atril.desktop

So what shall I use?

Why not xpdf? Change the mimeapps.list file:

$ cat ~/.config/mimeapps.list 
[Added Associations]
text/html=konqueror.exe.desktop;

[Default Applications]
application/pdf=xpdf.desktop
text/html=konqueror.exe.desktop;

And it works.
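
The edit itself can be sketched as a function rather than done by hand. It only rewrites an existing line in the [Default Applications] section and does not add one if the MIME type is missing; set_default_app is a made-up helper name. If xdg-utils is installed, xdg-mime default xpdf.desktop application/pdf should do the same job.

```shell
#!/bin/sh
# Point a MIME type at a .desktop file in the [Default Applications]
# section of a mimeapps.list. set_default_app is a made-up helper name.
set_default_app() {
    file=$1 mime=$2 desktop=$3
    awk -v mime="$mime" -v desktop="$desktop" '
        /^\[/ { in_defaults = ($0 == "[Default Applications]") }
        in_defaults && index($0, mime "=") == 1 { print mime "=" desktop; next }
        { print }
    ' "$file" > "$file.new" && mv "$file.new" "$file"
}

# Usage: set_default_app ~/.config/mimeapps.list application/pdf xpdf.desktop
```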

Add a dictionary to TeXworks

It is installed, but has those US spellings, which are fine if that’s what you want.

I wanted AU, so:

  1. Went to https://extensions.libreoffice.org/en/extensions/show/english-dictionaries and downloaded the current English bundle.
  2. Right-clicked on it and (after selecting Open With and Choose Another App) opened it using 7zFM (7-zip file manager — any archive manager should do); left the window open
  3. Went to TeXworks and clicked Help and Settings and Resources and then the C:\Users\etc link beside ‘Resources’
  4. That opened a File Manager window
  5. In there, created a ‘dictionaries’ folder
  6. From the 7z dialog, extracted all .dic and .aff files into that new folder
  7. Quit TeXworks
  8. Opened TeXworks
  9. Opened my .tex file
  10. Went to Edit > Spelling and selected the dictionary of choice
  11. Went to Edit > Preferences and selected the dictionary of choice
  12. OK!

Not too hard. 

Fixing uneven cell edges in Word tables when saved as PDF

So we have some tables and the cells are shaded and if you look closely the edges of the cells are not straight lines. Oh, they look fine in Word, but when you save as PDF using Word’s almost-broken PDF save tools, you see annoying little artefacts. (We have to save as PDF not print to PDF to preserve hyperlinks and metadata and alt text.)

Here is the top left corner of the table viewed in Word:

Nice smooth straight edges to the block of colour
The corner of a table viewed in Word

Here is the same table after Save As PDF and then viewed in a PDF viewer:

Now the coloured cells have uneven edges
Viewed in PDF viewer

The red boxes highlight the problem regions. How to fix?

If I format the table in Word, I notice that it uses cell margins.

Open Table Properties and Options and see the cell margins

I find that if I set the top and bottom margins to zero, the problem goes away.

But then the cells are small and cramped.

If I highlight the table and use paragraph formatting with suitable space before and after the paragraph (arrived at by a mix of using the cell margins as a guide and then scaling the space up or down), I fix the problem.

The Word paragraph formatting dialog

Now, there might be some arrangements of text where this is not an exact like-for-like fix (if I have several paragraphs in the same cell, for example), but eyeballing the table as it was with cell margins alongside the table as it is with paragraph spacing allows for a pretty good imitation of the original look. And the uneven borders go away. Choose your poison.

 

Word sucks

JetDirect 170X — attach a network printer on Linux using CUPS

Note to self.

I picked up an old JetDirect 170X (J3258B) print server to make an oldish but perfectly reliable USB/parallel port printer networkable. (It’s a Brother HL‑5340D, not a DN, where D = duplex and N = network.) It’s plugged into my desktop by USB, but it has a parallel port as well, which is currently unused. A suitable old network print server like the 170X (plentiful and cheap) can attach to that port and allow other household members, and my laptop, to print readily.

The JetDirect 170X as bought off ebay

I am using Debian, but that should not matter. I am using CUPS to manage my printing.

  • Went to the CUPS browser interface: http://localhost:631/printers/
  • Clicked ‘Administration’ and ‘Add’
  • Chose AppSocket/HP JetDirect as the connection type.

But what is the print server’s IP address? Clicked through to the help and found that I needed port 9100, but that still left the address. It’s just on the house network, so I used my desktop computer: I ran nmap without the JetDirect server plugged in, then again after plugging it in, then ran diff on the results. Something like:

$ nmap -v -sn 192.168.0.* > out

then

$ nmap -v -sn 192.168.0.* > in

Or wherever suits your situation, then:

$ diff out in

Showed me that 192.168.0.121 was down and then was up. OK.
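
A slightly tidier variant, sketched under the assumption that nmap’s grepable output (-oG -) is used instead of the verbose listing above. It prints only the addresses that appear in the second scan and not the first. up_hosts and new_hosts are made-up helper names.

```shell
#!/bin/sh
# Extract the addresses reported as up from "nmap -sn -oG -" output.
up_hosts() {
    awk '/Status: Up/ { print $2 }' "$1" | sort
}

# Print addresses that are up in the second scan but not in the first.
new_hosts() {
    before=$(mktemp) && after=$(mktemp) || return 2
    up_hosts "$1" > "$before"
    up_hosts "$2" > "$after"
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
}

# Usage:
#   nmap -sn -oG - '192.168.0.*' > out    # print server unplugged
#   nmap -sn -oG - '192.168.0.*' > in     # print server plugged in
#   new_hosts out in
```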

Back to CUPS and check share (don’t know if need to).

  • Connection: socket://192.168.0.121:9100

The rest of the dialogue was much as any use of CUPS — add name etc. Turned on sharing, though don’t know if I need to.

Chose printer driver as per usual — though I have a feeling the PPD you download from Brother might be better than  the one that comes with CUPS… not sure.

Print test page; ok.

Done.

wodim on Cygwin

Works a treat. It’s a bit odd, mixing the Windows ‘drive’ notation with the POSIX Cygwin, but it worked!

$ wodim -v speed=4 dev=D: -data ~/haiku-r1beta2-hrev54154_111-x86_64-anyboot.iso

Unpack:

  • wodim -v — the program (-v is increase verbosity so we can see what’s going on)
  • speed=4 — burn it slowly (4x speed — could be bigger)
  • dev=D: — the device with the DVD in it is D:
  • -data — burn a data CD not an audio CD (this is the default, but does no harm)
  • ~/hai....iso — this is the iso image (R1 Beta 2 for Haiku in this case)

You can find the right burning device simply with:

$ wodim --devices
hm, 0, 1, 0
wodim: Overview of accessible drives (1 found) :
-------------------------------------------------
0 dev='D:' rw-w-- : 'HL-DT-ST' 'DVD+-RW GU90N'
-------------------------------------------------

Note the ‘dev=’ field.
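
If the burn is going into a script, the dev= value can be pulled out of that listing automatically. A minimal sketch, assuming the output looks like the sample above and that the first drive listed is the one wanted; first_wodim_dev is a made-up helper name.

```shell
#!/bin/sh
# Read "wodim --devices" output on stdin and print the first dev= value.
first_wodim_dev() {
    sed -n "s/.*dev='\([^']*\)'.*/\1/p" | head -n 1
}

# Usage:
#   dev=$(wodim --devices 2>&1 | first_wodim_dev)
#   wodim -v speed=4 dev="$dev" -data image.iso
```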

 

Tah-dah!

Zoom conversion issues; thanks

I recorded the Zoom meeting locally. At the end, I shut down the computer and left in a hurry for another appointment. I failed to leave the Zoom meeting gracefully, and so my recording was left on the HDD as a .zoom file in a folder under Documents, not as a nice, compact and readable MP4.

What to do?

The file has a name:

double_click_to_convert_01.zoom

But double-click on it does nothing.

One bit of advice was to find the executable zTscoder.exe (it lives in your AppData folder) and run it on the command line from the folder that holds the .zoom file:

> cd \path\to\recording\folder
> c:\users\username\appdata\roaming\zoom\bin_00\ztscoder "double_click_to_convert_01.zoom"

But a whole lot of nothing happened. I got the command prompt back immediately, which seemed unlikely given the size of the video file (more than a GB).

Turns out there is a valuable trick for this, and I direct you to this reddit:

https://www.reddit.com/r/Zoom/comments/gemgka/help_converting_zoom_files_to_mp4/

What I did was as follows:

  1. Logged into the Zoom desktop application and started a meeting with just me.
  2. Pressed Record and asked to store the file locally.
  3. Talked nonsense into the webcam for 15 seconds then exited the meeting.
  4. The little file conversion graphic came up (image: the Zoom video conversion progress-o-meter).
  5. When it was done it opened a file explorer window showing an MP4 file and an audio file. Most excellent.
  6. This new folder with the new, successfully converted files in it, was in my Documents\Zoom folder, in a subfolder with the date and meeting name in the title.
  7. Checked that the little MP4 was a valid video (it was), then deleted all the contents of the new, working folder.
  8. Copied the .zoom and .tag file(s) from the older folder (the one in which the conversion did not work) into the now-empty new folder. (I saw no .tmp files, and I have no idea if the .tag file(s) are needed, I just included them because they have the same timestamp as the .zoom file(s).)
  9. Went to the Zoom desktop application and clicked the ‘Meetings’ icon (down the bottom; arrow 1 in the ‘Buttons to push’ image).
  10. Chose the new meeting. In fact, hovered the mouse over the new meeting’s name (arrow 2), and a small menu came up on the right with two options: Open and a button with three dots.
  11. Clicked the three dots (arrow 3).
  12. Saw an option for Conversion, and clicked it.
  13. The conversion began!
  14. And it worked.
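
The copying step is the one most worth getting right, so here is a sketch of it for a Cygwin or similar shell on the Windows machine. It copies only the .zoom and .tag files and refuses to run unless the destination folder has already been emptied. stage_zoom_files is a made-up helper name, and the folder paths are whatever Zoom created on your machine.

```shell
#!/bin/sh
# Copy the raw recording files from the folder where conversion failed into
# the emptied folder of the throwaway meeting. stage_zoom_files is made up.
stage_zoom_files() {
    old=$1 new=$2
    # Refuse to run unless the destination is already empty.
    if [ -n "$(ls -A "$new")" ]; then
        echo "destination is not empty: $new" >&2
        return 1
    fi
    for f in "$old"/*.zoom "$old"/*.tag; do
        [ -e "$f" ] || continue   # the glob matched nothing
        cp -p "$f" "$new"/
    done
}

# Usage: stage_zoom_files "/path/to/old folder" "/path/to/new folder"
```

cp -p keeps the original timestamps, which may or may not matter to Zoom.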

I can only guess at why this works, but thanks to https://www.reddit.com/user/Mac_Avoy/!

Zoom up and away!