crgrep is a very useful utility that can be used to search through directories full of Word files, Excel files, PDFs and so on.
It is a little bit manual to install, but actually very easy, and well worthwhile once it is working. Regrettably, it is not maintained.
Download from https://sourceforge.net/projects/crgrep/
Extract the folder to (in my case) c:\users\username\installs, so now I have a folder called
C:\Users\username\installs\crgrep-1.0.5
Read the bloody instructions in the install instructions file:
C:\Users\username\installs\crgrep-1.0.5\INSTALL.txt
Add the bin folder to the path. To do this, type 'env' in the Windows search box, choose "Edit environment variables for your account", select Path, click Edit and then New, put C:\Users\username\installs\crgrep-1.0.5\bin on the new line, and click OK.
Now I need to find Java. crgrep specifies Java 8 (jdk1.8.0_xx). I already have ImageJ installed, from the download called "ImageJ bundled with 64-bit Java 1.8.0_172", and I put it in my local installs directory, so I have the java.exe file in:
C:\Users\username\installs\ImageJ\jre\bin
I can run it to see what the version is:
C:\Users\username\installs\ImageJ\jre\bin>java -version
java version "1.8.0_172"
Java(TM) SE Runtime Environment (build 1.8.0_172-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.172-b11, mixed mode)
Looks perfect!
Add this to the environment — a bit like editing the path (above), but create a new environment variable:
JAVA_HOME=C:\Users\username\installs\ImageJ\jre
Do some tests
c:\> crgrep -help
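If the help text does not appear, it can be useful to check the two settings separately. The following is only a rough Python sketch of my own, not something that comes with crgrep: the function name java_in_home is made up for illustration, and I am assuming the crgrep launcher in the bin folder can be found under the plain name "crgrep".

```python
import os
import shutil
from pathlib import Path


def java_in_home(java_home):
    """Return the Java launcher found under java_home\\bin, or None.

    java_in_home is a hypothetical helper name, not part of crgrep.
    """
    if not java_home:
        return None
    # Try the Windows name first, then the name used on other systems.
    for name in ("java.exe", "java"):
        candidate = Path(java_home) / "bin" / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # None here means JAVA_HOME is unset or does not contain bin\java.exe.
    print("Java under JAVA_HOME:", java_in_home(os.environ.get("JAVA_HOME")))
    # shutil.which searches PATH (on Windows it also tries the PATHEXT
    # extensions); None means the crgrep bin folder is not on the path.
    print("crgrep on PATH:", shutil.which("crgrep"))
```

Run it from a newly opened command prompt, since a prompt that was already open will not see the changed environment variables.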
OK, now for example, I want to search for ‘Fred Smith’ in all DOCX files in a tree of files starting at the current directory. I only want to look in files that have ‘final’ in the file name, and I want to find occurrences that have exactly one character between Fred and Smith, which need not be a space.
c:\etc\etc\etc> crgrep -r --colour=always "Fred?Smith" "**\*final*.docx"
What’s going on?
crgrep: the command
-r: recursively search subdirectories
--colour=always: turn occurrences of the desired pattern red in the output
"Fred?Smith": the search pattern; ? means any one character, so this will find ‘Fred Smith’, ‘Fred-Smith’, ‘FredQSmith’ and so on. Note it is enclosed in quotes
"**\*final*.docx": the pattern for the files to look in; ** means ‘dig through all subdirectories’, and *final*.docx means find all files that have file names of the form <some text>final<some text>.docx. Note it is enclosed in quotes
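To make the two wildcards concrete, here is a small Python sketch that uses the standard fnmatch and pathlib modules as stand-ins. It is an analogy under my own assumptions, not how crgrep itself is implemented: fnmatchcase tests a whole string, whereas crgrep looks for the pattern inside the file contents, and the function names matches_whole and final_docx_files are made up for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import Path


def matches_whole(text, pattern="Fred?Smith"):
    """True if the whole of text fits the wildcard pattern.

    In the pattern, ? stands for exactly one character of any kind.
    """
    return fnmatchcase(text, pattern)


def final_docx_files(root):
    """List .docx files at any depth under root with 'final' in the name.

    rglob walks every subdirectory, much like the leading ** in the
    file pattern given to crgrep.
    """
    return sorted(Path(root).rglob("*final*.docx"))


if __name__ == "__main__":
    print(matches_whole("Fred Smith"))   # True: a space is one character
    print(matches_whole("Fred-Smith"))   # True: so is a hyphen
    print(matches_whole("FredSmith"))    # False: nothing between the names
    for path in final_docx_files("."):
        print(path)
```

Note that * can match nothing at all, so a file called final.docx is picked up as well as something like report_final_v2.docx.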