26th April 2015

UNIVERSAL SEARCH AND REPLACE

Brian Grainger

brianATgrainger1.freeserve.co.uk


 

Most of us are familiar with using a search and replace tool in a word processor. Suppose you are writing an article about Microsoft Word. While typing you refer to it as Word, rather than Microsoft Word, to save effort. The final printed copy should really refer to Microsoft Word so, via the search and replace tool, you change all instances of 'Word' to 'Microsoft Word'. This is an example of a global search and replace, where every instance of the search text is replaced with the replacement text. Now, what happens if the search text spans multiple paragraphs? I know of no way of using the Microsoft Word search and replace facility to do this. Taking it a step further, what if we want to replace the same search text in more than one file? This is what I call 'universal search and replace'.

This article will also discuss a new tool I have found for Windows, MobaXterm, which allows manipulation of Windows systems using the tools from Linux!

Background

Someone typing single articles is unlikely to need a universal search and replace facility, so why have I found a need for it? When I look for a software manual I usually find it is online only and consists of a multitude of HTML and picture files. I like to be able to consult the manual offline so I use a tool (HTTrack) to download all the files in one go. Ideally the resulting copy works without any changes, but this does not always happen. I may have to tweak the downloaded files and, because a manual is spread over many files, I may have to make the same tweak to each of them.

Network administrators may also have a need for a universal search and replace facility. When administering many network users they may wish to make the same profile change, for example, to every user on the network. In a company of hundreds of users, editing each profile individually would not be practical.

Example

Here is the real-life example that gave rise to this article. I had a number of .htm files (more than 30) whose code contained the following lines:

<script language="JavaScript1.1">
if (navigator.appName == "Netscape")
document.write("<td bgcolor='#AAAAAA' width='1'>&nbsp;</td>");
</script>

Within each file this block was repeated nearly 40 times! It did nothing except waste space and cause Internet Explorer on Windows 7 to generate a warning. I wanted all the blocks removed while keeping the rest of the file intact but, because the text is spread over 4 lines (4 paragraphs as far as Microsoft Word is concerned), I could not use the word processor to remove them.

Linux Tools

Having used Linux for a number of years I have dabbled a bit with the Bash shell and auxiliary tools. I knew that grep (used to search) and sed (used to edit) were probably the tools I needed. You may wonder why I am trying to use Linux tools when I am running on Windows. The simple answer is that I do not know of any Windows tools to do the job, although I suspect the inbuilt PowerShell facility would do it if I knew how to use it. Another reason is that I already had MobaXterm. I first read of this tool in Dr. Chris Brown's Administeria column in Linux Format. In a few lines it mentioned that MobaXterm provided a Linux-like command line experience on Windows and a Linux-like view of the Windows file system. Although the main purpose of MobaXterm is to provide remote access tools for managing Linux machines from Windows, the prospect of managing Windows using Linux commands was enough for me to get a copy. It is a portable program so does not have to be installed. It has been sitting on my PC waiting to be used for a little while now.

Analysis of the problem and construction of the solution

Having decided on using Linux tools I searched the grep and sed sections of my Linux manual for how to search for the multi-line text. Unfortunately, I am not that clever yet. Although I am sure it could be done, grep is a complex command and I did not want to invest the time necessary to get to grips with it, so I decided on another, simpler approach. I noticed that the 4 lines to be removed were the only lines in the files containing the words script, navigator or document. I knew that sed (stream editor, if you are interested!) could search for lines containing specified text and delete them. sed is a command line tool and the format for doing that is:

sed '/<search text>/d' <file(s) to modify>
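As a minimal illustration of this format, the sketch below deletes every line containing the word navigator from a small sample file and prints what remains. The file name sample.txt and its contents are made up for the example, and the scratch folder is only there so that no real files are touched.

```shell
# Work in a scratch folder so that no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# sample.txt and its contents are made up for this illustration.
cat > sample.txt <<'EOF'
keep this line
if (navigator.appName == "Netscape")
keep this line too
EOF

# Delete every line containing "navigator". Without the -i option the
# result is only printed; sample.txt itself is left unchanged.
result=$(sed '/navigator/d' sample.txt)
printf '%s\n' "$result"
```

Used like this, sed is a safe way to preview a deletion before committing to it.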

In my example I have 3 searches to perform, for script, for navigator and for document. Searching for each one from the command line would be possible but tedious. sed is useful in that it allows commands, in this case /<search text>/d, to come from a file so I set up the following text file and called it sf.txt:

/script/d
/navigator/d
/document/d

sed works in the following way:

  • read a line from the file
  • apply each command from the sf.txt file in turn to that line
  • output the line, unless one of the commands deleted it (output goes to the screen by default but can be redirected to a file)
  • repeat the above until all lines from all files have been read.

It is interesting that the operation will only read the file once despite having 3 searches to execute.
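To make the cycle concrete, here is a rough hand-written imitation of it as a shell loop, compared against sed itself. It is only a sketch of the idea, not how sed is implemented: the loop looks for plain words rather than regular expressions, and the shortened test file and the output file names are assumptions for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > sf.txt <<'EOF'
/script/d
/navigator/d
/document/d
EOF

cat > test.txt <<'EOF'
block 1 line 1
<script language="JavaScript1.1">
if (navigator.appName == "Netscape")
document.write("<td bgcolor='#AAAAAA' width='1'>&nbsp;</td>");
</script>
block 2 line 1
EOF

# Imitation of the cycle: a single pass over the file, with every
# search word tried against each line as it is read.
while IFS= read -r line; do
  deleted=no
  for word in script navigator document; do
    case $line in
      (*"$word"*) deleted=yes; break ;;  # like d: drop the line, skip the rest
    esac
  done
  if [ "$deleted" = no ]; then
    printf '%s\n' "$line"
  fi
done < test.txt > imitation.txt

# What sed itself produces from the same command file.
sed -f sf.txt test.txt > from_sed.txt

cat imitation.txt
```

Both output files should hold the same two surviving lines, which is one way to convince yourself that the three searches really are handled in one reading of the file.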

Having constructed the solution I tested it on a single test file – test.txt, located in the same folder as sf.txt. Here is my test.txt file:

block 1 line 1
block 1 line 2
block 1 line 3
<script language="JavaScript1.1">
if (navigator.appName == "Netscape")
document.write("<td bgcolor='#AAAAAA' width='1'>&nbsp;</td>");
</script>
block 2 line 1
block 2 line 2
<script language="JavaScript1.1">
if (navigator.appName == "Netscape")
document.write("<td bgcolor='#AAAAAA' width='1'>&nbsp;</td>");
</script>
block 3 line 1

Using MobaXterm I changed location to the folder where sf.txt and test.txt were located and executed the following command:

sed -ia -f sf.txt test.txt

-ia means edit in place after saving the original file with suffix 'a'. Execution of the command will cause test.txt to be edited after a copy of the original was stored as test.txta.

-f sf.txt means take the commands from the file sf.txt

The output from that one command resulted in test.txt reading:

block 1 line 1
block 1 line 2
block 1 line 3
block 2 line 1
block 2 line 2
block 3 line 1
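Because the original is kept with the 'a' suffix, the edit can be checked by comparing the two files rather than by eye. The sketch below repeats the test in a scratch folder and uses diff, which I am assuming is available alongside sed, to count the lines that were removed.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > sf.txt <<'EOF'
/script/d
/navigator/d
/document/d
EOF

cat > test.txt <<'EOF'
block 1 line 1
block 1 line 2
block 1 line 3
<script language="JavaScript1.1">
if (navigator.appName == "Netscape")
document.write("<td bgcolor='#AAAAAA' width='1'>&nbsp;</td>");
</script>
block 2 line 1
block 2 line 2
<script language="JavaScript1.1">
if (navigator.appName == "Netscape")
document.write("<td bgcolor='#AAAAAA' width='1'>&nbsp;</td>");
</script>
block 3 line 1
EOF

# Edit in place, keeping the original as test.txta.
sed -ia -f sf.txt test.txt

# In diff's output, lines starting with "<" exist only in the backup,
# that is, they are the lines sed removed. diff reports "files differ"
# through its exit status, which is expected here, hence "|| true".
removed=$(diff test.txta test.txt | grep -c '^<' || true)
echo "lines removed: $removed"
```

For this test file the count should be 8, the two blocks of 4 lines.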

This was the answer I wanted so, having tested that it worked on one file, I now wanted to apply it to all the files to be edited. I collected them all in a folder, editfiles, below the location of sf.txt and gave the following command:

sed -ia -f sf.txt ./editfiles/*

This is very similar to the previous command, except that the single file test.txt has been replaced with a (Linux) wildcard, *, meaning all the files in the folder ./editfiles. As in Windows, the initial '.' means start the path from the current folder.
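The plain * matches every file in the folder, which was fine here because the folder held only the files to be edited. If the folder also held other files, or the 'a' backups from an earlier run, a narrower pattern would be safer. The sketch below assumes the pages all end in .htm; the page names and the notes.txt file are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir editfiles

cat > sf.txt <<'EOF'
/script/d
/navigator/d
/document/d
EOF

# Two made-up pages plus one unrelated file in the same folder.
for page in page1.htm page2.htm; do
  cat > "editfiles/$page" <<'EOF'
block 1 line 1
<script language="JavaScript1.1">
if (navigator.appName == "Netscape")
document.write("<td bgcolor='#AAAAAA' width='1'>&nbsp;</td>");
</script>
block 2 line 1
EOF
done
echo 'notes about the document' > editfiles/notes.txt

# Only names ending in .htm are passed to sed, so notes.txt is
# neither edited nor backed up.
sed -ia -f sf.txt ./editfiles/*.htm

ls editfiles
```

The listing should show each page alongside its 'a' backup, with notes.txt on its own and still containing the word document.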

Even I was amazed by the response. I had expected a small delay while deleting 40x4 lines in each of 30+ files. In fact it was instant (on my Intel Core i5 machine). I wondered whether it had worked but when I looked in the editfiles folder I saw twice as many files: each original name now appeared twice, once as the edited file and once with an 'a' suffix as the backup. I compared the 'a' suffix backup with the edited file for one of the pairs and they were just as I wanted.
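Checking one pair by hand is reasonable, but grep can confirm the result for every file at once. This is a sketch under the assumption that the pages end in .htm and the backups in .htma; the two sample pages are made up.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir editfiles

cat > sf.txt <<'EOF'
/script/d
/navigator/d
/document/d
EOF

for page in page1.htm page2.htm; do
  cat > "editfiles/$page" <<'EOF'
block 1 line 1
<script language="JavaScript1.1">
if (navigator.appName == "Netscape")
document.write("<td bgcolor='#AAAAAA' width='1'>&nbsp;</td>");
</script>
EOF
done

sed -ia -f sf.txt ./editfiles/*.htm

# grep -l prints the name of each file containing a match. grep signals
# "nothing found" through its exit status, hence "|| true".
still_present=$(grep -l -e script -e navigator -e document ./editfiles/*.htm || true)
in_backups=$(grep -l -e script -e navigator -e document ./editfiles/*.htma || true)

echo "edited files still containing the words: ${still_present:-none}"
echo "backups containing the words:"
printf '%s\n' "$in_backups"
```

An empty first list together with every backup named in the second suggests the edit reached all the files.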

Job done and I guess it took about an hour to devise the solution and execute it.

Postscript

I realise this example is not general enough for the solution to be applied to any universal search and replace action. It is fairly easy to perform a search and replace across multiple files provided you do not want to search across multiple paragraphs. The sed command to perform such a search and replace is:

s/<search text>/<replacement text>/g

The 'g' specifies that the replacement will occur for all instances of <search text> within the line, not just the first instance.
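As a small worked example, the sketch below applies the Word to Microsoft Word replacement from the start of the article to two files at once. The file names and their contents are made up, and the 'a' suffix backup is used as in the earlier commands.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two made-up article files that mention Word.
echo 'Word has a search tool. Open Word and press Ctrl+H.' > article1.txt
echo 'This paragraph mentions Word only once.' > article2.txt

# Replace every instance on every line of both files, keeping the
# originals as article1.txta and article2.txta.
sed -ia 's/Word/Microsoft Word/g' article1.txt article2.txt

cat article1.txt article2.txt
```

Note that the search text is matched wherever it occurs, so text that already read Microsoft Word would become Microsoft Microsoft Word, and a longer word that merely contains Word would be changed too.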

This technique is a stepping stone on the path to creating the universal search and replace tool required. However, it shows that Linux tools can be used to solve problems in Windows and it gives me the confidence to explore grep and sed in more detail in order to create the generic tool.
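One possible next step, offered only as a sketch and not as the generic tool: sed can also delete a range of lines, from a line matching one pattern through to the next line matching a second pattern. Assuming every block that opens with the exact JavaScript1.1 line is unwanted, the opening and closing lines can serve as the two patterns. This still works line by line rather than truly searching across line boundaries, and the file name page.htm and the keep.js line are invented for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > page.htm <<'EOF'
block 1 line 1
<script language="JavaScript1.1">
if (navigator.appName == "Netscape")
document.write("<td bgcolor='#AAAAAA' width='1'>&nbsp;</td>");
</script>
<script src="keep.js"></script>
block 2 line 1
EOF

# Delete from each line matching the first pattern through to the next
# line matching the second. The "/" in </script> is written as "\/"
# because "/" also separates the patterns.
result=$(sed '/<script language="JavaScript1.1">/,/<\/script>/d' page.htm)
printf '%s\n' "$result"
```

Unlike the three single-word searches, this leaves other lines that happen to mention script or document alone, such as the keep.js line here.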

Since creating this solution I have searched the web for manuals on how to use Windows PowerShell and also whether PowerShell had a universal search and replace tool. I came up with some free pdf manuals:

Mastering-PowerShell.pdf at:
http://powershell.com/Mastering-PowerShell.pdf

powershell_v2_owners_manual.pdf at:
https://allunifiedcom.files.wordpress.com/2010/07/powershell_v2_owners_manual.pdf

Version 2 of PowerShell is the version that comes with Windows 7, and the owner's manual is a good introduction to getting started with PowerShell. At 68 pages it does not have the depth to cover all aspects of PowerShell. That depth comes from the 567 page Mastering PowerShell. This was written in 2009 so probably only covers version 1 of PowerShell. I believe later editions are available for purchase and the second edition can be read online at http://powershell.com.

In his article at:

http://windowsitpro.com/scripting/replacing-strings-files-using-powershell

Bill Stewart informs us that PowerShell does not have a native cmdlet that will perform universal search and replace. However, he has written a script to fill the void and it is available for download from the address above. I have not tested it to see how universal it is but from his article it looks as if it will hit the sweet spot.


 

 

 

 

