Friday 3 November 2017

regex - Microsoft Word wildcards


I have a document in which the word "The" is capitalised in many places where it shouldn't be. I want to change "The" to "the" in those cases.


I need to change every instance of "The" that is not at the beginning of a sentence, that is, where the capital appears mid-sentence, for example: A dog bit The boy.


I played around with the following:


Use the wildcard expression <The> and replace the results with "the". This will change each "The" to "the". From there I planned to find the words that SHOULD be capitalised and change them back to "The", leaving the lower-case words where they belong.



  1. Press Ctrl + H to open the Replace window. Find <The> and replace with the

  2. Next, I wanted to find each "the" that was at the beginning of a new line and hence should be capitalised: [!. {1,9}]<the>. This should find each "the" that is NOT preceded by 1-9 white-spaces. I set the replacement text to "The" to re-capitalise them.

  3. Next, I wanted to find each "the" that was at the start of a new sentence, but not at the start of a new line, for example, the Dog bit the boy. the boy cried hard. I came up with the following expression: [. {1,9}]<the> to be replaced with ". The". This should find each "the" that is preceded by a period and between one and nine white-spaces (in case of clumsy formatting).


That should set everything the way I want.
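
The three steps above can be sketched outside Word to check the logic. This is a minimal illustration in Python's re module, not Word wildcard syntax: \b stands in for Word's < and > word anchors, and the function name fix_the is made up for this example.

```python
import re

def fix_the(text: str) -> str:
    # Hypothetical helper: mirrors the three steps described above.
    # Step 1: lower-case every standalone "The".
    text = re.sub(r"\bThe\b", "the", text)
    # Step 2: re-capitalise "the" at the start of a line.
    text = re.sub(r"^the\b", "The", text, flags=re.MULTILINE)
    # Step 3: re-capitalise "the" after a period and 1-9 spaces,
    # writing the captured period and spaces back unchanged.
    text = re.sub(r"(\. {1,9})the\b", r"\1The", text)
    return text

sample = "The dog bit The boy. The boy cried hard.\nThe nun saw The dog."
print(fix_the(sample))
```

Note that the quantifier {1,9} sits outside the square brackets here, so it applies to the space; inside brackets it would be read as a set of literal characters.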


I ran into the following issues:



  • When following step 3, the expression is not specific enough. It returns results that extend well beyond a period.


How should I write this so that I can find the beginning of a sentence that is not on a new line? For example, selecting the second "the" in the following: A brick smashed the window. the nun was not pleased.



  • When following step 2, I get results with a white-space selected. I am trying to find "The" at the beginning of both a sentence and a new line, that is, with no spaces or period before it. This doesn't work.


Why is it selecting a white-space with the text, and how should I alter it to select only "The" at the beginning of a sentence and a new line?


Also, what would be an example expression that finds "The" only when it is preceded by one of the following symbols: - * : (for example, * The)? It would be helpful if you could also invert it so that it finds "The" only when it is not preceded by one of those symbols.



Answer



Regarding the extra spaces being selected: let Word select the spaces, but have the replacement put them back so that there is no net effect on them.


Add parentheses to the Find expression so that Word assigns each parenthesised section of the search to a numbered group:


([!. {1,9}])(the)


Then, in the "Replace with" box, enter:


\1The


Here \1 refers to the first parenthesised group, so this puts the matched spaces back exactly where they were.
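
The same group-and-backreference idea can be tried in Python's re module, which also uses \1 for the first group. This is only a sketch of the mechanism, with an assumed sample string and a simplified pattern; it is not the Word expression itself.

```python
import re

text = "First line.\n  the second line"

# Without a group, the matched line break and indentation are replaced too.
lossy = re.sub(r"\n[ \t]*the\b", "The", text)

# With a group, \1 writes the captured line break and indentation back.
kept = re.sub(r"(\n[ \t]*)the\b", r"\1The", text)

print(repr(lossy))  # 'First line.The second line'
print(repr(kept))   # 'First line.\n  The second line'
```

The first call shows the problem from the question (the surrounding whitespace is consumed by the match), and the second shows the fix (the whitespace is captured and restored).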


As for



please help me with an example expression that finds only "The" when it is preceded by one of the following symbols -,*,: e.g * The.



Just escape the asterisk with a backslash ("\"). The expression would be [\*:,]The, and the opposite would be [!\*:,]The
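
For comparison, here is a rough Python equivalent of "preceded by a symbol" and its inverse. It assumes the three symbols listed in the question (hyphen, asterisk, colon) and allows one optional space, as in the "* The" example. The names AFTER_SYMBOL and NOT_AFTER_SYMBOL are made up for this sketch, and Python uses [^...] and lookbehind where Word uses [!...].

```python
import re

# A hyphen placed first inside the brackets is taken literally, not as a range.
AFTER_SYMBOL = re.compile(r"[-*:] ?The\b")

# Inverse: "The" that is NOT directly preceded by a symbol,
# with or without a single space in between.
NOT_AFTER_SYMBOL = re.compile(r"(?<![-*:] )(?<![-*:])\bThe\b")

sample = "* The list: The item - The last. See The end"

print(AFTER_SYMBOL.findall(sample))
print([m.start() for m in NOT_AFTER_SYMBOL.finditer(sample)])
```

The lookbehind version does not consume the preceding character, so unlike a negated set it can also match "The" at the very start of the text.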

