Vim regular expressions and escape characters

Today I learned that regex in Vim can be even more irritating than normal regex.

I work for an agency that is (rightly) anal about typography, which means that apostrophes should always be curly. That means I end up (mis-)typing the straight ' a lot. I thought I’d write a little :substitute command in my vimrc to make all apostrophes in body copy (titles, paragraphs, links and spans) “curly”. I started off by working out a regex that would find such things:

/\v\<(h[1-6]|p|a|span).+(\_.)*\zs'\ze(\_.)*\<\/\1\>

Let’s break it down:

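  • \v – the “very magic” switch, which means fewer escape characters
  • \< – the opening angle bracket of a tag (escaped, because a bare < means start-of-word under \v)
  • (h[1-6]|p|a|span) – any of the tags used for body copy
  • .+ – one or more characters on the same line (the rest of the opening tag)
  • (\_.)* – zero or more characters (incl. new lines)
  • \zs'\ze – limit the match to an apostrophe followed by:
  • zero or more characters and the accompanying closing tag (the \1 matches the captured tag name)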
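
To see what “fewer escape characters” buys you, here is a small sketch comparing the start of the pattern in very magic mode with the same thing in Vim's default magic mode. The variable names and the sample string are made up for illustration, and it only covers the opening-tag part of the regex:

```vim
" Very magic (\v): grouping, alternation and + need no backslash,
" but a literal < does, because a bare < means start-of-word under \v.
let g:demo_very_magic = '\v\<(h[1-6]|p|a|span).+'

" Default magic: the same pattern with the escaping the other way round.
let g:demo_magic = '<\(h[1-6]\|p\|a\|span\).\+'

" Both patterns should match the same text in this sample string.
echo matchstr('<h2 class="x">', g:demo_very_magic)
echo matchstr('<h2 class="x">', g:demo_magic)
```

Sourcing this should echo the same string twice, which is a quick way to check a translation between the two modes.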

This (eventually) worked like a charm. It can no doubt be optimised by someone with more of a Vim Regex brain than me.

I could also have been smart about the characters trailing the apostrophe (like 's, 'd, 'm, 'll, 've, etc.), but this seemed needlessly complex and my brain was already hurting. I was then able to do a global substitute for the encoded curly quote:

:%s//\&rsquo;/
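
Two things are going on in that short command: an empty pattern tells :substitute to reuse the last search pattern, and \& in the replacement is a literal ampersand rather than “the matched text”. A minimal sketch of the same idea in script form, using a deliberately simple stand-in pattern (every straight apostrophe) and the g and e flags, none of which come from the command above:

```vim
" Set the last search pattern directly (same effect as typing /' and Enter).
" Stand-in pattern for illustration only: matches every straight apostrophe.
let @/ = "'"

" Empty pattern => reuse the last search pattern held in @/.
" \& is a literal &, so the replacement text is the HTML entity.
" g = every match on a line, e = no error if nothing matches.
%s//\&rsquo;/ge
```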

The next step was to transfer all this to my .vimrc. I mapped it to <leader>Q for “Quotes”.

" replace aposrophes with curly ones in body copy
nnoremap <leader>Q :%s/\v\<(h[1-6]|p|a|span).+(\_.)*\zs'\ze(\_.)*\<\/\1\>/\&rsquo;/<CR>

Now I saved my config and was shouted at by Vim (my .vimrc auto sources on save).

After a lot of head scratching, it turned out that the OR operator needed to be escaped even when using \v, and even though it worked unescaped whilst searching and substituting in a buffer. The reason is that | is Vim's command separator: a :map command treats an unescaped | as the end of the mapping, so inside a mapping it has to be written as \| (or <Bar>).
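
To illustrate with a deliberately small, hypothetical mapping (the <leader>X key and the foo/bar pattern are made up for this sketch): the commented-out line is cut short at the |, while the other two get the whole command into the mapping.

```vim
" Broken: :nnoremap stops reading at the bare |, so the mapping ends at
" ':%s/\v(foo' and the rest is run as a separate Ex command when the
" vimrc is sourced. Left commented out so this file sources cleanly.
" nnoremap <leader>X :%s/\v(foo|bar)/baz/g<CR>

" Works: \| puts a literal | into the mapping.
nnoremap <leader>X :%s/\v(foo\|bar)/baz/g<CR>

" Also works: <Bar> is the key-notation spelling of |.
nnoremap <leader>X :%s/\v(foo<Bar>bar)/baz/g<CR>
```

Either form should behave the same once the mapping runs. <Bar> is arguably easier to spot in a very magic pattern, where \| looks like an escaped alternation.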

This is the final snippet:

" replace aposrophes with curly ones in body copy
nnoremap <leader>Q :%s/\v\<(h[1-6]\|p\|a\|span).+(\_.)*\zs'\ze(\_.)*\<\/\1\>/\&rsquo;/<CR>

Problem solved, designers happy and lots of curly quotes everywhere.

Guy Routledge