RegEx Library | Lifeway eBook Development Style Guide

General

extract text: in the Find window, choose ‘Extract’ to pull contents from a file or project
F: <body(?msi)(.*?)</body>
extract classes: choose ‘Extract’ to pull classes from a file or project
F: \sclass="[^"]+"
remove divs: Find divs and replace with only the div content
F: <div(?: class="[^"]+")?>((?:.|\s)*?)</div>
R: \1
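
To try the remove divs pattern outside BBEdit, a minimal Python sketch follows. It assumes Python's `re` module treats this pattern the same way BBEdit's grep does. The helper name `strip_divs` is hypothetical. The loop is an addition: a single pass unwraps only one level of nested divs.

```python
import re

# Find pattern copied from the "remove divs" entry above.
DIV_RE = re.compile(r'<div(?: class="[^"]+")?>((?:.|\s)*?)</div>')

def strip_divs(html: str) -> str:
    """Unwrap divs repeatedly until a pass changes nothing (hypothetical helper)."""
    while True:
        unwrapped = DIV_RE.sub(r'\1', html)  # same as R: \1
        if unwrapped == html:
            return html
        html = unwrapped

print(strip_divs('<div class="box"><div>\n<p>Text</p>\n</div></div>'))
```

Divs that carry attributes other than class (for example an id) are not matched by this pattern and are left in place.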

Clean and Code

Languages, Apparatus and Symbols

lang-hbo: Find instances of Hebrew
F: ([ְֱֲֳִֵֶַָֹֺֻּֽ֑֖֛֢֣֤֥֦֧֪֚֭֮֒֓֔֕֗֘֙֜֝֞֟֠֡֨֩֫֬֯־ֿ׀ׁׂ׃ׅׄ׆ׇאבגדהוזחטיךכלםמןנסעףפץצקרשתװױײ׳״]+-? ?)+)
lang-grc: Find instances of Greek
F: ((?:[\x{0300}-\x{036F}\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}\x{20D0}-\x{20FF}\x{FE20}-\x{FE2F}]+[,. ]*)+)
lang-grc (2): Find instances of Greek
F: ([\p{Greek}][\p{Greek} ́¨ˆ̂˘̆̑̃ˋ̔̓ ͂.,’“;]+\b)
apparatus symbols: Find apparatus symbols.
F: ([ℵ]|&#x(?:2135;|E(?:00[021];|5(?:0[45E6FA];|1[034679];))))
check lang: Find special lang characters
F: ([^A-Z][^<]*[āåâêëėèēîīôöòōûüū][^<]*)
extract lang: Choose ‘Extract’ to create a list of italicized words. Use this list to look for untagged lang or translit
F: ([^<]*)
ampersands: replace bare ampersands between words with the &amp;amp; entity
F: ([a-z]+\s*)&(\s*[a-z]+)
R: \1&amp;\2
unsafe chars: find characters that are unsafe to use within HTML attribute values
F: [a-z-]+="[^"]*?[\x{0000}-\x{0009}\x{000b}\x{000c}\x{000e}-\x{001f}\x{007f}-\x{009f}\x{00ad}\x{0600}-\x{0604}\x{070f}\x{17b4}\x{17b5}\x{200c}-\x{200f}\x{2028}-\x{202f}\x{2060}-\x{206f}\x{feff}\x{fff0}-\x{ffff}]+?[^"]*"

Page Breaks and Paragraphs

pagebreak breaking words: Find pagebreaks that are in between words.
F: ([a-z]+)-\s*()
R: \2 \1

Example find:
left-hand
pagebreak with no space: Find page breaks that have no space on either side.
F: (\w+)(\w+)
R: \1 \2

Example find:
Ihave
pagebreak begin line space: Find a pagebreak that has a space at the beginning of a line
F: (<[^>]*>]*>)\s
R: \1

Example find:
 All
find broken paragraphs (1): Find potential broken paragraphs
F: ([^.|!|”|?|"|>|)|:])\s*<p[^>]*>\s*(]*>)
R: \1 \2
find broken paragraphs (2): Find potential broken paragraphs. Case sensitive
F: <p([^>]*)>\s*(]*>)([a-z]+)

Scriptext

scriptext finder (1): Find blockquotes that have data-ref tags in them. (Use after running Percival)
F: <blockquote>(\s*(<p[^>]*>.*?\s*)*<p[^>]*>.*?(<a data-ref="[^"]*">[^<]*</a>.*?\s*</blockquote>))
R: <blockquote class="scriptext">\1
scriptext finder (2): Find blockquotes that have a data-ref before it. (Use after running Percival)
F: (<a data-ref="[^"]*">([^<]*)</a>(:|.)\s*)<blockquote>
R: \1<blockquote class="scriptext">

Spacing

no space between words: Find and replace words with no space in between
F: ([^<]*)(\w)
R: \1 \2

Example find:
A 100 footdrop
no space between spans: Find and replace span tags with no space in between (check before using span combine)
F: ([^<]*)(\w+[^<]*)
R: \1 \2

Example find:
A 100 footdrop
no space open parens: Find and replace an opening parenthesis with no space before
F: (\w)(\()
R: \1 \2

Example find:
100 foot drop(30 meters).
begin span spacing: Find spans lacking a space before
F: ([a-z]+)(<span)
R: \1 \2

Example find:
A100 foot drop
space after first tag: Find and replace opening tags with a space after
F: <([^>])> (.*?)
R: <\1>\2

Example find:
 A 100 foot drop
space before last tag: Find and replace closing tags with a space before
F: \s+</(p|td|h1|h2|h3)>
R: </\1>

Example find:
drop. 
dash spacing: Find dashes with potential spacing issues
F: (\s[^>/= ]*\s[-–][^/= ]*[-–]\s[^


space after comma: Find a comma with no space after
F: ,([^"’”'<0-9 —)]+)
R: , \1
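
The space after comma entry can be checked with a short Python sketch. It assumes Python's `re` module matches this character class the same way BBEdit does. The function name `space_after_comma` is hypothetical.

```python
import re

# Find pattern copied from the "space after comma" entry; the apostrophe is
# backslash-escaped only so it can sit inside a single-quoted Python string.
COMMA_RE = re.compile(r',([^"’”\'<0-9 —)]+)')

def space_after_comma(text: str) -> str:
    """Insert a space after commas followed directly by text (hypothetical helper)."""
    return COMMA_RE.sub(r', \1', text)  # same as R: , \1

print(space_after_comma('faith,hope, and 1,000 more'))
```

Digits are excluded by the character class, so a number such as 1,000 is left alone. Because the class does not exclude commas, a run like a,b,c is only partly fixed on the first pass and needs a second one.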


Spans
Span Combine (1)
In this Regex Library navigate to Clean and Code > Spacing > no space between spans and check before running span combine.
Find and replace to combine the content of spans with the same class.
Find:
	<span class="([^"]*)">([^<]*)</span>(\s*)<span class="\1">([^<]*)</span>

Replace:
	<span class="\1">\2\3\4</span>
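
As a rough illustration of Span Combine (1), the Python sketch below applies the same find and replace. It assumes Python's `re` handles the `\1` backreference in the find pattern as BBEdit does. The helper name `combine_spans` and the repeat-until-stable loop are additions for illustration: one pass merges spans only in pairs.

```python
import re

# Find pattern copied from Span Combine (1); \1 requires the second span
# to carry exactly the same class value as the first.
SPAN_RE = re.compile(
    r'<span class="([^"]*)">([^<]*)</span>(\s*)<span class="\1">([^<]*)</span>'
)

def combine_spans(html: str) -> str:
    """Merge adjacent same-class spans until nothing changes (hypothetical helper)."""
    while True:
        merged = SPAN_RE.sub(r'<span class="\1">\2\3\4</span>', html)
        if merged == html:
            return html
        html = merged

print(combine_spans('<span class="i">Hello</span> <span class="i">world</span>'))
```

The whitespace between the two spans (group 3) is kept inside the merged span, which is why the spacing check is recommended first.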


Span Combine (2)
Find and replace spans that can be combined into a single class.
Find:
	<span class="([^"]*)"><span class="([^"]*)">([^<]*)</span></span>

Replace:
	<span class="\1 \2">\3</span>
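
A small Python sketch of Span Combine (2), under the assumption that Python's `re` behaves like BBEdit's grep for this pattern. The function name `flatten_nested_spans` is hypothetical.

```python
import re

# Find pattern copied from Span Combine (2): an outer span whose only
# content is a single inner span containing plain text.
NESTED_RE = re.compile(
    r'<span class="([^"]*)"><span class="([^"]*)">([^<]*)</span></span>'
)

def flatten_nested_spans(html: str) -> str:
    """Collapse directly nested spans into one span with both classes."""
    return NESTED_RE.sub(r'<span class="\1 \2">\3</span>', html)

print(flatten_nested_spans('<span class="b"><span class="i">word</span></span>'))
```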



remove spans from headings: Find spans in headings that are potentially not needed
F: (<h\d[^>]*>.*?)<span(\s*class="(?!label)[^"]*")*>([^<]*)</span>(.*?</h\d>)
R: \1\3\4
Example find: 
<h1><span class="i">Foreword</span></h1>
<h2>The <span class="i">Rock-Star</span> Complex</h2>
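
The two example finds can be run through the pattern with a short Python sketch. It assumes Python's `re` supports the negative lookahead the same way BBEdit does. The function name `unwrap_heading_spans` is hypothetical.

```python
import re

# Find pattern copied from "remove spans from headings"; the (?!label)
# lookahead skips spans whose class value starts with "label".
HEADING_SPAN_RE = re.compile(
    r'(<h\d[^>]*>.*?)<span(\s*class="(?!label)[^"]*")*>([^<]*)</span>(.*?</h\d>)'
)

def unwrap_heading_spans(html: str) -> str:
    """Drop one span wrapper per heading per pass (hypothetical helper)."""
    return HEADING_SPAN_RE.sub(r'\1\3\4', html)  # same as R: \1\3\4

print(unwrap_heading_spans('<h1><span class="i">Foreword</span></h1>'))
print(unwrap_heading_spans('<h2>The <span class="i">Rock-Star</span> Complex</h2>'))
```

Each match consumes the whole heading, so a heading with several spans needs one pass per span.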
remove space within spans: Find spans with a space inside
F: <span class="([^"]+)"> ([^<]+)</span>
R: <span class="\1">\2</span> (include the space before the span)

F: <span class="([^"]+)">([^<]+) </span>
R: <span class="\1">\2</span> (include the space after the span)

move non-english chars in span: Find and replace the class of a span containing non-english characters
F: <span class="(italic|i)">([^a-zA-Z0-9\s]+)</span>
R: <span class="\1">\2</span>

remove unnecessary span: Find spans around punctuation and replace without the span
F: <span class="[^"]*">(‘|“|’|”|\.|\)|\(|\?|!|,)+</span>
R: \1
Example find: 
<span class="i">(</span>
<span class="b">.</span>
repeating spans: Find and replace adjacent spans that repeat
F: <span class="([^\n<>]+)">([^\n<>]+)</span><span class="\1">
R: <span class="\1">\2


Remove Empty Spans
Remove all spans that are empty or contain only a space
Find:
	<span class="[A-Za-z0-9_-]*">( |)</span>

Replace:
	\1


Enhance
Abbreviations

tables to ABBR 1: convert tables to abbreviation lists
F: <tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>
R: <dt epub:type="glossterm"><dfn>\1</dfn></dt><dd epub:type="glossdef">\2</dd>
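
To preview what tables to ABBR 1 produces, the following Python sketch applies the same find and replace to a single two-cell row. It assumes Python's `re` matches this pattern as BBEdit does. The names `ROW_RE`, `REPLACEMENT` and `rows_to_glossary` are hypothetical, and the sample row is invented for illustration.

```python
import re

# Find and replace copied from "tables to ABBR 1".
ROW_RE = re.compile(r'<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>')
REPLACEMENT = (
    r'<dt epub:type="glossterm"><dfn>\1</dfn></dt>'
    r'<dd epub:type="glossdef">\2</dd>'
)

def rows_to_glossary(html: str) -> str:
    """Turn each two-cell table row into a dt/dd pair (hypothetical helper)."""
    return ROW_RE.sub(REPLACEMENT, html)

sample = '<tr>\n  <td>AD</td>\n  <td>anno Domini</td>\n</tr>'
print(rows_to_glossary(sample))
```

Rows whose cells carry attributes, or that have more than two cells, are not matched and would need the find adjusted.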

tables to ABBR 2: after running tables to ABBR 1, use this regex to put each list element on its own line
F: <dfn>(.*?)</dfn></dt><dd epub:type="glossdef">(.*?)</dd>
R: \n            <dfn>\1</dfn>\n          </dt>\n          <dd epub:type="glossdef">\2</dd>


Footnotes

footnote references: for footnotes not in backmatter use this find and replace to format footnote refs in each file. Adjust the find to match source file markup, if necessary, and edit the replace to ensure unique IDs. After replacing in BBEdit use Markup > Update > Document to change #FILENAME# to document filename
F: <p>(\d+)\. (.*?)</p>
R: <div epub:type="footnote" id="\1">\n <p><sup><a href="#FILENAME##backlink-\1">\1</a></sup>&#160;<span class="note">\2</span></p>\n </div>

footnote indicators: for footnotes not in backmatter use this find and replace to format footnote indicators in each file. Adjust the find to match source file markup, if necessary, and edit the replace to ensure unique IDs. After replacing in BBEdit use Markup > Update > Document to change #FILENAME# to document filename
F: <sup>(\d+)</sup>
R: <sup class="fn" id="backlink-intro-\1"><a epub:type="noteref" href="#FILENAME##intro-\1">[\1]</a></sup>

unique footnote reference id: use filename to make footnote reference id unique
F: <sup class="fn" id="note-backlink-(\d+)"><a epub:type="noteref" href="([^#]+)_([^#]*?).xhtml#note-(\d+)">\[(\d+)]</a></sup>
R: <sup class="fn" id="note-backlink-\3-\1"><a epub:type="noteref" href="\2_\3.xhtml#note-\3-\4">[\5]</a></sup>
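
The capture groups in this entry are easier to follow with a concrete input. The Python sketch below assumes a file named `book_ch01.xhtml` (an invented name) and assumes Python's `re` matches the pattern as BBEdit does. The function name `make_reference_ids_unique` is hypothetical.

```python
import re

# Find and replace copied from "unique footnote reference id".
# Groups: 1 = backlink number, 2 = filename before the last underscore,
# 3 = filename after it, 4 = note number, 5 = number shown in brackets.
REF_RE = re.compile(
    r'<sup class="fn" id="note-backlink-(\d+)"><a epub:type="noteref" '
    r'href="([^#]+)_([^#]*?).xhtml#note-(\d+)">\[(\d+)]</a></sup>'
)
REPLACEMENT = (
    r'<sup class="fn" id="note-backlink-\3-\1"><a epub:type="noteref" '
    r'href="\2_\3.xhtml#note-\3-\4">[\5]</a></sup>'
)

def make_reference_ids_unique(html: str) -> str:
    """Insert the filename suffix into reference ids (hypothetical helper)."""
    return REF_RE.sub(REPLACEMENT, html)

sample = ('<sup class="fn" id="note-backlink-3"><a epub:type="noteref" '
          'href="book_ch01.xhtml#note-3">[3]</a></sup>')
print(make_reference_ids_unique(sample))
```

With this input the part of the filename after the underscore (`ch01`) is inserted into both the id and the link target.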

unique footnote indicator id: use filename to make footnote id unique
F: <div id="note-(\d+)" epub:type="footnote">\s*<p><sup><a href="([^#]+)_([^#]*?).xhtml#note-backlink-(\d+)">
R: <div id="note-\3-\1" epub:type="footnote"><p><sup><a href="\2_\3.xhtml#note-backlink-\3-\4">

remove Ibids: make sure footnotes are formatted correctly according to the style guide and then use to replace Ibids
F: (<p class="[^"]*"><sup>(\d+)</sup>(.*?<span class="i">.*?</span>).*?</p>\s*<p class="[^"]*"><sup>\d+</sup>)Ibid\.(,.*?)*</p>
R: \1\3\4</p>


Index

move pagebreaks up top: find pagebreaks in a file and move them before the h1. (Run multiple times until there are no new finds)
F: (<h1[^>]*>.*?</h1>(?msi)(.*?))(<span epub:type="pagebreak"[^>]*></span>)
R: \3\1

Links

add target="_blank" to links: Add target="_blank" attribute to existing external links
F: <a href="http([^"]+)">
R: <a href="http\1" target="_blank" rel="noopener">
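
A quick Python check of this entry, assuming Python's `re` behaves like BBEdit's grep here. The function name `open_external_links_in_new_tab` and the sample URL are invented for illustration.

```python
import re

# Find and replace copied from 'add target="_blank" to links'.
LINK_RE = re.compile(r'<a href="http([^"]+)">')
REPLACEMENT = r'<a href="http\1" target="_blank" rel="noopener">'

def open_external_links_in_new_tab(html: str) -> str:
    """Add target and rel attributes to links whose href starts with http."""
    return LINK_RE.sub(REPLACEMENT, html)

print(open_external_links_in_new_tab('<a href="https://example.com">site</a>'))
```

Internal links such as `ch01.xhtml` are untouched, and already-converted links no longer match because the href is no longer followed directly by `>`, so the replace is safe to run more than once.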

URLs: Add links to URLs (Does not capture every instance)
F: (\s)http(.+?)([;.,)][\s<])
R: \1<a href="http\2" target="_blank" rel="noopener">http\2</a>\3

tag hyperlinks: find and replace to tag hyperlinks
F: <a (?:class="[^"]*"\s*)*href="((?:mail[^"]*)|(?:http[^"]*))">([^<]*)</a>
R: <a href="\1" target="_blank" rel="noopener">\2</a>

link chapters: Find potential instances where chapters can be linked. Adjust the word first to second and the number 1 to 2 etc., to find all chapters
F: (first chap(\.|ters?)|chap(s?\.|ters?) 1)(?!\d)

link parts: Find potential instances where parts can be linked. Adjust the word first to second and the number 1 to 2 etc., to find all parts
F: (first part|parts? 1)(?!\d)


Percival

percival parsing: add parsing tags before headings containing scripture. Replace Gen with Bible book needed
F: ^(\s+)<(h\d)>(.*?)(\d+):(.*?)</\2>
R: <span data-parsing="Gen.\4"></span>\n\1<\2>\3\4:\5</\2>
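
The sketch below runs the percival parsing find and replace in Python on an invented heading. It assumes Python's `re` with the `MULTILINE` flag approximates how BBEdit anchors `^` at each line start. The function name `add_parsing_tags` is hypothetical.

```python
import re

# Find and replace copied from "percival parsing" (book fixed to Gen as in the entry).
# Groups: 1 = indent, 2 = heading tag, 3 = text before the chapter number,
# 4 = chapter number, 5 = text after the colon.
HEADING_RE = re.compile(r'^(\s+)<(h\d)>(.*?)(\d+):(.*?)</\2>', re.MULTILINE)
REPLACEMENT = r'<span data-parsing="Gen.\4"></span>\n\1<\2>\3\4:\5</\2>'

def add_parsing_tags(html: str) -> str:
    """Insert a data-parsing span before indented scripture headings."""
    return HEADING_RE.sub(REPLACEMENT, html)

print(add_parsing_tags('<body>\n  <h2>Creation 1:1-5</h2>'))
```

The pattern requires leading whitespace, so headings that start in the first column are not matched. The inserted span is written without the indent because the replace does not repeat group 1 before it.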

format existing scripture tags: capture existing verse tags and convert them to data-ref format
F: <data(?: tag="auto-generated")* ref="Bible[^:]*:(\d*\s*[a-z]+)\.* (\d+):(\d+)[a-z]*">
R: <a data-ref="\1.\2.\3">


Commentary Markup

headings data-context: add data-context tags before headings. Adjust h3 to capture desired heading
F: ^(\s+)<(h3)>(.*?<a data-ref="(.*?)">.*?</a>.*?)</\2>
R: \1<hr data-context="\4" />\n\1<\2>\3</\2>

Review

remove pagebreaks from headings: find and replace to move pagebreaks out of headings
F: (<h\d>.*?)(<span epub:type="pagebreak[^>]*></span>)
R: \2\1
Example find: 
<h1><span epub:type="pagebreak" id="page1" title="1"></span>Chapter 1</h1>
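
The example find can be reproduced with a few lines of Python, assuming Python's `re` matches this pattern as BBEdit does. The function name `move_pagebreak_out_of_heading` is hypothetical.

```python
import re

# Find and replace copied from "remove pagebreaks from headings".
PAGEBREAK_RE = re.compile(
    r'(<h\d>.*?)(<span epub:type="pagebreak[^>]*></span>)'
)

def move_pagebreak_out_of_heading(html: str) -> str:
    """Move a pagebreak span in front of the heading's opening tag."""
    return PAGEBREAK_RE.sub(r'\2\1', html)  # same as R: \2\1

sample = '<h1><span epub:type="pagebreak" id="page1" title="1"></span>Chapter 1</h1>'
print(move_pagebreak_out_of_heading(sample))
```

Because `.*?` is not limited to the heading itself, a pagebreak that appears later on the same line, after the heading has closed, would also match, so each find is worth reviewing before replacing.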
remove space before footnote: find and replace extra space before a footnote indicator
F: \s<sup class="fn"
R: <sup class="fn"

special chars spacing: find special characters with extra spacing on either side of it
F: \s+(\{|\$|&|,|:|;|\?|@|#|\||'|<|>|-|\^|\*|\(|\)|%|!|\]|"|\”|\“)\s+
R: \2 \1
Example finds: 
 ( 
 : 
 $ 
special chars spans: review special characters in spans and replace the character without the span
F: <span[^>]*>(\{|\$|&|,|:|;|\?|@|#|\||'|\.|-|\^|\(|\)|%|!|\]|"|\”|\“|\—)+</span>
R: \1
Example finds: 
<span class="i">)</span>
<span class="b">.</span>
non-english chars spans: review non-english characters in spans that could be tagged as lang
F: <span class="i(?:talic)?">([^a-zA-Z0-9\s]+)</span>

missed verses: Find digits with a colon in between and no tag that could potentially be missed scripture verses
F: (?<!</abbr>|</span>)(?<!'>|[a-z]|\d|\.)(?:\(| )\d+:\d{1,2}(?!</a>)
Example finds: 
106:9
10:10