General
extract text: in the Find window, choose ‘Extract’ to pull contents from a file or project
F:<body(?msi)(.*?)</body>
extract classes: choose ‘Extract’ to pull classes from a file or project
F:\sclass="[^"]+"
remove divs: Find divs and replace with only the div content
F:<div(?: class="[^"]+")?>((?:.|\s)*?)</div>
R:\1
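The "remove divs" entry can be tried outside the editor. The following is a minimal sketch, assuming Python's re module as a stand-in for the Find window; the helper name remove_divs and the sample markup are hypothetical.

```python
import re

# Pattern and replacement copied from the "remove divs" entry.
REMOVE_DIV = re.compile(r'<div(?: class="[^"]+")?>((?:.|\s)*?)</div>')

def remove_divs(html: str) -> str:
    """Replace each matched div with only its content (hypothetical helper)."""
    return REMOVE_DIV.sub(r"\1", html)

sample = '<div class="box">\n<p>Hello</p>\n</div>'
result = remove_divs(sample)
print(result)
```

Only bare divs and divs whose sole attribute is class are matched, so a div carrying an id is left alone.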
Clean and Code
Languages, Apparatus and Symbols
lang-hbo: Find instances of Hebrew
F:((?:[ְֱֲֳִֵֶַָֹֺֻּֽ֑֖֛֢֣֤֥֦֧֪֚֭֮֒֓֔֕֗֘֙֜֝֞֟֠֡֨֩֫֬֯־ֿ׀ׁׂ׃ׅׄ׆ׇאבגדהוזחטיךכלםמןנסעףפץצקרשתװױײ׳״]+-? ?)+)
lang-grc: Find instances of Greek
F:((?:[\x{0300}-\x{036F}\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}\x{20D0}-\x{20FF}\x{FE20}-\x{FE2F}]+[,. ]*)+)
lang-grc (2): Find instances of Greek
F:([\p{Greek}][\p{Greek} ́¨ˆ̂˘̆̑̃ˋ̔̓ ͂.,’“;]+\b)
apparatus symbols: Find apparatus symbols.
F:([ℵ]|&#x(?:2135;|E(?:00[021];|5(?:0[45E6FA];|1[034679];))))
check lang: Find special lang characters
F:<span class="([^"]+)">([^A-Z][^<]*[āåâêëėèēîīôöòōûüū][^<]*)</span>
extract lang: Choose ‘Extract’ to create a list of italicized words. Use this list to look for untagged lang or translit
F:<span class="(italic|i)">([^<]*)</span>
ampersands: replace bare ampersands between words with the &amp; entity
F:([a-z]+\s*)&(\s*[a-z]+)
R:\1&amp;\2
unsafe chars: find characters that are unsafe to use within HTML attribute values
F:[a-z-]+="[^"]*?[\x{0000}-\x{0009}\x{000b}\x{000c}\x{000e}-\x{001f}\x{007f}-\x{009f}\x{00ad}\x{0600}-\x{0604}\x{070f}\x{17b4}\x{17b5}\x{200c}-\x{200f}\x{2028}-\x{202f}\x{2060}-\x{206f}\x{feff}\x{fff0}-\x{ffff}]+?[^"]*"
Page Breaks and Paragraphs
pagebreak breaking words: Find pagebreaks that are in between words.
F:([a-z]+)-\s*(<span epub:type="pagebreak" id="[^"]*" title="[^"]*"></span>)
R:\2 \1
Example find:
left-<span epub:type="pagebreak" id="page1" title="1"></span>hand
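To see what the replacement does to the example find, here is a minimal sketch assuming Python's re module in place of the editor; the function name move_pagebreak_before_word is hypothetical.

```python
import re

# Pattern and replacement copied from the "pagebreak breaking words" entry.
PAGEBREAK_IN_WORD = re.compile(
    r'([a-z]+)-\s*(<span epub:type="pagebreak" id="[^"]*" title="[^"]*"></span>)'
)

def move_pagebreak_before_word(html: str) -> str:
    """Move the pagebreak in front of the split word (hypothetical helper)."""
    return PAGEBREAK_IN_WORD.sub(r"\2 \1", html)

sample = 'left-<span epub:type="pagebreak" id="page1" title="1"></span>hand'
fixed = move_pagebreak_before_word(sample)
print(fixed)
```

The hyphen is dropped by the replacement, so genuinely hyphenated compounds need a manual check before replacing.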
pagebreak with no space: Find page breaks that have no space on either side.
F:(\w+<span epub:type="pagebreak" id="[^"]*" title="[^"]*"></span>)(\w+)
R:\1 \2
Example find:
I<span epub:type="pagebreak" id="page1" title="1"></span>have
pagebreak begin line space: Find a pagebreak that has a space at the beginning of a line
F:(<[^>]*><span epub:type="pagebreak"[^>]*></span>)\s
R:\1
Example find:
<p><span epub:type="pagebreak" id="page1" title="1"></span> All
find broken paragraphs (1): Find potential broken paragraphs
F:([^.|!|”|?|"|>|)|:])</p>\s*<p[^>]*>\s*(<span epub:type="pagebreak" id="page.+?" title="[^>]*></span>)
R:\1 \2
find broken paragraphs (2): Find potential broken paragraphs. Case sensitive
F:<p([^>]*)>\s*(<span epub:type="pagebreak" id="page.+?" title="[^>]*></span>)([a-z]+)
Scriptext
scriptext finder (1): Find blockquotes that have data-ref tags in them. (Use after running Percival)
F:<blockquote>(\s*(<p[^>]*>.*?</p>\s*)*<p[^>]*>.*?(<a data-ref="[^"]*">[^<]*</a>.*?</p>\s*</blockquote>))
R:<blockquote class="scriptext">\1
scriptext finder (2): Find blockquotes that have a data-ref before it. (Use after running Percival)
F:(<a data-ref="[^"]*">([^<]*)</a>(:|\.)</p>\s*)<blockquote>
R:\1<blockquote class="scriptext">
Spacing
no space between words: Find and replace words with no space in between
F:(<span class="(?!label)[^"]*">[^<]*</span>)(\w)
R:\1 \2
Example find:
A <span class="i">100 foot</span>drop
no space between spans: Find and replace span tags with no space in between (check before using span combine)
F:(<span class="(?!label)[^"]*">[^<]*</span>)(<span class="(?!label)[^"]*">\w+[^<]*</span>)
R:\1 \2
Example find:
A <span class="i">100 foot</span><span class="i">drop</span>
no space open parens: Find and replace an opening parenthesis with no space before
F:(\w</span>)(\()
R:\1 \2
Example find:
<span class="i">100 foot drop</span>(30 meters).
begin span spacing: Find spans lacking a space before
F:([a-z]+)(<span)
R:\1 \2
Example find:
A<span class="i">100 foot drop</span>
space after first tag: Find and replace opening tags with a space after
F:<([^>]+)> (.*?)
R:<\1>\2
Example find:
<p> A <span class="i">100 foot drop</span>
space before last tag: Find and replace closing tags with a space before
F:\s</(p|td|h1|h2|h3)>
R:</\1>
Example find:
drop. </p>
dash spacing: Find dashes with potential spacing issues
F:(\s[^>/= ]*\s[-–][^/= ]*[-–]\s[^
space after comma: Find a comma with no space after
F:,([^"’”'<0-9 —)]+)
R:, \1
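A quick way to check which commas this entry touches is to run it on a sample string. This is a minimal sketch assuming Python's re module; the function name add_space_after_commas and the sample text are hypothetical.

```python
import re

# Pattern and replacement copied from the "space after comma" entry.
# The character class skips commas followed by quotes, tags, digits,
# spaces, em dashes, and closing parentheses.
SPACE_AFTER_COMMA = re.compile(r''',([^"’”'<0-9 —)]+)''')

def add_space_after_commas(text: str) -> str:
    """Insert a space after commas that run into the next word."""
    return SPACE_AFTER_COMMA.sub(r", \1", text)

sample = "red,green, blue 1,000"
fixed = add_space_after_commas(sample)
print(fixed)
```

Commas inside numbers and commas followed by a closing quote are left untouched because of the excluded characters.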
Spans
Span Combine (1)
Before running span combine, run the check in this Regex Library at Clean and Code > Spacing > no space between spans.
Find and replace to combine the content of spans with the same class.
Find:
<span class="([^"]*)">([^<]*)</span>(\s*)<span class="\1">([^<]*)</span>
Replace:
<span class="\1">\2\3\4</span>
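A single pass merges spans in pairs, so three or more adjacent spans need repeated passes. The sketch below assumes Python's re module rather than the editor and loops until nothing changes; the function name combine_spans is hypothetical.

```python
import re

# Pattern and replacement copied from "Span Combine (1)".
SPAN_COMBINE = re.compile(
    r'<span class="([^"]*)">([^<]*)</span>(\s*)<span class="\1">([^<]*)</span>'
)

def combine_spans(html: str) -> str:
    """Repeat the find and replace until no more same-class spans merge."""
    while True:
        merged = SPAN_COMBINE.sub(r'<span class="\1">\2\3\4</span>', html)
        if merged == html:
            return merged
        html = merged

two = '<span class="i">100 foot</span> <span class="i">drop</span>'
three = '<span class="i">a</span><span class="i">b</span><span class="i">c</span>'
print(combine_spans(two))
print(combine_spans(three))
```

In the editor the equivalent is to rerun Replace All until no matches remain.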
Span Combine (2)
Find and replace spans that can be combined into a single class.
Find:
<span class="([^"]*)"><span class="([^"]*)">([^<]*)</span></span>
Replace:
<span class="\1 \2">\3</span>
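As an illustration of the nested case, here is a minimal sketch assuming Python's re module; the function name flatten_nested_spans and the class names in the sample are hypothetical.

```python
import re

# Pattern and replacement copied from "Span Combine (2)".
NESTED_SPAN = re.compile(
    r'<span class="([^"]*)"><span class="([^"]*)">([^<]*)</span></span>'
)

def flatten_nested_spans(html: str) -> str:
    """Collapse a span wrapped directly around another span into one span."""
    return NESTED_SPAN.sub(r'<span class="\1 \2">\3</span>', html)

sample = '<span class="b"><span class="i">Note</span></span>'
flat = flatten_nested_spans(sample)
print(flat)
```

The outer class is listed first in the combined class attribute.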
remove spans from headings: Find spans in headings that are potentially not needed
F: (<h\d[^>]*>.*?)<span(\s*class="(?!label)[^"]*")*>([^<]*)</span>(.*?</h\d>)
R: \1\3\4
Example find:
<h1><span class="i">Foreword</span></h1>
<h2>The <span class="i">Rock-Star</span> Complex</h2>
remove space within spans: Find spans with a space inside
F: <span class="([^"]+)"> ([^<]+)</span>
R: <span class="\1">\2</span>
(include the space before the span)
F: <span class="([^"]+)">([^<]+) </span>
R: <span class="\1">\2</span>
(include the space after the span)
move non-English chars in span: Find and replace the class of a span containing non-English characters
F: <span class="(italic|i)">([^a-zA-Z0-9\s]+)</span>
R: <span class="\1">\2</span>
remove unnecessary span: Find spans around punctuation and replace without the span
F: <span class="[^"]*">(‘|“|’|”|\.|\)|\(|\?|!|,)+</span>
R: \1
Example find:
<span class="i">(</span>
<span class="b">.</span>
repeating spans: Find and replace adjacent spans that repeat
F: <span class="([^\n<>]+)">([^\n<>]+)</span><span class="\1">
R: <span class="\1">\2
Remove Empty Spans
Find and remove all spans that are empty or contain only a space
Find:
<span class="[A-Za-z0-9_-]*">( |)</span>
Replace:
\1
Enhance
Abbreviations
tables to ABBR 1: convert tables to abbreviation lists
F: <tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>
R: <dt epub:type="glossterm"><dfn>\1</dfn></dt><dd epub:type="glossdef">\2</dd>
tables to ABBR 2: after running tables to ABBR 1, use this regex to put the list items on new lines
F: <dfn>(.*?)</dfn></dt><dd epub:type="glossdef">(.*?)</dd>
R: \n <dfn>\1</dfn>\n </dt>\n <dd epub:type="glossdef">\2</dd>
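The two ABBR steps can be chained to preview the result on one table row. This is a minimal sketch assuming Python's re module; the function name table_row_to_abbr and the sample row are hypothetical, and the single-space indents follow the replacement as written in this library.

```python
import re

# Step 1: pattern and replacement from "tables to ABBR 1".
ROW = re.compile(r'<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>')
ROW_REPL = (r'<dt epub:type="glossterm"><dfn>\1</dfn></dt>'
            r'<dd epub:type="glossdef">\2</dd>')

# Step 2: pattern and replacement from "tables to ABBR 2".
ITEM = re.compile(r'<dfn>(.*?)</dfn></dt><dd epub:type="glossdef">(.*?)</dd>')
ITEM_REPL = r'\n <dfn>\1</dfn>\n </dt>\n <dd epub:type="glossdef">\2</dd>'

def table_row_to_abbr(html: str) -> str:
    """Run step 1, then step 2, on the given markup."""
    return ITEM.sub(ITEM_REPL, ROW.sub(ROW_REPL, html))

sample = '<tr>\n<td>OT</td>\n<td>Old Testament</td>\n</tr>'
converted = table_row_to_abbr(sample)
print(converted)
```

The surrounding table element still has to be changed to a dl by hand or with a separate replace.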
Footnotes
footnote references: for footnotes not in backmatter
use this find and replace to format footnote refs in each file. Adjust the find to match source file markup, if necessary, and edit the replace to ensure unique IDs. After replacing in BBEdit use Markup > Update > Document to change #FILENAME# to document filename
F: <p>(\d+)\. (.*?)</p>
R: <div epub:type="footnote" id="\1">\n <p><sup><a href="#FILENAME##backlink-\1">\1</a></sup> <span class="note">\2</span></p>\n </div>
footnote indicators: for footnotes not in backmatter
use this find and replace to format footnote indicators in each file. Adjust the find to match source file markup, if necessary, and edit the replace to ensure unique IDs. After replacing in BBEdit use Markup > Update > Document to change #FILENAME# to document filename
F: <sup>(\d+)</sup>
R: <sup class="fn" id="backlink-intro-\1"><a epub:type="noteref" href="#FILENAME##intro-\1">[\1]</a></sup>
unique footnote reference id: use filename to make footnote reference id unique
F: <sup class="fn" id="note-backlink-(\d+)"><a epub:type="noteref" href="([^#]+)_([^#]*?).xhtml#note-(\d+)">\[(\d+)]</a></sup>
R: <sup class="fn" id="note-backlink-\3-\1"><a epub:type="noteref" href="\2_\3.xhtml#note-\3-\4">[\5]</a></sup>
unique footnote indicator id: use filename to make footnote id unique
F: <div id="note-(\d+)" epub:type="footnote">\s*<p><sup><a href="([^#]+)_([^#]*?).xhtml#note-backlink-(\d+)">
R: <div id="note-\3-\1" epub:type="footnote"><p><sup><a href="\2_\3.xhtml#note-backlink-\3-\4">
remove Ibids: make sure footnotes are formatted correctly according to the style guide and then use to replace Ibids
F: (<p class="[^"]*"><sup>(\d+)</sup>(.*?<span class="i">.*?</span>).*?</p>\s*<p class="[^"]*"><sup>\d+</sup>)Ibid.(,.*?)*</p>
R: \1\3\4</p>
Index
- move pagebreaks up top: find pagebreaks in a file and move them before the h1. (Run multiple times until there are no new finds)
F: (<h1[^>]*>.*?</h1>(?msi)(.*?))(<span epub:type="pagebreak"[^>]*></span>)
R: \3\1
Links
add target="_blank" to links: Add target="_blank" attribute to existing external links
F: <a href="http([^"]+)">
R: <a href="http\1" target="_blank" rel="noopener">
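To confirm that only external links are changed, here is a minimal sketch assuming Python's re module; the function name open_external_links_in_new_tab and the sample URLs are hypothetical.

```python
import re

# Pattern and replacement copied from the 'add target="_blank" to links' entry.
EXTERNAL_LINK = re.compile(r'<a href="http([^"]+)">')

def open_external_links_in_new_tab(html: str) -> str:
    """Add target and rel attributes to links whose href starts with http."""
    return EXTERNAL_LINK.sub(
        r'<a href="http\1" target="_blank" rel="noopener">', html
    )

external = '<a href="https://example.com">site</a>'
internal = '<a href="chapter1.xhtml">Chapter 1</a>'
print(open_external_links_in_new_tab(external))
print(open_external_links_in_new_tab(internal))
```

Links that already carry other attributes after href do not match, so running the replace twice does not duplicate the attributes.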
URLs: Add links to URLs (Does not capture every instance)
F: (\s)http(.+?)([;|.|,|)][\s|<])
R: \1<a href="http\2" target="_blank" rel="noopener">http\2</a>\3
tag hyperlinks: find and replace to tag hyperlinks
F: <a (?:class="[^"]*"\s*)*href="((?:mail[^"]*)|(?:http[^"]*))">([^<]*)</a>
R: <a href="\1" target="_blank" rel="noopener">\2</a>
link chapters: Find potential instances where chapters can be linked. Adjust the word first to second and the number 1 to 2 etc., to find all chapters
F: (first chap(\.|ters?)|chap(s?\.|ters?) 1)(?!\d)
link parts: Find potential instances where parts can be linked. Adjust the word first to second and the number 1 to 2 etc., to find all parts
F: (first part|parts? 1)(?!\d)
Percival
percival parsing: add parsing tags before headings containing scripture. Replace Gen with Bible book needed
F: ^(\s+)<(h\d)>(.*?)(\d+):(.*?)</\2>
R: <span data-parsing="Gen.\4"></span>\n\1<\2>\3\4:\5</\2>
format existing scripture tags: capture existing verse tags and convert them to data-ref format
F: <data(?: tag="auto-generated")* ref="Bible[^:]*:(\d*\s*[a-z]+)\.* (\d+):(\d+)[a-z]*">
R: <a data-ref="\1.\2.\3">
Commentary Markup
- headings data-context: add data-context tags before headings. Adjust h3 to capture desired heading
F: ^(\s+)<(h3)>(.*?<a data-ref="(.*?)">.*?</a>.*?)</\2>
R: \1<hr data-context="\4" />\n\1<\2>\3</\2>
Review
remove pagebreaks from headings: find and replace to move pagebreaks out of headings
F: (<h\d>.*?)(<span epub:type="pagebreak[^>]*></span>)
R: \2\1
Example find:
<h1><span epub:type="pagebreak" id="page1" title="1"></span>Chapter 1</h1>
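The effect on the example find can be previewed with a minimal sketch, assuming Python's re module; the function name move_pagebreak_out_of_heading is hypothetical.

```python
import re

# Pattern and replacement copied from "remove pagebreaks from headings".
PAGEBREAK_IN_HEADING = re.compile(
    r'(<h\d>.*?)(<span epub:type="pagebreak[^>]*></span>)'
)

def move_pagebreak_out_of_heading(html: str) -> str:
    """Move a pagebreak span in front of the heading that contains it."""
    return PAGEBREAK_IN_HEADING.sub(r"\2\1", html)

sample = ('<h1><span epub:type="pagebreak" id="page1" title="1"></span>'
          'Chapter 1</h1>')
moved = move_pagebreak_out_of_heading(sample)
print(moved)
```

As written, the find matches only headings without attributes, so headings with a class or id need the pattern adjusted.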
remove space before footnote: find and replace extra space before a footnote indicator
F: \s<sup class="fn"
R: <sup class="fn"
special chars spacing: find special characters with extra spacing on either side of it
F: \s+(\{|\$|&|,|:|;|\?|@|#|\||'|<|>|-|\^|\*|\(|\)|%|!|\]|"|\”|\“)\s+
R: \2 \1
Example finds:
(
:
$
special chars spans: review special characters in spans and replace the character without the span
F: <span[^>]*>(\{|\$|&|,|:|;|\?|@|#|\||'|\.|-|\^|\(|\)|%|!|\]|"|\”|\“|\—)+</span>
R: \1
Example finds:
<span class="i">)</span>
<span class="b">.</span>
non-English chars spans: review non-English characters in spans that could be tagged as lang
F: <span class="i(?:talic)?">([^a-zA-Z0-9\s]+)</span>
missed verses: Find digits with a colon in between and no tag that could potentially be missed scripture verses
F: (?<!</abbr>|</span>)(?<!'>|[a-z]|\d|\.)(?:\(| )\d+:\d{1,2}(?!</a>)
Example finds:
106:9
10:10