RegEx Library

General

  • extract text: in the Find window, choose ‘Extract’ to pull contents from a file or project
    F: <body(?msi)(.*?)</body>

  • extract classes: choose ‘Extract’ to pull classes from a file or project
    F: \sclass="[^"]+"

  • remove divs: Find divs and replace with only the div content
    F: <div(?: class="[^"]+")?>((?:.|\s)*?)</div>
    R: \1
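
A rough way to try the remove divs pattern outside BBEdit is Python's re module. This is only a sketch: the function name unwrap_divs and the sample markup are invented for the example, and Python's regex engine is not identical to BBEdit's.

```python
import re

# The "remove divs" find pattern, copied from the entry above.
DIV_PATTERN = re.compile(r'<div(?: class="[^"]+")?>((?:.|\s)*?)</div>')

def unwrap_divs(html: str) -> str:
    # Keep only capture group 1 (the div content), as the R: field does.
    return DIV_PATTERN.sub(r'\1', html)

sample = '<div class="box"><p>One</p>\n<p>Two</p></div>'
print(unwrap_divs(sample))
```

Because the match is lazy, it stops at the first closing div tag, so nested divs are not unwrapped cleanly and should be reviewed by hand.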

Clean and Code

Languages, Apparatus and Symbols

  • lang-hbo: Find instances of Hebrew
    F: ([ְֱֲֳִֵֶַָֹֺֻּֽ֑֖֛֢֣֤֥֦֧֪֚֭֮֒֓֔֕֗֘֙֜֝֞֟֠֡֨֩֫֬֯־ֿ׀ׁׂ׃ׅׄ׆ׇאבגדהוזחטיךכלםמןנסעףפץצקרשתװױײ׳״]+-? ?)+)

  • lang-grc: Find instances of Greek
    F: ((?:[\x{0300}-\x{036F}\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}\x{20D0}-\x{20FF}\x{FE20}-\x{FE2F}]+[,. ]*)+)

  • lang-grc (2): Find instances of Greek
    F: ([\p{Greek}][\p{Greek} ́¨ˆ̂˘̆̑̃ˋ̔̓ ͂.,’“;]+\b)

  • apparatus symbols: Find apparatus symbols.
    F: ([ℵ]|&#x(?:2135;|E(?:00[021];|5(?:0[45E6FA];|1[034679];))))

  • check lang: Find special lang characters
    F: <span class="([^"]+)">([^A-Z][^<]*[āåâêëėèēîīôöòōûüū][^<]*)</span>

  • extract lang: Choose ‘Extract’ to create a list of italicized words. Use this list to look for untagged lang or translit
    F: <span class="(italic|i)">([^<]*)</span>
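
The list that ‘Extract’ produces for extract lang can be approximated in Python by collecting every match. A minimal sketch, assuming only the span text is wanted; extract_italics and the sample sentence are hypothetical names and data.

```python
import re

# The "extract lang" find pattern, copied from the entry above.
ITALIC_SPAN = re.compile(r'<span class="(italic|i)">([^<]*)</span>')

def extract_italics(html: str) -> list[str]:
    # Group 1 is the class name; group 2 is the italicized text.
    return [match.group(2) for match in ITALIC_SPAN.finditer(html)]

sample = ('<p>The word <span class="i">logos</span> and '
          '<span class="italic">shalom</span>.</p>')
print(extract_italics(sample))  # ['logos', 'shalom']
```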

  • ampersands: replace ampersands
    F: ([a-z]+\s*)&#38;(\s*[a-z]+)
    R: \1&\2

  • unsafe chars: find characters that are unsafe to use within HTML attribute values
    F: [a-z-]+="[^"]*?[\x{0000}-\x{0009}\x{000b}\x{000c}\x{000e}-\x{001f}\x{007f}-\x{009f}\x{00ad}\x{0600}-\x{0604}\x{070f}\x{17b4}\x{17b5}\x{200c}-\x{200f}\x{2028}-\x{202f}\x{2060}-\x{206f}\x{feff}\x{fff0}-\x{ffff}]+?[^"]*"

Page Breaks and Paragraphs

  • pagebreak breaking words: Find pagebreaks that are in between words.
    F: ([a-z]+)-\s*(<span epub:type="pagebreak" id="[^"]*" title="[^"]*"></span>)
    R: \2 \1

    Example find:
    left-<span epub:type="pagebreak" id="page1" title="1"></span>hand

  • pagebreak with no space: Find page breaks that have no space on either side.
    F: (\w+<span epub:type="pagebreak" id="[^"]*" title="[^"]*"></span>)(\w+)
    R: \1 \2

    Example find:
    I<span epub:type="pagebreak" id="page1" title="1"></span>have

  • pagebreak begin line space: Find a pagebreak at the beginning of a line that is followed by a space
    F: (<[^>]*><span epub:type="pagebreak"[^>]*></span>)\s
    R: \1

    Example find:
    <p><span epub:type="pagebreak" id="page1" title="1"></span> All

  • find broken paragraphs (1): Find paragraphs that may have been split at a pagebreak (the first paragraph does not end in closing punctuation)
    F: ([^.!”?">):])</p>\s*<p[^>]*>\s*(<span epub:type="pagebreak" id="page.+?" title="[^>]*></span>)
    R: \1 \2

  • find broken paragraphs (2): Find potential broken paragraphs. Case sensitive
    F: <p([^>]*)>\s*(<span epub:type="pagebreak" id="page.+?" title="[^>]*></span>)([a-z]+)
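
The first two pagebreak entries in this section can be checked against their own example finds with a short Python sketch. The function names and the PAGEBREAK constant are invented here; the patterns and replacements are copied from the entries.

```python
import re

# Pagebreak span as written in both find patterns.
PAGEBREAK = r'<span epub:type="pagebreak" id="[^"]*" title="[^"]*"></span>'

# "pagebreak breaking words" (replace with group 2, a space, group 1)
BREAKING_WORD = re.compile(r'([a-z]+)-\s*(' + PAGEBREAK + r')')
# "pagebreak with no space" (replace with group 1, a space, group 2)
NO_SPACE = re.compile(r'(\w+' + PAGEBREAK + r')(\w+)')

def fix_breaking_word(html: str) -> str:
    return BREAKING_WORD.sub(r'\2 \1', html)

def fix_no_space(html: str) -> str:
    return NO_SPACE.sub(r'\1 \2', html)

span = '<span epub:type="pagebreak" id="page1" title="1"></span>'
print(fix_breaking_word('left-' + span + 'hand'))
print(fix_no_space('I' + span + 'have'))
```

The first replacement drops the hyphen and rejoins the word after the pagebreak, so genuinely hyphenated compounds such as the left-hand example need a manual check before replacing.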

Scriptext

  • scriptext finder (1): Find blockquotes that have data-ref tags in them. (Use after running Percival)
    F: <blockquote>(\s*(<p[^>]*>.*?</p>\s*)*<p[^>]*>.*?(<a data-ref="[^"]*">[^<]*</a>.*?</p>\s*</blockquote>))
    R: <blockquote class="scriptext">\1

  • scriptext finder (2): Find blockquotes that are preceded by a data-ref. (Use after running Percival)
    F: (<a data-ref="[^"]*">([^<]*)</a>(:|\.)</p>\s*)<blockquote>
    R: \1<blockquote class="scriptext">

Spacing

  • no space between words: Find and replace words with no space in between
    F: (<span class="(?!label)[^"]*">[^<]*</span>)(\w)
    R: \1 \2

    Example find:
    A <span class="i">100 foot</span>drop

  • no space between spans: Find and replace span tags with no space in between (check before using span combine)
    F: (<span class="(?!label)[^"]*">[^<]*</span>)(<span class="(?!label)[^"]*">\w+[^<]*</span>)
    R: \1 \2

    Example find:
    A <span class="i">100 foot</span><span class="i">drop</span>

  • no space open parens: Find and replace an opening parenthesis with no space before
    F: (\w</span>)(\()
    R: \1 \2

    Example find:
    <span class="i">100 foot drop</span>(30 meters).

  • begin span spacing: Find spans lacking a space before
    F: ([a-z]+)(<span)
    R: \1 \2

    Example find:
    A<span class="i">100 foot drop</span>

  • space after first tag: Find and replace opening tags with a space after
    F: <([^/>][^>]*)> (.*?)
    R: <\1>\2

    Example find:
    <p> A <span class="i">100 foot drop</span>

  • space before last tag: Find and replace closing tags with a space before
    F: \s</(p|td|h1|h2|h3)>
    R: </\1>

    Example find:
    drop. </p>

  • dash spacing: Find dashes with potential spacing issues
    F: (\s[^>/= ]*\s[-–][^/= ]*[-–]\s[^

  • space after comma: Find a comma with no space after
    F: ,([^"’”'<0-9 —)]+)
    R: , \1
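
Two of the spacing entries can be exercised in Python as a quick sanity check. This is a sketch under the assumption that Python's re behaves like BBEdit for these patterns; the function names and sample strings are made up.

```python
import re

# "space after comma" (replace with a comma, a space, then group 1)
COMMA = re.compile(r''',([^"’”'<0-9 —)]+)''')
# "no space between words" (replace with group 1, a space, group 2)
SPAN_WORD = re.compile(r'(<span class="(?!label)[^"]*">[^<]*</span>)(\w)')

def add_comma_space(text: str) -> str:
    return COMMA.sub(r', \1', text)

def space_after_span(html: str) -> str:
    return SPAN_WORD.sub(r'\1 \2', html)

print(add_comma_space('Red,green, and 1,000 more'))
print(space_after_span('A <span class="i">100 foot</span>drop'))
```

The comma pattern leaves numbers such as 1,000 alone because digits are excluded. Its character class also consumes later commas, so a run like a,b,c is only fixed at the first comma on one pass and may need the replace run again.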

Spans

Span Combine (1)

Before running span combine, check Clean and Code > Spacing > no space between spans in this Regex Library.
Find and replace to combine the content of spans with the same class.

Find:
<span class="([^"]*)">([^<]*)</span>(\s*)<span class="\1">([^<]*)</span>

Replace:
<span class="\1">\2\3\4</span>
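
The backreference in Span Combine (1) is what restricts the merge to spans of the same class. A small Python sketch of the same find and replace follows; combine_spans is a hypothetical helper, and the loop is an addition for the sketch that stands in for running Replace All again in BBEdit.

```python
import re

# Find pattern from Span Combine (1); \1 in the pattern requires the second
# span to carry the same class as the first.
SAME_CLASS = re.compile(
    r'<span class="([^"]*)">([^<]*)</span>(\s*)<span class="\1">([^<]*)</span>'
)

def combine_spans(html: str) -> str:
    previous = None
    # Repeat until nothing changes: one pass merges spans pairwise only.
    while previous != html:
        previous = html
        html = SAME_CLASS.sub(r'<span class="\1">\2\3\4</span>', html)
    return html

print(combine_spans('<span class="i">100 foot</span> <span class="i">drop</span>'))
```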

Span Combine (2)

Find and replace spans that can be combined into a single class.

Find:
<span class="([^"]*)"><span class="([^"]*)">([^<]*)</span></span>

Replace:
<span class="\1 \2">\3</span>

  • remove spans from headings: Find spans in headings that are potentially not needed
    F: (<h\d[^>]*>.*?)<span(\s*class="(?!label)[^"]*")*>([^<]*)</span>(.*?</h\d>)
    R: \1\3\4

    Example find:
    <h1><span class="i">Foreword</span></h1>
    <h2>The <span class="i">Rock-Star</span> Complex</h2>

  • remove space within spans: Find spans with a leading or trailing space inside and move the space outside the span
    F: <span class="([^"]+)"> ([^<]+)</span>
    R: <span class="\1">\2</span> (type a space before the span in the Replace field)

    F: <span class="([^"]+)">([^<]+) </span>
    R: <span class="\1">\2</span> (type a space after the span in the Replace field)

  • move non-english chars in span: Find and replace the class of a span containing non-english characters
    F: <span class="(italic|i)">([^a-zA-Z0-9\s]+)</span>
    R: <span class="\1">\2</span>

  • remove unnecessary span: Find spans around punctuation and replace without the span
    F: <span class="[^"]*">((?:‘|“|’|”|\.|\)|\(|\?|!|,)+)</span>
    R: \1

    Example find:
    <span class="i">(</span>
    <span class="b">.</span>

  • repeating spans: Find and replace adjacent spans that repeat
    F: <span class="([^\n<>]+)">([^\n<>]+)</span><span class="\1">
    R: <span class="\1">\2

Remove Empty Spans

Find and remove all spans that are empty or contain only a space.

Find:
<span class="[A-Za-z0-9_-]*">( |)</span>

Replace:
\1

Enhance

Abbreviations

  • tables to ABBR 1: convert tables to abbreviation lists
    F: <tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>
    R: <dt epub:type="glossterm"><dfn>\1</dfn></dt><dd epub:type="glossdef">\2</dd>

  • tables to ABBR 2: after running tables to ABBR 1, use this regex to break each list entry onto new lines
    F: <dfn>(.*?)</dfn></dt><dd epub:type="glossdef">(.*?)</dd>
    R: \n <dfn>\1</dfn>\n </dt>\n <dd epub:type="glossdef">\2</dd>
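
The two ABBR steps are meant to run in sequence. The sketch below chains them in Python on a made-up table row; the function name and the sample abbreviation are assumptions, while both patterns and replacements are copied from the entries above.

```python
import re

# Step 1: "tables to ABBR 1"
ROW = re.compile(r'<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>')
ROW_REPLACEMENT = (r'<dt epub:type="glossterm"><dfn>\1</dfn></dt>'
                   r'<dd epub:type="glossdef">\2</dd>')

# Step 2: "tables to ABBR 2" (the \n in the replacement becomes a newline)
ENTRY = re.compile(r'<dfn>(.*?)</dfn></dt><dd epub:type="glossdef">(.*?)</dd>')
ENTRY_REPLACEMENT = r'\n <dfn>\1</dfn>\n </dt>\n <dd epub:type="glossdef">\2</dd>'

def table_row_to_abbr(html: str) -> str:
    step_one = ROW.sub(ROW_REPLACEMENT, html)
    return ENTRY.sub(ENTRY_REPLACEMENT, step_one)

row = '<tr>\n  <td>OT</td>\n  <td>Old Testament</td>\n</tr>'
print(table_row_to_abbr(row))
```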

Footnotes

  • footnote references: for footnotes not in backmatter use this find and replace to format footnote refs in each file. Adjust the find to match source file markup, if necessary, and edit the replace to ensure unique IDs. After replacing in BBEdit use Markup > Update > Document to change #FILENAME# to document filename
    F: <p>(\d+)\. (.*?)</p>
    R: <div epub:type="footnote" id="\1">\n <p><sup><a href="#FILENAME##backlink-\1">\1</a></sup>&#160;<span class="note">\2</span></p>\n </div>

  • footnote indicators: for footnotes not in backmatter use this find and replace to format footnote indicators in each file. Adjust the find to match source file markup, if necessary, and edit the replace to ensure unique IDs. After replacing in BBEdit use Markup > Update > Document to change #FILENAME# to document filename
    F: <sup>(\d+)</sup>
    R: <sup class="fn" id="backlink-intro-\1"><a epub:type="noteref" href="#FILENAME##intro-\1">[\1]</a></sup>
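
To see what the footnote indicators replacement produces, here is a hedged Python sketch. The str.replace call is only a stand-in for BBEdit's Markup > Update > Document step, and the function name, the filename intro.xhtml, and the sample text are all hypothetical.

```python
import re

# Find and replace copied from the "footnote indicators" entry.
INDICATOR = re.compile(r'<sup>(\d+)</sup>')
REPLACEMENT = (r'<sup class="fn" id="backlink-intro-\1">'
               r'<a epub:type="noteref" href="#FILENAME##intro-\1">[\1]</a></sup>')

def tag_indicators(html: str, filename: str) -> str:
    tagged = INDICATOR.sub(REPLACEMENT, html)
    # Assumption: swap the #FILENAME# placeholder by hand instead of using
    # BBEdit's document update command.
    return tagged.replace('#FILENAME#', filename)

print(tag_indicators('grace<sup>3</sup>', 'intro.xhtml'))
```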

  • unique footnote reference id: use filename to make footnote reference id unique
    F: <sup class="fn" id="note-backlink-(\d+)"><a epub:type="noteref" href="([^#]+)_([^#]*?)\.xhtml#note-(\d+)">\[(\d+)]</a></sup>
    R: <sup class="fn" id="note-backlink-\3-\1"><a epub:type="noteref" href="\2_\3.xhtml#note-\3-\4">[\5]</a></sup>

  • unique footnote indicator id: use filename to make footnote id unique
    F: <div id="note-(\d+)" epub:type="footnote">\s*<p><sup><a href="([^#]+)_([^#]*?)\.xhtml#note-backlink-(\d+)">
    R: <div id="note-\3-\1" epub:type="footnote"><p><sup><a href="\2_\3.xhtml#note-backlink-\3-\4">

  • remove Ibids: make sure footnotes are formatted correctly according to the style guide and then use to replace Ibids
    F: (<p class="[^"]*"><sup>(\d+)</sup>(.*?<span class="i">.*?</span>).*?</p>\s*<p class="[^"]*"><sup>\d+</sup>)Ibid\.(,.*?)*</p>
    R: \1\3\4</p>

Index

  • move pagebreaks up top: find pagebreaks in a file and move them before the h1. (Run multiple times until there are no new finds)
    F: (<h1[^>]*>.*?</h1>(?msi)(.*?))(<span epub:type="pagebreak"[^>]*></span>)
    R: \3\1

Links

  • add target="_blank" to links: Add target="_blank" attribute to existing external links
    F: <a href="http([^"]+)">
    R: <a href="http\1" target="_blank" rel="noopener">
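
A quick Python check of the target="_blank" entry, using invented sample links. The function name is hypothetical; the pattern and replacement are the ones listed above.

```python
import re

# Matches only anchors whose sole attribute is an href starting with http.
EXTERNAL_LINK = re.compile(r'<a href="http([^"]+)">')

def add_target_blank(html: str) -> str:
    return EXTERNAL_LINK.sub(r'<a href="http\1" target="_blank" rel="noopener">', html)

sample = '<a href="https://example.com">site</a> <a href="ch1.xhtml">chapter</a>'
print(add_target_blank(sample))
```

Since the pattern requires the closing angle bracket directly after the href value, links that already carry target or other attributes are skipped, and running the replace twice does not duplicate the attributes.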

  • URLs: Add links to URLs (Does not capture every instance)
    F: (\s)http(.+?)([;.,)][\s<])
    R: \1<a href="http\2" target="_blank" rel="noopener">http\2</a>\3

  • tag hyperlinks: find and replace to tag hyperlinks
    F: <a (?:class="[^"]*"\s*)*href="((?:mail[^"]*)|(?:http[^"]*))">([^<]*)</a>
    R: <a href="\1" target="_blank" rel="noopener">\2</a>

  • link chapters: Find potential instances where chapters can be linked. Adjust the word first to second and the number 1 to 2 etc., to find all chapters
    F: (first chap(\.|ters?)|chap(s?\.|ters?) 1)(?!\d)

  • link parts: Find potential instances where parts can be linked. Adjust the word first to second and the number 1 to 2 etc., to find all parts
    F: (first part|parts? 1)(?!\d)

Percival

  • percival parsing: add parsing tags before headings containing scripture. Replace Gen with Bible book needed
    F: ^(\s+)<(h\d)>(.*?)(\d+):(.*?)</\2>
    R: \1<span data-parsing="Gen.\4"></span>\n\1<\2>\3\4:\5</\2>

  • format existing scripture tags: capture existing verse tags and convert them to data-ref format
    F: <data(?: tag="auto-generated")* ref="Bible[^:]*:(\d*\s*[a-z]+)\.* (\d+):(\d+)[a-z]*">
    R: <a data-ref="\1.\2.\3">

Commentary Markup

  • headings data-context: add data-context tags before headings. Adjust h3 to capture desired heading
    F: ^(\s+)<(h3)>(.*?<a data-ref="(.*?)">.*?</a>.*?)</\2>
    R: \1<hr data-context="\4" />\n\1<\2>\3</\2>
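
The headings data-context entry reuses the captured indentation for both output lines. A minimal Python sketch, assuming ^ should match at the start of every line (hence re.MULTILINE); the function name and the sample heading with its data-ref value are invented.

```python
import re

# Find and replace copied from the "headings data-context" entry.
HEADING = re.compile(
    r'^(\s+)<(h3)>(.*?<a data-ref="(.*?)">.*?</a>.*?)</\2>',
    re.MULTILINE,
)
REPLACEMENT = r'\1<hr data-context="\4" />\n\1<\2>\3</\2>'

def add_data_context(html: str) -> str:
    return HEADING.sub(REPLACEMENT, html)

heading = '    <h3><a data-ref="Gen.1.1">Genesis 1:1</a></h3>'
print(add_data_context(heading))
```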

Review

  • remove pagebreaks from headings: find and replace to move pagebreaks out of headings
    F: (<h\d>.*?)(<span epub:type="pagebreak[^>]*></span>)
    R: \2\1

    Example find:
    <h1><span epub:type="pagebreak" id="page1" title="1"></span>Chapter 1</h1>

  • remove space before footnote: find and replace extra space before a footnote indicator
    F: \s<sup class="fn"
    R: <sup class="fn"

  • special chars spacing: find special characters with extra spacing on either side of it
    F: \s+(&#123;|&#36;|&#38;|,|:|;|\?|@|&#35;|&#124;|'|&#60;|&#62;|&#45;|\^|&#42;|&#40;|&#41;|%|&#33;|&#93;|&#34;|\”|\“)\s+
    R: \2 \1

    Example finds:
    (
    :
    $

  • special chars spans: review special characters in spans and replace the character without the span
    F: <span[^>]*>((?:&#123;|&#36;|&#38;|,|:|;|\?|@|&#35;|&#124;|'|\.|&#45;|\^|&#40;|&#41;|%|&#33;|&#93;|&#34;|\”|\“|\—)+)</span>
    R: \1

    Example finds:
    <span class="i">)</span>
    <span class="b">.</span>

  • non-english chars spans: review non-english characters in spans that could be tagged as lang
    F: <span class="i(?:talic)?">([^a-zA-Z0-9\s]+)</span>

  • missed verses: Find digits with a colon in between and no tag that could potentially be missed scripture verses
    F: (?<!</abbr>|</span>)(?<!'>|[a-z]|\d|\.)(?:\(| )\d+:\d{1,2}(?!</a>)

    Example finds:
    106:9
    10:10