Concepts for markdown
Example implementation
A less than ideal approach from the current example.py:

    import re


    def parse(markdown):
        lines = markdown.split('\n')
        html = ''
        in_list = False
        in_list_append = False
        for line in lines:
            result = parse_line(line, in_list, in_list_append)
            html += result['line']
            in_list = result['in_list']
            in_list_append = result['in_list_append']
        if in_list:
            html += '</ul>'
        return html


    def wrap(line, tag):
        return '<{tag}>{line}</{tag}>'.format(line=line, tag=tag)


    def check_headers(line):
        pattern = '# (.*)'
        for index in range(6):
            if re.match(pattern, line):
                return wrap(line[(index + 2):], 'h' + str(index + 1))
            pattern = '#' + pattern
        return line


    def check_bold(line):
        bold_pattern = '(.*)__(.*)__(.*)'
        bold_match = re.match(bold_pattern, line)
        if bold_match:
            return bold_match.group(1) + wrap(bold_match.group(2), 'strong')\
                + bold_match.group(3)
        else:
            return None


    def check_italic(line):
        italic_pattern = '(.*)_(.*)_(.*)'
        italic_match = re.match(italic_pattern, line)
        if italic_match:
            return italic_match.group(1) + wrap(italic_match.group(2), 'em')\
                + italic_match.group(3)
        else:
            return None


    def parse_line(line, in_list, in_list_append):
        result = check_headers(line)

        list_match = re.match(r'\* (.*)', result)

        if (list_match):
            if not in_list:
                result = '<ul>' + wrap(list_match.group(1), 'li')
                in_list = True
            else:
                result = wrap(list_match.group(1), 'li')
        else:
            if in_list:
                in_list_append = True
                in_list = False

        if not re.match('<h|<ul|<li', result):
            result = wrap(result, 'p')

        if list_match is None:
            result = re.sub('(.*)(<li>)(.*)(</li>)(.*)',
                            r'\1\2<p>\3</p>\4\5', result)

        while check_bold(result):
            result = check_bold(result)
        while check_italic(result):
            result = check_italic(result)

        if in_list_append:
            result = '</ul>' + result
            in_list_append = False

        return {
            'line': result,
            'in_list': in_list,
            'in_list_append': in_list_append
        }

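
One reason the code above needs a `while` loop around `check_bold()` is that its pattern starts with a greedy `(.*)`, so each call wraps only the last bold span in the line. The sketch below traces this on a sample string and compares it with a single non-greedy `re.sub()` call, as used in the alternate examples. The helper `wrap_last_bold`, the sample text, and the variable names are illustrative assumptions, not part of the exercise.

```python
import re

# Same greedy pattern as check_bold() in the example above.
bold_pattern = '(.*)__(.*)__(.*)'


def wrap_last_bold(line):
    """Wrap one bold span per call, or return None if there is none."""
    match = re.match(bold_pattern, line)
    if match is None:
        return None
    return (match.group(1) + '<strong>' + match.group(2) + '</strong>'
            + match.group(3))


sample = '__a__ and __b__'

# The leading greedy (.*) consumes as much as it can, so the last span
# in the line is the first one to be wrapped.
first_pass = wrap_last_bold(sample)

# A second call is needed to reach the remaining span.
second_pass = wrap_last_bold(first_pass)

# A non-greedy pattern with re.sub() replaces every span in one call.
one_pass = re.sub(r'__(.*?)__', r'<strong>\1</strong>', sample)

print(first_pass)   # __a__ and <strong>b</strong>
print(second_pass)  # <strong>a</strong> and <strong>b</strong>
print(one_pass)     # <strong>a</strong> and <strong>b</strong>
```

Both routes end with the same HTML here; the `re.sub()` version simply avoids re-scanning the line once per span.
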
An alternate example using regular expressions:

    import re


    def parse(markdown):
        s = markdown
        s = re.sub(r'__([^\n]+?)__', r'<strong>\1</strong>', s)
        s = re.sub(r'_([^\n]+?)_', r'<em>\1</em>', s)
        s = re.sub(r'^\* (.*?$)', r'<li>\1</li>', s, flags=re.M)
        s = re.sub(r'(<li>.*</li>)', r'<ul>\1</ul>', s, flags=re.S)
        for i in range(6, 0, -1):
            s = re.sub(r'^{} (.*?$)'.format('#' * i), r'<h{0}>\1</h{0}>'.format(i), s, flags=re.M)
        s = re.sub(r'^(?!<[hlu])(.*?$)', r'<p>\1</p>', s, flags=re.M)
        s = re.sub(r'\n', '', s)
        return s

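
The regex-only version above depends heavily on the `re.M` and `re.S` flags. The following sketch isolates two of its substitutions to show what each flag changes. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

text = '* one\n* two'

# Without re.M, ^ anchors only at the start of the whole string and $ only
# at its end (or just before a final newline), so nothing matches here.
no_flag = re.sub(r'^\* (.*?$)', r'<li>\1</li>', text)

# With re.M, ^ and $ also match at every line boundary.
per_line = re.sub(r'^\* (.*?$)', r'<li>\1</li>', text, flags=re.M)

# With re.S, '.' also matches newlines, so the greedy .* runs from the
# first <li> to the last </li> and the whole run gets one <ul> wrapper.
wrapped = re.sub(r'(<li>.*</li>)', r'<ul>\1</ul>', per_line, flags=re.S)

# Caveat: the same greediness also pulls in text between two lists.
mixed = re.sub(r'(<li>.*</li>)', r'<ul>\1</ul>',
               '<li>a</li>\ntext\n<li>b</li>', flags=re.S)

print(repr(no_flag))   # '* one\n* two'
print(repr(per_line))  # '<li>one</li>\n<li>two</li>'
print(repr(wrapped))   # '<ul><li>one</li>\n<li>two</li></ul>'
print(repr(mixed))     # '<ul><li>a</li>\ntext\n<li>b</li></ul>'
```

The last case suggests that this approach treats all list items in the input as one list, which may or may not matter for a given set of tests.
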
Another alternate example using Python with precompiled regular expressions:

    import re

    BOLD_RE = re.compile(r"__(.*?)__")
    ITALICS_RE = re.compile(r"_(.*?)_")
    HEADER_RE = re.compile(r"(#+) (.*)")
    LIST_RE = re.compile(r"\* (.*)")


    def parse(markdown: str) -> str:
        """
        Parse a simple markdown-formatted string to HTML.
        """
        result = []
        for line in markdown.splitlines():
            # expand inline bold tags
            line = BOLD_RE.sub(r"<strong>\1</strong>", line)
            # expand inline italics tags
            line = ITALICS_RE.sub(r"<em>\1</em>", line)
            # line may be a header item or a list item
            is_header = HEADER_RE.match(line)
            is_list = LIST_RE.match(line)
            # a header is not itself a paragraph
            if is_header:
                result.append("<h{0}>{1}</h{0}>".format(len(is_header.group(1)),
                                                       is_header.group(2)))
            # neither is any part of a list
            elif is_list:
                # we may be appending to an existing list
                if result and result[-1] == "</ul>":
                    result.pop()
                # or starting a new one
                else:
                    result.append("<ul>")
                result.extend(["<li>" + is_list.group(1) + "</li>", "</ul>"])
            # everything else is a paragraph
            else:
                result.append("<p>" + line + "</p>")
        return "".join(result)

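
In the example above the header level is taken from `len(is_header.group(1))`, so a line that starts with seven or more `#` characters would yield an `<h7>` tag, while HTML defines only `h1` through `h6`. The sketch below shows one possible way to bound the level, under the assumption that deeper lines should fall back to paragraphs. `BOUNDED_HEADER_RE` and `render_line` are hypothetical names and are not part of the example solution.

```python
import re

# Hypothetical variant of HEADER_RE: accept one to six '#' characters only.
BOUNDED_HEADER_RE = re.compile(r"(#{1,6}) (.*)")


def render_line(line: str) -> str:
    """Render one line as a header if it qualifies, else as a paragraph."""
    header = BOUNDED_HEADER_RE.match(line)
    if header:
        level = len(header.group(1))
        return "<h{0}>{1}</h{0}>".format(level, header.group(2))
    return "<p>" + line + "</p>"


h2_line = render_line("## Section")
deep_line = render_line("####### Too deep")

print(h2_line)    # <h2>Section</h2>
print(deep_line)  # <p>####### Too deep</p>
```

Because `match()` anchors at the start of the line and the seventh character is `#` rather than a space, the bounded pattern fails on the second input and the line is rendered as a paragraph.
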
Concepts
- [Refactor][refactor]: Reviewing and rewriting (or re-organizing) code for clarity and efficiency. This exercise requires a rewrite of pre-existing code that uses functions to parse passed-in text in markdown.
- [Functions][functions]: Tests for this exercise expect a function named `parse` that can be called to transform the markdown-formatted text and return HTML-formatted text.
- [Function arguments][function-arguments]: The example solutions use functions that take arguments to operate on passed-in markdown strings.
- [Regular Expressions][regular-expressions]: Both the original code to be refactored for this exercise and the example solutions import and use the `re` module for regular expressions in Python.
- [Importing][importing]: Both the original code to be refactored for the exercise and the example solutions use the `import` keyword to import the `re` module in support of regular expressions in Python.
- [String Splitting][string-splitting]: The example solution uses `str.split()` to break the passed-in markdown string into a list of lines split on the `\n` character. The alternate Python example solution uses `str.splitlines()` to the same end, splitting on all line-ending characters.
- [Regular Expressions][regular-expressions]: The `re.match()` function from the `re` module returns a `match` object with any matched values from a specified regular expression or precompiled regular expression, or `None` when the pattern does not match at the start of the string. The example uses `re.match()` in multiple places to search for text patterns that need re-formatting or substituting.
- [Regular expressions][regular-expressions]: A Domain Specific Language (DSL) for text processing. Like many other programming languages in use, Python supports a quasi-dialect of PCRE (Perl compatible regular expressions). Regular expressions can be used via the core Python `re` module, or the third-party `regex` module. Both the original code to be refactored for this exercise and the example solutions use the core `re` module to access regular expression functionality.
- [Return value][return-value]: Most of the functions in the example solution specify a return value using the `return` keyword.
- [None][none]: Python's null type, referred to when a null or "placeholder" is needed. It is in and of itself a singleton in any given Python program.
- [Booleans][booleans]: `True` and `False` of type `bool`. The example solution uses `True` and `False` as flags (`in_list`, `in_list_append`) that track whether the parser is currently inside a list.
- [Assignment][assignment]: The example solution uses assignment for variables and other values.
- [Regular Expressions][regular-expression]: The `re.sub()` function of the `re` module replaces a regular expression match with a new value. The example solutions use this function in various places to replace markdown syntax with HTML syntax in the passed-in markdown text.
- [Dictionaries][dictionaries]: Mapping type. The example solution employs a dictionary to return values from the `parse_line()` function.
- [For loops][for-loops]: The example solution uses `for` loops to iterate over various function inputs.
- [Iteration][iterable]: The example solution uses the `for _ in _` syntax to iterate over a list of lines. This is possible because a list is an `iterable`.
- [Conditionals][conditionals]: The example solution uses `if` to check for pattern matching and membership conditions in different functions for processing different markdown patterns.
- [Regular Expressions][regular-expressions]: Various functions in the `re` module return a `re.Match` instance, which in turn has a `Match.group` method. `Match.group` exists even if there are no groups specified in the pattern. See the `Match.group` docs for more detail.
- [Lists][lists]: The example uses lists in several places to hold text to be processed or searched, or for tracking the state of pieces of the passed-in text.
- [Range][range]: The `range()` built-in represents an immutable sequence of numbers (or any object that implements the `__index__` magic method). The examples use it to loop over the six possible header levels.
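
Several of the concepts above can be tried out in isolation. The snippet below is a small sketch of `str.split()` versus `str.splitlines()`, `re.match()` with `Match.group()`, and `range()`. The sample strings and variable names are assumptions for illustration and do not come from the exercise.

```python
import re

text = "# Title\r\n* item\n"

# str.split('\n') splits on '\n' only: the '\r' stays attached and the
# trailing newline produces a final empty string.
split_lines = text.split('\n')

# str.splitlines() recognises '\n', '\r\n' and other line boundaries and
# does not produce a trailing empty string.
clean_lines = text.splitlines()

# re.match() only matches at the start of the string, else returns None.
list_match = re.match(r"\* (.*)", "* item")
whole = list_match.group(0)   # the entire match
inner = list_match.group(1)   # the first parenthesised group
no_match = re.match(r"\* (.*)", "text * item")

# Match.group() works even when the pattern defines no groups.
hashes = re.match(r"#+", "### Title").group()

# range(6) yields 0..5, one value per header level h1..h6.
tags = ['h' + str(index + 1) for index in range(6)]

print(split_lines)  # ['# Title\r', '* item', '']
print(clean_lines)  # ['# Title', '* item']
print(whole, inner, no_match, hashes)
print(tags)         # ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']
```

The first two results show that the two splitting methods differ on input with `\r\n` line endings or a trailing newline.
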