# Concepts for `markdown`

## Example implementation

A less than ideal approach from the current [example.py](https://github.com/exercism/python/blob/master/exercises/markdown/example.py):

```python
import re

def parse(markdown):
    lines = markdown.split('\n')
    html = ''
    in_list = False
    in_list_append = False
    for line in lines:
        result = parse_line(line, in_list, in_list_append)
        html += result['line']
        in_list = result['in_list']
        in_list_append = result['in_list_append']
    if in_list:
        html += '</ul>'
    return html

def wrap(line, tag):
    return '<{tag}>{line}</{tag}>'.format(line=line, tag=tag)

def check_headers(line):
    pattern = '# (.*)'
    for index in range(6):
        if re.match(pattern, line):
            return wrap(line[(index + 2):], 'h' + str(index + 1))
        pattern = '#' + pattern
    return line

def check_bold(line):
    bold_pattern = '(.*)__(.*)__(.*)'
    bold_match = re.match(bold_pattern, line)
    if bold_match:
        return bold_match.group(1) + wrap(bold_match.group(2), 'strong')\
            + bold_match.group(3)
    else:
        return None

def check_italic(line):
    italic_pattern = '(.*)_(.*)_(.*)'
    italic_match = re.match(italic_pattern, line)
    if italic_match:
        return italic_match.group(1) + wrap(italic_match.group(2), 'em')\
            + italic_match.group(3)
    else:
        return None

def parse_line(line, in_list, in_list_append):
    result = check_headers(line)

    list_match = re.match(r'\* (.*)', result)

    if (list_match):
        if not in_list:
            result = '<ul>' + wrap(list_match.group(1), 'li')
            in_list = True
        else:
            result = wrap(list_match.group(1), 'li')
    else:
        if in_list:
            in_list_append = True
            in_list = False

    if not re.match('<h|<ul|<li', result):
        result = wrap(result, 'p')

    if list_match is None:
        result = re.sub('(.*)(<li>)(.*)(</li>)(.*)',
                        r'\1\2<p>\3</p>\4\5', result)

    while check_bold(result):
        result = check_bold(result)
    while check_italic(result):
        result = check_italic(result)

    if in_list_append:
        result = '</ul>' + result
        in_list_append = False

    return {
        'line': result,
        'in_list': in_list,
        'in_list_append': in_list_append
    }
```
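
In the code above, `check_bold()` and `check_italic()` each run twice per pass through their `while` loops: once in the loop condition and once in the loop body. One possible refactoring is sketched below. It replaces both loops with `re.sub()`, under the assumption that emphasis markers never span more than one line. The `expand_inline` name is hypothetical and not part of the original file, and edge cases such as nested markers may behave differently from the original.

```python
import re


def expand_inline(line):
    """Hypothetical helper: expand __bold__ and _italic_ markers in one line."""
    # Bold must be handled first, otherwise the italic pattern
    # would consume the double underscores one at a time.
    line = re.sub(r'__(.+?)__', r'<strong>\1</strong>', line)
    line = re.sub(r'_(.+?)_', r'<em>\1</em>', line)
    return line


html = expand_inline('This is __bold__ and _italic_')
# html == 'This is <strong>bold</strong> and <em>italic</em>'
```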

An alternate example using [regular expressions](https://exercism.io/tracks/python/exercises/markdown/solutions/daf30e5227414a61a00bac391ee2bd79):

```python
import re


def parse(markdown):
    s = markdown
    s = re.sub(r'__([^\n]+?)__', r'<strong>\1</strong>', s)
    s = re.sub(r'_([^\n]+?)_', r'<em>\1</em>', s)
    s = re.sub(r'^\* (.*?$)', r'<li>\1</li>', s, flags=re.M)
    s = re.sub(r'(<li>.*</li>)', r'<ul>\1</ul>', s, flags=re.S)
    for i in range(6, 0, -1):
        s = re.sub(r'^{} (.*?$)'.format('#' * i), r'<h{0}>\1</h{0}>'.format(i), s, flags=re.M)
    s = re.sub(r'^(?!<[hlu])(.*?$)', r'<p>\1</p>', s, flags=re.M)
    s = re.sub(r'\n', '', s)
    return s
```
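
The `re.M` and `re.S` flags do much of the work in the example above. The following minimal sketch isolates their effect on a made-up two-item input. The variable names are illustrative only and do not appear in the solution.

```python
import re

text = "* one\n* two"

# Without re.M, `^` matches only at the start of the string and `$` only at
# its end (or just before a final newline). `.` cannot cross the newline,
# so the pattern finds nothing to replace.
no_flag = re.sub(r'^\* (.*?$)', r'<li>\1</li>', text)

# With re.M, `^` and `$` also match at the start and end of every line.
items = re.sub(r'^\* (.*?$)', r'<li>\1</li>', text, flags=re.M)

# With re.S, `.` also matches newlines, so a single greedy match
# runs from the first <li> to the last </li>.
wrapped = re.sub(r'(<li>.*</li>)', r'<ul>\1</ul>', items, flags=re.S)

print(no_flag == text)  # True
print(items)
print(wrapped)
```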

Another alternate example using [Python with Regex](https://exercism.io/tracks/python/exercises/markdown/solutions/a1f1d7b60bfc42818b2c2225fe0f8d7a):

```python
import re

BOLD_RE = re.compile(r"__(.*?)__")
ITALICS_RE = re.compile(r"_(.*?)_")
HEADER_RE = re.compile(r"(#+) (.*)")
LIST_RE = re.compile(r"\* (.*)")


def parse(markdown: str) -> str:
    """
    Parse a simple markdown-formatted string to HTML.
    """
    result = []
    for line in markdown.splitlines():
        # expand inline bold tags
        line = BOLD_RE.sub(r"<strong>\1</strong>", line)
        # expand inline italics tags
        line = ITALICS_RE.sub(r"<em>\1</em>", line)

        # line may be a header item or a list item
        is_header = HEADER_RE.match(line)
        is_list = LIST_RE.match(line)

        # a header is not itself a paragraph
        if is_header:
            result.append("<h{0}>{1}</h{0}>".format(len(is_header.group(1)),
                                                    is_header.group(2)))
        # neither is any part of a list
        elif is_list:
            # we may be appending to an existing list
            if result and result[-1] == "</ul>":
                result.pop()
            # or starting a new one
            else:
                result.append("<ul>")
            result.extend(["<li>" + is_list.group(1) + "</li>", "</ul>"])
        # everything else is a paragraph
        else:
            result.append("<p>" + line + "</p>")
    return "".join(result)
```
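
Note that `HEADER_RE` in the example above uses `(#+)`, which accepts any number of `#` characters, so a line starting with seven of them would be rendered as `<h7>`. If such lines should instead fall through to the paragraph branch (an assumption here; check it against the exercise's tests), the pattern could be bounded as in this sketch. `BOUNDED_HEADER_RE` and `header_to_html` are hypothetical names introduced only for illustration.

```python
import re

# Assumption: only one to six '#' characters form a valid header marker.
BOUNDED_HEADER_RE = re.compile(r"(#{1,6}) (.*)")


def header_to_html(line):
    """Return the header HTML for `line`, or None if it is not a header."""
    match = BOUNDED_HEADER_RE.match(line)
    if match is None:
        return None
    # The header level is the number of '#' characters captured.
    level = len(match.group(1))
    return "<h{0}>{1}</h{0}>".format(level, match.group(2))


print(header_to_html("## Subtitle"))      # <h2>Subtitle</h2>
print(header_to_html("####### Too deep")) # None
```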

## Concepts

- [Refactor][refactor]: Reviewing and rewriting (or reorganizing) code for clarity and efficiency. This exercise requires a rewrite of pre-existing code that uses functions to parse passed-in markdown text.
- [Functions][functions]: Tests for this exercise expect a function named `parse` that can be called to transform _markdown_-formatted text and return HTML-formatted text.
- [Function arguments][function-arguments]: The example solutions use functions that take arguments in order to operate on passed-in markdown strings.
- [Regular Expressions][regular-expressions]: Both the original code to be refactored for this exercise and the example solutions import and use the `re` module for regular expressions in Python.
- [Importing][importing]: Both the original code to be refactored for the exercise and the example solutions use the `import` keyword to import the `re` module in support of regular expressions in Python.
- [String Splitting][string-splitting]: The example solution uses `str.split()` to break the passed-in markdown string into a list of lines separated by the `\n` character. The alternate Python example solution uses `str.splitlines()` to the same effect, across all line-ending characters.
- [Regular Expressions][regular-expressions]: The `re.match()` function from the `re` module returns a `re.Match` object holding any matched values from a specified regular expression or pre-compiled regular expression, or `None` if the pattern does not match at the start of the string. The example uses `re.match()` in multiple places to search for text patterns that need re-formatting or substituting.
- [Regular expressions][regular-expressions]: A Domain Specific Language (DSL) for text processing. Like many other programming languages in use, Python supports a quasi-dialect of PCRE (_Perl Compatible Regular Expressions_). Regular expressions can be used via the core Python `re` module, or the third-party `regex` module. Both the original code to be refactored for this exercise and the example solutions use the core `re` module to access regular expression functionality.
- [Return value][return-value]: Most of the functions in the example solution specify a _return_ value using the `return` keyword.
- [None][none]: Python's null object, used when a null or "placeholder" value is needed. `None` is a singleton in any given Python program.
- [Booleans][booleans]: `True` and `False`, of type `bool`. The example solution uses `True` and `False` as flags (`in_list` and `in_list_append`) that track whether the parser is currently inside a list.
- [Assignment][assignment]: The example solution uses assignment for variables and other values.
- [Regular Expressions][regular-expression]: The `re.sub()` function of the `re` module replaces each regular expression match with a new value. The example solutions use this function in various places to replace _markdown_ syntax with _HTML_ syntax in the passed-in markdown text.
- [Dictionaries][dictionaries]: Mapping type. The example solution uses a dictionary to return several values at once from the `parse_line()` function.
- [For loops][for-loops]: The example solution uses `for` loops to iterate over various function inputs.
- [Iteration][iterable]: The example solution uses the `for _ in _` syntax to iterate over a list of lines. This is possible because a list is an `iterable`.
- [Conditionals][conditionals]: The example solution uses `if` to check for pattern matching and membership conditions in different functions for processing different markdown patterns.
- [Regular Expressions][regular-expressions]: Various functions in the `re` module return a `re.Match` _instance_, which in turn has a `Match.group` method. `Match.group` exists even if there are no groups specified in the pattern. See the [Match.group docs](https://docs.python.org/3/library/re.html#re.Match.group) for more detail.
- [Lists][lists]: The example uses lists in several places to hold text to be processed or searched, or to track the state of pieces of the passed-in text.
- [Range][range]: The `range()` built-in represents an immutable sequence of numbers. Its arguments may be integers or any object that implements the `__index__` magic method. The examples use it to loop over the six possible header levels.
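
Several of the concepts above can be seen in isolation in the short sketch below. The input string and variable names are made up for illustration. It contrasts `str.split('\n')` with `str.splitlines()`, and shows `re.match()` returning either a `re.Match` object or `None`.

```python
import re

markdown = "# Title\r\n* item\n"

# str.split('\n') leaves the '\r' in place and yields a trailing empty string.
by_newline = markdown.split('\n')    # ['# Title\r', '* item', '']

# str.splitlines() also recognises '\r\n' and drops the trailing empty entry.
by_lines = markdown.splitlines()     # ['# Title', '* item']

# re.match() only matches at the start of the string; otherwise it returns None.
hit = re.match(r'\* (.*)', '* item')
miss = re.match(r'\* (.*)', 'not a * item')

whole = hit.group(0)   # the entire match: '* item'
inner = hit.group(1)   # the first capture group: 'item'
```

Because `None` is a singleton, `miss is None` is the idiomatic way to test for a failed match.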