# Concepts for `markdown`
## Example implementation
A less-than-ideal approach from the current [example.py](https://github.com/exercism/python/blob/master/exercises/markdown/example.py):
```python
import re


def parse(markdown):
    lines = markdown.split('\n')
    html = ''
    in_list = False
    in_list_append = False
    for line in lines:
        result = parse_line(line, in_list, in_list_append)
        html += result['line']
        in_list = result['in_list']
        in_list_append = result['in_list_append']
    if in_list:
        html += '</ul>'
    return html


def wrap(line, tag):
    return '<{tag}>{line}</{tag}>'.format(line=line, tag=tag)


def check_headers(line):
    pattern = '# (.*)'
    for index in range(6):
        if re.match(pattern, line):
            return wrap(line[(index + 2):], 'h' + str(index + 1))
        pattern = '#' + pattern
    return line


def check_bold(line):
    bold_pattern = '(.*)__(.*)__(.*)'
    bold_match = re.match(bold_pattern, line)
    if bold_match:
        return bold_match.group(1) + wrap(bold_match.group(2), 'strong')\
            + bold_match.group(3)
    else:
        return None


def check_italic(line):
    italic_pattern = '(.*)_(.*)_(.*)'
    italic_match = re.match(italic_pattern, line)
    if italic_match:
        return italic_match.group(1) + wrap(italic_match.group(2), 'em')\
            + italic_match.group(3)
    else:
        return None


def parse_line(line, in_list, in_list_append):
    result = check_headers(line)
    list_match = re.match(r'\* (.*)', result)
    if (list_match):
        if not in_list:
            result = '<ul>' + wrap(list_match.group(1), 'li')
            in_list = True
        else:
            result = wrap(list_match.group(1), 'li')
    else:
        if in_list:
            in_list_append = True
            in_list = False
    if not re.match('<h|<ul|<li', result):
        result = wrap(result, 'p')
    if list_match is None:
        result = re.sub('(.*)(<li>)(.*)(</li>)(.*)',
                        r'\1\2<p>\3</p>\4\5', result)
    while check_bold(result):
        result = check_bold(result)
    while check_italic(result):
        result = check_italic(result)
    if in_list_append:
        result = '</ul>' + result
        in_list_append = False
    return {
        'line': result,
        'in_list': in_list,
        'in_list_append': in_list_append
    }
```
An alternate example using [regular expressions](https://exercism.io/tracks/python/exercises/markdown/solutions/daf30e5227414a61a00bac391ee2bd79):
```python
import re


def parse(markdown):
    s = markdown
    s = re.sub(r'__([^\n]+?)__', r'<strong>\1</strong>', s)
    s = re.sub(r'_([^\n]+?)_', r'<em>\1</em>', s)
    s = re.sub(r'^\* (.*?$)', r'<li>\1</li>', s, flags=re.M)
    s = re.sub(r'(<li>.*</li>)', r'<ul>\1</ul>', s, flags=re.S)
    for i in range(6, 0, -1):
        s = re.sub(r'^{} (.*?$)'.format('#' * i), r'<h{0}>\1</h{0}>'.format(i), s, flags=re.M)
    s = re.sub(r'^(?!<[hlu])(.*?$)', r'<p>\1</p>', s, flags=re.M)
    s = re.sub(r'\n', '', s)
    return s
```
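
One consequence of the `re.S` substitution in the example above may be worth sketching: because `.*` is greedy and `re.S` lets `.` match newlines, everything from the first `<li>` to the last `</li>` ends up inside a single `<ul>`. The helper name `wrap_list_items` and the sample input below are illustrative assumptions, not part of the linked solution.

```python
import re


def wrap_list_items(text):
    # Same substitution as in the example above, isolated in a
    # hypothetical helper so its behaviour can be observed on its own.
    return re.sub(r'(<li>.*</li>)', r'<ul>\1</ul>', text, flags=re.S)


# Two list items separated by a line of ordinary text.
sample = "<li>a</li>\ntext\n<li>b</li>"
wrapped = wrap_list_items(sample)

# The greedy match spans both items, so the text in between is
# wrapped inside the same <ul> as the list items.
print(wrapped)
```

If the input can contain two separate lists, a line-by-line approach that tracks list state is one way to avoid merging them.
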
Another alternate example using [Python with Regex](https://exercism.io/tracks/python/exercises/markdown/solutions/a1f1d7b60bfc42818b2c2225fe0f8d7a):
```python
import re

BOLD_RE = re.compile(r"__(.*?)__")
ITALICS_RE = re.compile(r"_(.*?)_")
HEADER_RE = re.compile(r"(#+) (.*)")
LIST_RE = re.compile(r"\* (.*)")


def parse(markdown: str) -> str:
    """
    Parse a simple markdown-formatted string to HTML.
    """
    result = []
    for line in markdown.splitlines():
        # expand inline bold tags
        line = BOLD_RE.sub(r"<strong>\1</strong>", line)
        # expand inline italics tags
        line = ITALICS_RE.sub(r"<em>\1</em>", line)
        # line may be a header item or a list item
        is_header = HEADER_RE.match(line)
        is_list = LIST_RE.match(line)
        # a header is not itself a paragraph
        if is_header:
            result.append("<h{0}>{1}</h{0}>".format(len(is_header.group(1)),
                                                    is_header.group(2)))
        # neither is any part of a list
        elif is_list:
            # we may be appending to an existing list
            if result and result[-1] == "</ul>":
                result.pop()
            # or starting a new one
            else:
                result.append("<ul>")
            result.extend(["<li>" + is_list.group(1) + "</li>", "</ul>"])
        # everything else is a paragraph
        else:
            result.append("<p>" + line + "</p>")
    return "".join(result)
```
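
Both regex-based alternates above substitute `__bold__` before `_italic_`. The following is a minimal sketch of why that order matters, using patterns copied from the second alternate; the function names `bold_then_italic` and `italic_then_bold` are hypothetical and exist only for this comparison.

```python
import re

BOLD_RE = re.compile(r"__(.*?)__")
ITALICS_RE = re.compile(r"_(.*?)_")


def bold_then_italic(text):
    # Order used by the examples: consume the double underscores first.
    text = BOLD_RE.sub(r"<strong>\1</strong>", text)
    return ITALICS_RE.sub(r"<em>\1</em>", text)


def italic_then_bold(text):
    # Reversed order: the italic pattern matches each "__" pair as an
    # empty italic span, so no double underscores are left for bold.
    text = ITALICS_RE.sub(r"<em>\1</em>", text)
    return BOLD_RE.sub(r"<strong>\1</strong>", text)


sample = "__bold__ and _italic_"
print(bold_then_italic(sample))
print(italic_then_bold(sample))
```

With the reversed order, the output contains no `<strong>` tag at all, which is why a refactored solution would likely keep the bold substitution first.
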
## Concepts
- [Refactor][refactor]: Reviewing and rewriting (or re-organizing) code for clarity and efficiency. This exercise requires a re-write of pre-existing code that uses functions to parse passed-in text in markdown.
- [Functions][functions]: Tests for this exercise expect a function named `parse` that can be called to transform the _markdown_ formatted text and return HTML formatted text.
- [Function arguments][function-arguments]: The example solutions use functions that take arguments in order to operate on passed-in markdown strings.
- [Regular Expressions][regular-expressions]: Both the original code to be refactored for this exercise and the example solution import and use the `re` module for Regular Expressions in Python.
- [Importing][importing]: Both the original code to be refactored for the exercise and the example solution use the `import` keyword to import the `re` module in support of Regular Expressions in Python.
- [String Splitting][string-splitting]: The example solution uses `str.split()` to break the passed-in markdown string into a list of lines at each `\n` character. The alternate Python example solution uses `str.splitlines()`, which does the same across all line-ending characters.
- [Regular Expressions][regular-expressions]: The `re.match()` function from the `re` module returns a `re.Match` object holding any matched values when a specified Regular Expression (or pre-compiled Regular Expression) matches at the beginning of the string, and `None` otherwise. The example uses `re.match()` in multiple places to test for text patterns that need re-formatting or substituting.
- [Regular expressions][regular-expressions]: A Domain Specific Language (DSL) for text processing. Like many other programming languages in use, Python supports a quasi-dialect of PCRE (_Perl compatible regular expressions_). `Regular expressions` can be used via the core Python `re` module, or the third-party `regex` module. Both the original code to be refactored for this exercise and the example solutions use the core `re` module to access `regular expressions` functionality.
- [Return value][return-value]: Most of the functions in the example solution specify a _return_ value using the `return` keyword.
- [None][none]: Python's null type, used when a null or "placeholder" value is needed. `None` is a singleton in any given Python program. In the example, `check_bold()` and `check_italic()` return `None` when a line contains no match.
- [Booleans][booleans]: `True` and `False`, of type `bool`. The example solution uses `True` and `False` as flags (`in_list` and `in_list_append`) that track whether the parser is currently inside a list.
- [Assignment][assignment]: The example solution uses assignment for variables and other values.
- [Regular Expressions][regular-expressions]: The `re.sub()` function of the `re` module replaces each `regular expression` match with a new value. The example solutions use this function in various places to substitute _HTML_ syntax for _markdown_ syntax in the passed-in markdown text.
- [Dictionaries][dictionaries]: Mapping type. The example solution employs a dictionary to return values from the `parse_line()` function.
- [For loops][for-loops]: The example solution uses `for` loops to iterate over various function inputs.
- [Iteration][iterable]: The example solution uses the `for _ in _` syntax to iterate over a list of lines. This is possible because a list is an `iterable`.
- [Conditionals][conditionals]: The example solution uses `if` to check for pattern matching and membership conditions in different functions for processing different markdown patterns.
- [Regular Expressions][regular-expressions]: Various functions in the re module return a `re.Match` _instance_ which in turn has a `Match.group` method. `Match.group` exists even if there are no groups specified in the pattern. See the [Match.group docs](https://docs.python.org/3/library/re.html#re.Match.group) for more detail.
- [Lists][lists]: The example uses lists in several places to hold text to be processed or searched - or for tracking the state of pieces of the passed-in text.
- [Range][range]: The `range()` built-in represents an immutable sequence of numbers. Its arguments must be integers (or any object that implements the `__index__` magic method). Used in the example to control the number of loop iterations, for instance when checking each of the six header levels.
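
Several of the behaviours described in the list above can be checked directly. The snippet below is a small sketch rather than part of any example solution; the sample strings and the `HeaderLevels` class are assumptions made up for illustration.

```python
import re

# str.split('\n') splits only on '\n' and keeps a trailing empty string;
# str.splitlines() handles '\r\n' as well and drops the trailing empty entry.
text = "# Title\r\n* item\n"
split_lines = text.split("\n")
all_lines = text.splitlines()

# re.match() returns a Match object on success and None otherwise.
header = re.match(r"(#+) (.*)", "## Heading")
no_header = re.match(r"(#+) (.*)", "plain text")
level = len(header.group(1))  # number of leading '#' characters

# Match.group(0) is the whole match, even when the pattern has no groups.
whole = re.match(r"\* ", "* item").group(0)

# re.sub() can refer to captured groups with backreferences such as \1.
bold = re.sub(r"__(.*?)__", r"<strong>\1</strong>", "some __bold__ text")


# range() accepts any object implementing __index__ (hypothetical class).
class HeaderLevels:
    def __index__(self):
        return 6


levels = list(range(HeaderLevels()))
```

Because `re.match()` can return `None`, code that calls `.group()` on the result generally checks the match first, as the example solutions do with `if bold_match:` and `if is_header:`.
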