[Matching Brackets] draft approaches (#3670)

* [Matching Brackets] draft approaches

* [Matching Brackets] Approaches Review & Edits

* Additional grammar and spelling edits

* Final Edits

Hopefully, the final edits.  😄

* Un crossed left vs right

---------

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>
colinleach
2024-07-31 18:37:03 -07:00
committed by GitHub
parent e06436a8bd
commit bcb5300353
11 changed files with 480 additions and 0 deletions

View File

@@ -0,0 +1,30 @@
{
"introduction": {
"authors": [
"colinleach",
"BethanyG"
]
},
"approaches": [
{
"uuid": "449c828e-ce19-4930-83ab-071eb2821388",
"slug": "stack-match",
"title": "Stack Match",
"blurb": "Maintain context during stream processing by use of a stack.",
"authors": [
"colinleach",
"BethanyG"
]
},
{
"uuid": "b4c42162-751b-42c8-9368-eed9c3f4e4c8",
"slug": "repeated-substitution",
"title": "Repeated Substitution",
"blurb": "Use substring replacement to iteratively simplify the string.",
"authors": [
"colinleach",
"BethanyG"
]
}
]
}

View File

@@ -0,0 +1,78 @@
# Introduction
The aim in this exercise is to determine whether opening and closing brackets are properly paired within the input text.
These brackets may be nested deeply (think Lisp code) and/or dispersed among a lot of other text (think complex LaTeX documents).
Community solutions fall into two main groups:
1. Those which make a single pass or loop through the input string, maintaining necessary context for matching.
2. Those which repeatedly make global substitutions within the text, removing matched bracket pairs until no more can be found.
## Single-pass approaches
```python
def is_paired(input_string):
    bracket_map = {"]": "[", "}": "{", ")": "("}
    tracking = []

    for element in input_string:
        if element in bracket_map.values():
            tracking.append(element)
        if element in bracket_map:
            if not tracking or (tracking.pop() != bracket_map[element]):
                return False
    return not tracking
```
The key in this approach is to maintain context by pushing open brackets onto some sort of stack (_in this case appending to a `list`_), then checking if there is a corresponding closing bracket to pair with the top stack item.
See [stack-match][stack-match] approaches for details.
## Repeated-substitution approaches
```python
def is_paired(text):
    text = "".join(item for item in text if item in "()[]{}")
    while "()" in text or "[]" in text or "{}" in text:
        text = text.replace("()", "").replace("[]", "").replace("{}", "")
    return not text
```
In this approach, we first remove any non-bracket characters, then use a loop to repeatedly remove inner bracket pairs.
See [repeated-substitution][repeated-substitution] approaches for details.
## Other approaches
Languages prizing immutability are likely to use techniques such as `foldl()` or recursive matching, as discussed on the [Scala track][scala].
This is possible in Python, but can read as unidiomatic and will (likely) result in inefficient code if not done carefully.
For anyone wanting to go down the functional-style path, Python has [`functools.reduce()`][reduce] for folds and added [structural pattern matching][pattern-matching] in Python 3.10.
Recursion is not highly optimised in Python and there is no tail call optimization, but the default stack depth of 1000 should be more than enough for solving this problem recursively.
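
As a rough illustration of the fold idea, the sketch below uses `functools.reduce()` with an immutable `tuple` as the stack.
The names `PAIRS` and `step` are made up for this example and do not come from any community solution.
Building a new tuple on every push or pop is also a fair example of the inefficiency mentioned above, so treat this as a sketch rather than a recommendation.

```python
from functools import reduce

# Hypothetical helper names, chosen for this sketch only.
PAIRS = {")": "(", "]": "[", "}": "{"}


def step(stack, char):
    """Fold step: return the new stack (a tuple), or None once a mismatch has been seen."""
    if stack is None:
        return None                  # a mismatch already happened; keep propagating it
    if char in PAIRS.values():
        return stack + (char,)       # left bracket: "push" by building a new tuple
    if char in PAIRS:
        if stack and stack[-1] == PAIRS[char]:
            return stack[:-1]        # right bracket pairs with the top: "pop"
        return None                  # right bracket with nothing to pair with
    return stack                     # any other character is ignored


def is_paired(input_string):
    return reduce(step, input_string, ()) == ()
```

Unlike the loop-based version, the fold cannot return early: after a mismatch it still visits every remaining character, passing `None` along.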
## Which approach to use
For short, well-defined input strings such as those currently in the test file, repeated-substitution allows a passing solution in very few lines of code.
But as input grows, this method could become less and less performant, due to the multiple passes and changes needed to determine matches.
The single-pass strategy of the stack-match approach allows for stream processing, scales linearly (_`O(n)` time complexity_) with text length, and will remain performant for very large inputs.
Examining the community solutions published for this exercise, it is clear that many programmers prefer the stack-match method which avoids the repeated string copying of the substitution approach.
Thus it is interesting and perhaps humbling to note that repeated-substitution is **_at least_** as fast in benchmarking, even with large (>30 kB) input strings!
See the [performance article][article-performance] for more details.
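
To make the "multiple passes" point concrete, here is a small sketch that counts how many times the substitution loop runs.
`count_passes` is a hypothetical helper written for this illustration and is not part of any solution.

```python
def count_passes(text):
    """Return (is_paired, passes), where passes is the number of loop iterations needed."""
    text = "".join(char for char in text if char in "()[]{}")
    passes = 0
    while "()" in text or "[]" in text or "{}" in text:
        text = text.replace("()", "").replace("[]", "").replace("{}", "")
        passes += 1
    return not text, passes


print(count_passes("()" * 1000))              # (True, 1)
print(count_passes("(" * 1000 + ")" * 1000))  # (True, 1000)
```

Side-by-side pairs vanish in one pass, while brackets of a single type nested 1000 deep need 1000 passes, each copying the remaining string.
Inputs like the second one are where the substitution approach would be expected to fall behind; the benchmarked LaTeX papers are apparently nowhere near that deeply nested.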
[article-performance]:https://exercism.org/tracks/python/exercises/matching-brackets/articles/performance
[pattern-matching]: https://docs.python.org/3/whatsnew/3.10.html#pep-634-structural-pattern-matching
[reduce]: https://docs.python.org/3/library/functools.html#functools.reduce
[repeated-substitution]: https://exercism.org/tracks/python/exercises/matching-brackets/approaches/repeated-substitution
[scala]: https://exercism.org/tracks/scala/exercises/matching-brackets/dig_deeper
[stack-match]: https://exercism.org/tracks/python/exercises/matching-brackets/approaches/stack-match

View File

@@ -0,0 +1,67 @@
# Repeated Substitution
```python
def is_paired(text):
    text = "".join([element for element in text if element in "()[]{}"])
    while "()" in text or "[]" in text or "{}" in text:
        text = text.replace("()", "").replace("[]", "").replace("{}", "")
    return not text
```
In this approach, the steps are:
1. Remove all non-bracket characters from the input string (_as done through the filter clause in the list-comprehension above_).
2. Iteratively remove all remaining bracket pairs: this reduces nesting in the string from the inside outwards.
3. Test for a now empty string, meaning all brackets have been paired.
The code above spells out the approach particularly clearly, but there are (of course) several possible variants.
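
Before looking at the variants, it may help to watch the three steps at work.
The sketch below records the string after filtering and after each pass of the loop; `trace_substitution` is a hypothetical helper name used only for this illustration.

```python
def trace_substitution(text):
    """Return the filtered string followed by its state after each substitution pass."""
    text = "".join(char for char in text if char in "()[]{}")
    steps = [text]
    while "()" in text or "[]" in text or "{}" in text:
        text = text.replace("()", "").replace("[]", "").replace("{}", "")
        steps.append(text)
    return steps


print(trace_substitution("a(b[c]{d})e"))  # ['([]{})', '()', '']
print(trace_substitution("[(])"))         # ['[(])']
```

The first input ends on an empty string, so its brackets are paired.
The second never contains an adjacent pair, so the loop does not run and the leftover text shows the brackets are not paired.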
## Variation 1: Walrus Operator within a Generator Expression
```python
def is_paired(input_string):
    symbols = "".join(char for char in input_string if char in "{}[]()")
    while (pair := next((pair for pair in ("{}", "[]", "()") if pair in symbols), False)):
        symbols = symbols.replace(pair, "")
    return not symbols
```
The second solution above does essentially the same thing as the initial approach, but uses a generator expression assigned with a [walrus operator][walrus] `:=` (_introduced in Python 3.8_) in the `while-loop` test.
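
The `while` test packs a lot into one line, so here is the same logic pulled apart as a sketch.
`PAIRS` and `first_pair` are illustrative names that do not appear in the solution above.

```python
PAIRS = ("{}", "[]", "()")


def first_pair(symbols):
    """Return the first bracket pair found in `symbols`, or False if none is present."""
    # next() takes the first item from the generator, or the default (False) if it is empty.
    return next((pair for pair in PAIRS if pair in symbols), False)


print(first_pair("([]{})"))  # {}
print(first_pair("(["))      # False

# In the loop test, the walrus operator binds the result and checks its truthiness in one step.
symbols = "([]{})"
while (pair := first_pair(symbols)):
    symbols = symbols.replace(pair, "")
print(repr(symbols))  # ''
```

Note that each pass removes only one kind of pair, so this variant may loop more often than the chained `replace()` calls in the initial approach.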
## Variation 2: Regex Substitution in a While Loop
Regex enthusiasts can modify the previous approach, using `re.sub()` instead of `str.replace()` in the `while-loop` test:
```python
import re


def is_paired(text: str) -> bool:
    text = re.sub(r'[^{}\[\]()]', '', text)
    while text != (text := re.sub(r'{\}|\[]|\(\)', '', text)):
        continue
    return not bool(text)
```
## Variation 3: Regex Substitution and Recursion
It is possible to combine `re.sub()` and recursion in the same solution, though not everyone would view this as idiomatic Python:
```python
import re


def is_paired(input_string):
    replaced = re.sub(r"[^\[\(\{\}\)\]]|\{\}|\(\)|\[\]", "", input_string)
    return not input_string if input_string == replaced else is_paired(replaced)
```
Note that solutions using regular expressions ran roughly 2- to 4-fold *slower* than `str.replace()` solutions in benchmarking, so adding this type of complexity brings no benefit to this problem.
[walrus]: https://martinheinz.dev/blog/79/

View File

@@ -0,0 +1,5 @@
def is_paired(text):
    text = "".join(element for element in text if element in "()[]{}")
    while "()" in text or "[]" in text or "{}" in text:
        text = text.replace("()", "").replace("[]", "").replace("{}", "")
    return not text

View File

@@ -0,0 +1,50 @@
# Stack Match
```python
def is_paired(input_string):
    bracket_map = {"]": "[", "}": "{", ")": "("}
    stack = []

    for element in input_string:
        if element in bracket_map.values():
            stack.append(element)
        if element in bracket_map:
            if not stack or (stack.pop() != bracket_map[element]):
                return False
    return not stack
```
The point of this approach is to maintain a context of which bracket sets are currently "open":
- If a left bracket is found, push it onto the stack (_append it to the `list`_).
- If a right bracket is found, **and** it pairs with the last item placed on the stack, pop the bracket off the stack and continue.
- If there is a mismatch, for example `'['` with `'}'` or there is no left bracket on the stack, the code can immediately terminate and return `False`.
- When all the input text is processed, determine if the stack is empty, meaning all left brackets were matched.
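
The bullet points above can be followed step by step with a tracing variant of the solution.
This is only a sketch: `trace_is_paired` is a made-up name, and the `print()` calls exist purely to show the stack contents.

```python
def trace_is_paired(input_string):
    """Stack-match check that prints the stack after each bracket (illustrative helper)."""
    bracket_map = {"]": "[", "}": "{", ")": "("}
    stack = []
    for element in input_string:
        if element in bracket_map.values():
            stack.append(element)          # left bracket: push
        elif element in bracket_map:
            if not stack or stack.pop() != bracket_map[element]:
                print(f"{element!r}: no matching left bracket")
                return False               # mismatch or empty stack: stop immediately
        else:
            continue                       # non-bracket text is ignored
        print(f"{element!r}: stack is now {stack}")
    return not stack


trace_is_paired("{a[b]}")
# '{': stack is now ['{']
# '[': stack is now ['{', '[']
# ']': stack is now ['{']
# '}': stack is now []
```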
In Python, a [`list`][concept:python/lists]() is a good implementation of a stack: it has [`list.append()`][list-append] (_equivalent to a "push"_) and [`list.pop()`][list-pop] methods built in.
Some solutions use [`collections.deque()`][collections-deque] as an alternative implementation, though this has no clear advantage (_since the code only uses appends to the right-hand side_) and near-identical runtime performance.
The default iteration for a dictionary is over the _keys_, so the code above uses a plain `bracket_map` to search for right brackets, while `bracket_map.values()` is used to search for left brackets.
Other solutions created two sets of left and right brackets explicitly, or searched a string representation:
```python
if element in ']})':
```
Such changes made little difference to code length or readability, but ran about 5-fold faster than the dictionary-based solution.
At the end, success is an empty stack, tested above by using the [False-y quality][falsey] of `[]` (_as Python programmers often do_).
To be more explicit, we could alternatively use an equality:
```python
return stack == []
```
[list-append]: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists
[list-pop]: https://docs.python.org/3/tutorial/datastructures.html#more-on-lists
[collections-deque]: https://docs.python.org/3/library/collections.html#collections.deque
[falsey]: https://docs.python.org/3/library/stdtypes.html#truth-value-testing

View File

@@ -0,0 +1,8 @@
bracket_map = {"]": "[", "}": "{", ")": "("}
stack = []
for element in input_string:
    if element in bracket_map.values(): stack.append(element)
    if element in bracket_map:
        if not stack or (stack.pop() != bracket_map[element]):
            return False
return not stack

View File

@@ -0,0 +1,14 @@
{
"articles": [
{
"uuid": "af7a43b5-c135-4809-9fb8-d84cdd5138d5",
"slug": "performance",
"title": "Performance",
"blurb": "Compare a variety of solutions using benchmarking data.",
"authors": [
"colinleach",
"BethanyG"
]
}
]
}

View File

@@ -0,0 +1,184 @@
import timeit
import pandas as pd
import numpy as np
import requests
# ------------ FUNCTIONS TO TIME ------------- #
def stack_match1(input_string):
    bracket_map = {"]" : "[", "}": "{", ")":"("}
    tracking = []

    for element in input_string:
        if element in bracket_map.values():
            tracking.append(element)
        if element in bracket_map:
            if not tracking or (tracking.pop() != bracket_map[element]):
                return False
    return not tracking


def stack_match2(input_string):
    opening = {'[', '{', '('}
    closing = {']', '}', ')'}
    pairs = {('[', ']'), ('{', '}'), ('(', ')')}
    stack = list()

    for char in input_string:
        if char in opening:
            stack.append(char)
        elif char in closing:
            if not stack or (stack.pop(), char) not in pairs:
                return False
    return stack == []


def stack_match3(input_string):
    BRACKETS = {'(': ')', '[': ']', '{': '}'}
    END_BRACKETS = {')', ']', '}'}

    stack = []

    def is_valid(char):
        return stack and stack.pop() == char

    for char in input_string:
        if char in BRACKETS:
            stack.append(BRACKETS[char])
        elif char in END_BRACKETS and not is_valid(char):
            return False

    return not stack


def stack_match4(input_string):
    stack = []
    r = {')': '(', ']': '[', '}': '{'}
    for c in input_string:
        if c in '[{(':
            stack.append(c)
        if c in ']})':
            if not stack:
                return False
            if stack[-1] == r[c]:
                stack.pop()
            else:
                return False
    return not stack


from collections import deque
from typing import Deque


def stack_match5(text: str) -> bool:
    """
    Determine if the given text properly closes any opened brackets.
    """
    PUSH = {"[": "]", "{": "}", "(": ")"}
    PULL = set(PUSH.values())

    stack: Deque[str] = deque()
    for char in text:
        if char in PUSH:
            stack.append(PUSH[char])
        elif char in PULL:
            if not stack or char != stack.pop():
                return False
    return not stack


def repeated_substitution1(text):
    text = "".join(x for x in text if x in "()[]{}")
    while "()" in text or "[]" in text or "{}" in text:
        text = text.replace("()","").replace("[]", "").replace("{}","")
    return not text


def repeated_substitution2(input_string):
    symbols = "".join(c for c in input_string if c in "{}[]()")
    while (pair := next((pair for pair in ("{}", "[]", "()") if pair in symbols), False)):
        symbols = symbols.replace(pair, "")
    return not symbols


import re


def repeated_substitution3(str_: str) -> bool:
    str_ = re.sub(r'[^{}\[\]()]', '', str_)
    while str_ != (str_ := re.sub(r'{\}|\[]|\(\)', '', str_)):
        pass
    return not bool(str_)


def repeated_substitution4(input_string):
    replaced = re.sub(r"[^\[\(\{\}\)\]]|\{\}|\(\)|\[\]", "", input_string)
    return not input_string if input_string == replaced else repeated_substitution4(replaced)
## ---------END FUNCTIONS TO BE TIMED-------------------- ##
## -------- Timing Code Starts Here ---------------------##
def get_file(url):
    resp = requests.get(url)
    return resp.text
short = "\\left(\\begin{array}{cc} \\frac{1}{3} & x\\\\ \\mathrm{e}^{x} &... x^2 \\end{array}\\right)"
mars_moons = get_file("https://raw.githubusercontent.com/colinleach/PTYS516/main/term_paper/term_paper.tex")
galaxy_cnn = get_file("https://raw.githubusercontent.com/colinleach/proj502/main/project_report/report.tex")
# Input Data Setup
inputs = [short, mars_moons, galaxy_cnn]
# Ensure the code doesn't terminate early with a mismatch
assert all([stack_match1(txt) for txt in inputs])
# #Set up columns and rows for Pandas Data Frame
col_headers = ['short', 'mars_moons', 'galaxy_cnn']
row_headers = [
"stack_match1",
"stack_match2",
"stack_match3",
"stack_match4",
"stack_match5",
"repeated_substitution1",
"repeated_substitution2",
"repeated_substitution3",
"repeated_substitution4"
]
# Empty dataframe will be filled in one cell at a time later
df = pd.DataFrame(np.nan, index=row_headers, columns=col_headers)
# Function List to Call When Timing
functions = [stack_match1, stack_match2, stack_match3, stack_match4, stack_match5,
repeated_substitution1, repeated_substitution2, repeated_substitution3, repeated_substitution4]
# Run timings using timeit.autorange(). Run Each Set 3 Times.
for function, title in zip(functions, row_headers):
    timings = [[
        timeit.Timer(lambda: function(data), globals=globals()).autorange()[1] /
        timeit.Timer(lambda: function(data), globals=globals()).autorange()[0]
        for data in inputs] for rounds in range(3)]

    # Only the fastest Cycle counts.
    timing_result = min(timings)

    print(f'{title}', f'Timings : {timing_result}')
    # Insert results into the dataframe
    df.loc[title, col_headers[0]:col_headers[-1]] = timing_result
# Save the data to avoid constantly regenerating it
df.to_feather('run_times.feather')
print("\nDataframe saved to './run_times.feather'")
# The next bit is useful for `introduction.md`
pd.options.display.float_format = '{:,.2e}'.format
print('\nDataframe in Markdown format:\n')
print(df.to_markdown(floatfmt=".2e"))

View File

@@ -0,0 +1,41 @@
# Performance
All functions were tested on three inputs, a short string from the exercise tests plus two scientific papers in $\LaTeX$ format.
Python reported these string lengths:
```
short: 84
mars_moons: 34836
galaxy_cnn: 31468
```
A total of 9 community solutions were tested: 5 variants of stack-match and 4 of repeated-substitution.
Full details are in the [benchmark code][benchmark-code], including URLs for the downloaded papers.
Results are summarized in the table below, with all times in seconds:
| | short | mars_moons | galaxy_cnn |
|:-----------------------|:--------:|:------------:|:------------:|
| stack_match4 | 1.77e-06 | 5.92e-04 | 5.18e-04 |
| stack_match2 | 1.71e-06 | 7.38e-04 | 6.64e-04 |
| stack_match3 | 1.79e-06 | 7.72e-04 | 6.95e-04 |
| stack_match5 | 1.70e-06 | 7.79e-04 | 6.97e-04 |
| stack_match1 | 5.64e-06 | 21.9e-04 | 39.7e-04 |
| repeated_substitution1 | 1.20e-06 | 3.50e-04 | 3.06e-04 |
| repeated_substitution2 | 1.86e-06 | 3.58e-04 | 3.15e-04 |
| repeated_substitution3 | 4.27e-06 | 14.0e-04 | 12.5e-04 |
| repeated_substitution4 | 4.96e-06 | 14.9e-04 | 13.5e-04 |
Overall, most of these solutions had fairly similar performance, and runtime scaled similarly with input length.
There is certainly no evidence for either class of solutions being systematically better than the other.
The slowest was `stack_match1`, which did a lot of lookups in dictionary keys and values.
Searching instead in sets or strings gave a useful improvement, roughly 3- to 8-fold in the table above.
Among the repeated-substitution solutions, the first two used standard Python string operations and ran roughly 2- to 4-fold faster than the last two, which use regular expressions.
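
For readers who want to try a quick measurement of their own, the sketch below shows one way to get a per-call time with `timeit.Timer.autorange()`, which returns a `(number_of_loops, total_seconds)` tuple.
It is a simplified stand-in for the benchmark script, not a copy of it: `seconds_per_call` and `sample` are names invented for this example, and absolute timings will differ between machines.

```python
import timeit


def is_paired(text):
    """The basic repeated-substitution solution, used here as the function to time."""
    text = "".join(item for item in text if item in "()[]{}")
    while "()" in text or "[]" in text or "{}" in text:
        text = text.replace("()", "").replace("[]", "").replace("{}", "")
    return not text


def seconds_per_call(function, data):
    """Time function(data) and return the average number of seconds per call."""
    # autorange() chooses the loop count itself and returns (loops, total_seconds).
    loops, total_seconds = timeit.Timer(lambda: function(data)).autorange()
    return total_seconds / loops


sample = "{ what is (42) [in brackets] }"  # hypothetical input, not one of the benchmark strings
print(f"{seconds_per_call(is_paired, sample):.2e} seconds per call")
```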
[benchmark-code]: https://github.com/exercism/python/blob/main/exercises/practice/matching-brackets/.articles/performance/code/Benchmark.py

View File

@@ -0,0 +1,3 @@
# Performance
Compare a variety of solutions using benchmarking data.