[Sieve]: Draft approaches (#3626)
* [Sieve]: Draft approaches
* fixes various typos and random gibberish
* Update introduction.md
* Update exercises/practice/sieve/.approaches/comprehensions/content.md
* Update exercises/practice/sieve/.approaches/comprehensions/content.md
* Update exercises/practice/sieve/.approaches/comprehensions/content.md
* Update exercises/practice/sieve/.approaches/comprehensions/content.md
* Update exercises/practice/sieve/.approaches/nested-loops/content.md
* Update exercises/practice/sieve/.approaches/comprehensions/content.md
* Update exercises/practice/sieve/.approaches/comprehensions/snippet.txt
* Update exercises/practice/sieve/.approaches/introduction.md
* Update exercises/practice/sieve/.approaches/nested-loops/content.md
* Update exercises/practice/sieve/.approaches/nested-loops/snippet.txt
* Update exercises/practice/sieve/.approaches/comprehensions/content.md
  Does this add a spurious extra space after the link?
* Removed graph from content.md
  To save us forgetting it later.
* Delete timeit_bar_plot.svg
  I didn't intend to commit this in the first place.
* removed space from content.md
* Update exercises/practice/sieve/.approaches/nested-loops/content.md
* Update exercises/practice/sieve/.approaches/nested-loops/content.md
* Update exercises/practice/sieve/.approaches/introduction.md
* Update exercises/practice/sieve/.approaches/introduction.md
* Update exercises/practice/sieve/.approaches/introduction.md
* Code Block Corrections
  Somehow, the closing of the codeblocks got dropped. Added them back in, along with final typo corrections.

---------

Co-authored-by: BethanyG <BethanyG@users.noreply.github.com>
@@ -0,0 +1,36 @@
# Comprehensions

```python
def primes(number):
    prime = (item for item in range(2, number+1)
             if item not in (not_prime for item in range(2, number+1)
                             for not_prime in range(item*item, number+1, item)))
    return list(prime)
```

Many of the solutions to Sieve use `comprehensions` or `generator-expressions` at some point, but this page is about examples that put almost *everything* into a single, elaborate `generator-expression` or `comprehension`.

The above example uses a `generator-expression` to do all the calculation.

There are at least two problems with this:
- Readability is poor.
- Performance is exceptionally bad, making this the slowest solution tested, for all input sizes.

Notice the three `for` clauses in the generator expression.

This makes the code similar to [nested loops][nested-loops], and run time scales quadratically with the size of `number`.
In fact, when this code is compiled, it _compiles to nested loops_ that have the additional overhead of generator setup and tracking.
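
To make those nested loops visible, here is a rough hand-unrolled equivalent of the generator expression above. It is an illustrative sketch, not literally what the bytecode compiler emits, and the name `primes_unrolled` is hypothetical. The key point is that `item not in (...)` re-runs the inner generator from the start for every candidate, scanning multiples until it finds a match or runs out.

```python
def primes_unrolled(number):
    """Hand-unrolled equivalent of the single generator-expression solution."""
    result = []
    for item in range(2, number + 1):              # outer generator: every candidate
        is_composite = False
        # `item not in (...)` restarts the inner generator for each candidate
        for inner in range(2, number + 1):
            for not_prime in range(inner * inner, number + 1, inner):
                if not_prime == item:              # `in` stops at the first match
                    is_composite = True
                    break
            if is_composite:
                break
        if not is_composite:
            result.append(item)
    return result


print(primes_unrolled(30))  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

For a prime `item`, no match is ever found, so every multiple of every number up to `number` is generated before the check fails: that is where the quadratic cost comes from.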

```python
def primes(limit):
    return [number for number in range(2, limit + 1)
            if all(number % divisor != 0 for divisor in range(2, number))]
```

This second example, using a `list-comprehension` with `all()`, is certainly concise and _relatively_ readable, but the performance is again quite poor.

This is not quite a fully nested loop (_`all()` short-circuits, stopping at the first divisor it finds_), but it is by no means "performant".
Composite numbers exit early, while every prime must still be tested against all smaller numbers, so scaling with input size is intermediate between linear and quadratic: not quite as bad as the first example.
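
The short-circuit can be observed by counting how many trial divisors `all()` actually examines. In this sketch, `divisors_checked` and `not_a_divisor` are hypothetical helpers added only for illustration; they wrap the same `number % divisor != 0` test used in the comprehension above.

```python
def divisors_checked(number):
    """Count the trial divisors that all() examines before it returns."""
    checked = 0

    def not_a_divisor(divisor):
        nonlocal checked
        checked += 1
        return number % divisor != 0

    all(not_a_divisor(divisor) for divisor in range(2, number))
    return checked


print(divisors_checked(97))  # 95: prime, so every candidate from 2 to 96 is tested
print(divisors_checked(96))  # 1: 2 divides 96, so all() stops immediately
print(divisors_checked(91))  # 6: stops at 7, the smallest factor of 91 = 7 * 13
```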

[nested-loops]: https://exercism.org/tracks/python/exercises/sieve/approaches/nested-loops
@@ -0,0 +1,3 @@
def primes(limit):
    return [number for number in range(2, limit + 1) if
            all(number % divisor != 0 for divisor in range(2, number))]
exercises/practice/sieve/.approaches/config.json
@@ -0,0 +1,40 @@
{
  "introduction": {
    "authors": [
      "colinleach",
      "BethanyG"
    ]
  },
  "approaches": [
    {
      "uuid": "85752386-a3e0-4ba5-aca7-22f5909c8cb1",
      "slug": "nested-loops",
      "title": "Nested Loops",
      "blurb": "Relatively clear solutions with explicit loops.",
      "authors": [
        "colinleach",
        "BethanyG"
      ]
    },
    {
      "uuid": "04701848-31bf-4799-8093-5d3542372a2d",
      "slug": "set-operations",
      "title": "Set Operations",
      "blurb": "Performance enhancements with Python sets.",
      "authors": [
        "colinleach",
        "BethanyG"
      ]
    },
    {
      "uuid": "183c47e3-79b4-4afb-8dc4-0deaf094ce5b",
      "slug": "comprehensions",
      "title": "Comprehensions",
      "blurb": "Ultra-concise code and its downsides.",
      "authors": [
        "colinleach",
        "BethanyG"
      ]
    }
  ]
}
exercises/practice/sieve/.approaches/introduction.md
@@ -0,0 +1,99 @@
# Introduction

The key to this exercise is to keep track of:
- A list of numbers.
- The status of each number: still possibly prime, or already ruled out.

## General Guidance

To solve this exercise, it is necessary to choose one or more appropriate data structures to store the numbers and their status, then decide the best way to scan through them.

There are many ways to implement the code, and the three broad approaches listed below are not sharply separated.

## Approach: Using nested loops

```python
def primes(number):
    not_prime = []
    prime = []

    for item in range(2, number+1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item*item, number+1, item):
                not_prime.append(element)

    return prime
```

The theme here is nested, explicit `for` loops to move through ranges, testing validity as we go.

For details and another example see [`nested-loops`][approaches-nested].

## Approach: Using set operations

```python
def primes(number):
    not_prime = set()
    primes = []

    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))

    return primes
```

In this group, the code uses features of the Python [`set`][sets], such as fast membership tests and bulk `update()`, to improve efficiency.

For details and other examples see [`set-operations`][approaches-sets].

## Approach: Using complex or nested comprehensions

```python
def primes(limit):
    return [number for number in range(2, limit + 1) if
            all(number % divisor != 0 for divisor in range(2, number))]
```

Here, the emphasis is on implementing a solution in the minimum number of lines, even at the expense of readability or performance.

For details and another example see [`comprehensions`][approaches-comps].

## Using packages outside base Python

In statically typed languages, common approaches include bit arrays and arrays of booleans.

Neither of these is a natural fit for core Python, but external packages could provide them:

- For bit arrays, there is the [`bitarray`][bitarray] package and [`bitstring.BitArray()`][bitstring].
- For arrays of booleans, we could use the NumPy package: `np.ones((number,), dtype=np.bool_)` creates a preallocated array of `number` values, all `True`.

It should be stressed that these will not work in the Exercism test runner, and are mentioned here only for completeness.

## Which Approach to Use?

This exercise is for learning, and is not directly relevant to production code.

The point is to find a solution which is correct, readable, and remains reasonably fast for larger input values.

The "set operations" example above is clean, readable, and in benchmarking was the fastest code tested.

Further details of performance testing are given in the [Performance article][article-performance].

[approaches-nested]: https://exercism.org/tracks/python/exercises/sieve/approaches/nested-loops
[approaches-sets]: https://exercism.org/tracks/python/exercises/sieve/approaches/set-operations
[approaches-comps]: https://exercism.org/tracks/python/exercises/sieve/approaches/comprehensions
[article-performance]: https://exercism.org/tracks/python/exercises/sieve/articles/performance
[sets]: https://docs.python.org/3/library/stdtypes.html#set-types-set-frozenset
[bitarray]: https://pypi.org/project/bitarray/
[bitstring]: https://bitstring.readthedocs.io/en/latest/
exercises/practice/sieve/.approaches/nested-loops/content.md
@@ -0,0 +1,49 @@
# Nested Loops

```python
def primes(number):
    not_prime = []
    prime = []

    for item in range(2, number+1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item*item, number+1, item):
                not_prime.append(element)

    return prime
```

This is the type of code that many people might write as a first attempt.

It is very readable and passes the tests.

The clear disadvantage is that run time is quadratic in the input size: `O(n**2)`, so this approach scales poorly to large input values.

Part of the problem is the line `if item not in not_prime`, where `not_prime` is a list that may be long and unsorted.
It also contains duplicates: 12, for example, is appended once as a multiple of 2 and again as a multiple of 3.

For any `item` that is not in the list (every prime, for a start), this check has to compare against every element, so run time is linear in list length: not ideal within a loop repeated many times.
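
One way to see how long `not_prime` gets is to return it alongside the primes. The sketch below is the first example unchanged apart from that extra return value; the name `primes_with_marks` is hypothetical. For `number = 30`, the list holds 24 entries even though there are only 19 distinct composites up to 30.

```python
def primes_with_marks(number):
    """First nested-loops example, also returning its list of marked composites."""
    not_prime = []
    prime = []

    for item in range(2, number + 1):
        if item not in not_prime:          # a prime is never found, so the whole list is scanned
            prime.append(item)
            for element in range(item * item, number + 1, item):
                not_prime.append(element)  # duplicates are appended too

    return prime, not_prime


prime, not_prime = primes_with_marks(30)
print(len(not_prime))       # 24 entries in the list
print(len(set(not_prime)))  # 19 distinct composites
print(not_prime.count(30))  # 3: marked by 2, 3 and 5
```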

```python
def primes(number):
    number += 1
    prime = [True for item in range(number)]
    for index in range(2, number):
        if not prime[index]:
            continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]
```

At first sight, this second example looks quite similar to the first.

However, on testing it performs much better, scaling linearly with `number` rather than quadratically.

A key difference is that list entries are tested by index: `if not prime[index]`.

This is a constant-time operation independent of the list length.
The list also has a fixed length of `number + 1`, one entry per integer, so it never grows and never holds duplicates.

Relatively few programmers would have predicted such a major difference just by looking at the code, so if performance matters we should always test, not guess.
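
In that spirit, here is a minimal timing sketch using the standard-library `timeit` module. The names `primes_list_search`, `primes_indexed` and `time_per_call` are hypothetical; the first two are the two examples above (with `[True] * number` in place of the equivalent list comprehension). Absolute numbers depend on the machine, so no output is shown, but the gap between the two should widen quickly as `number` grows.

```python
import timeit


def primes_list_search(number):
    """First example: membership test by searching a growing list."""
    not_prime = []
    prime = []
    for item in range(2, number + 1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item * item, number + 1, item):
                not_prime.append(element)
    return prime


def primes_indexed(number):
    """Second example: membership test by indexing a fixed-size list of flags."""
    number += 1
    prime = [True] * number
    for index in range(2, number):
        if not prime[index]:
            continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]


def time_per_call(function, number, repeat=3, calls=5):
    """Best-of-`repeat` average time for one call of `function(number)`, in seconds."""
    return min(timeit.repeat(lambda: function(number), repeat=repeat, number=calls)) / calls


for size in (100, 1_000, 3_000):
    slow = time_per_call(primes_list_search, size)
    fast = time_per_call(primes_indexed, size)
    print(f"n={size:>5}: list search {slow:.2e} s, indexed {fast:.2e} s")
```

Taking the minimum over several repeats, rather than the mean, reduces the influence of other activity on the machine.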
@@ -0,0 +1,8 @@
def primes(number):
    number += 1
    prime = [True for item in range(number)]
    for index in range(2, number):
        if not prime[index]: continue
        for candidate in range(2 * index, number, index):
            prime[candidate] = False
    return [index for index, value in enumerate(prime) if index > 1 and value]
@@ -0,0 +1,69 @@
# Set Operations

```python
def primes(number):
    not_prime = set()
    primes = []

    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))

    return primes
```

This is the fastest method so far tested, at all input sizes.

With only a single explicit loop, performance scales roughly linearly: `O(n)`.

A key step is the set `update()`.

Less commonly seen than `add()`, which takes a single element, `update()` takes any iterable of hashable values as its parameter and adds all of its elements in a single call.

In this case, the iterable is a `range` covering all multiples of the prime just found, from its square up to the limit.
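
A quick illustration of the difference, using the multiples of 3 for a limit of 30 (the variable names are arbitrary). It also shows a pitfall: a `range` object is itself hashable, so passing it to `add()` stores the range as one element instead of its contents.

```python
# add() inserts exactly one element
not_prime = set()
not_prime.add(4)

# update() inserts every element of an iterable: here the multiples of 3 from 9 to 30
not_prime.update(range(9, 31, 3))
print(sorted(not_prime))  # [4, 9, 12, 15, 18, 21, 24, 27, 30]

# Pitfall: add() with a range stores the range object itself, as a single element
mistaken = set()
mistaken.add(range(9, 31, 3))
print(len(mistaken))  # 1
```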

Primes are collected in a list, in ascending order, so there is no need for a separate sort operation at the end.

```python
def primes(number):
    numbers = set(item for item in range(2, number+1))

    not_prime = set(not_prime for item in range(2, number+1)
                    for not_prime in range(item**2, number+1, item))

    return sorted(list((numbers - not_prime)))
```

After building both sets from generator expressions instead of an explicit loop, the second example uses set subtraction as the key step in its return statement.

The resulting set needs to be sorted into a list, which adds some overhead, [scaling as O(n *log* n)][sort-performance].

In performance testing, this code is about 4x slower than the first example, but over the range tested it still scales close to linearly, `O(n)`.
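
On the conversion step: `sorted()` accepts any iterable, including a set, and always returns a new list, so the inner `list()` call in `sorted(list(...))` is redundant. A tiny check with arbitrary example values:

```python
remaining = {29, 2, 17, 5, 11, 3, 23, 7, 19, 13}  # sets guarantee no iteration order

# sorted() accepts a set directly and returns a new list
result = sorted(remaining)
print(result)                    # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
print(isinstance(result, list))  # True

# Wrapping the set in list() first gives an identical result
print(result == sorted(list(remaining)))  # True
```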

```python
def primes(number: int) -> list[int]:
    start = set(range(2, number + 1))
    return sorted(start - {m for n in start for m in range(2 * n, number + 1, n)})
```

The third example is quite similar to the second, just moving the comprehension into the return statement.

Performance is very similar between examples 2 and 3 at all input values.

## Sets: strengths and weaknesses

Sets offer two main benefits which can be useful in this exercise:
- Entries are guaranteed to be unique.
- Determining whether the set contains a given value is fast: constant-time on average.

Less positively:
- The exercise specification requires a list to be returned, which may involve a conversion.
- Sets have no guaranteed ordering, so two of the above examples incur the time penalty of sorting a list at the end.
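
The membership difference is easy to measure with the standard-library `timeit` module. This sketch looks up a value that is absent (just as each new prime is absent from `not_prime`), so the list has to be scanned from end to end while the set does a single hash lookup on average. Timings vary by machine, so no figures are shown; `LIMIT` and `missing` are arbitrary names chosen for the example.

```python
import timeit

LIMIT = 100_000
as_list = list(range(LIMIT))
as_set = set(as_list)
missing = -1  # not present, so the list must be searched all the way to the end

list_time = timeit.timeit(lambda: missing in as_list, number=200)
set_time = timeit.timeit(lambda: missing in as_set, number=200)

print(f"list: {list_time:.2e} s for 200 lookups")
print(f"set:  {set_time:.2e} s for 200 lookups")
```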

[sort-performance]: https://en.wikipedia.org/wiki/Timsort
@@ -0,0 +1,8 @@
def primes(number):
    not_prime = set()
    primes = []
    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))
    return primes
exercises/practice/sieve/.articles/config.json
@@ -0,0 +1,14 @@
{
  "articles": [
    {
      "slug": "performance",
      "uuid": "fdbee56a-b4db-4776-8aab-3f7788c612aa",
      "title": "Performance deep dive",
      "authors": [
        "BethanyG",
        "colinleach"
      ],
      "blurb": "Results and analysis of timing tests for the various approaches."
    }
  ]
}
exercises/practice/sieve/.articles/performance/code/Benchmark.py
@@ -0,0 +1,126 @@
import timeit

import pandas as pd
import numpy as np


# ------------ FUNCTIONS TO TIME ------------- #

def nested_loops_1(number):
    not_prime = []
    prime = []

    for item in range(2, number + 1):
        if item not in not_prime:
            prime.append(item)
            for element in range(item * item, number + 1, item):
                not_prime.append(element)

    return prime


def nested_loops_2(limit):
    limit += 1
    l = [True for _ in range(limit)]
    for i in range(2, limit):
        if not l[i]:
            continue
        for j in range(2 * i, limit, i):
            l[j] = False
    return [i for i, v in enumerate(l) if i > 1 and v]


def set_ops_1(number):
    numbers = set(item for item in range(2, number + 1))

    not_prime = set(not_prime for item in range(2, number + 1)
                    for not_prime in range(item ** 2, number + 1, item))

    # sorting adds .2ms, but the tests won't pass with an unsorted list
    return sorted(list((numbers - not_prime)))


def set_ops_2(number):
    # fastest
    not_prime = set()
    primes = []

    for num in range(2, number + 1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num * num, number + 1, num))

    return primes


def set_ops_3(limit: int) -> list[int]:
    start = set(range(2, limit + 1))
    return sorted(start - {m for n in start for m in range(2 * n, limit + 1, n)})


def generator_comprehension(number):
    # slowest
    primes = (item for item in range(2, number + 1) if item not in
              (not_prime for item in range(2, number + 1) for
               not_prime in range(item * item, number + 1, item)))
    return list(primes)


def list_comprehension(limit):
    return [x for x in range(2, limit + 1)
            if all(x % y != 0 for y in range(2, x))] if limit >= 2 else []


## ---------END FUNCTIONS TO BE TIMED-------------------- ##

## -------- Timing Code Starts Here ---------------------##


# Input Data Setup
inputs = [10, 30, 100, 300, 1_000, 3_000, 10_000, 30_000, 100_000]

# Set up columns and rows for Pandas Data Frame
col_headers = [f'Number: {number}' for number in inputs]
row_headers = ["nested_loops_1",
               "nested_loops_2",
               "set_ops_1",
               "set_ops_2",
               "set_ops_3",
               "generator_comprehension",
               "list_comprehension"]

# Empty dataframe will be filled in one cell at a time later
df = pd.DataFrame(np.nan, index=row_headers, columns=col_headers)

# Function List to Call When Timing
functions = [nested_loops_1,
             nested_loops_2,
             set_ops_1,
             set_ops_2,
             set_ops_3,
             generator_comprehension,
             list_comprehension]

# Run timings using timeit.Timer.autorange(). Run each set 3 times.
for function, title in zip(functions, row_headers):
    timings = []
    for _ in range(3):
        round_timings = []
        for data in inputs:
            # autorange() returns (number_of_loops, total_time_in_seconds);
            # call it once per input so that the loop count and the time match
            loops, total_time = timeit.Timer(lambda: function(data)).autorange()
            round_timings.append(total_time / loops)
        timings.append(round_timings)

    # Only the fastest round counts, chosen separately for each input size
    timing_result = [min(per_input) for per_input in zip(*timings)]

    print(f'{title}', f'Timings : {timing_result}')

    # Insert results into the dataframe
    df.loc[title, 'Number: 10':'Number: 100000'] = timing_result

# Save the data to avoid constantly regenerating it
df.to_feather('run_times.feather')
print("\nDataframe saved to './run_times.feather'")
#
# The next bit is useful for `introduction.md`
pd.options.display.float_format = '{:,.2e}'.format
print('\nDataframe in Markdown format:\n')
print(df.to_markdown(floatfmt=".2e"))
@@ -0,0 +1,43 @@
import sys

import matplotlib as mpl
import matplotlib.pyplot as plt
import pandas as pd


# These dataframes are slow to create, so they should be saved in Feather format

try:
    df = pd.read_feather('./run_times.feather')
except FileNotFoundError:
    print("File './run_times.feather' not found!")
    print("Please run './Benchmark.py' to create it.")
    sys.exit(1)

try:
    transposed = pd.read_feather('./transposed_logs.feather')
except FileNotFoundError:
    print("File './transposed_logs.feather' not found!")
    print("Please run the script that fits the log-log slopes to create it.")
    sys.exit(1)

# Ready to start creating plots

mpl.rcParams['axes.labelsize'] = 18

# bar plot of actual run times
ax = df.plot.bar(figsize=(10, 7),
                 logy=True,
                 ylabel="time (s)",
                 fontsize=14,
                 width=0.8,
                 rot=-30)
plt.tight_layout()
plt.savefig('../timeit_bar_plot.svg')

# log-log plot of times vs n, to see slopes
transposed.plot(figsize=(8, 6),
                marker='.',
                markersize=10,
                ylabel="$log_{10}(time)$ (s)",
                xlabel="$log_{10}(n)$",
                fontsize=14)
plt.savefig('../slopes.svg')
@@ -0,0 +1,56 @@
import sys

import pandas as pd
import numpy as np
from numpy.linalg import lstsq


# These dataframes are slow to create, so they should be saved in Feather format

try:
    df = pd.read_feather('./run_times.feather')
except FileNotFoundError:
    print("File './run_times.feather' not found!")
    print("Please run './Benchmark.py' to create it.")
    sys.exit(1)

# To plot and fit the slopes, the df needs to be log10-transformed and transposed

inputs = [10, 30, 100, 300, 1_000, 3_000, 10_000, 30_000, 100_000]

pd.options.display.float_format = '{:,.2g}'.format
log_n_values = np.log10(inputs)
df[df == 0.0] = np.nan
transposed = np.log10(df).T
transposed = transposed.set_axis(log_n_values, axis=0)
transposed.to_feather('transposed_logs.feather')
print("\nDataframe saved to './transposed_logs.feather'")

n_values = (10, 30, 100, 300, 1_000, 3_000, 10_000, 30_000, 100_000)
log_n_values = np.log10(n_values)
row_headers = ["nested_loops_1",
               "nested_loops_2",
               "set_ops_1",
               "set_ops_2",
               "set_ops_3",
               "generator_comprehension",
               "list_comprehension"]


# Do a least-squares fit to get the slopes, working around missing values
# Apparently, it does need to be this complicated

def find_slope(name):
    log_times = transposed[name]
    missing = np.isnan(log_times)
    log_times = log_times[~missing]
    valid_entries = len(log_times)
    # Assumes any missing timings are at the large-n end, so the first
    # `valid_entries` n values line up with the remaining times
    A = np.vstack([log_n_values[:valid_entries], np.ones(valid_entries)]).T
    m, _ = lstsq(A, log_times, rcond=None)[0]
    return m


# Print the slope results
slopes = [(name, find_slope(name)) for name in row_headers]
print('\nSlopes of log-log plots:')
for name, slope in slopes:
    print(f'{name:>23} : {slope:.2f}')
exercises/practice/sieve/.articles/performance/content.md
@@ -0,0 +1,59 @@
# Performance

The [Approaches page][approaches-page] discusses various ways to approach this exercise, with substantially different performance.

## Measured timings

The 7 code implementations described in the various approaches were [benchmarked][benchmark-code], using appropriate values for the upper limit of `n` and number of runs to average over, to keep the total testing time reasonable.

Numerical results (run time per call, in seconds) are tabulated below for 9 values of the upper search limit, chosen to be about equally spaced on a log scale.

|                         |       10 |       30 |       100 |       300 |      1,000 |      3,000 |    10,000 |      30,000 |   100,000 |
|:------------------------|---------:|---------:|----------:|----------:|-----------:|-----------:|----------:|------------:|----------:|
| nested quadratic        | 4.64e-07 | 2.19e-06 | 1.92e-05  | 1.68e-04  | 1.96e-03   | 1.78e-02   | 2.03e-01  | 1.92e+00    | 2.22e+01  |
| nested linear           | 8.72e-07 | 1.89e-06 | 5.32e-06  | 1.60e-05  | 5.90e-05   | 1.83e-04   | 6.09e-04  | 1.84e-03    | 6.17e-03  |
| set with sort 1         | 1.30e-06 | 3.07e-06 | 9.47e-06  | 2.96e-05  | 1.18e-04   | 3.92e-04   | 1.47e-03  | 5.15e-03    | 2.26e-02  |
| set with update         | 4.97e-07 | 1.23e-06 | 3.25e-06  | 9.57e-06  | 3.72e-05   | 1.19e-04   | 4.15e-04  | 1.38e-03    | 5.17e-03  |
| set with sort 2         | 9.60e-07 | 2.61e-06 | 8.76e-06  | 2.92e-05  | 1.28e-04   | 4.46e-04   | 1.77e-03  | 6.29e-03    | 2.79e-02  |
| generator comprehension | 4.54e-06 | 2.70e-05 | 2.23e-04  | 1.91e-03  | 2.17e-02   | 2.01e-01   | 2.28e+00  | 2.09e+01    | 2.41e+02  |
| list comprehension      | 2.23e-06 | 8.94e-06 | 4.36e-05  | 2.35e-04  | 1.86e-03   | 1.42e-02   | 1.39e-01  | 1.11e+00    | 1.10e+01  |

For the smallest input, all times are fairly close to a microsecond, with about a 10-fold difference between fastest and slowest.

In contrast, for searches up to 100,000 the timings varied by almost 5 orders of magnitude.

This is a difference between milliseconds and minutes, which is very hard to ignore.

## Testing algorithmic complexity

We have described these solutions as `quadratic` or `linear`.
Do the experimental data support this?

For a [power law][power-law] relationship, we have a run time `t` given by `t = a * n**x`, where `a` is a proportionality constant and `x` is the power.

Taking logs of both sides, `log(t) = x * log(n) + log(a)`.

Plots of `log(t)` against `log(n)` will be a straight line with slope equal to the power `x`.

Graphs of the data (not included here) show that these are all straight lines for larger values of `n`, as we expected.

Linear least-squares fits to each line gave these slope values:

| Method                  | Slope |
|:------------------------|:-----:|
| nested quadratic        | 1.95  |
| nested linear           | 0.98  |
| set with sort 1         | 1.07  |
| set with update         | 1.02  |
| set with sort 2         | 1.13  |
| generator comprehension | 1.95  |
| list comprehension      | 1.69  |

Most approaches have a slope close to 1 (linear) or 2 (quadratic).

The `list-comprehension` approach is an oddity, intermediate between these extremes, which fits the short-circuiting of `all()` described in the comprehensions approach.

[approaches-page]: https://exercism.org/tracks/python/exercises/sieve/approaches
[benchmark-code]: https://github.com/exercism/python/blob/main/exercises/practice/sieve/.articles/performance/code/Benchmark.py
[power-law]: https://en.wikipedia.org/wiki/Power_law
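
As a cross-check on the fitting step, the slope for a single row can be recomputed from the timing table with an ordinary least-squares fit of `log10(time)` against `log10(n)`. This is a pure-Python sketch using only the standard library (the benchmark code itself uses NumPy's `lstsq`); the helper name `log_log_slope` is hypothetical, and the times are the "nested linear" row copied from the timing table. The result should land close to the 0.98 in the slope table.

```python
import math


def log_log_slope(n_values, times):
    """Least-squares slope of log10(time) against log10(n)."""
    xs = [math.log10(n) for n in n_values]
    ys = [math.log10(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # slope = covariance(x, y) / variance(x); the 1/len factors cancel
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


n_values = [10, 30, 100, 300, 1_000, 3_000, 10_000, 30_000, 100_000]
nested_linear = [8.72e-07, 1.89e-06, 5.32e-06, 1.60e-05, 5.90e-05,
                 1.83e-04, 6.09e-04, 1.84e-03, 6.17e-03]

print(f"{log_log_slope(n_values, nested_linear):.2f}")
```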
exercises/practice/sieve/.articles/performance/slopes.svg
@@ -0,0 +1,8 @@
def primes(number):
    not_prime = set()
    primes = []
    for num in range(2, number+1):
        if num not in not_prime:
            primes.append(num)
            not_prime.update(range(num*num, number+1, num))
    return primes