Python/strings/anagrams.py
Sowndappan S 37b34c2bac perf(strings): optimize anagram signature using frequency counts (#12927)
* fix(strings): use frequency-based signature for anagrams

Replaced the sorting-based signature implementation with a frequency-based
approach using `collections.Counter`. This ensures that the signature
represents both characters and their counts, preventing collisions and
grouping true anagrams more reliably.

Examples:
- "test" → "e1s1t2"
- "finaltest" → "a1e1f1i1l1n1s1t2"
- "this is a test" → " 3a1e1h1i2s3t3"

Also updated the anagram lookup to use the new frequency-based signatures, making results more accurate and avoiding false positives.

* Refactor anagram function return type to list[str]

* Update anagrams.py

* Update anagrams.py

* Update anagrams.py

* Update anagrams.py

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Maxim Smolskiy <mithridatus@mail.ru>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2025-08-24 15:33:18 +03:00
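The commit message contrasts the new frequency-based signature with the earlier sorting-based one. Below is a minimal sketch of both keys side by side. It assumes the earlier signature was `"".join(sorted(word))`, and the names `sorted_signature`, `frequency_signature`, and `group` are hypothetical. For these sample words both keys produce the same groups. The frequency key encodes the same information as character/count pairs and sorts only the distinct characters rather than every character.

```python
from __future__ import annotations

import collections
from collections.abc import Callable


def sorted_signature(word: str) -> str:
    # Assumed earlier approach: sort every character of the word.
    return "".join(sorted(word))


def frequency_signature(word: str) -> str:
    # Count each character, then sort only the distinct characters.
    frequencies = collections.Counter(word)
    return "".join(f"{char}{count}" for char, count in sorted(frequencies.items()))


def group(words: list[str], key: Callable[[str], str]) -> list[list[str]]:
    # Bucket words by key and return the buckets in a canonical order.
    buckets: dict[str, list[str]] = collections.defaultdict(list)
    for word in words:
        buckets[key(word)].append(word)
    return sorted(buckets.values())


words = ["test", "sett", "stet", "tset", "final", "tests"]
for word in words:
    print(f"{word}: {sorted_signature(word)} -> {frequency_signature(word)}")

print(group(words, sorted_signature) == group(words, frequency_signature))  # True
```

Running it prints each word with both keys (for example `test: estt -> e1s1t2`), followed by `True`.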


from __future__ import annotations

import collections
import pprint
from pathlib import Path


def signature(word: str) -> str:
    """
    Return a word's frequency-based signature.

    >>> signature("test")
    'e1s1t2'
    >>> signature("this is a test")
    ' 3a1e1h1i2s3t3'
    >>> signature("finaltest")
    'a1e1f1i1l1n1s1t2'
    """
    # Count each character, then emit "<char><count>" pairs in sorted order
    # so that every anagram of `word` produces the same string.
    frequencies = collections.Counter(word)
    return "".join(
        f"{char}{frequency}" for char, frequency in sorted(frequencies.items())
    )


def anagram(my_word: str) -> list[str]:
    """
    Return every anagram of the given word from the dictionary.

    >>> anagram('test')
    ['sett', 'stet', 'test']
    >>> anagram('this is a test')
    []
    >>> anagram('final')
    ['final']
    """
    # .get() avoids inserting an empty list into the defaultdict for unknown words.
    return word_by_signature.get(signature(my_word), [])


# Load the dictionary (one word per line) stored next to this module.
data: str = Path(__file__).parent.joinpath("words.txt").read_text(encoding="utf-8")
word_list = sorted({word.strip().lower() for word in data.splitlines()})

# Group the dictionary by signature; each bucket holds words that are anagrams.
word_by_signature: dict[str, list[str]] = collections.defaultdict(list)
for word in word_list:
    word_by_signature[signature(word)].append(word)

if __name__ == "__main__":
    # Keep only words that have at least one other anagram in the dictionary,
    # looking each word up once.
    all_anagrams = {
        word: anagrams for word in word_list if len(anagrams := anagram(word)) > 1
    }
    with open("anagrams.txt", "w", encoding="utf-8") as file:
        file.write("all_anagrams = \n")
        file.write(pprint.pformat(all_anagrams))
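To try the signature index without `words.txt`, the sketch below builds the same kind of index from a small in-memory word list. The sample list and the helper names `build_index` and `find_anagrams` are hypothetical stand-ins, not part of the module. Lookups use `dict.get`, so querying a word with no match does not add an empty entry to the `defaultdict`, which indexing with `[]` would do.

```python
from __future__ import annotations

import collections


def signature(word: str) -> str:
    # Frequency-based key: each distinct character followed by its count.
    frequencies = collections.Counter(word)
    return "".join(f"{char}{count}" for char, count in sorted(frequencies.items()))


def build_index(words: list[str]) -> dict[str, list[str]]:
    # Hypothetical helper: normalize, deduplicate, sort, then bucket by signature.
    index: dict[str, list[str]] = collections.defaultdict(list)
    for word in sorted({w.strip().lower() for w in words}):
        index[signature(word)].append(word)
    return index


def find_anagrams(index: dict[str, list[str]], word: str) -> list[str]:
    # .get() returns [] for unknown signatures without mutating the index.
    return index.get(signature(word), [])


# Hypothetical sample vocabulary standing in for words.txt.
sample_words = ["test", "Sett", "stet", "final", "listen", "silent", "enlist"]
index = build_index(sample_words)

print(find_anagrams(index, "test"))  # ['sett', 'stet', 'test']
print(find_anagrams(index, "tinsel"))  # ['enlist', 'listen', 'silent']
print(find_anagrams(index, "this is a test"))  # []

# Same filter as the __main__ block: keep words with at least one other anagram.
all_anagrams = {
    word: group for group in index.values() if len(group) > 1 for word in group
}
print(sorted(all_anagrams))  # ['enlist', 'listen', 'sett', 'silent', 'stet', 'test']
```

Because the index is built from a sorted, deduplicated list, every bucket is already in alphabetical order, which matches the ordering shown in the module's doctests.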