Commit

Remove debug data normalization for span analysis (#13203)
* Remove debug data normalization for span analysis

As a result of this normalization, `debug data` could show a user tokens
that do not exist in their data.

* Update spacy/cli/debug_data.py

---------

Co-authored-by: svlandeg <[email protected]>
adrianeboyd and svlandeg committed Feb 6, 2024
1 parent 1052cba commit afb22ad
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions spacy/cli/debug_data.py
@@ -1073,8 +1073,7 @@ def _get_distribution(docs, normalize: bool = True) -> Counter:
     word_counts: Counter = Counter()
     for doc in docs:
         for token in doc:
-            # Normalize the text
-            t = token.text.lower().replace("``", '"').replace("''", '"')
+            t = token.text.lower()
             word_counts[t] += 1
     if normalize:
         total = sum(word_counts.values(), 0.0)
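
To illustrate the problem described in the commit message, here is a minimal sketch that compares the old and new counting keys on a few token strings. It does not use spaCy: `old_key`, `new_key`, and the `tokens` list are hypothetical stand-ins for the `token.text` handling inside `_get_distribution`, chosen only to show how the removed `replace` calls could produce a key that never appears in the data.

```python
from collections import Counter


def old_key(text: str) -> str:
    # Key used before this commit: lowercase, then map `` and '' to '"'.
    return text.lower().replace("``", '"').replace("''", '"')


def new_key(text: str) -> str:
    # Key used after this commit: lowercase only.
    return text.lower()


# Hypothetical token texts standing in for `token.text` values from a corpus.
tokens = ["``", "Hello", "''", "hello"]

old_counts = Counter(old_key(t) for t in tokens)
new_counts = Counter(new_key(t) for t in tokens)

# Lowercased surface forms that actually occur in the data.
surface_forms = {t.lower() for t in tokens}

# Keys reported by each variant that match no token in the data.
invented_old = set(old_counts) - surface_forms
invented_new = set(new_counts) - surface_forms

print("old:", dict(old_counts), "invented:", invented_old)
print("new:", dict(new_counts), "invented:", invented_new)
```

With the old key, both quote tokens collapse into `"`, a string that does not occur anywhere in the sample tokens; with the new key, every reported entry corresponds to a lowercased token from the data.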
