
Unable to run sumy in Jupyter Notebook #217

@azamsharpschool


I have been trying, without success, to get sumy to work in a Jupyter Notebook. It always throws an error when constructing the Tokenizer.

Here is my Jupyter Notebook code:

!python -c "import nltk; nltk.download('stopwords')"

from sumy.parsers.plaintext import PlaintextParser
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lsa import LsaSummarizer

text = "Your long text here..."
parser = PlaintextParser.from_string(text, Tokenizer("english"))
summarizer = LsaSummarizer()
summary = summarizer(parser.document, 3)  # Summarize to 3 sentences

for sentence in summary:
    print(sentence)

When I run this code, I get the following error:


UnpicklingError                           Traceback (most recent call last)
Cell In[22], line 6
      3 from sumy.summarizers.lsa import LsaSummarizer
      5 text = "Your long text here..."
----> 6 parser = PlaintextParser.from_string(text, Tokenizer("english"))
      7 summarizer = LsaSummarizer()
      8 summary = summarizer(parser.document, 3)  # Summarize to 3 sentences

File ~/Desktop/sample_project/env/lib/python3.10/site-packages/sumy/nlp/tokenizers.py:160, in Tokenizer.__init__(self, language)
    157 self._language = language
    159 tokenizer_language = self.LANGUAGE_ALIASES.get(language, language)
--> 160 self._sentence_tokenizer = self._get_sentence_tokenizer(tokenizer_language)
    161 self._word_tokenizer = self._get_word_tokenizer(tokenizer_language)

File ~/Desktop/sample_project/env/lib/python3.10/site-packages/sumy/nlp/tokenizers.py:172, in Tokenizer._get_sentence_tokenizer(self, language)
    170 try:
    171     path = to_string("tokenizers/punkt/%s.pickle") % to_string(language)
--> 172     return nltk.data.load(path)
    173 except (LookupError, zipfile.BadZipfile) as e:
    174     raise LookupError(
    175         "NLTK tokenizers are missing or the language is not supported.\n"
    176         """Download them by following command: python -c "import nltk; nltk.download('punkt')"\n"""
    177         "Original error was:\n" + str(e)
    178     )
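The traceback ends in `UnpicklingError`. It does not end in the `LookupError` that sumy wraps with its "download punkt" hint, which suggests the Punkt file was located but could not be unpickled. NLTK's handling of pickled resources has changed in recent releases, so the installed NLTK version and the resolved resource path are useful details to collect. The following is a minimal diagnostic sketch. The helper name `punkt_diagnostics` is hypothetical. It reuses the `tokenizers/punkt/%s.pickle` template from the traceback and only locates the file with `nltk.data.find`, without loading it.

```python
import importlib.util


def punkt_diagnostics(language: str = "english") -> dict:
    """Report the NLTK version and where (if anywhere) the Punkt resource resolves.

    Hypothetical helper for illustration; it never unpickles the resource.
    """
    report = {"nltk_installed": importlib.util.find_spec("nltk") is not None}
    if not report["nltk_installed"]:
        return report

    import nltk  # imported lazily so the helper still runs where NLTK is absent

    report["nltk_version"] = nltk.__version__
    # Same path template sumy's Tokenizer._get_sentence_tokenizer builds in the traceback.
    resource = "tokenizers/punkt/%s.pickle" % language
    report["resource"] = resource
    try:
        # nltk.data.find only locates the file; it raises LookupError if it is missing.
        report["found_at"] = str(nltk.data.find(resource))
    except LookupError:
        report["found_at"] = None
    report["search_path"] = list(nltk.data.path)  # directories NLTK searches for data
    return report


for key, value in punkt_diagnostics().items():
    print(f"{key}: {value}")
```

Including this output (especially `nltk_version` and `found_at`) alongside the sumy version makes the problem easier to reproduce.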

What can I do to fix this issue?
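Separately, the `!python -c "import nltk; nltk.download('stopwords')"` line at the top of the notebook runs whichever `python` the shell finds first on PATH. That interpreter may not be the one the kernel uses (the traceback shows the kernel importing sumy from `~/Desktop/sample_project/env`). The following is a minimal standard-library sketch to compare the two. The helper name `interpreter_report` is hypothetical, and using PATH from inside the kernel is an approximation of what the `!` subshell sees.

```python
import os
import shutil
import sys


def interpreter_report() -> dict:
    """Compare the kernel's interpreter with the `python` that `!python ...` would likely run.

    Hypothetical helper for illustration; standard library only.
    """
    kernel_python = sys.executable  # interpreter running this notebook kernel
    path_python = shutil.which("python")  # first `python` on the kernel's PATH, or None
    # Compare directories rather than file names so `python` and `python3.10`
    # inside the same environment's bin/ directory count as a match.
    same_dir = path_python is not None and os.path.normcase(
        os.path.dirname(os.path.abspath(path_python))
    ) == os.path.normcase(os.path.dirname(os.path.abspath(kernel_python)))
    return {"kernel": kernel_python, "path": path_python, "same_dir": same_dir}


for key, value in interpreter_report().items():
    print(f"{key}: {value}")
```

If `same_dir` is `False`, there are two ways to make the download run in the kernel's own interpreter. One is to call `nltk.download(...)` directly in a cell. The other is to write `!{sys.executable} -c "..."`, which works because IPython expands `{...}` in shell commands.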
