姜鹏辉的个人博客 GreyNius

【NLP】使用gemsim模块处理文本

2020-08-28

https://radimrehurek.com/gensim/auto_examples/index.html

Text Summarization

from gensim.summarization import summarize

text = "..."

print(summarize(text))
print(summarize(text, ratio=0.5))  # 占原文的比例,默认是0.2
print(summarize(text, word_count=50)) # 限定摘要的单词数目

获得关键词

from gensim.summarization import keywords
print(keywords(text, ratio=0.01))

FastText

from gensim.models.fasttext import FastText as FT_gensim
from gensim.test.utils import datapath

# Set file names for train and test data
corpus_file = datapath('lee_background.cor')

model = FT_gensim(size=100)

# build the vocabulary
model.build_vocab(corpus_file=corpus_file)

# train the model
model.train(
    corpus_file=corpus_file, epochs=model.epochs,
    total_examples=model.corpus_count, total_words=model.corpus_total_words
)

print(model)

Comments

Content