For handling natural language processing (NLP) tasks in Python, `NLTK` (Natural Language Toolkit) is a highly useful library. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. Here’s a concise reference guide for common use cases with `NLTK`, formatted in Markdown syntax:

# `NLTK` Reference Guide

## Installation

```bash
pip install nltk
```

After installing, you may need to download specific data packages used in your project:

```python
import nltk

nltk.download('popular')  # Downloads popular packages
```

## Basic NLP Tasks

### Tokenization

```python
from nltk.tokenize import word_tokenize, sent_tokenize

text = "Hello there, how are you? Weather is great, and Python is awesome."
words = word_tokenize(text)
sentences = sent_tokenize(text)
```

### Removing Stopwords

```python
from nltk.corpus import stopwords

stop_words = set(stopwords.words('english'))
# Stopword lists are lowercase, so normalize case when filtering
filtered_words = [word for word in words if word.lower() not in stop_words]
```

### Stemming

```python
from nltk.stem import PorterStemmer

ps = PorterStemmer()
stemmed_words = [ps.stem(word) for word in filtered_words]
```

### Part-of-Speech Tagging

```python
from nltk import pos_tag

tagged_words = pos_tag(words)
```

### Named Entity Recognition (NER)

```python
from nltk import ne_chunk

# Requires the named-entity chunker model and 'words' corpus (see nltk.download)
ner_tree = ne_chunk(tagged_words)
```

### Working with WordNet

```python
from nltk.corpus import wordnet

# Find synonym sets (synsets) for a word
synonyms = wordnet.synsets("program")

# Example of usage
word = "ship"
synsets = wordnet.synsets(word)
for syn in synsets:
    print("Lemma:", syn.lemmas()[0].name())
    print("Definition:", syn.definition())
```

## Advanced NLP Tasks

### Parsing Sentence Structure

```python
from nltk import CFG, ChartParser

grammar = CFG.fromstring("""
S -> NP VP
VP -> V NP
NP -> 'the' N
N -> 'cat' | 'mat'
V -> 'sat'
""")

# Parse a sentence against the grammar and print every valid parse tree
parser = ChartParser(grammar)
for tree in parser.parse(['the', 'cat', 'sat', 'the', 'mat']):
    print(tree)
```

### Frequency Distribution

```python
from nltk.probability import FreqDist

fdist = FreqDist(words)
most_common_words = fdist.most_common(2)
```

### Sentiment Analysis

NLTK can be used for sentiment analysis, but its strength is in providing foundational tools. For more complex sentiment analysis, integrating NLTK with machine learning libraries such as `scikit-learn` is common.

## Saving and Loading Models

NLTK itself doesn't focus on machine learning models the way libraries like `scikit-learn` or `tensorflow` do. However, it is often used to preprocess text data for machine learning tasks, after which models can be saved and loaded using those libraries' own mechanisms.
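Objects NLTK itself produces (trained classifiers, frequency distributions, preprocessed corpora) are plain Python objects, so the standard `pickle` module is the usual persistence route. A minimal sketch using a token-count dict as a stand-in artifact (the file name and data are illustrative):

```python
import pickle

# Stand-in for any artifact built during preprocessing,
# e.g. a vocabulary of token counts or a trained NLTK classifier
vocabulary = {"python": 12, "nltk": 7, "language": 5}

# Serialize the object to disk...
with open("vocabulary.pickle", "wb") as f:
    pickle.dump(vocabulary, f)

# ...and restore it in a later session
with open("vocabulary.pickle", "rb") as f:
    restored = pickle.load(f)

print(restored == vocabulary)  # → True
```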

`NLTK` is a comprehensive library for building Python programs to work with human language data, offering a wide array of functionalities from simple tokenization to complex parsing and semantic reasoning. This guide introduces the basics, but exploring NLTK’s documentation and tutorials can provide deeper insights into handling various NLP tasks.