structure updates

2024-05-01 12:28:44 -06:00
parent a689e58eea
commit aeba9bdb34
461 changed files with 0 additions and 0 deletions

tech_docs/python/NLTK.md Normal file

@@ -0,0 +1,100 @@
For natural language processing (NLP) tasks in Python, `NLTK` (Natural Language Toolkit) is a widely used library. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning. Here's a concise reference guide for common use cases with `NLTK`:
# `NLTK` Reference Guide
## Installation
```shell
pip install nltk
```
After installing, you may need to download specific data packages used in your project:
```python
import nltk
nltk.download('popular') # Downloads popular packages
```
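Downloading everything in `popular` is convenient but not always necessary. A minimal sketch of checking for a single resource first, assuming the standard resource path `corpora/stopwords` and package id `stopwords`; the helper names `is_resource_available` and `ensure_resource` are hypothetical and not part of NLTK:

```python
import nltk


def is_resource_available(resource_path):
    """Return True if NLTK can locate the resource in its data directories."""
    try:
        nltk.data.find(resource_path)
        return True
    except LookupError:
        # nltk.data.find raises LookupError when the resource is missing
        return False


def ensure_resource(resource_path, package_id):
    """Download the package only if the resource is not already present."""
    if is_resource_available(resource_path):
        return True
    # nltk.download returns True on success and False on failure
    return nltk.download(package_id)


# Example call (needs network access the first time it runs):
# ensure_resource("corpora/stopwords", "stopwords")
```

Checking before downloading avoids repeated network calls when a script runs many times on the same machine.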
## Basic NLP Tasks
### Tokenization
```python
from nltk.tokenize import word_tokenize, sent_tokenize
text = "Hello there, how are you? Weather is great, and Python is awesome."
words = word_tokenize(text)
sentences = sent_tokenize(text)
```
### Removing Stopwords
```python
from nltk.corpus import stopwords
stop_words = set(stopwords.words('english'))
# The stopword list is lowercase, so compare case-insensitively
filtered_words = [word for word in words if word.lower() not in stop_words]
```
### Stemming
```python
from nltk.stem import PorterStemmer
ps = PorterStemmer()
stemmed_words = [ps.stem(word) for word in filtered_words]
```
### Part-of-Speech Tagging
```python
from nltk import pos_tag
tagged_words = pos_tag(words)
```
### Named Entity Recognition (NER)
```python
from nltk import ne_chunk
ner_tree = ne_chunk(tagged_words)
```
### Working with WordNet
```python
from nltk.corpus import wordnet
# Look up the synsets (groups of synonymous senses) for a word
program_synsets = wordnet.synsets("program")
# Example of usage: print the first lemma and the definition of each sense
word = "ship"
synsets = wordnet.synsets(word)
for syn in synsets:
    print("Lemma: ", syn.lemmas()[0].name())
    print("Definition: ", syn.definition())
## Advanced NLP Tasks
### Parsing Sentence Structure
```python
from nltk import CFG
grammar = CFG.fromstring("""
S -> NP VP
VP -> V NP
NP -> 'the' N
N -> 'cat'
V -> 'sat'
""")
```
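The grammar above is defined but never applied to a sentence. A minimal sketch of parsing with it, assuming NLTK's `ChartParser`. The extra `VP -> V` alternative is an addition for illustration, so that the short sentence "the cat sat" is accepted; the original rules require an object after the verb.

```python
from nltk import CFG
from nltk.parse import ChartParser

# Same rules as above, plus an assumed "VP -> V" alternative for illustration
grammar = CFG.fromstring("""
S -> NP VP
VP -> V NP | V
NP -> 'the' N
N -> 'cat'
V -> 'sat'
""")

parser = ChartParser(grammar)
tokens = "the cat sat".split()

# parse() yields every tree the grammar licenses for the token sequence
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

Every token must be covered by the grammar. Otherwise `parse()` raises a `ValueError`.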
### Frequency Distribution
```python
from nltk.probability import FreqDist
fdist = FreqDist(words)
most_common_words = fdist.most_common(2)
```
### Sentiment Analysis
NLTK includes basic sentiment tooling, such as the rule-based VADER analyzer in `nltk.sentiment`, but its main role is to provide foundational text-processing tools. For more complex sentiment analysis, it is common to combine NLTK preprocessing with machine learning libraries like `scikit-learn`.
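As a rough illustration of those foundational tools, here is a minimal sketch that trains NLTK's built-in `NaiveBayesClassifier` on a handful of labeled phrases. The training data and the `bag_of_words` helper are invented for this example; a real project would need a proper labeled corpus and a better tokenizer than `str.split`.

```python
from nltk.classify import NaiveBayesClassifier


def bag_of_words(text):
    """Map each lowercase token to True (a simple word-presence feature set)."""
    return {token: True for token in text.lower().split()}


# Hypothetical toy training data, invented for this sketch
train = [
    ("great awesome wonderful", "pos"),
    ("awesome great fun", "pos"),
    ("terrible awful boring", "neg"),
    ("awful terrible bad", "neg"),
]
train_set = [(bag_of_words(text), label) for text, label in train]

classifier = NaiveBayesClassifier.train(train_set)

# Words never seen in training ("python", "is") are ignored by the classifier
print(classifier.classify(bag_of_words("python is awesome")))
```

With so few examples the classifier only reacts to words it has seen, so this shows the workflow rather than a usable model.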
## Saving and Loading Models
NLTK itself doesn't focus on machine learning models in the way libraries like `scikit-learn` or `tensorflow` do. However, it's often used to preprocess text data for machine learning tasks, after which models can be saved and loaded using those libraries' mechanisms.
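NLTK does ship a few simple classifiers of its own. One way to persist such an object is Python's standard `pickle` module, sketched below with an invented toy training set and an arbitrary file name. Models built with `scikit-learn` or `tensorflow` should use those libraries' own saving mechanisms.

```python
import os
import pickle
import tempfile

from nltk.classify import NaiveBayesClassifier

# Hypothetical toy training data, invented for this sketch
train_set = [
    ({"awesome": True}, "pos"),
    ({"great": True}, "pos"),
    ({"awful": True}, "neg"),
    ({"boring": True}, "neg"),
]
classifier = NaiveBayesClassifier.train(train_set)

# Save the trained classifier (the file name is an arbitrary choice)
model_path = os.path.join(tempfile.mkdtemp(), "sentiment_classifier.pickle")
with open(model_path, "wb") as f:
    pickle.dump(classifier, f)

# Load it back and check that it behaves like the original
with open(model_path, "rb") as f:
    restored = pickle.load(f)

sample = {"awesome": True}
print(restored.classify(sample) == classifier.classify(sample))
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.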
`NLTK` is a comprehensive library for building Python programs that work with human language data, offering functionality that ranges from simple tokenization to complex parsing and semantic reasoning. This guide introduces the basics; NLTK's documentation and tutorials cover these and other NLP tasks in more depth.