Add tech_docs/llm/transformer_llm.md
tech_docs/llm/transformer_llm.md (new file, 85 lines)
@@ -0,0 +1,85 @@
When working with Transformers in libraries like Hugging Face's `transformers`, dictionaries and indices play crucial roles in handling data efficiently. Here is a concise explanation with examples:
### Dictionaries in Transformers
1. **Tokenization:**
   - Tokenizers convert text into tokens and return a dictionary with keys such as `input_ids`, `attention_mask`, and `token_type_ids`.
   - Example:

     ```python
     from transformers import AutoTokenizer

     tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
     inputs = tokenizer("Hello, how are you?", return_tensors="pt")
     print(inputs)
     # {'input_ids': tensor([[ 101, 7592, 1010, 2129, 2024, 2017, 1029,  102]]),
     #  'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0]]),
     #  'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1]])}
     ```
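
   As a quick extension of the example above (not part of the original snippet), the same dictionary interface covers batches of texts; the standard `padding` and `truncation` keyword arguments control how sequences of different lengths are aligned:

   ```python
   # Two sentences of different lengths; padding=True pads the shorter one
   # so every tensor in the returned dictionary has the same shape.
   batch = tokenizer(
       ["Hello, how are you?", "Fine."],
       padding=True,
       truncation=True,
       return_tensors="pt",
   )
   print(batch["input_ids"].shape)   # e.g. torch.Size([2, 8])
   print(batch["attention_mask"])    # 1s for real tokens, 0s over padded positions
   ```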
2. **Model Outputs:**
   - Model outputs are dictionary-like `ModelOutput` objects (e.g., `SequenceClassifierOutput`) containing elements such as `logits`, `hidden_states`, and `attentions`.
   - Example:

     ```python
     from transformers import AutoModelForSequenceClassification

     model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
     outputs = model(**inputs)
     print(outputs)
     # SequenceClassifierOutput(loss=None, logits=tensor([[0.2438, -0.1436]], grad_fn=<AddmmBackward0>),
     #                          hidden_states=None, attentions=None)
     # The classification head is randomly initialized for "bert-base-uncased",
     # so the exact logits will differ from run to run.
     ```
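
   As a brief follow-up (an addition to the original example), `hidden_states` and `attentions` stay `None` unless they are requested explicitly via the standard forward keyword arguments:

   ```python
   outputs = model(**inputs, output_hidden_states=True, output_attentions=True)
   print(len(outputs.hidden_states))   # 13 for bert-base-uncased: embedding output + 12 layers
   print(outputs.attentions[0].shape)  # (batch_size, num_heads, seq_len, seq_len)
   ```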
3. **Training and Configuration:**
   - Training arguments and model configurations are keyword/value collections; they are constructed from named parameters and can be exported as plain dictionaries.
   - Example:

     ```python
     from transformers import Trainer, TrainingArguments

     training_args = TrainingArguments(
         output_dir='./results',
         num_train_epochs=3,
         per_device_train_batch_size=16,
         per_device_eval_batch_size=16,
         warmup_steps=500,
         weight_decay=0.01,
         logging_dir='./logs',
     )
     ```
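
   As a small follow-up sketch (not part of the original snippet), both `TrainingArguments` and model configurations can be exported as plain dictionaries via `to_dict()`, which is handy for logging or saving a run's settings:

   ```python
   from transformers import AutoConfig

   args_dict = training_args.to_dict()      # TrainingArguments -> plain dict
   print(args_dict["output_dir"])           # ./results

   config = AutoConfig.from_pretrained("bert-base-uncased")
   print(config.to_dict()["hidden_size"])   # 768
   ```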
### Indices in Transformers
1. **Token Indices:**
   - Each token in the vocabulary is assigned a unique integer index, and the tokenizer maps tokens to these indices (and back).
   - Example:

     ```python
     tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
     # tokenize() splits the text into WordPiece tokens; no special tokens are added here
     tokens = tokenizer.tokenize("Hello, how are you?")
     token_indices = tokenizer.convert_tokens_to_ids(tokens)
     print(token_indices)
     # [7592, 1010, 2129, 2024, 2017, 1029]
     ```
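
   For the reverse direction (a brief extension of the example above), the tokenizer also maps indices back to tokens and readable text:

   ```python
   print(tokenizer.convert_ids_to_tokens(token_indices))
   # ['hello', ',', 'how', 'are', 'you', '?']
   print(tokenizer.decode(token_indices))
   # hello, how are you?
   ```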
2. **Positional Encoding:**
   - Positional indices preserve the order of tokens in a sequence. This matters because self-attention by itself is order-agnostic, so position information must be injected explicitly (fixed sinusoidal encodings in the original Transformer, learned position embeddings in BERT); see the sketch below.
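
   To make this concrete, here is a minimal sketch of the sinusoidal positional encoding from the original Transformer paper. It is illustrative only: `bert-base-uncased` itself uses learned position embeddings rather than this fixed formula.

   ```python
   import math
   import torch

   def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
       """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d_model))."""
       position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)   # (seq_len, 1)
       # 1 / 10000^(2i/d_model) for the even dimensions 0, 2, 4, ...
       div_term = torch.exp(
           torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model)
       )
       pe = torch.zeros(seq_len, d_model)
       pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions
       pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions
       return pe

   pe = sinusoidal_positional_encoding(seq_len=8, d_model=768)
   print(pe.shape)  # torch.Size([8, 768])
   ```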
### Practical Example
Combining dictionaries and indices in a text classification task:
1. **Tokenization:**

   ```python
   from transformers import AutoTokenizer, AutoModelForSequenceClassification

   tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
   text = "Transformers are powerful models for NLP tasks."
   inputs = tokenizer(text, return_tensors="pt")
   ```
2. **Model Inference:**

   ```python
   model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
   outputs = model(**inputs)
   logits = outputs.logits
   ```
3. **Post-Processing:**

   ```python
   import torch

   predicted_class = torch.argmax(logits, dim=1).item()
   print(f"Predicted class: {predicted_class}")
   ```
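
   As an optional extension (not in the original), a softmax turns the logits into probabilities, and `model.config.id2label` maps the class index to a label name. Because the classification head of `bert-base-uncased` is randomly initialized, the prediction is only meaningful after fine-tuning:

   ```python
   probs = torch.softmax(logits, dim=1)
   print(probs)                                    # e.g. tensor([[0.59, 0.41]])
   print(model.config.id2label[predicted_class])   # 'LABEL_0' or 'LABEL_1' by default
   ```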
### Summary
- **Dictionaries**: Used for managing complex data (e.g., tokenized inputs, model outputs, configurations).
- **Indices**: Used to represent tokens and positions, enabling efficient encoding and decoding.
Together, they facilitate the efficient processing and manipulation of text data in Transformer models.