Update smma/grant_starting.md

2025-07-30 21:56:32 -05:00
parent ab01e981e2
commit 99894a4622


@@ -615,3 +615,574 @@ sources:
```
This layer just moves bytes around. Zero business logic. Want me to detail the validation layer next?
---
**Validation Layer:**
```python
class DataValidator:
    def __init__(self, source_type):
        self.source_type = source_type
        self.validation_rules = self.load_rules()

    def validate_batch(self, batch_id):
        """Run all validations on a batch"""
        results = ValidationResults(batch_id)
        # Structure validation
        results.add(self.validate_structure())
        # Content validation
        results.add(self.validate_content())
        # Business rules validation
        results.add(self.validate_business_rules())
        return results

class ValidationResults:
    def __init__(self, batch_id):
        self.batch_id = batch_id
        self.errors = []
        self.warnings = []
        self.stats = {}
        self.is_valid = True
```
**Validation Types:**
**1. Structure Validation**
```python
def validate_xml_structure(self, xml_data):
    # Schema validation against XSD
    # Required elements present
    # Data types correct
    pass

def validate_csv_structure(self, csv_data):
    # Expected columns present
    # Header row format
    # Row count reasonable
    pass
```
**2. Content Validation**
```python
def validate_content_quality(self, records):
    # Null/empty critical fields
    # Date formats and ranges
    # Numeric field sanity checks
    # Text encoding issues
    pass
```
**3. Business Rules Validation**
```python
def validate_business_rules(self, records):
    # Deadline dates in future
    # Award amounts reasonable ranges
    # Agency codes exist in lookup tables
    # CFDA numbers valid format
    pass
```
**Validation Schema:**
```sql
validation_results (
    id, batch_id, validation_type, status,
    error_count, warning_count, record_count,
    validation_details JSONB, created_at
)

validation_errors (
    id, batch_id, record_id, error_type,
    error_message, field_name, field_value,
    severity, created_at
)
```
**Configurable Rules:**
```yaml
grants_gov_rules:
  required_fields: [title, agency, deadline, amount]
  date_fields:
    deadline:
      min_future_days: 1
      max_future_days: 730
  amount_fields:
    min_value: 1000
    max_value: 50000000

usa_spending_rules:
  # Different rules per source
```
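To make this concrete, here is a minimal sketch of how rules like `grants_gov_rules` could drive the content checks. The function names and the record shape (a dict whose `deadline` is already a `datetime`) are assumptions, not part of the pipeline above:
```python
from datetime import datetime

def check_required_fields(record, rules):
    """One error per missing or empty required field (sketch)."""
    return [
        f"missing required field: {field}"
        for field in rules.get("required_fields", [])
        if not record.get(field)
    ]

def check_deadline(record, rules):
    """Flag deadlines outside the configured future window (sketch)."""
    window = rules.get("date_fields", {}).get("deadline", {})
    deadline = record.get("deadline")
    if deadline is None:
        return []
    days_out = (deadline - datetime.now()).days
    errors = []
    if days_out < window.get("min_future_days", 0):
        errors.append("deadline too soon or in the past")
    if days_out > window.get("max_future_days", float("inf")):
        errors.append("deadline implausibly far in the future")
    return errors
```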
**Validation Actions** (a dispatch sketch follows this list):
- **PASS**: Process normally
- **WARN**: Process but flag issues
- **FAIL**: Block processing, alert operators
- **QUARANTINE**: Isolate problematic records
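Each action could hang off the batch results. A minimal dispatch sketch, assuming errors are dicts carrying a `severity` field (that field and the thresholds are assumptions):
```python
from enum import Enum

class ValidationAction(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"
    QUARANTINE = "quarantine"

def decide_action(results):
    """Map a ValidationResults object to an action (sketch)."""
    # Assumed: errors are dicts with a "severity" key.
    if any(e.get("severity") == "fatal" for e in results.errors):
        return ValidationAction.FAIL          # block the whole batch
    if results.errors:
        return ValidationAction.QUARANTINE    # isolate the bad records
    if results.warnings:
        return ValidationAction.WARN          # process, but flag
    return ValidationAction.PASS
```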
**Key Features:**
- **Non-destructive**: Never modifies raw data
- **Auditable**: Track what failed and why
- **Configurable**: Rules can change without code changes
- **Granular**: Per-record and batch-level validation
The validator just says "good/bad/ugly" - doesn't fix anything. That's the normalizer's job.
---
**Normalization Layer:**
```python
class DataNormalizer:
    def __init__(self, source_type):
        self.source_type = source_type
        self.field_mappings = self.load_field_mappings()
        self.transformations = self.load_transformations()

    def normalize_batch(self, batch_id):
        """Convert raw validated data to standard schema"""
        raw_records = self.get_validated_records(batch_id)
        normalized_records = []
        for record in raw_records:
            try:
                normalized = self.normalize_record(record)
                normalized_records.append(normalized)
            except Exception as e:
                self.log_normalization_error(record.id, e)
        return self.store_normalized_records(normalized_records)

class RecordNormalizer:
    def normalize_record(self, raw_record):
        """Transform single record to standard format"""
        normalized = {}
        # Field mapping
        for std_field, raw_field in self.field_mappings.items():
            normalized[std_field] = self.extract_field(raw_record, raw_field)
        # Data transformations
        normalized = self.apply_transformations(normalized)
        # Generate derived fields
        normalized = self.add_derived_fields(normalized)
        return normalized
```
**Field Mapping Configs:**
```yaml
grants_gov_mappings:
  title: "OpportunityTitle"
  agency: "AgencyName"
  deadline: "CloseDate"
  amount: "AwardCeiling"
  description: "Description"
  cfda_number: "CFDANumbers"

usa_spending_mappings:
  recipient_name: "recipient_name"
  award_amount: "federal_action_obligation"
  agency: "awarding_agency_name"
  award_date: "action_date"
```
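One plausible shape for the `extract_field` call above, supporting dotted paths into nested dicts parsed from XML/JSON. The dotted-path convention is an assumption, not something the configs above require:
```python
def extract_field(raw_record, raw_field):
    """Resolve a mapped source field, e.g. "OpportunityTitle" or a
    nested "Synopsis.AgencyContact" (hypothetical path). Sketch only."""
    value = raw_record
    for part in raw_field.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value
```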
**Data Transformations:**
```python
class FieldTransformers:
    @staticmethod
    def normalize_agency_name(raw_agency):
        # "DEPT OF HEALTH AND HUMAN SERVICES" → "HHS"
        # Handle common variations, abbreviations
        pass

    @staticmethod
    def parse_amount(raw_amount):
        # Handle "$1,000,000", "1000000.00", "1M", etc.
        # Return standardized decimal
        pass

    @staticmethod
    def parse_date(raw_date):
        # Handle multiple date formats
        # Return ISO format
        pass

    @staticmethod
    def extract_naics_codes(description_text):
        # Parse NAICS codes from text
        # Return list of codes
        pass
```
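As one worked example, `parse_amount` could be filled in like this. A minimal sketch; the suffix table and the return-None failure mode are assumptions:
```python
from decimal import Decimal, InvalidOperation

# Assumed suffix conventions for shorthand amounts like "1M".
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_amount(raw_amount):
    """Handle "$1,000,000", "1000000.00", "1M"; return a Decimal or None."""
    if raw_amount is None:
        return None
    text = str(raw_amount).strip().upper().replace("$", "").replace(",", "")
    multiplier = 1
    if text and text[-1] in _SUFFIXES:
        multiplier = _SUFFIXES[text[-1]]
        text = text[:-1]
    try:
        return Decimal(text) * multiplier
    except InvalidOperation:
        return None  # let the caller log a normalization error
```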
**Standard Schema (Target):**
```sql
normalized_opportunities (
    id, source, source_id, title, agency_code,
    agency_name, amount_min, amount_max, deadline,
    description, opportunity_type, cfda_number,
    naics_codes, set_asides, geographic_scope,
    created_at, updated_at, batch_id
)

normalized_awards (
    id, source, source_id, recipient_name,
    recipient_type, award_amount, award_date,
    agency_code, agency_name, award_type,
    description, naics_code, place_of_performance,
    created_at, batch_id
)
```
**Normalization Tracking:**
```sql
normalization_results (
    id, batch_id, source_records, normalized_records,
    error_records, transformation_stats JSONB,
    processing_time, created_at
)

normalization_errors (
    id, batch_id, source_record_id, error_type,
    error_message, field_name, raw_value,
    created_at
)
```
**Key Features:**
- **Lossy but Traceable**: Normalization may drop detail, but every normalized record links back to its raw source
- **Configurable**: Field mappings via config files
- **Extensible**: Easy to add new transformations
- **Consistent**: Same output schema regardless of source
- **Auditable**: Track what transformations were applied
**Error Handling:**
- **Best Effort**: Extract what's possible, flag what fails
- **Partial Records**: Save normalized fields even if some fail
- **Recovery**: Can re-run normalization with updated rules
---
**Enrichment Engine Interface:**
```python
class EnrichmentEngine:
    def __init__(self):
        self.processors = self.load_processors()
        self.dependency_graph = self.build_dependency_graph()

    def enrich_batch(self, batch_id, processor_names=None):
        """Run enrichment processors on normalized batch"""
        processors = processor_names or self.get_enabled_processors()
        execution_order = self.resolve_dependencies(processors)
        results = EnrichmentResults(batch_id)
        for processor_name in execution_order:
            processor = self.processors[processor_name]
            try:
                result = processor.process_batch(batch_id)
                results.add_processor_result(processor_name, result)
            except Exception as e:
                results.add_error(processor_name, e)
        return results

class BaseEnrichmentProcessor:
    """Abstract base for all enrichment processors"""
    name = None
    depends_on = []     # Other processors this depends on
    output_tables = []  # What tables this writes to

    def process_batch(self, batch_id):
        """Process a batch of normalized records"""
        records = self.get_normalized_records(batch_id)
        enriched_data = []
        for record in records:
            enriched = self.process_record(record)
            if enriched:
                enriched_data.append(enriched)
        return self.store_enriched_data(enriched_data)

    def process_record(self, record):
        """Override this - core enrichment logic"""
        raise NotImplementedError
```
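The `resolve_dependencies` call above implies a topological sort over each processor's `depends_on` list. A minimal sketch; the engine's real resolution logic may differ:
```python
def resolve_dependencies(processors):
    """processors: dict of name -> processor class. Returns names in an
    order where every processor runs after everything it depends on."""
    order, visiting, done = [], set(), set()

    def visit(name):
        if name in done:
            return
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name}")
        visiting.add(name)
        for dep in processors[name].depends_on:
            visit(dep)          # dependencies first (DFS postorder)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    for name in processors:
        visit(name)
    return order
```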
**Sample Enrichment Processors:**
```python
from datetime import datetime

class DeadlineUrgencyProcessor(BaseEnrichmentProcessor):
    name = "deadline_urgency"
    output_tables = ["opportunity_metrics"]

    def process_record(self, opportunity):
        if not opportunity.deadline:
            return None
        days_remaining = (opportunity.deadline - datetime.now()).days
        urgency_score = self.calculate_urgency_score(days_remaining)
        return {
            'opportunity_id': opportunity.id,
            'days_to_deadline': days_remaining,
            'urgency_score': urgency_score,
            'urgency_category': self.categorize_urgency(days_remaining)
        }

class AgencySpendingPatternsProcessor(BaseEnrichmentProcessor):
    name = "agency_patterns"
    depends_on = ["historical_awards"]  # Needs historical data first
    output_tables = ["agency_metrics"]

    def process_record(self, opportunity):
        agency_history = self.get_agency_history(opportunity.agency_code)
        return {
            'agency_code': opportunity.agency_code,
            'avg_award_amount': agency_history.avg_amount,
            'typical_award_timeline': agency_history.avg_timeline,
            'funding_seasonality': agency_history.seasonal_patterns,
            'competition_level': agency_history.avg_applicants
        }

class CompetitiveIntelProcessor(BaseEnrichmentProcessor):
    name = "competitive_intel"
    depends_on = ["agency_patterns", "historical_awards"]
    output_tables = ["opportunity_competition"]

    def process_record(self, opportunity):
        similar_opps = self.find_similar_opportunities(opportunity)
        winner_patterns = self.analyze_winner_patterns(similar_opps)
        return {
            'opportunity_id': opportunity.id,
            'estimated_applicants': winner_patterns.avg_applicants,
            'win_rate_by_org_type': winner_patterns.win_rates,
            'typical_winner_profile': winner_patterns.winner_characteristics,
            'competition_score': self.calculate_competition_score(winner_patterns)
        }
```
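The urgency helpers are left abstract above. One plausible sketch, keyed to the `urgency_thresholds: [7, 30, 90]` from the config below; the category labels and the linear scoring curve are assumptions:
```python
def categorize_urgency(days_remaining, thresholds=(7, 30, 90)):
    """Bucket a deadline by the configured thresholds (sketch)."""
    if days_remaining <= thresholds[0]:
        return "critical"
    if days_remaining <= thresholds[1]:
        return "high"
    if days_remaining <= thresholds[2]:
        return "medium"
    return "low"

def calculate_urgency_score(days_remaining, horizon=90):
    """Linear decay from 1.0 (due now) to 0.0 (>= horizon days out)."""
    return max(0.0, min(1.0, 1 - days_remaining / horizon))
```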
**Enrichment Storage Schema:**
```sql
-- Opportunity-level enrichments
opportunity_metrics (
    opportunity_id, days_to_deadline, urgency_score,
    competition_score, success_probability,
    created_at, processor_version
)

-- Agency-level enrichments
agency_metrics (
    agency_code, avg_award_amount, funding_cycles,
    payment_reliability, bureaucracy_score,
    created_at, processor_version
)

-- Historical patterns
recipient_patterns (
    recipient_id, win_rate, specialties,
    avg_award_size, geographic_focus,
    created_at, processor_version
)
```
**Configuration-Driven Processing:**
```yaml
enrichment_config:
  enabled_processors:
    - deadline_urgency
    - agency_patterns
    - competitive_intel
  processor_settings:
    deadline_urgency:
      urgency_thresholds: [7, 30, 90]
    competitive_intel:
      similarity_threshold: 0.8
      lookback_years: 3
```
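Wiring that config into the engine might look like this. A sketch only: `yaml.safe_load` is standard PyYAML, the file path and batch id are hypothetical, and the rest of the names come from the classes above:
```python
import yaml

# Load the enrichment_config shown above (hypothetical path).
with open("enrichment_config.yaml") as f:
    config = yaml.safe_load(f)["enrichment_config"]

engine = EnrichmentEngine()
results = engine.enrich_batch(
    batch_id="batch_2024_001",  # hypothetical batch id
    processor_names=config["enabled_processors"],
)
```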
**Key Features:**
- **Modular**: Each processor is independent
- **Dependency-Aware**: Processors run in correct order
- **Versioned**: Track which version of logic created what data
- **Configurable**: Enable/disable processors per client
- **Reprocessable**: Can re-run enrichments with new logic
- **Incremental**: Only process new/changed records
**Processor Registry:**
```python
class ProcessorRegistry:
    processors = {}

    @classmethod
    def register(cls, processor_class):
        cls.processors[processor_class.name] = processor_class
        return processor_class  # return it so register works as a decorator

    @classmethod
    def get_processor(cls, name):
        return cls.processors[name]()

# Auto-discovery of processors
@ProcessorRegistry.register
class MyCustomProcessor(BaseEnrichmentProcessor):
    # Implementation
    pass
```
This interface lets you plug in any enrichment logic without touching the core pipeline. Want to see how the API layer consumes all this enriched data?
---
**Core API Endpoints:**
## **Opportunity Discovery APIs**
```
GET /api/v1/opportunities
- Live grant/contract opportunities
- Filters: keywords, agency, amount_range, deadline_range, location, naics, cfda
- Sort: deadline, amount, relevance_score, competition_score
- Pagination: limit, offset
- Response: opportunities + enrichment data
GET /api/v1/opportunities/{id}
- Full opportunity details + all enrichments
- Related opportunities (similar/agency/category)
- Historical context (agency patterns, similar awards)
GET /api/v1/opportunities/search
- Full-text search across titles/descriptions
- Semantic search capabilities
- Saved search functionality
```
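For instance, a filtered discovery query might look like the following (illustrative only; the exact parameter spellings beyond the filter names listed above are assumptions):
```
GET /api/v1/opportunities?keywords=rural+broadband&agency=USDA&amount_range=100000-5000000&deadline_range=2024-02-01,2024-06-30&sort=deadline&limit=50&offset=0
```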
## **Historical Intelligence APIs**
```
GET /api/v1/awards
- Past awards/contracts (USAspending data)
- Filters: recipient, agency, amount_range, date_range, location
- Aggregations: by_agency, by_recipient_type, by_naics
GET /api/v1/awards/trends
- Spending trends over time
- Agency funding patterns
- Market size analysis by category
GET /api/v1/recipients/{id}/history
- Complete award history for organization
- Success patterns, specializations
- Competitive positioning
```
## **Market Intelligence APIs**
```
GET /api/v1/agencies
- Agency profiles with spending patterns
- Funding cycles, preferences, reliability scores
GET /api/v1/agencies/{code}/opportunities
- Current opportunities from specific agency
- Historical patterns, typical award sizes
GET /api/v1/market/analysis
- Market sizing by sector/naics/keyword
- Competition density analysis
- Funding landscape overview
```
## **Enrichment & Scoring APIs**
```
GET /api/v1/opportunities/{id}/score
- Custom scoring based on client profile
- Fit score, competition score, success probability
POST /api/v1/opportunities/batch-score
- Score multiple opportunities at once
- Client-specific scoring criteria
GET /api/v1/competitive-intel
- Who wins what types of awards
- Success patterns by organization characteristics
```
## **Alert & Monitoring APIs**
```
POST /api/v1/alerts
- Create custom alert criteria
- Email/webhook delivery options
GET /api/v1/alerts/{id}/results
- Recent matches for saved alert
- Historical performance of alert criteria
POST /api/v1/watchlist
- Monitor specific agencies/programs/competitors
```
## **Analytics & Reporting APIs**
```
GET /api/v1/analytics/dashboard
- Client-specific dashboard data
- Opportunity pipeline, success metrics
GET /api/v1/reports/market-summary
- Periodic market analysis reports
- Funding landscape changes
POST /api/v1/reports/custom
- Generate custom analysis reports
- Export capabilities (PDF/Excel)
```
**API Response Format:**
```json
{
  "data": [...],
  "meta": {
    "total": 1250,
    "page": 1,
    "per_page": 50,
    "filters_applied": {...},
    "data_freshness": "2024-01-15T10:30:00Z"
  },
  "enrichments": {
    "competition_scores": true,
    "agency_patterns": true,
    "deadline_urgency": true
  }
}
```
**Authentication & Rate Limiting:**
- API key authentication
- Usage-based pricing tiers
- Rate limits by subscription level
- Client-specific data access controls
**Key Value Props:**
- **Speed**: Pre-processed, indexed, ready to query
- **Intelligence**: Enriched beyond raw government data
- **Relevance**: Sophisticated filtering and scoring
- **Insights**: Historical patterns and competitive intelligence
- **Automation**: Alerts and monitoring capabilities
This API design gives clients everything from basic opportunity search to sophisticated competitive intelligence - all the value-add layers on top of the raw government data.