Basic Factuality (Fundamental Concepts)
At the foundational level, we prioritize data quality and early-stage fact-checking.
Dataset Curation & Filtering
We use only high-quality datasets from credible, verifiable sources such as Wikipedia, peer-reviewed journals, and official government publications.
Content moderation tooling filters out biased, outdated, or unreliable data before training begins; a rough sketch of such a filter follows.
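The sketch below illustrates the kind of quality gate such a pipeline might apply. The domain blocklist, word-count minimum, and cutoff year are illustrative assumptions, not values from a real pipeline, which would tune these empirically and combine them with classifier-based moderation.

```python
import re

# Illustrative values only -- real curation pipelines tune these empirically.
UNRELIABLE_DOMAINS = {"example-tabloid.com", "content-farm.net"}  # hypothetical
MIN_WORDS = 50
OLDEST_ACCEPTABLE_YEAR = 2010

def keep_document(doc: dict) -> bool:
    """Return True if a document passes basic quality gates.

    Expects doc = {"text": str, "source_url": str, "year": int}.
    """
    if len(doc["text"].split()) < MIN_WORDS:    # too short to be informative
        return False
    if doc["year"] < OLDEST_ACCEPTABLE_YEAR:    # likely outdated
        return False
    domain = re.sub(r"^https?://(www\.)?", "", doc["source_url"]).split("/")[0]
    return domain not in UNRELIABLE_DOMAINS     # drop known low-quality sources

corpus = [
    {"text": "A long, well-sourced article. " * 60,
     "source_url": "https://en.wikipedia.org/wiki/Physics", "year": 2023},
    {"text": "Shocking!!!", "source_url": "https://example-tabloid.com/x", "year": 2024},
]
clean = [d for d in corpus if keep_document(d)]
print(len(clean))  # -> 1
```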
Knowledge Grounding in Pretraining
Pretraining on structured or carefully curated corpora, such as Wikidata and filtered Common Crawl, helps ensure clean, meaningful input.
Rule-based validations keep speculative or opinion-heavy content out of the corpus, as in the sketch below.
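Here is a minimal sketch of what one such rule could look like, assuming a simple marker-density heuristic; the marker lists and threshold are illustrative, not a production rule set.

```python
import re

# Illustrative marker lists; production rules would be broader and language-aware.
SPECULATION_MARKERS = {"might", "could", "possibly", "rumored", "allegedly"}
OPINION_MARKERS = {"i think", "i believe", "in my opinion", "we feel"}

def is_factual_enough(text: str, max_marker_ratio: float = 0.02) -> bool:
    """Reject text whose density of speculation/opinion markers is too high."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)
    if not words:
        return False
    hits = sum(w in SPECULATION_MARKERS for w in words)
    hits += sum(lowered.count(phrase) for phrase in OPINION_MARKERS)
    return hits / len(words) <= max_marker_ratio

print(is_factual_enough("The treaty was signed in 1848."))                 # True
print(is_factual_enough("I think it might possibly be true, allegedly."))  # False
```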
Explicit Fact Checking via External Databases
LLM outputs are checked against trusted external sources, for example:
- Wikipedia
- Knowledge graphs (Google Knowledge Graph, DBpedia)
- Retrieval-Augmented Generation (RAG) pipelines, which automate this grounding by retrieving supporting documents at generation time (see the lookup sketch after this list)
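As one concrete example of the lookup step, the sketch below fetches an article summary through Wikipedia's public REST API and does a naive keyword check. The `claim_supported` helper and its keyword-overlap test are illustrative assumptions; a real pipeline would use retrieval plus an entailment model instead.

```python
import requests

def wikipedia_summary(title: str) -> str:
    """Fetch the lead summary for a Wikipedia article via the public REST API."""
    url = f"https://en.wikipedia.org/api/rest_v1/page/summary/{title}"
    resp = requests.get(url, headers={"User-Agent": "factcheck-demo/0.1"}, timeout=10)
    resp.raise_for_status()
    return resp.json().get("extract", "")

def claim_supported(claim_keywords: list[str], title: str) -> bool:
    """Naive support check: are all claim keywords present in the summary?

    Illustrative only -- keyword overlap misses paraphrases and negation.
    """
    summary = wikipedia_summary(title).lower()
    return all(kw.lower() in summary for kw in claim_keywords)

# Model output: "The Eiffel Tower is in Paris."
print(claim_supported(["Eiffel Tower", "Paris"], "Eiffel_Tower"))
```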
Heuristic-Based Factuality Checking
Simple NLP heuristics flag contradictions and inconsistencies, such as an answer that both asserts and denies the same fact.
Confidence scores (for example, from token probabilities or an entailment classifier) help identify responses that may need human review; one possible setup is sketched below.
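One way to combine both ideas is to score answers against a trusted reference with a natural-language-inference model and route contradictory or low-confidence answers to review. This sketch assumes the Hugging Face transformers library, uses roberta-large-mnli purely as an example model, and the 0.8 threshold is an arbitrary illustration.

```python
from transformers import pipeline

# Any NLI model can serve here; roberta-large-mnli is one public example.
nli = pipeline("text-classification", model="roberta-large-mnli")

def flag_for_review(reference: str, answer: str, threshold: float = 0.8):
    """Return (needs_review, label, score) for an answer vs. a trusted reference.

    Answers labeled CONTRADICTION, or whose top score falls below the
    confidence threshold, are routed to human review.
    """
    result = nli({"text": reference, "text_pair": answer})
    if isinstance(result, list):  # some transformers versions wrap the dict
        result = result[0]
    label, score = result["label"], result["score"]
    return label == "CONTRADICTION" or score < threshold, label, score

reference = "Mount Everest is the highest mountain above sea level."
answer = "Mount Everest is not the highest mountain above sea level."
print(flag_for_review(reference, answer))
```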