About CorpusCraft
Our Mission
CorpusCraft is dedicated to democratizing corpus linguistics research by providing powerful, accessible analysis tools that work directly in your browser. We believe that sophisticated linguistic analysis shouldn't require expensive software installations, technical expertise, or institutional resources.
Built for Researchers
CorpusCraft serves the global linguistics community, including:
Academic Researchers
Studying language patterns, discourse analysis, and linguistic variation
Graduate Students
Conducting corpus-based research for theses and dissertations
Digital Humanists
Analyzing historical texts and cultural patterns
Language Professionals
Teachers, translators, and lexicographers preparing materials
What Makes CorpusCraft Unique
Browser-Based Platform
No installation required. Access your research from anywhere, on any device. Your data stays secure and private.
Professional Statistical Tools
Established corpus linguistics measures: collocation statistics (MI, t-score, Dice, log-likelihood), lexical diversity (TTR, MTLD, HD-D), readability indices, and keyness analysis.
Hybrid AI Strategy
Smart use of GPT-4o-mini for efficient bulk operations and GPT-5.1 for advanced analysis. Cost-effective AI that delivers real insights.
Comprehensive Multi-Language NLP
Advanced linguistic analysis for eight languages (English, Spanish, Russian, French, German, Chinese, Japanese, and Arabic). Coverage includes complete token analysis, sentence segmentation, noun chunks, morphological features (gender, case, number, tense), lemma frequencies, stopword filtering, POS tagging, dependency parsing, and named entity recognition, all with professional export formats.
Export Everything
All analysis results are exportable to PDF, Word, Excel, and CSV. Your research, your formats.
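For readers unfamiliar with the collocation measures listed under Professional Statistical Tools, the sketch below shows how MI, t-score, and Dice are commonly computed from raw frequencies. It uses the standard textbook formulas without any window-size adjustment; the function name and parameters are illustrative assumptions, and CorpusCraft's own implementation may differ.

```python
import math

def collocation_scores(f_node, f_collocate, f_pair, corpus_size):
    """Textbook MI, t-score and Dice for one node/collocate pair.

    Hypothetical helper for illustration only.
    f_node, f_collocate: corpus frequency of each word
    f_pair: number of times the two words co-occur
    corpus_size: total number of tokens in the corpus
    """
    # Co-occurrence frequency expected if the two words were independent
    expected = f_node * f_collocate / corpus_size
    mi = math.log2(f_pair / expected)
    t_score = (f_pair - expected) / math.sqrt(f_pair)
    dice = 2 * f_pair / (f_node + f_collocate)
    return {"mi": mi, "t_score": t_score, "dice": dice}

scores = collocation_scores(f_node=100, f_collocate=50, f_pair=20, corpus_size=100_000)
print(scores)
```

MI tends to favor rare, exclusive pairings, while the t-score favors frequent ones, which is why the measures are usually reported together.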
Technology & Architecture
CorpusCraft is built on modern, robust technologies chosen for performance, reliability, and research needs:
- FastAPI (Python) - High-performance backend for rapid analysis
- SQLite with FTS5 - Lightning-fast full-text search optimized for linguistic queries
- spaCy - State-of-the-art NLP processing with trained language models
- OpenAI GPT Models - Advanced AI capabilities for semantic analysis
- HTMX & Tailwind CSS - Responsive, modern interface without bloat
Privacy & Security: Your corpus data is stored securely and never shared. We use industry-standard encryption and authentication. You maintain full ownership of your research data.
Fair, Transparent Pricing
Our token-based pricing model ensures sustainability while keeping costs fair and predictable:
Free Tier
Perfect for learning and small projects. Always free, no credit card required.
Pay for What You Use
Token-based limits ensure you only pay for the features you actually need.
Our profit margins (72-92%) allow us to continuously improve CorpusCraft, add new features, and provide reliable support to the research community. We're committed to long-term sustainability over short-term gains.
Core Features
Search & Analysis
- Full-text search with operators
- KWIC concordance
- Regex pattern matching
- Frequency analysis
- N-gram extraction
- Collocation statistics
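As an illustration of what a KWIC (keyword in context) concordance produces, here is a minimal sketch. The `kwic` function, its simple regex tokenizer, and the `width` parameter are assumptions made for this example, not CorpusCraft's actual code.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit.

    Hypothetical helper: tokenizes on word characters and matches
    the keyword case-insensitively.
    """
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines

sample = "The cat sat on the mat. The dog saw the cat run."
for left, node, right in kwic(sample, "cat", width=2):
    # Right-align the left context so the keyword lines up in one column
    print(f"{left:>10} [{node}] {right}")
```

Aligning the keyword in a central column is what lets recurring patterns to its left and right stand out when scanning many hits.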
Statistical Analysis
- Readability indices (6 formulas)
- Lexical diversity (TTR, MTLD, HD-D)
- Keyness analysis
- Effect size calculations
- Normalized frequencies
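Normalized frequencies and keyness can be sketched in a few lines. The example below assumes per-million normalization and the two-cell log-likelihood form often used for keyness; both function names are hypothetical, and the exact formulas CorpusCraft applies are not stated here.

```python
import math

def per_million(freq, corpus_size):
    """Normalize a raw frequency to occurrences per million tokens."""
    return freq / corpus_size * 1_000_000

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-cell log-likelihood keyness score for one word.

    Compares a word's frequency in a study corpus with its
    frequency in a reference corpus.
    """
    total = freq_study + freq_ref
    combined_size = size_study + size_ref
    expected_study = size_study * total / combined_size
    expected_ref = size_ref * total / combined_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

print(per_million(50, 200_000))
print(log_likelihood(20, 1_000, 10, 1_000))
```

Normalizing first makes corpora of different sizes comparable, and the log-likelihood score then indicates how unlikely the observed difference would be under equal relative frequencies.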
AI & NLP
- 18 AI-powered features
- Theme discovery
- Sentiment analysis
- Semantic similarity
- POS tagging
- Named entity recognition
Looking Forward
CorpusCraft is continuously evolving based on researcher feedback. Planned features include:
- Additional language support
- Advanced concordance filtering and annotation tools
- Corpus comparison and diachronic analysis enhancements
- API access for programmatic corpus analysis
- Integration with popular citation managers
Get in Touch
We value feedback from the research community and are here to help you succeed.
Documentation: Complete User Guide
Support: support@corpuscraft.org
Feature Requests: feedback@corpuscraft.org
We typically respond to support requests within 24 hours on business days.
About the Developer
CorpusCraft is developed and maintained by Yaroslav Mar, a translator and MA student in Fundamental and Applied Linguistics at HSE University.
As a linguist and developer, Yaroslav understands the research challenges that corpus linguistics scholars face and is committed to building tools that make linguistic analysis more accessible and powerful.
Built with dedication for the global linguistics research community.
CorpusCraft © 2025 - Empowering Language Research