About CorpusCraft

Our Mission

CorpusCraft is dedicated to democratizing corpus linguistics research by providing powerful, accessible analysis tools that work directly in your browser. We believe that sophisticated linguistic analysis shouldn't require expensive software installations, technical expertise, or institutional resources.

Built for Researchers

CorpusCraft serves the global linguistics community, including:

Academic Researchers

Studying language patterns, discourse analysis, and linguistic variation

Graduate Students

Conducting corpus-based research for theses and dissertations

Digital Humanists

Analyzing historical texts and cultural patterns

Language Professionals

Teachers, translators, and lexicographers preparing materials

What Makes CorpusCraft Unique

Browser-Based Platform

No installation required. Access your research from anywhere, on any device. Your data stays secure and private.

Professional Statistical Tools

Real corpus linguistics measures: collocation statistics (MI, t-score, Dice, Log-likelihood), lexical diversity (TTR, MTLD, HD-D), readability indices, and keyness analysis.
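As a rough illustration of what the collocation measures compute, here is a minimal sketch using the standard textbook formulas for MI, t-score, and Dice. The function name and parameters are hypothetical and not taken from CorpusCraft's code; the platform's own implementation may differ (for example, in how it handles window span). Log-likelihood is omitted for brevity.

```python
import math


def collocation_scores(f_xy, f_x, f_y, n):
    """Association scores for a word pair from raw counts.

    f_xy: how often the two words co-occur
    f_x, f_y: individual word frequencies
    n: corpus size in tokens
    (Hypothetical helper; textbook formulas, not CorpusCraft's code.)
    """
    # Co-occurrences expected if the two words were independent
    expected = f_x * f_y / n
    # Mutual information: log ratio of observed to expected
    mi = math.log2(f_xy / expected)
    # t-score: observed minus expected, scaled by sqrt of observed
    t_score = (f_xy - expected) / math.sqrt(f_xy)
    # Dice coefficient: shared occurrences relative to total frequency
    dice = 2 * f_xy / (f_x + f_y)
    return {"MI": mi, "t_score": t_score, "Dice": dice}


scores = collocation_scores(f_xy=8, f_x=16, f_y=32, n=1024)
print(scores)
```

MI tends to favor rare, exclusive pairs, while the t-score favors frequent ones, which is why tools usually report several measures side by side.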

Hybrid AI Strategy

Smart use of GPT-4o-mini for efficient bulk operations and GPT-5.1 for advanced analysis. Cost-effective AI that delivers real insights.

Comprehensive Multi-Language NLP

Advanced linguistic analysis for 8 languages (English, Spanish, Russian, French, German, Chinese, Japanese, and Arabic). Features include complete token analysis, sentence segmentation, noun chunks, morphological features (gender, case, number, tense), lemma frequencies, stopword filtering, POS tagging, dependency parsing, and named entity recognition, all available in professional export formats.

Export Everything

All analysis results exportable to PDF, Word, Excel, and CSV. Your research, your formats.

Technology & Architecture

CorpusCraft is built on modern, robust technologies chosen for performance, reliability, and research needs:

  • FastAPI (Python) - High-performance backend for rapid analysis
  • SQLite with FTS5 - Lightning-fast full-text search optimized for linguistic queries
  • spaCy - State-of-the-art NLP processing with trained language models
  • OpenAI GPT Models - Advanced AI capabilities for semantic analysis
  • HTMX & Tailwind CSS - Responsive, modern interface without bloat
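To give a sense of how SQLite FTS5 supports full-text search, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column name, and helper functions are assumptions for illustration only and do not reflect CorpusCraft's actual schema. It also assumes the local SQLite build has FTS5 enabled, which is common but not guaranteed.

```python
import sqlite3


def build_index(sentences):
    """Create an in-memory FTS5 index over a list of sentences.

    The 'docs' table and 'body' column are hypothetical names.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
    conn.executemany(
        "INSERT INTO docs(body) VALUES (?)", [(s,) for s in sentences]
    )
    return conn


def search(conn, query):
    """Run an FTS5 MATCH query and return matching sentences, best first."""
    rows = conn.execute(
        "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]


conn = build_index([
    "The cat sat on the mat",
    "A dog barked loudly",
    "The cat chased the dog",
])
print(search(conn, "cat AND dog"))   # boolean operator
print(search(conn, '"cat sat"'))     # exact phrase
```

FTS5 query syntax supports boolean operators (AND, OR, NOT), quoted phrases, and prefix matching with a trailing asterisk.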

Privacy & Security: Your corpus data is stored securely and never shared. We use industry-standard encryption and authentication. You maintain full ownership of your research data.

Fair, Transparent Pricing

Our token-based pricing model ensures sustainability while keeping costs fair and predictable:

Free Tier

Perfect for learning and small projects. Always free, no credit card required.

Pay for What You Use

Token-based limits ensure you only pay for the features you actually need.

Our profit margins (72-92%) allow us to continuously improve CorpusCraft, add new features, and provide reliable support to the research community. We're committed to long-term sustainability over short-term gains.

Core Features

Search & Analysis

  • Full-text search with operators
  • KWIC concordance
  • Regex pattern matching
  • Frequency analysis
  • N-gram extraction
  • Collocation statistics

Statistical Analysis

  • Readability indices (6 formulas)
  • Lexical diversity (TTR, MTLD, HD-D)
  • Keyness analysis
  • Effect size calculations
  • Normalized frequencies
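Two of these measures are simple enough to show directly. The sketch below computes the type-token ratio (TTR) and a frequency normalized per million tokens, using their standard definitions; the function names are hypothetical and the code is illustrative rather than CorpusCraft's own.

```python
def type_token_ratio(tokens):
    """TTR: number of distinct word forms divided by total tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def per_million(count, corpus_size):
    """Normalize a raw count to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size


tokens = ["the", "cat", "saw", "the", "dog"]
print(type_token_ratio(tokens))
print(per_million(count=5, corpus_size=1000))
```

TTR falls as texts get longer, which is why length-robust measures such as MTLD and HD-D are offered alongside it. Normalized frequencies make counts comparable across corpora of different sizes.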

AI & NLP

  • 18 AI-powered features
  • Theme discovery
  • Sentiment analysis
  • Semantic similarity
  • POS tagging
  • Named entity recognition

Looking Forward

CorpusCraft is continuously evolving based on researcher feedback. Planned features include:

  • Additional language support beyond the eight languages currently available
  • Advanced concordance filtering and annotation tools
  • Corpus comparison and diachronic analysis enhancements
  • API access for programmatic corpus analysis
  • Integration with popular citation managers

Get in Touch

We value feedback from the research community and are here to help you succeed.

Documentation: Complete User Guide

Support: support@corpuscraft.org

Feature Requests: feedback@corpuscraft.org

We typically respond to support requests within 24 hours on business days.

About the Developer

CorpusCraft is developed and maintained by Yaroslav Mar, a translator and MA student in Fundamental and Applied Linguistics at HSE University.

As a linguist and developer, Yaroslav understands the research challenges that corpus linguistics scholars face and is committed to building tools that make linguistic analysis more accessible and powerful.

Built with dedication for the global linguistics research community.

CorpusCraft © 2025 - Empowering Language Research