SynthoHive Documentation
SynthoHive is a production-grade synthetic data engine that generates high-fidelity, privacy-preserving synthetic data for complex relational databases. It maintains referential integrity across multi-table schemas, preserves statistical correlations, and provides automated PII handling.
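To make "referential integrity" concrete, the sketch below checks that every foreign key in a child table points at an existing parent row. It is plain Python and does not use SynthoHive's API; the `customers`/`orders` tables and the `find_orphans` helper are hypothetical names chosen for illustration. A check along these lines could be run against any generated multi-table output.

```python
# Illustrative only: plain Python, not SynthoHive's API.
# Hypothetical two-table schema: customers (parent) and orders (child).
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 25.0},
    {"order_id": 11, "customer_id": 2, "amount": 40.0},
    {"order_id": 12, "customer_id": 3, "amount": 15.0},  # orphan: no customer 3
]


def find_orphans(parent_rows, child_rows, key):
    """Return child rows whose foreign key has no matching parent row."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


orphans = find_orphans(customers, orders, "customer_id")
print(orphans)
```

An empty result means every child row has a parent, which is the property a relational synthesizer needs to preserve.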
What you'll find
- Getting Started: Install SynthoHive and run your first synthesis in minutes.
- Concepts: Understand the architecture and data flow behind the engine.
- Guides: Step-by-step instructions for fitting models, relational generation, embeddings, sampling, privacy, and validation.
- Demos: Runnable walk-throughs mirroring the examples/demos folder.
- API Reference: Auto-generated documentation for the interface, core models, privacy, relational, validation, and connectors modules.
- Config Examples: Copy-paste configurations for common scenarios.
- Troubleshooting: Solutions for common issues.
Quick install
pip install synthohive pyspark pandas pyarrow
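After installing, a quick way to confirm the packages are visible to your interpreter is to look them up without importing them. This is a minimal sketch using only the standard library; it assumes the import name is `synthohive` (matching the pip package name), which may differ in practice.

```python
import importlib.util

# "synthohive" as the import name is an assumption; adjust if it differs.
PACKAGES = ["synthohive", "pyspark", "pandas", "pyarrow"]


def check_installed(names):
    """Map each top-level module name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, ok in check_installed(PACKAGES).items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any package reported as missing usually means pip installed into a different environment than the one running the script.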
Build docs locally
pip install ".[docs]"
mkdocs serve
Deploy to GitHub Pages:
mkdocs gh-deploy --force