SynthoHive Documentation

SynthoHive is a production-grade synthetic data engine that generates high-fidelity, privacy-preserving synthetic data for complex relational databases. It maintains referential integrity across multi-table schemas, preserves statistical correlations, and provides automated PII handling.

What you'll find

Getting Started: Install SynthoHive and run your first synthesis in minutes.
Concepts: Understand the architecture and data flow behind the engine.
Guides: Step-by-step instructions for fitting models, relational generation, embeddings, sampling, privacy, and validation.
Demos: Runnable walk-throughs mirroring the examples/demos folder.
API Reference: Auto-generated documentation for the interface, core models, privacy, relational, validation, and connectors modules.
Config Examples: Copy-paste configurations for common scenarios.
Troubleshooting: Solutions for common issues.

Quick install

pip install synthohive pyspark pandas pyarrow
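To confirm the install succeeded, a minimal sketch like the following can report which of the four distributions are present. It uses only the standard library's `importlib.metadata` and the distribution names from the pip command above; the helper name `installed_versions` is illustrative and not part of SynthoHive.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip command above.
REQUIRED = ["synthohive", "pyspark", "pandas", "pyarrow"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any line reporting NOT INSTALLED means the corresponding package is missing from the active environment.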

Build docs locally

pip install ".[docs]"
mkdocs serve

Deploy to GitHub Pages:

mkdocs gh-deploy --force