
Our Director of Data, Yasen Nestorov, walks us through the vision behind our new BigQuery lakehouse.
Yasen: We chose to move because our legacy setup had reached its limits. It required significant manual work, lacked proper governance, and made complex analytics or machine learning difficult to scale. BigQuery gives us a fully managed foundation that removes infrastructure overhead and lets us focus on building data products rather than maintaining systems.
A key driver for us was the ability to decouple storage from compute, allowing each to scale independently. That flexibility is essential as our data volumes grow and our analytics needs become more dynamic. We can scale processing power when we need heavy computation, without over-provisioning storage, and vice versa.
This transformation also strengthens our ability to build trust in data across executive leadership. With a reliable, well-governed platform, leaders can make decisions with confidence, knowing the data is consistent, timely, and high-quality.
Ultimately, this move supports our long-term strategy around advanced analytics, AI, and automation, where systems can proactively generate insights or propose actions. We wanted a platform that is fast, reliable, easy to govern, and capable of powering everything from operational dashboards to machine learning. BigQuery delivers all of that in a single ecosystem.
Yasen: A lakehouse combines the best parts of a data lake and a data warehouse in one unified environment for raw data, processed transformations, analytics, and machine learning. Raw data lands in a central location and then moves through progressively refined layers. The result is a platform where data engineers, analysts, and AI engineers work together seamlessly within the same governance framework, with full lineage tracking and version-controlled pipelines. This setup makes data more accessible, reliable, and ready for insights across the business.
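To make the layering concrete, here is a minimal sketch of promoting raw data into a refined layer with the BigQuery Python client. The project, dataset, and column names (my-project, lake_raw, warehouse_core) are hypothetical stand-ins; the interview does not describe the actual schema.

```python
# Minimal sketch: promote raw lake data into a refined, analytics-ready
# table. All names are illustrative, not the platform's real layout.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

sql = """
CREATE OR REPLACE TABLE `my-project.warehouse_core.orders` AS
SELECT
  order_id,
  customer_id,
  CAST(order_ts AS TIMESTAMP) AS order_ts,
  SAFE_CAST(amount AS NUMERIC) AS amount
FROM `my-project.lake_raw.orders_raw`
WHERE order_id IS NOT NULL
"""

client.query(sql).result()  # runs the job and waits for completion
```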
Yasen: BigQuery removes most of the operational burden that slows teams down. There are no servers to manage, no capacity planning, and no complicated scaling policies. It handles massive datasets with ease, provides fast SQL performance, and integrates natively with governance, transformations, and ML tools. This also makes it cost-efficient, since we only pay for what we use, without the overhead of managing infrastructure.
It also reduces dependency on rare expert profiles. Not every organization can easily hire senior data engineers with deep infrastructure knowledge. We wanted a platform where analysts can build dashboards and explore data without needing specialized engineers alongside them. BigQuery helps us make the platform accessible, scalable, and future-proof.
Yasen: We designed the architecture around simplicity, clarity, and ease of onboarding. Data is extracted from production systems and first lands in a centralized storage layer. From there, it moves into BigQuery as structured or semi-structured raw tables.
Transformations run entirely within BigQuery using GCP-native ELT tooling, which provides automated testing, modular development, environment separation, and clean dependency management. Versioning is maintained through a centralized system that tracks all changes and deployments.
The platform operates across distinct but connected layers: a data lake for raw and lightly processed data, a dedicated testing environment for validating new pipelines and models, a warehouse optimized for analytics and dashboards, and a feature store that supports machine learning workloads. Each layer is designed to flow naturally into the next and to support clear development and operational practices.
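As a rough sketch of that layer and environment separation, pipelines can resolve their target datasets from configuration instead of hard-coding them. The "layer_env" naming convention below is an assumption for illustration, not the platform's actual scheme:

```python
# Sketch of config-driven layer/environment resolution. The
# "<layer>_<env>" naming convention here is hypothetical.
import os

LAYERS = ("lake", "test", "warehouse", "feature_store")

def dataset_for(layer: str, env: str | None = None) -> str:
    """Return the fully qualified dataset for a layer in a given environment."""
    env = env or os.environ.get("DATA_ENV", "dev")
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    return f"my-project.{layer}_{env}"

# A pipeline promotes data through layers without hard-coded targets:
source = dataset_for("lake", env="prod")       # "my-project.lake_prod"
target = dataset_for("warehouse", env="prod")  # "my-project.warehouse_prod"
```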
Pipelines follow production-grade orchestration processes aligned with our “pipelines as production software” mindset. This ensures reliability, maintainability, and strong observability. A monitoring system tracks execution and triggers alerts for any failures or anomalies, enabling timely visibility and proactive maintenance. The entire architecture follows the principles of the Google Cloud Well-Architected Framework to ensure reliability, performance, security, and operational excellence.
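The interview does not name the monitoring stack, but BigQuery's own job metadata is enough to sketch the idea: query INFORMATION_SCHEMA.JOBS_BY_PROJECT for recently finished jobs that carry an error and raise an alert (the region-us qualifier is an assumption and depends on where the data lives).

```python
# Sketch of a lightweight failure monitor built on BigQuery job metadata.
# The alert channel is a placeholder; a real setup would page on-call.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT job_id, user_email, error_result.message AS error
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND state = 'DONE'
  AND error_result IS NOT NULL
"""

for row in client.query(sql).result():
    print(f"ALERT: job {row.job_id} by {row.user_email} failed: {row.error}")
```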
Security and governance are strict: everything is documented, tracked, and controlled so new team members can onboard confidently and safely. And while the current team is relatively small, every process and architectural choice is designed to scale efficiently as we grow.
Yasen: Phase 1 focused on moving a major part of our business-critical pipelines into the new architecture and establishing the full back-end foundation. We implemented consistent development processes, created multiple environments with automated deployments, and embedded quality checks throughout the pipelines.
We are not just migrating. We are rebuilding and enriching data models so end-users - analysts and BI teams - can work more efficiently, with fewer joins and faster queries. This sets the stage to shift focus from maintenance to generating real business value.
Phase 1 laid the groundwork for reliable, maintainable, and scalable data operations that can support future growth and more advanced analytics initiatives.
Yasen: Governance is built in from the start. We use centralized cataloging and policy enforcement tools to manage data consistently across the platform. Transformations include automated quality checks, and lineage is tracked automatically so anyone can see where a dataset comes from and how it was produced.
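As an illustration of what such an automated quality check can look like, here is a minimal null-rate gate in Python; the table, column, and threshold are hypothetical stand-ins for the checks described above.

```python
# Hypothetical quality gate: fail the pipeline if a key column's null
# rate exceeds a threshold. Table, column, and threshold are illustrative.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT COUNT(*) AS n_rows, COUNTIF(customer_id IS NULL) AS n_null
FROM `my-project.warehouse_core.orders`
"""

row = next(iter(client.query(sql).result()))
null_rate = row.n_null / row.n_rows if row.n_rows else 0.0

if null_rate >= 0.01:  # 1% threshold, purely illustrative
    raise RuntimeError(f"customer_id null rate too high: {null_rate:.2%}")
```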
Access is controlled through a strict role-based permission model, and documentation is kept thorough and up to date. We also leverage AI assistants to support tasks like writing documentation and assisting with orchestration coding - always following strict security rules and never accessing sensitive data.
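A role-based model like the one Yasen describes maps naturally onto BigQuery's dataset-level grants. Here is a sketch with the Python client, using a made-up analysts group:

```python
# Sketch of role-based access at the dataset level. The dataset and
# group address are hypothetical.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.warehouse_core")

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",                # analysts get read-only access
        entity_type="groupByEmail",
        entity_id="analysts@example.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```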
Quality assurance is a top priority. Mistakes early on could be very costly down the line, so we enforce strict standards to ensure the data is accurate, reliable, and trustworthy.
Yasen: BigQuery can be very cost-efficient when used thoughtfully. We always tailor optimizations to the intended use of each dataset. For example, tables in the lake are designed for transformation-heavy workloads, as the ELT pipelines are the primary users. In contrast, tables in the warehouse and the feature store are optimized for analytics queries, dashboards, and machine learning models, which are the main consumers in those layers.
To achieve this, we apply a range of cost-efficient design strategies, including partitioning, clustering, incremental processing, and table denormalization. We also use slot reservations for predictable compute needs, materialized views for frequently accessed logic, and query caching to avoid unnecessary rescans.
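To ground a few of those strategies, here is what partitioning, clustering, and a materialized view can look like in BigQuery DDL, issued through the Python client. All table and column names are invented for illustration:

```python
# Illustrative cost-oriented table design: partition by day, cluster by
# the most common filter column, and materialize a hot aggregation.
from google.cloud import bigquery

client = bigquery.Client()

# Queries that filter on event date and customer_id scan only the
# partitions and blocks they need instead of the whole table.
client.query("""
CREATE OR REPLACE TABLE `my-project.warehouse_core.events`
PARTITION BY DATE(event_ts)
CLUSTER BY customer_id AS
SELECT * FROM `my-project.lake_raw.events_raw`
""").result()

# Dashboards read a precomputed result instead of rescanning the base table.
client.query("""
CREATE MATERIALIZED VIEW IF NOT EXISTS `my-project.warehouse_core.daily_revenue` AS
SELECT DATE(event_ts) AS day, SUM(amount) AS revenue
FROM `my-project.warehouse_core.events`
GROUP BY day
""").result()
```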
The goal is to provide teams with a fast, responsive environment while controlling spending - so analysts and ML practitioners can explore and query data confidently without scanning unnecessary terabytes.
Yasen: One of the main challenges is keeping the business running smoothly while modernizing the data platform. This means we must maintain the legacy system even as we build the new BigQuery-based architecture. To manage this, we follow a dual-track delivery model: focusing on building a robust back-end foundation, while maintaining existing developments and planning for the next phase of rebuilding data feeds, application integrations, and business dashboards.
Compliance and regulatory requirements are also critical - our solutions must meet all standards while continuing to deliver reliable service.
Manpower is another challenge. We intentionally develop everything in-house because deep business domain knowledge is essential for designing an architecture that truly fits our needs. Contractors can help, but they often lack the nuanced understanding and focus required for an optimal solution.
Yasen: AI is central to our vision. The lakehouse provides the foundation for machine learning pipelines, from model training to deployment. Our vision is data-centric rather than model-centric: even the best algorithms won’t deliver value if the underlying data isn’t clean, structured, and ready for use.
We have several ML models planned for the near term, including personalized CRM journeys, fraud prevention, affiliate traffic optimization, customer segmentation, churn prediction, early VIP client detection, and product recommendation engines.
Longer term, our vision is agentic BI - systems that proactively detect patterns, generate insights, propose decisions, and even automate actions across domains. Most data teams in the industry are still reactive, constantly juggling tasks and priorities with little time to step back, analyze deeply, or see the bigger picture. We want to break that cycle. Everything we are building today - from governance to architecture - is designed to move us toward a future where insights are not just delivered but anticipated.
Yasen: What excites me most is the shift from a reactive, maintenance-heavy setup to a proactive, value-driven platform. We’re building an environment that scales naturally, automates routine tasks, and enables advanced analytics and AI.
But it’s more than just modernizing infrastructure. We’re empowering analysts, reducing dependency on rare technical specialists, and creating a platform where teams can focus on insight, innovation, and impact instead of firefighting. This transformation will fundamentally change how the company works with data and unlock value for years to come.

