Regional Data Warehouse for Collective Impact

Overview

Along with my team, I designed and implemented a regional data warehouse on Google BigQuery that unified voter outreach data from over 80 partner organizations and voter turnout data from 6 counties. The system standardized submissions from VAN, EveryAction, Impactive, and other submission formats to create a single source of truth for engagement and turnout analysis. This project turned fragmented data into integrated infrastructure supporting over 2 million incoming records and real-time insight across the network.

The Challenge

Before this project, organizations across the region were submitting voter contact data in different formats and through multiple tools. The lack of consistency made it difficult to track progress, measure turnout, and demonstrate collective impact to funders. Partners worked in silos without a shared understanding of who was being reached or how their efforts connected.

The technical challenges were significant:

  • How do you build a system that can ingest data from over 80 independent sources with little to no standardization? Each organization used different platforms, exported data differently, and had varying levels of technical capacity. The system needed to be flexible enough to accept what partners could actually provide, yet rigorous enough to maintain data quality.

  • How do you process millions of records in near-real-time during peak seasons? During GOTV season, incoming data volumes spiked dramatically, and field directors needed current data to make time-critical decisions about staffing and outreach. Any delay would have made the system useless for real-time strategy. Just as critically, inconsistent data could produce incorrect or missing records, undermining trust in the entire system.
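One way to get both flexibility and rigor is to map each partner's column names onto a single standard template, keep anything unrecognized instead of rejecting the file, and flag only missing required fields. The Python below is a minimal sketch of that idea under stated assumptions: the field names (`voter_id`, `contact_date`, `contact_type`), the alias table, and the `normalize_row` helper are hypothetical illustrations, not the project's actual template.

```python
from typing import Dict, List, Tuple

# Hypothetical standard template fields (illustrative, not the real template).
REQUIRED_FIELDS = ("voter_id", "contact_date", "contact_type")

# Hypothetical aliases: lowercase partner column name -> standard field name.
COLUMN_ALIASES: Dict[str, str] = {
    "voter_id": "voter_id",
    "vanid": "voter_id",
    "voter id": "voter_id",
    "contact_date": "contact_date",
    "date canvassed": "contact_date",
    "date": "contact_date",
    "contact_type": "contact_type",
    "contact type": "contact_type",
    "channel": "contact_type",
}


def normalize_row(raw: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Map one partner row onto the standard fields.

    Returns the normalized row plus a list of problems. Unrecognized columns are
    kept under an "extra:" prefix instead of being dropped (flexible), while
    missing or blank required fields are reported (rigorous).
    """
    normalized: Dict[str, str] = {}
    for column, value in raw.items():
        key = column.strip().lower()
        target = COLUMN_ALIASES.get(key)
        if target is None:
            normalized[f"extra:{key}"] = value
        elif target not in normalized:
            # First matching column wins if a file has two aliases for one field.
            normalized[target] = value.strip()
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not normalized.get(name)]
    return normalized, problems


row, problems = normalize_row(
    {"VANID": "123", "Date Canvassed": "2022-10-01", "Notes": "left lit"}
)
print(row)       # {'voter_id': '123', 'contact_date': '2022-10-01', 'extra:notes': 'left lit'}
print(problems)  # ['missing contact_type']
```

Keeping unrecognized columns under an `extra:` prefix means a file is never rejected just for carrying additional information, while the problems list gives a partner the concrete feedback needed to fix a submission.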

Technical Approach

I led the design and implementation of the warehouse on Google BigQuery. Our team collaborated with The Movement Cooperative (TMC) to align the warehouse architecture and ensure compatibility. We developed partner submission templates, validation checks, and a human-centered governance framework to ensure accuracy and security across all participating organizations. Working closely with partners, we connected field data to geographic and voter file information, creating a unified system that linked outreach data across counties and tactics.
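Linking field data to voter file and geographic information comes down to a keyed lookup: each outreach record is matched on a voter ID, picks up that voter's county and precinct, and anything that fails to match is set aside for review. In BigQuery this would normally be a SQL join; the Python below is only a sketch of the logic, and the record shapes, field names, county names, and the `link_to_voter_file` and `contacts_by_county_and_tactic` helpers are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VoterFileEntry:
    """Hypothetical voter-file row carrying the geography used for reporting."""
    county: str
    precinct: str


@dataclass(frozen=True)
class OutreachRecord:
    """Hypothetical standardized outreach record (one contact attempt)."""
    partner: str
    voter_id: str
    tactic: str
    county: Optional[str] = None
    precinct: Optional[str] = None


def link_to_voter_file(
    records: List[OutreachRecord], voter_file: Dict[str, VoterFileEntry]
) -> Tuple[List[OutreachRecord], List[OutreachRecord]]:
    """Attach county and precinct by voter ID; return (matched, unmatched)."""
    matched: List[OutreachRecord] = []
    unmatched: List[OutreachRecord] = []
    for record in records:
        entry = voter_file.get(record.voter_id)
        if entry is None:
            # No voter-file match: set aside, e.g. to surface back to the partner.
            unmatched.append(record)
        else:
            # replace() returns a new record; the original stays unchanged.
            matched.append(replace(record, county=entry.county, precinct=entry.precinct))
    return matched, unmatched


def contacts_by_county_and_tactic(matched: List[OutreachRecord]) -> Counter:
    """Count matched contacts per (county, tactic) across all partners."""
    return Counter((record.county, record.tactic) for record in matched)


voter_file = {"1": VoterFileEntry("Alpha", "P-01"), "2": VoterFileEntry("Beta", "P-07")}
records = [
    OutreachRecord("org_a", "1", "door"),
    OutreachRecord("org_b", "2", "phone"),
    OutreachRecord("org_b", "1", "phone"),
    OutreachRecord("org_c", "999", "text"),
]
matched, unmatched = link_to_voter_file(records, voter_file)
print(len(matched), [r.voter_id for r in unmatched])  # 3 ['999']
```

Because every matched record carries the same geography fields regardless of which partner or tool produced it, a per-county, per-tactic rollup like `contacts_by_county_and_tactic` works across the whole network rather than one organization at a time.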

Data Ingestion & ETL Pipeline

  • Partner submission templates

  • API connections and a CSV upload path: organizations with technical capacity could submit through APIs, while smaller groups could upload CSVs through our data submission portal.

  • Created a tracking document for data submissions and matching. Partner-specific staging tables preserved raw submissions before matching, and partners got quick feedback rather than waiting days to learn their data couldn’t be processed.

  • Introduced Impactive to organizations without their own voter outreach tool and covered their licensing costs.
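Quick feedback depends on checking a submission as soon as it arrives and returning a specific, row-level report. The sketch below validates a CSV upload against a few illustrative rules and summarizes the result for the partner. The required columns, allowed contact types, `SubmissionReport`, and `validate_submission` are hypothetical, and in practice the raw file would still be preserved in a staging table, which this sketch does not model.

```python
import csv
import io
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Hypothetical validation rules for an outreach submission.
REQUIRED_COLUMNS = ("voter_id", "contact_date", "contact_type")
ALLOWED_CONTACT_TYPES = {"door", "phone", "text"}


@dataclass
class SubmissionReport:
    """Feedback a partner could receive right after uploading a file."""
    partner: str
    accepted: int = 0
    errors: List[str] = field(default_factory=list)


def validate_submission(partner: str, csv_text: str) -> SubmissionReport:
    """Check a CSV submission row by row and summarize the problems found."""
    report = SubmissionReport(partner)
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        # File-level problem: report it once instead of failing every row.
        report.errors.append("missing columns: " + ", ".join(missing))
        return report
    for row in reader:
        problems = []
        if not (row["voter_id"] or "").strip():
            problems.append("blank voter_id")
        try:
            date.fromisoformat((row["contact_date"] or "").strip())
        except ValueError:
            problems.append(f"bad contact_date {row['contact_date']!r}")
        if (row["contact_type"] or "").strip().lower() not in ALLOWED_CONTACT_TYPES:
            problems.append(f"unknown contact_type {row['contact_type']!r}")
        if problems:
            # reader.line_num is the source line just read (header is line 1).
            report.errors.append(f"line {reader.line_num}: " + "; ".join(problems))
        else:
            report.accepted += 1
    return report


sample = (
    "voter_id,contact_date,contact_type\n"
    "101,2022-10-03,door\n"
    ",2022-10-03,phone\n"
    "102,10/03/2022,Text\n"
)
result = validate_submission("org_a", sample)
print(result.accepted)  # 1
print(result.errors)    # ['line 3: blank voter_id', "line 4: bad contact_date '10/03/2022'"]
```

Reporting errors by line number, rather than rejecting the whole file, lets a partner fix the handful of bad rows instead of resubmitting blind.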

Data Transformation & Standardization

Geographic Integration

Results

Operational Transformation

  • Increased compliant data submissions from less than 10% of participating organizations to over 80%.

  • Increased data accuracy.

Technical Performance

  • Processed over 2 million voter records and over 4 million outreach records across 6 counties.

Strategic Impact

  • Enabled geographic targeting analysis.

  • Supported real-time resource allocation decisions during critical campaign periods.

Cultural Shift

The infrastructure fundamentally changed how we approached data. Partners who initially resisted submitting data became advocates. The system created a shared language for discussing impact across organizations that had previously worked in isolation. The most satisfying moment wasn’t reaching 2 million ingested records; it was watching our election manager pull up a dashboard in a meeting and identify a coverage gap in real time, then watching organizations actively organize to close it.

The warehouse now integrates over 2 million voter records from 80+ organizations across multiple years, providing a shared foundation for learning and accountability.

Key Lessons

Governance enables adoption.

Flexibility compounds over time.

Observability prevents fires.

Tech Stack

Google BigQuery (data warehouse), SQL (complex queries, CTEs, spatial joins), dbt (transformation logic and modeling), VAN/EveryAction, Impactive, Salesforce, Tableau, ArcGIS, and Basecamp (for project management)
