Case Study
When the Data Isn’t the Problem
How governance, trust, and change management built the infrastructure that technology alone couldn’t
Building a regional data warehouse for 80+ organizations sounds like a technical challenge. It was. But the harder problem was convincing a fragmented coalition to change their workflows, share their data, and trust a centralized system. This is what it took to get there.
The Starting Assumption
Before the 2024 election cycle, I was part of the support team coordinating voter outreach data for a regional coalition of 80+ partner organizations across 6 counties. Each ran its own field program, used its own platform (VAN, EveryAction, Impactive, MobilizeAmerica), and submitted data in whatever format made sense to them: PDFs, spreadsheets, Word documents. My role, as Director of Data Infrastructure, was to build the shared system that would make sense of all of it.
The case for a regional data warehouse was obvious. The path to getting there was not.
The Challenge
On paper, the problem looked like a data quality issue. Less than 10% of partner organizations were submitting the individual-level contact data needed for meaningful analysis. My team was spending 15 to 20 hours a week manually cleaning and merging what came in. Reporting lagged weeks behind field activity. Partners had no shared picture of who was being reached, which neighborhoods were covered, or how their efforts connected.
Framing this as a data quality problem would have produced the wrong solution.
Fragmented Data Submission
Less than 10% of partner organizations were submitting the individual-level contact data needed for meaningful analysis. Without individual-level records, there was no way to understand who was being reached, identify the overlap between organizations, or connect outreach activity to actual voter turnout.
The real problem was that the support team was asking organizations to change their workflows, expose their data, and trust a centralized system without yet having given them a reason to. That's not a pipeline problem. That's a change management problem. The technical architecture had to serve that reality, not the other way around.
Designing with Intention
My first instinct was to focus on the data model. After the first year, I understood that was the wrong place to start.
Before partners would submit meaningful data, they needed answers to three questions: Who can see my contact records? How will my data be used? What happens if something goes wrong? My team developed comprehensive data sharing agreements and privacy policies before asking anyone to change their submission process. I implemented row-level security so organizations could only access their own records alongside aggregated regional metrics. No organization could see another's individual voter contacts. I built separate datasets for different classification levels (Public, Internal, Restricted, Confidential) with differentiated access controls and automatic PII redaction from visualizations.
This wasn't bureaucratic overhead. It was the precondition for participation. Once partners understood the rules and felt safe, technical adoption moved quickly.
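The access model described above can be sketched in Python. This is a minimal illustration only: the production controls were BigQuery row-level security and dataset permissions, and every name here (field names, classification tiers' PII lists, the helper functions) is hypothetical rather than the real schema.

```python
# Hypothetical PII fields redacted at each classification tier.
PII_BY_CLASSIFICATION = {
    "Public":       {"name", "address", "phone", "email"},
    "Internal":     {"address", "phone", "email"},
    "Restricted":   {"phone", "email"},
    "Confidential": set(),  # vetted staff see full records
}

def visible_records(records, requesting_org, classification):
    """Return only the requesting org's own rows, with PII redacted
    according to the dataset's classification level."""
    redact = PII_BY_CLASSIFICATION[classification]
    out = []
    for rec in records:
        if rec["org_id"] != requesting_org:
            continue  # row-level security: other orgs' contacts are invisible
        out.append({k: ("[REDACTED]" if k in redact else v)
                    for k, v in rec.items()})
    return out

def regional_totals(records):
    """Aggregated metrics every org can see, exposing no individual rows."""
    return {"doors_knocked": sum(r.get("doors_knocked", 0) for r in records)}
```

An org querying an Internal-classified dataset would see its own contacts with phone and address masked, plus the coalition-wide totals, and nothing belonging to another organization.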
With that foundation in place, I turned to the technical build, designing it with the same principle in mind: meet partners where they were, not where I wished they were.
Organizations with technical capacity used automated API imports from Impactive and EveryAction; smaller groups uploaded CSVs through a portal. I developed the dbt transformation logic that standardized records across all of these formats into a unified BigQuery warehouse. Portal submissions automatically generated tracking tickets so partners could see exactly where their data was in the process and flag issues themselves.
That visibility changed the dynamic. Data quality problems became collaborative troubleshooting instead of adversarial audits. Partners fixed issues in days rather than weeks. And because they could see their data moving through the system, they stayed engaged with it.
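The standardization step can be sketched in Python, though the production logic lived in dbt SQL models. The per-source field mappings below are illustrative stand-ins, not the actual EveryAction, Impactive, or portal schemas.

```python
# Hypothetical column mappings from each source system into the
# unified warehouse schema.
SOURCE_FIELD_MAP = {
    "everyaction": {"VanID": "voter_id", "ContactType": "contact_method",
                    "DateCanvassed": "contact_date"},
    "impactive":   {"voter_ref": "voter_id", "channel": "contact_method",
                    "timestamp": "contact_date"},
    "portal_csv":  {"id": "voter_id", "method": "contact_method",
                    "date": "contact_date"},
}

def standardize(raw_rows, source):
    """Rename source-specific columns into the unified schema and tag
    each record with its origin for lineage."""
    mapping = SOURCE_FIELD_MAP[source]
    unified = []
    for row in raw_rows:
        rec = {mapping[k]: v for k, v in row.items() if k in mapping}
        rec["source_system"] = source
        unified.append(rec)
    return unified
```

However records arrive, they leave this step with the same column names, which is what made downstream analysis across 80+ organizations possible.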
By integrating county voter files and ArcGIS shapefiles, the data could be rolled up from individual contacts to precincts to neighborhoods to counties. Partners could ask questions at whatever level made sense for their work: How many doors were knocked in this neighborhood? Which organizations are working the same turf? Where are the gaps?
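The rollup logic amounts to a hierarchical aggregation. A minimal Python sketch follows; the geography lookup here is invented for illustration, where production used county voter files joined to ArcGIS precinct shapefiles.

```python
from collections import Counter

# Illustrative precinct-to-geography lookup (not real precincts).
GEO = {
    "precinct_12": {"neighborhood": "Riverside", "county": "Adams"},
    "precinct_14": {"neighborhood": "Riverside", "county": "Adams"},
    "precinct_30": {"neighborhood": "Hilltop",   "county": "Baker"},
}

def rollup(contacts, level):
    """Count contacts at 'precinct', 'neighborhood', or 'county' level."""
    counts = Counter()
    for c in contacts:
        key = c["precinct"] if level == "precinct" else GEO[c["precinct"]][level]
        counts[key] += 1
    return dict(counts)
```

The same contact records answer a precinct-level turf question or a county-level coverage question, depending on which level the partner asks about.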
Observable Shifts
The operational shift was measurable. Compliant submissions went from less than 10% to over 80% of participating organizations. Manual processing dropped from 15 to 20 hours a week to near zero. Partners who had been working weeks behind their own field activity now had real-time visibility into it. By the 2024 cycle, the system was processing over 2 million voter records and 4 million outreach records across 6 counties, unified from VAN, EveryAction, Impactive, and custom formats.
| | Before | After |
|---|---|---|
| Data compliance | Less than 10% of organizations submitting usable data | 80%+ compliant submissions |
| Processing time | 15–20 hrs/week manual cleaning | Near zero |
| Reporting lag | Weeks behind field activity | Real-time visibility |
| Scale | Fragmented, county-level silos | 2M+ voter records, 4M+ outreach records across 6 counties |
The cultural shift was harder to measure and more consequential. Organizations that had initially refused to participate became internal advocates. A shared language emerged for talking about coverage, gaps, and reach across groups that had previously worked in isolation.
The most meaningful moment in this project wasn't reaching 2 million records. It was watching the coalition's election manager pull up a dashboard mid-meeting and identify a coverage gap in real time, and then watching organizations coordinate to close it. The warehouse made that possible. The trust made them willing to use it.
The system has continued through the 2025 cycle, each year adding cumulative context that makes the analysis more valuable.
Manual Processes, Delayed Decisions
The data support team spent the bulk of its time manually cleaning and merging submissions, to the tune of 15 to 20 hours a week. By the time reporting was ready, weeks later, field activity had already moved on. Decisions were being made on outdated information, if they were being made on data at all.
Lessons to Carry Forward
The numbers above matter. But the more durable takeaways from this project weren't in the data; they were in the decisions that shaped how the data got there. These lessons came from friction, from things that didn't work the first time, and from a project that forced a reckoning with the limits of purely technical thinking. They apply to any effort where you're asking people to change how they work in service of a shared goal.
1. Change management is as technical as code. Getting 80+ organizations to change their workflows was the hardest engineering problem in this project, harder than the data modeling and harder than the API integrations. I learned to treat adoption like a product design challenge: reduce friction, optimize feedback loops, design for the user's actual capacity rather than the one you wish they had. Measuring adoption with the same rigor as system performance was what made the difference.
2. Observability protects everyone. Validation dashboards felt like a nice-to-have in year one. They became essential. Real-time visibility meant partners caught data quality issues themselves before they reached analysis. The upfront investment paid for itself many times over, and it communicated something important: that the support team respected the partners' role in maintaining data integrity, not just its own.
3. Build for continuity from the start. The warehouse now spans multiple years. That durability required decisions made early about data modeling, governance, and documentation that weren't urgent in the moment but became foundational later. The pressure to ship is real. The decisions you make in year one follow you.
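The validation checks behind the dashboards in lesson 2 can be sketched simply. This is a hedged illustration of the kind of checks partners saw, not the production rules, and the field names are hypothetical.

```python
REQUIRED_FIELDS = ("voter_id", "contact_date", "contact_method")

def validate_submission(rows):
    """Flag the issues a partner could see on the dashboard: missing
    required fields and duplicate contact records."""
    issues = []
    seen = set()
    for i, row in enumerate(rows):
        for field in REQUIRED_FIELDS:
            if not row.get(field):
                issues.append(f"row {i}: missing {field}")
        key = (row.get("voter_id"), row.get("contact_date"))
        if row.get("voter_id") and key in seen:
            issues.append(f"row {i}: duplicate contact {key}")
        seen.add(key)
    return issues
```

Surfacing these messages to the submitting partner, rather than quietly fixing them downstream, is what turned quality problems into collaborative troubleshooting.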
What didn't change is also worth naming directly.
Standardizing data across organizations with very different capacities meant some groups' work was consistently more visible in the system than others. Smaller organizations with less technical infrastructure submitted less data, which meant their outreach was underrepresented in coalition-wide analysis, not because it mattered less, but because the system reflected capacity, not effort. That imbalance is something better-resourced support and more intentional partner investment could address in a future cycle.
The other constraint was the clock. This work was tied to election cycles. The deadlines were fixed, field programs ran whether the infrastructure was ready or not, and every decision about what to build next was made under pressure. There is always more to do than a campaign cycle has room for, and some of what gets deferred doesn't get revisited. That's the nature of this work, and it's worth being honest about.
If you're building data infrastructure for a coalition, a network, or any multi-stakeholder environment, the technical architecture is the easier half of the problem. The harder half is earning the trust that makes the system worth building. That work doesn't show up in a data model, but it determines whether one ever gets used.
The technical architecture behind this system, including BigQuery data modeling, dbt transformation logic, voter file matching, and the geographic integration layer, is the subject of a companion post.
Tech Stack: Google BigQuery · dbt · SQL · VAN/EveryAction · Impactive · ArcGIS · Tableau · Salesforce · Basecamp