From Participation to Cohesion: Measuring Affiliate Health through Engagement Pathways and Network Structure

This project pairs a ladder of engagement with social network analysis to move beyond participation counts and toward a relational understanding of affiliate health. By combining member progression with network structure, it becomes possible to identify bottlenecks, fragmentation, and opportunities for stronger, more sustainable engagement.

Overview

Most organizations track who showed up. This project asks what happened after they did.

Working with a synthetic dataset modeled on common patterns in distributed organizations, I designed a dual framework that combines a ladder of engagement with social network analysis to move beyond participation counts and toward a relational understanding of affiliate health. The ladder measures how deeply members are embedded in the affiliate over time. The network measures how members are connected to each other and where the structural risks live. The result is a methodology that makes visible what attendance data cannot: relationships, sustained engagement, and sustainability.

Based on the available data, I framed the analysis around three criteria, each paired with a question:

1. Connection: How are members connected to one another?
2. Sustainability: Is member engagement sustained over time?
3. Dependency Risk: Do affiliates rely too heavily on a handful of people?

The project spans data modeling in dbt, network analysis in Python using NetworkX, visualization in Kumu, and an affiliate health report that translates technical findings into plain language for organizational decision-makers.

Note on data: This project uses a synthetic dataset modeled on common patterns in distributed organizations. The goal is to demonstrate the framework and analytical approach, rather than evaluate a specific organization.

Approach

The goal was to move from counting activity to understanding how engagement actually functions inside each affiliate. To do that, I built a dual framework that combines a ladder of engagement with social network analysis.

The project moves through six steps:

1. Define the framework.

2. Build the data model.

3. Transform the three source tables into an analytical dataset using dbt.

4. Analyze network structure.

5. Visualize the network.

6. Summarize affiliate health and report findings.

Step 1: Define the Framework

The framework has two components. The ladder tells you where a member stands in their relationship to the organization. The network tells you how members are connected to each other. Neither is sufficient on its own. Together they produce a more complete picture of affiliate health than participation data alone.

The Engagement Ladder

Members move through the affiliate like steps on a staircase, each level representing a deeper investment in the work and the people around them. The ladder has five levels: Observer, Explorer, Regular, Bridge, and Anchor. A member's level is not self-reported. Instead it is assigned based on two observable signals: how often they attend events relative to how many were available, and how many people they have referred into the affiliate.

The Engagement Ladder
Level | Definition | Example | Attendance Rate | Referral Count
Observer | Member, but not attending | Long-time listener, first-time... well, not yet | 0% | Any
Explorer | Attends occasionally | Shows up when the topic is right | Above 0%, below 75% | Any
Regular | Engages consistently | You could set your watch by them | 75% or higher | 0
Bridge | Consistent and grows the network | The one who says "you two should meet" | 75% or higher | 1-2
Anchor | Consistent and brings in the most people | The reason half the room is there | 75% or higher | 3+
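The rules in the table can be written as a small function. This is an illustrative sketch that mirrors the thresholds in the fct_member_levels mart model shown in Step 3 (attendance rate expressed as a fraction from 0 to 1); it is not part of the pipeline itself. In that logic, referrals only change a member's level once attendance reaches 75%.

```python
def ladder_level(attendance_rate: float, referral_count: int) -> str:
    """Assign an engagement ladder level from attendance rate (0-1) and referrals."""
    if attendance_rate == 0:
        return "Observer"
    if attendance_rate < 0.75:
        return "Explorer"  # referrals do not matter below the 75% threshold
    # At 75% attendance or higher, referral count decides the level
    if referral_count == 0:
        return "Regular"
    if referral_count <= 2:
        return "Bridge"
    return "Anchor"


for rate, refs in [(0.0, 0), (0.4, 2), (0.75, 0), (0.9, 1), (1.0, 5)]:
    print(rate, refs, ladder_level(rate, refs))
```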

Attendance rate is calculated by dividing the number of events a member attended by the total number of events their affiliate held during their membership period. This accounts for the fact that affiliates vary in how active they are: a member in Phoenix had fewer opportunities to attend than a member in Atlanta, so raw counts alone would be misleading. Referral count is the number of members whose referred_by field points to that member. It measures network growth directly: who is actually bringing people in. Founding members with no referred_by record of their own are not penalized. What matters is whether they are growing the network, not whether someone recruited them.
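A minimal pandas sketch of the attendance-rate calculation on a hypothetical three-member toy dataset. It assumes a member's membership period runs from their join_date onward (the data model records no end date), so the events available to a member are those their affiliate held on or after that date. The pipeline does this in SQL in int_attendance_metrics; this is only an illustration of the arithmetic, not that model.

```python
import pandas as pd

# Hypothetical toy data shaped like the Members, Events, and Attendance tables
members = pd.DataFrame({
    "member_id": [1, 2, 3],
    "affiliate_id": ["ATL", "ATL", "PHX"],
    "join_date": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-01-01"]),
})
events = pd.DataFrame({
    "event_id": [10, 11, 12, 13, 20],
    "affiliate_id": ["ATL", "ATL", "ATL", "ATL", "PHX"],
    "event_date": pd.to_datetime(
        ["2024-01-15", "2024-02-15", "2024-03-15", "2024-04-15", "2024-02-01"]
    ),
})
attendance = pd.DataFrame({"member_id": [1, 1, 1, 2, 3], "event_id": [10, 11, 12, 13, 20]})

# Events each member could have attended: hosted by their affiliate on or after join_date
available = members.merge(events, on="affiliate_id")
available = available[available["event_date"] >= available["join_date"]]

# Events actually attended within that period
attended = available.merge(attendance, on=["member_id", "event_id"])

metrics = members.set_index("member_id")[["affiliate_id"]].copy()
metrics["events_available"] = (
    available.groupby("member_id")["event_id"].count().reindex(metrics.index, fill_value=0)
)
metrics["attendance_count"] = (
    attended.groupby("member_id")["event_id"].count().reindex(metrics.index, fill_value=0)
)
# 0 / 0 (no events held since joining) yields NaN; treat it as a rate of 0
metrics["attendance_rate"] = (
    metrics["attendance_count"] / metrics["events_available"]
).fillna(0)
print(metrics)
```

Member 1 attended 3 of 4 available events and lands exactly on the 75% threshold. Member 2 joined after two Atlanta events had already taken place, so only the later two count toward their denominator.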

The Network Layer

Where the ladder asks how connected a member is to the organization, the network asks how connected members are to each other. The network layer sits alongside the ladder as a separate lens, not a replacement for it. A member can be deeply embedded in the organization — attending consistently, referring others in — while remaining isolated from their peers. The ladder would not surface that. The network does.

Why Two is Better Than One

The ladder and the network answer different questions. The ladder tells you about a member's relationship to the institution. The network tells you about a member's relationship to everyone else in it. An affiliate can have strong ladder distribution and a fragile network. It can have a dense network and an engagement distribution weighted toward the bottom. Looking at either one alone produces an incomplete picture. Looking at both together makes visible what neither could show on its own: whether the affiliate is healthy, where the risks live, and what is worth paying attention to.

Step 2: Build the Data Model

The dataset is built from five tables: three source tables that live in BigQuery and two derived tables produced by the dbt pipeline. The distinction matters. Interactions and referrals are not collected directly. They are derived from what members actually did rather than what they reported.

The Three Source Tables

The Members table holds individual profiles: who joined, when they joined, and who brought them in. The referred_by field is stored as a member name rather than an ID, making the data human-readable at the source. The dbt pipeline resolves it back to a member ID for analysis.

The Events table records each affiliate event, its type, and when it took place. The Attendance table records who showed up to what, with one row per member per event attended. It is the simplest table in the model and the most analytically powerful, because everything downstream (interactions, ladder levels, network structure) is derived from it.

Data Dictionaries

Members
Field | Type | Description
member_id | Integer | Unique identifier for each member
member_name | String | Member full name
affiliate_id | String | The affiliate the member belongs to
join_date | Date | Date the member joined the affiliate
referred_by | String | Name of the member who referred this member; empty for founding members

Events
Field | Type | Description
event_id | Integer | Unique identifier for each event
affiliate_id | String | The affiliate that hosted the event
event_name | String | Descriptive name of the event
event_type | String | Monthly Meeting, Training, Social, Working Session, or Annual Convening
event_date | Date | Date the event took place

Attendance
Field | Type | Description
attendance_id | Integer | Unique identifier for each attendance record
member_id | Integer | The member who attended; foreign key to Members
event_id | Integer | The event attended; foreign key to Events

Interactions (derived)
Field | Type | Description
member_id_1 | Integer | First member in the pair; foreign key to Members
member_id_2 | Integer | Second member in the pair; foreign key to Members
affiliate_id | String | The affiliate both members belong to
co_attendance_count | Integer | Number of shared events
interaction_weight | Integer | Strength of tie on a scale of 1 to 5

Referrals (derived)
Field | Type | Description
referral_id | Integer | Unique identifier for each referral record
referrer_id | Integer | The member who made the referral; foreign key to Members
referred_id | Integer | The member who was referred; foreign key to Members
affiliate_id | String | The affiliate both members belong to
join_date | Date | Date the referred member joined

The Two Derived Tables

The Interactions table is derived from the Attendance table. Two members who attended the same event share a co-attendance record. The more events they share, the stronger the tie. Interaction weight runs from one to five, mapped from the number of shared events. A weight of one looks like two members who crossed paths once. A weight of five looks like two members who keep showing up together.
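The mapping from shared events to weight is set in the int_interactions model shown in Step 3: one shared event gives a weight of 1, two give 2, three to four give 3, five to six give 4, and seven or more give 5. A small Python sketch of the same mapping, written only to make the thresholds explicit:

```python
def interaction_weight(co_attendance_count: int) -> int:
    """Map the number of events a pair shared to a 1-5 tie strength."""
    if co_attendance_count < 1:
        raise ValueError("a pair must share at least one event")
    if co_attendance_count <= 2:
        return co_attendance_count  # 1 -> 1, 2 -> 2
    if co_attendance_count <= 4:
        return 3
    if co_attendance_count <= 6:
        return 4
    return 5  # seven or more shared events


print([interaction_weight(n) for n in range(1, 9)])
```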

The Referrals table is derived from the Members table by extracting the referred_by relationship into a dedicated edge list. Each row represents a directed connection: the member who made the introduction and the member who joined as a result. This is the directed network. Arrows point from referrer to recruit.
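A minimal sketch of that extraction, assuming the Members table has already been staged so that referred_by is resolved to a numeric referred_by_id (empty for founding members). The toy data is hypothetical, and referral_id and join_date are left out for brevity. Loading the edges into a NetworkX directed graph also shows why out-degree equals the referral count used by the ladder.

```python
import networkx as nx
import pandas as pd

# Hypothetical staged members: referred_by_id is None for founding members
members = pd.DataFrame({
    "member_id": [1, 2, 3, 4, 5],
    "affiliate_id": ["ATL"] * 5,
    "referred_by_id": [None, 1, 1, 2, 1],
})

# One directed edge per referral: referrer -> recruit
referrals = (
    members.dropna(subset=["referred_by_id"])
    .rename(columns={"member_id": "referred_id", "referred_by_id": "referrer_id"})
    .loc[:, ["referrer_id", "referred_id", "affiliate_id"]]
    .astype({"referrer_id": int})
)

G = nx.DiGraph()
G.add_nodes_from(members["member_id"])  # founders and non-referrers are kept as nodes
G.add_edges_from(zip(referrals["referrer_id"], referrals["referred_id"]))

# Out-degree = number of people each member brought in
referral_count = dict(G.out_degree())
print(referrals)
print(referral_count)
```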

How the Tables Relate

Together the five tables capture two distinct views of the affiliate. The co-attendance network shows how relationships function in the present. The referral network shows how the affiliate grew over time. Neither view alone is sufficient. Both together tell the full story.

Step 3: Source Tables to Analytical Dataset

Three source tables entered the pipeline. Two analytical outputs came out. Everything in between is documented, reproducible, and version controlled.

The pipeline is built in dbt and runs against BigQuery. It moves through three layers: staging, intermediate, and marts. Each layer has a specific job. Staging cleans and casts the raw source data. Intermediate derives the analytical building blocks. Marts produce the final outputs used for analysis and reporting.

Staging

The staging layer pulls directly from the three source tables in BigQuery and prepares them for downstream use. Each model casts fields to the correct data types. The Members staging model also handles one specific transformation: it resolves the referred_by field from a member name back to a member ID via a self-join. This is the only place in the pipeline where that resolution happens, which means every downstream model can treat referred_by as a reliable foreign key.

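dbt Lineage: Staging
-- stg_attendance: cast raw attendance fields to BigQuery integer types
with source as (
    select * from {{ source('Affiliate_Health', 'Attendance') }}
),
staged as (
    select
        cast(attendance_id as int64) as attendance_id,
        cast(member_id as int64)     as member_id,
        cast(event_id as int64)      as event_id
    from source
)
-- no trailing semicolon: dbt wraps the model in its own DDL statement
select * from staged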
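The SQL shown is the Attendance staging model; the Members model's name-to-ID resolution is described but not shown. A pandas sketch of the same self-join idea, using hypothetical toy names. It additionally matches on affiliate_id, which is an assumption made here to reduce name collisions, and it guards against duplicate names silently multiplying rows, the main weakness of joining on a name.

```python
import pandas as pd

# Hypothetical raw Members rows: referred_by holds a member *name*, not an ID
raw = pd.DataFrame({
    "member_id": [1, 2, 3],
    "member_name": ["Ana Ruiz", "Ben Cole", "Cy Diaz"],
    "affiliate_id": ["ATL", "ATL", "ATL"],
    "referred_by": [None, "Ana Ruiz", "Ana Ruiz"],
})

# Self-join: look each referred_by name up against the member list
lookup = raw[["member_id", "member_name", "affiliate_id"]].rename(
    columns={"member_id": "referred_by_id", "member_name": "referred_by"}
)
staged = raw.merge(lookup, on=["referred_by", "affiliate_id"], how="left")

# Duplicate names would match several members and add rows
assert len(staged) == len(raw), "ambiguous referred_by names"

# Nullable integer keeps founders as <NA> instead of a float NaN
staged["referred_by_id"] = staged["referred_by_id"].astype("Int64")
print(staged[["member_id", "referred_by", "referred_by_id"]])
```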

Intermediate

The intermediate layer is where the analytical work happens. Four models build progressively on each other:

  • int_referrals extracts the referred_by relationship from the Members table into a clean directed edge list.
  • int_attendance_metrics calculates attendance count and attendance rate per member, accounting for variation in how many events each affiliate hosted.
  • int_interactions derives the undirected co-attendance network from the Attendance table, assigning an interaction weight of one to five based on how many events each pair of members shared.
  • int_member_metrics combines attendance rate and referral count into a single member-level metrics table, which feeds directly into ladder level assignment in the mart layer.
dbt Lineage: Intermediate
-- int_interactions: undirected co-attendance network with 1-5 tie weights
with attendance as (
    select * from {{ ref('stg_attendance') }}
),
members as (
    select * from {{ ref('stg_members') }}
),
-- every pair of members who attended the same event, each unordered pair once
pairs as (
    select
        a1.member_id as member_id_1,
        a2.member_id as member_id_2,
        a1.event_id
    from attendance as a1
    inner join attendance as a2
        on  a1.event_id = a2.event_id
        and a1.member_id < a2.member_id
),
co_attendance as (
    select
        member_id_1,
        member_id_2,
        count(event_id) as co_attendance_count
    from pairs
    group by member_id_1, member_id_2
),
-- both members belong to the same affiliate, so member_id_1 supplies it
with_affiliate as (
    select
        c.member_id_1,
        c.member_id_2,
        m.affiliate_id,
        c.co_attendance_count,
        case
            when c.co_attendance_count = 1             then 1
            when c.co_attendance_count = 2             then 2
            when c.co_attendance_count between 3 and 4 then 3
            when c.co_attendance_count between 5 and 6 then 4
            else 5
        end as interaction_weight
    from co_attendance as c
    inner join members as m
        on c.member_id_1 = m.member_id
)
select * from with_affiliate
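For readers more comfortable in Python, the pairs and co_attendance steps of the model above translate to a pandas self-merge. This is a sketch on hypothetical toy rows, not a replacement for the dbt model:

```python
import pandas as pd

# Hypothetical attendance rows: members 1-3 at event 10, 1-2 at 11, 1 and 3 at 12
attendance = pd.DataFrame({
    "member_id": [1, 2, 3, 1, 2, 1, 3],
    "event_id": [10, 10, 10, 11, 11, 12, 12],
})

# Self-join on event: every pair of members who attended the same event
pairs = attendance.merge(attendance, on="event_id", suffixes=("_1", "_2"))

# Keep each unordered pair once and drop self-pairs (a1.member_id < a2.member_id)
pairs = pairs[pairs["member_id_1"] < pairs["member_id_2"]]

# Count shared events per pair
co_attendance = (
    pairs.groupby(["member_id_1", "member_id_2"])
    .size()
    .reset_index(name="co_attendance_count")
)
print(co_attendance)
```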

Marts

The mart layer produces two final tables. fct_member_levels assigns a ladder level to each member based on the two-signal calculation: attendance rate and referral count. fct_affiliate_health scores each affiliate across three dimensions: engagement distribution, network structure, and referral pattern.


dbt Lineage: Marts
-- fct_member_levels: assign a ladder level from attendance rate and referral count
with member_metrics as (
    select * from {{ ref('int_member_metrics') }}
),
leveled as (
    select
        member_id,
        member_name,
        affiliate_id,
        join_date,
        referred_by_name,
        referred_by_id,
        attendance_count,
        events_available,
        attendance_rate,
        referral_count,
        case
            when attendance_rate = 0                                        then 'Observer'
            when attendance_rate > 0     and attendance_rate < 0.75         then 'Explorer'
            when attendance_rate >= 0.75 and referral_count = 0             then 'Regular'
            when attendance_rate >= 0.75 and referral_count between 1 and 2 then 'Bridge'
            when attendance_rate >= 0.75 and referral_count >= 3            then 'Anchor'
        end as ladder_level
    from member_metrics
)
select * from leveled

Step 4: Analyze Network Structure

The dbt pipeline produces the data. The network analysis produces the insight. Social network analysis (SNA) is a method for studying relationships. It treats people as nodes and the connections between them as edges. The structure those connections form reveals things about the group that individual-level data cannot: (1) who sits at the center, (2) who bridges otherwise disconnected clusters, and (3) where the network is fragile.

The subsequent analysis relies on the following concepts:

SNA Concept | Explanation
Nodes | Represent the individuals in the network; in this case, the members of an affiliate.
Edges | Represent the relationships between pairs of nodes.
Degree Centrality | Measures the number of direct connections a member has, relative to the maximum possible. A member with high degree centrality is connected to many others. In a healthy affiliate, degree centrality is distributed across multiple members. In a fragile one, it concentrates in a small number of people.
Betweenness Centrality | Identifies members who serve as bridges between different parts of the network. A high betweenness score means the network depends on that member to connect others. If that member leaves, the paths that ran through them break. A score of 1.0 means every shortest path between other members runs through that one person.
Network Density | Measures the proportion of possible connections that actually exist in the network. A score of 1.0 means every member is connected to every other member; a score of 0.0 means no one is connected.
Components | A group of members connected to each other, directly or through others, but not to anyone outside their group. Components identify sub-groups that are fully cut off from the rest of the network.
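To make these definitions concrete, here is a small NetworkX sketch on three hypothetical five-member networks (toy graphs, not drawn from the project data). Note that the star and the split network have the same density, which is why density is read alongside components and betweenness rather than on its own.

```python
import networkx as nx

# Three hypothetical five-member networks
networks = {
    "complete": nx.complete_graph(5),  # everyone tied to everyone
    "star": nx.star_graph(4),          # node 0 is a hub tied to nodes 1-4
    "split": nx.disjoint_union(nx.complete_graph(3), nx.complete_graph(2)),  # two separate clusters
}

results = {}
for name, G in networks.items():
    betweenness = nx.betweenness_centrality(G)  # normalized to 0-1 by default
    results[name] = (
        round(nx.density(G), 3),
        nx.number_connected_components(G),
        round(max(betweenness.values()), 3),
    )
    print(name, *results[name])

# complete 1.0 1 0.0
# star 0.4 1 1.0
# split 0.4 2 0.0
```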

Where the dbt pipeline calculated who each member is and how often they showed up (ladder level, attendance rate, referral count), NetworkX takes that output and asks how members are related to each other (density, betweenness centrality, degree centrality, components).

Build the graph

This is where the dbt outputs become the inputs. Members become nodes. Co-attendance interactions become edges, with interaction weight carried forward as edge strength.

import pandas as pd
import networkx as nx

# Assumes two DataFrames are already loaded from the dbt outputs:
#   nodes: one row per member (member_id, label, type = affiliate, ladder_level)
#   edges: one row per tie (from_id, to_id, edge_type, affiliate_id, weight)

# Filter to co-attendance edges only
co_attendance = edges[edges['edge_type'] == 'co-attendance'].copy()

# NetworkX treats the weight passed to betweenness_centrality as a distance,
# so invert the 1-5 interaction weight: stronger ties become shorter paths
co_attendance['distance'] = 1 / co_attendance['weight']

# Store affiliate summaries
summary = []
for affiliate in sorted(nodes['type'].unique()):
    aff_nodes = nodes[nodes['type'] == affiliate]
    aff_node_ids = aff_nodes['member_id'].tolist()
    aff_edges = co_attendance[co_attendance['affiliate_id'] == affiliate]

    # Build graph: every member is a node, even those with no ties
    G = nx.Graph()
    G.add_nodes_from(aff_node_ids)
    for _, row in aff_edges.iterrows():
        G.add_edge(row['from_id'], row['to_id'],
                   weight=row['weight'], distance=row['distance'])

    # Network metrics
    density = round(nx.density(G), 3)
    components = nx.number_connected_components(G)
    betweenness = nx.betweenness_centrality(G, weight='distance')
    degree = nx.degree_centrality(G)

    # Member with the highest betweenness: the affiliate's most critical bridge
    top_bc_node = max(betweenness, key=betweenness.get)
    top_bc_row = aff_nodes[aff_nodes['member_id'] == top_bc_node].iloc[0]
    top_bc_name = top_bc_row['label']
    top_bc_level = top_bc_row['ladder_level']
    top_bc_score = round(betweenness[top_bc_node], 3)

    avg_degree = round(sum(degree.values()) / len(degree), 3)

    # Engagement distribution: share of members at Regular or above
    aff_levels = aff_nodes['ladder_level'].value_counts().to_dict()
    total = len(aff_node_ids)
    highly_engaged = (aff_levels.get('Regular', 0) + aff_levels.get('Bridge', 0)
                      + aff_levels.get('Anchor', 0))
    pct_highly_engaged = round(highly_engaged / total * 100, 1)

    # Referral metrics: how much recruiting happened and how many people did it
    aff_referrals = edges[(edges['edge_type'] == 'referral') &
                          (edges['affiliate_id'] == affiliate)]
    total_referrals = len(aff_referrals)
    unique_referrers = aff_referrals['from_id'].nunique()

    summary.append({
        'Affiliate': affiliate,
        'Members': total,
        'Density': density,
        'Components': components,
        'Avg Degree Centrality': avg_degree,
        'Top BC Node': top_bc_name,
        'Top BC Level': top_bc_level,
        'Top BC Score': top_bc_score,
        'Observer': aff_levels.get('Observer', 0),
        'Explorer': aff_levels.get('Explorer', 0),
        'Regular': aff_levels.get('Regular', 0),
        'Bridge': aff_levels.get('Bridge', 0),
        'Anchor': aff_levels.get('Anchor', 0),
        'Pct Highly Engaged': pct_highly_engaged,
        'Total Referrals': total_referrals,
        'Unique Referrers': unique_referrers})

summary_df = pd.DataFrame(summary)

# Average co-attendance interaction weight per affiliate
avg_weight = co_attendance.groupby('affiliate_id')['weight'].agg(
    avg_interaction_weight='mean',
    total_edges='count').round(3).reset_index()

Metric | Definition
Nodes | Represent the individuals in the network; in this case, the members of an affiliate.
Edges | Represent the relationships between pairs of nodes.
Density | Measures the proportion of possible connections that actually exist in the network. A score of 1.0 means every member is connected to every other member. A score of 0 means no one is connected to anyone. High density indicates an affiliate where members are broadly connected. Low density indicates one where connections are sparse or concentrated.
Degree centrality | Measures how many connections each member holds relative to the maximum possible. A member with high degree centrality is connected to many others. In a healthy affiliate, degree centrality is distributed across multiple members. In a fragile one, it concentrates in a small number of people.
Betweenness centrality | Identifies members who serve as bridges between different parts of the network. A high betweenness score means the network depends on that member to connect others. If that member leaves, the paths that ran through them break. A score of 1.0 means every shortest path between other members runs through that one person.
Components | A group of members connected to each other but not to anyone outside their group. One component means everyone in the affiliate is reachable through the network. More components signal fragmentation: isolated clusters with no ties to the rest of the affiliate.

With fct_member_levels and int_interactions in hand, each affiliate can be modeled as a graph: members as nodes, co-attendance interactions as edges, and interaction weight as edge strength. The analysis was conducted in NetworkX in Google Colab.

Four metrics describe the structure of each affiliate's network. Together they answer the three core questions this project was designed to address.

The output is a metrics table with one row per member, enriched with degree centrality and betweenness centrality, alongside affiliate-level summaries of density and components. That enriched dataset is what feeds the Kumu visualization in Step 5 and the affiliate health scoring in Step 6.
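The affiliate summary loop aggregates centrality scores; the per-member enrichment can be sketched by mapping each score back onto the nodes table. This assumes the same nodes columns used in the summary code (member_id, label, ladder_level) and uses a hypothetical four-member chain, unweighted for brevity. In practice the enriched frame would then be written out (for example with DataFrame.to_csv) for the Kumu import.

```python
import networkx as nx
import pandas as pd

# Hypothetical inputs: four members tied in a chain 1 - 2 - 3 - 4
nodes = pd.DataFrame({
    "member_id": [1, 2, 3, 4],
    "label": ["Member 1", "Member 2", "Member 3", "Member 4"],
    "ladder_level": ["Regular", "Bridge", "Regular", "Explorer"],
})
edges = pd.DataFrame({"from_id": [1, 2, 3], "to_id": [2, 3, 4]})

G = nx.Graph()
G.add_nodes_from(nodes["member_id"])
G.add_edges_from(zip(edges["from_id"], edges["to_id"]))

# Attach member-level centrality scores as new columns, one row per member
enriched = nodes.assign(
    degree_centrality=nodes["member_id"].map(nx.degree_centrality(G)).round(3),
    betweenness_centrality=nodes["member_id"].map(nx.betweenness_centrality(G)).round(3),
)
print(enriched)
```

The two middle members carry every path between the ends of the chain, so they receive the highest betweenness scores.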

Step 5: Visualize the Network

The metrics tell you what is happening in each affiliate. The visualization makes it visible. The network maps were built in Kumu, a web-based tool for social network analysis and relationship mapping.

The enriched nodes file from the NetworkX analysis served as the element import: one row per member with label, affiliate, ladder level, attendance rate, referral count, degree centrality, and betweenness centrality. The co-attendance edges served as the connection import, with interaction weight mapped to connection strength so thicker lines indicate stronger ties. The referral edges were imported as a separate connection layer, directed from referrer to recruit. That produces two distinct network views for each affiliate. The co-attendance network shows how relationships function in the present: who is connected to whom and how strongly. The referral network shows how the affiliate grew over time: who brought whom in and whether that activity is distributed or concentrated.

Node color reflects ladder level, using a consistent palette across all five affiliates so the maps are directly comparable. Node size can be scaled to betweenness centrality, making the bridges and bottlenecks immediately visible without reading a single number. The visual contrast between affiliates is stark even before any decoration is applied. Atlanta's map is dense and distributed. Chicago's is a hub and spoke with one node at the center of everything. Denver fractures into disconnected clusters. The structure of each affiliate is legible at a glance in a way that a metrics table never is.

Step 6: Summarize Affiliate Health and Report Findings

The pipeline, the network analysis, and the visualization all build toward one output: a health profile for each affiliate that names what is working, what is fragile, and where the risk lives. Each affiliate is evaluated across the three questions from earlier:

1. How connected is each affiliate?
Density and components describe whether the network holds together and how broadly connections are distributed across members. A high density means a large share of all possible member pairs are directly connected. A low component count means the network is not fracturing into isolated clusters. Together they tell you whether the affiliate operates as a cohesive unit or a collection of disconnected groups.
2. Is it sustainable?
Engagement distribution describes how members are spread across the five ladder levels. A healthy affiliate has members at every level, with a meaningful number of Regulars, Bridges, and Anchors. An affiliate weighted toward Observers and Explorers is thin, even if its headcount looks fine. The tenure analysis adds a second layer: not just where members stand today, but whether their engagement deepened over time.
3. Where is the dependency risk?
Betweenness centrality, strength of tie, and referral pattern combine to identify where the affiliate relies too heavily on a small number of people. A high betweenness score means one member is the bridge between everyone else. A low unique referrer count means one person is doing most of the recruiting. Either concentration is a fragility risk.

These three questions do not produce a single score. A number would flatten what is actually a nuanced picture. Instead each affiliate receives a health profile that holds all three dimensions together and names what they reveal in combination.

The five affiliate health reports

The findings are documented in five affiliate health reports, one per affiliate, each structured around the same three questions. The reports are written in plain language for organizational decision-makers rather than data practitioners. The goal is not to present the methodology but to surface what it found and what it means for the affiliate.

[Link or embed to the five affiliate health reports]

The five affiliates were designed to represent distinct health profiles: Atlanta as the healthy baseline, Chicago as a leader bottleneck, Denver as fragmented, Seattle as moderately healthy, and Phoenix as emerging. The contrast between them is where the framework earns its value. Headcount alone would not distinguish these five affiliates. The combination of connectivity, sustained engagement, and dependency risk makes the differences visible and makes the risks actionable.

Comparing Affiliates

Affiliate | How Connected? | Is Engagement Sustained? | Where Is the Risk?
Atlanta | High density, 2 components | 72% highly engaged | Low: referrals distributed across 9 referrers
Seattle | Moderate density, 3 components | 57% highly engaged | Moderate: some dependency risk
Phoenix | Low density, 5 components | 33% highly engaged | Emerging: thin but not broken
Chicago | Low density, 1 component | 7% highly engaged | Critical: one person holds the entire network
Denver | Very low density, 7 components | 0% highly engaged | Severe: fragmented, with little to no path to reconnection

Atlanta and Chicago tell the starkest story. Both have 15 or more members. Both have active referral networks. But where Atlanta's connections are distributed and its engagement runs deep, Chicago's entire network runs through one person. Remove that person and the affiliate has no connections at all. That is the difference between a healthy affiliate and a fragile one, and it is invisible in a headcount report.

What Makes this Possible

Participation data tells you who showed up. This framework tells you what happened after they did. That shift, from counting activity to understanding how engagement actually functions, is what makes the difference between a report that describes a problem and one that names where the risk lives and what to do about it.

For a federated organization managing multiple affiliates, that distinction matters. A headcount report treats all affiliates as comparable. This framework makes the differences visible. An affiliate with 15 members and a density of 0.889 is not the same as an affiliate with 15 members and a density of 0.133. One is resilient. The other is one departure away from collapse. Without a relational lens, both look the same in a spreadsheet.
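The arithmetic behind "one departure away from collapse" can be checked directly. A 15-member network in which every tie runs through a single hub has exactly 14 of 105 possible ties, a density of 0.133. The sketch below models that as a hypothetical star graph (an illustration, not the project's Chicago data) and removes the hub.

```python
import networkx as nx

# Hypothetical 15-member affiliate: node 0 is the hub, nodes 1-14 are tied only to it
hub_and_spoke = nx.star_graph(14)
density = round(nx.density(hub_and_spoke), 3)  # 14 ties / 105 possible pairs
components_before = nx.number_connected_components(hub_and_spoke)

# Simulate the hub leaving the affiliate
without_hub = hub_and_spoke.copy()
without_hub.remove_node(0)
edges_after = without_hub.number_of_edges()
components_after = nx.number_connected_components(without_hub)

print(density, components_before)     # 0.133 1
print(edges_after, components_after)  # 0 14
```

Every remaining member becomes its own component: the network does not thin out, it disappears.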

What it makes possible in practice

Early risk identification. A bottleneck or fragmentation signal surfaces before a key member leaves, not after. The framework gives organizational leaders time to act rather than react.

Targeted support. An affiliate weighted toward Observers and Explorers needs different support than one weighted toward Regulars and Bridges. Knowing where members stand makes it possible to design interventions that meet affiliates where they are.

Leadership succession planning. When betweenness centrality is concentrated in one person, that person's departure is a structural risk. Naming that risk explicitly creates the conditions for intentional succession planning before it becomes a crisis.

Referral strategy. Knowing which members are bringing others in, and whether that activity is distributed or concentrated, makes it possible to invest in the right people rather than assuming growth will happen organically.

Take it Further

This project uses a synthetic dataset to demonstrate the framework. Applied to real organizational data, the methodology becomes a repeatable evaluation tool. A few directions worth exploring:

Tracking change over time. Running the analysis at regular intervals would surface whether affiliates are growing healthier or more fragile, and whether interventions are having the intended effect.

Integrating survey data. Co-attendance captures whether members showed up together. It does not capture the quality of those interactions. Pairing the network analysis with a short relational survey would add a layer of depth that attendance data alone cannot provide.

Building a live dashboard. The dbt pipeline is already running against BigQuery. Connecting it to a visualization layer would make the affiliate health metrics available in real time rather than as a periodic report.

The framework is designed to be extensible. The ladder levels, the interaction weights, and the health dimensions can all be adjusted to reflect the specific context of a given organization. What stays constant is the underlying question: not who showed up, but what happened after they did.
