Building a Guest WiFi Data Pipeline: Architecture Guide
Key Takeaways: A guest WiFi data pipeline transforms raw network events (probe requests, RADIUS sessions, portal submissions) into actionable marketing intelligence. The pipeline has five stages: collection, authentication, enrichment, storage, and activation. MyWiFi Networks handles the entire pipeline for resellers — from AP integration through campaign delivery — processing 75M+ guest connections across 54+ countries. Understanding the architecture helps resellers troubleshoot deployments, sell to technical buyers, and design integrations that extract maximum value from WiFi data.
Every guest WiFi connection generates data. A probe request from a device walking past the venue. A RADIUS session record when they connect. A portal submission when they authenticate. Session accounting records throughout their visit. A disconnect event when they leave.
The question is whether that data flows into a pipeline that produces marketing intelligence or evaporates as unprocessed log entries.
According to McKinsey's 2025 State of Data report, companies that build structured data pipelines from physical-world interactions achieve 2.4x higher customer retention rates than those relying on digital-only channels. WiFi data is the most accessible physical-world data source for brick-and-mortar venues — and building the pipeline correctly determines whether it delivers on that potential.
Pipeline architecture overview
The guest WiFi data pipeline has five stages. Each stage transforms raw signals into progressively more valuable data assets.
| Stage 1: Collection | Stage 2: Authentication | Stage 3: Enrichment | Stage 4: Storage | Stage 5: Activation |
|---|---|---|---|---|
| Probe requests | Portal login | Profile merge | Time-series | Email campaigns |
| RADIUS sessions | Email/SMS/WhatsApp | Device fingerprint | Guest profiles | SMS messages |
| AP associations | Social login | Visit history | Session store | Webhook triggers |
| DHCP events | Payment auth | Segmentation | Analytics DB | Ad retargeting |
Stage 1: Data collection
Data collection begins before the guest interacts with any portal. The access point generates network-level events that form the foundation of presence analytics.
Probe requests
Every WiFi-enabled device periodically broadcasts probe requests — frames that ask "is there a network nearby?" These frames contain the device's MAC address and supported capabilities. Access points that support probe request logging capture these frames from all devices in range, including those that never connect to the network.
Probe request data enables footfall analytics: how many devices (and by inference, people) are in the vicinity, how long they stay, and what percentage actually connect. According to Cisco's 2025 WiFi Deployment Guide, the ratio of detected probing devices to connected devices typically ranges from 3:1 in small venues to 10:1 in high-traffic public spaces.
Privacy note: MAC address randomization (default on iOS 14+, Android 10+) now applies to probe requests. Randomized MACs in probes mean probe-based analytics count unique events rather than unique devices. Cisco Meraki's CMX API and similar presence platforms apply statistical modeling to estimate unique visitors from randomized probe data.
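A pipeline can flag probable randomized addresses before counting devices. The sketch below relies on the fact that randomized MAC addresses set the locally administered bit (the second-least-significant bit of the first octet). The function name is illustrative, and this is a heuristic rather than how any particular presence platform works.

```python
def is_locally_administered(mac: str) -> bool:
    """Return True when the MAC has the locally administered bit set.

    Randomized (private) MAC addresses set this bit, so it is a common
    heuristic for separating randomized from burned-in addresses.
    """
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


print(is_locally_administered("da:a1:19:00:00:01"))  # True: 0xda has bit 0x02 set
print(is_locally_administered("3c:22:fb:00:00:01"))  # False: 0x3c does not
```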
RADIUS session events
When a device associates with the SSID and initiates authentication, the access point sends a RADIUS Access-Request. This is the first event in the authenticated data pipeline. The session begins formally when the RADIUS server returns an Access-Accept and the AP sends an Accounting-Start record.
DHCP fingerprinting
During the DHCP handshake (device requesting an IP address), the device sends a DHCP fingerprint — a set of parameters that identify the device type, operating system, and sometimes the manufacturer. DHCP fingerprinting provides device identification without requiring any guest interaction.
According to Fingerbank's 2025 device database, DHCP fingerprinting correctly identifies the device OS with 91% accuracy and the device manufacturer with 96% accuracy.
AP association metadata
The access point records which AP the device associated with, the signal strength (RSSI), and the radio band (2.4 GHz, 5 GHz, 6 GHz). In multi-AP venues, the AP association determines the guest's physical zone. Signal strength data can indicate proximity to the AP — closer devices show stronger RSSI values.
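As a rough sketch of how association metadata might map to a zone and a proximity bucket: the AP names, zone map, and RSSI thresholds below are all hypothetical and would need calibration per venue.

```python
from dataclasses import dataclass


@dataclass
class Association:
    ap_name: str
    rssi_dbm: int  # closer to 0 means a stronger signal
    band: str      # "2.4 GHz", "5 GHz", or "6 GHz"


# Hypothetical AP-to-zone map for a single venue.
AP_ZONES = {"ap-lobby": "Lobby", "ap-bar": "Bar", "ap-patio": "Patio"}


def locate(assoc: Association) -> tuple[str, str]:
    """Return (zone, proximity) for one association record."""
    zone = AP_ZONES.get(assoc.ap_name, "Unknown")
    # Illustrative thresholds only; real venues need on-site calibration.
    if assoc.rssi_dbm >= -55:
        proximity = "near"
    elif assoc.rssi_dbm >= -70:
        proximity = "mid"
    else:
        proximity = "far"
    return zone, proximity
```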
Stage 2: Authentication
Authentication is where anonymous network data becomes identified guest data. The captive portal is the collection mechanism, and the authentication method determines what data is captured.
Portal submission processing
When a guest completes the captive portal login, the portal platform processes the submission:
- Input validation: email format check, phone number normalization, social token verification
- Deduplication: check if this guest has been seen before (by email, phone, or social ID)
- Profile creation or update: new guests get a profile created; returning guests get their existing profile updated with a new visit record
- RADIUS authorization: the portal tells the RADIUS server to authorize the device MAC, and the AP grants internet access
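The first three steps can be sketched as follows. This is a simplified illustration, not MyWiFi's implementation: the function names, the in-memory `profiles` dict, and the default country code are assumptions, the email check is format-only, and the RADIUS authorization step is omitted.

```python
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # format check only


def normalize_email(raw: str) -> str:
    email = raw.strip().lower()
    if not EMAIL_RE.match(email):
        raise ValueError(f"invalid email: {raw!r}")
    return email


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Reduce a phone number to +<digits> (simplified, not full E.164 validation)."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    return "+" + default_country_code + digits


def process_submission(profiles: dict[str, dict], email: str, visit_ts: str) -> dict:
    """Deduplicate by normalized email, then create or update the profile."""
    key = normalize_email(email)
    profile = profiles.setdefault(key, {"email": key, "visits": []})
    profile["visits"].append(visit_ts)
    return profile
```

Normalizing before the lookup is what makes deduplication work: " Ana@Example.com " and "ana@example.com" resolve to the same profile key.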
Identity resolution
A single guest may authenticate differently across visits: email on the first visit, social login on the second, WhatsApp on the third. Identity resolution merges these into a single guest profile.
MyWiFi resolves identity using a priority chain: verified email → verified phone → social platform ID → device fingerprint. When a new authentication shares any of these identifiers with an existing profile, the profiles are merged.
This is critical for visit frequency metrics. Without identity resolution, a guest who uses email on visit one and Facebook on visit two appears as two separate guests. With resolution, that guest is correctly counted as one guest with two visits.
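A minimal sketch of a priority-chain lookup, assuming profiles are plain dicts and the identifier field names shown here (both hypothetical). It matches an incoming authentication against existing profiles in priority order and fills in any identifiers the matched profile lacks. Merging two pre-existing profiles that turn out to be the same guest is left out for brevity.

```python
PRIORITY = ("email", "phone", "social_id", "device_fingerprint")


def resolve(profiles: list[dict], auth: dict) -> dict:
    """Find or create the profile for an authentication event."""
    for field in PRIORITY:
        value = auth.get(field)
        if value is None:
            continue
        for profile in profiles:
            if profile.get(field) == value:
                # Add identifiers the profile does not have yet.
                for key in PRIORITY:
                    if auth.get(key) is not None:
                        profile.setdefault(key, auth[key])
                profile["visits"] += 1
                return profile
    new_profile = {k: auth[k] for k in PRIORITY if auth.get(k) is not None}
    new_profile["visits"] = 1
    profiles.append(new_profile)
    return new_profile


profiles: list[dict] = []
resolve(profiles, {"email": "ana@example.com", "device_fingerprint": "fp-1"})
guest = resolve(profiles, {"social_id": "fb:123", "device_fingerprint": "fp-1"})
print(len(profiles), guest["visits"])  # 1 2
```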
MAC-to-identity binding
The authenticated identity (email, phone, social profile) is bound to the device MAC address for the current session. Because modern devices randomize MAC addresses per network and may rotate them over time, this binding is treated as session-scoped: it identifies the guest during this visit but cannot be relied on to identify them on future visits without re-authentication.
The "Welcome Back" feature in MyWiFi uses a persistent cookie or browser storage to bypass re-authentication for returning guests on the same device. When the cookie is present, the system silently re-authenticates the guest and updates their profile with a new visit without showing the portal.
Stage 3: Data enrichment
Raw authentication and session data is enriched with derived metrics, device intelligence, and behavioral segmentation.
Device fingerprinting
The device's user agent string, DHCP fingerprint, and HTTP headers are parsed to extract:
- Device type (smartphone, tablet, laptop, wearable)
- Operating system and version (iOS 19.1, Android 16, Windows 11)
- Device manufacturer and model (Apple iPhone 16 Pro, Samsung Galaxy S26)
- Browser (Safari, Chrome, Samsung Internet)
According to DeviceAtlas's 2025 Mobile Web Intelligence Report, accurate device fingerprinting helps venue operators understand their guest demographic. For example, iPhone-dominant venues correlate with higher average spend: an Apple retail study reported that iPhone users spend 2.4x more on in-app purchases than Android users, which suggests higher disposable income.
Visit history computation
Each new session is appended to the guest's visit history. The enrichment layer computes:
- Visit frequency: visits per week/month/quarter
- Average dwell time: mean session duration across all visits
- Visit regularity: standard deviation of time between visits (identifies "every Tuesday" regulars vs. sporadic visitors)
- Recency: days since last visit (key churn predictor)
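These four metrics can be computed from a list of session start and end times. The sketch below is one possible formulation. The function name and return keys are illustrative, and it uses the population standard deviation of the gaps between visit starts as the regularity measure.

```python
from datetime import datetime, timedelta
from statistics import mean, pstdev


def visit_metrics(visits: list[tuple[datetime, datetime]], now: datetime) -> dict:
    """visits: (start, end) pairs sorted by start time."""
    starts = [s for s, _ in visits]
    gaps_days = [(b - a).total_seconds() / 86400 for a, b in zip(starts, starts[1:])]
    span_days = max((now - starts[0]).days, 1)  # avoid dividing by zero
    return {
        "visits_per_week": len(visits) / (span_days / 7),
        "avg_dwell_min": mean((e - s).total_seconds() / 60 for s, e in visits),
        # A low value means evenly spaced visits, e.g. an "every Tuesday" regular.
        "gap_stdev_days": pstdev(gaps_days) if len(gaps_days) >= 2 else None,
        "recency_days": (now - max(e for _, e in visits)).days,
    }
```

A guest who visits every Tuesday for 45 minutes produces a gap standard deviation of 0, while a sporadic visitor with the same visit count produces a much larger value.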
Behavioral segmentation
Based on enriched visit data, guests are automatically segmented:
| Segment | Criteria | Marketing Implication |
|---|---|---|
| New | 1 visit | Welcome sequence, first-visit offer |
| Returning | 2-5 visits | Loyalty program invitation |
| Regular | 6+ visits, weekly cadence | VIP recognition, referral incentives |
| Lapsed | No visit in 30+ days | Win-back campaign |
| High-dwell | Avg session > 60 min | Upsell opportunities, premium content |
| Drive-by | Avg session < 5 min | Different from true guests — may be staff or passersby |
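The table's criteria overlap (a Regular can also be Lapsed), so a classifier has to decide how to combine them. The sketch below assigns one lifecycle segment plus any behavioral tags that apply. That combination rule, the function signature, and the handling of guests with 6+ visits but no weekly cadence are assumptions, not documented platform behavior.

```python
def segments(visit_count: int, days_since_last: int,
             avg_session_min: float, weekly_cadence: bool) -> list[str]:
    """Return the segment labels that apply to a guest with at least one visit."""
    tags = []
    if visit_count == 1:
        tags.append("New")
    elif visit_count >= 6 and weekly_cadence:
        tags.append("Regular")
    else:
        # Assumption: 6+ visits without a weekly cadence stay "Returning".
        tags.append("Returning")
    if days_since_last >= 30:
        tags.append("Lapsed")
    if avg_session_min > 60:
        tags.append("High-dwell")
    elif avg_session_min < 5:
        tags.append("Drive-by")
    return tags
```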
Social profile enrichment
When guests authenticate via social login, the OAuth response may include profile photo, gender, age range, and interests. This data enriches the guest profile without additional form fields. What is returned depends on the provider and on the permissions the guest grants. For example, Facebook Graph API returns name, email, and profile photo. Google returns name, email, and locale. LinkedIn returns name, email, job title, and company.
Stage 4: Storage
The data pipeline requires multiple storage layers optimized for different access patterns.
Time-series store (sessions)
Session data (start time, end time, duration, bandwidth, AP) is time-series data. It is written once and queried by time range. Time-series databases (InfluxDB, TimescaleDB) or time-partitioned SQL tables handle this efficiently.
MyWiFi stores session data in a time-partitioned PostgreSQL architecture on AWS. Queries for "sessions in the last 7 days at location X" scan only the relevant partition rather than the entire session history.
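To illustrate why partitioning limits the scan, the sketch below computes which partitions a date-range query would touch. Monthly partitions and the `sessions_YYYY_MM` naming are assumptions made for the example; the actual partition granularity is not stated here.

```python
from datetime import date, timedelta


def partition_name(day: date) -> str:
    """Hypothetical naming scheme: one partition per calendar month."""
    return f"sessions_{day:%Y_%m}"


def partitions_for_range(start: date, end: date) -> list[str]:
    """List the partitions a query over [start, end] needs to scan."""
    names: list[str] = []
    day = start
    while day <= end:
        name = partition_name(day)
        if name not in names:
            names.append(name)
        day += timedelta(days=1)
    return names


# A 7-day window touches at most two monthly partitions,
# regardless of how many years of history exist.
print(partitions_for_range(date(2025, 1, 28), date(2025, 2, 3)))
```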
Profile store (guests)
Guest profiles are mutable documents updated on every visit. The profile store needs fast reads by guest ID, efficient search across multiple fields (email, phone, name), and support for nested data (visit history array, tag array, custom fields).
Analytics store (aggregates)
Pre-computed aggregates power dashboard widgets without querying raw data. Hourly, daily, and weekly rollups of connection counts, unique visitors, average dwell time, and capture rates are computed by background jobs and stored in an analytics table. Dashboard queries hit the pre-computed aggregates, not the raw session table.
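A background rollup job of this kind might look like the following sketch. The session field names (`location`, `guest_id`, `start`, `duration_min`) are hypothetical, and a production job would write the result to the analytics table rather than return it.

```python
from collections import defaultdict
from datetime import datetime


def hourly_rollup(sessions: list[dict]) -> dict:
    """Aggregate raw sessions into per-location, per-hour summary rows."""
    buckets = defaultdict(lambda: {"connections": 0, "guests": set(), "total_min": 0.0})
    for s in sessions:
        hour = s["start"].replace(minute=0, second=0, microsecond=0)
        b = buckets[(s["location"], hour)]
        b["connections"] += 1
        b["guests"].add(s["guest_id"])
        b["total_min"] += s["duration_min"]
    return {
        key: {
            "connections": b["connections"],
            "unique_visitors": len(b["guests"]),
            "avg_dwell_min": b["total_min"] / b["connections"],
        }
        for key, b in buckets.items()
    }
```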
This three-layer storage model (time-series sessions, mutable profiles, pre-computed aggregates) is common across analytics platforms. According to a 2025 Databricks survey, 73% of analytics applications use a similar tiered storage architecture.
Data retention
Storage must align with data retention policy. GDPR requires a defined retention period with automated deletion. MyWiFi supports configurable retention periods per location — the platform automatically purges guest data older than the configured threshold. Typical retention periods: 12 months for active marketing, 24 months for analytics, 36 months for regulatory compliance. See our data retention policy template for implementation guidance.
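A per-location purge can be sketched as a filter over guest records. The record fields, the function name, and the 365-day default are assumptions for illustration; a real job would issue deletes against every storage layer rather than filter a list.

```python
from datetime import datetime, timedelta


def purge_expired(guests: list[dict], retention_days: dict[str, int],
                  now: datetime, default_days: int = 365) -> list[dict]:
    """Return only the guests still inside their location's retention window."""
    kept = []
    for g in guests:
        limit = timedelta(days=retention_days.get(g["location"], default_days))
        if now - g["last_seen"] <= limit:
            kept.append(g)
    return kept
```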
Stage 5: Activation
Activation is where stored data produces business outcomes. The pipeline's value is measured at this stage.
Email marketing automation
Guest profiles with verified email addresses feed automated email campaigns:
- Welcome email: triggered by the guest.new event, sent 1 hour after first visit
- Return incentive: triggered when a guest has not visited in 14 days
- Birthday offer: triggered by date match on birthday field
- Review request: triggered 24 hours after a visit of 30+ minutes
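The time-based triggers above can be expressed as send times computed when a visit ends. This is a sketch under stated assumptions: delays are measured from the end of the visit, the job names are made up, the birthday trigger is omitted, and the return incentive would be cancelled if the guest comes back before it fires.

```python
from datetime import datetime, timedelta


def schedule_after_visit(visit_end: datetime, duration_min: float,
                         is_first_visit: bool) -> list[tuple[str, datetime]]:
    """Return (campaign, send_at) pairs to enqueue after a visit."""
    jobs = []
    if is_first_visit:
        jobs.append(("welcome", visit_end + timedelta(hours=1)))
    if duration_min >= 30:
        jobs.append(("review_request", visit_end + timedelta(hours=24)))
    # Should be sent only if no newer visit exists when the send time arrives.
    jobs.append(("return_incentive", visit_end + timedelta(days=14)))
    return jobs
```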
According to Mailchimp's 2025 Email Marketing Benchmarks, automated triggered emails produce roughly double the open rate (45% vs. 21%) and 6x higher click rates than broadcast campaigns. WiFi-triggered emails outperform even those benchmarks because they are tied to a real-world action (visiting a venue).
SMS / WhatsApp campaigns
Verified phone numbers from SMS OTP or WhatsApp login enable mobile messaging campaigns. WhatsApp messages see 98% open rates (Meta Business Messaging Report, 2025), compared to 21% for email and 45% for SMS.
MyWiFi's marketing automation workflows support all three channels with trigger-based sequencing.
Ad retargeting
Guest email lists sync to Facebook Custom Audiences and Google Customer Match for paid advertising retargeting. A venue's WiFi guest list becomes a targetable advertising audience.
According to Meta's 2025 Advertising Performance Report, Custom Audiences built from verified first-party data produce 3.2x higher ROAS (Return on Ad Spend) than interest-based targeting.
Webhook-driven integrations
For clients with existing marketing infrastructure, webhooks push real-time events to external systems. The guest.new webhook can trigger workflows in HubSpot, Salesforce, ActiveCampaign, or any HTTP endpoint. See the API integration guide for implementation details.
Analytics and reporting
Aggregated pipeline data feeds analytics dashboards: daily traffic trends, capture rate by authentication method, dwell time distribution, peak hour heatmaps, visit frequency cohorts, and revenue attribution. Automated reports can be scheduled for client delivery.
Pipeline reliability
Handling data gaps
Network interruptions, AP reboots, and connectivity issues create data gaps. A reliable pipeline accounts for this:
- RADIUS accounting interim updates (every 5 minutes by default) ensure that even if the session-stop record is lost, the pipeline has data up to the last interim update
- Session timeout detection: if no accounting records arrive for a session within 2x the interim interval, the pipeline infers a disconnect
- Replay capability: some hardware vendors (Meraki, Aruba) buffer accounting records locally and replay them when WAN connectivity is restored
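The timeout rule can be sketched as a periodic sweep over open sessions. The data shape (a map from session ID to the time of its last accounting record) and the function name are assumptions; the inferred end time is taken to be the last record seen.

```python
from datetime import datetime, timedelta

INTERIM_INTERVAL = timedelta(minutes=5)


def infer_disconnects(last_record_at: dict[str, datetime],
                      now: datetime) -> dict[str, datetime]:
    """Map session ID to inferred end time for sessions that have gone silent."""
    cutoff = 2 * INTERIM_INTERVAL
    return {
        session_id: seen
        for session_id, seen in last_record_at.items()
        if now - seen > cutoff
    }
```

Using the last record as the end time means a lost stop record costs at most one interim interval of session duration.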
Deduplication
Network events can produce duplicate records (AP failover, RADIUS retransmission). The pipeline deduplicates using session ID and event timestamp. Duplicate detection prevents inflated session counts and bandwidth metrics.
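A minimal sketch of that deduplication, assuming each record is a dict with `session_id` and `timestamp` fields (hypothetical names). The first occurrence of each key is kept and later copies are dropped.

```python
def deduplicate(records: list[dict]) -> list[dict]:
    """Drop records whose (session_id, timestamp) pair was already seen."""
    seen = set()
    unique = []
    for record in records:
        key = (record["session_id"], record["timestamp"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```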
Data quality monitoring
Production pipelines need monitoring for data quality:
- Capture rate anomaly detection: if a location's capture rate drops below its 30-day average by more than 2 standard deviations, alert the reseller (likely indicates a portal or AP configuration issue)
- Session duration outliers: sessions exceeding 24 hours are likely stale sessions from unprocessed disconnect events
- Authentication failure rate: a spike in authentication failures may indicate a RADIUS misconfiguration or portal error
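The capture rate check reduces to a comparison against the trailing mean and standard deviation. The sketch below assumes `history` holds one capture rate per day for the trailing 30 days; the function name and the use of the sample standard deviation are choices made for illustration.

```python
from statistics import mean, stdev


def capture_rate_alert(history: list[float], today: float,
                       threshold_sd: float = 2.0) -> bool:
    """True when today's rate is more than threshold_sd deviations below average."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    avg, sd = mean(history), stdev(history)
    return today < avg - threshold_sd * sd
```

Only drops trigger the alert. An unusually high capture rate is not treated as a problem here.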
Scaling considerations
Small deployment (1-10 locations, < 1,000 guests/month)
The pipeline can run on a single server or serverless architecture. Database load is minimal. Batch processing (hourly aggregation) is sufficient for analytics.
Medium deployment (10-100 locations, 10,000-100,000 guests/month)
Database partitioning becomes important. Session tables should be time-partitioned. Analytics aggregation should be incremental (compute only the new data rather than reprocessing the full history). API rate limits and concurrent webhook delivery become concerns.
Large deployment (100+ locations, 1M+ guests/month)
Stream processing (Kafka, Kinesis) replaces batch processing for real-time analytics. Database sharding by location or region distributes load. Webhook delivery needs a queue (SQS, RabbitMQ) to handle burst traffic. MyWiFi's platform architecture handles this scale for Enterprise resellers.
Building vs. buying
Resellers face a build-vs-buy decision at every pipeline stage. The analysis:
Build the pipeline from scratch: Requires RADIUS server management, portal development, database engineering, analytics computation, and campaign delivery infrastructure. Realistic development timeline: 6-12 months for a minimum viable pipeline. Ongoing maintenance: 1-2 full-time engineers.
Buy (use MyWiFi): The entire five-stage pipeline is managed. Portal configuration takes minutes. Analytics are pre-computed. Campaign automation is built in. API access enables custom extensions at Stage 5 without rebuilding Stages 1-4.
Hybrid: Use MyWiFi for Stages 1-4 (collection through storage), then use the API and webhooks to build custom Stage 5 (activation) integrations tailored to each client's stack.
The hybrid approach is where most sophisticated resellers land. The platform handles the engineering-heavy pipeline infrastructure; the reseller adds value through custom integrations, vertical-specific workflows, and white-labeled reporting.
FAQ
How does MAC randomization affect the data pipeline? MAC randomization means a device presents a different MAC address to each network and may rotate it over time. The pipeline cannot use MAC alone as a persistent guest identifier. Captive portal authentication provides the persistent identity. The pipeline binds the randomized MAC to the authenticated profile for session-scoped tracking.
What happens to pipeline data if a guest opts out? The pipeline must support data deletion. When a guest unsubscribes or requests data deletion under GDPR/CCPA, all stages of the pipeline must purge the guest's data — profile, session history, campaign records, and any cached copies. MyWiFi's GDPR-mode automates this.
How real-time is the data pipeline? Portal authentication data is available immediately (sub-second). RADIUS accounting data is available at the interim update interval (typically 5 minutes). Pre-computed analytics update on a schedule (hourly for traffic data, daily for demographic breakdowns). Webhooks fire in real time on portal events.
Can the pipeline handle offline periods? If the WAN connection between the AP and cloud platform drops, guest authentication will fail (guests cannot connect to the portal). Some hardware supports a local fallback mode — granting access without authentication during outages. Session data may be buffered locally and replayed when connectivity returns, depending on the hardware vendor.
What data volume does a typical deployment generate? A single location with 500 daily guests generates approximately 500 guest records per day, 2,000-5,000 session events per day (including interim updates), and 10-50 MB of raw data per day. A 100-location deployment generates approximately 500 GB of raw data per year before aggregation and retention policies.
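As a quick consistency check on those figures, the arithmetic below scales the per-location daily range to 100 locations over a year and works backward from the 500 GB estimate. It assumes decimal units (1 GB = 1,000 MB) and that every location resembles the 500-guest example.

```python
def yearly_raw_gb(locations: int, mb_per_location_per_day: float) -> float:
    """Raw data per year in decimal GB."""
    return locations * mb_per_location_per_day * 365 / 1000


low, high = yearly_raw_gb(100, 10), yearly_raw_gb(100, 50)
print(low, high)  # 365.0 1825.0

# Daily volume per location implied by the 500 GB/year estimate.
implied_mb_per_day = 500 * 1000 / (100 * 365)
print(round(implied_mb_per_day, 1))  # 13.7
```

The 500 GB figure therefore sits near the low end of the 10-50 MB per day range.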