Data Model Reference (Enterprise Single Tenant)

Last updated 16 days ago

Introduction

If your organization is on an Enterprise Single Tenant plan, you have direct access to your analytics data in Google BigQuery. This gives you the flexibility to:

Build custom dashboards in your BI tool of choice (Looker, Tableau, Power BI, Metabase, etc.)
Run advanced SQL queries for deep-dive analysis
Export data for compliance, auditing, or data science workflows
Integrate allmates.ai data with your existing data warehouse

This guide provides a complete reference to the mates_analytics BigQuery dataset, including all tables, views, and relationships.

Architecture Overview

Data Flow

Firestore (Live Data) ↓ Eventarc triggers Cloud Functions (Change Capture) ↓ Streaming inserts all_raw_changelog (Raw Events) ↓ Hourly MERGE queries tbl_*_latest (Physical Tables) ↓ SQL Views dim_* (Dimensions) + fact_* (Facts) ↓ Your BI Tool / Custom Queries

Key Properties


GCP Project	Provided by allmates.ai (e.g., `allmates-enterprise-{your-org}`)
Dataset	`mates_analytics`
Region	`gcp project region`
Refresh Cadence	Every 1 hour (incremental updates)
Schema Style	Kimball Dimensional Model (star schema)
Data Retention	Full history (no automatic deletion)

Note: Your dedicated GCP project ID will be provided by allmates.ai during onboarding. Throughout this guide, we use {YOUR_GCP_PROJECT} as a placeholder.

Getting Started

1. Access Requirements

To query the data, you need:

GCP Project ID — Provided by allmates.ai
IAM Role: roles/bigquery.dataViewer on the mates_analytics dataset
IAM Role: roles/bigquery.jobUser on the project to run queries
Service Account Key (JSON) for programmatic access

Contact your allmates.ai account manager to request access credentials.

2. Connecting to BigQuery

From the BigQuery Console

Go to BigQuery Console
Select your project: {YOUR_GCP_PROJECT} (provided by allmates.ai)
Navigate to dataset: mates_analytics
Browse tables and views in the left sidebar

From a BI Tool

Most BI tools support BigQuery via:

Native connector (Looker, Tableau, Power BI, Metabase)
JDBC/ODBC driver (for other tools)

Connection settings:

Project ID: {YOUR_GCP_PROJECT} (provided by allmates.ai)
Dataset: mates_analytics
Location: europe-west1
Authentication: Service account JSON key (provided by allmates.ai)

3. Multi-Tenant Data Isolation

All data is scoped by organization_id.

⚠️ Critical: Always filter on organization_id in your queries to:

Avoid scanning data from other organizations (if you have multi-org access)
Leverage BigQuery clustering for performance
Respect data isolation

sql

SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 30

How to find your organization_id:

Check the URL in the allmates.ai app: app.allmates.ai/o/{organization_id}/...
Or query: SELECT organization_id, display_name FROM dim_organization

Data Model Structure

The mates_analytics dataset contains three types of objects:

1. Physical Tables (`tbl_*_latest`)

18 tables that store the latest state of each entity from Firestore.

Refreshed hourly via MERGE operations
Contain raw data with minimal transformations
Include audit columns: last_synced_at, last_operation, firestore_path

Naming convention: tbl_{entity}_latest

Examples:

tbl_organization_latest
tbl_message_latest
tbl_agent_latest

2. Dimension Views (`dim_*`)

13 views that sit on top of physical tables and add:

Date derivations (e.g., created_date, updated_date)
Extracted fields (e.g., connection_id from Firestore references)
Cleaner, more stable interfaces for joins

Naming convention: dim_{entity}

Examples:

dim_organization
dim_member
dim_agent

Best practice: Always join facts to dimensions, not to physical tables.

3. Fact Views (`fact_*`)

12 views that contain business metrics and are the primary tables for analytics.

Each fact has a defined grain (what one row represents).

Naming convention: fact_{metric}

Examples:

fact_message_tokens — Token usage and cost per AI message
fact_conversation — Conversation-level aggregated metrics
fact_tool_calls — Tool invocation events
fact_message_cost — Unified, canonical cost per message (real provider billing when available, otherwise estimated from historized model prices)

Best practice: Start your queries from fact views for pre-computed metrics.

Key Tables Reference

Core Entities


`dim_organization`	Organizations (tenants)	One row per organization
`dim_workspace`	Workspaces within organizations	One row per workspace
`dim_project`	Projects (conversations)	One row per conversation
`dim_user`	Global user accounts	One row per user
`dim_member`	Organization memberships	One row per user-org pair
`dim_agent`	AI agents (Mates)	One row per agent
`dim_model`	LLM model configurations	One row per model
`dim_connection`	API connections to LLM providers	One row per connection
`dim_tool_instance`	Tool instances	One row per tool

Fact Tables


`fact_message_tokens`	Token usage and cost per AI message	One row per agent message	`input_tokens`, `output_tokens`, `estimated_cost_usd`
`fact_message`	All messages with lineage tracking	One row per message (human + agent)	`message_count`, `chain_depth`, `message_type`
`fact_conversation`	Conversation-level aggregations	One row per conversation	`total_message_count`, `unique_human_participants`, `agent_to_human_ratio`
`fact_tool_calls`	Tool invocation events	One row per tool call	`status`, `duration_seconds`, `cost`
`fact_attachment`	File attachment metrics	One row per attachment	`file_size_bytes`, `token_count`, `page_count`
`fact_subscription_snapshot`	Active subscriptions (MRR/ARR)	One row per active subscription	`mrr_amount`, `arr_amount`
`fact_meter_usage`	Polar meter usage	One row per meter record	`consumed_units`, `balance_units`

Entity Relationships

Hierarchy

Organization ├── Workspace │ └── Project (Conversation) │ ├── Message │ ├── Attachment │ └── Tool Call ├── Member (User in Organization) ├── Agent (Mate) ├── Model ├── Connection └── Tool Instance

Key Foreign Keys

`workspace`	`organization_id`	`organization.organization_id`
`project`	`workspace_id`	`workspace.workspace_id`
`member`	`organization_id`	`organization.organization_id`
`member`	`user_id`	`user.user_id`
`agent`	`organization_id`	`organization.organization_id`
`agent`	`model_id`	`model.model_id`
`message`	`project_id`	`project.project_id`
`message`	`sender_member_id`	`member.member_id`
`tools_call`	`tool_instance_id`	`tool_instance.tool_instance_id`
`tools_call`	`agent_id`	`agent.agent_id`

Common Use Cases & Queries

Note: Replace {YOUR_GCP_PROJECT} with your actual GCP project ID provided by allmates.ai.

1. Token Usage by Organization (Last 30 Days)

Use case: Track AI consumption and costs across organizations.

sql

SELECT o.display_name AS organization_name, COUNT(*) AS message_count, SUM(f.input_tokens) AS total_input_tokens, SUM(f.output_tokens) AS total_output_tokens, SUM(f.total_tokens) AS total_tokens, ROUND(SUM(f.estimated_cost_usd), 2) AS total_cost_usd FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` f JOIN `{YOUR_GCP_PROJECT}.mates_analytics.dim_organization` o USING (organization_id) WHERE f.event_date >= CURRENT_DATE - 30 GROUP BY o.display_name ORDER BY total_cost_usd DESC

2. Daily Active Users

Use case: Track user engagement over time.

sql

SELECT event_date, COUNT(DISTINCT sender_member_id) AS active_users, COUNT(*) AS total_messages, ROUND(COUNT(*) / COUNT(DISTINCT sender_member_id), 1) AS messages_per_user FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message` WHERE organization_id = 'YOUR_ORG_ID' AND sender_type = 'human' AND event_date >= CURRENT_DATE - 30 GROUP BY event_date ORDER BY event_date

3. Most Active Conversations

Use case: Identify high-engagement conversations.

sql

SELECT project_id, conversation_name, total_message_count, unique_human_participants, unique_agents_solicited, TIMESTAMP_DIFF(last_message_at, first_message_at, HOUR) AS duration_hours, activity_status, engagement_category FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_conversation` WHERE organization_id = 'YOUR_ORG_ID' AND activity_status IN ('ACTIVE_NOW', 'ACTIVE_TODAY') ORDER BY total_message_count DESC LIMIT 20

4. Tool Performance Analysis

Use case: Monitor tool reliability and performance.

sql

SELECT tool_name, COUNT(*) AS call_count, COUNTIF(is_successful) AS success_count, ROUND(COUNTIF(is_successful) / COUNT(*) * 100, 1) AS success_rate_pct, ROUND(AVG(duration_seconds), 2) AS avg_duration_sec, ROUND(SUM(cost), 2) AS total_cost_usd FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_tool_calls` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 7 GROUP BY tool_name ORDER BY call_count DESC

5. Cost by LLM Model

Use case: Understand which models are driving costs.

sql

SELECT provider, physical_model_label AS model_name, COUNT(*) AS message_count, SUM(total_tokens) AS total_tokens, ROUND(SUM(final_total_cost_usd), 2) AS total_cost_usd, ROUND(AVG(total_tokens), 0) AS avg_tokens_per_response FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_cost` WHERE organization_id = 'YOUR_ORG_ID' AND created_date >= CURRENT_DATE - 30 GROUP BY provider, physical_model_label ORDER BY total_cost_usd DESC

6. Multi-Agent Chain Analysis

Use case: Understand agent-to-agent collaboration patterns.

sql

SELECT project_id, COUNT(*) AS total_agent_messages, COUNTIF(is_agent_response) AS direct_responses, COUNTIF(is_in_agent_chain) AS chain_messages, MAX(messages_since_last_human) AS max_chain_depth, ROUND(AVG(messages_since_last_human), 1) AS avg_chain_depth FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 7 AND sender_type = 'agent' GROUP BY project_id HAVING COUNT(*) > 10 ORDER BY max_chain_depth DESC LIMIT 20

7. File Upload Trends

Use case: Track file usage and storage consumption.

sql

SELECT event_date, COUNT(*) AS files_uploaded, ROUND(SUM(file_size_mb), 2) AS total_size_mb, ROUND(AVG(file_size_mb), 2) AS avg_file_size_mb, SUM(token_count) AS total_tokens_extracted FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_attachment` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 30 GROUP BY event_date ORDER BY event_date

8. Monthly Recurring Revenue (MRR)

Use case: Track subscription revenue.

sql

SELECT o.display_name AS organization_name, f.product_name, SUM(f.mrr_amount) AS total_mrr_usd, SUM(f.arr_amount) AS total_arr_usd, COUNT(*) AS active_subscriptions FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_subscription_snapshot` f JOIN `{YOUR_GCP_PROJECT}.mates_analytics.dim_organization` o USING (organization_id) WHERE f.is_active = TRUE GROUP BY o.display_name, f.product_name ORDER BY total_mrr_usd DESC

Performance Optimization Tips

1. Always Filter on `organization_id`

All tables are clustered on organization_id as the first clustering column. Always include it in your WHERE clause for optimal performance.

sql

-- ✅ Good: Uses clustering SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 7 -- ❌ Bad: Full table scan SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE sender_display_name = 'Alice'

2. Use Date Filters

Most fact tables include an event_date column. Always filter on date ranges to reduce data scanned.

sql

-- ✅ Good: Scans only 30 days WHERE event_date >= CURRENT_DATE - 30 -- ❌ Bad: Scans entire history WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)

3. Query Fact Views, Not Physical Tables

Fact views pre-compute joins, categories, and metrics. They are optimized for analytics queries.

sql

-- ✅ Good: Use fact view SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID' -- ❌ Bad: Manual joins SELECT m.*, mm.input_tokens, mm.output_tokens FROM `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_latest` m LEFT JOIN `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_metadata_latest` mm ON m.message_id = mm.message_id WHERE m.organization_id = 'YOUR_ORG_ID'

4. Leverage Clustering Columns

Each table has specific clustering columns for optimal query performance.

Key clustering columns:


`tbl_message_latest`	`organization_id`, `project_id`
`tbl_agent_latest`	`organization_id`, `status`
`tbl_model_latest`	`organization_id`, `provider`
`tbl_tools_call_latest`	`organization_id`, `status`

Example:

sql

-- ✅ Good: Uses both clustering columns SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_latest` WHERE organization_id = 'YOUR_ORG_ID' AND project_id = 'PROJECT_123' -- ⚠️ OK: Uses first clustering column only SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_latest` WHERE organization_id = 'YOUR_ORG_ID'

5. Use `LIMIT` for Exploratory Queries

When exploring data, always use LIMIT to avoid scanning large datasets.

sql

-- ✅ Good: Limits results SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID' LIMIT 100 -- ❌ Bad: Scans entire table SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID'

Data Freshness & Updates

Refresh Schedule

Physical tables (tbl_*_latest): Updated every 1 hour via MERGE operations
Dimension views (dim_*): Real-time (views on top of physical tables)
Fact views (fact_*): Real-time (views on top of physical tables)

Data Latency

Typical latency: 1–2 hours from event occurrence to availability in BigQuery
Peak latency: Up to 3 hours during high-volume periods

Checking Last Update

Each physical table has a last_synced_at column showing when the row was last updated.

sql

SELECT organization_id, MAX(last_synced_at) AS last_update FROM `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_latest` WHERE organization_id = 'YOUR_ORG_ID' GROUP BY organization_id

Cost Estimation

Pricing Model

BigQuery charges based on:

Storage: $0.02 per GB per month (long-term storage: $0.01 per GB)
Queries: $5 per TB scanned

Typical Query Costs


Single org, 7 days	~10 MB	$0.00005
Single org, 30 days	~50 MB	$0.00025
Single org, 1 year	~500 MB	$0.0025
All orgs, 30 days	~5 GB	$0.025

Cost optimization tips:

Always filter on organization_id and event_date
Use LIMIT for exploratory queries
Avoid SELECT * — specify only needed columns
Use materialized views for frequently run queries

Appendix: Complete Table List

Physical Tables (21)

tbl_organization_latest
tbl_workspace_latest
tbl_project_latest
tbl_user_latest
tbl_member_latest
tbl_agent_latest
tbl_model_latest
tbl_connection_latest
tbl_tool_instance_latest
tbl_message_latest
tbl_message_metadata_latest
tbl_chat_metadata_latest
tbl_tools_call_latest
tbl_attachment_latest
tbl_polar_customer_latest
tbl_polar_subscription_latest
tbl_polar_meter_latest
tbl_invitation_latest
tbl_model_daily
tbl_polar_event_latest
tbl_polar_credit_purchase

Dimension Views (14)

dim_organization
dim_workspace
dim_project
dim_user
dim_member
dim_agent
dim_model
dim_connection
dim_tool_instance
dim_provider
dim_customer
dim_subscription
dim_invitation
dim_org_pricing

Fact Views (12)

fact_message_tokens
fact_message
fact_conversation
fact_tool_calls
fact_chat_tokens
fact_attachment
fact_subscription_snapshot
fact_meter_usage
fact_message_errors
fact_message_cost
fact_polar_credit_usage
fact_org_daily_credit_price

Reference Tables (1)

ref_credit_pricing

Introduction

Architecture Overview

Data Flow

Key Properties

Getting Started

1. Access Requirements

2. Connecting to BigQuery

From the BigQuery Console

From a BI Tool

3. Multi-Tenant Data Isolation

Data Model Structure

1. Physical Tables (tbl_*_latest)

2. Dimension Views (dim_*)

3. Fact Views (fact_*)

Key Tables Reference

Core Entities

Fact Tables

Entity Relationships

Hierarchy

Key Foreign Keys

Common Use Cases & Queries

1. Token Usage by Organization (Last 30 Days)

2. Daily Active Users

3. Most Active Conversations

4. Tool Performance Analysis

5. Cost by LLM Model

6. Multi-Agent Chain Analysis

7. File Upload Trends

8. Monthly Recurring Revenue (MRR)

Performance Optimization Tips

1. Always Filter on organization_id

2. Use Date Filters

3. Query Fact Views, Not Physical Tables

4. Leverage Clustering Columns

5. Use LIMIT for Exploratory Queries

Data Freshness & Updates

Refresh Schedule

Data Latency

Checking Last Update

Cost Estimation

Pricing Model

Typical Query Costs

Appendix: Complete Table List

Physical Tables (21)

Dimension Views (14)

Fact Views (12)

Reference Tables (1)

1. Physical Tables (`tbl_*_latest`)

2. Dimension Views (`dim_*`)

3. Fact Views (`fact_*`)

1. Always Filter on `organization_id`

5. Use `LIMIT` for Exploratory Queries