Data Model Reference (Enterprise Single Tenant)

Last updated 9 days ago

Introduction

If your organization is on an Enterprise Single Tenant plan, you have direct access to your analytics data in Google BigQuery. This gives you the flexibility to:

  • Build custom dashboards in your BI tool of choice (Looker, Tableau, Power BI, Metabase, etc.)

  • Run advanced SQL queries for deep-dive analysis

  • Export data for compliance, auditing, or data science workflows

  • Integrate allmates.ai data with your existing data warehouse

This guide provides a complete reference to the mates_analytics BigQuery dataset, including all tables, views, and relationships.

Architecture Overview

Data Flow

Firestore (Live Data) ↓ Eventarc triggers Cloud Functions (Change Capture) ↓ Streaming inserts all_raw_changelog (Raw Events) ↓ Hourly MERGE queries tbl_*_latest (Physical Tables) ↓ SQL Views dim_* (Dimensions) + fact_* (Facts) ↓ Your BI Tool / Custom Queries

Key Properties

GCP Project

Provided by allmates.ai (e.g., allmates-enterprise-{your-org})

Dataset

mates_analytics

Region

gcp project region

Refresh Cadence

Every 1 hour (incremental updates)

Schema Style

Kimball Dimensional Model (star schema)

Data Retention

Full history (no automatic deletion)

Note: Your dedicated GCP project ID will be provided by allmates.ai during onboarding. Throughout this guide, we use {YOUR_GCP_PROJECT} as a placeholder.

Getting Started

1. Access Requirements

To query the data, you need:

  • GCP Project ID — Provided by allmates.ai

  • IAM Role: roles/bigquery.dataViewer on the mates_analytics dataset

  • IAM Role: roles/bigquery.jobUser on the project to run queries

  • Service Account Key (JSON) for programmatic access

Contact your allmates.ai account manager to request access credentials.

2. Connecting to BigQuery

From the BigQuery Console

  1. Go to BigQuery Console

  2. Select your project: {YOUR_GCP_PROJECT} (provided by allmates.ai)

  3. Navigate to dataset: mates_analytics

  4. Browse tables and views in the left sidebar

From a BI Tool

Most BI tools support BigQuery via:

  • Native connector (Looker, Tableau, Power BI, Metabase)

  • JDBC/ODBC driver (for other tools)

Connection settings:

  • Project ID: {YOUR_GCP_PROJECT} (provided by allmates.ai)

  • Dataset: mates_analytics

  • Location: europe-west1

  • Authentication: Service account JSON key (provided by allmates.ai)

3. Multi-Tenant Data Isolation

All data is scoped by organization_id.

⚠️ Critical: Always filter on organization_id in your queries to:

  • Avoid scanning data from other organizations (if you have multi-org access)

  • Leverage BigQuery clustering for performance

  • Respect data isolation

sql

SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 30

How to find your organization_id:

  • Check the URL in the allmates.ai app: app.allmates.ai/o/{organization_id}/...

  • Or query: SELECT organization_id, display_name FROM dim_organization

Data Model Structure

The mates_analytics dataset contains three types of objects:

1. Physical Tables (tbl_*_latest)

18 tables that store the latest state of each entity from Firestore.

  • Refreshed hourly via MERGE operations

  • Contain raw data with minimal transformations

  • Include audit columns: last_synced_at, last_operation, firestore_path

Naming convention: tbl_{entity}_latest

Examples:

  • tbl_organization_latest

  • tbl_message_latest

  • tbl_agent_latest

2. Dimension Views (dim_*)

13 views that sit on top of physical tables and add:

  • Date derivations (e.g., created_date, updated_date)

  • Extracted fields (e.g., connection_id from Firestore references)

  • Cleaner, more stable interfaces for joins

Naming convention: dim_{entity}

Examples:

  • dim_organization

  • dim_member

  • dim_agent

Best practice: Always join facts to dimensions, not to physical tables.

3. Fact Views (fact_*)

9 views that contain business metrics and are the primary tables for analytics.

Each fact has a defined grain (what one row represents).

Naming convention: fact_{metric}

Examples:

  • fact_message_tokens — Token usage and cost per AI message

  • fact_conversation — Conversation-level aggregated metrics

  • fact_tool_calls — Tool invocation events

Best practice: Start your queries from fact views for pre-computed metrics.

Key Tables Reference

Core Entities

dim_organization

Organizations (tenants)

One row per organization

dim_workspace

Workspaces within organizations

One row per workspace

dim_project

Projects (conversations)

One row per conversation

dim_user

Global user accounts

One row per user

dim_member

Organization memberships

One row per user-org pair

dim_agent

AI agents (Mates)

One row per agent

dim_model

LLM model configurations

One row per model

dim_connection

API connections to LLM providers

One row per connection

dim_tool_instance

Tool instances

One row per tool

Fact Tables

fact_message_tokens

Token usage and cost per AI message

One row per agent message

input_tokens, output_tokens, estimated_cost_usd

fact_message

All messages with lineage tracking

One row per message (human + agent)

message_count, chain_depth, message_type

fact_conversation

Conversation-level aggregations

One row per conversation

total_message_count, unique_human_participants, agent_to_human_ratio

fact_tool_calls

Tool invocation events

One row per tool call

status, duration_seconds, cost

fact_attachment

File attachment metrics

One row per attachment

file_size_bytes, token_count, page_count

fact_subscription_snapshot

Active subscriptions (MRR/ARR)

One row per active subscription

mrr_amount, arr_amount

fact_meter_usage

Polar meter usage

One row per meter record

consumed_units, balance_units

Entity Relationships

Hierarchy

Organization ├── Workspace │ └── Project (Conversation) │ ├── Message │ ├── Attachment │ └── Tool Call ├── Member (User in Organization) ├── Agent (Mate) ├── Model ├── Connection └── Tool Instance

Key Foreign Keys

workspace

organization_id

organization.organization_id

project

workspace_id

workspace.workspace_id

member

organization_id

organization.organization_id

member

user_id

user.user_id

agent

organization_id

organization.organization_id

agent

model_id

model.model_id

message

project_id

project.project_id

message

sender_member_id

member.member_id

tools_call

tool_instance_id

tool_instance.tool_instance_id

tools_call

agent_id

agent.agent_id

Common Use Cases & Queries

Note: Replace {YOUR_GCP_PROJECT} with your actual GCP project ID provided by allmates.ai.

1. Token Usage by Organization (Last 30 Days)

Use case: Track AI consumption and costs across organizations.

sql

SELECT o.display_name AS organization_name, COUNT(*) AS message_count, SUM(f.input_tokens) AS total_input_tokens, SUM(f.output_tokens) AS total_output_tokens, SUM(f.total_tokens) AS total_tokens, ROUND(SUM(f.estimated_cost_usd), 2) AS total_cost_usd FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` f JOIN `{YOUR_GCP_PROJECT}.mates_analytics.dim_organization` o USING (organization_id) WHERE f.event_date >= CURRENT_DATE - 30 GROUP BY o.display_name ORDER BY total_cost_usd DESC

2. Daily Active Users

Use case: Track user engagement over time.

sql

SELECT event_date, COUNT(DISTINCT sender_member_id) AS active_users, COUNT(*) AS total_messages, ROUND(COUNT(*) / COUNT(DISTINCT sender_member_id), 1) AS messages_per_user FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message` WHERE organization_id = 'YOUR_ORG_ID' AND sender_type = 'human' AND event_date >= CURRENT_DATE - 30 GROUP BY event_date ORDER BY event_date

3. Most Active Conversations

Use case: Identify high-engagement conversations.

sql

SELECT project_id, conversation_name, total_message_count, unique_human_participants, unique_agents_solicited, TIMESTAMP_DIFF(last_message_at, first_message_at, HOUR) AS duration_hours, activity_status, engagement_category FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_conversation` WHERE organization_id = 'YOUR_ORG_ID' AND activity_status IN ('ACTIVE_NOW', 'ACTIVE_TODAY') ORDER BY total_message_count DESC LIMIT 20

4. Tool Performance Analysis

Use case: Monitor tool reliability and performance.

sql

SELECT tool_name, COUNT(*) AS call_count, COUNTIF(is_successful) AS success_count, ROUND(COUNTIF(is_successful) / COUNT(*) * 100, 1) AS success_rate_pct, ROUND(AVG(duration_seconds), 2) AS avg_duration_sec, ROUND(SUM(cost), 2) AS total_cost_usd FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_tool_calls` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 7 GROUP BY tool_name ORDER BY call_count DESC

5. Cost by LLM Model

Use case: Understand which models are driving costs.

sql

SELECT m.provider AS provider, m.label AS model_name, COUNT(*) AS message_count, SUM(f.total_tokens) AS total_tokens, ROUND(SUM(f.estimated_cost_usd), 2) AS total_cost_usd, ROUND(AVG(f.total_tokens), 0) AS avg_tokens_per_response FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` f JOIN `{YOUR_GCP_PROJECT}.mates_analytics.dim_model` m ON f.model_id = m.model_id WHERE f.organization_id = 'YOUR_ORG_ID' AND f.event_date >= CURRENT_DATE - 30 GROUP BY m.provider, m.label ORDER BY total_cost_usd DESC

6. Multi-Agent Chain Analysis

Use case: Understand agent-to-agent collaboration patterns.

sql

SELECT project_id, COUNT(*) AS total_agent_messages, COUNTIF(is_agent_response) AS direct_responses, COUNTIF(is_in_agent_chain) AS chain_messages, MAX(messages_since_last_human) AS max_chain_depth, ROUND(AVG(messages_since_last_human), 1) AS avg_chain_depth FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 7 AND sender_type = 'agent' GROUP BY project_id HAVING COUNT(*) > 10 ORDER BY max_chain_depth DESC LIMIT 20

7. File Upload Trends

Use case: Track file usage and storage consumption.

sql

SELECT event_date, COUNT(*) AS files_uploaded, ROUND(SUM(file_size_mb), 2) AS total_size_mb, ROUND(AVG(file_size_mb), 2) AS avg_file_size_mb, SUM(token_count) AS total_tokens_extracted FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_attachment` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 30 GROUP BY event_date ORDER BY event_date

8. Monthly Recurring Revenue (MRR)

Use case: Track subscription revenue.

sql

SELECT o.display_name AS organization_name, f.product_name, SUM(f.mrr_amount) AS total_mrr_usd, SUM(f.arr_amount) AS total_arr_usd, COUNT(*) AS active_subscriptions FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_subscription_snapshot` f JOIN `{YOUR_GCP_PROJECT}.mates_analytics.dim_organization` o USING (organization_id) WHERE f.is_active = TRUE GROUP BY o.display_name, f.product_name ORDER BY total_mrr_usd DESC

Performance Optimization Tips

1. Always Filter on organization_id

All tables are clustered on organization_id as the first clustering column. Always include it in your WHERE clause for optimal performance.

sql

-- ✅ Good: Uses clustering SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID' AND event_date >= CURRENT_DATE - 7 -- ❌ Bad: Full table scan SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE sender_display_name = 'Alice'

2. Use Date Filters

Most fact tables include an event_date column. Always filter on date ranges to reduce data scanned.

sql

-- ✅ Good: Scans only 30 days WHERE event_date >= CURRENT_DATE - 30 -- ❌ Bad: Scans entire history WHERE event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)

3. Query Fact Views, Not Physical Tables

Fact views pre-compute joins, categories, and metrics. They are optimized for analytics queries.

sql

-- ✅ Good: Use fact view SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID' -- ❌ Bad: Manual joins SELECT m.*, mm.input_tokens, mm.output_tokens FROM `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_latest` m LEFT JOIN `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_metadata_latest` mm ON m.message_id = mm.message_id WHERE m.organization_id = 'YOUR_ORG_ID'

4. Leverage Clustering Columns

Each table has specific clustering columns for optimal query performance.

Key clustering columns:

tbl_message_latest

organization_id, project_id

tbl_agent_latest

organization_id, status

tbl_model_latest

organization_id, provider

tbl_tools_call_latest

organization_id, status

Example:

sql

-- ✅ Good: Uses both clustering columns SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_latest` WHERE organization_id = 'YOUR_ORG_ID' AND project_id = 'PROJECT_123' -- ⚠️ OK: Uses first clustering column only SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_latest` WHERE organization_id = 'YOUR_ORG_ID'

5. Use LIMIT for Exploratory Queries

When exploring data, always use LIMIT to avoid scanning large datasets.

sql

-- ✅ Good: Limits results SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID' LIMIT 100 -- ❌ Bad: Scans entire table SELECT * FROM `{YOUR_GCP_PROJECT}.mates_analytics.fact_message_tokens` WHERE organization_id = 'YOUR_ORG_ID'

Data Freshness & Updates

Refresh Schedule

  • Physical tables (tbl_*_latest): Updated every 1 hour via MERGE operations

  • Dimension views (dim_*): Real-time (views on top of physical tables)

  • Fact views (fact_*): Real-time (views on top of physical tables)

Data Latency

  • Typical latency: 1–2 hours from event occurrence to availability in BigQuery

  • Peak latency: Up to 3 hours during high-volume periods

Checking Last Update

Each physical table has a last_synced_at column showing when the row was last updated.

sql

SELECT organization_id, MAX(last_synced_at) AS last_update FROM `{YOUR_GCP_PROJECT}.mates_analytics.tbl_message_latest` WHERE organization_id = 'YOUR_ORG_ID' GROUP BY organization_id

Cost Estimation

Pricing Model

BigQuery charges based on:

  • Storage: $0.02 per GB per month (long-term storage: $0.01 per GB)

  • Queries: $5 per TB scanned

Typical Query Costs

Single org, 7 days

~10 MB

$0.00005

Single org, 30 days

~50 MB

$0.00025

Single org, 1 year

~500 MB

$0.0025

All orgs, 30 days

~5 GB

$0.025

Cost optimization tips:

  • Always filter on organization_id and event_date

  • Use LIMIT for exploratory queries

  • Avoid SELECT * — specify only needed columns

  • Use materialized views for frequently run queries

Appendix: Complete Table List

Physical Tables (18)

  1. tbl_organization_latest

  2. tbl_workspace_latest

  3. tbl_project_latest

  4. tbl_user_latest

  5. tbl_member_latest

  6. tbl_agent_latest

  7. tbl_model_latest

  8. tbl_connection_latest

  9. tbl_tool_instance_latest

  10. tbl_message_latest

  11. tbl_message_metadata_latest

  12. tbl_chat_metadata_latest

  13. tbl_tools_call_latest

  14. tbl_attachment_latest

  15. tbl_polar_customer_latest

  16. tbl_polar_subscription_latest

  17. tbl_polar_meter_latest

  18. tbl_invitation_latest

Dimension Views (13)

  1. dim_organization

  2. dim_workspace

  3. dim_project

  4. dim_user

  5. dim_member

  6. dim_agent

  7. dim_model

  8. dim_connection

  9. dim_tool_instance

  10. dim_provider

  11. dim_customer

  12. dim_subscription

  13. dim_invitation

Fact Views (9)

  1. fact_message_tokens

  2. fact_message

  3. fact_conversation

  4. fact_tool_calls

  5. fact_chat_tokens

  6. fact_attachment

  7. fact_subscription_snapshot

  8. fact_meter_usage

  9. fact_message_errors

Reference Tables (1)

  1. ref_model_pricing