SQL-CTE-Common-Table-Expression-详解
目录
SQL CTE (Common Table Expression) 详解
[
『Java分布式系统开发:从理论到实践』征文活动
10w+人浏览
289人参与
](
)
CTE (Common Table Expression) 详解
目录
基础概念
定义
CTE(Common Table Expression,公用表表达式)是SQL中的"命名临时结果集",通过 WITH
关键字定义,仅在当前查询中生效。
核心作用:
- 简化复杂查询:将复杂逻辑分解为多个步骤
- 提高可读性:使SQL语句更易理解和维护
- 复用子查询结果:避免重复计算相同的子查询
本质特性
- 非物理存储:不是物理表,不存储在磁盘上
- 临时性:查询执行过程中生成的虚拟结果集
- 作用域限制:仅在定义它的查询语句中有效
- 自动销毁:查询结束后自动清理
基本语法结构
WITH cte_name [(column_list)] AS (
-- CTE定义查询
SELECT ...
)
-- 主查询
SELECT ... FROM cte_name ...;
CTE类型详解
非递归CTE(普通CTE)
特点
- 使用
WITH
关键字 - 单一子查询,不引用自身
- 一次性执行,结果供主查询使用
基础示例
-- 示例1:计算订单统计信息
WITH order_stats AS (
SELECT
AVG(amount) as avg_amount,
MAX(amount) as max_amount,
COUNT(*) as total_orders
FROM orders
WHERE order_date >= '2024-01-01'
)
SELECT
o.order_id,
o.amount,
os.avg_amount,
CASE
WHEN o.amount > os.avg_amount THEN '高于平均'
ELSE '低于平均'
END as amount_category
FROM orders o
CROSS JOIN order_stats os
WHERE o.order_date >= '2024-01-01';
多个CTE示例
-- 示例2:多个CTE协同工作
WITH
high_value_customers AS (
SELECT customer_id, SUM(amount) as total_spent
FROM orders
GROUP BY customer_id
HAVING SUM(amount) > 10000
),
recent_orders AS (
SELECT customer_id, COUNT(*) as recent_order_count
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY customer_id
)
SELECT
c.customer_name,
hvc.total_spent,
COALESCE(ro.recent_order_count, 0) as recent_orders
FROM customers c
JOIN high_value_customers hvc ON c.customer_id = hvc.customer_id
LEFT JOIN recent_orders ro ON c.customer_id = ro.customer_id
ORDER BY hvc.total_spent DESC;
递归CTE
核心结构
递归CTE必须包含两个部分:
- 锚点成员(Anchor Member):递归的起始点,非递归查询
- 递归成员(Recursive Member):引用CTE自身的查询
执行逻辑
- 执行锚点成员,获得初始结果集
- 递归成员使用当前结果集查询新数据
- 将新结果添加到结果集中
- 重复步骤2-3,直到递归成员返回空结果
- 返回完整的结果集
基础递归示例
-- 示例1:生成数字序列
WITH RECURSIVE number_series AS (
-- 锚点成员:起始值
SELECT 1 as n
UNION ALL
-- 递归成员:递增逻辑
SELECT n + 1
FROM number_series
WHERE n < 10 -- 终止条件
)
SELECT * FROM number_series;
树形结构查询示例
-- 示例2:组织架构查询(查找某员工及其所有下属)
WITH RECURSIVE employee_hierarchy AS (
-- 锚点成员:指定的管理者
SELECT
employee_id,
employee_name,
manager_id,
0 as level,
CAST(employee_name AS VARCHAR(1000)) as path
FROM employees
WHERE employee_id = 1001 -- 起始员工ID
UNION ALL
-- 递归成员:查找下属
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
eh.level + 1,
CAST(eh.path || ' -> ' || e.employee_name AS VARCHAR(1000))
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
WHERE eh.level < 5 -- 防止无限递归
)
SELECT
employee_id,
employee_name,
level,
path as hierarchy_path
FROM employee_hierarchy
ORDER BY level, employee_name;
向上追溯示例
-- 示例3:向上追溯管理链
WITH RECURSIVE management_chain AS (
-- 锚点成员:指定员工
SELECT
employee_id,
employee_name,
manager_id,
0 as level_up
FROM employees
WHERE employee_id = 2001 -- 起始员工
UNION ALL
-- 递归成员:查找上级管理者
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
mc.level_up + 1
FROM employees e
JOIN management_chain mc ON e.employee_id = mc.manager_id
)
SELECT * FROM management_chain ORDER BY level_up;
语法与执行机制
PostgreSQL CTE执行机制
物化控制
PostgreSQL提供了对CTE物化的精确控制:
-- 强制物化(默认行为)
WITH cte_name AS MATERIALIZED (
SELECT expensive_calculation() FROM large_table
)
SELECT * FROM cte_name
UNION ALL
SELECT * FROM cte_name; -- 复用已计算的结果
-- 禁止物化(内联优化)
WITH cte_name AS NOT MATERIALIZED (
SELECT * FROM small_table WHERE condition
)
SELECT * FROM cte_name WHERE additional_condition;
执行计划分析
-- 查看CTE执行计划
EXPLAIN (ANALYZE, BUFFERS)
WITH sales_summary AS (
SELECT
product_id,
SUM(quantity) as total_quantity,
SUM(amount) as total_amount
FROM sales
WHERE sale_date >= '2024-01-01'
GROUP BY product_id
)
SELECT
p.product_name,
ss.total_quantity,
ss.total_amount
FROM products p
JOIN sales_summary ss ON p.product_id = ss.product_id;
递归CTE的终止机制
自动终止条件
- 递归成员返回空结果集
- 达到系统递归深度限制
- 满足用户定义的终止条件
防止无限递归的策略
-- 策略1:使用计数器限制递归深度
WITH RECURSIVE limited_recursion AS (
SELECT id, parent_id, name, 0 as depth
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, c.name, lr.depth + 1
FROM categories c
JOIN limited_recursion lr ON c.parent_id = lr.id
WHERE lr.depth < 10 -- 限制最大深度
)
SELECT * FROM limited_recursion;
-- 策略2:使用路径检测避免循环
WITH RECURSIVE path_tracking AS (
SELECT
id,
parent_id,
name,
ARRAY[id] as path
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.id,
c.parent_id,
c.name,
pt.path || c.id
FROM categories c
JOIN path_tracking pt ON c.parent_id = pt.id
WHERE NOT (c.id = ANY(pt.path)) -- 避免循环
)
SELECT * FROM path_tracking;
性能考虑与优化
CTE vs 子查询性能对比
何时使用CTE
-- ✅ 推荐:需要多次引用相同结果时
WITH expensive_calc AS (
SELECT
customer_id,
complex_calculation(data) as result
FROM large_table
WHERE complex_condition
)
SELECT c1.customer_id, c1.result, c2.result
FROM expensive_calc c1
JOIN expensive_calc c2 ON c1.customer_id = c2.customer_id + 1;
-- ❌ 不推荐:简单的一次性查询
SELECT * FROM (
SELECT * FROM small_table WHERE simple_condition
) subquery;
性能优化技巧
1. 合理使用索引
-- 确保递归CTE中的连接字段有索引
CREATE INDEX idx_categories_parent_id ON categories(parent_id);
CREATE INDEX idx_employees_manager_id ON employees(manager_id);
-- 在递归查询中使用索引友好的条件
WITH RECURSIVE category_tree AS (
SELECT id, parent_id, name, 0 as level
FROM categories
WHERE id = 1 -- 使用主键,利用主键索引
UNION ALL
SELECT c.id, c.parent_id, c.name, ct.level + 1
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id -- 利用外键索引
WHERE ct.level < 5
)
SELECT * FROM category_tree;
2. 控制递归深度
-- 设置合理的递归深度限制
SET max_stack_depth = '2MB'; -- PostgreSQL
-- 或在查询中使用WHERE条件限制深度
3. 优化数据类型和字段选择
-- ✅ 只选择必要的字段
WITH RECURSIVE slim_hierarchy AS (
SELECT id, parent_id, level -- 只选择必要字段
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, sh.level + 1
FROM categories c
JOIN slim_hierarchy sh ON c.parent_id = sh.id
WHERE sh.level < 10
)
SELECT sh.id, sh.level, c.name -- 在最后再JOIN获取详细信息
FROM slim_hierarchy sh
JOIN categories c ON sh.id = c.id;
内存使用优化
-- 大数据量递归查询的分批处理
WITH RECURSIVE batch_process AS (
SELECT id, parent_id, name, 0 as level, 0 as batch_num
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, c.name, bp.level + 1,
CASE WHEN bp.level % 1000 = 0 THEN bp.batch_num + 1
ELSE bp.batch_num END
FROM categories c
JOIN batch_process bp ON c.parent_id = bp.id
WHERE bp.level < 10000 AND bp.batch_num < 10
)
SELECT * FROM batch_process;
跨数据库支持
主流数据库CTE支持对比
数据库 | 非递归CTE | 递归CTE | 关键差异 | 版本要求 |
---|---|---|---|---|
PostgreSQL | ✅ | ✅ (WITH RECURSIVE ) | 标准实现,支持物化控制 | 8.4+ |
MySQL | ✅ | ✅ (WITH RECURSIVE ) | 8.0后支持,语法与PostgreSQL一致 | 8.0+ |
SQL Server | ✅ | ✅ (WITH ) | 递归不需要RECURSIVE关键字 | 2005+ |
Oracle | ✅ | ✅ (WITH ) | 支持子查询因子化 | 9i+ |
SQLite | ✅ | ✅ (WITH RECURSIVE ) | 轻量实现 | 3.8.3+ |
数据库特定语法示例
SQL Server
-- SQL Server递归CTE(无需RECURSIVE关键字)
WITH employee_cte AS (
-- 锚点成员
SELECT employee_id, manager_id, employee_name, 0 as level
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- 递归成员
SELECT e.employee_id, e.manager_id, e.employee_name, ec.level + 1
FROM employees e
INNER JOIN employee_cte ec ON e.manager_id = ec.employee_id
)
SELECT * FROM employee_cte
OPTION (MAXRECURSION 100); -- SQL Server特有的递归限制语法
Oracle
-- Oracle的CTE(子查询因子化)
WITH
sales_data AS (
SELECT product_id, SUM(amount) as total_sales
FROM sales
WHERE sale_date >= DATE '2024-01-01'
GROUP BY product_id
),
product_info AS (
SELECT product_id, product_name, category_id
FROM products
)
SELECT pi.product_name, sd.total_sales
FROM product_info pi
JOIN sales_data sd ON pi.product_id = sd.product_id
ORDER BY sd.total_sales DESC;
MySQL 8.0+
-- MySQL递归CTE
WITH RECURSIVE fibonacci AS (
SELECT 0 as n, 0 as fib_n, 1 as fib_n_plus_1
UNION ALL
SELECT n + 1, fib_n_plus_1, fib_n + fib_n_plus_1
FROM fibonacci
WHERE n < 20
)
SELECT n, fib_n FROM fibonacci;
兼容性处理策略
旧版本MySQL替代方案
-- MySQL 5.x 使用临时表替代CTE
-- 替代普通CTE
CREATE TEMPORARY TABLE temp_order_stats AS
SELECT AVG(amount) as avg_amount FROM orders;
SELECT o.*, t.avg_amount
FROM orders o
CROSS JOIN temp_order_stats t
WHERE o.amount > t.avg_amount;
DROP TEMPORARY TABLE temp_order_stats;
-- 替代递归CTE(使用存储过程)
DELIMITER //
CREATE PROCEDURE GetEmployeeHierarchy(IN root_id INT)
BEGIN
CREATE TEMPORARY TABLE temp_hierarchy (
employee_id INT,
level INT
);
INSERT INTO temp_hierarchy VALUES (root_id, 0);
SET @level = 0;
WHILE ROW_COUNT() > 0 AND @level < 10 DO
INSERT INTO temp_hierarchy
SELECT e.employee_id, @level + 1
FROM employees e
JOIN temp_hierarchy th ON e.manager_id = th.employee_id
WHERE th.level = @level;
SET @level = @level + 1;
END WHILE;
SELECT * FROM temp_hierarchy;
DROP TEMPORARY TABLE temp_hierarchy;
END //
DELIMITER ;
实际应用场景
1. 数据分析与报表
销售漏斗分析
WITH sales_funnel AS (
SELECT
'Leads' as stage,
COUNT(*) as count,
1 as stage_order
FROM leads
WHERE created_date >= '2024-01-01'
UNION ALL
SELECT
'Qualified Leads' as stage,
COUNT(*) as count,
2 as stage_order
FROM leads
WHERE status = 'qualified' AND created_date >= '2024-01-01'
UNION ALL
SELECT
'Opportunities' as stage,
COUNT(*) as count,
3 as stage_order
FROM opportunities
WHERE created_date >= '2024-01-01'
UNION ALL
SELECT
'Closed Won' as stage,
COUNT(*) as count,
4 as stage_order
FROM opportunities
WHERE status = 'won' AND created_date >= '2024-01-01'
),
funnel_with_conversion AS (
SELECT
stage,
count,
stage_order,
LAG(count) OVER (ORDER BY stage_order) as previous_count,
CASE
WHEN LAG(count) OVER (ORDER BY stage_order) > 0
THEN ROUND(count::DECIMAL / LAG(count) OVER (ORDER BY stage_order) * 100, 2)
ELSE 100.0
END as conversion_rate
FROM sales_funnel
)
SELECT
stage,
count,
conversion_rate || '%' as conversion_rate
FROM funnel_with_conversion
ORDER BY stage_order;
同期群分析(Cohort Analysis)
WITH customer_cohorts AS (
SELECT
customer_id,
DATE_TRUNC('month', MIN(order_date)) as cohort_month
FROM orders
GROUP BY customer_id
),
customer_activities AS (
SELECT
cc.cohort_month,
DATE_TRUNC('month', o.order_date) as activity_month,
COUNT(DISTINCT o.customer_id) as active_customers
FROM customer_cohorts cc
JOIN orders o ON cc.customer_id = o.customer_id
GROUP BY cc.cohort_month, DATE_TRUNC('month', o.order_date)
),
cohort_table AS (
SELECT
cohort_month,
activity_month,
active_customers,
EXTRACT(EPOCH FROM (activity_month - cohort_month)) / (30 * 24 * 60 * 60) as month_number
FROM customer_activities
)
SELECT
cohort_month,
month_number,
active_customers,
FIRST_VALUE(active_customers) OVER (
PARTITION BY cohort_month
ORDER BY month_number
) as cohort_size,
ROUND(
active_customers::DECIMAL /
FIRST_VALUE(active_customers) OVER (
PARTITION BY cohort_month
ORDER BY month_number
) * 100, 2
) as retention_rate
FROM cohort_table
ORDER BY cohort_month, month_number;
2. 层级数据处理
权限系统递归查询
-- 查询用户的所有有效权限(包括继承的权限)
WITH RECURSIVE user_permissions AS (
-- 直接权限
SELECT
up.user_id,
up.permission_id,
p.permission_name,
'direct' as permission_source,
0 as inheritance_level
FROM user_permissions up
JOIN permissions p ON up.permission_id = p.permission_id
WHERE up.user_id = :user_id
UNION ALL
-- 角色继承的权限
SELECT
ur.user_id,
rp.permission_id,
p.permission_name,
'role:' || r.role_name as permission_source,
1 as inheritance_level
FROM user_roles ur
JOIN roles r ON ur.role_id = r.role_id
JOIN role_permissions rp ON r.role_id = rp.role_id
JOIN permissions p ON rp.permission_id = p.permission_id
WHERE ur.user_id = :user_id
UNION ALL
-- 角色层级继承的权限
SELECT
up.user_id,
rp.permission_id,
p.permission_name,
'inherited_role:' || pr.role_name as permission_source,
up.inheritance_level + 1
FROM user_permissions up
JOIN user_roles ur ON up.user_id = ur.user_id
JOIN role_hierarchy rh ON ur.role_id = rh.child_role_id
JOIN roles pr ON rh.parent_role_id = pr.role_id
JOIN role_permissions rp ON pr.role_id = rp.role_id
JOIN permissions p ON rp.permission_id = p.permission_id
WHERE up.inheritance_level < 3 -- 限制继承深度
)
SELECT DISTINCT
permission_id,
permission_name,
MIN(inheritance_level) as min_inheritance_level,
STRING_AGG(DISTINCT permission_source, ', ') as sources
FROM user_permissions
GROUP BY permission_id, permission_name
ORDER BY min_inheritance_level, permission_name;
分类目录管理
-- 移动分类及其所有子分类到新的父分类下
WITH RECURSIVE category_subtree AS (
-- 要移动的分类及其子分类
SELECT id, parent_id, name, 0 as level
FROM categories
WHERE id = :category_to_move
UNION ALL
SELECT c.id, c.parent_id, c.name, cs.level + 1
FROM categories c
JOIN category_subtree cs ON c.parent_id = cs.id
),
update_plan AS (
SELECT
cs.id,
CASE
WHEN cs.level = 0 THEN :new_parent_id
ELSE cs.parent_id
END as new_parent_id
FROM category_subtree cs
)
UPDATE categories
SET parent_id = up.new_parent_id,
updated_at = CURRENT_TIMESTAMP
FROM update_plan up
WHERE categories.id = up.id;
3. 时间序列数据处理
生成时间序列并填充缺失数据
WITH RECURSIVE date_series AS (
SELECT DATE '2024-01-01' as date_val
UNION ALL
SELECT date_val + INTERVAL '1 day'
FROM date_series
WHERE date_val < DATE '2024-12-31'
),
daily_sales AS (
SELECT
DATE(order_date) as sale_date,
SUM(amount) as daily_amount,
COUNT(*) as daily_orders
FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01'
GROUP BY DATE(order_date)
)
SELECT
ds.date_val,
COALESCE(dsales.daily_amount, 0) as amount,
COALESCE(dsales.daily_orders, 0) as orders,
-- 计算7天移动平均
AVG(COALESCE(dsales.daily_amount, 0)) OVER (
ORDER BY ds.date_val
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) as moving_avg_7_days
FROM date_series ds
LEFT JOIN daily_sales dsales ON ds.date_val = dsales.sale_date
ORDER BY ds.date_val;
会话分析
-- 分析用户会话,定义30分钟无活动为会话结束
WITH RECURSIVE user_sessions AS (
SELECT
user_id,
event_time,
event_type,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time) as rn,
event_time as session_start,
1 as session_id
FROM user_events
WHERE user_id = :user_id
AND event_time >= :start_date
UNION ALL
SELECT
ue.user_id,
ue.event_time,
ue.event_type,
us.rn + 1,
CASE
WHEN ue.event_time - us.event_time > INTERVAL '30 minutes'
THEN ue.event_time
ELSE us.session_start
END,
CASE
WHEN ue.event_time - us.event_time > INTERVAL '30 minutes'
THEN us.session_id + 1
ELSE us.session_id
END
FROM user_events ue
JOIN user_sessions us ON ue.user_id = us.user_id
AND ue.event_time > us.event_time
WHERE ue.user_id = :user_id
AND ue.event_time >= :start_date
AND us.rn = (SELECT MAX(rn) FROM user_sessions WHERE user_id = us.user_id)
)
SELECT
session_id,
session_start,
MAX(event_time) as session_end,
COUNT(*) as event_count,
MAX(event_time) - session_start as session_duration
FROM user_sessions
GROUP BY session_id, session_start
ORDER BY session_start;
最佳实践
1. 命名规范
-- ✅ 推荐:使用描述性的CTE名称
WITH
high_value_customers AS (...),
recent_orders AS (...),
product_performance AS (...)
-- ❌ 避免:使用模糊的名称
WITH
cte1 AS (...),
temp AS (...),
data AS (...)
2. 结构化组织
-- ✅ 推荐:按逻辑顺序组织多个CTE
WITH
-- 基础数据提取
raw_sales_data AS (
SELECT customer_id, product_id, amount, sale_date
FROM sales
WHERE sale_date >= '2024-01-01'
),
-- 数据聚合
customer_totals AS (
SELECT customer_id, SUM(amount) as total_spent
FROM raw_sales_data
GROUP BY customer_id
),
-- 分类标记
customer_segments AS (
SELECT
customer_id,
total_spent,
CASE
WHEN total_spent > 10000 THEN 'VIP'
WHEN total_spent > 5000 THEN 'Premium'
ELSE 'Standard'
END as segment
FROM customer_totals
)
-- 最终查询
SELECT
c.customer_name,
cs.total_spent,
cs.segment
FROM customers c
JOIN customer_segments cs ON c.customer_id = cs.customer_id
ORDER BY cs.total_spent DESC;
3. 递归CTE最佳实践
始终包含终止条件
-- ✅ 推荐:明确的终止条件
WITH RECURSIVE hierarchy AS (
SELECT id, parent_id, name, 0 as level
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT c.id, c.parent_id, c.name, h.level + 1
FROM categories c
JOIN hierarchy h ON c.parent_id = h.id
WHERE h.level < 10 -- 明确的深度限制
AND c.parent_id IS NOT NULL -- 防止NULL值问题
)
SELECT * FROM hierarchy;
循环检测
-- ✅ 推荐:检测和防止循环引用
WITH RECURSIVE safe_hierarchy AS (
SELECT
id,
parent_id,
name,
0 as level,
ARRAY[id] as path
FROM categories
WHERE parent_id IS NULL
UNION ALL
SELECT
c.id,
c.parent_id,
c.name,
sh.level + 1,
sh.path || c.id
FROM categories c
JOIN safe_hierarchy sh ON c.parent_id = sh.id
WHERE sh.level < 20
AND NOT (c.id = ANY(sh.path)) -- 防止循环
)
SELECT id, name, level, array_to_string(path, ' -> ') as path
FROM safe_hierarchy;
4. 性能优化最佳实践
合理使用索引
-- 为递归查询创建合适的索引
CREATE INDEX idx_categories_parent_id ON categories(parent_id);
CREATE INDEX idx_categories_id_parent_id ON categories(id, parent_id);
-- 复合索引用于复杂递归查询
CREATE INDEX idx_employees_manager_dept ON employees(manager_id, department_id);
限制结果集大小
-- ✅ 推荐:在CTE中尽早过滤数据
WITH filtered_orders AS (
SELECT customer_id, amount, order_date
FROM orders
WHERE order_date >= '2024-01-01' -- 尽早过滤
AND status = 'completed'
AND amount > 0
),
customer_stats AS (
SELECT
customer_id,
COUNT(*) as order_count,
SUM(amount) as total_amount
FROM filtered_orders -- 使用已过滤的数据
GROUP BY customer_id
)
SELECT * FROM customer_stats
WHERE order_count >= 5; -- 进一步过滤
常见陷阱与注意事项
1. 递归CTE陷阱
无限递归
-- ❌ 危险:可能导致无限递归
WITH RECURSIVE dangerous_recursion AS (
SELECT 1 as n
UNION ALL
SELECT n + 1 FROM dangerous_recursion -- 没有终止条件!
)
SELECT * FROM dangerous_recursion;
-- ✅ 安全:包含终止条件
WITH RECURSIVE safe_recursion AS (
SELECT 1 as n
UNION ALL
SELECT n + 1 FROM safe_recursion WHERE n < 100
)
SELECT * FROM safe_recursion;
循环引用数据
-- 处理可能存在循环引用的数据
-- 假设categories表中存在循环引用:A -> B -> C -> A
-- ❌ 问题:可能导致无限递归
WITH RECURSIVE bad_hierarchy AS (
SELECT id, parent_id, name FROM categories WHERE id = 1
UNION ALL
SELECT c.id, c.parent_id, c.name
FROM categories c
JOIN bad_hierarchy bh ON c.parent_id = bh.id
)
SELECT * FROM bad_hierarchy;
-- ✅ 解决:使用路径跟踪防止循环
WITH RECURSIVE good_hierarchy AS (
SELECT id, parent_id, name, ARRAY[id] as path
FROM categories WHERE id = 1
UNION ALL
SELECT c.id, c.parent_id, c.name, gh.path || c.id
FROM categories c
JOIN good_hierarchy gh ON c.parent_id = gh.id
WHERE NOT (c.id = ANY(gh.path))
)
SELECT id, parent_id, name FROM good_hierarchy;
2. 性能陷阱
过度使用CTE
-- ❌ 过度使用:简单查询不需要CTE
WITH simple_cte AS (
SELECT * FROM users WHERE status = 'active'
)
SELECT * FROM simple_cte WHERE age > 18;
-- ✅ 直接查询更简单高效
SELECT * FROM users
WHERE status = 'active' AND age > 18;
大数据量递归
-- ❌ 问题:大数据量递归可能导致内存溢出
WITH RECURSIVE large_hierarchy AS (
SELECT id, parent_id, name FROM large_table WHERE parent_id IS NULL
UNION ALL
SELECT lt.id, lt.parent_id, lt.name
FROM large_table lt
JOIN large_hierarchy lh ON lt.parent_id = lh.id
)
SELECT * FROM large_hierarchy;
-- ✅ 解决:分批处理或限制深度
WITH RECURSIVE controlled_hierarchy AS (
SELECT id, parent_id, name, 0 as level FROM large_table WHERE parent_id IS NULL
UNION ALL
SELECT lt.id, lt.parent_id, lt.name, ch.level + 1
FROM large_table lt
JOIN controlled_hierarchy ch ON lt.parent_id = ch.id
WHERE ch.level < 5 -- 限制深度
)
SELECT * FROM controlled_hierarchy;
3. 数据类型陷阱
UNION ALL类型不匹配
-- ❌ 问题:数据类型不匹配
WITH RECURSIVE type_mismatch AS (
SELECT 1 as id, 'root' as name -- name是VARCHAR
UNION ALL
SELECT id + 1, id + 1 FROM type_mismatch WHERE id < 5 -- name变成了INTEGER
)
SELECT * FROM type_mismatch;
-- ✅ 解决:确保类型一致
WITH RECURSIVE type_consistent AS (
SELECT 1 as id, 'root' as name
UNION ALL
SELECT id + 1, CAST(id + 1 AS VARCHAR) FROM type_consistent WHERE id < 5
)
SELECT * FROM type_consistent;
4. NULL值处理
-- ✅ 正确处理NULL值
WITH RECURSIVE null_safe_hierarchy AS (
SELECT id, parent_id, name, 0 as level
FROM categories
WHERE parent_id IS NULL -- 明确处理NULL
UNION ALL
SELECT c.id, c.parent_id, c.name, nsh.level + 1
FROM categories c
JOIN null_safe_hierarchy nsh ON c.parent_id = nsh.id
WHERE c.parent_id IS NOT NULL -- 防止NULL值问题
AND nsh.level < 10
)
SELECT * FROM null_safe_hierarchy;
高级用法
1. CTE与窗口函数结合
-- 计算每个产品的销售趋势
WITH monthly_sales AS (
SELECT
product_id,
DATE_TRUNC('month', order_date) as month,
SUM(amount) as monthly_amount
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY product_id, DATE_TRUNC('month', order_date)
),
sales_with_trends AS (
SELECT
product_id,
month,
monthly_amount,
LAG(monthly_amount) OVER (PARTITION BY product_id ORDER BY month) as prev_month_amount,
AVG(monthly_amount) OVER (
PARTITION BY product_id
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) as moving_avg_3_months
FROM monthly_sales
)
SELECT
p.product_name,
swt.month,
swt.monthly_amount,
swt.moving_avg_3_months,
CASE
WHEN swt.prev_month_amount IS NULL THEN 'N/A'
WHEN swt.monthly_amount > swt.prev_month_amount THEN 'Increasing'
WHEN swt.monthly_amount < swt.prev_month_amount THEN 'Decreasing'
ELSE 'Stable'
END as trend
FROM sales_with_trends swt
JOIN products p ON swt.product_id = p.product_id
ORDER BY p.product_name, swt.month;
2. 递归CTE生成复杂序列
生成斐波那契数列
WITH RECURSIVE fibonacci AS (
SELECT
1 as n,
0::BIGINT as fib_current,
1::BIGINT as fib_next
UNION ALL
SELECT
n + 1,
fib_next,
fib_current + fib_next
FROM fibonacci
WHERE n < 50 AND fib_next < 9223372036854775807 -- 防止溢出
)
SELECT n, fib_current as fibonacci_number
FROM fibonacci;
生成工作日序列
WITH RECURSIVE business_days AS (
SELECT DATE '2024-01-01' as business_date
WHERE EXTRACT(DOW FROM DATE '2024-01-01') BETWEEN 1 AND 5
UNION ALL
SELECT
CASE
WHEN EXTRACT(DOW FROM business_date + 1) = 6 THEN business_date + 3 -- 跳过周末
WHEN EXTRACT(DOW FROM business_date + 1) = 0 THEN business_date + 2
ELSE business_date + 1
END
FROM business_days
WHERE business_date < DATE '2024-12-31'
),
business_days_with_holidays AS (
SELECT bd.business_date
FROM business_days bd
LEFT JOIN holidays h ON bd.business_date = h.holiday_date
WHERE h.holiday_date IS NULL -- 排除节假日
)
SELECT business_date FROM business_days_with_holidays ORDER BY business_date;
3. CTE用于数据清洗和转换
-- 复杂的数据清洗流程
WITH
-- 第一步:基础数据清洗
cleaned_raw_data AS (
SELECT
customer_id,
TRIM(UPPER(customer_name)) as customer_name,
CASE
WHEN email ~* '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'
THEN LOWER(email)
ELSE NULL
END as email,
CASE
WHEN phone ~ '^\d{10,15}$' THEN phone
ELSE REGEXP_REPLACE(phone, '[^\d]', '', 'g')
END as phone
FROM raw_customer_data
WHERE customer_name IS NOT NULL
),
-- 第二步:去重处理
deduplicated_data AS (
SELECT DISTINCT ON (customer_name, email)
customer_id,
customer_name,
email,
phone,
ROW_NUMBER() OVER (PARTITION BY customer_name, email ORDER BY customer_id) as rn
FROM cleaned_raw_data
WHERE email IS NOT NULL
),
-- 第三步:数据验证
validated_data AS (
SELECT
customer_id,
customer_name,
email,
phone,
CASE
WHEN LENGTH(customer_name) < 2 THEN 'Invalid Name'
WHEN email IS NULL THEN 'Invalid Email'
WHEN LENGTH(phone) < 10 THEN 'Invalid Phone'
ELSE 'Valid'
END as validation_status
FROM deduplicated_data
WHERE rn = 1
)
-- 最终结果
SELECT
customer_id,
customer_name,
email,
phone,
validation_status
FROM validated_data
WHERE validation_status = 'Valid';
4. 递归CTE处理图结构
查找图中的所有路径
-- 在有向图中查找从起点到终点的所有路径
WITH RECURSIVE all_paths AS (
-- 起始节点
SELECT
start_node,
end_node,
ARRAY[start_node, end_node] as path,
1 as path_length
FROM graph_edges
WHERE start_node = :start_point
UNION ALL
-- 扩展路径
SELECT
ap.start_node,
ge.end_node,
ap.path || ge.end_node,
ap.path_length + 1
FROM all_paths ap
JOIN graph_edges ge ON ap.end_node = ge.start_node
WHERE NOT (ge.end_node = ANY(ap.path)) -- 避免循环
AND ap.path_length < 10 -- 限制路径长度
)
SELECT
start_node,
end_node,
path,
path_length
FROM all_paths
WHERE end_node = :end_point -- 过滤到目标节点的路径
ORDER BY path_length, path;
总结
CTE的核心价值
- 代码可读性:将复杂查询分解为逻辑清晰的步骤
- 代码复用:在同一查询中多次引用相同的子查询结果
- 递归处理:优雅处理层级和树形结构数据
- 性能优化:通过物化避免重复计算
选择CTE的时机
- 使用CTE:需要多次引用子查询结果、处理递归数据、提高复杂查询可读性
- 避免CTE:简单的一次性查询、对性能要求极高的场景
关键注意事项
- 递归终止:始终包含明确的终止条件
- 循环检测:在可能存在循环的数据中使用路径跟踪
- 性能监控:关注CTE的执行计划和资源使用
- 类型一致:确保UNION ALL中的数据类型匹配
- 索引优化:为递归查询的连接字段创建合适的索引
最佳实践总结
- 使用描述性的CTE名称
- 按逻辑顺序组织多个CTE
- 在CTE中尽早过滤数据
- 合理控制递归深度
- 正确处理NULL值
- 定期监控和优化性能
CTE是SQL中强大而灵活的工具,掌握其正确使用方法能够显著提升SQL查询的质量和可维护性。在实际应用中,应根据具体场景选择合适的CTE类型,并遵循最佳实践以确保查询的正确性和性能。