Hugh's Blog

MySQL 交集、并集与差集

最近碰到一个简单的需求:获取多个分类下的文章,中间有个关联表;有点类似于根据分类表与文章表获取交集 (关联表) 的情况,过程也简单,作下记录。

使用以下的数据来测试:

mysql> SELECT * FROM `table_a`;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
+----+------+

mysql> SELECT * FROM `table_b`;
+----+------+
| id | name |
+----+------+
|  1 | c    |
|  2 | d    |
|  3 | e    |
|  4 | f    |
|  5 | g    |
+----+------+

交集

使用子查询或者 LEFT JOIN 都行。

mysql> SELECT * FROM `table_a` WHERE `name` IN (SELECT `name` FROM `table_b`);
+----+------+
| id | name |
+----+------+
|  3 | c    |
|  4 | d    |
+----+------+

mysql> SELECT a.* FROM `table_a` AS a LEFT JOIN `table_b` AS b USING(name) WHERE b.id IS NOT NULL;
mysql> SELECT a.* FROM `table_a` AS a LEFT JOIN `table_b` AS b ON a.name = b.name WHERE b.id IS NOT NULL;
+----+------+
| id | name |
+----+------+
|  3 | c    |
|  4 | d    |
+----+------+

并集

使用 UNION (ALL) 语句,注意结果列必须相同。

mysql> SELECT * FROM `table_a` UNION SELECT * FROM `table_b`;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
|  3 | c    |
|  4 | d    |
|  1 | c    |
|  2 | d    |
|  3 | e    |
|  4 | f    |
|  5 | g    |
+----+------+

# 指定 `name` 会对结果去重,除非使用 UNION ALL
mysql> SELECT `name` FROM `table_a` UNION SELECT `name` FROM `table_b`;
+------+
| name |
+------+
| a    |
| b    |
| c    |
| d    |
| e    |
| f    |
| g    |
+------+

差集

使用子查询或者 LEFT JOIN 都行。

mysql> SELECT * FROM `table_a` WHERE `name` NOT IN (SELECT `name` FROM `table_b`);
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
+----+------+

mysql> SELECT a.* FROM `table_a` AS a LEFT JOIN `table_b` AS b USING(name) WHERE b.id IS NULL;
mysql> SELECT a.* FROM `table_a` AS a LEFT JOIN `table_b` AS b ON a.name = b.name WHERE b.id IS NULL;
+----+------+
| id | name |
+----+------+
|  1 | a    |
|  2 | b    |
+----+------+

其实都可以用 LEFT JOIN、INNER JOIN 来获取结果集,然后根据 NULL 来判断就行了。

回到最开始的需求:

mysql> SELECT * FROM `category`;
+----+-------+
| id | title |
+----+-------+
|  1 | cat#1 |
|  2 | cat#2 |
|  3 | cat#3 |
|  4 | cat#4 |
+----+-------+

mysql> SELECT * FROM `post`;
+----+--------+
| id | title  |
+----+--------+
|  1 | post#1 |
|  2 | post#2 |
|  3 | post#3 |
|  4 | post#4 |
+----+--------+

mysql> SELECT * FROM `relationship`;
+-----+-----+
| cid | pid |
+-----+-----+
|   2 |   3 |
|   3 |   1 |
|   4 |   2 |
|   3 |   3 |
|   2 |   4 |
+-----+-----+

# 取分类 2、3 下的文章
mysql> SELECT p.* FROM `post` AS p LEFT JOIN `relationship` AS r ON p.id = r.pid WHERE r.cid = 2 AND p.id IN (SELECT pid FROM `relationship` WHERE `cid` = 3) GROUP BY p.id;
+----+--------+
| id | title  |
+----+--------+
|  3 | post#3 |
+----+--------+

看起来还有个小问题,如果很多分类,比如 2、3、4、5 等,那子查询会有点多,语句拼起来也麻烦。