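Lecture 6: End-to-End Design of a Flash-Sale (Seckill) System
This lecture pulls together everything from the high-concurrency series.
Flash-sale design is the most frequently asked scenario question in high-concurrency interviews, and a good test of whether you really understand high-concurrency design.
When asked "how would you design a flash-sale system", many people can only say:
- "Keep the stock in Redis"
- "Use a message queue to place orders asynchronously"
What the interviewer actually wants to hear is:
- Why design it this way?
- How is each layer protected?
- How is overselling prevented for good?
- How is data consistency guaranteed?
- What is the fallback in extreme cases?
This lecture walks you, from scratch, through the complete design of a flash-sale system that can face a peak of one million QPS.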
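I. Core Challenges of a Flash-Sale System
First, understand what a flash sale really is
Characteristics of a flash sale:
Very short duration: possibly just 1 second
Very little stock: possibly just 100 items
Very many users: possibly 1 million users competing at once
In essence:
1 million concurrent requests compete for 100 items
Success rate = 100 / 1000000 = 0.01%
99.99% of requests are bound to fail
This leads to several core problems:
Challenge 1: Instantaneous high concurrency
Normal QPS: 1000
QPS at the moment the sale opens: 1000000
A 1000x traffic spike
-> The database goes down
-> The application servers go down
-> The whole site collapses
Challenge 2: Overselling
Stock: 100
Concurrent requests: 10000
Without concurrency control:
-> Several requests read stock=1 at the same time
-> Several requests deduct at the same time
-> Stock goes negative
-> Oversold
Challenge 3: Data consistency
Redis stock was deducted, but the order was never created in the database
-> Stock is consumed but the user has no order
The order was created in the database, but stock was not deducted correctly
-> Oversold
Challenge 4: User experience
99.99% of users will not get the item
-> But their experience must not be terrible
-> They must not be kept waiting
-> They need a result quickly
Challenge 5: Malicious requests
Scalpers hammer the endpoint with scripts
-> One user sends thousands of requests
-> System resources are consumed
-> Legitimate users are squeezed out
II. Overall Architecture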
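Architecture overview
[User]
   |
[CDN] --- static page cache
   |
[Nginx] --- access-layer rate limiting, per-IP anti-abuse
   |
[API Gateway] --- authentication, per-user rate limiting, risk control
   |
[Flash-sale service] --- core logic
   |                  |
[Redis]            [Kafka]
stock pre-deduct   async order creation
   |                  |
      [Order service]
            |
         [MySQL] --- final persistence
Core ideas:
1. Use CDN + static pages to stop most traffic at the outermost layer
2. Use multi-level rate limiting to shave traffic layer by layer
3. Pre-deduct stock in memory with Redis (no DB hit)
4. Create orders asynchronously through a message queue (peak shaving)
5. The DB only provides the final consistency guarantee
III. Front-End Layer: Stop Traffic at the Outermost Edge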
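1. Static pages + CDN
Making the product detail page static:
Dynamic page:
User request -> server -> query DB -> render -> return HTML
Static page:
User request -> nearest CDN node -> return the HTML file directly
Implementation:
1. Generate static HTML for the product ahead of time
2. Push it to the CDN nodes
3. User visits hit the CDN directly
4. The origin is not touched
Effect:
1 million requests -> CDN absorbs 950k -> only 50k reach the origin
2. Button de-duplication + front-end throttling
The following is front-end JavaScript reference code (it only illustrates the logic and is not part of this project's Python implementation):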
javascript
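// Disable the button after a click to prevent repeated clicks
let isSubmitting = false;
function seckill(activityId) {
  if (isSubmitting) return; // guard against duplicate submission
  isSubmitting = true;
  const btn = document.getElementById('seckillBtn');
  const originalText = btn.innerText;
  btn.disabled = true;
  btn.innerText = 'Submitting...';
  const reset = () => {
    isSubmitting = false;
    btn.disabled = false;
    btn.innerText = originalText;
  };
  fetch('/api/seckill', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // snake_case key, matching the activity_id field the backend reads
    body: JSON.stringify({ activity_id: activityId })
  })
    .then(response => response.json())
    .then(data => {
      // The backend replies with { code, message }; 200 means "queued", not "order created"
      showResult(data.message);
      if (data.code !== 200) reset();
    })
    .catch(() => {
      showResult('Network error, please try again');
      reset();
    });
}
3. CAPTCHA (anti-bot)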
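Require users to solve a CAPTCHA before the sale starts
-> Only requests that pass the human check can proceed
-> Simple bot scripts cannot solve the CAPTCHA automatically
-> Requests get spread over a few seconds (peak shaving)
python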
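def validate_captcha(user_id: str, captcha: str) -> bool:
    """Validate a CAPTCHA answer (`redis` is an already-connected client returning bytes)."""
    key = f"captcha:{user_id}"
    correct = redis.get(key)
    if correct is None:
        return False  # CAPTCHA expired
    if captcha.lower() != correct.decode().lower():
        return False  # wrong answer
    redis.delete(key)  # single use: delete once it has been consumed
    return True
4. Flash-sale tokens (spreading the traffic)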
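Idea:
Do not expose the flash-sale endpoint directly
-> Issue a limited number of tokens first
-> Only users holding a token may take part in the sale
-> Number of tokens = stock x 3 (a little headroom)
-> Most requests are filtered out ahead of time
python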
import uuid

# Issue tokens (5 seconds before the sale starts)
def distribute_tokens(activity_id: int) -> None:
    activity = activity_dao.get_by_id(activity_id)
    token_count = activity["stock"] * 3  # 3x the stock
    key = f"seckill:tokens:{activity_id}"
    tokens = [f"{activity_id}_{uuid.uuid4().hex[:8]}_{i}" for i in range(token_count)]
    if tokens:
        redis.sadd(key, *tokens)  # one round trip instead of one SADD per token
        redis.expire(key, 3600)

# A user claims a token
def get_token(activity_id: int, user_id: int) -> str | None:
    user_key = f"seckill:user:token:{user_id}:{activity_id}"
    existing = redis.get(user_key)
    if existing is not None:
        return existing.decode()  # do not hand the same user a second token
    token = redis.spop(f"seckill:tokens:{activity_id}")
    if token is None:
        return None  # no tokens left
    # Bind the token to the user for 5 minutes
    redis.setex(user_key, 300, token)
    return token.decode()

# Validate the token when the flash-sale request arrives
def validate_token(user_id: int, activity_id: int, token: str) -> bool:
    key = f"seckill:user:token:{user_id}:{activity_id}"
    valid_token = redis.get(key)
    return valid_token is not None and valid_token.decode() == token
IV. Access Layer: Multi-Dimensional Rate Limiting in Nginx
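Nginx configuration
nginx
http {
    # Per-IP limit (stops a single IP from flooding)
    limit_req_zone $binary_remote_addr
                   zone=ip_seckill:10m
                   rate=5r/s;      # 5 requests per second per IP
    # Global limit: $server_name is identical for every request, so all requests share one bucket
    limit_req_zone $server_name
                   zone=global_seckill:10m
                   rate=2000r/s;   # 2000 per second overall
    # Per-user limit keyed on the User-Id request header. The header is client-supplied,
    # so it only helps when a trusted layer sets it; requests with an empty key are not limited.
    limit_req_zone $http_user_id
                   zone=user_seckill:50m
                   rate=1r/s;      # 1 request per second per user
    server {
        location /api/seckill {
            # Per-IP: with nodelay, up to 5 burst requests are served immediately; the rest are rejected
            limit_req zone=ip_seckill burst=5 nodelay;
            # Global limit
            limit_req zone=global_seckill burst=500 nodelay;
            # Per-user limit
            limit_req zone=user_seckill burst=2 nodelay;
            # Return 429 when a request is rate-limited
            limit_req_status 429;
            # The seckill_backend upstream is assumed to be defined elsewhere
            proxy_pass http://seckill_backend;
            # Timeouts
            proxy_connect_timeout 1s;
            proxy_read_timeout 3s;
        }
    }
}
V. Gateway Layer: Authentication + Risk Control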
Authentication (FastAPI middleware)
python
from fastapi import Request
from fastapi.responses import JSONResponse
from starlette.middleware.base import BaseHTTPMiddleware

class AuthMiddleware(BaseHTTPMiddleware):
    """Gateway authentication middleware."""
    async def dispatch(self, request: Request, call_next):
        # 1. Read the token
        token = request.headers.get("Authorization")
        if not token:
            return JSONResponse({"code": 401, "message": "Not logged in"}, status_code=401)
        # 2. Validate the token against Redis (`redis` is a synchronous client here;
        #    an async client would avoid blocking the event loop)
        user_id = redis.get(f"token:{token}")
        if user_id is None:
            return JSONResponse({"code": 401, "message": "Token expired"}, status_code=401)
        # 3. Pass the user ID downstream via request.state
        request.state.user_id = user_id.decode()
        return await call_next(request)
Risk control (anti-bot)
python
class RiskControlMiddleware(BaseHTTPMiddleware):
    """Risk-control middleware. Must run after AuthMiddleware, which sets request.state.user_id."""
    async def dispatch(self, request: Request, call_next):
        if not request.url.path.startswith("/api/seckill"):
            return await call_next(request)
        user_id = request.state.user_id
        ip = request.client.host
        # 1. Blacklist check
        if redis.sismember("risk:blacklist", user_id) or redis.sismember("risk:blacklist:ip", ip):
            return JSONResponse({"code": 403, "message": "Account flagged"}, status_code=403)
        # 2. Behaviour check (too many requests in a short time)
        behavior_key = f"risk:behavior:{user_id}"
        count = redis.incr(behavior_key)
        if count == 1:
            # Start the 1-minute window on the first hit only; refreshing the TTL on
            # every request would keep extending the window for an active user
            redis.expire(behavior_key, 60)
        if count > 20:  # more than 20 requests within the window
            redis.sadd("risk:flagged", user_id)
            return JSONResponse({"code": 429, "message": "Too many requests"}, status_code=429)
        # 3. Device fingerprint check: a device already bound to another user is rejected
        device_id = request.headers.get("X-Device-Id")
        if device_id:
            bound_user = redis.get(f"risk:device:{device_id}")
            if bound_user is not None and bound_user.decode() != user_id:
                return JSONResponse({"code": 403, "message": "Device flagged"}, status_code=403)
        return await call_next(request)
VI. Flash-Sale Service: Core Logic
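Overall flow (FastAPI)
python
from fastapi import FastAPI, HTTPException, Depends
from pydantic import BaseModel

app = FastAPI()

class SeckillRequest(BaseModel):
    activity_id: int  # the request body is assumed to carry only the activity ID

@app.post("/api/seckill")
@rate_limit  # rate-limiting decorator, defined elsewhere (it should use functools.wraps)
async def seckill(request: SeckillRequest, user_id: int = Depends(get_user_id)):
    """Core flash-sale endpoint."""
    # 1. Parameter validation
    validate_request(request, user_id)
    # 2. User eligibility check
    check_user_eligibility(user_id, request.activity_id)
    # 3. Activity check (time window, status)
    check_activity(request.activity_id)
    # 4. Pre-deduct stock in Redis (the core step)
    deducted = pre_deduct_stock(user_id, request.activity_id)
    if not deducted:
        return {"code": 400, "message": "Out of stock, you were too slow"}
    # 5. Send a message to Kafka (asynchronous order creation)
    send_order_message(user_id, request.activity_id)
    # 6. Tell the user the request is queued
    return {"code": 200, "message": "Your order is being processed, please check the result shortly"}
Step 1: Parameter validation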
python
def validate_request(request: SeckillRequest, user_id: int) -> None:
    if not request:
        raise HTTPException(400, "Parameters must not be empty")
    if not request.activity_id or request.activity_id <= 0:
        raise HTTPException(400, "Invalid activity ID")
    if not user_id or user_id <= 0:
        raise HTTPException(400, "Invalid user ID")
Step 2: User eligibility check
python
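def check_user_eligibility(user_id: int, activity_id: int) -> None:
    # 1. Has the user already taken part in this activity? (prevents repeat purchases)
    if redis.exists(f"seckill:bought:{activity_id}:{user_id}"):
        raise HTTPException(400, "You have already taken part in this activity")
    # 2. Is the account in good standing?
    if redis.sismember("risk:blacklist", user_id):
        raise HTTPException(400, "Account flagged, cannot take part")
Step 3: Activity check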
python
import json
import time

def check_activity(activity_id: int) -> None:
    # Activity info is cached in Redis (avoids a DB query per request)
    activity_key = f"seckill:activity:{activity_id}"
    cached = redis.get(activity_key)
    if cached is not None:
        activity = json.loads(cached)
    else:
        activity = activity_dao.get_by_id(activity_id)
        if activity is None:
            raise HTTPException(400, "Activity does not exist")
        redis.setex(activity_key, 300, json.dumps(activity))
    # Assumes the DAO returns status as a string and start_time/end_time as epoch milliseconds
    if activity["status"] != "RUNNING":
        raise HTTPException(400, "Activity has not started or has ended")
    now_ms = int(time.time() * 1000)
    if now_ms < activity["start_time"]:
        raise HTTPException(400, "Activity has not started yet")
    if now_ms > activity["end_time"]:
        raise HTTPException(400, "Activity has ended")
Step 4: Pre-deduct stock in Redis (the core of the core)
Why a Lua script?
Plain commands:
GET stock -> check -> DECR stock
These three steps are not atomic:
Thread A: GET stock = 1 (1 item left)
Thread B: GET stock = 1 (1 item left)
Thread A: DECR stock = 0
Thread B: DECR stock = -1 <- oversold!
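The interleaving above can be reproduced without Redis. The sketch below is an in-memory stand-in only: the `FakeStock` class and its method names are made up for illustration. It replays the two-thread schedule step by step, then shows that a single indivisible check-and-decrement step cannot drive the stock negative.

```python
class FakeStock:
    """In-memory stand-in for a Redis counter (illustrative, not a Redis client)."""

    def __init__(self, stock: int) -> None:
        self.stock = stock

    def get(self) -> int:
        return self.stock

    def decr(self) -> int:
        self.stock -= 1
        return self.stock

    def check_and_decr(self) -> bool:
        """Check and decrement as one indivisible step, which is what the Lua script provides."""
        if self.stock <= 0:
            return False
        self.stock -= 1
        return True


def replay_race() -> int:
    """Replay the schedule: A reads, B reads, A decrements, B decrements."""
    s = FakeStock(1)
    seen_by_a = s.get()
    seen_by_b = s.get()  # B still sees 1 because A has not decremented yet
    if seen_by_a > 0:
        s.decr()
    if seen_by_b > 0:
        s.decr()  # decision based on a stale read
    return s.stock


def replay_atomic() -> tuple[int, list[bool]]:
    """Same two buyers, but each check-and-decrement runs as a single step."""
    s = FakeStock(1)
    results = [s.check_and_decr(), s.check_and_decr()]
    return s.stock, results


if __name__ == "__main__":
    print("non-atomic final stock:", replay_race())
    print("atomic final stock and results:", replay_atomic())
```

The non-atomic replay ends at -1, while the atomic variant ends at 0 with the second buyer rejected.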
A Lua script executes atomically inside Redis; no other command can interleave with it.
python
LUA_DEDUCT_STOCK = """
-- 1. Has this user already bought?
if redis.call('exists', KEYS[2]) == 1 then
    return -1  -- already bought
end
-- 2. Read the current stock
local stock = tonumber(redis.call('get', KEYS[1]))
-- 3. Check the stock
if stock == nil or stock <= 0 then
    return 0  -- out of stock
end
-- 4. Deduct the stock
redis.call('decrby', KEYS[1], 1)
-- 5. Mark the user as having bought (valid for 30 minutes)
redis.call('setex', KEYS[2], 1800, '1')
return 1  -- deducted successfully
"""

def pre_deduct_stock(user_id: int, activity_id: int) -> bool:
    stock_key = f"seckill:stock:{activity_id}"
    bought_key = f"seckill:bought:{activity_id}:{user_id}"
    # On Redis Cluster both keys must map to the same hash slot (for example via a shared {hash tag})
    result = redis.eval(LUA_DEDUCT_STOCK, 2, stock_key, bought_key)
    if result == -1:
        raise HTTPException(400, "You have already taken part in this activity, please do not retry")
    return result == 1
Redis stock pre-warming (before the activity starts):
python
import time
from apscheduler.schedulers.background import BackgroundScheduler

def preload_seckill_stock() -> None:
    """Runs every 5 minutes and pre-warms stock for flash sales that are about to start."""
    activities = activity_dao.get_upcoming_activities()
    now_ms = int(time.time() * 1000)
    for activity in activities:
        start_time = activity["start_time"]
        if start_time - now_ms <= 10 * 60 * 1000:  # starts within 10 minutes
            stock_key = f"seckill:stock:{activity['id']}"
            # SET NX EX writes the value and its 2-hour TTL atomically, and only if the key
            # is absent, so a later run can never overwrite stock that is already being sold
            if redis.set(stock_key, activity["stock"], nx=True, ex=7200):
                logger.info(f"Pre-warmed flash-sale stock: activityId={activity['id']}, stock={activity['stock']}")

scheduler = BackgroundScheduler()
scheduler.add_job(preload_seckill_stock, 'interval', minutes=5)
scheduler.start()
Step 5: Send the message to Kafka
python
import json
import time
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    acks='all',
    retries=3,
    value_serializer=lambda v: json.dumps(v).encode()
)

def send_order_message(user_id: int, activity_id: int) -> None:
    message = {
        "user_id": user_id,
        "activity_id": activity_id,
        "timestamp": int(time.time() * 1000),
        "msg_id": generate_msg_id(),  # unique message ID (used for idempotency)
    }
    try:
        # Key by user_id so that one user's messages stay ordered within a partition
        future = producer.send(
            "seckill-order-topic",
            key=str(user_id).encode(),
            value=message
        )
        future.get(timeout=5)  # block until the broker acknowledges
    except Exception as e:
        logger.error(f"Message send failed: userId={user_id}", exc_info=True)
        # Roll back the Redis stock. Caveat: after a timeout the message may still have been
        # written, so the consumer-side DB checks remain the final guard.
        rollback_stock(user_id, activity_id)
        raise HTTPException(500, "System busy, please retry") from e

def rollback_stock(user_id: int, activity_id: int) -> None:
    """Stock rollback (atomic Lua operation)."""
    stock_key = f"seckill:stock:{activity_id}"
    bought_key = f"seckill:bought:{activity_id}:{user_id}"
    script = """
    redis.call('incr', KEYS[1])
    redis.call('del', KEYS[2])
    return 1
    """
    redis.eval(script, 2, stock_key, bought_key)
    logger.info(f"Stock rolled back: userId={user_id}, activityId={activity_id}")
VII. Order Service: Consuming Messages and Creating Orders
Consumer design
python
import json
import time
from datetime import datetime, timedelta
from kafka import KafkaConsumer

class SeckillOrderConsumer:
    """Flash-sale order consumer."""
    def __init__(self):
        self.consumer = KafkaConsumer(
            'seckill-order-topic',
            bootstrap_servers=['localhost:9092'],
            group_id='seckill-order-group',
            enable_auto_commit=False,
            max_poll_records=10,
            value_deserializer=lambda v: json.loads(v)
        )

    def run(self) -> None:
        for message in self.consumer:
            self._handle_with_retry(message.value)
            self.consumer.commit()  # manual commit once the record is fully handled

    def _handle_with_retry(self, payload: dict, max_retry: int = 3) -> None:
        # Merely skipping the commit does not make the iterator redeliver a record
        # (it just moves on to the next one), so retries happen here, in process.
        for attempt in range(1, max_retry + 1):
            try:
                self.process_order(payload)
                return
            except BusinessException as e:
                logger.warning(f"Business error: {e}")  # business errors are not retried
                return
            except Exception:
                logger.error(f"Processing failed, attempt {attempt}", exc_info=True)
                time.sleep(0.1 * attempt)
        # Still failing after the retries: hand over for manual handling
        alert_service.notify_admin(payload)

    def process_order(self, message: dict) -> None:
        user_id = message["user_id"]
        activity_id = message["activity_id"]
        msg_id = message["msg_id"]
        # 1. Idempotency check
        if message_dedup_dao.exists(msg_id):
            logger.warning(f"Message already processed, skipping: msgId={msg_id}")
            return
        # 2. DB-level backstop check
        if order_dao.exists_by_user_and_activity(user_id, activity_id):
            logger.warning(f"Order already exists: userId={user_id}, activityId={activity_id}")
            return
        # 3. Load the activity
        activity = activity_dao.get_by_id(activity_id)
        # Steps 4-6 should run inside one DB transaction, so that a crash cannot
        # leave stock deducted without an order.
        # 4. Deduct DB stock (optimistic lock)
        affected = activity_dao.deduct_stock(activity_id, activity["version"])
        if affected == 0:
            raise RetryException("Stock deduction conflict, retrying")
        # 5. Create the order
        order = self._create_order(user_id, activity)
        order_dao.insert(order)
        # 6. Mark the message as processed
        message_dedup_dao.insert(msg_id)
        # 7. Notify the user
        notify_user(user_id, order)
        logger.info(f"Order created: userId={user_id}, orderNo={order['order_no']}")

    def _create_order(self, user_id: int, activity: dict) -> dict:
        now = datetime.now()
        return {
            "id": snowflake_id_gen.next_id(),
            "order_no": str(snowflake_id_gen.next_id()),
            "user_id": user_id,
            "activity_id": activity["id"],
            "product_id": activity["product_id"],
            "product_name": activity["product_name"],
            "activity_price": activity["activity_price"],
            "status": "CREATED",
            "create_time": now,
            "expire_time": now + timedelta(minutes=30),
        }
Idempotency handling
python
def is_already_processed(msg_id: str) -> bool:
    return message_dedup_dao.exists(msg_id)

def mark_as_processed(msg_id: str) -> None:
    try:
        message_dedup_dao.insert(msg_id)
    except IntegrityError:
        pass  # already present (unique key on msg_id), ignore
Database stock deduction (optimistic lock against overselling)
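The optimistic-lock update used in this section can be tried locally. The sketch below uses SQLite from the standard library as a stand-in for MySQL, with a cut-down `seckill_activity` table; the helper names `make_db`, `read_row` and `deduct_stock` are illustrative and are not the DAO from this lecture. Two workers read the same version, and only the first conditional UPDATE matches a row.

```python
import sqlite3


def make_db(stock: int) -> sqlite3.Connection:
    """Create an in-memory table holding one activity row (id = 1)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE seckill_activity ("
        "id INTEGER PRIMARY KEY, stock INTEGER NOT NULL, version INTEGER NOT NULL)"
    )
    conn.execute(
        "INSERT INTO seckill_activity (id, stock, version) VALUES (1, ?, 0)", (stock,)
    )
    conn.commit()
    return conn


def read_row(conn: sqlite3.Connection, activity_id: int) -> tuple[int, int]:
    """Return (stock, version) for the activity."""
    return conn.execute(
        "SELECT stock, version FROM seckill_activity WHERE id = ?", (activity_id,)
    ).fetchone()


def deduct_stock(conn: sqlite3.Connection, activity_id: int, version: int) -> int:
    """Conditional update: affects 1 row only if stock > 0 and the version is unchanged."""
    cur = conn.execute(
        "UPDATE seckill_activity SET stock = stock - 1, version = version + 1 "
        "WHERE id = ? AND stock > 0 AND version = ?",
        (activity_id, version),
    )
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = make_db(stock=1)
    _, version = read_row(conn, 1)                    # both workers read version 0
    print("worker A:", deduct_stock(conn, 1, version))  # matches the row
    print("worker B:", deduct_stock(conn, 1, version))  # stale version, matches nothing
    print("row now:", read_row(conn, 1))
```

Worker B has to re-read the row before retrying; by then the stock is 0, so it gives up instead of overselling.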
python
import time

def deduct_stock_with_retry(activity_id: int, max_retry: int = 3) -> None:
    """Deduct stock with an optimistic lock, retrying on conflict."""
    for i in range(max_retry):
        activity = activity_dao.get_by_id(activity_id)
        if activity["stock"] <= 0:
            raise BusinessException("Sold out")
        affected = activity_dao.deduct_stock(activity_id, activity["version"])
        if affected > 0:
            return  # deducted successfully
        logger.warning(f"Optimistic lock conflict, retry #{i + 1}")
        time.sleep(0.05 * (i + 1))  # increasing back-off
    raise BusinessException("Stock deduction failed, please retry")
Corresponding SQL:
sql
-- Optimistic-lock deduction
UPDATE seckill_activity
SET stock = stock - 1, version = version + 1
WHERE id = :id AND stock > 0 AND version = :version;
VIII. Timeout Cancellation: Closing Unpaid Orders Automatically
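Delayed-message implementation (RocketMQ)
python
import json
from rocketmq.client import Message  # assumes the rocketmq-client-python package

def send_cancel_message(order: dict) -> None:
    """After an order is created, send a delayed cancellation message."""
    msg = Message("order-cancel-topic")
    # Send only what the consumer needs; the full order dict holds datetime values
    # that json.dumps cannot serialize.
    msg.set_body(json.dumps({"order_no": order["order_no"]}))
    msg.set_delay_time_level(16)  # level 16 = 30 minutes with the default delay levels
    rocketmq_producer.send_sync(msg)

def on_cancel_message(message: dict) -> None:
    """Cancellation consumer: checks the order 30 minutes later."""
    order = order_dao.get_by_order_no(message["order_no"])
    if order is None:
        return
    # Only unpaid orders are cancelled
    if order["status"] != "CREATED":
        logger.info(f"Order already paid or cancelled, skipping: orderNo={order['order_no']}")
        return
    # Cancel the order; the expected_status condition makes repeated cancellations harmless
    affected = order_dao.cancel_order(
        order["order_no"],
        expected_status="CREATED",
        new_status="CANCELLED"
    )
    if affected > 0:
        restore_stock(order["activity_id"])
        logger.info(f"Order cancelled on timeout: orderNo={order['order_no']}")

def restore_stock(activity_id: int) -> None:
    """Restore stock."""
    # 1. Restore DB stock
    activity_dao.increase_stock(activity_id)
    # 2. Restore Redis stock (not atomic with step 1; a failure here needs a compensating job)
    redis.incr(f"seckill:stock:{activity_id}")
    logger.info(f"Stock restored: activityId={activity_id}")
IX. Result Query: How Users Find Out Whether They Succeeded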
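Polling approach
Front-end JavaScript reference code:
javascript
async function pollResult(userId, activityId) {
  let attempts = 0;
  const maxAttempts = 10;
  const poll = setInterval(async () => {
    attempts++;
    try {
      // Query parameter names match the backend's user_id / activity_id
      const resp = await fetch(`/api/seckill/result?user_id=${userId}&activity_id=${activityId}`);
      // The backend wraps the payload as { code, data: { status, order_no } }
      const { data } = await resp.json();
      if (data.status === 'SUCCESS') {
        clearInterval(poll);
        showSuccess('Purchase successful! Order number: ' + data.order_no);
        return;
      }
      if (data.status === 'FAILED') {
        clearInterval(poll);
        showFail('Sorry, you did not get the item');
        return;
      }
    } catch (e) {
      // Transient network or parse error: try again on the next tick
    }
    if (attempts >= maxAttempts) {
      clearInterval(poll);
      showFail('Query timed out, please refresh the page to see the result');
    }
  }, 1000);
}
Backend endpoint and result notification: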
python
# In production the user ID should come from the authenticated session rather than a
# query parameter, otherwise any caller can look up another user's result.
@app.get("/api/seckill/result")
async def get_seckill_result(user_id: int, activity_id: int):
    # 1. Check Redis (fast path)
    result = redis.get(f"seckill:result:{activity_id}:{user_id}")
    if result and result.decode() == "SUCCESS":
        order_no = redis.get(f"seckill:orderNo:{activity_id}:{user_id}")
        if order_no is not None:  # the two keys can expire independently
            return {"code": 200, "data": {"status": "SUCCESS", "order_no": order_no.decode()}}
    elif result and result.decode() == "FAILED":
        return {"code": 200, "data": {"status": "FAILED", "order_no": None}}
    # 2. Check the DB (fallback)
    order = order_dao.get_by_user_and_activity(user_id, activity_id)
    if order:
        return {"code": 200, "data": {"status": "SUCCESS", "order_no": order["order_no"]}}
    return {"code": 200, "data": {"status": "PROCESSING", "order_no": None}}

def notify_user(user_id: int, order: dict) -> None:
    """Publish the result once the order has been created."""
    activity_id = order["activity_id"]
    # Write the result to Redis
    redis.setex(f"seckill:result:{activity_id}:{user_id}", 3600, "SUCCESS")
    redis.setex(f"seckill:orderNo:{activity_id}:{user_id}", 3600, order["order_no"])
    # Push a notification (WebSocket / SMS)
    notification_service.notify(user_id, f"Congratulations! Purchase successful, order number: {order['order_no']}")
WebSocket real-time push (a better experience)
python
from fastapi import WebSocket

class SeckillResultPusher:
    """WebSocket result push. Connections are held in this process only, so a
    multi-instance deployment needs a shared channel to reach the right instance."""
    def __init__(self):
        self.connections: dict[int, WebSocket] = {}

    async def register(self, user_id: int, ws: WebSocket):
        await ws.accept()
        self.connections[user_id] = ws

    async def push_result(self, user_id: int, result: dict):
        ws = self.connections.get(user_id)
        if ws:
            await ws.send_json(result)

    def unregister(self, user_id: int):
        self.connections.pop(user_id, None)

result_pusher = SeckillResultPusher()

# Push after the order has been created
async def notify_user_realtime(user_id: int, order: dict) -> None:
    result = {"status": "SUCCESS", "order_no": order["order_no"]}
    await result_pusher.push_result(user_id, result)
X. The Complete Anti-Oversell Scheme
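Three layers of oversell protection
Layer 1: Redis Lua script (fastest, in memory)
-> Atomic, so stock never goes negative
-> Pre-deducts stock
Layer 2: Optimistic lock (database level)
-> stock > 0 AND version = :version
-> Ensures the DB is never oversold
Layer 3: Unique constraint (final backstop)
-> Unique index on (user, activity) in the order table
-> Even if there is a bug, the DB rejects duplicate orders
sql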
-- Unique constraint
ALTER TABLE seckill_order
ADD UNIQUE KEY uk_user_activity (user_id, activity_id);
-- Stock table
CREATE TABLE seckill_activity (
    id BIGINT NOT NULL,
    stock INT NOT NULL DEFAULT 0 COMMENT 'remaining stock',
    version INT NOT NULL DEFAULT 0 COMMENT 'optimistic lock version',
    ...
);
Oversell simulation and verification
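The concurrency test in this section needs a running Redis and the flash-sale service. As a self-contained alternative, the sketch below simulates the same experiment in memory: `InMemorySeckill` is an illustrative stand-in in which a lock plays the role of the Lua script's atomicity, and the return codes mirror the script's (-1, 0, 1). It is a simulation under those assumptions, not a test of the real system.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InMemorySeckill:
    """Stand-in for the Lua script: the lock makes check + deduct + mark one atomic step."""

    def __init__(self, stock: int) -> None:
        self._stock = stock
        self._bought: set[int] = set()
        self._lock = threading.Lock()

    def try_buy(self, user_id: int) -> int:
        """Return -1 if the user already bought, 0 if sold out, 1 on success."""
        with self._lock:
            if user_id in self._bought:
                return -1
            if self._stock <= 0:
                return 0
            self._stock -= 1
            self._bought.add(user_id)
            return 1

    @property
    def stock(self) -> int:
        return self._stock


def run_simulation(users: int = 10_000, stock: int = 100, workers: int = 50) -> tuple[int, int, int]:
    """Let `users` distinct users race for `stock` items; return (successes, failures, stock_left)."""
    shop = InMemorySeckill(stock)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(shop.try_buy, range(1, users + 1)))
    return results.count(1), results.count(0), shop.stock


if __name__ == "__main__":
    successes, failures, left = run_simulation()
    print(f"successes={successes}, failures={failures}, stock_left={left}")
```

Removing the lock from `try_buy` turns this into the non-atomic check-then-deduct variant, which is the race the Lua script is there to prevent.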
python
from concurrent.futures import ThreadPoolExecutor, as_completed

def test_seckill_concurrency():
    """Concurrency test for the flash sale."""
    user_count = 10000  # 10,000 users
    stock = 100         # 100 items in stock
    # Pre-warm the stock
    redis.set("seckill:stock:1", stock)
    success_count = 0
    fail_count = 0

    def do_seckill(user_id: int) -> bool:
        try:
            return seckill_service.seckill(user_id, 1)
        except Exception:
            return False

    with ThreadPoolExecutor(max_workers=200) as executor:
        futures = [executor.submit(do_seckill, i + 1) for i in range(user_count)]
        # Results are tallied in this thread only, so no lock and no `nonlocal` are needed
        # (`nonlocal` on a name local to the same function is a SyntaxError)
        for future in as_completed(futures):
            if future.result():
                success_count += 1
            else:
                fail_count += 1
    print(f"Successes: {success_count}")  # should equal 100
    print(f"Failures: {fail_count}")      # should equal 9900
    # Verify the stock
    stock_left = int(redis.get("seckill:stock:1"))
    print(f"Stock left: {stock_left}")    # should equal 0
    assert success_count == stock, f"Oversold! successes={success_count}, stock={stock}"
XI. High-Availability Safeguards
Rate limiting and circuit breaker configuration
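The configuration in this section refers to a custom `TokenBucket` and `CircuitBreaker` without showing them. A minimal sketch of what such classes might look like follows. The constructor parameters come from the configuration; everything else is an assumption: the `allow`, `record_success` and `record_failure` method names, the injectable `clock`, and the reading of `half_open_limit` as the number of consecutive successes needed to close the breaker again. It is not thread-safe as written.

```python
import time
from typing import Callable


class TokenBucket:
    """Refills `rate` tokens per second, holding at most `capacity` tokens."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        # Lazy refill: add the tokens earned since the previous call
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class CircuitBreaker:
    """CLOSED -> OPEN after `failure_threshold` consecutive failures,
    OPEN -> HALF_OPEN after `timeout` seconds,
    HALF_OPEN -> CLOSED after `half_open_limit` consecutive successes."""

    def __init__(self, failure_threshold: int, timeout: float, half_open_limit: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.half_open_limit = half_open_limit
        self._clock = clock
        self.state = "CLOSED"
        self._failures = 0
        self._successes = 0
        self._opened_at = 0.0

    def allow(self) -> bool:
        if self.state == "OPEN":
            if self._clock() - self._opened_at >= self.timeout:
                self.state = "HALF_OPEN"  # let trial calls through
                self._successes = 0
                return True
            return False
        return True

    def record_success(self) -> None:
        if self.state == "HALF_OPEN":
            self._successes += 1
            if self._successes >= self.half_open_limit:
                self.state = "CLOSED"
                self._failures = 0
        else:
            self._failures = 0

    def record_failure(self) -> None:
        if self.state == "HALF_OPEN":
            self._trip()  # a failed trial call reopens the breaker
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = "OPEN"
        self._opened_at = self._clock()
        self._failures = 0
```

A caller would check `allow()` before the protected call and report the outcome with `record_success()` or `record_failure()`; the decorators in the configuration would wrap exactly that pattern.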
python
# Uses a custom TokenBucket + CircuitBreaker
# Flash-sale endpoint rate limit: 10000 QPS
seckill_token_bucket = TokenBucket(rate=10000, capacity=20000)
# Circuit breaker for Redis calls
redis_breaker = CircuitBreaker(
    failure_threshold=5,
    timeout=30,  # stay open for 30 seconds
    half_open_limit=3
)
# Circuit breaker for Kafka sends
kafka_breaker = CircuitBreaker(
    failure_threshold=10,
    timeout=60,
    half_open_limit=5
)
# Combine them with decorators
@app.post("/api/seckill")
@rate_limit(seckill_token_bucket)
@circuit_breaker(redis_breaker)
async def seckill(request: SeckillRequest, user_id: int = Depends(get_user_id)):
    ...
Redis failure degradation
python
from redis.exceptions import RedisError

def pre_deduct_stock(user_id: int, activity_id: int) -> bool:
    try:
        return _redis_deduct_stock(user_id, activity_id)
    except RedisError:
        logger.error("Redis failure, degrading to database deduction")
        return _fallback_db_deduct(user_id, activity_id)

def _fallback_db_deduct(user_id: int, activity_id: int) -> bool:
    """Degraded path: deduct directly in the database.

    Redis is unavailable on this path, so a Redis-based distributed lock cannot be
    used. The conditional UPDATE (stock > 0 AND version = :version) is atomic on
    its own, and that is what prevents overselling here.
    """
    # This path hits the DB on every request, so it must sit behind a much tighter rate limit.
    activity = activity_dao.get_by_id(activity_id)
    if activity["stock"] <= 0:
        return False
    affected = activity_dao.deduct_stock(activity_id, activity["version"])
    # The order consumer must not deduct DB stock a second time for requests that took
    # this path (for example, by carrying a flag in the order message).
    return affected > 0
Kafka failure handling
python
def send_order_message(user_id: int, activity_id: int) -> None:
    try:
        _send_kafka_message(user_id, activity_id)
    except Exception:
        logger.error("Kafka send failed, degrading to the local message table", exc_info=True)
        # Write to the local message table; msg_id is included because the consumer requires it
        local_msg = {
            "topic": "seckill-order-topic",
            "body": json.dumps({
                "user_id": user_id,
                "activity_id": activity_id,
                "msg_id": generate_msg_id(),
            }),
            "status": "INIT",
            "retry_count": 0,
        }
        local_message_dao.insert(local_msg)

# Background resend job
def retry_local_messages() -> None:
    messages = local_message_dao.find_pending(limit=100)
    for msg in messages:
        try:
            # The producer's value_serializer expects a dict, so decode the stored JSON first;
            # .get() surfaces send errors that would otherwise go unnoticed
            producer.send(msg["topic"], value=json.loads(msg["body"])).get(timeout=5)
            msg["status"] = "SENT"
        except Exception:
            msg["retry_count"] += 1
            if msg["retry_count"] >= 5:
                msg["status"] = "FAILED"
                alert_service.notify_admin(msg)
        local_message_dao.update(msg)

scheduler.add_job(retry_local_messages, 'interval', seconds=5)
XII. Database Design
Flash-sale activity table
sql
CREATE TABLE `seckill_activity` (
    `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'activity ID',
    `activity_name` VARCHAR(128) NOT NULL COMMENT 'activity name',
    `product_id` BIGINT UNSIGNED NOT NULL COMMENT 'product ID',
    `product_name` VARCHAR(255) NOT NULL COMMENT 'product name (snapshot)',
    `original_price` DECIMAL(12,2) NOT NULL COMMENT 'original price',
    `activity_price` DECIMAL(12,2) NOT NULL COMMENT 'flash-sale price',
    `total_stock` INT NOT NULL COMMENT 'total stock',
    `stock` INT NOT NULL COMMENT 'remaining stock',
    `version` INT NOT NULL DEFAULT 0 COMMENT 'optimistic lock version',
    `status` TINYINT NOT NULL DEFAULT 0 COMMENT 'status: 0 pending, 1 running, 2 ended',
    `start_time` DATETIME NOT NULL COMMENT 'start time',
    `end_time` DATETIME NOT NULL COMMENT 'end time',
    `create_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `update_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    KEY `idx_status_start_time` (`status`, `start_time`)
) ENGINE=InnoDB COMMENT='flash-sale activity table';
Flash-sale order table
sql
CREATE TABLE `seckill_order` (
    `id` BIGINT UNSIGNED NOT NULL COMMENT 'primary key (snowflake ID)',
    `order_no` VARCHAR(64) NOT NULL COMMENT 'order number',
    `user_id` BIGINT UNSIGNED NOT NULL COMMENT 'user ID',
    `activity_id` BIGINT UNSIGNED NOT NULL COMMENT 'activity ID',
    `product_id` BIGINT UNSIGNED NOT NULL COMMENT 'product ID',
    `product_name` VARCHAR(255) NOT NULL COMMENT 'product name (snapshot)',
    `activity_price` DECIMAL(12,2) NOT NULL COMMENT 'transaction price',
    `status` TINYINT NOT NULL DEFAULT 0 COMMENT '0 unpaid, 1 paid, 2 cancelled',
    `expire_time` DATETIME NOT NULL COMMENT 'payment deadline',
    `pay_time` DATETIME COMMENT 'payment time',
    `create_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    `update_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    UNIQUE KEY `uk_order_no` (`order_no`),
    UNIQUE KEY `uk_user_activity` (`user_id`, `activity_id`), -- prevents repeat purchases
    KEY `idx_user_id` (`user_id`),
    KEY `idx_activity_id` (`activity_id`)
) ENGINE=InnoDB COMMENT='flash-sale order table';
Message idempotency table
sql
CREATE TABLE `message_dedup` (
    `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    `msg_id` VARCHAR(128) NOT NULL COMMENT 'unique message ID',
    `create_time` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    UNIQUE KEY `uk_msg_id` (`msg_id`)
) ENGINE=InnoDB COMMENT='message idempotency table';
XIII. The Complete Flow, End to End
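Normal success flow
1. The user opens the flash-sale page
-> The CDN returns the static page (the origin is not touched)
2. The front end shows a countdown
-> Based on server time (the client clock may be wrong)
3. The sale starts and the user clicks the button
-> The button is disabled immediately (prevents duplicates)
-> The CAPTCHA answer is submitted (if enabled)
4. The request reaches Nginx
-> Per-IP rate limit check (5 per second per IP)
-> Passed
5. The request reaches the gateway
-> Authentication (token validation)
-> Risk control (behaviour check)
-> Passed
6. The request reaches the flash-sale service
-> Parameter validation [OK]
-> User eligibility check (already bought?) [OK]
-> Activity check (time window, status) [OK]
-> Redis stock pre-deduction (atomic Lua script) [OK] deducted
-> Message sent to Kafka [OK]
-> Returns "queued"
7. The order service consumes the Kafka message
-> Idempotency check [OK] not yet processed
-> DB stock deduction (optimistic lock) [OK]
-> Order created [OK]
-> Result written to Redis
-> User notified
8. The front end polls for the result
-> Finds "SUCCESS"
-> Shows "Congratulations! Purchase successful"
-> Redirects to the order page
9. RocketMQ delayed message (30 minutes later)
-> Checks whether the order has been paid
-> Unpaid -> cancel the order, restore the stock
Failure flow (out of stock)
6. The request reaches the flash-sale service
-> Redis stock pre-deduction (Lua script) -> stock=0, returns failure
-> Immediately returns "Sorry, you were too slow"
(no message queue, no database access)
Exception flow (message send failure)
6. Redis pre-deduction succeeded
-> Kafka send failed
-> Redis stock rolled back (atomic Lua operation)
-> Returns "System busy, please retry"
XIV. Capacity Estimation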
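Capacity calculation for the flash-sale scenario
Assumptions:
- 1 million participating users
- Stock: 1000 items
- Sale duration: 1 second (extreme case)
Peak QPS = 1000000 / 1 = 1 million
Capacity of each layer:
Nginx: 500k QPS (rate-limited so that 20k get through)
Gateway: 20k QPS (rate-limited so that 5000 get through)
Flash-sale service: 5000 QPS in total, spread over N instances
Redis: 100k QPS (can take it)
Kafka: 10k TPS (can take it)
DB: 5000 TPS (only receives work that comes through Kafka, so the load is controllable)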
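The sizing in this section is plain arithmetic: each layer admits at most its limit, and instance counts follow from ceiling division. The short script below re-derives the figures from the assumptions stated above; the function names are illustrative, and the numbers are this section's assumed inputs rather than measurements.

```python
def instances_needed(total_rate: int, per_instance_rate: int) -> int:
    """Ceiling division: the smallest instance count whose combined rate covers total_rate."""
    return -(-total_rate // per_instance_rate)


def funnel(peak_qps: int, limits: list[tuple[str, int]]) -> list[tuple[str, int, float]]:
    """For each layer return (name, admitted_qps, share_of_incoming_rejected_at_this_layer)."""
    rows = []
    incoming = peak_qps
    for name, limit in limits:
        admitted = min(incoming, limit)
        rejected_share = 1 - admitted / incoming if incoming else 0.0
        rows.append((name, admitted, rejected_share))
        incoming = admitted  # the next layer only sees what this one let through
    return rows


if __name__ == "__main__":
    for name, admitted, rejected in funnel(1_000_000, [("nginx", 20_000), ("gateway", 5_000)]):
        print(f"{name}: admits {admitted} QPS, rejects {rejected:.0%} of what it receives")
    print("flash-sale instances:", instances_needed(5_000, 1_000))
    print("consumer instances:", instances_needed(1_000, 200))
```

With these inputs Nginx turns away 98% of the peak and the gateway another 75% of what remains, which is why the service layer only has to be sized for 5000 QPS.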
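Number of machines:
Flash-sale service: 5000 QPS required, 1000 QPS per machine, so 5 machines
Consumers: 1000 TPS, 200 TPS per machine, so 5 machines
XV. Interview Answer Template
The standard answer to "Please design a flash-sale system"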
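Step 1: State the challenges (30 seconds)
The core challenges of a flash-sale system are:
1. Instantaneous high concurrency: a million QPS hits the system
2. Overselling: several people win the same unit of stock
3. Data consistency: Redis was deducted but the DB was not
4. User experience: results must come back quickly
I will go through the access layer -> service layer -> data layer in turn.
Step 2: Describe the architecture (2 minutes)
There are five layers overall:
1. Front-end layer: static pages + CDN, absorbing more than 90% of the traffic
2. Access layer: Nginx rate limiting + per-IP anti-abuse
3. Gateway layer: authentication + risk control + per-user rate limiting
4. Service layer: Redis stock pre-deduction + asynchronous ordering via Kafka
5. Data layer: final persistence in the DB, optimistic lock against overselling
Step 3: Describe the core (2 minutes)
The core is Redis stock pre-deduction:
A Lua script performs three operations atomically:
1. Check whether the user has already bought
2. Check whether stock is available
3. Deduct the stock + mark the user as having bought
Lua guarantees atomicity, which rules out overselling at the Redis layer.
Step 4: Describe consistency (1 minute)
Three layers of oversell protection:
1. Redis Lua: protection in memory
2. DB optimistic lock: stock > 0 AND version = :version
3. Unique constraint: uk_user_activity, the DB backstop
Consistency between Redis and the DB:
Eventual consistency through Kafka messages + a local message table
Step 5: Describe high availability (1 minute)
Three levels of protection: rate limiting, circuit breaking, degradation
Degradation paths exist for Redis and Kafka failures
Monitoring: consumer lag, stock changes, order success rate
XVI. Core Conclusions You Must Remember from This Lecture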
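- Flash-sale core: atomic stock deduction with a Redis Lua script is the key to preventing overselling
- Three layers of oversell protection: Redis Lua + DB optimistic lock + unique constraint
- Shave traffic layer by layer: CDN -> Nginx -> gateway -> service layer
- Asynchronous ordering: Kafka flattens the peak, taking the DB from a million QPS down to a few thousand TPS
- Idempotent design: unique message ID + dedup table, preventing duplicate orders
- Delayed messages: orders unpaid after 30 minutes are cancelled automatically and stock is restored
- Degradation fallbacks: if Redis is down, go to the DB under a tight rate limit; if Kafka is down, use the local message table
- Stock pre-warming: write stock to Redis before the activity starts, so the DB is not hit when it opens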
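The idempotency conclusion above (unique message ID + dedup table + unique order per user and activity) can be illustrated with a small in-memory sketch. `OrderStore` stands in for the three DB tables, and `queue.Queue` stands in for Kafka with at-least-once delivery; both are simplifications for illustration, not the implementation from this lecture.

```python
import queue


class OrderStore:
    """In-memory stand-in for the DB: dedup table, unique (user, activity) orders, stock."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self.processed_msg_ids: set[str] = set()
        self.orders: dict[tuple[int, int], str] = {}


def process_order(store: OrderStore, msg: dict) -> str:
    """Handle one message; return 'created', 'duplicate' or 'sold_out'."""
    if msg["msg_id"] in store.processed_msg_ids:
        return "duplicate"  # the same message was delivered again
    store.processed_msg_ids.add(msg["msg_id"])
    key = (msg["user_id"], msg["activity_id"])
    if key in store.orders:
        return "duplicate"  # a different message, but the user already has an order
    if store.stock <= 0:
        return "sold_out"
    store.stock -= 1
    store.orders[key] = f"order-{len(store.orders) + 1}"
    return "created"


def consume_all(store: OrderStore, q: "queue.Queue[dict]") -> list[str]:
    """Drain the queue and collect the outcome of every message."""
    outcomes = []
    while True:
        try:
            msg = q.get_nowait()
        except queue.Empty:
            return outcomes
        outcomes.append(process_order(store, msg))


def demo() -> tuple[list[str], OrderStore]:
    store = OrderStore(stock=2)
    q: "queue.Queue[dict]" = queue.Queue()
    for msg in [
        {"msg_id": "m1", "user_id": 1, "activity_id": 9},
        {"msg_id": "m1", "user_id": 1, "activity_id": 9},  # redelivery of m1
        {"msg_id": "m2", "user_id": 1, "activity_id": 9},  # same user, new message
        {"msg_id": "m3", "user_id": 2, "activity_id": 9},
        {"msg_id": "m4", "user_id": 3, "activity_id": 9},  # stock is gone by now
    ]:
        q.put(msg)
    return consume_all(store, q), store


if __name__ == "__main__":
    outcomes, store = demo()
    print(outcomes, store.stock, len(store.orders))
```

Five deliveries produce exactly two orders: the redelivered message and the second message from the same user are both absorbed, and the last buyer is told the item is sold out.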
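XVII. Exercises
Exercise 1: Lua script
Write a Lua script that:
- Checks whether the user has already bought
- Checks the stock
- Deducts the stock
- Marks the user as having bought (valid for 30 minutes)
- Returns: -1 = already bought, 0 = out of stock, 1 = success
Exercise 2: Timeout cancellation
An order unpaid after 30 minutes must be cancelled automatically and its stock restored.
Requirements:
- Cancellation must be idempotent (cancelling several times causes no error)
- Stock must be restored in both Redis and the DB
- What is the fallback if restoring fails?
Exercise 3: Architecture thinking
If the flash sale has 100,000 items in stock (instead of 100), what changes in the design?
Hints:
- Is Redis stock pre-deduction still appropriate?
- Is the message queue still needed?
- What order of magnitude is the database load?
Exercise 4: Failure scenario
Scenario: halfway through the sale, Redis suddenly goes down. What do you do?
Requirements:
- How are in-flight flash-sale requests handled?
- What about requests that deducted Redis stock but have not sent a message yet?
- How does the system recover automatically?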
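XVIII. Preview of the Next Lecture
Next we move on to:
Lecture 7: Feed System Design: Feed Streams, Unread Counts, Counting Systems
It will cover:
- The three feed models: push vs pull vs hybrid push-pull
- When to push and when to pull
- How to handle posts from celebrity accounts (the write fan-out problem)
- Design options for an unread-count system
- Counting systems (likes, followers, comments)
- Caching strategies for feeds
- Case studies from large companies
You can do the exercises first and I will review them for you.
Or we can go straight to Lecture 7.
How would you like to proceed?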