Add configuration for file-stream uploads

Sherlock 1 week ago
parent
commit
bdac27e161
6 changed files with 258 additions and 7 deletions
  1. .env.example (+2 -0)
  2. Readme.md (+241 -0)
  3. api_test.py (+5 -5)
  4. core/config.py (+4 -0)
  5. models/item2vec/inference.py (+4 -2)
  6. utils/file_stream.py (+2 -0)

+ 2 - 0
.env.example

@@ -20,3 +20,5 @@ LOG_LEVEL=INFO
 # File Service
 FILE_UPLOAD_URL=http://file-center.jcpt:8080/file/fileUpload
 FILE_DOWNLOAD_URL=http://file-center.jcpt:8080/file/fileDownload
+# Cookie required for upload/download (copy the whole line from the browser; join multiple cookies with "; ")
+FILE_SERVICE_COOKIE=
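
The comment above asks for the cookie as a single line, with multiple cookies joined by `"; "`. A minimal sketch of building such a value, assuming the cookies are available as name/value pairs. `build_cookie_header` and the cookie names are hypothetical and not part of this commit:

```python
def build_cookie_header(cookies: dict[str, str]) -> str:
    """Join name/value pairs into a single Cookie header value."""
    # Hypothetical helper: each pair is rendered as name=value,
    # and pairs are joined with "; " as the .env comment describes.
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


# Hypothetical cookie names, for illustration only.
print(build_cookie_header({"SESSION": "abc123", "token": "xyz"}))
```

The resulting string is what would be pasted after `FILE_SERVICE_COOKIE=`.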

+ 241 - 0
Readme.md

@@ -0,0 +1,241 @@
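+# BrandCultivation: Cigarette Brand Cultivation Recommendation System
+
+A merchant recommendation system for cigarette brand cultivation, built on collaborative filtering, Item2Vec, and GBDT-LR. It provides SKU-to-merchant matching recommendations, delivery quantity allocation, and effect validation.
+
+## Directory Structure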
+
+```
+BrandCultivation/
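+├── core/                    # Infrastructure layer (logging, config, exceptions, middleware)
+├── api/                     # FastAPI routing layer
+├── database/                # Data access layer (MySQL DAO + Redis)
+├── models/                  # ML models (Item2Vec, ItemCF, GBDT-LR)
+├── utils/                   # Utilities (file upload, report generation)
+├── config/                  # Configuration files (YAML)
+├── run_api.py               # API service entry point
+├── train.py                 # Model training entry point
+├── requirements.txt         # Python dependencies
+└── .env.example             # Environment variable template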
+```
+
+## Requirements
+
+- Python 3.10+
+- MySQL 5.7+
+- Redis 5.0+
+
+## Installation
+
+```bash
+# Clone the repository
+git clone <repo-url>
+cd BrandCultivation
+
+# Create a virtual environment
+conda create -n recommend python=3.10
+conda activate recommend
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+## Configuration
+
+### Environment Variables
+
+Copy `.env.example` to `.env` and fill in the actual values:
+
+```bash
+cp .env.example .env
+```
+
+Required environment variables:
+
+| Variable | Description | Example |
+|------|------|------|
+| `MYSQL_HOST` | MySQL host address | `rm-xxx.mysql.rds.aliyuncs.com` |
+| `MYSQL_PORT` | MySQL port | `3036` |
+| `MYSQL_USER` | MySQL username | `BrandCultivation` |
+| `MYSQL_PASSWORD` | MySQL password | (required) |
+| `MYSQL_DB` | Database name | `brand_cultivation` |
+| `REDIS_HOST` | Redis host address | `r-xxx.redis.rds.aliyuncs.com` |
+| `REDIS_PORT` | Redis port | `5000` |
+| `REDIS_PASSWORD` | Redis password | (required) |
+| `REDIS_DB` | Redis database index | `10` |
+| `LOG_LEVEL` | Log level | `INFO` (default) |
+| `FILE_UPLOAD_URL` | File upload service URL | `http://file-center.jcpt:8080/file/fileUpload` |
+| `FILE_DOWNLOAD_URL` | File download service URL | `http://file-center.jcpt:8080/file/fileDownload` |
+
+If you do not use a `.env` file, you can export the environment variables directly:
+
+```bash
+export MYSQL_PASSWORD='your_password'
+export REDIS_PASSWORD='your_password'
+```
+
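+### YAML Configuration
+
+Non-sensitive settings stay in the YAML files under the `config/` directory. Environment variables take precedence over YAML.
+
+## Running
+
+### Start the API Service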
+
+```bash
+python run_api.py
+```
+
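+Once started, the service listens on `0.0.0.0:7960`. Verify it with:
+
+```bash
+# Health check
+curl http://localhost:7960/health
+
+# Expected response
+# {"status":"healthy","mysql":"ok","redis":"ok"}
+```
+
+You can also start the service directly with uvicorn (supports hot reload):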
+
+```bash
+uvicorn run_api:app --host 0.0.0.0 --port 7960 --reload
+```
+
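+### Model Training
+
+Make sure both MySQL and Redis are reachable before training.
+
+```bash
+# Full training (collaborative filtering + popularity-based recall + GBDT-LR)
+python train.py --run_train --city_uuid 00000000000000000000000011445301
+
+# Train only the recall models (collaborative filtering + popularity-based recall)
+python train.py --run_recall --city_uuid 00000000000000000000000011445301
+
+# Train only the ranking model (GBDT-LR)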
+python train.py --run_gbdtlr --city_uuid 00000000000000000000000011445301
+```
+
+Training parameters:
+
+| Parameter | Description | Default |
+|------|------|--------|
+| `--city_uuid` | City UUID | `00000000000000000000000011445301` |
+| `--train_data_dir` | Directory where training data is saved | `./data/gbdt` |
+| `--model_path` | Directory where model weights are saved | `./models/rank/weights` |
+| `--largest_n` | ItemCF popularity top N | `300` |
+| `--similarity_k` | ItemCF number of similar merchants | `100` |
+| `--top_n` | ItemCF number of recommendation candidates | `1500` |
+| `--n_jobs` | Number of threads for parallel computation | `2` |
+
+## API Endpoints
+
+Base path: `/brandcultivation/api/v1`
+
+### POST /recommend
+
+Generates a merchant recommendation list and allocates delivery quantities.
+
+Request body:
+```json
+{
+    "city_uuid": "00000000000000000000000011445301",
+    "product_code": "440298",
+    "recall_cust_count": 500,
+    "delivery_count": 5000,
+    "cultivacation_id": "10000001",
+    "limit_cycle_name": "202505W1(05.05-05.11)"
+}
+```
+
+Response:
+```json
+{
+    "code": 200,
+    "msg": "success",
+    "data": {
+        "recommendationInfo": [
+            {"id": 1, "cust_code": "445300108802", "recommend_score": 95.3, "delivery_count": 120}
+        ]
+    }
+}
+```
+
+### POST /report
+
+Returns the file IDs of the reports associated with a recommendation.
+
+Request body:
+```json
+{
+    "cultivacation_id": "10000001"
+}
+```
+
+### POST /eval_report
+
+Generates a delivery-effect validation report.
+
+Request body:
+```json
+{
+    "city_uuid": "00000000000000000000000011445301",
+    "product_code": "440298",
+    "cultivacation_id": "10000001",
+    "start_time": "2025/2/10",
+    "end_time": "2025/2/16"
+}
+```
+
+### GET /health
+
+Health check. Returns the MySQL and Redis connection status.
+
+## Logging
+
+The system writes JSON-formatted logs to stdout. Each log entry contains:
+
+```json
+{
+    "timestamp": "2026-05-21T03:35:48.869426+00:00",
+    "level": "INFO",
+    "module": "recommend",
+    "function": "recommend",
+    "line": 18,
+    "message": "Recommend request: city=xxx, product=440298, recall=500",
+    "request_id": "a1b2c3d4"
+}
+```
+
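+The `LOG_LEVEL` environment variable controls the log level (DEBUG / INFO / WARNING / ERROR).
+
+Each API request is automatically assigned a `request_id` that is carried through the entire request chain to make troubleshooting easier. The same ID is also returned in the `X-Request-ID` response header.
+
+## Docker Deployment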
+
+```dockerfile
+FROM python:3.10-slim
+
+WORKDIR /app
+COPY requirements.txt .
+RUN pip install --no-cache-dir -r requirements.txt
+
+COPY . .
+
+ENV MYSQL_PASSWORD=""
+ENV REDIS_PASSWORD=""
+ENV LOG_LEVEL=INFO
+
+EXPOSE 7960
+CMD ["python", "run_api.py"]
+```
+
+```bash
+docker build -t brand-cultivation .
+docker run -d \
+    -p 7960:7960 \
+    -e MYSQL_PASSWORD='your_password' \
+    -e REDIS_PASSWORD='your_password' \
+    brand-cultivation
+```
+

+ 5 - 5
api_test.py

@@ -3,12 +3,12 @@ import json
 
 url = "http://127.0.0.1:7960/brandcultivation/api/v1/recommend"
 payload = {
-    "city_uuid": "00000000000000000000000011440901",
-    "product_code": "440308",
-    "recall_cust_count": 80,
+    "city_uuid": "00000000000000000000000011445301",
+    "product_code": "310101",
+    "recall_cust_count": 100,
     "delivery_count": 80,
-    "cultivacation_id": "10000003",
-    "limit_cycle_name": "202603W4(03.21-03.29)"
+    "cultivacation_id": "10000001",
+    "limit_cycle_name": "202606W1(06.01-06.07)"
 }
 headers = {'Content-Type': 'application/json'}
 

+ 4 - 0
core/config.py

@@ -73,6 +73,10 @@ class _Settings:
     def file_download_url(self) -> str:
         return _get_env("FILE_DOWNLOAD_URL", self._service_cfg.get("aliyun", {}).get("download_url", ""))
 
+    @property
+    def file_service_cookie(self) -> str:
+        return _get_env("FILE_SERVICE_COOKIE", self._service_cfg.get("aliyun", {}).get("cookie", ""))
+
     @property
     def model_config(self) -> dict:
         return self._model_cfg
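
The new `file_service_cookie` property follows the same pattern as its neighbours: the environment variable is consulted first and the YAML value under `aliyun.cookie` is the fallback. A minimal sketch of that lookup order. The body of `_get_env` is not shown in this diff, so `get_env_or_default` is a hypothetical stand-in, and treating an empty variable as unset is an assumption:

```python
import os

# Stand-in for the parsed YAML service config (hypothetical contents).
SERVICE_CFG = {"aliyun": {"cookie": "yaml-cookie=1"}}


def get_env_or_default(name: str, default: str = "") -> str:
    # Assumption: an unset or empty variable falls back to the default.
    value = os.environ.get(name, "")
    return value if value else default


def file_service_cookie() -> str:
    # Environment variable first, YAML value as the fallback.
    yaml_value = SERVICE_CFG.get("aliyun", {}).get("cookie", "")
    return get_env_or_default("FILE_SERVICE_COOKIE", yaml_value)
```

Under this assumption, leaving `FILE_SERVICE_COOKIE=` blank in `.env` would defer to the YAML value rather than override it with an empty string.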

+ 4 - 2
models/item2vec/inference.py

@@ -58,9 +58,11 @@ class Item2VecModel:
             .sort_values(["sale_qty", "cust_code"], ascending=[False, True])
         )
         
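-        # Normalize the sales volume
+        # Normalize the sales volume: apply log1p first to compress the long tail of the power-law distribution, then StandardScaler + sigmoid
+        # Without the log transform, the z-scores of top merchants are so large that the sigmoid saturates and their scores all come out as 100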
+        log_qty = np.log1p(recommend_cust["sale_qty"].values).reshape(-1, 1)
         scaler = StandardScaler()
-        normalized = scaler.fit_transform(recommend_cust["sale_qty"].values.reshape(-1, 1))
+        normalized = scaler.fit_transform(log_qty)
         recommend_cust["recommend_score"] = ((1 / (1 + np.exp(-normalized))) * 100).flatten()
         # recommend_cust = recommend_cust.rename(columns={"sale_qty": "recommend_score"})
         # recommend_cust.to_csv("./data/item2vec_recommend.csv", index=False)
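
The effect of the added `log1p` step can be sketched with the standard library alone. This is an illustration under stated assumptions, not the project's code: the z-score uses the population standard deviation (as `StandardScaler` does), the function names are hypothetical, and the sample sales figures are made up to mimic a heavy-tailed distribution:

```python
import math
from statistics import fmean, pstdev


def sigmoid_scores(values: list[float]) -> list[float]:
    """Z-score the values, then map them to (0, 100) with a sigmoid."""
    mean = fmean(values)
    std = pstdev(values) or 1.0  # guard against zero variance
    return [100 / (1 + math.exp(-(v - mean) / std)) for v in values]


def log_sigmoid_scores(values: list[float]) -> list[float]:
    """Same as sigmoid_scores, but compress the tail with log1p first."""
    return sigmoid_scores([math.log1p(v) for v in values])


# Made-up sales quantities spanning several orders of magnitude.
sales = [1, 10, 100, 1_000, 10_000, 100_000, 1_000_000]
raw = sigmoid_scores(sales)
logged = log_sigmoid_scores(sales)
for qty, r, l in zip(sales, raw, logged):
    print(f"{qty:>9} raw={r:6.2f} log1p={l:6.2f}")
```

On this sample the raw scores for the small merchants are nearly indistinguishable while the largest merchant is pushed toward the top of the range; after `log1p` the scores are spread more evenly and the ranking is unchanged.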

+ 2 - 0
utils/file_stream.py

@@ -15,6 +15,8 @@ class FileStreamUtils:
         "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
         "Accept": "*/*",
     }
+    if settings.file_service_cookie:
+        headers["Cookie"] = settings.file_service_cookie
 
     @staticmethod
     def upload_files(reports_dir, files):
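
Because the added `if` sits in the class body, it runs once when the class is defined, so the cookie value is fixed at import time. If the cookie ever needs to change without a restart, one possible alternative (not what this commit does) is to build the headers per call. `build_default_headers` is a hypothetical helper shown only to illustrate the conditional:

```python
def build_default_headers(cookie: str) -> dict[str, str]:
    """Return request headers, adding Cookie only when one is configured."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
        "Accept": "*/*",
    }
    # An empty cookie string means "not configured": omit the header entirely.
    if cookie:
        headers["Cookie"] = cookie
    return headers


print(build_default_headers("sid=abc"))
```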