Compare commits
5 Commits
5e99239692...05ec452989
| Author | SHA1 | Date |
|---|---|---|
| | 05ec452989 | |
| | 44adea7360 | |
| | 60f3ce6e68 | |
| | 65fb9327f6 | |
| | a9d582b3a2 | |
164
README.md
Normal file
@@ -0,0 +1,164 @@
# Analysis of Common KT Datasets

A project for analyzing datasets used in Knowledge Tracing (KT). It contains analysis notebooks for several widely used educational datasets, covering students' online learning data from the ASSISTment and EdNet platforms.

## Project structure

```
KTData/
├── README.md                  # Project documentation (this file)
├── pyproject.toml             # Project configuration and dependency management
├── uv.lock                    # Dependency lock file
├── assist09_analysis.ipynb    # ASSISTment2009 dataset analysis
├── assist12_analysis.ipynb    # ASSISTment2012 dataset analysis
├── assist15_analysis.ipynb    # ASSISTment2015 dataset analysis
├── assist17_analysis.ipynb    # ASSISTment2017 dataset analysis
├── ednet_kt1.ipynb            # EdNet KT1 dataset analysis
└── data/                      # Dataset directory
    ├── assistment09/          # ASSISTment 2009 data
    ├── assistment12/          # ASSISTment 2012 data
    ├── assistment15/          # ASSISTment 2015 data
    ├── assistment17/          # ASSISTment 2017 data
    └── EdNet/                 # EdNet dataset
        ├── EdNet-Contents/    # Question metadata, etc.
        └── EdNet-KT1/         # Student response records
```

## Included datasets

### 1. ASSISTment 2009 (Skill Builder)

- **File size**: 61 MB
- **Data volume**: ~26,688 problem records
- **Number of students**: 4,217
- **Number of skills**: 123
- **Characteristics**: Includes both main problems and scaffolding problems, and records student interactions in detail
- **Key fields**:
  - `order_id`: Problem log ID
  - `correct`: Whether the answer was correct (0/1)
  - `original`: Problem type (1 = main problem, 0 = scaffolding problem)
  - `skill_id/skill_name`: Skill identifier
  - `hint_count`: Number of hints the student requested
  - `ms_first_response`: Time to first response (milliseconds)
  - `tutor_mode`: Tutor mode or test mode

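As a rough illustration of how the ASSISTment 2009 fields can be turned into KT input, the sketch below keeps main problems only (`original == 1`), orders them by `order_id`, and groups `(skill_id, correct)` pairs per student. The `user_id` column name and the sample rows are assumptions made for this example, and the standard library is used only so the snippet runs without the data files; the notebooks may do this differently.

```python
import csv
import io
from collections import defaultdict

# Toy rows standing in for the real CSV; the values are made up for illustration.
# `user_id` is an assumed column name (it is not in the field list above).
SAMPLE_CSV = """order_id,user_id,skill_id,original,correct
3,10,7,1,1
1,10,7,1,0
2,10,8,0,1
5,20,7,1,1
4,20,7,1,1
"""


def build_sequences(csv_text):
    """Return {user_id: [(skill_id, correct), ...]} for main problems, ordered by order_id."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Keep main problems only (original == 1) and order them by order_id.
    main = sorted(
        (r for r in rows if r["original"] == "1"),
        key=lambda r: int(r["order_id"]),
    )
    sequences = defaultdict(list)
    for r in main:
        sequences[int(r["user_id"])].append((int(r["skill_id"]), int(r["correct"])))
    return dict(sequences)


sequences = build_sequences(SAMPLE_CSV)
print(sequences)
```

Dropping scaffolding rows before building sequences is one common choice, not the only one; keeping them would simply mean removing the `original` filter.
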
### 2. ASSISTment 2012-2013

- **File size**: 2.9 GB
- **School year**: 2012-2013
- **Characteristics**: Extends the 2009 data with confidence predictions of students' affective states
- **Key fields**:
  - `problem_log_id`: Problem log ID
  - `skill`: Skill name
  - `problem_id`: Problem ID
  - `correct`: Whether the answer was correct (0/1)
  - `tutor_mode`: Tutor or test mode
  - `type`: Problem set type (LinearSection, MasterySection, etc.)
  - `Average_confidence(*)`: Affective-state confidence (FRUSTRATED, CONFUSED, CONCENTRATING, BORED)

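The `Average_confidence(*)` pattern above can be handled programmatically. The following is a minimal sketch, assuming the four columns are literally named `Average_confidence(FRUSTRATED)`, `Average_confidence(CONFUSED)`, and so on; the helper names and the sample row are hypothetical.

```python
import re

# Matches column names shaped like Average_confidence(STATE).
AFFECT_COLUMN = re.compile(r"^Average_confidence\(([A-Z]+)\)$")


def affect_columns(header):
    """Map affect state name -> column name for every matching column in the header."""
    found = {}
    for name in header:
        match = AFFECT_COLUMN.match(name)
        if match:
            found[match.group(1)] = name
    return found


def strongest_affect(row, columns):
    """Return the affect state with the highest confidence value in one row."""
    return max(columns, key=lambda state: float(row[columns[state]]))


# Made-up header and row, used only to exercise the helpers.
header = [
    "problem_log_id", "skill", "correct",
    "Average_confidence(FRUSTRATED)", "Average_confidence(CONFUSED)",
    "Average_confidence(CONCENTRATING)", "Average_confidence(BORED)",
]
row = {
    "problem_log_id": "1", "skill": "Addition", "correct": "1",
    "Average_confidence(FRUSTRATED)": "0.10",
    "Average_confidence(CONFUSED)": "0.20",
    "Average_confidence(CONCENTRATING)": "0.60",
    "Average_confidence(BORED)": "0.10",
}

columns = affect_columns(header)
print(sorted(columns))
print(strongest_affect(row, columns))
```
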
### 3. ASSISTment 2015

- **File size**: 18 MB
- **School year**: 2015-2016
- **Data volume**: The 100 most frequently used skill builders
- **Characteristics**: Contains main-problem data only, with a comparatively simple set of fields
- **Key fields**:
  - `user_id`: Student ID
  - `log_id`: Response record ID
  - `sequence_id`: Problem set ID (stands for the skill)
  - `correct`: Whether the answer was correct (0/1)

### 4. ASSISTment 2017

- **File size**: 524 MB
- **Year**: 2017
- **Characteristics**: Contains 82 feature columns, the most detailed feature-engineered data of the datasets covered here
- **Key fields**:
  - `studentId`: Student ID
  - `problemId`: Problem ID
  - `skill`: Skill name
  - `correct`: Whether the answer was correct
  - `InferredGender`: Inferred student gender
  - `AveCorrect`: Average correctness
  - `AveKnow`: Average knowledge level
  - Several affect prediction values (Bored, Concentrating, Confused, Frustrated, Off Task, Gaming)

### 5. EdNet KT1

- **Source**: EdNet online education platform
- **Characteristics**: Currently the largest-scale student interaction dataset
- **Student data**:
  - Student responses are stored as CSV files, one file per student (u1.csv, u2.csv, etc.)
  - Each file contains the student's complete response sequence
- **Key fields**:
  - `timestamp`: Time at which the question was presented (Unix timestamp, milliseconds)
  - `question_id`: Question ID (format: q{integer})
  - `solve_id`: ID of the question bundle
  - `user_answer`: Answer submitted by the student (a/b/c/d)
  - `elapsed_time`: Time spent answering (milliseconds)
- **Question metadata**:
  - `correct_answer`: Correct answer
  - `part`: Part number of the question (1-7)
  - `tags`: Expert-annotated skill tags
  - `deployed_at`: Time the question went live

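The response fields listed above contain no correctness column, so a correctness label has to be derived by comparing `user_answer` with `correct_answer` from the question metadata. The sketch below shows one way to do that for a single student file, and converts the millisecond `timestamp` to a UTC datetime. The sample rows are made up, the metadata sample contains only the columns used here, and the function names are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

# Toy stand-ins for one per-student file (such as u1.csv) and the question metadata.
# All values are made up for illustration.
USER_CSV = """timestamp,solve_id,question_id,user_answer,elapsed_time
1500000060000,2,q2,c,24000
1500000000000,1,q1,b,38000
"""
QUESTIONS_CSV = """question_id,correct_answer,part
q1,b,5
q2,a,5
"""


def load_answer_key(csv_text):
    """Map question_id -> correct_answer from the question metadata."""
    return {
        r["question_id"]: r["correct_answer"]
        for r in csv.DictReader(io.StringIO(csv_text))
    }


def label_responses(user_csv_text, answer_key):
    """Return one student's responses in timestamp order with a derived `correct` flag."""
    rows = sorted(
        csv.DictReader(io.StringIO(user_csv_text)),
        key=lambda r: int(r["timestamp"]),
    )
    labeled = []
    for r in rows:
        labeled.append(
            {
                "question_id": r["question_id"],
                # timestamp is in milliseconds, so divide by 1000 before converting.
                "time": datetime.fromtimestamp(int(r["timestamp"]) / 1000, tz=timezone.utc),
                "correct": int(r["user_answer"] == answer_key[r["question_id"]]),
            }
        )
    return labeled


labeled = label_responses(USER_CSV, load_answer_key(QUESTIONS_CSV))
for item in labeled:
    print(item["question_id"], item["time"].isoformat(), item["correct"])
```
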
## Dataset feature comparison

| Feature | 2009 | 2012 | 2015 | 2017 | EdNet |
|-----|------|------|------|------|-------|
| File size | 61 MB | 2.9 GB | 18 MB | 524 MB | Varies |
| Includes scaffolding problems | ✓ | ✓ | ✗ | ✓ | ✗ |
| Affect predictions | ✗ | ✓ | ✗ | ✓ | ✗ |
| Number of students | 4,217 | Large | Not counted | Large | Varies |
| Number of fields | ~30 | ~40 | 4 | 82 | 5 |
| Skill annotation | Detailed | Detailed | Simplified | Detailed | Hierarchical tags |

## Environment setup

### Prerequisites

- Python 3.12 or later
- The uv package manager ([installation guide](https://docs.astral.sh/uv/getting-started/installation/))

### Quick start

```bash
# Sync local dependencies
uv sync

# Start Jupyter Lab
uv run jupyter lab

# Or start Jupyter Notebook
uv run jupyter notebook
```

### Dependencies

The project uses the following main dependencies:

- **jupyterlab** (≥4.4.10): Interactive data analysis environment
- **pandas** (≥2.3.3): Data processing and analysis
- **matplotlib** (≥3.9.4): Data visualization
- **tqdm** (≥4.67.1): Progress bars

## Usage

### Running the notebooks

All analysis code lives in the Jupyter Notebook files:

1. **assist09_analysis.ipynb**: Comprehensive analysis of the ASSISTment 2009 dataset
2. **assist12_analysis.ipynb**: Analysis of the ASSISTment 2012 dataset, including affective-state analysis
3. **assist15_analysis.ipynb**: Analysis of the ASSISTment 2015 dataset
4. **assist17_analysis.ipynb**: Analysis of the ASSISTment 2017 dataset, with extensive feature engineering
5. **ednet_kt1.ipynb**: Analysis of the EdNet KT1 dataset

### Main analyses

- Dataset structure and field descriptions
- Summary statistics (number of students, problems, skills, etc.)
- Answer accuracy analysis
- Student learning behavior analysis
- Skill correlation analysis
- Time series analysis

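To make the summary statistics and accuracy items above concrete, here is a small sketch that counts distinct students, problems, and skills and computes overall and per-skill accuracy. The record layout (`user_id`, `problem_id`, `skill_id`, `correct`) and the sample values are assumptions for illustration; the actual column names differ between datasets, as the field lists show.

```python
from collections import defaultdict

# Toy interaction records; the field names are assumptions chosen for this sketch.
records = [
    {"user_id": 1, "problem_id": 101, "skill_id": "A", "correct": 1},
    {"user_id": 1, "problem_id": 102, "skill_id": "A", "correct": 0},
    {"user_id": 2, "problem_id": 101, "skill_id": "A", "correct": 1},
    {"user_id": 2, "problem_id": 201, "skill_id": "B", "correct": 1},
]


def summarize(rows):
    """Return basic dataset statistics for a non-empty list of interaction records."""
    attempts = defaultdict(int)  # number of responses per skill
    hits = defaultdict(int)      # number of correct responses per skill
    for r in rows:
        attempts[r["skill_id"]] += 1
        hits[r["skill_id"]] += r["correct"]
    return {
        "n_students": len({r["user_id"] for r in rows}),
        "n_problems": len({r["problem_id"] for r in rows}),
        "n_skills": len(attempts),
        "overall_accuracy": sum(hits.values()) / len(rows),
        "accuracy_by_skill": {s: hits[s] / attempts[s] for s in attempts},
    }


stats = summarize(records)
print(stats)
```
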
## Getting the data

- **ASSISTment datasets**: [https://sites.google.com/site/assistmentsdata/datasets](https://sites.google.com/site/assistmentsdata/datasets)
- **EdNet dataset**: [https://github.com/riiid/ednet](https://github.com/riiid/ednet)
@@ -11,7 +11,8 @@
 "Skill builder data is also known as mastery learning data. This dataset comes from **skill-training** problem sets. A student is considered to have mastered a skill once they meet a set criterion (typically three consecutive correct answers), after which the system no longer assigns problems for that skill.\n",
 "\n",
 "# Dataset column meanings\n",
-"- order_id: ID of the original problem log\n",
+"- order_id: ID of the problem log\n",
+" - Sorted in chronological order\n",
 "- assignmet: course ID\n",
 "- user_id: student ID\n",
 "- assistment_id: auxiliary problem ID\n",
2508
assist12_analysis.ipynb
Normal file
File diff suppressed because one or more lines are too long
3955
assist17_analysis.ipynb
Normal file
File diff suppressed because one or more lines are too long
@@ -8,4 +8,5 @@ dependencies = [
 "jupyterlab>=4.4.10",
 "matplotlib>=3.9.4",
 "pandas>=2.3.3",
+"tqdm>=4.67.1",
 ]
14
uv.lock
generated
@@ -904,6 +904,7 @@ dependencies = [
 { name = "jupyterlab" },
 { name = "matplotlib" },
 { name = "pandas" },
+{ name = "tqdm" },
 ]
 
 [package.metadata]
@@ -911,6 +912,7 @@ requires-dist = [
 { name = "jupyterlab", specifier = ">=4.4.10" },
 { name = "matplotlib", specifier = ">=3.9.4" },
 { name = "pandas", specifier = ">=2.3.3" },
+{ name = "tqdm", specifier = ">=4.67.1" },
 ]
 
 [[package]]
@@ -1824,6 +1826,18 @@ wheels = [
 { url = "https://files.pythonhosted.org/packages/5e/4f/e1f65e8f8c76d73658b33d33b81eed4322fb5085350e4328d5c956f0c8f9/tornado-6.5.2-cp39-abi3-win_arm64.whl", hash = "sha256:d6c33dc3672e3a1f3618eb63b7ef4683a7688e7b9e6e8f0d9aa5726360a004af", size = 444456, upload-time = "2025-08-08T18:26:59.207Z" },
 ]
 
+[[package]]
+name = "tqdm"
+version = "4.67.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+{ name = "colorama", marker = "sys_platform == 'win32'" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a8/4b/29b4ef32e036bb34e4ab51796dd745cdba7ed47ad142a9f4a1eb8e0c744d/tqdm-4.67.1.tar.gz", hash = "sha256:f8aef9c52c08c13a65f30ea34f4e5aac3fd1a34959879d7e59e63027286627f2", size = 169737, upload-time = "2024-11-24T20:12:22.481Z" }
+wheels = [
+{ url = "https://files.pythonhosted.org/packages/d0/30/dc54f88dd4a2b5dc8a0279bdd7270e735851848b762aeb1c1184ed1f6b14/tqdm-4.67.1-py3-none-any.whl", hash = "sha256:26445eca388f82e72884e0d580d5464cd801a3ea01e63e5601bdff9ba6a48de2", size = 78540, upload-time = "2024-11-24T20:12:19.698Z" },
+]
+
 [[package]]
 name = "traitlets"
 version = "5.14.3"