{
"cells": [
{
"cell_type": "markdown",
"id": "edc68c17",
"metadata": {},
"source": [
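"# ASSISTments 2009 Dataset Analysis\n",
"\n",
"# Dataset Overview\n",
"Skill builder data is also known as mastery learning data. The dataset comes from **skill builder** problem sets. A student is considered to have mastered a skill once they meet a set criterion (usually three consecutive correct answers), after which the system stops assigning problems for that skill.\n",
"\n",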
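"As a rough illustration of the mastery criterion described above, the sketch below flags the rows at which a student has answered at least `n` problems in a row correctly on a skill. It runs on a small hand-made DataFrame, not the real file; the helper name `mastery_reached` and the parameter `n` are assumptions made for this example and are not part of ASSISTments.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def mastery_reached(df, n=3):\n",
"    # Hypothetical helper: True on rows where the current correct streak is >= n.\n",
"    # Assumes skill_id is not missing and order_id is chronological.\n",
"    df = df.sort_values('order_id')\n",
"\n",
"    def streak(s):\n",
"        # Length of the current run of correct answers, reset to 0 by any other value.\n",
"        is_correct = s.eq(1).astype(int)\n",
"        run_id = s.ne(1).cumsum()\n",
"        return is_correct.groupby(run_id).cumsum()\n",
"\n",
"    streaks = df.groupby(['user_id', 'skill_id'])['correct'].transform(streak)\n",
"    return streaks >= n\n",
"\n",
"toy = pd.DataFrame({\n",
"    'order_id': [1, 2, 3, 4, 5],\n",
"    'user_id': [7, 7, 7, 7, 7],\n",
"    'skill_id': [1.0, 1.0, 1.0, 1.0, 1.0],\n",
"    'correct': [1, 0, 1, 1, 1],\n",
"})\n",
"toy['mastered'] = mastery_reached(toy)\n",
"```\n",
"\n",
"On the toy data only the last row is flagged, because the incorrect answer on the second row resets the streak.\n",
"\n",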
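"# Column Descriptions\n",
"- order_id: ID of the original problem log\n",
"- assignment_id: assignment ID\n",
"- user_id: student ID\n",
"- assistment_id: ASSISTment ID\n",
"  - Similar to problem_id. This is the ID of the problem that users see in the builder. If a problem contains multiple main problems and/or scaffolding problems, everything related to that one problem is called an ASSISTment and shares the same assistment_id. If the same assistment_id appears on several rows of the problem log, those rows are main problems (or scaffolding problems) under the same overall problem.\n",
"- problem_id: problem ID\n",
"  - If a problem has multiple main problems, each main problem has a different problem_id\n",
"- original (0/1): distinguishes main problems from scaffolding problems\n",
"  - 1 means a main problem, 0 means a scaffolding problem\n",
"  - If a main problem has scaffolding and the student answers incorrectly or asks to break the problem into steps, a new problem called a scaffolding problem is created. This produces a separate problem-log row in the file with original set to 0.\n",
"- correct (0/1): whether the response to the problem was correct\n",
"  - 1 means correct on the first attempt, 0 means incorrect on the first attempt or that the student asked for help\n",
"  - This column is usually the prediction target. (Note: Neil Heffernan points out that although this holds in most cases, there are also essay questions that teachers can grade. Neil thinks that a value of 0.25 means the teacher gave 1 point out of 4.)\n",
"- attempt_count: number of attempts (how many times the student entered an answer)\n",
"- ms_first_response: time between the start time and the student's first action (requesting a hint or entering an answer), in milliseconds\n",
"- tutor_mode: the mode in which the problem was delivered: tutor, test mode, pre-test, or post-test\n",
"  - The ASSISTments 2009 dataset contains only two values, tutor and test, and test-mode problems make up a very small share of the rows\n",
"- answer_type: type of the answer to the problem\n",
"- sequence_id: problem set ID\n",
"- student_class_id: ID of the student's class\n",
"- position: position of the problem on the assignment page\n",
"- type: name of the problem set type\n",
"  - The ASSISTments 2009 dataset contains only one problem set type, MasterySection\n",
"- base_sequence_id: marks problem sets that were copied\n",
"  - When a problem set is copied, this value is the ID of the problem set it was copied from\n",
"- skill_id: skill ID\n",
"  - In the 2009 dataset, each record is associated with only one skill\n",
"- skill_name: skill name\n",
"  - In the skill builder dataset, a response that maps to several skills is duplicated, once per skill, so that each row corresponds to exactly one skill\n",
"- teacher_id: teacher ID\n",
"- school_id: school ID\n",
"- hint_count: number of hints the student requested while working on the problem\n",
"- hint_total: total number of hints the system can offer for the problem\n",
"- overlap_time: time the student took to complete the problem, in milliseconds\n",
"  - Ideally this is the time the student spent on the problem\n",
"  - In the system this field is often miscalculated, so it is advisable to derive the time indirectly from other fields\n",
"- template_id: template ID of the ASSISTment\n",
"  - ASSISTments with the same template_id contain similar problems\n",
"- answer_id: ID of the answer for multiple-choice problems\n",
"- answer_text: answer text for fill-in problems\n",
"- first_action: type of the student's first action\n",
"- bottom_hint: whether the student requested all available hints\n",
"  - If this is blank, the student did not request a hint\n",
"  - Students cannot get hints on scaffolding problems\n",
"- opportunity: number of opportunities the student has had to practice the skill\n",
"  - In the skill builder dataset, the opportunities for different skills on the same record are spread over different rows. That is, if a student answers a multi-skill problem, the record is duplicated, each copy tagged with one of the skills and carrying that skill's opportunity count.\n",
"- opportunity_original: number of opportunities the student has had to practice the skill (counting only original problems)\n",
"\n",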
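"The column notes above suggest a few sanity checks before modelling. The sketch below applies them to a small hand-made DataFrame. The helper name `basic_checks` and the specific rules (keep only 0/1 values of correct, drop negative ms_first_response, merge duplicated multi-skill rows by order_id) are assumptions for illustration, not an official preprocessing recipe.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"def basic_checks(df):\n",
"    # Hypothetical cleaning helper, not an official recipe.\n",
"    # Keep only binary outcomes; teacher-graded rows may hold fractions such as 0.25.\n",
"    df = df[df['correct'].isin([0, 1])]\n",
"    # A response time cannot be negative, so treat such rows as logging errors.\n",
"    df = df[df['ms_first_response'] >= 0]\n",
"    # Multi-skill responses are repeated once per skill; collect the skills per response.\n",
"    skills = (\n",
"        df.groupby('order_id')['skill_id']\n",
"        .agg(lambda s: sorted(s.dropna().unique()))\n",
"        .rename('skill_ids')\n",
"    )\n",
"    merged = df.drop_duplicates('order_id').drop(columns='skill_id')\n",
"    return merged.merge(skills, left_on='order_id', right_index=True)\n",
"\n",
"toy = pd.DataFrame({\n",
"    'order_id': [10, 10, 11, 12],\n",
"    'skill_id': [1.0, 2.0, 1.0, 3.0],\n",
"    'correct': [1.0, 1.0, 0.25, 0.0],\n",
"    'ms_first_response': [5000, 5000, 7000, -20],\n",
"})\n",
"clean = basic_checks(toy)\n",
"```\n",
"\n",
"On the toy data this leaves a single row for order_id 10 whose skill_ids entry lists both skills; the fractional row and the negative-time row are dropped.\n",
"\n",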
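"# Additional Notes\n",
"\n",
"## Main and Scaffolding Problems\n",
"When a student answers a **main problem** incorrectly, or asks to have it broken into smaller steps, the ASSISTments system presents **one or more scaffolding problems**.\n",
"- Scaffolding problems are marked in the dataset by the *original* field (original = 0)\n",
"- Students usually cannot request hints while answering scaffolding problems\n",
"\n",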
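"Some analyses keep only main problems. The sketch below shows one way to separate the two kinds of rows on a small hand-made DataFrame; the variable names `main_only` and `scaffold_share` are made up for this example, and whether to drop scaffolding rows is a modelling choice rather than something the dataset prescribes.\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"toy = pd.DataFrame({\n",
"    'problem_id': [100, 101, 102, 103],\n",
"    'original': [1, 0, 0, 1],\n",
"    'correct': [0, 1, 1, 1],\n",
"})\n",
"# original == 1 marks main problems; original == 0 marks scaffolding problems.\n",
"main_only = toy[toy['original'] == 1]\n",
"# Fraction of rows that are scaffolding problems.\n",
"scaffold_share = (toy['original'] == 0).mean()\n",
"```\n",
"\n",
"Here `main_only` keeps problems 100 and 103, and `scaffold_share` is 0.5.\n",
"\n",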
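"## Tutor Mode\n",
"The *tutor_mode* column indicates whether the system was in tutor (tutor), test (test_mode), pre-test (pre_test), or post-test (post_test) mode when the student worked on the problem.\n",
"In tutor mode, students get immediate feedback, hints, or step-by-step tutoring while working on a problem; in test mode, the system gives no feedback or guidance."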
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "1ed269cb",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# Load the ASSISTments 2009 dataset\n",
"data = pd.read_csv(\n",
"    \"data/assistment09/skill_builder_data_corrected.csv\",\n",
"    low_memory=False,\n",
"    encoding=\"latin1\",\n",
")"
]
},
{
|
||
"cell_type": "code",
|
||
"execution_count": 44,
|
||
"id": "92fcf75b",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"application/vnd.microsoft.datawrangler.viewer.v0+json": {
|
||
"columns": [
|
||
{
|
||
"name": "index",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "order_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "assignment_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "user_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "assistment_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "problem_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "original",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "correct",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "attempt_count",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "ms_first_response",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "tutor_mode",
|
||
"rawType": "object",
|
||
"type": "string"
|
||
},
|
||
{
|
||
"name": "answer_type",
|
||
"rawType": "object",
|
||
"type": "string"
|
||
},
|
||
{
|
||
"name": "sequence_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "student_class_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "position",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "type",
|
||
"rawType": "object",
|
||
"type": "string"
|
||
},
|
||
{
|
||
"name": "base_sequence_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "skill_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "skill_name",
|
||
"rawType": "object",
|
||
"type": "string"
|
||
},
|
||
{
|
||
"name": "teacher_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "school_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "hint_count",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "hint_total",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "overlap_time",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "template_id",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "answer_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "answer_text",
|
||
"rawType": "object",
|
||
"type": "string"
|
||
},
|
||
{
|
||
"name": "first_action",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "bottom_hint",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "opportunity",
|
||
"rawType": "int64",
|
||
"type": "integer"
|
||
},
|
||
{
|
||
"name": "opportunity_original",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
}
|
||
],
|
||
"ref": "df411cc4-d6ea-4f22-b70a-9ab5d50356b4",
|
||
"rows": [
|
||
[
|
||
"0",
|
||
"33022537",
|
||
"277618",
|
||
"64525",
|
||
"33139",
|
||
"51424",
|
||
"1",
|
||
"1",
|
||
"1",
|
||
"32454",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"13241",
|
||
"126",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"0",
|
||
"3",
|
||
"32454",
|
||
"30799",
|
||
null,
|
||
"26",
|
||
"0",
|
||
null,
|
||
"1",
|
||
"1.0"
|
||
],
|
||
[
|
||
"1",
|
||
"33022709",
|
||
"277618",
|
||
"64525",
|
||
"33150",
|
||
"51435",
|
||
"1",
|
||
"1",
|
||
"1",
|
||
"4922",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"13241",
|
||
"126",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"0",
|
||
"3",
|
||
"4922",
|
||
"30799",
|
||
null,
|
||
"55",
|
||
"0",
|
||
null,
|
||
"2",
|
||
"2.0"
|
||
],
|
||
[
|
||
"2",
|
||
"35450204",
|
||
"220674",
|
||
"70363",
|
||
"33159",
|
||
"51444",
|
||
"1",
|
||
"0",
|
||
"2",
|
||
"25390",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"11816",
|
||
"22",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"0",
|
||
"3",
|
||
"42000",
|
||
"30799",
|
||
null,
|
||
"88",
|
||
"0",
|
||
null,
|
||
"1",
|
||
"1.0"
|
||
],
|
||
[
|
||
"3",
|
||
"35450295",
|
||
"220674",
|
||
"70363",
|
||
"33110",
|
||
"51395",
|
||
"1",
|
||
"1",
|
||
"1",
|
||
"4859",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"11816",
|
||
"22",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"0",
|
||
"3",
|
||
"4859",
|
||
"30059",
|
||
null,
|
||
"41",
|
||
"0",
|
||
null,
|
||
"2",
|
||
"2.0"
|
||
],
|
||
[
|
||
"4",
|
||
"35450311",
|
||
"220674",
|
||
"70363",
|
||
"33196",
|
||
"51481",
|
||
"1",
|
||
"0",
|
||
"14",
|
||
"19813",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"11816",
|
||
"22",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"3",
|
||
"4",
|
||
"124564",
|
||
"30060",
|
||
null,
|
||
"65",
|
||
"0",
|
||
"0.0",
|
||
"3",
|
||
"3.0"
|
||
],
|
||
[
|
||
"5",
|
||
"35450555",
|
||
"220674",
|
||
"70363",
|
||
"33172",
|
||
"51457",
|
||
"1",
|
||
"1",
|
||
"1",
|
||
"16031",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"11816",
|
||
"22",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"0",
|
||
"4",
|
||
"16031",
|
||
"30060",
|
||
null,
|
||
"12",
|
||
"0",
|
||
null,
|
||
"4",
|
||
"4.0"
|
||
],
|
||
[
|
||
"6",
|
||
"35450573",
|
||
"220674",
|
||
"70363",
|
||
"33174",
|
||
"51459",
|
||
"1",
|
||
"1",
|
||
"1",
|
||
"15047",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"11816",
|
||
"22",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"0",
|
||
"4",
|
||
"15047",
|
||
"30060",
|
||
null,
|
||
"6",
|
||
"0",
|
||
null,
|
||
"5",
|
||
"5.0"
|
||
],
|
||
[
|
||
"7",
|
||
"35480603",
|
||
"220674",
|
||
"70363",
|
||
"33123",
|
||
"51408",
|
||
"1",
|
||
"1",
|
||
"1",
|
||
"10732",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"11816",
|
||
"22",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"0",
|
||
"3",
|
||
"10732",
|
||
"30059",
|
||
null,
|
||
"55",
|
||
"0",
|
||
null,
|
||
"6",
|
||
"6.0"
|
||
],
|
||
[
|
||
"8",
|
||
"33140811",
|
||
"220674",
|
||
"70677",
|
||
"33168",
|
||
"51453",
|
||
"1",
|
||
"1",
|
||
"1",
|
||
"23241",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"11816",
|
||
"22",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"0",
|
||
"4",
|
||
"23241",
|
||
"30060",
|
||
null,
|
||
"12",
|
||
"0",
|
||
null,
|
||
"1",
|
||
"1.0"
|
||
],
|
||
[
|
||
"9",
|
||
"33140919",
|
||
"220674",
|
||
"70677",
|
||
"33112",
|
||
"51397",
|
||
"1",
|
||
"1",
|
||
"1",
|
||
"11512",
|
||
"tutor",
|
||
"algebra",
|
||
"5948",
|
||
"11816",
|
||
"22",
|
||
"MasterySection",
|
||
"5948",
|
||
"1.0",
|
||
"Box and Whisker",
|
||
"22763",
|
||
"73",
|
||
"0",
|
||
"2",
|
||
"11512",
|
||
"30059",
|
||
null,
|
||
"36",
|
||
"0",
|
||
null,
|
||
"2",
|
||
"2.0"
|
||
]
|
||
],
|
||
"shape": {
|
||
"columns": 30,
|
||
"rows": 10
|
||
}
|
||
},
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th>assignment_id</th>\n",
|
||
" <th>user_id</th>\n",
|
||
" <th>assistment_id</th>\n",
|
||
" <th>problem_id</th>\n",
|
||
" <th>original</th>\n",
|
||
" <th>correct</th>\n",
|
||
" <th>attempt_count</th>\n",
|
||
" <th>ms_first_response</th>\n",
|
||
" <th>tutor_mode</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>hint_count</th>\n",
|
||
" <th>hint_total</th>\n",
|
||
" <th>overlap_time</th>\n",
|
||
" <th>template_id</th>\n",
|
||
" <th>answer_id</th>\n",
|
||
" <th>answer_text</th>\n",
|
||
" <th>first_action</th>\n",
|
||
" <th>bottom_hint</th>\n",
|
||
" <th>opportunity</th>\n",
|
||
" <th>opportunity_original</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>0</th>\n",
|
||
" <td>33022537</td>\n",
|
||
" <td>277618</td>\n",
|
||
" <td>64525</td>\n",
|
||
" <td>33139</td>\n",
|
||
" <td>51424</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>32454</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>32454</td>\n",
|
||
" <td>30799</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>26</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>1</th>\n",
|
||
" <td>33022709</td>\n",
|
||
" <td>277618</td>\n",
|
||
" <td>64525</td>\n",
|
||
" <td>33150</td>\n",
|
||
" <td>51435</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>4922</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>4922</td>\n",
|
||
" <td>30799</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>55</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>2</th>\n",
|
||
" <td>35450204</td>\n",
|
||
" <td>220674</td>\n",
|
||
" <td>70363</td>\n",
|
||
" <td>33159</td>\n",
|
||
" <td>51444</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>25390</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>42000</td>\n",
|
||
" <td>30799</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>88</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>3</th>\n",
|
||
" <td>35450295</td>\n",
|
||
" <td>220674</td>\n",
|
||
" <td>70363</td>\n",
|
||
" <td>33110</td>\n",
|
||
" <td>51395</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>4859</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>4859</td>\n",
|
||
" <td>30059</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>41</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>4</th>\n",
|
||
" <td>35450311</td>\n",
|
||
" <td>220674</td>\n",
|
||
" <td>70363</td>\n",
|
||
" <td>33196</td>\n",
|
||
" <td>51481</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>14</td>\n",
|
||
" <td>19813</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>124564</td>\n",
|
||
" <td>30060</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>65</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>0.0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>3.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>5</th>\n",
|
||
" <td>35450555</td>\n",
|
||
" <td>220674</td>\n",
|
||
" <td>70363</td>\n",
|
||
" <td>33172</td>\n",
|
||
" <td>51457</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>16031</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>16031</td>\n",
|
||
" <td>30060</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>4.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>6</th>\n",
|
||
" <td>35450573</td>\n",
|
||
" <td>220674</td>\n",
|
||
" <td>70363</td>\n",
|
||
" <td>33174</td>\n",
|
||
" <td>51459</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>15047</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>15047</td>\n",
|
||
" <td>30060</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>5</td>\n",
|
||
" <td>5.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>7</th>\n",
|
||
" <td>35480603</td>\n",
|
||
" <td>220674</td>\n",
|
||
" <td>70363</td>\n",
|
||
" <td>33123</td>\n",
|
||
" <td>51408</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>10732</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>3</td>\n",
|
||
" <td>10732</td>\n",
|
||
" <td>30059</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>55</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>6</td>\n",
|
||
" <td>6.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>8</th>\n",
|
||
" <td>33140811</td>\n",
|
||
" <td>220674</td>\n",
|
||
" <td>70677</td>\n",
|
||
" <td>33168</td>\n",
|
||
" <td>51453</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>23241</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>4</td>\n",
|
||
" <td>23241</td>\n",
|
||
" <td>30060</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>12</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1.0</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>9</th>\n",
|
||
" <td>33140919</td>\n",
|
||
" <td>220674</td>\n",
|
||
" <td>70677</td>\n",
|
||
" <td>33112</td>\n",
|
||
" <td>51397</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>1</td>\n",
|
||
" <td>11512</td>\n",
|
||
" <td>tutor</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>11512</td>\n",
|
||
" <td>30059</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>36</td>\n",
|
||
" <td>0</td>\n",
|
||
" <td>NaN</td>\n",
|
||
" <td>2</td>\n",
|
||
" <td>2.0</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>10 rows × 30 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" order_id assignment_id user_id assistment_id problem_id original \\\n",
|
||
"0 33022537 277618 64525 33139 51424 1 \n",
|
||
"1 33022709 277618 64525 33150 51435 1 \n",
|
||
"2 35450204 220674 70363 33159 51444 1 \n",
|
||
"3 35450295 220674 70363 33110 51395 1 \n",
|
||
"4 35450311 220674 70363 33196 51481 1 \n",
|
||
"5 35450555 220674 70363 33172 51457 1 \n",
|
||
"6 35450573 220674 70363 33174 51459 1 \n",
|
||
"7 35480603 220674 70363 33123 51408 1 \n",
|
||
"8 33140811 220674 70677 33168 51453 1 \n",
|
||
"9 33140919 220674 70677 33112 51397 1 \n",
|
||
"\n",
|
||
" correct attempt_count ms_first_response tutor_mode ... hint_count \\\n",
|
||
"0 1 1 32454 tutor ... 0 \n",
|
||
"1 1 1 4922 tutor ... 0 \n",
|
||
"2 0 2 25390 tutor ... 0 \n",
|
||
"3 1 1 4859 tutor ... 0 \n",
|
||
"4 0 14 19813 tutor ... 3 \n",
|
||
"5 1 1 16031 tutor ... 0 \n",
|
||
"6 1 1 15047 tutor ... 0 \n",
|
||
"7 1 1 10732 tutor ... 0 \n",
|
||
"8 1 1 23241 tutor ... 0 \n",
|
||
"9 1 1 11512 tutor ... 0 \n",
|
||
"\n",
|
||
" hint_total overlap_time template_id answer_id answer_text first_action \\\n",
|
||
"0 3 32454 30799 NaN 26 0 \n",
|
||
"1 3 4922 30799 NaN 55 0 \n",
|
||
"2 3 42000 30799 NaN 88 0 \n",
|
||
"3 3 4859 30059 NaN 41 0 \n",
|
||
"4 4 124564 30060 NaN 65 0 \n",
|
||
"5 4 16031 30060 NaN 12 0 \n",
|
||
"6 4 15047 30060 NaN 6 0 \n",
|
||
"7 3 10732 30059 NaN 55 0 \n",
|
||
"8 4 23241 30060 NaN 12 0 \n",
|
||
"9 2 11512 30059 NaN 36 0 \n",
|
||
"\n",
|
||
" bottom_hint opportunity opportunity_original \n",
|
||
"0 NaN 1 1.0 \n",
|
||
"1 NaN 2 2.0 \n",
|
||
"2 NaN 1 1.0 \n",
|
||
"3 NaN 2 2.0 \n",
|
||
"4 0.0 3 3.0 \n",
|
||
"5 NaN 4 4.0 \n",
|
||
"6 NaN 5 5.0 \n",
|
||
"7 NaN 6 6.0 \n",
|
||
"8 NaN 1 1.0 \n",
|
||
"9 NaN 2 2.0 \n",
|
||
"\n",
|
||
"[10 rows x 30 columns]"
|
||
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Show the first ten rows of the dataset\n",
"data.head(10)"
]
},
{
|
||
"cell_type": "code",
|
||
"execution_count": 45,
|
||
"id": "75bca3a4",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"application/vnd.microsoft.datawrangler.viewer.v0+json": {
|
||
"columns": [
|
||
{
|
||
"name": "index",
|
||
"rawType": "object",
|
||
"type": "string"
|
||
},
|
||
{
|
||
"name": "order_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "assignment_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "user_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "assistment_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "problem_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "original",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "correct",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "attempt_count",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "ms_first_response",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "sequence_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "student_class_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "position",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "base_sequence_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "skill_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "teacher_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "school_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "hint_count",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "hint_total",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "overlap_time",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "template_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "answer_id",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "first_action",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "bottom_hint",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "opportunity",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
},
|
||
{
|
||
"name": "opportunity_original",
|
||
"rawType": "float64",
|
||
"type": "float"
|
||
}
|
||
],
|
||
"ref": "2ed645f1-02f9-4fc4-a0c7-1c7b54d08cad",
|
||
"rows": [
|
||
[
|
||
"count",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"338001.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"401756.0",
|
||
"45454.0",
|
||
"401756.0",
|
||
"67044.0",
|
||
"401756.0",
|
||
"328291.0"
|
||
],
|
||
[
|
||
"mean",
|
||
"30662559.65079053",
|
||
"273701.84588157985",
|
||
"83414.15454156254",
|
||
"46443.51752556278",
|
||
"81117.0300107528",
|
||
"0.8171402542836946",
|
||
"0.6429225699180597",
|
||
"1.596416730552873",
|
||
"47484.643271040135",
|
||
"7284.411087824451",
|
||
"12919.115221676839",
|
||
"57.163649080536445",
|
||
"6786.020985374207",
|
||
"127.16703205020103",
|
||
"46875.58732165792",
|
||
"3031.291024900686",
|
||
"0.48747000667071555",
|
||
"2.235817262218859",
|
||
"59648.48120501001",
|
||
"39571.335029221715",
|
||
"145094.43166718",
|
||
"0.13001174842441682",
|
||
"0.7240916413101843",
|
||
"20.553534981431515",
|
||
"14.403306822300946"
|
||
],
|
||
[
|
||
"std",
|
||
"5264886.089028761",
|
||
"11338.460016588557",
|
||
"7417.81402055726",
|
||
"11832.443427164199",
|
||
"25426.79966219532",
|
||
"0.38655197714693906",
|
||
"0.4791385086107508",
|
||
"12.050437265853866",
|
||
"361458.9611268155",
|
||
"1497.9410719196715",
|
||
"783.548290733902",
|
||
"65.21546405883011",
|
||
"1263.3597354711062",
|
||
"120.42751824440924",
|
||
"15892.975480841278",
|
||
"1830.4514863620323",
|
||
"1.187255363606401",
|
||
"1.804243880628286",
|
||
"382218.84936623357",
|
||
"12679.4399263291",
|
||
"47127.4782849689",
|
||
"0.3940987205726975",
|
||
"0.4469741784681641",
|
||
"62.52399351910894",
|
||
"62.39368356386473"
|
||
],
|
||
[
|
||
"min",
|
||
"20224085.0",
|
||
"217900.0",
|
||
"14.0",
|
||
"86.0",
|
||
"83.0",
|
||
"0.0",
|
||
"0.0",
|
||
"0.0",
|
||
"-7759575.0",
|
||
"5870.0",
|
||
"11644.0",
|
||
"1.0",
|
||
"5870.0",
|
||
"1.0",
|
||
"11158.0",
|
||
"1.0",
|
||
"0.0",
|
||
"0.0",
|
||
"-7759575.0",
|
||
"86.0",
|
||
"1.0",
|
||
"0.0",
|
||
"0.0",
|
||
"1.0",
|
||
"1.0"
|
||
],
|
||
[
|
||
"25%",
|
||
"26602182.25",
|
||
"266784.0",
|
||
"78970.0",
|
||
"37046.0",
|
||
"58467.0",
|
||
"1.0",
|
||
"0.0",
|
||
"1.0",
|
||
"8518.0",
|
||
"5979.0",
|
||
"12352.0",
|
||
"9.0",
|
||
"5968.0",
|
||
"39.0",
|
||
"42999.0",
|
||
"2770.0",
|
||
"0.0",
|
||
"0.0",
|
||
"10669.0",
|
||
"30244.0",
|
||
"104412.0",
|
||
"0.0",
|
||
"0.0",
|
||
"3.0",
|
||
"3.0"
|
||
],
|
||
[
|
||
"50%",
|
||
"31105126.0",
|
||
"271629.0",
|
||
"80111.0",
|
||
"44498.0",
|
||
"80734.0",
|
||
"1.0",
|
||
"1.0",
|
||
"1.0",
|
||
"19453.0",
|
||
"6910.0",
|
||
"12574.0",
|
||
"27.0",
|
||
"6094.0",
|
||
"74.0",
|
||
"45778.0",
|
||
"2770.0",
|
||
"0.0",
|
||
"3.0",
|
||
"24264.5",
|
||
"30987.0",
|
||
"136247.0",
|
||
"0.0",
|
||
"1.0",
|
||
"8.0",
|
||
"6.0"
|
||
],
|
||
[
|
||
"75%",
|
||
"34943640.75",
|
||
"279158.0",
|
||
"88142.0",
|
||
"53142.0",
|
||
"93102.0",
|
||
"1.0",
|
||
"1.0",
|
||
"1.0",
|
||
"44578.25",
|
||
"8032.0",
|
||
"13241.0",
|
||
"92.0",
|
||
"7014.0",
|
||
"279.0",
|
||
"59882.0",
|
||
"5056.0",
|
||
"0.0",
|
||
"4.0",
|
||
"56989.25",
|
||
"46399.0",
|
||
"184077.0",
|
||
"0.0",
|
||
"1.0",
|
||
"19.0",
|
||
"13.0"
|
||
],
|
||
[
|
||
"max",
|
||
"38310202.0",
|
||
"291503.0",
|
||
"96299.0",
|
||
"106210.0",
|
||
"207348.0",
|
||
"1.0",
|
||
"1.0",
|
||
"3824.0",
|
||
"84076920.0",
|
||
"13362.0",
|
||
"14415.0",
|
||
"295.0",
|
||
"13362.0",
|
||
"378.0",
|
||
"69274.0",
|
||
"9948.0",
|
||
"10.0",
|
||
"10.0",
|
||
"84076925.0",
|
||
"106180.0",
|
||
"323181.0",
|
||
"2.0",
|
||
"1.0",
|
||
"3371.0",
|
||
"3371.0"
|
||
]
|
||
],
|
||
"shape": {
|
||
"columns": 25,
|
||
"rows": 8
|
||
}
|
||
},
|
||
"text/html": [
|
||
"<div>\n",
|
||
"<style scoped>\n",
|
||
" .dataframe tbody tr th:only-of-type {\n",
|
||
" vertical-align: middle;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe tbody tr th {\n",
|
||
" vertical-align: top;\n",
|
||
" }\n",
|
||
"\n",
|
||
" .dataframe thead th {\n",
|
||
" text-align: right;\n",
|
||
" }\n",
|
||
"</style>\n",
|
||
"<table border=\"1\" class=\"dataframe\">\n",
|
||
" <thead>\n",
|
||
" <tr style=\"text-align: right;\">\n",
|
||
" <th></th>\n",
|
||
" <th>order_id</th>\n",
|
||
" <th>assignment_id</th>\n",
|
||
" <th>user_id</th>\n",
|
||
" <th>assistment_id</th>\n",
|
||
" <th>problem_id</th>\n",
|
||
" <th>original</th>\n",
|
||
" <th>correct</th>\n",
|
||
" <th>attempt_count</th>\n",
|
||
" <th>ms_first_response</th>\n",
|
||
" <th>sequence_id</th>\n",
|
||
" <th>...</th>\n",
|
||
" <th>school_id</th>\n",
|
||
" <th>hint_count</th>\n",
|
||
" <th>hint_total</th>\n",
|
||
" <th>overlap_time</th>\n",
|
||
" <th>template_id</th>\n",
|
||
" <th>answer_id</th>\n",
|
||
" <th>first_action</th>\n",
|
||
" <th>bottom_hint</th>\n",
|
||
" <th>opportunity</th>\n",
|
||
" <th>opportunity_original</th>\n",
|
||
" </tr>\n",
|
||
" </thead>\n",
|
||
" <tbody>\n",
|
||
" <tr>\n",
|
||
" <th>count</th>\n",
|
||
" <td>4.017560e+05</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>4.017560e+05</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>4.017560e+05</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>45454.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>67044.000000</td>\n",
|
||
" <td>401756.000000</td>\n",
|
||
" <td>328291.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>mean</th>\n",
|
||
" <td>3.066256e+07</td>\n",
|
||
" <td>273701.845882</td>\n",
|
||
" <td>83414.154542</td>\n",
|
||
" <td>46443.517526</td>\n",
|
||
" <td>81117.030011</td>\n",
|
||
" <td>0.817140</td>\n",
|
||
" <td>0.642923</td>\n",
|
||
" <td>1.596417</td>\n",
|
||
" <td>4.748464e+04</td>\n",
|
||
" <td>7284.411088</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>3031.291025</td>\n",
|
||
" <td>0.487470</td>\n",
|
||
" <td>2.235817</td>\n",
|
||
" <td>5.964848e+04</td>\n",
|
||
" <td>39571.335029</td>\n",
|
||
" <td>145094.431667</td>\n",
|
||
" <td>0.130012</td>\n",
|
||
" <td>0.724092</td>\n",
|
||
" <td>20.553535</td>\n",
|
||
" <td>14.403307</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>std</th>\n",
|
||
" <td>5.264886e+06</td>\n",
|
||
" <td>11338.460017</td>\n",
|
||
" <td>7417.814021</td>\n",
|
||
" <td>11832.443427</td>\n",
|
||
" <td>25426.799662</td>\n",
|
||
" <td>0.386552</td>\n",
|
||
" <td>0.479139</td>\n",
|
||
" <td>12.050437</td>\n",
|
||
" <td>3.614590e+05</td>\n",
|
||
" <td>1497.941072</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>1830.451486</td>\n",
|
||
" <td>1.187255</td>\n",
|
||
" <td>1.804244</td>\n",
|
||
" <td>3.822188e+05</td>\n",
|
||
" <td>12679.439926</td>\n",
|
||
" <td>47127.478285</td>\n",
|
||
" <td>0.394099</td>\n",
|
||
" <td>0.446974</td>\n",
|
||
" <td>62.523994</td>\n",
|
||
" <td>62.393684</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>min</th>\n",
|
||
" <td>2.022408e+07</td>\n",
|
||
" <td>217900.000000</td>\n",
|
||
" <td>14.000000</td>\n",
|
||
" <td>86.000000</td>\n",
|
||
" <td>83.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>-7.759575e+06</td>\n",
|
||
" <td>5870.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>-7.759575e+06</td>\n",
|
||
" <td>86.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>25%</th>\n",
|
||
" <td>2.660218e+07</td>\n",
|
||
" <td>266784.000000</td>\n",
|
||
" <td>78970.000000</td>\n",
|
||
" <td>37046.000000</td>\n",
|
||
" <td>58467.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>8.518000e+03</td>\n",
|
||
" <td>5979.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>2770.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>1.066900e+04</td>\n",
|
||
" <td>30244.000000</td>\n",
|
||
" <td>104412.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>3.000000</td>\n",
|
||
" <td>3.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>50%</th>\n",
|
||
" <td>3.110513e+07</td>\n",
|
||
" <td>271629.000000</td>\n",
|
||
" <td>80111.000000</td>\n",
|
||
" <td>44498.000000</td>\n",
|
||
" <td>80734.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>1.945300e+04</td>\n",
|
||
" <td>6910.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>2770.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>3.000000</td>\n",
|
||
" <td>2.426450e+04</td>\n",
|
||
" <td>30987.000000</td>\n",
|
||
" <td>136247.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>8.000000</td>\n",
|
||
" <td>6.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>75%</th>\n",
|
||
" <td>3.494364e+07</td>\n",
|
||
" <td>279158.000000</td>\n",
|
||
" <td>88142.000000</td>\n",
|
||
" <td>53142.000000</td>\n",
|
||
" <td>93102.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>4.457825e+04</td>\n",
|
||
" <td>8032.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>5056.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>4.000000</td>\n",
|
||
" <td>5.698925e+04</td>\n",
|
||
" <td>46399.000000</td>\n",
|
||
" <td>184077.000000</td>\n",
|
||
" <td>0.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>19.000000</td>\n",
|
||
" <td>13.000000</td>\n",
|
||
" </tr>\n",
|
||
" <tr>\n",
|
||
" <th>max</th>\n",
|
||
" <td>3.831020e+07</td>\n",
|
||
" <td>291503.000000</td>\n",
|
||
" <td>96299.000000</td>\n",
|
||
" <td>106210.000000</td>\n",
|
||
" <td>207348.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>3824.000000</td>\n",
|
||
" <td>8.407692e+07</td>\n",
|
||
" <td>13362.000000</td>\n",
|
||
" <td>...</td>\n",
|
||
" <td>9948.000000</td>\n",
|
||
" <td>10.000000</td>\n",
|
||
" <td>10.000000</td>\n",
|
||
" <td>8.407692e+07</td>\n",
|
||
" <td>106180.000000</td>\n",
|
||
" <td>323181.000000</td>\n",
|
||
" <td>2.000000</td>\n",
|
||
" <td>1.000000</td>\n",
|
||
" <td>3371.000000</td>\n",
|
||
" <td>3371.000000</td>\n",
|
||
" </tr>\n",
|
||
" </tbody>\n",
|
||
"</table>\n",
|
||
"<p>8 rows × 25 columns</p>\n",
|
||
"</div>"
|
||
],
|
||
"text/plain": [
|
||
" order_id assignment_id user_id assistment_id \\\n",
|
||
"count 4.017560e+05 401756.000000 401756.000000 401756.000000 \n",
|
||
"mean 3.066256e+07 273701.845882 83414.154542 46443.517526 \n",
|
||
"std 5.264886e+06 11338.460017 7417.814021 11832.443427 \n",
|
||
"min 2.022408e+07 217900.000000 14.000000 86.000000 \n",
|
||
"25% 2.660218e+07 266784.000000 78970.000000 37046.000000 \n",
|
||
"50% 3.110513e+07 271629.000000 80111.000000 44498.000000 \n",
|
||
"75% 3.494364e+07 279158.000000 88142.000000 53142.000000 \n",
|
||
"max 3.831020e+07 291503.000000 96299.000000 106210.000000 \n",
|
||
"\n",
|
||
" problem_id original correct attempt_count \\\n",
|
||
"count 401756.000000 401756.000000 401756.000000 401756.000000 \n",
|
||
"mean 81117.030011 0.817140 0.642923 1.596417 \n",
|
||
"std 25426.799662 0.386552 0.479139 12.050437 \n",
|
||
"min 83.000000 0.000000 0.000000 0.000000 \n",
|
||
"25% 58467.000000 1.000000 0.000000 1.000000 \n",
|
||
"50% 80734.000000 1.000000 1.000000 1.000000 \n",
|
||
"75% 93102.000000 1.000000 1.000000 1.000000 \n",
|
||
"max 207348.000000 1.000000 1.000000 3824.000000 \n",
|
||
"\n",
|
||
" ms_first_response sequence_id ... school_id hint_count \\\n",
|
||
"count 4.017560e+05 401756.000000 ... 401756.000000 401756.000000 \n",
|
||
"mean 4.748464e+04 7284.411088 ... 3031.291025 0.487470 \n",
|
||
"std 3.614590e+05 1497.941072 ... 1830.451486 1.187255 \n",
|
||
"min -7.759575e+06 5870.000000 ... 1.000000 0.000000 \n",
|
||
"25% 8.518000e+03 5979.000000 ... 2770.000000 0.000000 \n",
|
||
"50% 1.945300e+04 6910.000000 ... 2770.000000 0.000000 \n",
|
||
"75% 4.457825e+04 8032.000000 ... 5056.000000 0.000000 \n",
|
||
"max 8.407692e+07 13362.000000 ... 9948.000000 10.000000 \n",
|
||
"\n",
|
||
" hint_total overlap_time template_id answer_id \\\n",
|
||
"count 401756.000000 4.017560e+05 401756.000000 45454.000000 \n",
|
||
"mean 2.235817 5.964848e+04 39571.335029 145094.431667 \n",
|
||
"std 1.804244 3.822188e+05 12679.439926 47127.478285 \n",
|
||
"min 0.000000 -7.759575e+06 86.000000 1.000000 \n",
|
||
"25% 0.000000 1.066900e+04 30244.000000 104412.000000 \n",
|
||
"50% 3.000000 2.426450e+04 30987.000000 136247.000000 \n",
|
||
"75% 4.000000 5.698925e+04 46399.000000 184077.000000 \n",
|
||
"max 10.000000 8.407692e+07 106180.000000 323181.000000 \n",
|
||
"\n",
|
||
" first_action bottom_hint opportunity opportunity_original \n",
|
||
"count 401756.000000 67044.000000 401756.000000 328291.000000 \n",
|
||
"mean 0.130012 0.724092 20.553535 14.403307 \n",
|
||
"std 0.394099 0.446974 62.523994 62.393684 \n",
|
||
"min 0.000000 0.000000 1.000000 1.000000 \n",
|
||
"25% 0.000000 0.000000 3.000000 3.000000 \n",
|
||
"50% 0.000000 1.000000 8.000000 6.000000 \n",
|
||
"75% 0.000000 1.000000 19.000000 13.000000 \n",
|
||
"max 2.000000 1.000000 3371.000000 3371.000000 \n",
|
||
"\n",
|
||
"[8 rows x 25 columns]"
|
||
]
|
||
},
|
||
"execution_count": 45,
|
||
"metadata": {},
|
||
"output_type": "execute_result"
|
||
}
|
||
],
|
||
"source": [
"# Show summary statistics for the numeric columns\n",
|
||
"data.describe()"
|
||
]
|
||
},
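{
"cell_type": "markdown",
"id": "9b2d4f10",
"metadata": {},
"source": [
"The summary above reports negative minimums for `ms_first_response` and `overlap_time` (about -7.76e6 ms), which cannot be valid durations. A minimal sketch of flagging such rows before any time-based analysis; the toy frame is made up for illustration, and treating only negative values as invalid is an assumption, not a rule documented for this dataset:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy data (illustrative values only): the second row has impossible negative timings\n",
"toy = pd.DataFrame({\n",
"    'ms_first_response': [8518, -7759575, 19453],\n",
"    'overlap_time': [10669, -7759575, 24264],\n",
"})\n",
"\n",
"time_cols = ['ms_first_response', 'overlap_time']\n",
"\n",
"# A row is suspect if any timing column is negative\n",
"suspect = (toy[time_cols] < 0).any(axis=1)\n",
"print(f'Suspect rows: {suspect.sum()} of {len(toy)}')\n",
"\n",
"# Keep only rows whose timings are all non-negative\n",
"clean = toy[~suspect]\n",
"print(clean)\n",
"```\n",
"\n",
"On the real data the same mask could be built from `data[time_cols]`; whether to drop or merely mark the suspect rows depends on the downstream task."
]
},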
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "41e5d86e",
|
||
"metadata": {},
|
||
"source": [
"# Summary statistics for key columns"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "5185e94c",
|
||
"metadata": {},
|
||
"source": [
"# Size of the raw dataset\n",
"The counts below describe how much data the raw dataset contains.\n",
"\n",
"- Students: 4217\n",
"- Questions (distinct `problem_id` values): 26688\n",
"  - Main questions: 18209\n",
"  - Scaffolding questions: 8479\n",
"- Skills: 123"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 46,
|
||
"id": "640bb351",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Number of students: 4217\n",
|
||
"Number of questions: 26688\n",
|
||
"Number of skills: 123\n",
|
||
"Number of main questions: 18209\n",
|
||
"Number of scaffolding questions: 8479\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"# Number of distinct students\n",
"num_students = data[\"user_id\"].nunique()\n",
"print(f\"Number of students: {num_students}\")\n",
"\n",
"# Number of distinct questions\n",
"num_questions = data[\"problem_id\"].nunique()\n",
"print(f\"Number of questions: {num_questions}\")\n",
"\n",
"# Number of distinct skills\n",
"num_skills = data[\"skill_id\"].nunique()\n",
"print(f\"Number of skills: {num_skills}\")\n",
"\n",
"# Number of distinct main questions (original == 1)\n",
"num_main_questions = data[data[\"original\"] == 1][\"problem_id\"].nunique()\n",
"print(f\"Number of main questions: {num_main_questions}\")\n",
"\n",
"# Number of distinct scaffolding questions (original == 0)\n",
"num_scaffolding_questions = data[data[\"original\"] == 0][\"problem_id\"].nunique()\n",
"print(f\"Number of scaffolding questions: {num_scaffolding_questions}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0ed57684",
|
||
"metadata": {},
|
||
"source": [
"# Missing values\n",
"Number of missing values in each column that has any:\n",
"\n",
"- skill_id: 63755\n",
"- skill_name: 76119\n",
"- answer_id: 356302\n",
"- answer_text: 89208\n",
"- bottom_hint: 334712\n",
"- opportunity_original: 73465"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 47,
|
||
"id": "a1ccdaf2",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"skill_id 63755\n",
|
||
"skill_name 76119\n",
|
||
"answer_id 356302\n",
|
||
"answer_text 89208\n",
|
||
"bottom_hint 334712\n",
|
||
"opportunity_original 73465\n",
|
||
"dtype: int64\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"# Count missing values per column and show only the columns that have any\n",
|
||
"missing_values = data.isnull().sum()\n",
|
||
"print(missing_values[missing_values > 0])"
|
||
]
|
||
},
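{
"cell_type": "markdown",
"id": "3e7a61c8",
"metadata": {},
"source": [
"Raw missing-value counts are easier to interpret as a share of all rows. A minimal sketch of that conversion; the toy frame and its values are made up for illustration, and on the real data the same two lines would be applied to `data`:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy data (made-up values) with gaps in two of the three columns\n",
"toy = pd.DataFrame({\n",
"    'skill_id': [1.0, None, 3.0, None],\n",
"    'answer_id': [None, None, None, 7.0],\n",
"    'correct': [1, 0, 1, 1],\n",
"})\n",
"\n",
"# isnull().mean() gives the fraction of missing values per column\n",
"missing_share = toy.isnull().mean()\n",
"\n",
"# Keep only columns with any missing values, most affected first\n",
"missing_share = missing_share[missing_share > 0].sort_values(ascending=False)\n",
"print((missing_share * 100).round(1))\n",
"```\n",
"\n",
"Percentages make it easier to see which columns are mostly empty and which have only occasional gaps."
]
},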
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "ea3a2c1e",
|
||
"metadata": {},
|
||
"source": [
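"# Summary statistics\n",
"The statistics below describe the structure of the dataset.\n",
"\n",
"- Mean number of response rows per student: 95.27 (median: 26)\n",
"- Mean number of skills per question (excluding questions with no associated skill): 1.20"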
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 48,
|
||
"id": "df58c949",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Average attempts per student: 95.27\n",
|
||
"Median attempts per student: 26.00\n",
|
||
"Skills per question statistics:\n",
|
||
"count 17751.000000\n",
|
||
"mean 1.196890\n",
|
||
"std 0.470233\n",
|
||
"min 1.000000\n",
|
||
"25% 1.000000\n",
|
||
"50% 1.000000\n",
|
||
"75% 1.000000\n",
|
||
"max 4.000000\n",
|
||
"Name: skill_id, dtype: float64\n",
|
||
"Average skills per question: 1.20\n",
|
||
"Median skills per question: 1.00\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"# Number of response rows per student\n",
"attempts_per_student = data.groupby(\"user_id\")[\"problem_id\"].count()\n",
"avg_attempts_per_student = attempts_per_student.mean()\n",
"median_attempts_per_student = attempts_per_student.median()\n",
"print(f\"Average attempts per student: {avg_attempts_per_student:.2f}\")\n",
"print(f\"Median attempts per student: {median_attempts_per_student:.2f}\")\n",
"\n",
"# Number of distinct skills associated with each question\n",
"skills_per_question = data.groupby(\"problem_id\")[\"skill_id\"].nunique()\n",
"skills_per_question = skills_per_question[skills_per_question > 0]  # exclude questions with no associated skill\n",
"print(\"Skills per question statistics:\")\n",
"print(skills_per_question.describe())\n",
"# Mean and median number of skills per question\n",
"avg_skills_per_question = skills_per_question.mean()\n",
"median_skills_per_question = skills_per_question.median()\n",
"print(f\"Average skills per question: {avg_skills_per_question:.2f}\")\n",
"print(f\"Median skills per question: {median_skills_per_question:.2f}\")"
|
||
]
|
||
},
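{
"cell_type": "markdown",
"id": "d05c8e2f",
"metadata": {},
"source": [
"Some questions are associated with more than one skill (up to 4 in the statistics above), and the column description states that a response tagged with several skills is stored once per skill. A minimal sketch of collapsing such repeats into one row per response, assuming `order_id` identifies a single response; `collapse_skills` is a hypothetical helper and the toy frame is made up for illustration:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy data (made-up values): order_id 2 is stored twice, once per skill\n",
"toy = pd.DataFrame({\n",
"    'order_id': [1, 2, 2, 3],\n",
"    'user_id': [10, 10, 10, 11],\n",
"    'problem_id': [100, 101, 101, 100],\n",
"    'skill_id': [5.0, 5.0, 7.0, None],\n",
"    'correct': [1, 0, 0, 1],\n",
"})\n",
"\n",
"\n",
"def collapse_skills(df):\n",
"    # Hypothetical helper: one row per order_id, skill ids joined with '_'\n",
"    def join_skills(s):\n",
"        ids = sorted(s.dropna().astype(int).unique())\n",
"        return '_'.join(str(i) for i in ids)\n",
"\n",
"    skills = df.groupby('order_id')['skill_id'].agg(join_skills).rename('skill_ids')\n",
"    # Keep the first copy of every other column for each response\n",
"    first = df.drop(columns='skill_id').drop_duplicates('order_id').set_index('order_id')\n",
"    return first.join(skills).reset_index()\n",
"\n",
"\n",
"collapsed = collapse_skills(toy)\n",
"print(collapsed)\n",
"```\n",
"\n",
"Counting rows per student on the collapsed frame would count each response once, whereas the per-student counts above are taken on the raw rows. Responses with no skill end up with an empty `skill_ids` string."
]
},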
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "21c2c810",
|
||
"metadata": {},
|
||
"source": [
"# Analysis of other columns\n",
"\n",
"- Skills\n",
"- Main and scaffolding questions\n",
"- First action type\n",
"- Answer type"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "e35a25ae",
|
||
"metadata": {},
|
||
"source": [
"### Skills (skill_id, skill_name)\n",
"Not every question has an associated skill: 8937 questions (main and scaffolding questions combined) appear in rows with no skill ID."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 49,
|
||
"id": "8734257a",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Total unique skills: 123\n",
|
||
"Number of questions without associated skills: 8937\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"# Distinct skill IDs (missing values excluded)\n",
"skill_counts = data[\"skill_id\"].dropna().unique()\n",
"print(f\"Total unique skills: {len(skill_counts)}\")\n",
"\n",
"# Questions that appear in rows with no associated skill\n",
"questions_without_skills = data[data[\"skill_id\"].isnull()][\"problem_id\"].unique()\n",
"print(f\"Number of questions without associated skills: {len(questions_without_skills)}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "0c52ae66",
|
||
"metadata": {},
|
||
"source": [
"### Main and scaffolding questions (original)\n",
"- Main question: 1\n",
"- Scaffolding question: 0"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 50,
|
||
"id": "b7c4cb88",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Total main questions: 18209\n",
|
||
"Total scaffolding questions: 8479\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"# Select all main questions\n",
"main_questions = data[data[\"original\"] == 1][\"problem_id\"].unique()\n",
"print(f\"Total main questions: {len(main_questions)}\")\n",
"# Select all scaffolding questions\n",
"scaffolding_questions = data[data[\"original\"] == 0][\"problem_id\"].unique()\n",
"print(f\"Total scaffolding questions: {len(scaffolding_questions)}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "b8639e1f",
|
||
"metadata": {},
|
||
"source": [
"### First action type (first_action)\n",
"- 0: attempted an answer\n",
"- 1: requested a hint\n",
"- 2: scaffolding\n",
"\n",
"> In this dataset, every record has one of these three values as its first action."
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 51,
|
||
"id": "abc2aa1e",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"First action types in the dataset: [0 1 2]\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"# Distinct first-action types\n",
|
||
"first_action_types = data[\"first_action\"].dropna().unique()\n",
|
||
"print(\"First action types in the dataset:\", first_action_types)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "26996726",
|
||
"metadata": {},
|
||
"source": [
"### Answer type (answer_type)\n",
"Number of distinct questions per answer type:\n",
"\n",
"- algebra (numeric): 18660\n",
"- fill_in_1 (fill in the blank): 3048\n",
"- choose_1 (single choice): 4900\n",
"- open_response (open response): 5\n",
"- choose_n (multiple choice): 75"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 52,
|
||
"id": "d97ca7d1",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Answer types in the dataset: ['algebra' 'fill_in_1' 'choose_1' 'open_response' 'choose_n']\n",
|
||
"Algebra questions: 18660\n",
|
||
"Fill-in questions: 3048\n",
|
||
"Choose questions: 4900\n",
|
||
"Open response questions: 5\n",
|
||
"Choose-n questions: 75\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"# Distinct answer types\n",
"answer_type = data[\"answer_type\"].dropna().unique()\n",
"print(\"Answer types in the dataset:\", answer_type)\n",
"\n",
"# Number of distinct questions for each answer type\n",
"algebra_count = data[data[\"answer_type\"] == \"algebra\"][\"problem_id\"].nunique()\n",
"fill_in_1_count = data[data[\"answer_type\"] == \"fill_in_1\"][\"problem_id\"].nunique()\n",
"choose_count = data[data[\"answer_type\"] == \"choose_1\"][\"problem_id\"].nunique()\n",
"open_response_count = data[data[\"answer_type\"] == \"open_response\"][\"problem_id\"].nunique()\n",
"choose_n_count = data[data[\"answer_type\"] == \"choose_n\"][\"problem_id\"].nunique()\n",
"print(f\"Algebra questions: {algebra_count}\")\n",
"print(f\"Fill-in questions: {fill_in_1_count}\")\n",
"print(f\"Choose questions: {choose_count}\")\n",
"print(f\"Open response questions: {open_response_count}\")\n",
"print(f\"Choose-n questions: {choose_n_count}\")"
|
||
]
|
||
},
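{
"cell_type": "markdown",
"id": "c4e1a7b2",
"metadata": {},
"source": [
"The per-type counts above can also be computed in a single pass. A minimal sketch, assuming the same `answer_type` and `problem_id` columns; the toy frame and its values are made up for illustration and are not taken from the dataset:\n",
"\n",
"```python\n",
"import pandas as pd\n",
"\n",
"# Toy data (made-up values): three distinct problems, one answered twice\n",
"toy = pd.DataFrame({\n",
"    'problem_id': [1, 1, 2, 3],\n",
"    'answer_type': ['algebra', 'algebra', 'choose_1', 'algebra'],\n",
"})\n",
"\n",
"# Distinct problems per answer type, largest group first\n",
"per_type = (\n",
"    toy.groupby('answer_type')['problem_id']\n",
"    .nunique()\n",
"    .sort_values(ascending=False)\n",
")\n",
"print(per_type)\n",
"```\n",
"\n",
"Counting distinct `problem_id` values rather than rows keeps repeated responses to the same question from inflating the totals, and the grouped form picks up any answer type present in the data without listing them by hand."
]
},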
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "438d0e89",
|
||
"metadata": {},
|
||
"source": [
"### Number of copied problem sets (sequence_id differs from base_sequence_id)"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 53,
|
||
"id": "0cf952bb",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"name": "stdout",
|
||
"output_type": "stream",
|
||
"text": [
|
||
"Number of copy sequences: 320\n"
|
||
]
|
||
}
|
||
],
|
||
"source": [
"# Rows whose problem set is a copy: sequence_id differs from base_sequence_id\n",
"different_sequence = data[data[\"sequence_id\"] != data[\"base_sequence_id\"]]\n",
"print(f\"Number of copy sequences: {different_sequence['sequence_id'].nunique()}\")"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "markdown",
|
||
"id": "84635008",
|
||
"metadata": {},
|
||
"source": [
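"# Visualizing the data\n",
"This section contains the plotting code and resulting figures for the most important aspects of the dataset.\n",
"\n",
"- Distribution of the number of responses per student\n",
"- Distribution of answer types\n",
"- Overall correctness"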
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 54,
|
||
"id": "c289ef13",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 1000x600 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
"# Number of response rows per student\n",
"student_attempts = data.groupby(\"user_id\")[\"problem_id\"].count()\n",
"\n",
"# Histogram of responses per student (log-scaled y-axis)\n",
"plt.figure(figsize=(10, 6))\n",
"plt.hist(student_attempts, bins=50, color='skyblue', edgecolor='black')\n",
"plt.title('Distribution of Student Attempts')\n",
"plt.xlabel('Number of Attempts')\n",
"plt.ylabel('Number of Students')\n",
"plt.yscale('log')\n",
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 55,
|
||
"id": "250de1a1",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 1000x600 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
"# Number of response rows per answer type (value_counts counts rows, not distinct questions)\n",
"answer_type_counts = data[\"answer_type\"].value_counts()\n",
"\n",
"# Bar chart of response rows per answer type\n",
"plt.figure(figsize=(10, 6))\n",
"answer_type_counts.plot(kind='bar', color='skyblue', edgecolor='black')\n",
"plt.title('Distribution of Answer Types')\n",
"plt.xlabel('Answer Type')\n",
"plt.ylabel('Number of Responses')\n",
"plt.xticks(rotation=0)\n",
"plt.show()"
|
||
]
|
||
},
|
||
{
|
||
"cell_type": "code",
|
||
"execution_count": 56,
|
||
"id": "6a1b1ce4",
|
||
"metadata": {},
|
||
"outputs": [
|
||
{
|
||
"data": {
|
||
"image/png": "",
|
||
"text/plain": [
|
||
"<Figure size 800x800 with 1 Axes>"
|
||
]
|
||
},
|
||
"metadata": {},
|
||
"output_type": "display_data"
|
||
}
|
||
],
|
||
"source": [
"# Number of response rows for each value of correct\n",
"correct_data = data.groupby(\"correct\")[\"problem_id\"].count()\n",
"# Label by value rather than by position so the names cannot be swapped\n",
"correct_data = correct_data.rename(index={0: 'Incorrect', 1: 'Correct'})\n",
"\n",
"# Pie chart of overall correctness\n",
"plt.figure(figsize=(8, 8))\n",
"correct_data.plot(kind='pie', autopct='%1.1f%%', startangle=90, colors=['lightcoral', 'lightskyblue'])\n",
"plt.title('Overall Correctness Distribution')\n",
"plt.show()"
|
||
]
|
||
}
|
||
],
|
||
"metadata": {
|
||
"kernelspec": {
|
||
"display_name": "data-analysis",
|
||
"language": "python",
|
||
"name": "python3"
|
||
},
|
||
"language_info": {
|
||
"codemirror_mode": {
|
||
"name": "ipython",
|
||
"version": 3
|
||
},
|
||
"file_extension": ".py",
|
||
"mimetype": "text/x-python",
|
||
"name": "python",
|
||
"nbconvert_exporter": "python",
|
||
"pygments_lexer": "ipython3",
|
||
"version": "3.13.9"
|
||
}
|
||
},
|
||
"nbformat": 4,
|
||
"nbformat_minor": 5
|
||
}
|