diff --git a/Notebooks/Lecture1.ipynb b/Notebooks/Lecture1.ipynb
new file mode 100644
index 0000000000000000000000000000000000000000..cdff2cf722793dcc733cc63f615f4b656780884f
--- /dev/null
+++ b/Notebooks/Lecture1.ipynb
@@ -0,0 +1,1887 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\t\n",
+ "\t\n",
+ "# part 1\n",
+ "\t\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3c152b43",
+ "metadata": {
+ "collapsed": false,
+ "id": "44427C9739024026A4C897D8E6AD79B8",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "\t\n",
+ "\t\n",
+ "# Advanced Statistics in Psychological Science\n",
+ "## Bayesian Inference in Python\n",
+ "## Instructor: Dr. Hu Chuan-Peng (胡传鹏)\n",
+ "### School of Psychology, Nanjing Normal University (南京师范大学心理学院)\n",
+ "\t\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d2b325df",
+ "metadata": {
+ "collapsed": false,
+ "id": "3838D48714B2463C83EF78BDC23758C7",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": true,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "\n",
+ "\t\n",
+ "Is it easy to study the laws that govern the human mind and behavior?\n",
+ "\t\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ef4b6d7c",
+ "metadata": {
+ "collapsed": false,
+ "id": "722B560AB14C42FEAFA907AC6DC6D292",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": true,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "## Outline\n",
+ "* 1. Why take this course (why Bayesian inference)?\n",
+ "* 2. What is the syllabus?\n",
+ "* 3. How can I learn this course well?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "51c6ff3f-2c30-4f57-b1e4-7f8c30caca4d",
+ "metadata": {
+ "id": "9CD453483FDF4EAC8BD652AF2DE6812D",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "## 1. Why take this course (why Bayesian inference)?\n",
+ "\n",
+ "### 1.1 Why does psychological science need better methods?\n",
+ "\n",
+ "#### Reason 1: Complex research questions"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "4fa75bb6-a4cf-49c8-a932-87abca98af37",
+ "metadata": {
+ "collapsed": true,
+ "id": "0D3A4878173746D7ADD25C7A3A15B2A9",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "![Image Name](https://cdn.kesci.com/upload/image/rhdcyu860w.gif?imageView2/0/w/960/h/960)\n",
+ "\n",
+ "\n",
+ "Source: https://www.science.org/toc/science/309/5731"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5e145de",
+ "metadata": {
+ "collapsed": false,
+ "id": "675C0668AAD748039C048C541A141B6E",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": true,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "Q1: What is the Universe made of? [physics]\n",
+ "\n",
+ "Q2: What is the biological basis of consciousness? [psychological science]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1c9f54a6-6401-4a02-8195-c715fd3d44db",
+ "metadata": {
+ "collapsed": false,
+ "id": "38E2EDC2C3A54F15BBBF28A25F9CDD68",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": true,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "##### Question\n",
+ "Do equally important and complex questions call for similarly sophisticated and advanced methods?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ee3d9074-24f5-45b2-90b4-68f5cf5031b9",
+ "metadata": {
+ "id": "7D43885A2D07479C96006E7E12D9EEF7",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "##### Methods in physics:\n",
+ "\n",
+ "Example 1: Webb telescope [**equipment**]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "286968e3",
+ "metadata": {
+ "collapsed": true,
+ "id": "CC7CD6845ED3486EBD8D258473C50753",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "\n",
+ "![Image Name](https://cdn.kesci.com/upload/image/rhdd0r46k3.png?imageView2/0/w/720/h/640)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "aa0b7267",
+ "metadata": {
+ "collapsed": false,
+ "id": "4291074DF34847E29D0CE63EF1680A94",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": true,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "Example 2: Big-team science (CERN, the European Organization for Nuclear Research) [**equipment & practices**]\n",
+ "\n",
+ "Example 3: **Mathematics**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5aaf896-93ba-4eec-aa33-8f20d01c7501",
+ "metadata": {
+ "id": "58FA36357F4343D38B6AF90BDD4AE2F5",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "##### Methods in other fields that also study \"intelligence\"\n",
+ "\n",
+ "**AI**\n",
+ "\n",
+ "\n",
+ "![Image Name](https://cdn.kesci.com/upload/image/rhdd1sr5y2.png?imageView2/0/w/640/h/640)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "25c0a697-6d3c-4ba1-87a1-4e751ac78a49",
+ "metadata": {
+ "collapsed": false,
+ "id": "8CC769416F0048BCA9800D597430F7A1",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": true,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "##### Research methods in psychological science [What do psychological scientists have?]\n",
+ "Which research methods can you think of?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "22c0a59f-75cd-438d-b6d6-e57387dc272b",
+ "metadata": {
+ "collapsed": true,
+ "id": "8ABBBA8C18E64A60BBD994839DBBB954",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "\n",
+ "![Image Name](https://cdn.kesci.com/upload/image/rhdd2dgwc8.png?imageView2/0/w/640/h/640)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "c705f696",
+ "metadata": {
+ "id": "3EB4A250D8EA460C91E69DDE129F456C",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
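+ "**Empirical research:**\n",
+ "* Qualitative research\n",
+ "* Observation\n",
+ "* Questionnaires\n",
+ "* Behavioral experiments\n",
+ "* Eye tracking and physiological recordings\n",
+ "* EEG/ERP/MEG\n",
+ "* fMRI/PET/fNIRS\n",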
+ "* TMS/tDCS\n",
+ "* ..."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ef08f1c3",
+ "metadata": {
+ "id": "1099CD2609B14FC98B9199570CB1EC90",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "**Statistical methods:**\n",
+ "* t-test\n",
+ "* ANOVA\n",
+ "* Correlation\n",
+ "* Structural equation model (SEM)\n",
+ "* ?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b2deedd2",
+ "metadata": {
+ "id": "B9CD0257490C4D7EAD946F808396794B",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "##### Related methods courses:\n",
+ "* Psychometrics\n",
+ "* Psychological statistics (including SPSS, etc.)\n",
+ "* Experimental psychology (including E-Prime, etc.)\n",
+ "* ?"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "f64c2e5f",
+ "metadata": {
+ "id": "3EFA113998AB4146A01BCFEB310DE280",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "* Better equipment\n",
+ "* **Better statistics / data analysis**\n",
+ "* Better practices (e.g., big-team science)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "52bbc6a2-f165-4c3e-9792-ca81cc8ca423",
+ "metadata": {
+ "id": "7E33B6BBE590490F912D1DC3F2A9EBDB",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "#### Reason 2: More complex data\n",
+ "\n",
+ "* The digital era and big data\n",
+ "* Neuroimaging / physiological data\n",
+ "* Multimodal data fusion"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e0c9551d",
+ "metadata": {
+ "id": "2278FC4931A74B4CA1F993CB92E42D6F",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "### 1.2 Better statistical methods do exist"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "6e127fb1",
+ "metadata": {
+ "id": "584D67832525481E99EAAAF215C39F42",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "Bayesian inference\n",
+ "\n",
+ "\n",
+ "![Image Name](https://cdn.kesci.com/upload/image/rhdf3bb12c.png?imageView2/0/w/640/h/640)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1ed0f312",
+ "metadata": {
+ "id": "963D62D14B5E484B8D354DDDB717CD69",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "* Flexible / powerful / general-purpose\n",
+ "* Easy to use\n",
+ "* Highly extensible\n",
+ "* Easy to communicate\n",
+ "* ..."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ec099ac3",
+ "metadata": {
+ "id": "A47CEFD8018F4137A9EEFB9FA524E8F6",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
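+ "##### Flexible / powerful / general-purpose\n",
+ "\n",
+ "No analytical (closed-form) solution is required\n",
+ "\n",
+ "Bayesian analysis is widely used across many disciplines, especially AI"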
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "00bad60c-8aca-44a7-afa6-4ef626c5d4a6",
+ "metadata": {
+ "id": "99BAB33F937C4C3AA1248EDD7C2AA43E",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "##### (Relatively) easy to use\n",
+ "\n",
+ "The development and spread of probabilistic programming languages (PPLs)\n",
+ "\n",
+ "\n",
+ "PPLs: *computational languages for statistical modeling*\n",
+ "\n",
+ "* PyMC\n",
+ "* Stan\n",
+ "* NumPyro\n",
+ "* Pyro\n",
+ "* BUGS\n",
+ "* ..."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "e5a52f7f-6654-4158-b4f4-1a07f5051035",
+ "metadata": {
+ "collapsed": true,
+ "id": "7224C81362ED40E88884DB9A2541A17D",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
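+ "In most cases, a developer can use a PPL to define a probabilistic model with little effort, and the program then fits the model automatically.\n",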
+ "\n",
+ "\n",
+ "![Image Name](https://cdn.kesci.com/upload/image/rhdf4r9fbh.png?imageView2/0/w/640/h/640)\n",
+ "\n",
+ "\n",
+ "Source: https://towardsdatascience.com/intro-to-probabilistic-programming-b47c4e926ec5\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "bfac15b7-da82-4018-b5ee-8119494f4dd2",
+ "metadata": {
+ "id": "7EF6CC6C622D4508B6030A2CFFE11B92",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
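+ "##### Extensible\n",
+ "\n",
+ "Bayesian concepts have been applied to the development of new deep-learning-centered techniques, including deep learning frameworks (TensorFlow, PyTorch), to build more expressive, data-driven models"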
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "id": "b906104e",
+ "metadata": {
+ "id": "F96BC6BE697740AFAAE1152721DB15E6",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
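+ "##### Easy to communicate\n",
+ "Most PPLs share similar data structures, but different disciplines tend to use different languages.\n",
+ "\n",
+ "Psychology / social science / neuroscience:\n",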
+ "* **PyMC3**\n",
+ "* Stan\n",
+ "* BUGS"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\t\n",
+ "\t\n",
+ "# part 2\n",
+ "\t\n",
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "B1CD9D9075E8417B8A12276281A87272",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
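+ "### Example 1: Sociometric status and well-being\n",
+ "\n",
+ "The data for this example come from one study in the [Many Labs 2 project](https://osf.io/uazdm/).\n",
+ "\n",
+ "That study examined the effect of sociometric status on well-being: “Sociometric status and well-being” (Anderson, Kraus, Galinsky, & Keltner, 2012).\n",
+ "\n",
+ "The dataset contains data from 6905 participants."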
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "84B17C132F804024858A286483EAFD88",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "WARNING (theano.link.c.cmodule): install mkl with `conda install mkl-service`: No module named 'mkl'\n"
+ ]
+ }
+ ],
+ "source": [
+ "# import modules\n",
+ "import arviz as az\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import pymc3 as pm\n",
+ "import xarray as xr\n",
+ "\n",
+ "%config InlineBackend.figure_format = 'retina'\n",
+ "az.style.use(\"arviz-darkgrid\")\n",
+ "rng = np.random.default_rng(1234)\n",
+ "\n",
+ "import matplotlib\n",
+ "matplotlib.rcParams['figure.figsize'] = [4, 3]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "75694A4449E849718D46562560D4A45A",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [],
+ "source": [
+ "# Load the data\n",
+ "SMS_data = pd.read_csv('/home/mw/input/Bayesian3285/data_chp1_SMS_Well_being.csv')[['uID','variable','factor','Country']]\n",
+ "\n",
+ "# Split the data into low and high sociometric-status groups (first 3000 values of each, sorted) for plotting\n",
+ "plot_data = [\n",
+ " sorted(SMS_data.query('factor==\"Low\"').variable[0:3000]),\n",
+ " sorted(SMS_data.query('factor==\"High\"').variable[0:3000])]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "62D3505AFED84C6491DD5CD6DEEBE5CD",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": true,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
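+ "#### Plotting well-being for the two sociometric-status groups\n",
+ "\n",
+ "In the figure, the x-axis shows the two sociometric-status groups (low and high) and the y-axis shows the subjective well-being score."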
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "E3425865C50D4CBB81E3B30E2895C3C4",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [],
+ "source": [
+ "# import matplotlib\n",
+ "# a = sorted([f.name for f in matplotlib.font_manager.fontManager.ttflist])\n",
+ "\n",
+ "# for i in a:\n",
+ "# print(i)\n",
+ "\n",
+ "# Font family (used to render the Chinese axis labels)\n",
+ "font = {'family' : 'Source Han Sans CN'}\n",
+ "# Apply the font setting globally\n",
+ "plt.rc('font',**font)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "59C5972D76C34D0CB9137A7B46DD0154",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:45: UserWarning: This figure was using constrained_layout, but that is incompatible with subplots_adjust and/or tight_layout; disabling constrained_layout.\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Violin plot comparing well-being between the two sociometric-status groups\n",
+ "def adjacent_values(vals, q1, q3):\n",
+ " upper_adjacent_value = q3 + (q3 - q1) * 1.5\n",
+ " upper_adjacent_value = np.clip(upper_adjacent_value, q3, vals[-1])\n",
+ "\n",
+ " lower_adjacent_value = q1 - (q3 - q1) * 1.5\n",
+ " lower_adjacent_value = np.clip(lower_adjacent_value, vals[0], q1)\n",
+ " return lower_adjacent_value, upper_adjacent_value\n",
+ "\n",
+ "def set_axis_style(ax, labels):\n",
+ " ax.xaxis.set_tick_params(direction='out')\n",
+ " ax.xaxis.set_ticks_position('bottom')\n",
+ " ax.set_xticks(np.arange(1, len(labels) + 1), labels=labels)\n",
+ " ax.set_xlim(0.25, len(labels) + 0.75)\n",
+ " ax.set_xlabel('社会关系地位')\n",
+ "\n",
+ "fig, ax1 = plt.subplots(nrows=1, ncols=1, figsize=(9, 4), sharey=True)\n",
+ "\n",
+ "parts = ax1.violinplot(\n",
+ " plot_data, showmeans=False, showmedians=False,\n",
+ " showextrema=False)\n",
+ "\n",
+ "for pc in parts['bodies']:\n",
+ " pc.set_facecolor('#D43F3A')\n",
+ " pc.set_edgecolor('black')\n",
+ " pc.set_alpha(1)\n",
+ "\n",
+ "quartile1, medians, quartile3 = np.percentile(plot_data, [25, 50, 75], axis=1)\n",
+ "whiskers = np.array([\n",
+ " adjacent_values(sorted_array, q1, q3)\n",
+ " for sorted_array, q1, q3 in zip(plot_data, quartile1, quartile3)])\n",
+ "whiskers_min, whiskers_max = whiskers[:, 0], whiskers[:, 1]\n",
+ "\n",
+ "inds = np.arange(1, len(medians) + 1)\n",
+ "ax1.scatter(inds, medians, marker='o', color='white', s=30, zorder=3)\n",
+ "ax1.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=5)\n",
+ "ax1.vlines(inds, whiskers_min, whiskers_max, color='k', linestyle='-', lw=1)\n",
+ "\n",
+ "# set style for the axes\n",
+ "labels = ['低','高']\n",
+ "plt.xticks(np.arange(2)+1, labels)\n",
+ "plt.xlabel('社会关系地位')\n",
+ "plt.ylabel('幸福感')\n",
+ "\n",
+ "plt.subplots_adjust(bottom=0.15, wspace=0.05)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "D161E5B1ECE1430D9CCABC74C9FEF64C",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": true,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
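+ "#### Using a *t*-test to analyze the difference in well-being between the two sociometric-status groups\n",
+ "\n",
+ "The difference in subjective well-being between the two sociometric-status groups was marginally significant, *t*(6903) = 1.76, *p* = .08."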
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "2174B974752E40D4AB61ACB340D8B1C0",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "低社会关系:0.014 ± 0.66; 高社会关系:-0.014 ± 0.67\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "Ttest_indResult(statistic=1.7593310889762195, pvalue=0.07856558333862036)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from scipy import stats\n",
+ "SMS_low = SMS_data.query('factor==\"Low\"').variable.values\n",
+ "SMS_high = SMS_data.query('factor==\"High\"').variable.values\n",
+ "print(\n",
+ " f\"低社会关系:{np.around(np.mean(SMS_low),3)} ± {np.around(np.std(SMS_low),2)};\",\n",
+ " f\"高社会关系:{np.around(np.mean(SMS_high),3)} ± {np.around(np.std(SMS_high),2)}\")\n",
+ " \n",
+ "stats.ttest_ind(\n",
+ " a= SMS_low,\n",
+ " b= SMS_high, \n",
+ " equal_var=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "D6D5A5DD5EE14EAD8785CF9AD1A07EC6",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
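+ "#### Replacing the *t*-test with Bayesian inference\n",
+ "\n",
+ "Within the framework of null hypothesis significance testing (NHST), the *t*-test yields only a dichotomous outcome: reject or fail to reject $H_0$. But a result such as *p* = 0.079 cannot be taken as support for $H_0$.\n",
+ "\n",
+ "Can Bayesian inference tell us something different?\n",
+ "\n",
+ "A simple linear model:\n",
+ "\n",
+ "1. Build a linear model to replace the original *t*-test model.\n",
+ "\n",
+ "2. Sample from the posterior with PyMC.\n",
+ "\n",
+ "3. Present the results with ArviZ to support statistical inference."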
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "3461187F468F46DBB18F8C3F4C3EE76C",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [],
+ "source": [
+ "# Build a Bayesian linear model with PyMC\n",
+ "x = pd.factorize(SMS_data.factor)[0] # High -> 0, Low -> 1 (factorize numbers labels in order of first appearance)\n",
+ "\n",
+ "with pm.Model() as linear_regression:\n",
+ " sigma = pm.HalfCauchy(\"sigma\", beta=2)\n",
+ " β0 = pm.Normal(\"β0\", 0, sigma=5)\n",
+ " β1 = pm.Normal(\"β1\", 0, sigma=5)\n",
+ " x = pm.Data(\"x\", x)\n",
+ " # μ = pm.Deterministic(\"μ\", β0 + β1 * x)\n",
+ " pm.Normal(\"y\", mu=β0 + β1 * x, sigma=sigma, observed=SMS_data.variable)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "105F06D55D5542ABB329490BCD2F22D7",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
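+ "The structure of the model can be visualized with PyMC's built-in plotting tool.\n",
+ "\n",
+ "x is the predictor, where 1 denotes low sociometric status and 0 denotes high sociometric status.\n",
+ "\n",
+ "The parameter $\\beta_0$ is the intercept of the linear model and $\\beta_1$ is the slope.\n",
+ "\n",
+ "The intercept represents the well-being of participants with high sociometric status; the intercept plus the slope represents the well-being of participants with low sociometric status.\n",
+ "\n",
+ "The parameter $\\sigma$ is the standard deviation of the residuals, and the outcome variable $y$ is subjective well-being.\n",
+ "\n",
+ "The model graph shows how the parameters relate to the outcome variable."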
+ ]
+ },
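+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The description above can be written out as equations. This is a sketch read off the model code: the subscript $i$ (indexing participants) and the symbol $\\mu_i$ are notation introduced here, and $x_i$ is the 0/1 group code. The second argument of each Normal is a standard deviation, and 2 is the HalfCauchy scale (`beta`).\n",
+ "\n",
+ "$$\n",
+ "\\begin{aligned}\n",
+ "y_i &\\sim \\mathrm{Normal}(\\mu_i, \\sigma) \\\\\n",
+ "\\mu_i &= \\beta_0 + \\beta_1 x_i \\\\\n",
+ "\\beta_0, \\beta_1 &\\sim \\mathrm{Normal}(0, 5) \\\\\n",
+ "\\sigma &\\sim \\mathrm{HalfCauchy}(2)\n",
+ "\\end{aligned}\n",
+ "$$\n",
+ "\n",
+ "Setting $x_i = 0$ gives $\\mu_i = \\beta_0$ (high status) and $x_i = 1$ gives $\\mu_i = \\beta_0 + \\beta_1$ (low status), so $\\beta_1$ is the difference between the two group means, which is the quantity the *t*-test targets. The single shared $\\sigma$ plays the role of the equal-variance assumption (`equal_var=True`)."
+ ]
+ },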
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "BECEC5A3264F49F59EF4D89F47F57785",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "pm.model_to_graphviz(linear_regression)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "27050DEF12FD4AAEB72B694ECA3DA602",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Auto-assigning NUTS sampler...\n",
+ "Initializing NUTS using jitter+adapt_diag...\n",
+ "Multiprocess sampling (4 chains in 4 jobs)\n",
+ "NUTS: [β1, β0, sigma]\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "Sampling 4 chains for 1_000 tune and 2_000 draw iterations (4_000 + 8_000 draws total) took 6 seconds.\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Fit the model (MCMC sampling)\n",
+ "with linear_regression:\n",
+ " idata = pm.sample(2000, tune=1000, target_accept=0.9, return_inferencedata=True)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "4E09582EC9134F1C95875C19D60B3494",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "#### Posterior distributions of the parameters\n",
+ "The model results below show the (posterior) distribution of each parameter."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "E54DEC474F6049F89408157C78B15D7D",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "az.plot_trace(idata);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "9C2986E309324D9BAE4DE77FDA3B7090",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
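+ "The figure below shows how credible it is that the parameter β1 is positive, i.e., that well-being differs between the two sociometric-status groups (with the low-status group scoring higher).\n",
+ "\n",
+ "The result shows that the posterior probability of β1 > 0 is about 96%."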
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "2E8C5D580A384AB5A8F2F85A8F765460",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array(0.960125)"
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "(idata.posterior.β1 > 0).mean().values"
+ ]
+ },
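+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The value computed above is a Monte Carlo estimate of a posterior probability: the share of posterior draws in which β1 is positive. As a sketch, with $S$ the total number of draws and $\\beta_1^{(s)}$ the $s$-th draw (notation introduced here):\n",
+ "\n",
+ "$$\n",
+ "P(\\beta_1 > 0 \\mid y) \\approx \\frac{1}{S} \\sum_{s=1}^{S} \\mathbf{1}\\left[\\beta_1^{(s)} > 0\\right] = \\frac{7681}{8000} \\approx 0.96\n",
+ "$$\n",
+ "\n",
+ "Here $S = 8000$ (4 chains of 2000 draws each, as reported in the sampler log), and 7681 is the count implied by the printed value 0.960125."
+ ]
+ },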
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "F500A43067C4488F94A672EB6A77BE57",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "az.plot_posterior(idata, var_names=['β1'], kind='hist',ref_val=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "3E302511A7FB4C0E8BFBD80013CBCA5E",
+ "jupyter": {},
+ "notebookId": "630c7d9f30feb16a92822876",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+ "az.plot_posterior(idata, var_names=['β1'], kind='hist', rope = [-0.1, 0.1], hdi_prob=.95)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "B533467B3553473BB862E806B5830837",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
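+ "#### Model diagnostics\n",
+ "\n",
+ "Model-based data analysis requires model checking, i.e., checking whether the model adequately captures the features of the data."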
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "303C9F7A2BD142B69A945635458C4401",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
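+ "The table below summarizes the model parameters:\n",
+ "\n",
+ "mean and sd are the posterior mean and standard deviation of each parameter;\n",
+ "hdi_3% and hdi_97% are the bounds of the 94% highest density interval (HDI), a credible interval for the parameter;\n",
+ "mcse_mean and mcse_sd are the Monte Carlo standard errors of the posterior mean and standard deviation;\n",
+ "ess_bulk and ess_tail are the effective sample sizes of the MCMC draws for the bulk and the tails of the distribution;\n",
+ "r_hat is the convergence diagnostic for each parameter."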
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "CA789587B08D4E828FD0598C0B45A171",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<table>\n",
+ " <thead>\n",
+ " <tr><th></th><th>mean</th><th>sd</th><th>hdi_3%</th><th>hdi_97%</th><th>mcse_mean</th><th>mcse_sd</th><th>ess_bulk</th><th>ess_tail</th><th>r_hat</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>β0</th><td>-0.014</td><td>0.011</td><td>-0.034</td><td>0.008</td><td>0.0</td><td>0.0</td><td>4233.0</td><td>4505.0</td><td>1.0</td></tr>\n",
+ " <tr><th>β1</th><td>0.028</td><td>0.016</td><td>-0.002</td><td>0.058</td><td>0.0</td><td>0.0</td><td>4381.0</td><td>4873.0</td><td>1.0</td></tr>\n",
+ " <tr><th>sigma</th><td>0.661</td><td>0.006</td><td>0.650</td><td>0.671</td><td>0.0</td><td>0.0</td><td>4904.0</td><td>4846.0</td><td>1.0</td></tr>\n",
+ " </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " mean sd hdi_3% hdi_97% mcse_mean mcse_sd ess_bulk ess_tail \\\n",
+ "β0 -0.014 0.011 -0.034 0.008 0.0 0.0 4233.0 4505.0 \n",
+ "β1 0.028 0.016 -0.002 0.058 0.0 0.0 4381.0 4873.0 \n",
+ "sigma 0.661 0.006 0.650 0.671 0.0 0.0 4904.0 4846.0 \n",
+ "\n",
+ " r_hat \n",
+ "β0 1.0 \n",
+ "β1 1.0 \n",
+ "sigma 1.0 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "az.summary(idata)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": false,
+ "id": "A4F6079034BB4F4B8F8F8D27C8E8384F",
+ "jupyter": {},
+ "mdEditEnable": false,
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "source": [
+ "### Posterior predictive check (PPC)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "185D0307378F4B5B8CCE748A6655D9A7",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ },
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ " 100.00% [8000/8000 00:21<00:00]\n",
+ ""
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "with linear_regression:\n",
+ " pm.set_data({\"x\": np.array([0,1])})\n",
+ " ppc_y = pm.sample_posterior_predictive(idata, var_names=[\"y\"],keep_size=True)[\"y\"] # keep_size=True keeps the MCMC (chain, draw) dimensions rather than the size of the data"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "collapsed": false,
+ "id": "D27A38FFC2E14DE7AE59DED2B16E366B",
+ "jupyter": {},
+ "notebookId": "630abaa16bfce48b61ae22ae",
+ "scrolled": false,
+ "slideshow": {
+ "slide_type": "slide"
+ },
+ "tags": [],
+ "trusted": true
+ },
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:30: UserWarning: This figure was using constrained_layout, but that is incompatible with subplots_adjust and/or tight_layout; disabling constrained_layout.\n"
+ ]
+ },
+ {
+ "data": {
+ "text/html": [
+ ""
+ ],
+ "text/plain": [
+ "