Share on

데이터마이닝의 이론과 실제 기말고사 연습문제2

연습문제2

다음은 호흡과 뇌파와의 관계를 연구한 자료이다.

호흡을 3번(흡호=1:1, 7:3, 3:7)측정하였다.

성별에 따라 호흡(기간범주형변수)이 뇌파간 차이가 있는 가?

Ch1al에 대해서만 검증하세요.

이원 반복측정 분산분석(Two-way Repeated Measures ANOVA)

두개의 독립변수(범주형 자료, 시간(기간)개념), 연속형 종속변수

시간과 군의 교호작용이 통계적으로 유의한지!!

일원배치 분산분석 -> 3개이상 범주형 독립변수 , 1개의 연속형 종속변수

1.기본 package 설정

1.1 library 로드

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.8
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(tidymodels)

## -- Attaching packages -------------------------------------- tidymodels 0.2.0 --

## v broom        0.8.0     v rsample      0.1.1
## v dials        0.1.1     v tune         0.2.0
## v infer        1.0.0     v workflows    0.2.6
## v modeldata    0.1.1     v workflowsets 0.2.1
## v parsnip      0.2.1     v yardstick    0.0.9
## v recipes      0.2.0

## -- Conflicts ----------------------------------------- tidymodels_conflicts() --
## x scales::discard() masks purrr::discard()
## x dplyr::filter()   masks stats::filter()
## x recipes::fixed()  masks stringr::fixed()
## x dplyr::lag()      masks stats::lag()
## x yardstick::spec() masks readr::spec()
## x recipes::step()   masks stats::step()
## * Dig deeper into tidy modeling with R at https://www.tmwr.org

library(rstatix)

## 
## 다음의 패키지를 부착합니다: 'rstatix'

## The following objects are masked from 'package:infer':
## 
##     chisq_test, prop_test, t_test

## The following object is masked from 'package:dials':
## 
##     get_n

## The following object is masked from 'package:stats':
## 
##     filter

library(skimr)
library("dplyr")

2.데이터 불러오기

brain_tb <- read_csv('data2.csv', 
                     col_names = TRUE,
                     locale=locale('ko', encoding='euc-kr'), # 한글
                     na=".") %>%
  round(2) %>%                 # 소수점 2자리로 반올림
  mutate_if(is.character, as.factor) %>%
  mutate(sex = factor(sex,
                      levels=c(1:2),
                      labels=c("남자","여자"))) %>%
  mutate(breath = factor(breath,
                         levels=c(1:3),
                         labels=c("1:1","7:3","3:7")))

## Rows: 339 Columns: 6
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (6): id, breath, sex, ch1al, ch2al, ch3al
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.

str(brain_tb)

## tibble [339 x 6] (S3: tbl_df/tbl/data.frame)
##  $ id    : num [1:339] 1 1 1 2 2 2 3 3 3 4 ...
##  $ breath: Factor w/ 3 levels "1:1","7:3","3:7": 1 2 3 1 2 3 1 2 3 1 ...
##  $ sex   : Factor w/ 2 levels "남자","여자": 2 2 2 2 2 2 2 2 2 2 ...
##  $ ch1al : num [1:339] 0.03 0.01 0.03 0.05 0.03 0.07 0.05 0.04 0.04 0.01 ...
##  $ ch2al : num [1:339] 0.03 0.01 0.02 0.04 0.03 0.06 0.02 0.03 0.04 0.01 ...
##  $ ch3al : num [1:339] 0.02 0.01 0.05 0.07 0.04 0.07 0.06 0.03 0.05 0.02 ...

brain_tb

## # A tibble: 339 x 6
##       id breath sex   ch1al ch2al ch3al
##    <dbl> <fct>  <fct> <dbl> <dbl> <dbl>
##  1     1 1:1    여자   0.03  0.03  0.02
##  2     1 7:3    여자   0.01  0.01  0.01
##  3     1 3:7    여자   0.03  0.02  0.05
##  4     2 1:1    여자   0.05  0.04  0.07
##  5     2 7:3    여자   0.03  0.03  0.04
##  6     2 3:7    여자   0.07  0.06  0.07
##  7     3 1:1    여자   0.05  0.02  0.06
##  8     3 7:3    여자   0.04  0.03  0.03
##  9     3 3:7    여자   0.04  0.04  0.05
## 10     4 1:1    여자   0.01  0.01  0.02
## # ... with 329 more rows

3.기술통계분석

상호작용효과에 관심있는 변수를 2번째 표시 (그룹)

skim(brain_tb)

Data summary
Name	brain_tb
Number of rows	339
Number of columns	6
_______________________
Column type frequency:
factor	2
numeric	4
________________________
Group variables	None

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
breath	0	1	FALSE	3	1:1: 113, 7:3: 113, 3:7: 113
sex	0	1	FALSE	2	여자: 249, 남자: 90

Variable type: numeric

skim_variable	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
id	1	68.18	41.05	1	32.00	66.00	104.00	139.00	▇▇▇▆▇
ch1al	1	0.04	0.02	0	0.03	0.04	0.05	0.08	▁▇▆▇▂
ch2al	1	0.04	0.02	0	0.03	0.04	0.05	0.10	▃▇▅▁▁
ch3al	1	0.04	0.02	0	0.03	0.04	0.05	0.11	▂▇▅▂▁

brain_tb %>% 
  group_by(sex, breath) %>%
  get_summary_stats(ch1al)

## # A tibble: 6 x 15
##   breath sex   variable     n   min   max median    q1    q3   iqr   mad  mean
##   <fct>  <fct> <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1:1    남자  ch1al       30  0     0.08   0.04  0.03 0.05  0.02  0.015 0.038
## 2 7:3    남자  ch1al       30  0.01  0.08   0.04  0.03 0.057 0.027 0.015 0.043
## 3 3:7    남자  ch1al       30  0.01  0.07   0.04  0.03 0.05  0.02  0.015 0.04 
## 4 1:1    여자  ch1al       83  0.01  0.08   0.04  0.03 0.05  0.02  0.015 0.041
## 5 7:3    여자  ch1al       83  0.01  0.08   0.04  0.03 0.05  0.02  0.015 0.044
## 6 3:7    여자  ch1al       83  0.02  0.08   0.04  0.03 0.05  0.02  0.015 0.043
## # ... with 3 more variables: sd <dbl>, se <dbl>, ci <dbl>

그래프 안나올때 사용

#dev.off()

4.그래프 그리기(박스그래프,히스토그램)

brain_tb %>% 
  ggplot(mapping = aes(x = sex,
                       y = ch1al,
                       color = sex)) +
  geom_boxplot() +
  facet_wrap(~ breath)

brain_tb %>% 
  ggplot(mapping = aes(x = ch1al)) +
  geom_histogram(bins = 10,               # binwidth=1,000: 값차이
                 color = "white", 
                 fill = "steelblue") +
  facet_wrap(~ sex*breath)                   # 그룹별 분리

summarise()` has grouped output 해결방안

options(dplyr.summarise.inform = FALSE)    # Change global options

상호작용 검정

breath_tb<- brain_tb %>% 
  group_by(breath, sex) %>% 
  summarise(ch1al = mean(ch1al)) %>% 
  ggplot(aes(x = breath, 
             y = ch1al, 
             color = sex)) +
  geom_line(aes(group = sex)) +
  geom_point()

plot(breath_tb)

5.이상치 제거

이상치 확인

brain_tb %>%
  group_by(sex, breath) %>%
  identify_outliers(ch1al)

## [1] breath     sex        id         ch1al      ch2al      ch3al      is.outlier
## [8] is.extreme
## <0 행> <또는 row.names의 길이가 0입니다>

6.정규분포 검정

brain_tb %>%
  group_by(sex, breath) %>%
  shapiro_test(ch1al)

## # A tibble: 6 x 5
##   breath sex   variable statistic        p
##   <fct>  <fct> <chr>        <dbl>    <dbl>
## 1 1:1    남자  ch1al        0.969 0.520   
## 2 7:3    남자  ch1al        0.948 0.148   
## 3 3:7    남자  ch1al        0.944 0.119   
## 4 1:1    여자  ch1al        0.959 0.00963 
## 5 7:3    여자  ch1al        0.946 0.00173 
## 6 3:7    여자  ch1al        0.934 0.000363

7. ANOVA 분석

지수 표기법 수정 : 2.2e-4=0.00022

options("scipen" = 20)

구형성 검정

(p<0.05) 이면 Greenhous-Geisser, Huynh-Feldt 등의 수정된 검정통계량을 사용하여 반복측정 분산분석 수행할 수 있음

get_anova_table(correction = “HF”)

“GG”: applies Greenhouse-Geisser correction to all within-subjects factors even if the assumption of sphericity is met (i.e., Mauchly’s test is not significant, p > 0.05).

“HF”: applies Hyunh-Feldt correction to all within-subjects factors even if the assumption of sphericity is met,

“none”: returns the ANOVA table without any correction and

“auto”: apply automatically GG correction to only within-subjects factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05).

brain_tb %>% 
  anova_test(dv = ch1al,
             wid = id,
             within = breath,
             between = sex) %>%
  get_anova_table(correction = "HF")

## ANOVA Table (type III tests)
## 
##       Effect  DFn    DFd     F     p p<.05      ges
## 1        sex 1.00 111.00 1.391 0.241       0.004000
## 2     breath 2.03 224.99 1.290 0.277       0.008000
## 3 sex:breath 2.03 224.99 0.054 0.948       0.000334

# HF -> 구형성 계산 
# -> breath 레벨이 3개 이상이므로 사용함

8.사후검정(Multicamparison test )

breath에 따른 상호작용 효과 분석

brain_tb %>% 
  group_by(breath) %>%
  pairwise_t_test(ch1al ~ sex,
                    p.adj="bonferroni")

## # A tibble: 3 x 10
##   breath .y.   group1 group2    n1    n2     p p.signif p.adj p.adj.signif
## * <fct>  <chr> <chr>  <chr>  <int> <int> <dbl> <chr>    <dbl> <chr>       
## 1 1:1    ch1al 남자   여자      30    83 0.532 ns       0.532 ns          
## 2 7:3    ch1al 남자   여자      30    83 0.676 ns       0.676 ns          
## 3 3:7    ch1al 남자   여자      30    83 0.345 ns       0.345 ns

brain_tb %>% 
  group_by(sex) %>%
  pairwise_t_test(ch1al ~ breath,
                    paired=TRUE,
                    p.adj="bonferroni")

## # A tibble: 6 x 11
##   sex   .y.   group1 group2    n1    n2 statistic    df     p p.adj p.adj.signif
## * <fct> <chr> <chr>  <chr>  <int> <int>     <dbl> <dbl> <dbl> <dbl> <chr>       
## 1 남자  ch1al 1:1    7:3       30    30    -0.934    29 0.358 1     ns          
## 2 남자  ch1al 1:1    3:7       30    30    -0.465    29 0.645 1     ns          
## 3 남자  ch1al 7:3    3:7       30    30     0.532    29 0.599 1     ns          
## 4 여자  ch1al 1:1    7:3       83    83    -1.37     82 0.175 0.525 ns          
## 5 여자  ch1al 1:1    3:7       83    83    -1.14     82 0.259 0.777 ns          
## 6 여자  ch1al 7:3    3:7       83    83     0.295    82 0.768 1     ns

부록: aov 이용

twa_result <- aov(ch1al ~ sex + breath + sex:breath, 
                    data=brain_tb)

tidy(twa_result)

## # A tibble: 4 x 6
##   term          df     sumsq    meansq statistic p.value
##   <chr>      <dbl>     <dbl>     <dbl>     <dbl>   <dbl>
## 1 sex            1 0.000334  0.000334     1.30     0.256
## 2 breath         2 0.000818  0.000409     1.59     0.205
## 3 sex:breath     2 0.0000286 0.0000143    0.0556   0.946
## 4 Residuals    333 0.0856    0.000257    NA       NA

Share on

Copy URL

Kakao Facebook Twitter LinkedIn

SeongJae Yoo

Share on

데이터마이닝의 이론과 실제 기말고사 연습문제2

SeongJae-Yoo

2022-06-08 16:37:22 +0900

연습문제2

다음은 호흡과 뇌파와의 관계를 연구한 자료이다.

호흡을 3번(흡호=1:1, 7:3, 3:7)측정하였다.

성별에 따라 호흡(기간범주형변수)이 뇌파간 차이가 있는 가?

Ch1al에 대해서만 검증하세요.

이원 반복측정 분산분석(Two-way Repeated Measures ANOVA)

두개의 독립변수(범주형 자료, 시간(기간)개념), 연속형 종속변수

시간과 군의 교호작용이 통계적으로 유의한지!!

일원배치 분산분석 -> 3개이상 범주형 독립변수 , 1개의 연속형 종속변수

1.기본 package 설정

1.1 library 로드

2.데이터 불러오기

3.기술통계분석

상호작용효과에 관심있는 변수를 2번째 표시 (그룹)

그래프 안나올때 사용

4.그래프 그리기(박스그래프,히스토그램)

summarise()` has grouped output 해결방안

상호작용 검정

5.이상치 제거

이상치 확인

6.정규분포 검정

7. ANOVA 분석

지수 표기법 수정 : 2.2e-4=0.00022

구형성 검정

(p<0.05) 이면 Greenhous-Geisser, Huynh-Feldt 등의 수정된 검정통계량을 사용하여 반복측정 분산분석 수행할 수 있음

get_anova_table(correction = “HF”)

“GG”: applies Greenhouse-Geisser correction to all within-subjects factors even if the assumption of sphericity is met (i.e., Mauchly’s test is not significant, p > 0.05).

“HF”: applies Hyunh-Feldt correction to all within-subjects factors even if the assumption of sphericity is met,

“none”: returns the ANOVA table without any correction and

“auto”: apply automatically GG correction to only within-subjects factors violating the sphericity assumption (i.e., Mauchly’s test p-value is significant, p <= 0.05).

8.사후검정(Multicamparison test )

breath에 따른 상호작용 효과 분석

부록: aov 이용

Share on

Leave a comment

You may also enjoy

자동주식투자프로그램 세팅방법

Data Mining Theory And Reality Midtestexample04

Data Mining Theory And Reality Midtestexample03

Data Mining Theory And Reality Midtestexample02