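Concepts and Principles of Correlation Analysis

Concept

  • Measures the degree of association between two continuous variables

  • Expresses that association in standardized form through the correlation coefficient

  • Typically used before regression analysis to explore whether the independent and dependent variables are correlated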
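To make "standardized" concrete, here is a minimal sketch showing that the Pearson coefficient is the covariance divided by the two standard deviations. The vectors `x` and `y` are made-up illustrative values, not taken from cosmetics.csv.

```r
# Illustrative values only (not from cosmetics.csv)
x <- c(2, 4, 5, 7, 9)
y <- c(1, 3, 4, 8, 9)

# Covariance depends on the units of x and y
cov_xy <- cov(x, y)

# Dividing by both standard deviations rescales it to the range [-1, 1]
r_manual <- cov_xy / (sd(x) * sd(y))

# Built-in Pearson correlation for comparison
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
print(all.equal(r_manual, r_builtin))
```

Because the units cancel, the coefficient can be compared across pairs of variables measured on different scales.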

a <- read.csv('cosmetics.csv', header = TRUE)
a
gender marriage edu job mincome aware count amount decision propensity skin promo location satisf_b satisf_i satisf_al repurchase
1 1 4 1 2 2 1 11000 2 1 1 1 2 5 2 2 2
2 1 4 9 2 1 4 30000 1 1 3 2 3 2 3 3 4
2 2 4 4 3 1 6 100000 3 2 3 2 2 4 5 4 4
2 2 4 7 5 2 6 65000 3 2 5 2 3 3 4 4 4
1 2 6 6 5 2 2 50000 2 2 3 2 3 3 3 3 3
2 2 2 7 3 1 2 100000 2 1 4 2 3 3 4 4 3
2 1 6 4 5 1 5 100000 3 2 5 2 3 2 2 3 4
1 1 6 4 5 4 10 39000 3 2 2 1 2 4 4 4 4
2 2 4 5 2 2 2 40000 3 2 3 2 3 3 4 4 4
2 2 4 5 2 1 2 100000 3 3 3 1 3 2 3 4 4
2 1 7 4 3 10 3 50000 1 3 1 2 3 3 3 4 4
1 1 2 5 3 2 1 30000 3 2 3 2 2 3 3 3 3
2 2 4 4 3 4 4 320000 2 3 3 3 2 4 4 4 4
2 2 4 4 2 3 2 200000 1 2 3 1 3 3 3 3 3
1 2 4 4 6 2 2 60000 3 2 1 2 5 3 3 3 4
2 1 4 5 2 2 3 50000 1 2 4 1 3 3 4 3 3
1 2 8 3 2 5 3 1000000 1 3 1 2 2 3 3 3 3
2 1 3 8 5 1 6 1500000 3 3 2 4 1 4 4 4 4
1 2 2 6 2 4 1 80000 2 3 1 2 3 3 3 4 4
1 1 4 4 3 8 3 30000 2 2 3 2 3 3 3 3 3
2 2 2 4 2 8 4 350000 3 2 3 2 2 3 4 4 4
2 2 4 7 6 1 4 250000 3 3 2 2 3 2 3 4 4
2 2 4 7 3 1 25 50000 2 2 1 2 3 3 4 4 4
2 2 2 9 1 1 1 20000 1 1 5 1 3 3 3 3 3
1 1 3 8 4 2 3 42000 1 2 3 1 3 3 3 4 4
1 1 4 8 4 2 3 42000 3 3 2 1 3 3 4 4 4
2 1 4 4 3 2 20 40000 3 1 5 2 3 2 4 4 4
2 2 4 4 6 1 6 70000 3 3 5 2 1 3 4 4 4
2 2 8 4 5 5 6 200000 2 1 4 1 1 4 4 4 3
1 2 4 2 6 2 1 200000 3 2 1 2 2 3 4 4 4
...................................................
2 2 4 5 2 1 10 30000 2 2 2 4 3 3 4 4 4
1 2 4 6 6 7 5 50000 3 2 4 2 2 3 4 4 4
1 2 4 1 4 1 1 10000 1 3 1 1 1 5 1 3 1
1 1 4 4 3 2 1 10000 1 1 4 3 2 2 3 3 3
1 1 4 4 3 2 3 50000 3 2 5 2 3 2 4 4 4
1 2 2 5 4 2 1 60000 1 1 1 4 5 3 3 3 3
1 1 6 1 3 2 2 50000 3 2 5 1 2 4 4 3 4
2 2 6 4 6 1 3 500000 3 3 5 2 1 3 4 4 4
1 2 4 1 3 2 1 50000 3 2 1 2 3 2 3 3 3
2 2 6 3 4 1 1 100000 3 2 1 2 1 3 3 3 3
2 2 4 7 4 1 2 50000 3 1 5 2 2 3 2 3 4
2 2 4 4 2 2 12 20000 2 2 3 2 3 3 3 3 3
1 1 4 4 4 2 4 35000 2 2 1 3 2 3 3 4 4
1 1 4 4 4 2 4 30000 2 2 1 2 3 3 4 4 3
2 2 4 7 1 2 2 50000 1 1 3 2 5 3 4 4 4
2 2 4 7 1 2 3 50000 1 2 3 2 5 3 3 3 3
2 1 4 4 2 2 7 80000 3 2 3 2 2 3 3 3 3
1 1 4 1 3 2 6 20000 3 1 3 1 3 2 3 4 4
1 1 4 10 2 2 2 25000 3 1 3 2 3 3 4 4 4
2 1 3 8 1 1 7 100000 2 1 5 2 3 3 3 1 2
1 2 4 4 3 2 2 50000 1 2 5 2 2 3 4 3 3
1 2 4 4 5 2 1 80000 2 2 1 2 3 3 3 3 3
2 2 6 7 5 1 2 300000 3 3 2 3 2 3 4 3 3
2 2 7 7 4 1 2 200000 2 2 4 3 2 3 3 3 3
1 1 2 1 2 2 5 3000 1 1 2 1 1 1 1 1 1
1 1 4 2 3 2 6 4000 1 1 1 1 4 2 1 1 1
2 2 4 4 2 1 10 150000 3 2 2 1 2 3 4 4 4
2 2 7 8 1 2 3 100000 1 2 1 1 5 2 5 4 4
1 1 4 6 1 3 2 20000 3 1 1 1 3 4 3 3 2
2 2 6 10 1 1 10 1000000 3 2 3 1 3 2 3 3 3
install.packages('corrplot')
library(corrplot)
  There is a binary version available but the source version is later:
         binary source needs_compilation
corrplot   0.88   0.92             FALSE



installing the source package 'corrplot'

corrplot 0.92 loaded

Correlation, Variance and Covariance (Matrices)

Description

var, cov and cor compute the variance of x and the covariance or correlation of x and y if these are vectors. If x and y are matrices then the covariances (or correlations) between the columns of x and the columns of y are computed. cov2cor scales a covariance matrix into the corresponding correlation matrix efficiently.

  • Usage

var(x, y = NULL, na.rm = FALSE, use)

cov(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))

cor(x, y = NULL, use = "everything", method = c("pearson", "kendall", "spearman"))

cov2cor(V)

  • Arguments

x : a numeric vector, matrix or data frame.

y : NULL (default) or a vector, matrix or data frame with compatible dimensions to x. The default is equivalent to y = x (but more efficient).

na.rm : logical. Should missing values be removed?

use : an optional character string giving a method for computing covariances in the presence of missing values. This must be (an abbreviation of) one of the strings "everything", "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs".

method : a character string indicating which correlation coefficient (or covariance) is to be computed. One of "pearson" (default), "kendall", or "spearman": can be abbreviated.

V : symmetric numeric matrix, usually positive definite such as a covariance matrix.
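As a rough illustration of the `use` and `method` arguments described above, the sketch below runs `cor()` on two short made-up vectors, one of which contains a missing value. The data are assumptions for demonstration only.

```r
# Illustrative vectors; x contains one missing value
x <- c(1, 2, 3, 4, NA)
y <- c(2, 4, 5, 9, 10)

# Default use = "everything": a missing value propagates to the result
r_default <- cor(x, y)

# use = "complete.obs": the pair containing NA is dropped first
r_complete <- cor(x, y, use = "complete.obs")

# method = "spearman": correlation of the ranks instead of the raw values
r_spearman <- cor(x, y, use = "complete.obs", method = "spearman")

print(c(default = r_default, complete = r_complete, spearman = r_spearman))
```

The Spearman coefficient only looks at the ordering of the values, which is why it can differ from the Pearson coefficient on the same data.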

cor(a, method='pearson')
gender marriage edu job mincome aware count amount decision propensity skin promo location satisf_b satisf_i satisf_al repurchase
gender 1.00000000 0.018950432 -0.015141892 0.234495300 -0.24322115 -0.173258517 0.26720736 0.07060766 0.024599925 0.105123747 0.100927559 0.031070486 -0.07987533 0.068425536 0.03371565 0.031582421 0.13717521
marriage 0.01895043 1.000000000 0.090430642 -0.097376313 0.34672053 0.002746706 -0.03388587 0.11179503 0.065853663 0.161741172 0.000243840 0.056206150 -0.06633169 0.037174065 0.08363306 0.104428021 0.16562358
edu -0.01514189 0.090430642 1.000000000 -0.152514543 0.29125234 -0.053224463 0.02317484 0.10287980 0.008078891 0.144735867 -0.048833264 0.016601180 -0.16879183 -0.017346951 0.09259939 0.020172944 0.03726943
job 0.23449530 -0.097376313 -0.152514543 1.000000000 -0.29724975 -0.037035494 0.06774745 -0.04452342 0.015581175 -0.148122023 0.036206215 0.033042235 0.21765244 -0.007970738 0.07370249 -0.054013630 0.04257733
mincome -0.24322115 0.346720533 0.291252342 -0.297249748 1.00000000 0.033181017 -0.03751775 0.12555069 0.093481119 0.291048862 0.002721420 0.041308095 -0.27159896 0.041790891 0.11493493 0.121591093 0.11494068
aware -0.17325852 0.002746706 -0.053224463 -0.037035494 0.03318102 1.000000000 -0.14045380 0.02599560 0.083566385 0.002056555 -0.057377153 0.004190501 -0.01139770 0.097678118 -0.02809058 0.016987040 -0.09646385
count 0.26720736 -0.033885869 0.023174842 0.067747449 -0.03751775 -0.140453801 1.00000000 -0.06605694 -0.034378011 0.010766170 0.039127374 0.010843867 0.01737366 -0.023712383 0.17298313 0.121654091 0.19176923
amount 0.07060766 0.111795032 0.102879797 -0.044523425 0.12555069 0.025995598 -0.06605694 1.00000000 -0.092237915 0.248702226 0.039647452 0.167832282 -0.21865952 0.151351288 0.05486640 0.063516185 0.05797403
decision 0.02459993 0.065853663 0.008078891 0.015581175 0.09348112 0.083566385 -0.03437801 -0.09223791 1.000000000 0.104598639 0.103755420 0.022390779 -0.10788193 0.003375614 0.13588654 0.189271220 0.21929154
propensity 0.10512375 0.161741172 0.144735867 -0.148122023 0.29104886 0.002056555 0.01076617 0.24870223 0.104598639 1.000000000 -0.098094475 0.197142362 -0.27947384 0.323968388 0.21183616 0.180745009 0.22548460
skin 0.10092756 0.000243840 -0.048833264 0.036206215 0.00272142 -0.057377153 0.03912737 0.03964745 0.103755420 -0.098094475 1.000000000 0.003177493 0.02155061 -0.127471531 0.06872337 0.011722962 0.02536274
promo 0.03107049 0.056206150 0.016601180 0.033042235 0.04130810 0.004190501 0.01084387 0.16783228 0.022390779 0.197142362 0.003177493 1.000000000 -0.03164168 0.072483016 0.13940641 -0.005563814 0.10102533
location -0.07987533 -0.066331688 -0.168791830 0.217652440 -0.27159896 -0.011397701 0.01737366 -0.21865952 -0.107881932 -0.279473839 0.021550612 -0.031641677 1.00000000 -0.254221863 -0.08922635 -0.095974739 0.05341473
satisf_b 0.06842554 0.037174065 -0.017346951 -0.007970738 0.04179089 0.097678118 -0.02371238 0.15135129 0.003375614 0.323968388 -0.127471531 0.072483016 -0.25422186 1.000000000 0.01837903 -0.031382338 -0.02892399
satisf_i 0.03371565 0.083633059 0.092599391 0.073702488 0.11493493 -0.028090585 0.17298313 0.05486640 0.135886536 0.211836156 0.068723366 0.139406409 -0.08922635 0.018379033 1.00000000 0.584506125 0.51077138
satisf_al 0.03158242 0.104428021 0.020172944 -0.054013630 0.12159109 0.016987040 0.12165409 0.06351618 0.189271220 0.180745009 0.011722962 -0.005563814 -0.09597474 -0.031382338 0.58450612 1.000000000 0.56502825
repurchase 0.13717521 0.165623585 0.037269434 0.042577334 0.11494068 -0.096463853 0.19176923 0.05797403 0.219291539 0.225484605 0.025362744 0.101025333 0.05341473 -0.028923989 0.51077138 0.565028245 1.00000000
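A 17 x 17 matrix is hard to scan by eye, so one option is to list only the pairs whose absolute coefficient passes a cutoff. The helper `strong_pairs()` and its 0.5 threshold are hypothetical choices for this sketch. The small matrix `m` reuses a few coefficients copied from the output above.

```r
# Hypothetical helper: list variable pairs whose |r| meets a threshold
strong_pairs <- function(m, threshold = 0.5) {
  # Look only at the upper triangle so each pair appears once
  idx <- which(upper.tri(m) & abs(m) >= threshold, arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(m)[idx[, "row"]],
    var2 = colnames(m)[idx[, "col"]],
    r    = m[idx],
    stringsAsFactors = FALSE
  )
  # Strongest association first
  out[order(-abs(out$r)), ]
}

# Coefficients copied from the cor(a) output for four of the variables
vars <- c("decision", "satisf_i", "satisf_al", "repurchase")
m <- matrix(c(1,          0.13588654, 0.18927122, 0.21929154,
              0.13588654, 1,          0.58450612, 0.51077138,
              0.18927122, 0.58450612, 1,          0.56502825,
              0.21929154, 0.51077138, 0.56502825, 1),
            nrow = 4, byrow = TRUE, dimnames = list(vars, vars))

pairs_found <- strong_pairs(m)
print(pairs_found)
```

With the full data, the same helper could be applied to the result of `cor(a, method = 'pearson')`.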
attach(a)
The following objects are masked from a (pos = 3):

    amount, aware, count, decision, edu, gender, job, location,
    marriage, mincome, promo, propensity, repurchase, satisf_al,
    satisf_b, satisf_i, skin
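# base is always attached, so this call is not required
library(base)
# Note: this matrix is named cor, the same name as the cor() function.
# R still finds the function when it is called as cor(...), but a distinct name would be clearer.
cor <- cbind(decision, satisf_b, satisf_i, satisf_al, repurchase)
cor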
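decision satisf_b satisf_i satisf_al repurchase
2 5 2 2 2
1 2 3 3 4
3 4 5 4 4
3 3 4 4 4
2 3 3 3 3
2 3 4 4 3
3 2 2 3 4
3 4 4 4 4
3 3 4 4 4
3 2 3 4 4
1 3 3 4 4
3 3 3 3 3
2 4 4 4 4
1 3 3 3 3
3 3 3 3 4
1 3 4 3 3
1 3 3 3 3
3 4 4 4 4
2 3 3 4 4
2 3 3 3 3
3 3 4 4 4
3 2 3 4 4
2 3 4 4 4
1 3 3 3 3
1 3 3 4 4
3 3 4 4 4
3 2 4 4 4
3 3 4 4 4
2 4 4 4 3
3 3 4 4 4
...............
2 3 4 4 4
3 3 4 4 4
1 5 1 3 1
1 2 3 3 3
3 2 4 4 4
1 3 3 3 3
3 4 4 3 4
3 3 4 4 4
3 2 3 3 3
3 3 3 3 3
3 3 2 3 4
2 3 3 3 3
2 3 3 4 4
2 3 4 4 3
1 3 4 4 4
1 3 3 3 3
3 3 3 3 3
3 2 3 4 4
3 3 4 4 4
2 3 3 1 2
1 3 4 3 3
2 3 3 3 3
3 3 4 3 3
2 3 3 3 3
1 1 1 1 1
1 2 1 1 1
3 3 4 4 4
1 2 5 4 4
3 4 3 3 2
3 2 3 3 3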
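The same five columns can also be selected by name without `attach()`, which avoids the masking message shown earlier. The sketch below uses a small made-up data frame `a_demo` as a stand-in for `a`; its values are illustrative only.

```r
# Made-up stand-in for the data frame `a` (values are illustrative only)
a_demo <- data.frame(
  gender     = c(1, 2, 2, 1),
  decision   = c(2, 1, 3, 3),
  satisf_b   = c(5, 2, 4, 3),
  satisf_i   = c(2, 3, 5, 4),
  satisf_al  = c(2, 3, 4, 4),
  repurchase = c(2, 4, 4, 4)
)

# Select the columns of interest by name; no attach() needed
vars <- c("decision", "satisf_b", "satisf_i", "satisf_al", "repurchase")
sub_df <- a_demo[, vars]

# Convert to a matrix with the same five columns as the cbind() result
sub_matrix <- as.matrix(sub_df)
print(sub_matrix)
```

Selecting by name keeps the source data frame explicit, so the result does not depend on what is currently attached to the search path.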
cor(cor, method='pearson')
decision satisf_b satisf_i satisf_al repurchase
decision 1.000000000 0.003375614 0.13588654 0.18927122 0.21929154
satisf_b 0.003375614 1.000000000 0.01837903 -0.03138234 -0.02892399
satisf_i 0.135886536 0.018379033 1.00000000 0.58450612 0.51077138
satisf_al 0.189271220 -0.031382338 0.58450612 1.00000000 0.56502825
repurchase 0.219291539 -0.028923989 0.51077138 0.56502825 1.00000000
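A coefficient alone does not say whether the association could be due to chance. As a hedged sketch, `cor.test()` from the stats package reports a p-value and a confidence interval for a single pair. The two vectors below are made-up values, not the survey data.

```r
# Made-up scores for illustration (not from cosmetics.csv)
satisf_al  <- c(2, 3, 4, 4, 3, 4, 3, 4, 4, 1)
repurchase <- c(2, 4, 4, 4, 3, 3, 4, 4, 4, 2)

# Test H0: the true Pearson correlation is zero
res <- cor.test(satisf_al, repurchase, method = "pearson")

print(res$estimate)  # sample correlation coefficient
print(res$p.value)   # p-value of the test
print(res$conf.int)  # confidence interval for the coefficient (95% by default)
```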
pairs(cor)

[Figure: scatterplot matrix of decision, satisf_b, satisf_i, satisf_al, and repurchase]

library(ggplot2)
ggplot(as.data.frame(cor), aes(x = satisf_al, y = repurchase)) + geom_point()

[Figure: scatter plot of satisf_al (x) against repurchase (y)]

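# fill only applies to point shapes 21-25, so it has no visible effect with the default shape scale
ggplot(as.data.frame(cor), aes(x = satisf_al, y = repurchase, shape = factor(decision))) +
  geom_point(color = 'red', fill = 'blue', alpha = 0.5, size = 6, stroke = 2)

[Figure: scatter plot of satisf_al (x) against repurchase (y), with point shape set by decision]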

ggplot(as.data.frame(cor), aes(x = satisf_al, y = repurchase)) + geom_smooth(method = lm)
`geom_smooth()` using formula 'y ~ x'

[Figure: fitted regression line of repurchase on satisf_al, surrounded by a gray band]

  • The gray band around the fitted line shows its confidence interval (95% by default in geom_smooth()).
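To see what the band represents numerically, the sketch below fits the same kind of linear model with `lm()` and asks `predict()` for the confidence limits of the fitted mean. The data frame `d` holds made-up values, and the 95% level is stated explicitly as an assumption matching the usual default.

```r
# Made-up scores for illustration (not from cosmetics.csv)
d <- data.frame(
  satisf_al  = c(1, 2, 2, 3, 3, 3, 4, 4, 4, 5),
  repurchase = c(1, 2, 3, 3, 3, 4, 4, 4, 3, 5)
)

# Simple linear regression of repurchase on satisf_al
fit <- lm(repurchase ~ satisf_al, data = d)

# Points along the x axis at which to evaluate the line and its band
grid <- data.frame(satisf_al = seq(1, 5, by = 1))

# fit = fitted line, lwr/upr = lower and upper confidence limits
band <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
print(cbind(grid, band))
```

The band is typically narrowest near the mean of the x values and widens toward the ends, where the fitted line is less certain.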
