1-5) Python Data Analysis Basics Study


intergem 2025. 3. 2. 22:16

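Course : Let's Leave Work 5 Minutes Early! Python Data Analysis, Visualization, and Web Dashboard Building (5분빨리 퇴근하자! 파이썬 데이터 분석, 시각화, 웹 대시보드 제작하기)

Platform : Inflearn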

6) Histogram

Histplot_matplotlib
- A graph that visualizes a frequency distribution table
- Shows the distribution of a numeric variable
- The bins argument controls the number of bars
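```
import matplotlib.pyplot as plt
import seaborn as sns

# Load the tips example dataset that ships with seaborn
df = sns.load_dataset('tips')

fig, ax = plt.subplots()
sns.histplot(df, x='total_bill', ax=ax)
```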
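To make the "frequency distribution table" idea concrete, here is a minimal sketch that computes the counts behind a histogram with NumPy. The bill amounts are made-up values standing in for total_bill, not rows from the tips dataset.

```python
import numpy as np

# Hypothetical bill amounts (an assumption, not the real tips data).
total_bill = np.array([3.1, 7.5, 10.3, 12.0, 14.8, 16.9, 18.2, 21.0, 24.6, 29.9])

# Five equal-width bins over 0..30; each count is the height of one bar.
counts, edges = np.histogram(total_bill, bins=5, range=(0, 30))

# Print the frequency table that a histogram draws as bars.
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:4.1f} to {right:4.1f}: {n}")
```

Changing bins changes only how the same values are grouped, which is why the shape of a histogram depends on that argument.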

Histplot_Seaborn

- Shows the distribution of a numeric variable
- The bins argument controls the number of bars
```
# Set the number of bars with the bins argument
df = sns.load_dataset('tips')

fig, ax = plt.subplots()
sns.histplot(df, x='total_bill', ax=ax, bins=30)
```

- The hue argument splits the bars by the groups of another variable
```
# Use the hue argument
fig, ax = plt.subplots()
sns.histplot(df, x='total_bill', ax=ax, hue='time')
```

- Setting multiple='stack' stacks the split bars on top of each other
```
fig, ax = plt.subplots()
sns.histplot(df, x='total_bill',
             ax=ax, hue='time', multiple='stack')
```
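What stacking means can be checked with a small table: the per-group counts in each bin add up to the height of the stacked bar. The sketch below uses pandas only, with a tiny hypothetical DataFrame rather than the tips data.

```python
import pandas as pd

# Hypothetical sample (an assumption), shaped like tips['total_bill'] and tips['time'].
df = pd.DataFrame({
    "total_bill": [8.0, 12.5, 15.0, 18.0, 22.0, 27.5, 31.0, 35.0],
    "time": ["Lunch", "Lunch", "Dinner", "Lunch", "Dinner", "Dinner", "Dinner", "Dinner"],
})

# Assign each bill to a bin, then count rows per (bin, time) pair.
bins = pd.cut(df["total_bill"], bins=[0, 10, 20, 30, 40])
per_group = pd.crosstab(bins, df["time"])

# The height of each stacked bar is the sum of its group segments.
stacked_height = per_group.sum(axis=1)
print(per_group)
print(stacked_height)
```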

Plotly histogram

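- Shows the distribution of a numeric variable
- The nbins argument controls the number of bars
- update_traces can change the bin size (xbins_size)
```
import plotly.express as px

fig = px.histogram(data_frame=df, x='total_bill', width=450, nbins=20)
# Override the automatic binning with bins 10 units wide
fig.update_traces(xbins_size=10)
fig.show()
```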

- The color argument splits the bars by the groups of another variable
- The barmode argument draws the split bars overlaid or stacked
```
# Overlay the bars on top of each other
fig = px.histogram(
    data_frame=df, x='total_bill', width=450,
    color='time', barmode='overlay'
)
fig.show()

# Stack the bars
fig = px.histogram(
    data_frame=df, x='total_bill', width=450, color='time',
    barmode='relative'
)
fig.show()
```

7) Heatmap

Pivot table

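- Set two or more variables as the index/columns, and turn (a statistic of) another variable into the cell values
```
import numpy as np
import pandas as pd

# From here on, df is the medical cost dataset (age, region, charges, ...)
df = pd.read_csv('./datasets/medical_cost/medical_cost.csv')
age_bin_list = np.arange(10, 80, 10)
df['age_bin'] = pd.cut(df['age'], bins=age_bin_list)

pivot_df = df.pivot_table(
    index='age_bin', columns='region',
    values='charges', aggfunc='median'
)
pivot_df
```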
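To see what the pivot step produces without the CSV file, here is a small sketch on hypothetical rows that reuse the same column names (age, region, charges). The values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical rows (an assumption), with the same columns as the medical cost data.
df = pd.DataFrame({
    "age": [19, 21, 23, 27, 34, 38, 45, 47],
    "region": ["southwest", "southwest", "southwest", "southeast",
               "southwest", "southeast", "southeast", "southwest"],
    "charges": [1000.0, 5000.0, 3000.0, 2500.0, 4000.0, 6000.0, 8000.0, 7000.0],
})

# Bin ages into 10-year intervals: (10, 20], (20, 30], (30, 40], (40, 50].
df["age_bin"] = pd.cut(df["age"], bins=np.arange(10, 60, 10))

# Rows = age bins, columns = regions, cells = median charges.
pivot_df = df.pivot_table(
    index="age_bin", columns="region",
    values="charges", aggfunc="median", observed=False,
)
print(pivot_df)
```

Cells with no matching rows come out as NaN, which a heatmap then leaves blank.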

- A way to visualize the magnitude of each value, mainly through color
- seaborn heatmap visualizes data prepared in 2D form
- The annot argument prints the value of each cell
```
fig, ax = plt.subplots()
sns.heatmap(pivot_df, ax=ax, annot=True)
```

- The fmt argument changes the number format shown in each cell
```
fig, ax = plt.subplots()
sns.heatmap(pivot_df, ax=ax, annot=True, fmt='.1f')
```
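The effect of annot and fmt can be checked by reading the annotation texts back from the axes. This is a rough sketch on a tiny hypothetical table, using a non-interactive matplotlib backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical 2x2 table standing in for pivot_df.
table = pd.DataFrame(
    [[1.0, 2.5], [3.3, 4.0]],
    index=["row_a", "row_b"], columns=["col_a", "col_b"],
)

fig, ax = plt.subplots()
sns.heatmap(table, ax=ax, annot=True, fmt=".1f")

# Each cell gets one text label, formatted with the fmt string.
labels = [t.get_text() for t in ax.texts]
print(labels)
```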

- The vmax and vmin arguments set the range of the color bar
- The cmap argument selects among many color maps (see the official site)
```
fig, ax = plt.subplots()
sns.heatmap(
    pivot_df, ax=ax, annot=True, fmt='.1f',
    vmax=16000, vmin=0, cmap='RdBu'
)

# Use a colormap built with sns.light_palette
color = sns.light_palette('seagreen', as_cmap=True)

fig, ax = plt.subplots()
sns.heatmap(
    pivot_df, ax=ax, annot=True, fmt='.1f',
    vmax=16000, vmin=0, cmap=color
)
```
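A short sketch of how vmin, vmax, and a light_palette colormap end up on the plot. The table is hypothetical, and the color limits are read back from the mesh that heatmap draws.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
from matplotlib.colors import Colormap
import pandas as pd
import seaborn as sns

# Hypothetical median charges standing in for pivot_df.
table = pd.DataFrame(
    [[2000.0, 9000.0], [12000.0, 30000.0]],
    index=["(10, 20]", "(20, 30]"], columns=["southeast", "southwest"],
)

# as_cmap=True returns a matplotlib colormap instead of a list of colors.
color = sns.light_palette("seagreen", as_cmap=True)

fig, ax = plt.subplots()
sns.heatmap(table, ax=ax, annot=True, fmt=".1f", vmin=0, vmax=16000, cmap=color)

# Values above vmax (30000 here) are drawn with the color of vmax.
color_limits = ax.collections[0].get_clim()
print(color_limits)
```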

Plotly
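- A way to visualize the magnitude of each value, mainly through color
- Plotly imshow draws data prepared in 2D form as a heatmap
- Unlike the seaborn heatmap, the x and y labels are passed separately here (the interval index is converted to strings)
- The text_auto argument sets the number format shown in each cell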
```
fig = px.imshow(
    pivot_df, x=pivot_df.columns, y=pivot_df.index.astype('str'),
    text_auto='.1f', width=400, height=400
)
fig.show()
```
- The color_continuous_scale argument selects among many color maps
```
fig = px.imshow(
    pivot_df, x=pivot_df.columns, y=pivot_df.index.astype('str'),
    text_auto='.1f', width=400, height=400,
    color_continuous_scale='RdBu'
)

fig.show()
```

8) Figure-Level Plots and Axes-Level Plots

figure-level plot
- A plot drawn at the figure level
- Cannot be drawn onto a specified ax
- Can be split by a column (into columns and rows of subplots)
- lmplot, relplot, etc.
```
fig, ax = plt.subplots(2, 2, figsize=(12, 12))
```

axes-level plot

- A plot drawn at the ax level
- Can be drawn onto a specified ax
- Cannot be split by a column on its own
- scatterplot, boxplot, heatmap, etc.
```
df = pd.read_csv('./datasets/medical_cost/medical_cost.csv')

fig, ax = plt.subplots(2, 2, figsize=(12, 12))

sns.regplot(
    x='bmi', y='charges', data=df.query('region == "southwest"'),
    ax=ax[0][0]
)
ax[0][0].set_title('region : southwest')

sns.regplot(
    x='bmi', y='charges', data=df.query('region == "southeast"'), ax=ax[0][1]
)
ax[0][1].set_title('region : southeast')

sns.regplot(
    x='bmi', y='charges', data=df.query('region == "northwest"'), ax=ax[1][0]
)
ax[1][0].set_title('region : northwest')

sns.regplot(
    x='bmi', y='charges', data=df.query('region == "northeast"'), ax=ax[1][1]
)
ax[1][1].set_title('region : northeast')
```
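The four nearly identical regplot calls can also be written as a loop over the axes. This is one possible sketch, not the course's code: the data is randomly generated (an assumption), and ci=None is added only to skip the bootstrap step and keep it fast.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the medical cost data (bmi, charges, region).
rng = np.random.default_rng(0)
regions = ["southwest", "southeast", "northwest", "northeast"]
df = pd.DataFrame({
    "bmi": rng.uniform(18, 40, size=80),
    "region": np.repeat(regions, 20),
})
df["charges"] = 300 * df["bmi"] + rng.normal(0, 1000, size=80)

fig, ax = plt.subplots(2, 2, figsize=(12, 12))

# ax.flat walks the 2x2 grid row by row, so each region gets one subplot.
for axis, region in zip(ax.flat, regions):
    sns.regplot(
        x="bmi", y="charges", data=df[df["region"] == region],
        ax=axis, ci=None,
    )
    axis.set_title(f"region : {region}")

titles = [axis.get_title() for axis in ax.flat]
print(titles)
```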

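Splitting a figure-level plot into subplots by the groups of a variable
- Use the row, col, and col_wrap arguments
  - col_wrap : how many subplots to place in each row before wrapping
  - sharex : whether the subplots share the same x-axis range
  - sharey : whether the subplots share the same y-axis range
```
# Recent seaborn versions may warn that sharex/sharey belong in facet_kws
sns.lmplot(
    x='bmi', y='charges', data=df,
    col='region', col_wrap=2,
    sharex=False, sharey=False
)
```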
- lmplot accepts the hue argument, which regplot does not
```
sns.lmplot(
    x='bmi', y='charges', data=df,
    col='smoker', row='region', hue='sex',
    sharex=False, sharey=False
)
```
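Because lmplot is figure-level, it creates its own figure and returns a FacetGrid rather than drawing on an ax passed in. A minimal sketch on randomly generated data (an assumption; ci=None is only there to keep it fast):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, an assumption for headless runs
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the medical cost data.
rng = np.random.default_rng(1)
regions = ["southwest", "southeast", "northwest", "northeast"]
df = pd.DataFrame({
    "bmi": rng.uniform(18, 40, size=80),
    "region": np.repeat(regions, 20),
    "sex": np.tile(["female", "male"], 40),
})
df["charges"] = 300 * df["bmi"] + rng.normal(0, 1000, size=80)

# One subplot per region, two per row, with a separate fit for each sex.
g = sns.lmplot(
    x="bmi", y="charges", data=df,
    col="region", col_wrap=2, hue="sex",
    sharex=False, sharey=False, ci=None,
)

n_subplots = len(g.axes.flat)
print(type(g).__name__, n_subplots)
```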
Drawing an axes-level plot like a figure-level plot
- FacetGrid lets an axes-level plot be drawn like a figure-level plot
- The axes-level plot can be split into columns and rows by the groups of a variable
- Usage: FacetGrid(data, variable for columns, variable for rows) + map_dataframe(axes-level plot, arguments of the plot)
```
g = sns.FacetGrid(
    data=df, col='region', col_wrap=2,
    sharex=False, sharey=False
)
g.map_dataframe(
    sns.boxplot, x='smoker', y='charges', hue='sex'
)
```

Using facets in Plotly express functions

- Most Plotly express functions provide arguments for splitting a plot into facets
- The facet_col and facet_row arguments work like seaborn's figure-level plots
```
# trendline='ols' relies on the statsmodels package
fig = px.scatter(
    data_frame=df, x='bmi', y='charges',
    color='sex', facet_row='region', facet_col='smoker',
    width=700, height=1200, trendline='ols'
)
fig.show()
```

- update_yaxes(matches=None) : stops the subplots from sharing one y-axis range
- facet_col_spacing : adjusts the spacing between facet columns
```
fig = px.scatter(
    data_frame=df, x='bmi', y='charges',
    color='sex', facet_row='region', facet_col='smoker',
    width=700, height=1200, trendline='ols', facet_col_spacing=0.05
).update_yaxes(matches=None, showticklabels=True)
fig.show()
```


```
# Box plots split into facets by region, wrapped at two per row
fig = px.box(
    data_frame=df, x='smoker', y='charges',
    facet_col='region', facet_col_wrap=2, color='sex',
    width=700, height=500
)
fig.show()
```
