In [256]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import csv

In [257]:

df = pd.read_csv('work.csv', encoding = 'cp949')
df

Out[257]:

	국가별	1995	1995.1	1995.2	1996	1996.1	1996.2	1997	1997.1	1997.2	...	2016.2	2017	2017.1	2017.2	2018	2018.1	2018.2	2019	2019.1	2019.2
0	국가별	전체	남자	여자	전체	남자	여자	전체	남자	여자	...	여자	전체	남자	여자	전체	남자	여자	전체	남자	여자
1	아시아	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	대한민국	6.0	6.9	5.2	5.9	7.2	4.6	7.3	8.3	6.3	...	10.1	9.8	10.0	9.5	10.1	10.3	10.0	9.8	10.1	9.6
3	아프가니스탄	18.3	17.3	23.0	18.3	17.5	22.8	18.3	17.3	23.0	...	21.7	17.5	16.3	21.3	17.3	16.1	21.1	17.2	16.0	20.9
4	방글라데시	6.3	6.5	5.4	6.5	6.7	5.5	7.2	7.4	6.5	...	14.3	12.3	10.4	16.7	12.2	10.2	16.6	12.1	10.2	16.6
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
137	짐바브웨	11.9	15.2	7.9	13.2	16.3	9.4	14.5	17.5	10.9	...	9.6	8.3	7.3	9.5	8.2	7.3	9.3	8.1	7.2	9.2
138	오세아니아	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
139	오스트레일리아	15.2	15.5	14.8	15.6	16.4	14.8	15.8	16.9	14.6	...	11.4	12.7	13.7	11.5	11.9	13.0	10.7	11.8	13.0	10.7
140	뉴질랜드	12.3	12.3	12.2	12.3	12.9	11.5	13.6	13.8	13.4	...	13.5	12.9	12.6	13.2	11.7	12.5	10.9	11.2	11.0	11.3
141	파푸아뉴기니	5.2	6.1	4.3	5.3	6.0	4.5	5.2	6.2	4.3	...	3.4	4.4	5.5	3.3	4.3	5.4	3.2	4.3	5.3	3.2

142 rows × 76 columns

In [258]:

df= df.dropna(axis=0)
df.head()

Out[258]:

	국가별	1995	1995.1	1995.2	1996	1996.1	1996.2	1997	1997.1	1997.2	...	2016.2	2017	2017.1	2017.2	2018	2018.1	2018.2	2019	2019.1	2019.2
0	국가별	전체	남자	여자	전체	남자	여자	전체	남자	여자	...	여자	전체	남자	여자	전체	남자	여자	전체	남자	여자
2	대한민국	6.0	6.9	5.2	5.9	7.2	4.6	7.3	8.3	6.3	...	10.1	9.8	10.0	9.5	10.1	10.3	10.0	9.8	10.1	9.6
3	아프가니스탄	18.3	17.3	23.0	18.3	17.5	22.8	18.3	17.3	23.0	...	21.7	17.5	16.3	21.3	17.3	16.1	21.1	17.2	16.0	20.9
4	방글라데시	6.3	6.5	5.4	6.5	6.7	5.5	7.2	7.4	6.5	...	14.3	12.3	10.4	16.7	12.2	10.2	16.6	12.1	10.2	16.6
5	부탄	4.8	5.1	4.3	4.8	5.1	4.3	4.7	5.2	4.3	...	11.9	9.9	7.8	12.0	9.9	7.8	12.1	10.0	7.9	12.3

5 rows × 76 columns

In [259]:

# Row 0 holds the sub-header (전체/남자/여자 = total/male/female), so drop it
# and index the table by country.
work_df = df.reset_index(drop=True).loc[1:].set_index('국가별')

In [260]:

work_df=work_df.astype(float)

print(work_df.head())
print(work_df.info())

        1995  1995.1  1995.2  1996  1996.1  1996.2  1997  1997.1  1997.2  \
국가별                                                                        
대한민국     6.0     6.9     5.2   5.9     7.2     4.6   7.3     8.3     6.3   
아프가니스탄  18.3    17.3    23.0  18.3    17.5    22.8  18.3    17.3    23.0   
방글라데시    6.3     6.5     5.4   6.5     6.7     5.5   7.2     7.4     6.5   
부탄       4.8     5.1     4.3   4.8     5.1     4.3   4.7     5.2     4.3   
캄보디아     1.4     0.7     2.1   1.4     0.7     2.1   1.5     0.9     2.1   

        1998  ...  2016.2  2017  2017.1  2017.2  2018  2018.1  2018.2  2019  \
국가별           ...                                                             
대한민국    15.5  ...    10.1   9.8    10.0     9.5  10.1    10.3    10.0   9.8   
아프가니스탄  18.3  ...    21.7  17.5    16.3    21.3  17.3    16.1    21.1  17.2   
방글라데시    8.0  ...    14.3  12.3    10.4    16.7  12.2    10.2    16.6  12.1   
부탄       4.7  ...    11.9   9.9     7.8    12.0   9.9     7.8    12.1  10.0   
캄보디아     1.6  ...     1.2   0.4     0.3     0.5   0.4     0.2     0.5   0.4   

        2019.1  2019.2  
국가별                     
대한민국      10.1     9.6  
아프가니스탄    16.0    20.9  
방글라데시     10.2    16.6  
부탄         7.9    12.3  
캄보디아       0.2     0.5  

[5 rows x 75 columns]
<class 'pandas.core.frame.DataFrame'>
Index: 135 entries, 대한민국 to 파푸아뉴기니
Data columns (total 75 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   1995    135 non-null    float64
 1   1995.1  135 non-null    float64
 2   1995.2  135 non-null    float64
 3   1996    135 non-null    float64
 4   1996.1  135 non-null    float64
 5   1996.2  135 non-null    float64
 6   1997    135 non-null    float64
 7   1997.1  135 non-null    float64
 8   1997.2  135 non-null    float64
 9   1998    135 non-null    float64
 10  1998.1  135 non-null    float64
 11  1998.2  135 non-null    float64
 12  1999    135 non-null    float64
 13  1999.1  135 non-null    float64
 14  1999.2  135 non-null    float64
 15  2000    135 non-null    float64
 16  2000.1  135 non-null    float64
 17  2000.2  135 non-null    float64
 18  2001    135 non-null    float64
 19  2001.1  135 non-null    float64
 20  2001.2  135 non-null    float64
 21  2002    135 non-null    float64
 22  2002.1  135 non-null    float64
 23  2002.2  135 non-null    float64
 24  2003    135 non-null    float64
 25  2003.1  135 non-null    float64
 26  2003.2  135 non-null    float64
 27  2004    135 non-null    float64
 28  2004.1  135 non-null    float64
 29  2004.2  135 non-null    float64
 30  2005    135 non-null    float64
 31  2005.1  135 non-null    float64
 32  2005.2  135 non-null    float64
 33  2006    135 non-null    float64
 34  2006.1  135 non-null    float64
 35  2006.2  135 non-null    float64
 36  2007    135 non-null    float64
 37  2007.1  135 non-null    float64
 38  2007.2  135 non-null    float64
 39  2008    135 non-null    float64
 40  2008.1  135 non-null    float64
 41  2008.2  135 non-null    float64
 42  2009    135 non-null    float64
 43  2009.1  135 non-null    float64
 44  2009.2  135 non-null    float64
 45  2010    135 non-null    float64
 46  2010.1  135 non-null    float64
 47  2010.2  135 non-null    float64
 48  2011    135 non-null    float64
 49  2011.1  135 non-null    float64
 50  2011.2  135 non-null    float64
 51  2012    135 non-null    float64
 52  2012.1  135 non-null    float64
 53  2012.2  135 non-null    float64
 54  2013    135 non-null    float64
 55  2013.1  135 non-null    float64
 56  2013.2  135 non-null    float64
 57  2014    135 non-null    float64
 58  2014.1  135 non-null    float64
 59  2014.2  135 non-null    float64
 60  2015    135 non-null    float64
 61  2015.1  135 non-null    float64
 62  2015.2  135 non-null    float64
 63  2016    135 non-null    float64
 64  2016.1  135 non-null    float64
 65  2016.2  135 non-null    float64
 66  2017    135 non-null    float64
 67  2017.1  135 non-null    float64
 68  2017.2  135 non-null    float64
 69  2018    135 non-null    float64
 70  2018.1  135 non-null    float64
 71  2018.2  135 non-null    float64
 72  2019    135 non-null    float64
 73  2019.1  135 non-null    float64
 74  2019.2  135 non-null    float64
dtypes: float64(75)
memory usage: 80.2+ KB
None

In [261]:

work_df.head()

Out[261]:

	1995	1995.1	1995.2	1996	1996.1	1996.2	1997	1997.1	1997.2	1998	...	2016.2	2017	2017.1	2017.2	2018	2018.1	2018.2	2019	2019.1	2019.2
국가별
대한민국	6.0	6.9	5.2	5.9	7.2	4.6	7.3	8.3	6.3	15.5	...	10.1	9.8	10.0	9.5	10.1	10.3	10.0	9.8	10.1	9.6
아프가니스탄	18.3	17.3	23.0	18.3	17.5	22.8	18.3	17.3	23.0	18.3	...	21.7	17.5	16.3	21.3	17.3	16.1	21.1	17.2	16.0	20.9
방글라데시	6.3	6.5	5.4	6.5	6.7	5.5	7.2	7.4	6.5	8.0	...	14.3	12.3	10.4	16.7	12.2	10.2	16.6	12.1	10.2	16.6
부탄	4.8	5.1	4.3	4.8	5.1	4.3	4.7	5.2	4.3	4.7	...	11.9	9.9	7.8	12.0	9.9	7.8	12.1	10.0	7.9	12.3
캄보디아	1.4	0.7	2.1	1.4	0.7	2.1	1.5	0.9	2.1	1.6	...	1.2	0.4	0.3	0.5	0.4	0.2	0.5	0.4	0.2	0.5

5 rows × 75 columns

In [262]:

work_df=work_df.astype(float)
work_df.max(axis='columns')

Out[262]:

국가별
대한민국       18.9
아프가니스탄     23.1
방글라데시      16.7
부탄         14.2
캄보디아        2.3
           ... 
우간다         7.2
짐바브웨       17.5
오스트레일리아    16.9
뉴질랜드       18.4
파푸아뉴기니      6.2
Length: 135, dtype: float64
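`max(axis='columns')` reports each country's peak but not when it happened. A minimal sketch, assuming a frame with one column per year and one row per country (as `year_df` is below): `idxmax(axis='columns')` returns the column label of each row's maximum. `peak_years` is a hypothetical helper, and the sample values are the 1995-2004 대한민국 totals from the `year_df` output.

```python
import pandas as pd

def peak_years(frame: pd.DataFrame) -> pd.DataFrame:
    """For each row (country), return the peak value and the column label where it occurs."""
    return pd.DataFrame({
        'peak': frame.max(axis='columns'),
        'peak_column': frame.idxmax(axis='columns'),
    })

# Excerpt: the 대한민국 totals for 1995-2004 as printed in the year_df output
sample = pd.DataFrame(
    [[6.0, 5.9, 7.3, 15.5, 13.7, 10.5, 10.0, 8.4, 9.9, 10.3]],
    index=pd.Index(['대한민국'], name='국가별'),
    columns=[str(y) for y in range(1995, 2005)],
)
print(peak_years(sample))
```

On `work_df` itself the returned label may end in `.1` or `.2`, meaning the peak was in a male or female column rather than the total.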

In [263]:

work_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 135 entries, 대한민국 to 파푸아뉴기니
Data columns (total 75 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   1995    135 non-null    float64
 1   1995.1  135 non-null    float64
 2   1995.2  135 non-null    float64
 3   1996    135 non-null    float64
 4   1996.1  135 non-null    float64
 5   1996.2  135 non-null    float64
 6   1997    135 non-null    float64
 7   1997.1  135 non-null    float64
 8   1997.2  135 non-null    float64
 9   1998    135 non-null    float64
 10  1998.1  135 non-null    float64
 11  1998.2  135 non-null    float64
 12  1999    135 non-null    float64
 13  1999.1  135 non-null    float64
 14  1999.2  135 non-null    float64
 15  2000    135 non-null    float64
 16  2000.1  135 non-null    float64
 17  2000.2  135 non-null    float64
 18  2001    135 non-null    float64
 19  2001.1  135 non-null    float64
 20  2001.2  135 non-null    float64
 21  2002    135 non-null    float64
 22  2002.1  135 non-null    float64
 23  2002.2  135 non-null    float64
 24  2003    135 non-null    float64
 25  2003.1  135 non-null    float64
 26  2003.2  135 non-null    float64
 27  2004    135 non-null    float64
 28  2004.1  135 non-null    float64
 29  2004.2  135 non-null    float64
 30  2005    135 non-null    float64
 31  2005.1  135 non-null    float64
 32  2005.2  135 non-null    float64
 33  2006    135 non-null    float64
 34  2006.1  135 non-null    float64
 35  2006.2  135 non-null    float64
 36  2007    135 non-null    float64
 37  2007.1  135 non-null    float64
 38  2007.2  135 non-null    float64
 39  2008    135 non-null    float64
 40  2008.1  135 non-null    float64
 41  2008.2  135 non-null    float64
 42  2009    135 non-null    float64
 43  2009.1  135 non-null    float64
 44  2009.2  135 non-null    float64
 45  2010    135 non-null    float64
 46  2010.1  135 non-null    float64
 47  2010.2  135 non-null    float64
 48  2011    135 non-null    float64
 49  2011.1  135 non-null    float64
 50  2011.2  135 non-null    float64
 51  2012    135 non-null    float64
 52  2012.1  135 non-null    float64
 53  2012.2  135 non-null    float64
 54  2013    135 non-null    float64
 55  2013.1  135 non-null    float64
 56  2013.2  135 non-null    float64
 57  2014    135 non-null    float64
 58  2014.1  135 non-null    float64
 59  2014.2  135 non-null    float64
 60  2015    135 non-null    float64
 61  2015.1  135 non-null    float64
 62  2015.2  135 non-null    float64
 63  2016    135 non-null    float64
 64  2016.1  135 non-null    float64
 65  2016.2  135 non-null    float64
 66  2017    135 non-null    float64
 67  2017.1  135 non-null    float64
 68  2017.2  135 non-null    float64
 69  2018    135 non-null    float64
 70  2018.1  135 non-null    float64
 71  2018.2  135 non-null    float64
 72  2019    135 non-null    float64
 73  2019.1  135 non-null    float64
 74  2019.2  135 non-null    float64
dtypes: float64(75)
memory usage: 84.2+ KB

In [264]:

year_df = work_df.iloc[:, range(0, 74, 3)]  # every third column (1995, 1996, ...) is the total (전체) rate
year_df.head()

Out[264]:

	1995	1996	1997	1998	1999	2000	2001	2002	2003	2004	...	2010	2011	2012	2013	2014	2015	2016	2017	2018	2019
국가별
대한민국	6.0	5.9	7.3	15.5	13.7	10.5	10.0	8.4	9.9	10.3	...	9.6	9.6	8.9	9.2	9.9	9.9	10.1	9.8	10.1	9.8
아프가니스탄	18.3	18.3	18.3	18.3	18.3	18.2	18.1	18.2	18.2	18.1	...	18.2	18.1	18.1	18.1	17.9	17.8	17.7	17.5	17.3	17.2
방글라데시	6.3	6.5	7.2	8.0	8.9	9.8	9.3	8.2	6.7	7.8	...	6.4	7.5	8.7	10.1	10.4	10.8	11.1	12.3	12.2	12.1
부탄	4.8	4.8	4.7	4.7	4.7	5.4	6.1	6.0	5.8	7.7	...	8.5	8.8	6.8	9.0	8.8	9.7	9.8	9.9	9.9	10.0
캄보디아	1.4	1.4	1.5	1.6	1.7	1.8	1.8	1.9	2.0	2.1	...	1.0	0.6	0.7	0.8	1.1	0.7	1.1	0.4	0.4	0.4

5 rows × 25 columns
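Selecting every third column by position only works while the columns stay in strict total/male/female order. A sketch of a name-based alternative, assuming the mangled labels seen above (`1995`, `1995.1`, `1995.2`, ...): `DataFrame.filter(regex=...)` keeps only the labels that match. The sample row is the first three years of 대한민국 from the `work_df` output.

```python
import pandas as pd

# Three years of the 대한민국 row, copied from the work_df output above
sample = pd.DataFrame(
    [[6.0, 6.9, 5.2, 5.9, 7.2, 4.6, 7.3, 8.3, 6.3]],
    index=['대한민국'],
    columns=['1995', '1995.1', '1995.2', '1996', '1996.1', '1996.2',
             '1997', '1997.1', '1997.2'],
)

totals = sample.filter(regex=r'^\d{4}$')      # bare year: 전체 (total)
male = sample.filter(regex=r'^\d{4}\.1$')     # .1 suffix: 남자 (male)
female = sample.filter(regex=r'^\d{4}\.2$')   # .2 suffix: 여자 (female)
print(totals)
```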

In [265]:

kor=year_df.loc[['대한민국','일본','중국']]

In [268]:

year=list(year_df.columns)

In [293]:

plt.figure(figsize = (10, 5), dpi = 300)
plt.rc('font', family='Malgun Gothic')
plt.title('한 중 일 실업률')
plt.plot(year, year_df.loc['대한민국'], label = '대한민국')
plt.plot(year, year_df.loc['일본'], label = '일본')
plt.plot(year, year_df.loc['중국'], label = '중국')
plt.legend()
plt.xticks(rotation=50)
plt.savefig('한중일실업률.png', facecolor='#eeeeee')
plt.show()

In [ ]:

work_df.info()

In [ ]:

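# Korea is the first row; columns repeat as (total, male, female) for each year
kor_df = work_df.iloc[[0]]
men = kor_df.iloc[:, range(1, 75, 3)]    # 1995.1, 1996.1, ... (남자, male)
women = kor_df.iloc[:, range(2, 75, 3)]  # 1995.2, 1996.2, ... (여자, female)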

print(len(men.columns))

print(len(women.columns))
print(women.columns)
print(men.columns)
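Because `men` and `women` keep their mangled labels (`1995.1` vs `1995.2`), arithmetic between them aligns on labels that never match. A sketch, assuming you want the male-minus-female gap per year: strip the suffix first so both frames share year labels. `strip_suffix` is a hypothetical helper; the numbers are the 1995-1997 대한민국 values from the output above.

```python
import pandas as pd

def strip_suffix(frame: pd.DataFrame) -> pd.DataFrame:
    """Rename '1995.1' / '1995.2' style labels to the bare year '1995'."""
    return frame.rename(columns=lambda c: str(c).split('.')[0])

# 대한민국 male and female rates for 1995-1997, from the work_df output
men_rates = pd.DataFrame([[6.9, 7.2, 8.3]], index=['대한민국'],
                         columns=['1995.1', '1996.1', '1997.1'])
women_rates = pd.DataFrame([[5.2, 4.6, 6.3]], index=['대한민국'],
                           columns=['1995.2', '1996.2', '1997.2'])

naive = men_rates - women_rates  # labels never match, so every cell is NaN
gap = strip_suffix(men_rates) - strip_suffix(women_rates)  # male minus female, aligned by year
print(gap.round(1))
```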

In [ ]:

plt.rcParams['axes.unicode_minus'] = False
plt.rc('font', family='Malgun Gothic')
plt.figure(figsize=(10, 5), dpi=300)
plt.barh(year, men.iloc[0], label='men')
plt.barh(year, -women.iloc[0], label='women')  # women drawn to the left of zero
plt.xticks(range(-20, 21, 5), rotation=50)     # include the +20 tick
plt.legend()  # show the labels given above
plt.show()

In [ ]:

print(year_df.loc['대한민국'])
print(year_df.loc['미국'])

In [ ]:

plt.rcParams['axes.unicode_minus'] = False
plt.rc('font', family='Malgun Gothic')
plt.figure(figsize=(10, 5), dpi=300)
plt.barh(year, year_df.loc['대한민국'], label='kor')
plt.barh(year, -year_df.loc['미국'], label='usa')  # USA drawn to the left of zero
plt.legend()  # show the labels given above
plt.xticks(range(-20, 21, 5), rotation=50)
plt.show()
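In the back-to-back bar charts above, the left-hand bars use negative widths, so their tick labels read -5, -10, and so on. A minimal sketch, assuming you want positive labels on both sides: a matplotlib `FuncFormatter` prints absolute values. `butterfly` is a hypothetical helper, not part of the notebook; the sample values are the 1995-1997 대한민국 male/female rates.

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

def butterfly(labels, left, right, left_label, right_label, path):
    """Back-to-back horizontal bars: `left` values are drawn to the left of zero.

    Tick labels show absolute values so the left side does not read as negative.
    """
    fig, ax = plt.subplots(figsize=(10, 5))
    ax.barh(labels, [-v for v in left], label=left_label)
    ax.barh(labels, right, label=right_label)
    ax.xaxis.set_major_formatter(FuncFormatter(lambda x, _pos: f'{abs(x):g}'))
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
    return ax

ax = butterfly(['1995', '1996', '1997'], [5.2, 4.6, 6.3], [6.9, 7.2, 8.3],
               'women', 'men', 'butterfly.png')
```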

Employment rate of young people in their 20s

Employment in the 30s: plot the totals and the difference between men and women

In [25]:

df1 = pd.read_csv('고용분기별.csv', encoding = 'cp949')
df1.head(100)

Out[25]:

	시점	연령계층별	계	남자	여자
0	1963	계	7563	4930	2633
1	1963	20 - 29세	2010	1281	729
2	1963	30 - 39세	1878	1269	609
3	1963	40 - 49세	1528	995	533
4	1963	50 - 59세	906	615	291
...	...	...	...	...	...
95	1978	60세이상	663	426	237
96	1979	계	13602	8383	5219
97	1979	20 - 29세	3360	2029	1331
98	1979	30 - 39세	3494	2380	1114
99	1979	40 - 49세	3117	1936	1181

100 rows × 5 columns

Preprocessing the employed population in their 20s: total, male, and female

In [72]:

df20 = df1[df1['연령계층별'].str.contains('20')]  # keep only the 20s age group (20 - 29세)
df20.head()

Out[72]:

	시점	연령계층별	계	남자	여자
1	1963	20 - 29세	2010	1281	729
7	1964	20 - 29세	2010	1300	710
13	1965	20 - 29세	2051	1328	723
19	1966	20 - 29세	2047	1294	753
25	1967	20 - 29세	2092	1327	765

Keep only data from 1998 onward

Find when the numbers of employed men and women cross over (women first outnumber men in 2002)

In [87]:

df2010 = df20.loc[df20['시점'] >= 1998]  # rows from 1998 onward
df2010

Out[87]:

	시점	연령계층별	계	남자	여자
211	1998	20 - 29세	4401	2327	2075
217	1999	20 - 29세	4340	2260	2080
223	2000	20 - 29세	4493	2307	2186
229	2001	20 - 29세	4472	2255	2217
235	2002	20 - 29세	4510	2247	2263
241	2003	20 - 29세	4374	2161	2212
247	2004	20 - 29세	4352	2115	2237
253	2005	20 - 29세	4244	2027	2218
259	2006	20 - 29세	4126	1976	2150
265	2007	20 - 29세	4062	1944	2118
271	2008	20 - 29세	3950	1882	2068
277	2009	20 - 29세	3813	1826	1988
283	2010	20 - 29세	3724	1782	1942
289	2011	20 - 29세	3679	1774	1905
295	2012	20 - 29세	3605	1729	1875
301	2013	20 - 29세	3504	1681	1823
307	2014	20 - 29세	3556	1701	1855
313	2015	20 - 29세	3619	1739	1881
319	2016	20 - 29세	3664	1760	1904
325	2017	20 - 29세	3660	1748	1912
331	2018	20 - 29세	3699	1770	1929
337	2019	20 - 29세	3747	1830	1917
343	2020	20 - 29세	3601	1754	1847
349	2021	20 - 29세	3706	1772	1934
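The crossover can also be found programmatically instead of by reading the table. A minimal sketch, assuming a frame with a year column and two count columns as in `df2010`: a crossover year is one where column `a` exceeds `b` while in the previous row it did not. `crossover_years` is a hypothetical helper; the sample rows are 1998-2002 from the output above.

```python
import pandas as pd

def crossover_years(df, year_col, a, b):
    """Years in which column `a` goes from <= `b` (previous row) to > `b`."""
    df = df.sort_values(year_col)
    ahead = df[a] > df[b]
    prev = ahead.shift(1, fill_value=True).astype(bool)  # the first row cannot be a crossover
    return df.loc[ahead & ~prev, year_col].tolist()

# 20 - 29세 employed persons (thousands), 1998-2002, from the df2010 output above
sample = pd.DataFrame({
    '시점': [1998, 1999, 2000, 2001, 2002],
    '남자': [2327, 2260, 2307, 2255, 2247],
    '여자': [2075, 2080, 2186, 2217, 2263],
})
print(crossover_years(sample, '시점', a='여자', b='남자'))
```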

Visualizing the employed population in their 20s

In [255]:

plt.rc('font', family='Malgun Gothic')
plt.figure(figsize=(10, 5), dpi=300)
plt.title('20대 취업인구')
plt.plot(df20['시점'], df20['계'], label='20대 취업인구')
#plt.plot(df20['시점'], df20['남자'], label='남자 취업인구')
#plt.plot(df20['시점'], df20['여자'], label='여자 취업인구')
plt.legend()
plt.xticks(rotation=50)
plt.tight_layout()
plt.savefig('20대 취업인구211.png', facecolor='#eeeeee')
plt.show()

In [292]:

plt.rc('font', family='Malgun Gothic')
plt.figure(figsize=(10, 8), dpi=300)
# suptitle gives the whole figure a title; plt.title/plt.xlabel called before
# the first subplot do not land on any of the three panels
plt.suptitle('년도별 20대 취업자수')

plt.subplot(3, 1, 1)
plt.plot(df20['시점'], df20['계'], label='20대 취업인구')
plt.xlabel('년도별 20대 취업자수')
plt.legend()

plt.subplot(3, 1, 2)
plt.plot(df20['시점'], df20['남자'], label='남자 취업인구')
plt.plot(df20['시점'], df20['여자'], label='여자 취업인구')
plt.xlabel('년도별 성별 20대 취업자수')
plt.legend()

plt.subplot(3, 1, 3)
plt.plot(df2010['시점'], df2010['남자'], label='남자 취업인구')
plt.plot(df2010['시점'], df2010['여자'], label='여자 취업인구')
plt.xlabel('년도별 성별 20대 변화')
plt.legend()

plt.subplots_adjust(left=0.125, bottom=0.1, right=0.9, top=0.9, wspace=0.2, hspace=0.35)
plt.savefig('서브플랏_남녀취업인구.png', facecolor='#eeeeee')
plt.show()

Preprocessing the Korean-national (내국인) population in their 20s

Drop the unneeded total-population columns, which include foreign residents

In [162]:

df = pd.read_csv('20대인구수.csv', encoding='cp949')
print(df.head(10))

# Drop columns 3-6: the 총인구 (total population, including foreigners) columns
df = df.drop(columns=df.columns[3:7])

  행정구역별(읍면동)    시점     연령별      총인구(명)   총인구_남자(명)   총인구_여자(명) 총인구_성비  \
0         전국  2015      합계  51069375.0  25608502.0  25460873.0  100.6   
1         전국  2015    0~4세   2258670.0   1159011.0   1099659.0  105.4   
2         전국  2015    5~9세   2267851.0   1169770.0   1098081.0  106.5   
3         전국  2015  10~14세   2427792.0   1262770.0   1165022.0  108.4   
4         전국  2015  15~19세   3194079.0   1668683.0   1525396.0  109.4   
5         전국  2015  20~24세   3531108.0   1887776.0   1643332.0  114.9   
6         전국  2015  25~29세   3265288.0   1728888.0   1536400.0  112.5   
7         전국  2015  30~34세   3811610.0   1986796.0   1824814.0  108.9   
8         전국  2015  35~39세   3926862.0   2022466.0   1904396.0  106.2   
9         전국  2015  40~44세   4338827.0   2218442.0   2120385.0  104.6   

       내국인(명)   내국인_남자(명)   내국인_여자(명) 내국인_성비  
0  49705663.0  24819839.0  24885824.0   99.7  
1   2235397.0   1147126.0   1088271.0  105.4  
2   2252950.0   1162087.0   1090863.0  106.5  
3   2418360.0   1257902.0   1160458.0  108.4  
4   3170545.0   1657722.0   1512823.0  109.6  
5   3385936.0   1808857.0   1577079.0  114.7  
6   3027896.0   1581887.0   1446009.0  109.4  
7   3611034.0   1854905.0   1756129.0  105.6  
8   3783589.0   1927388.0   1856201.0  103.8  
9   4215921.0   2142101.0   2073820.0  103.3
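Dropping by position (`3:7`) removes the wrong columns if the export's column order ever changes. A sketch, assuming the 총인구 (total population including foreigners) columns all share that prefix, as they do in the header printed above: select them by name instead. The single sample row is the 2015 20~24세 row from that output.

```python
import pandas as pd

# Column layout and one row copied from the printed head of 20대인구수.csv
sample = pd.DataFrame([[
    '전국', 2015, '20~24세', 3531108.0, 1887776.0, 1643332.0, 114.9,
    3385936.0, 1808857.0, 1577079.0, 114.7,
]], columns=[
    '행정구역별(읍면동)', '시점', '연령별', '총인구(명)', '총인구_남자(명)',
    '총인구_여자(명)', '총인구_성비', '내국인(명)', '내국인_남자(명)',
    '내국인_여자(명)', '내국인_성비',
])

to_drop = [c for c in sample.columns if c.startswith('총인구')]
trimmed = sample.drop(columns=to_drop)
print(list(trimmed.columns))
```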

Drop the administrative-region (행정구역별) column

In [163]:

df.drop(['행정구역별(읍면동)'], axis=1, inplace=True)
print(df)

       시점     연령별      내국인(명)   내국인_남자(명)   내국인_여자(명) 내국인_성비
0    2015      합계  49705663.0  24819839.0  24885824.0   99.7
1    2015    0~4세   2235397.0   1147126.0   1088271.0  105.4
2    2015    5~9세   2252950.0   1162087.0   1090863.0  106.5
3    2015  10~14세   2418360.0   1257902.0   1160458.0  108.4
4    2015  15~19세   3170545.0   1657722.0   1512823.0  109.6
..    ...     ...         ...         ...         ...    ...
191  2021  15~64세  35464929.0  18109395.0  17355534.0  104.3
192  2021   65세이상   8619927.0   3751312.0   4868615.0   77.1
193  2021   85세이상    874045.0    245130.0    628915.0   39.0
194  2021    평균연령        43.6        42.5        44.8      -
195  2021    중위연령        44.9        43.6        46.3      -

[196 rows x 6 columns]

In [164]:

df.head()

Out[164]:

	시점	연령별	내국인(명)	내국인_남자(명)	내국인_여자(명)	내국인_성비
0	2015	합계	49705663.0	24819839.0	24885824.0	99.7
1	2015	0~4세	2235397.0	1147126.0	1088271.0	105.4
2	2015	5~9세	2252950.0	1162087.0	1090863.0	106.5
3	2015	10~14세	2418360.0	1257902.0	1160458.0	108.4
4	2015	15~19세	3170545.0	1657722.0	1512823.0	109.6

Keep only people in their 20s

In [180]:

# keep only the 20~24세 and 25~29세 age bands
df = df[df['연령별'].isin(['20~24세', '25~29세'])]
df.head()

Out[180]:

	시점	연령별	내국인(명)	내국인_남자(명)	내국인_여자(명)	내국인_성비
5	2015	20~24세	3385936.0	1808857.0	1577079.0	114.7
6	2015	25~29세	3027896.0	1581887.0	1446009.0	109.4
33	2016	20~24세	3400634.0	1816165.0	1584469.0	114.6
34	2016	25~29세	3068970.0	1607473.0	1461497.0	110.0
61	2017	20~24세	3355986.0	1786645.0	1569341.0	113.8

Sum the 20~24 and 25~29 rows for each year

In [178]:

grop_year = df.groupby('시점')[['내국인(명)', '내국인_남자(명)', '내국인_여자(명)']].sum()
grop_year

Out[178]:

	내국인(명)	내국인_남자(명)	내국인_여자(명)
시점
2015	6413832.0	3390744.0	3023088.0
2016	6469604.0	3423638.0	3045966.0
2017	6520025.0	3451738.0	3068287.0
2018	6549517.0	3466761.0	3082756.0
2019	6547164.0	3462128.0	3085036.0
2020	6616547.0	3484532.0	3132015.0
2021	6505610.0	3425581.0	3080029.0
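The filter, group, and reset steps can be chained into one function. A minimal sketch, assuming the column names above: `as_index=False` keeps 시점 as a column (replacing the separate `reset_index` step), and selecting only the count columns keeps recent pandas versions from trying to sum the text columns (연령별, 내국인_성비). `twenties_by_year` is a hypothetical helper; the sample rows are copied from the outputs above.

```python
import pandas as pd

COUNT_COLS = ['내국인(명)', '내국인_남자(명)', '내국인_여자(명)']

def twenties_by_year(df: pd.DataFrame) -> pd.DataFrame:
    """Sum the 20~24세 and 25~29세 bands per year, keeping only the count columns."""
    return (
        df[df['연령별'].isin(['20~24세', '25~29세'])]
        .groupby('시점', as_index=False)[COUNT_COLS]
        .sum()
    )

# 2015-2016 rows copied from the outputs above, plus one non-20s band to be filtered out
sample = pd.DataFrame({
    '시점': [2015, 2015, 2015, 2016, 2016],
    '연령별': ['20~24세', '25~29세', '30~34세', '20~24세', '25~29세'],
    '내국인(명)': [3385936, 3027896, 3611034, 3400634, 3068970],
    '내국인_남자(명)': [1808857, 1581887, 1854905, 1816165, 1607473],
    '내국인_여자(명)': [1577079, 1446009, 1756129, 1584469, 1461497],
    '내국인_성비': ['114.7', '109.4', '105.6', '114.6', '110.0'],
})
print(twenties_by_year(sample))
```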

In [201]:

grop_year_reset = grop_year.reset_index()
grop_year_reset

Out[201]:

	시점	내국인(명)	내국인_남자(명)	내국인_여자(명)
0	2015	6413832.0	3390744.0	3023088.0
1	2016	6469604.0	3423638.0	3045966.0
2	2017	6520025.0	3451738.0	3068287.0
3	2018	6549517.0	3466761.0	3082756.0
4	2019	6547164.0	3462128.0	3085036.0
5	2020	6616547.0	3484532.0	3132015.0
6	2021	6505610.0	3425581.0	3080029.0

Employed-population data

Drop the unneeded age-group column

In [221]:

# Rows from 2015 onward, without the age-group column. Chaining .drop() on the
# filtered rows returns a new DataFrame, so no SettingWithCopyWarning is raised.
df2015 = df20.loc[df20['시점'] >= 2015].drop(columns=['연령계층별'])
df2015

Out[221]:

	시점	계	남자	여자
313	2015	3619	1739	1881
319	2016	3664	1760	1904
325	2017	3660	1748	1912
331	2018	3699	1770	1929
337	2019	3747	1830	1917
343	2020	3601	1754	1847
349	2021	3706	1772	1934

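df2015 is counted in thousands of people, so multiply it by 1000 to match the population data

1. First, set the 시점 (year) column as the index.
2. Multiply the data by 1000.
3. Reset the index again.
4. Convert the data to be compared to int.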

In [222]:

df2015 = df2015.set_index(keys=['시점'], inplace=False, drop=True)

In [223]:

df2015 = df2015.mul(1000)  # thousands -> persons (시점 is in the index, so it is not scaled)
df2015

Out[223]:

	계	남자	여자
시점
2015	3619000	1739000	1881000
2016	3664000	1760000	1904000
2017	3660000	1748000	1912000
2018	3699000	1770000	1929000
2019	3747000	1830000	1917000
2020	3601000	1754000	1847000
2021	3706000	1772000	1934000

In [224]:

df2015=df2015.reset_index(drop=False)

In [225]:

df2015

Out[225]:

	시점	계	남자	여자
0	2015	3619000	1739000	1881000
1	2016	3664000	1760000	1904000
2	2017	3660000	1748000	1912000
3	2018	3699000	1770000	1929000
4	2019	3747000	1830000	1917000
5	2020	3601000	1754000	1847000
6	2021	3706000	1772000	1934000
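Steps 1-3 above move 시점 into the index only so that the year itself is not multiplied. A sketch of a shorter route, assuming the column names shown above: multiply just the count columns in place, which leaves 시점 untouched and needs no `set_index`/`reset_index`. The sample rows are the first two rows of `df2015` before scaling.

```python
import pandas as pd

# Employed persons in their 20s, in thousands (first two rows of df2015 before scaling)
employed = pd.DataFrame({'시점': [2015, 2016], '계': [3619, 3664],
                         '남자': [1739, 1760], '여자': [1881, 1904]})

count_cols = ['계', '남자', '여자']
employed[count_cols] = employed[count_cols].mul(1000).astype('int64')  # thousands -> persons
print(employed)
```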

In [227]:

grop_year_reset = grop_year_reset.astype(int)
print(df2015)
print(grop_year_reset)

     시점        계       남자       여자
0  2015  3619000  1739000  1881000
1  2016  3664000  1760000  1904000
2  2017  3660000  1748000  1912000
3  2018  3699000  1770000  1929000
4  2019  3747000  1830000  1917000
5  2020  3601000  1754000  1847000
6  2021  3706000  1772000  1934000
     시점   내국인(명)  내국인_남자(명)  내국인_여자(명)
0  2015  6413832    3390744    3023088
1  2016  6469604    3423638    3045966
2  2017  6520025    3451738    3068287
3  2018  6549517    3466761    3082756
4  2019  6547164    3462128    3085036
5  2020  6616547    3484532    3132015
6  2021  6505610    3425581    3080029

Population and employment rate

In [230]:

df_start=pd.merge(df2015, grop_year_reset, how='inner', on='시점')
df_start

Out[230]:

	시점	계	남자	여자	내국인(명)	내국인_남자(명)	내국인_여자(명)
0	2015	3619000	1739000	1881000	6413832	3390744	3023088
1	2016	3664000	1760000	1904000	6469604	3423638	3045966
2	2017	3660000	1748000	1912000	6520025	3451738	3068287
3	2018	3699000	1770000	1929000	6549517	3466761	3082756
4	2019	3747000	1830000	1917000	6547164	3462128	3085036
5	2020	3601000	1754000	1847000	6616547	3484532	3132015
6	2021	3706000	1772000	1934000	6505610	3425581	3080029

In [233]:

df_start['취업율']=df_start['계']/df_start['내국인(명)']*100

In [279]:

df_start['남자취업율']=df_start['남자']/df_start['내국인_남자(명)']*100
df_start['여자취업율']=df_start['여자']/df_start['내국인_여자(명)']*100   
df_start

Out[279]:

	시점	계	남자	여자	내국인(명)	내국인_남자(명)	내국인_여자(명)	취업율	남자취업율	여자취업율
0	2015	3619000	1739000	1881000	6413832	3390744	3023088	56.424927	51.286679	62.221146
1	2016	3664000	1760000	1904000	6469604	3423638	3045966	56.634069	51.407304	62.508905
2	2017	3660000	1748000	1912000	6520025	3451738	3068287	56.134754	50.641155	62.314901
3	2018	3699000	1770000	1929000	6549517	3466761	3082756	56.477447	51.056303	62.573879
4	2019	3747000	1830000	1917000	6547164	3462128	3085036	57.230887	52.857664	62.138659
5	2020	3601000	1754000	1847000	6616547	3484532	3132015	54.424158	50.336745	58.971621
6	2021	3706000	1772000	1934000	6505610	3425581	3080029	56.966218	51.728451	62.791617
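The three rate columns follow one pattern: employed / population × 100. A sketch that builds them in a loop, assuming the column names used in `df_start`; `validate='one_to_one'` makes `pd.merge` raise a `MergeError` if a year appears twice on either side. The inputs are the 2015 rows from the outputs above.

```python
import pandas as pd

# 2015 row of each input, copied from the outputs above
employed = pd.DataFrame({'시점': [2015], '계': [3619000], '남자': [1739000], '여자': [1881000]})
population = pd.DataFrame({'시점': [2015], '내국인(명)': [6413832],
                           '내국인_남자(명)': [3390744], '내국인_여자(명)': [3023088]})

# one_to_one: raise if a 시점 value is duplicated on either side
merged = pd.merge(employed, population, how='inner', on='시점', validate='one_to_one')

# (rate column, numerator, denominator), matching the names used in df_start
rates = [('취업율', '계', '내국인(명)'),
         ('남자취업율', '남자', '내국인_남자(명)'),
         ('여자취업율', '여자', '내국인_여자(명)')]
for name, num, den in rates:
    merged[name] = merged[num] / merged[den] * 100

print(merged[['시점', '취업율', '남자취업율', '여자취업율']].round(2))
```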

In [234]:

df_start

Out[234]:

	시점	계	남자	여자	내국인(명)	내국인_남자(명)	내국인_여자(명)	취업율
0	2015	3619000	1739000	1881000	6413832	3390744	3023088	56.424927
1	2016	3664000	1760000	1904000	6469604	3423638	3045966	56.634069
2	2017	3660000	1748000	1912000	6520025	3451738	3068287	56.134754
3	2018	3699000	1770000	1929000	6549517	3466761	3082756	56.477447
4	2019	3747000	1830000	1917000	6547164	3462128	3085036	57.230887
5	2020	3601000	1754000	1847000	6616547	3484532	3132015	54.424158
6	2021	3706000	1772000	1934000	6505610	3425581	3080029	56.966218

Visualizing the employment rate of people in their 20s

In [291]:

plt.figure(figsize = (10, 5), dpi = 300)
plt.rc('font', family='Malgun Gothic')
plt.title('20대 취업률')
plt.plot(df_start['시점'], df_start['취업율'], label = '20대 취업률')
plt.plot(df_start['시점'], df_start['남자취업율'], label='남자')
plt.plot(df_start['시점'], df_start['여자취업율'], label = '여자')
plt.legend()
plt.xticks(rotation=50)
plt.ylim([30,70])
plt.savefig('20대 취업률_남녀.png', facecolor='#eeeeee')
plt.show()

In [294]:

df = pd.read_csv('고용분기별성별.csv', encoding = 'cp949')
df.head(100)

Out[294]:

	시점	항목	계	남자	여자
0	1999.3/4	15세이상인구 (천명)	35814.0	17335.0	18479.0
1	1999.3/4	경제활동인구 (천명)	21934.0	13021.0	8912.0
2	1999.3/4	취업자 (천명)	20542.0	12095.0	8446.0
3	1999.3/4	실업자 (천명)	1392.0	926.0	466.0
4	1999.3/4	비경제활동인구 (천명)	13881.0	4314.0	9567.0
...	...	...	...	...	...
95	2002.2/4	고용률 (%)	60.1	72.4	48.6
96	2002.3/4	15세이상인구 (천명)	37064.0	17978.0	19086.0
97	2002.3/4	경제활동인구 (천명)	23034.0	13523.0	9511.0
98	2002.3/4	취업자 (천명)	22311.0	13049.0	9262.0
99	2002.3/4	실업자 (천명)	723.0	474.0	249.0

100 rows × 5 columns

GDP growth rates (%) of major countries

In [374]:

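df = pd.read_csv('GDP2.csv', encoding='cp949')
# Drop columns that contain NaN. Missing years in this file are written as '-',
# which pandas reads as text rather than NaN, so those columns are kept.
df = df.dropna(axis=1)
df.head()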

Out[374]:

	시점	대한민국	아프가니스탄	아르메니아	아제르바이잔	바레인	방글라데시	부탄	브루나이	캄보디아	...	뉴칼레도니아	뉴질랜드	북마리아나제도	팔라우	파푸아뉴기니	사모아	솔로몬제도	통가	투발루	바누아투
0	1995	9.6	-	6.9	-11.8	3.9	5.1	7.1	4.5	9.9	...	5.9	4.7	-	-	-3.3	6.7	10.1	7.4	-5.0	1.0
1	1996	7.9	-	5.9	1.3	4.1	4.5	5.6	2.9	5.9	...	0.4	3.6	-	-	7.7	7.2	1.6	1.8	-6.0	2.3
2	1997	6.2	-	3.3	5.8	3.1	4.5	5.4	-1.5	4.0	...	2.0	2.0	-	-	-3.9	0.6	-0.9	1.2	10.0	4.9
3	1998	-5.1	-	7.3	10.0	4.8	5.2	5.9	-0.6	4.7	...	-3.2	0.8	-	-	-3.8	2.2	1.3	2.5	15.5	1.2
4	1999	11.5	-	3.3	7.4	4.3	4.7	8.0	3.1	12.7	...	0.9	5.5	-	-	1.9	2.2	-0.5	3.7	-1.6	0.3

5 rows × 207 columns
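Because missing years are stored as `-`, `dropna(axis=1)` above never sees them. A minimal sketch, assuming an excerpt in the layout of GDP2.csv (values copied from the output above): passing `na_values='-'` to `read_csv` turns the placeholder into NaN so that the countries' columns become numeric.

```python
import io
import pandas as pd

# Excerpt in the layout of GDP2.csv: '-' marks a missing year
raw = """시점,대한민국,아프가니스탄,미국
1995,9.6,-,2.7
1996,7.9,-,3.8
"""

as_text = pd.read_csv(io.StringIO(raw))
as_nan = pd.read_csv(io.StringIO(raw), na_values='-')

print(as_text.dtypes)  # 아프가니스탄 is text, so dropna(axis=1) keeps it
print(as_nan.dropna(axis=1).columns.tolist())
```

With `na_values='-'`, `dropna(axis=1)` drops every country with at least one missing year, which differs from the cell above; the seven countries selected next have complete series either way.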

Select and keep only the major countries

In [375]:

df_GDP=df.loc[:,['시점','미국','대한민국','일본','중국','러시아','영국','인도']]
df_GDP

Out[375]:

	시점	미국	대한민국	일본	중국	러시아	영국	인도
0	1995	2.7	9.6	2.6	11.0	-4.1	2.5	7.6
1	1996	3.8	7.9	3.1	9.9	-3.8	2.4	7.5
2	1997	4.4	6.2	1.0	9.2	1.4	4.9	4.0
3	1998	4.5	-5.1	-1.3	7.8	-5.3	3.2	6.2
4	1999	4.8	11.5	-0.3	7.7	6.4	3.0	8.8
5	2000	4.1	9.1	2.8	8.5	10.0	3.7	3.8
6	2001	1.0	4.9	0.4	8.3	5.1	2.1	4.8
7	2002	1.7	7.7	0.0	9.1	4.7	2.1	3.8
8	2003	2.8	3.1	1.5	10.0	7.3	3.0	7.9
9	2004	3.9	5.2	2.2	10.1	7.2	2.4	7.9
10	2005	3.5	4.3	1.8	11.4	6.4	2.6	7.9
11	2006	2.8	5.3	1.4	12.7	8.2	2.6	8.1
12	2007	2.0	5.8	1.5	14.2	8.5	2.3	7.7
13	2008	0.1	3.0	-1.2	9.7	5.2	-0.2	3.1
14	2009	-2.6	0.8	-5.7	9.4	-7.8	-4.2	7.9
15	2010	2.7	6.8	4.1	10.6	4.5	2.1	8.5
16	2011	1.5	3.7	0.0	9.6	4.3	1.5	5.2
17	2012	2.3	2.4	1.4	7.9	4.0	1.5	5.5
18	2013	1.8	3.2	2.0	7.8	1.8	1.9	6.4
19	2014	2.3	3.2	0.3	7.4	0.7	3.0	7.4
20	2015	2.7	2.8	1.6	7.0	-2.0	2.6	8.0
21	2016	1.7	2.9	0.8	6.8	0.2	2.3	8.3
22	2017	2.3	3.2	1.7	6.9	1.8	2.1	6.8
23	2018	2.9	2.9	0.6	6.7	2.8	1.7	6.5
24	2019	2.3	2.2	-0.2	6.0	2.2	1.7	3.7
25	2020	-3.4	-0.7	-4.5	2.2	-2.7	-9.3	-6.6
26	2021	5.7	4.1	1.6	8.1	4.8	7.4	8.9

In [314]:

df_GDP['미국']

Out[314]:

0     2.7
1     3.8
2     4.4
3     4.5
4     4.8
5     4.1
6     1.0
7     1.7
8     2.8
9     3.9
10    3.5
11    2.8
12    2.0
13    0.1
14   -2.6
15    2.7
16    1.5
17    2.3
18    1.8
19    2.3
20    2.7
21    1.7
22    2.3
23    2.9
24    2.3
25   -3.4
26    5.7
Name: 미국, dtype: float64

Visualizing GDP growth of the major countries

In [345]:

plt.rcParams['axes.unicode_minus'] = False  # Korean fonts may lack the Unicode minus glyph used for negative ticks
plt.rc('font', family='Malgun Gothic')
plt.figure(figsize=(10, 5), dpi=300)
plt.title('주요 국가 GDP 변화')

plt.plot(df_GDP['시점'], df_GDP['미국'], label='미국')
plt.plot(df_GDP['시점'], df_GDP['대한민국'], label='대한민국', color='red', marker='o', linestyle='dashed', linewidth=2, markersize=5)
plt.plot(df_GDP['시점'], df_GDP['일본'], label = '일본')
plt.plot(df_GDP['시점'], df_GDP['중국'], label = '중국')
plt.plot(df_GDP['시점'], df_GDP['러시아'], label = '러시아')
plt.plot(df_GDP['시점'], df_GDP['인도'], label = '인도')

plt.legend()
plt.xticks(rotation=50)

plt.savefig('주요국가 GDP그래프.png', facecolor='#eeeeee')
plt.show()

Visualizing the GDP growth gap between Korea, China, and Japan

In [344]:

plt.rcParams['axes.unicode_minus'] = False
plt.rc('font', family='Malgun Gothic')
plt.figure(figsize=(10, 5), dpi=300)
plt.title('한중일 국가 GDP 변화')

plt.plot(df_GDP['시점'], df_GDP['대한민국'], label='대한민국', color='red', marker='o', linestyle='dashed', markersize=5)
plt.plot(df_GDP['시점'], df_GDP['일본'], label = '일본')
plt.plot(df_GDP['시점'], df_GDP['중국'], label = '중국')


plt.legend()
plt.xticks(rotation=50)

plt.savefig('한중일 GDP그래프.png', facecolor='#eeeeee')
plt.show()
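The chart above overlays the raw series; the gap itself can be computed directly. A minimal sketch, assuming the `df_GDP` layout above: subtract Korea's growth from each other country's growth, in percentage points, with `DataFrame.sub(..., axis=0)` so the subtraction aligns on 시점. The sample uses the 1997-1999 rows from the `df_GDP` output.

```python
import pandas as pd

# 1997-1999 growth rates (%) copied from the df_GDP output above
sample = pd.DataFrame({
    '시점': [1997, 1998, 1999],
    '대한민국': [6.2, -5.1, 11.5],
    '일본': [1.0, -1.3, -0.3],
    '중국': [9.2, 7.8, 7.7],
})

growth = sample.set_index('시점')
# percentage-point gap: each country's growth minus Korea's, aligned on 시점
gap_vs_korea = growth[['일본', '중국']].sub(growth['대한민국'], axis=0)
print(gap_vs_korea.round(1))
```

The result can be plotted exactly like the series above, e.g. `plt.plot(gap_vs_korea.index, gap_vs_korea['중국'])`.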

Korea vs. US GDP growth chart

In [379]:

plt.rc('font', family='Malgun Gothic')
plt.figure(figsize=(10, 5), dpi=300)
plt.title('한미 국가 GDP 변화')

plt.plot(df_GDP['시점'], df_GDP['대한민국'], label='대한민국', color='red', marker='o', linestyle='dashed', markersize=5)
plt.plot(df_GDP['시점'], df_GDP['미국'], label = '미국')


plt.legend()
plt.xticks(rotation=50)

plt.savefig('한미 GDP그래프.png', facecolor='#eeeeee')
plt.show()

In [368]:

df10 = pd.read_csv('소비자.csv', encoding = 'cp949')
df10.head(100)

Out[368]:

	항목	시점	아시아	대한민국	아프가니스탄	아르메니아	아제르바이잔	바레인	방글라데시	부탄	...	피지	키리바시	미크로네시아	뉴질랜드	팔라우	파푸아뉴기니	사모아	솔로몬제도	통가	바누아투
0	소비자물가지수	1995	NaN	54.8	-	44.3	42.6	83.8	42.5	56.7	...	60.4	-	-	71.7	-	30.2	49.3	28.2	39.5	67.0
1	소비자물가지수	1996	NaN	57.5	-	52.6	51.0	83.4	43.5	61.6	...	62.3	-	-	73.3	-	33.7	51.9	31.5	40.7	67.7
2	소비자물가지수	1997	NaN	60.1	-	59.9	52.9	85.4	45.9	65.7	...	64.4	-	-	74.2	-	35.1	55.5	34.1	41.6	69.6
3	소비자물가지수	1998	NaN	64.6	-	65.1	52.5	85.1	49.7	72.6	...	68.0	-	-	75.1	-	39.8	56.7	38.3	42.9	71.9
4	소비자물가지수	1999	NaN	65.1	-	65.5	48.0	84.0	52.7	77.5	...	69.4	-	70.9	75.0	-	45.8	56.9	41.3	44.9	73.3
5	소비자물가지수	2000	NaN	66.6	-	65.0	48.9	83.4	53.9	80.6	...	70.1	-	72.4	77.0	-	52.9	57.4	44.6	47.7	75.1
6	소비자물가지수	2001	NaN	69.3	-	67.0	49.6	82.4	55.0	83.4	...	73.1	-	72.8	79.0	74.1	57.8	59.6	47.7	51.7	77.8
7	소비자물가지수	2002	NaN	71.2	-	67.8	51.0	82.0	56.8	85.4	...	73.7	-	72.7	81.1	73.1	64.6	64.4	52.9	57.0	79.4
8	소비자물가지수	2003	NaN	73.7	-	71.0	52.1	83.3	60.0	86.8	...	76.8	-	72.8	82.6	73.8	74.2	64.5	57.3	63.6	81.8
9	소비자물가지수	2004	NaN	76.3	63.5	75.9	55.6	85.3	64.6	71.1	...	78.9	-	74.5	84.4	77.4	75.8	75.0	61.3	70.6	82.9
10	소비자물가지수	2005	NaN	78.4	71.6	76.4	61.0	87.5	69.2	74.8	...	80.8	-	77.6	87.0	80.5	77.1	76.4	65.8	76.7	83.9
11	소비자물가지수	2006	NaN	80.2	76.4	78.6	66.1	89.3	73.8	78.6	...	82.8	80.5	81.2	89.9	84.1	78.9	79.2	73.2	81.5	85.6
12	소비자물가지수	2007	NaN	82.2	83.1	82.1	77.1	92.2	80.6	82.6	...	86.8	83.4	83.9	92.1	86.8	79.7	83.7	78.8	86.2	89.0
13	소비자물가지수	2008	NaN	86.1	105.0	89.4	93.2	95.4	87.7	89.5	...	93.5	94.8	91.0	95.7	97.2	88.2	93.3	92.4	95.2	93.3
14	소비자물가지수	2009	NaN	88.5	97.9	92.4	94.6	98.1	92.5	93.4	...	96.4	104.1	96.8	97.7	98.6	94.3	99.2	99.0	96.6	97.3
15	소비자물가지수	2010	NaN	91.1	100.0	100.0	100.0	100.0	100.0	100.0	...	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
16	소비자물가지수	2011	NaN	94.7	111.8	107.7	107.9	99.6	111.4	108.8	...	107.3	101.5	105.1	104.0	104.7	104.4	105.2	107.3	106.3	100.9
17	소비자물가지수	2012	NaN	96.8	119.0	110.4	109.0	102.3	118.3	120.7	...	111.0	98.4	110.4	105.1	108.5	109.2	107.4	113.7	107.5	102.2
18	소비자물가지수	2013	NaN	98.0	127.8	116.8	111.6	105.7	127.2	129.2	...	114.2	96.9	112.3	106.3	112.1	114.6	108.0	119.8	108.3	103.7
19	소비자물가지수	2014	NaN	99.3	133.8	120.3	113.2	108.5	136.1	139.9	...	114.8	99.0	113.0	107.6	116.8	120.6	107.6	126.0	111.0	104.6
20	소비자물가지수	2015	NaN	100.0	132.9	124.8	117.7	110.5	144.6	146.2	...	116.4	99.5	112.7	107.9	117.9	127.8	108.4	125.3	109.9	107.1
21	소비자물가지수	2016	NaN	101.0	138.7	123.0	132.4	113.6	152.5	151.0	...	120.8	101.5	111.5	108.6	116.7	136.3	109.8	125.9	112.7	108.1
22	소비자물가지수	2017	NaN	102.9	145.6	124.2	149.5	115.2	161.2	158.4	...	124.9	101.8	112.3	110.7	118.2	143.7	111.7	126.5	121.1	111.4
23	소비자물가지수	2018	NaN	104.5	146.5	127.3	152.9	117.6	170.2	162.7	...	130.0	102.4	114.3	112.4	-	150.0	116.4	130.9	-	114.0
24	소비자물가지수	2019	NaN	104.9	149.9	129.2	156.9	118.8	179.7	167.2	...	132.3	100.5	-	114.2	-	155.9	117.6	133.1	-	117.1
25	소비자물가지수	2020	NaN	105.4	-	130.7	161.2	116.0	189.9	176.6	...	128.9	-	-	116.2	-	163.5	115.7	137.0	-	-

26 rows × 193 columns

In [369]:

df10=df10.iloc[:,1:]
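`iloc[:, 1:]` keeps every row and every column from position 1 onward. That drops the first column (the one holding the 소비자물가지수 label) without needing to know its name. Below is a minimal sketch on a small toy frame. The label `지표` is hypothetical and stands in for whatever the real first column is called. `drop(columns=...)` is shown as the label-based equivalent.

```python
import pandas as pd

# Hypothetical toy frame shaped like df10 before the slice:
# an indicator-name column, the year, then one country column
toy = pd.DataFrame({
    '지표': ['소비자물가지수', '소비자물가지수'],
    '시점': [1995, 1996],
    '대한민국': [54.8, 57.5],
})

# Positional: all rows, columns from index 1 onward
by_position = toy.iloc[:, 1:]

# Label-based equivalent; still correct if the column order changes
by_label = toy.drop(columns='지표')

print(by_position.columns.tolist())  # ['시점', '대한민국']
print(by_position.equals(by_label))  # True
```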

In [371]:

df10

Out[371]:

	시점	아시아	대한민국	아프가니스탄	아르메니아	아제르바이잔	바레인	방글라데시	부탄	브루나이	...	피지	키리바시	미크로네시아	뉴질랜드	팔라우	파푸아뉴기니	사모아	솔로몬제도	통가	바누아투
0	1995	NaN	54.8	-	44.3	42.6	83.8	42.5	56.7	90.9	...	60.4	-	-	71.7	-	30.2	49.3	28.2	39.5	67.0
1	1996	NaN	57.5	-	52.6	51.0	83.4	43.5	61.6	92.7	...	62.3	-	-	73.3	-	33.7	51.9	31.5	40.7	67.7
2	1997	NaN	60.1	-	59.9	52.9	85.4	45.9	65.7	94.3	...	64.4	-	-	74.2	-	35.1	55.5	34.1	41.6	69.6
3	1998	NaN	64.6	-	65.1	52.5	85.1	49.7	72.6	93.9	...	68.0	-	-	75.1	-	39.8	56.7	38.3	42.9	71.9
4	1999	NaN	65.1	-	65.5	48.0	84.0	52.7	77.5	93.5	...	69.4	-	70.9	75.0	-	45.8	56.9	41.3	44.9	73.3
5	2000	NaN	66.6	-	65.0	48.9	83.4	53.9	80.6	95.0	...	70.1	-	72.4	77.0	-	52.9	57.4	44.6	47.7	75.1
6	2001	NaN	69.3	-	67.0	49.6	82.4	55.0	83.4	95.5	...	73.1	-	72.8	79.0	74.1	57.8	59.6	47.7	51.7	77.8
7	2002	NaN	71.2	-	67.8	51.0	82.0	56.8	85.4	93.3	...	73.7	-	72.7	81.1	73.1	64.6	64.4	52.9	57.0	79.4
8	2003	NaN	73.7	-	71.0	52.1	83.3	60.0	86.8	93.6	...	76.8	-	72.8	82.6	73.8	74.2	64.5	57.3	63.6	81.8
9	2004	NaN	76.3	63.5	75.9	55.6	85.3	64.6	71.1	94.4	...	78.9	-	74.5	84.4	77.4	75.8	75.0	61.3	70.6	82.9
10	2005	NaN	78.4	71.6	76.4	61.0	87.5	69.2	74.8	95.5	...	80.8	-	77.6	87.0	80.5	77.1	76.4	65.8	76.7	83.9
11	2006	NaN	80.2	76.4	78.6	66.1	89.3	73.8	78.6	95.7	...	82.8	80.5	81.2	89.9	84.1	78.9	79.2	73.2	81.5	85.6
12	2007	NaN	82.2	83.1	82.1	77.1	92.2	80.6	82.6	96.6	...	86.8	83.4	83.9	92.1	86.8	79.7	83.7	78.8	86.2	89.0
13	2008	NaN	86.1	105.0	89.4	93.2	95.4	87.7	89.5	98.6	...	93.5	94.8	91.0	95.7	97.2	88.2	93.3	92.4	95.2	93.3
14	2009	NaN	88.5	97.9	92.4	94.6	98.1	92.5	93.4	99.6	...	96.4	104.1	96.8	97.7	98.6	94.3	99.2	99.0	96.6	97.3
15	2010	NaN	91.1	100.0	100.0	100.0	100.0	100.0	100.0	100.0	...	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
16	2011	NaN	94.7	111.8	107.7	107.9	99.6	111.4	108.8	100.1	...	107.3	101.5	105.1	104.0	104.7	104.4	105.2	107.3	106.3	100.9
17	2012	NaN	96.8	119.0	110.4	109.0	102.3	118.3	120.7	100.2	...	111.0	98.4	110.4	105.1	108.5	109.2	107.4	113.7	107.5	102.2
18	2013	NaN	98.0	127.8	116.8	111.6	105.7	127.2	129.2	100.6	...	114.2	96.9	112.3	106.3	112.1	114.6	108.0	119.8	108.3	103.7
19	2014	NaN	99.3	133.8	120.3	113.2	108.5	136.1	139.9	100.4	...	114.8	99.0	113.0	107.6	116.8	120.6	107.6	126.0	111.0	104.6
20	2015	NaN	100.0	132.9	124.8	117.7	110.5	144.6	146.2	99.9	...	116.4	99.5	112.7	107.9	117.9	127.8	108.4	125.3	109.9	107.1
21	2016	NaN	101.0	138.7	123.0	132.4	113.6	152.5	151.0	99.7	...	120.8	101.5	111.5	108.6	116.7	136.3	109.8	125.9	112.7	108.1
22	2017	NaN	102.9	145.6	124.2	149.5	115.2	161.2	158.4	98.4	...	124.9	101.8	112.3	110.7	118.2	143.7	111.7	126.5	121.1	111.4
23	2018	NaN	104.5	146.5	127.3	152.9	117.6	170.2	162.7	99.4	...	130.0	102.4	114.3	112.4	-	150.0	116.4	130.9	-	114.0
24	2019	NaN	104.9	149.9	129.2	156.9	118.8	179.7	167.2	99.0	...	132.3	100.5	-	114.2	-	155.9	117.6	133.1	-	117.1
25	2020	NaN	105.4	-	130.7	161.2	116.0	189.9	176.6	100.9	...	128.9	-	-	116.2	-	163.5	115.7	137.0	-	-

26 rows × 192 columns
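Several columns above use `-` for missing years (for example 아프가니스탄 before 2004). The continent column 아시아 is empty in every row. As read from the CSV, such columns hold strings (object dtype), so arithmetic on them fails. The sketch below is a toy slice modelled on df10, and it assumes `-` is the only non-numeric placeholder. It drops all-empty columns and then uses `pd.to_numeric(..., errors='coerce')` to turn `-` into NaN.

```python
import numpy as np
import pandas as pd

# Toy slice modelled on df10: '-' marks a missing year and the
# continent column (아시아) is empty in every row
toy = pd.DataFrame({
    '시점': [2003, 2004, 2005],
    '아시아': [np.nan, np.nan, np.nan],
    '아프가니스탄': ['-', '63.5', '71.6'],
})

# Drop columns that are empty in every row (continent headers)
toy = toy.dropna(axis=1, how='all')

# Convert every column except the year to float; '-' becomes NaN
value_cols = toy.columns.drop('시점')
toy[value_cols] = toy[value_cols].apply(pd.to_numeric, errors='coerce')

print(toy.dtypes['아프가니스탄'])            # float64
print(toy['아프가니스탄'].isna().tolist())   # [True, False, False]
```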

In [377]:

df_sobi=df10.loc[:,['시점','미국','대한민국','일본','중국','러시아','영국','인도']]
df_sobi

Out[377]:

	시점	미국	대한민국	일본	중국	러시아	영국	인도
0	1995	69.9	54.8	101.1	64.5	6.3	73.9	37.7
1	1996	71.9	57.5	101.3	69.8	9.4	76.0	41.1
2	1997	73.6	60.1	103.0	71.8	10.7	77.7	44.1
3	1998	74.8	64.6	103.7	71.2	13.7	79.1	49.9
4	1999	76.4	65.1	103.4	70.2	25.5	80.5	52.2
5	2000	79.0	66.6	102.7	70.5	30.8	81.5	54.3
6	2001	81.2	69.3	101.9	71.0	37.4	82.7	56.4
7	2002	82.5	71.2	101.0	70.4	43.3	84.0	58.8
8	2003	84.4	73.7	100.7	71.2	49.2	85.1	61.1
9	2004	86.6	76.3	100.7	74.0	54.5	86.3	63.4
10	2005	89.6	78.4	100.4	75.3	61.4	88.1	66.0
11	2006	92.4	80.2	100.7	76.5	67.4	90.3	69.9
12	2007	95.1	82.2	100.7	80.2	73.5	92.4	74.3
13	2008	98.7	86.1	102.1	85.0	83.8	95.7	80.5
14	2009	98.4	88.5	100.7	84.3	93.6	97.6	89.3
15	2010	100.0	91.1	100.0	87.0	100.0	100.0	100.0
16	2011	103.2	94.7	99.7	91.8	108.4	103.9	108.9
17	2012	105.3	96.8	99.7	94.3	113.9	106.5	119.0
18	2013	106.8	98.0	100.0	96.7	121.6	109.0	132.2
19	2014	108.6	99.3	102.8	98.6	131.2	110.6	140.9
20	2015	108.7	100.0	103.6	100.0	151.5	111.0	147.9
21	2016	110.1	101.0	103.5	102.0	162.2	112.1	155.2
22	2017	112.4	102.9	104.0	103.6	168.2	114.9	160.3
23	2018	115.2	104.5	105.0	105.8	173.0	117.6	166.7
24	2019	117.2	104.9	105.5	108.8	180.8	119.6	172.9
25	2020	118.7	105.4	105.5	111.5	186.9	120.8	184.3
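The index base in df_sobi is not the same for every country. 대한민국 and 중국 read 100.0 in 2015, while the other five columns read 100.0 in 2010. Year-over-year ratios, like the ones computed later, are unaffected by the base year. If you want to compare index *levels* across countries, though, rebasing to a common year helps. Here is a minimal sketch on the 2010 and 2015 rows copied from the table above, rebasing so that 2010 = 100:

```python
import pandas as pd

# Two rows copied from df_sobi above
cpi = pd.DataFrame({
    '시점': [2010, 2015],
    '미국': [100.0, 108.7],
    '대한민국': [91.1, 100.0],
}).set_index('시점')

# Divide each column by its 2010 value (Series aligns on column names), then scale
rebased = cpi.div(cpi.loc[2010]).mul(100)

print(rebased.round(1))
```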

In [404]:

df_GDP2020=df_GDP.iloc[1:-1]
df_GDP2020=df_GDP2020.reset_index(drop=True)
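`iloc[1:-1]` drops the first and last rows by position but keeps the original index labels. `reset_index(drop=True)` renumbers from 0 and discards the old labels instead of turning them into a column. This step matters because `pd.concat(..., axis=1)` later pairs rows by index label. The sketch below uses four rows from df_GDP2020, deliberately given a non-zero-based index to make the effect visible:

```python
import pandas as pd

# Four rows from df_GDP2020 below, with a non-zero-based index
gdp = pd.DataFrame(
    {'시점': [1996, 1997, 1998, 1999], '대한민국': [7.9, 6.2, -5.1, 11.5]},
    index=[10, 11, 12, 13],
)

sliced = gdp.iloc[1:-1]                  # drop first and last rows by position
print(sliced.index.tolist())             # [11, 12]  original labels kept

trimmed = sliced.reset_index(drop=True)  # renumber from 0, old labels discarded
print(trimmed.index.tolist())            # [0, 1]
print(trimmed['시점'].tolist())          # [1997, 1998]
```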

In [405]:

df_GDP2020

Out[405]:

	시점	미국	대한민국	일본	중국	러시아	영국	인도
0	1996	3.8	7.9	3.1	9.9	-3.8	2.4	7.5
1	1997	4.4	6.2	1.0	9.2	1.4	4.9	4.0
2	1998	4.5	-5.1	-1.3	7.8	-5.3	3.2	6.2
3	1999	4.8	11.5	-0.3	7.7	6.4	3.0	8.8
4	2000	4.1	9.1	2.8	8.5	10.0	3.7	3.8
5	2001	1.0	4.9	0.4	8.3	5.1	2.1	4.8
6	2002	1.7	7.7	0.0	9.1	4.7	2.1	3.8
7	2003	2.8	3.1	1.5	10.0	7.3	3.0	7.9
8	2004	3.9	5.2	2.2	10.1	7.2	2.4	7.9
9	2005	3.5	4.3	1.8	11.4	6.4	2.6	7.9
10	2006	2.8	5.3	1.4	12.7	8.2	2.6	8.1
11	2007	2.0	5.8	1.5	14.2	8.5	2.3	7.7
12	2008	0.1	3.0	-1.2	9.7	5.2	-0.2	3.1
13	2009	-2.6	0.8	-5.7	9.4	-7.8	-4.2	7.9
14	2010	2.7	6.8	4.1	10.6	4.5	2.1	8.5
15	2011	1.5	3.7	0.0	9.6	4.3	1.5	5.2
16	2012	2.3	2.4	1.4	7.9	4.0	1.5	5.5
17	2013	1.8	3.2	2.0	7.8	1.8	1.9	6.4
18	2014	2.3	3.2	0.3	7.4	0.7	3.0	7.4
19	2015	2.7	2.8	1.6	7.0	-2.0	2.6	8.0
20	2016	1.7	2.9	0.8	6.8	0.2	2.3	8.3
21	2017	2.3	3.2	1.7	6.9	1.8	2.1	6.8
22	2018	2.9	2.9	0.6	6.7	2.8	1.7	6.5
23	2019	2.3	2.2	-0.2	6.0	2.2	1.7	3.7
24	2020	-3.4	-0.7	-4.5	2.2	-2.7	-9.3	-6.6

In [409]:

sobi.columns = ['상승률']
sobi

Out[409]:

	상승률
0	1.049270
1	1.045217
2	1.074875
3	1.007740
4	1.023041
5	1.040541
6	1.027417
7	1.035112
8	1.035278
9	1.027523
10	1.022959
11	1.024938
12	1.047445
13	1.027875
14	1.029379
15	1.039517
16	1.022175
17	1.012397
18	1.013265
19	1.007049
20	1.010000
21	1.018812
22	1.015549
23	1.003828
24	1.004766
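The 상승률 values are ratios of one year's CPI to the previous year's, so 1.049 means prices rose about 4.9%. The GDP columns, by contrast, are growth rates in percent. To put both on the same unit, convert each ratio with (ratio − 1) × 100. Here is a minimal sketch on the first three ratios from the table above:

```python
import pandas as pd

# First three ratios from the 상승률 column above
sobi_demo = pd.DataFrame({'상승률': [1.049270, 1.045217, 1.074875]})

# Ratio to previous year -> percent change
sobi_demo['상승률(%)'] = (sobi_demo['상승률'] - 1) * 100

print(sobi_demo['상승률(%)'].round(1).tolist())  # [4.9, 4.5, 7.5]
```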

In [410]:

result3 = pd.concat([df_GDP2020,sobi],axis=1)
result3

Out[410]:

	시점	미국	대한민국	일본	중국	러시아	영국	인도	상승률
0	1996	3.8	7.9	3.1	9.9	-3.8	2.4	7.5	1.049270
1	1997	4.4	6.2	1.0	9.2	1.4	4.9	4.0	1.045217
2	1998	4.5	-5.1	-1.3	7.8	-5.3	3.2	6.2	1.074875
3	1999	4.8	11.5	-0.3	7.7	6.4	3.0	8.8	1.007740
4	2000	4.1	9.1	2.8	8.5	10.0	3.7	3.8	1.023041
5	2001	1.0	4.9	0.4	8.3	5.1	2.1	4.8	1.040541
6	2002	1.7	7.7	0.0	9.1	4.7	2.1	3.8	1.027417
7	2003	2.8	3.1	1.5	10.0	7.3	3.0	7.9	1.035112
8	2004	3.9	5.2	2.2	10.1	7.2	2.4	7.9	1.035278
9	2005	3.5	4.3	1.8	11.4	6.4	2.6	7.9	1.027523
10	2006	2.8	5.3	1.4	12.7	8.2	2.6	8.1	1.022959
11	2007	2.0	5.8	1.5	14.2	8.5	2.3	7.7	1.024938
12	2008	0.1	3.0	-1.2	9.7	5.2	-0.2	3.1	1.047445
13	2009	-2.6	0.8	-5.7	9.4	-7.8	-4.2	7.9	1.027875
14	2010	2.7	6.8	4.1	10.6	4.5	2.1	8.5	1.029379
15	2011	1.5	3.7	0.0	9.6	4.3	1.5	5.2	1.039517
16	2012	2.3	2.4	1.4	7.9	4.0	1.5	5.5	1.022175
17	2013	1.8	3.2	2.0	7.8	1.8	1.9	6.4	1.012397
18	2014	2.3	3.2	0.3	7.4	0.7	3.0	7.4	1.013265
19	2015	2.7	2.8	1.6	7.0	-2.0	2.6	8.0	1.007049
20	2016	1.7	2.9	0.8	6.8	0.2	2.3	8.3	1.010000
21	2017	2.3	3.2	1.7	6.9	1.8	2.1	6.8	1.018812
22	2018	2.9	2.9	0.6	6.7	2.8	1.7	6.5	1.015549
23	2019	2.3	2.2	-0.2	6.0	2.2	1.7	3.7	1.003828
24	2020	-3.4	-0.7	-4.5	2.2	-2.7	-9.3	-6.6	1.004766
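`pd.concat([...], axis=1)` pairs rows by index label, not by year. result3 lines up only because both frames have a 0-based index of the same length, in the same year order (ratio 0 is 1996 CPI / 1995 CPI, and row 0 is 1996). A more defensive option is to merge on 시점. That requires tagging the ratios with their year, which `sobi` as built here does not do, so the `ratio` frame below is hypothetical. With `validate='one_to_one'`, the merge raises an error if a year is duplicated on either side.

```python
import pandas as pd

# Two rows from df_GDP2020 and the matching ratios from sobi above
gdp = pd.DataFrame({'시점': [1996, 1997], '대한민국': [7.9, 6.2]})
# Hypothetical: ratios tagged with the year they end in
ratio = pd.DataFrame({'시점': [1996, 1997], '상승률': [1.049270, 1.045217]})

# Pair rows by year; raises MergeError if a year repeats
merged = gdp.merge(ratio, on='시점', how='inner', validate='one_to_one')
print(merged)

# concat(axis=1) pairs by index label: mismatched labels give extra
# rows padded with NaN rather than an error
misaligned = pd.concat([gdp.set_axis([5, 6]), ratio[['상승률']]], axis=1)
print(misaligned['상승률'].isna().sum())  # 2
```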

In [426]:

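# Set a Korean font before drawing so Hangul labels render, and use
# an ASCII minus so negative GDP ticks don't show as empty boxes
plt.rc('font', family='Malgun Gothic')
plt.rc('axes', unicode_minus=False)

plt.figure(figsize=(10, 5), dpi=300)
# suptitle() titles the whole figure; plt.title() called before
# plt.subplot() lands on a separate axes instead of the figure
plt.suptitle('한국 GDP 와 물가상승률 비교')

# Top panel: Korea's annual GDP growth rate (%)
plt.subplot(2, 1, 1)
plt.plot(result3['시점'], result3['대한민국'], label='GDP')
plt.ylabel('한국 GDP 성장률 (%)')
plt.legend()

# Bottom panel: CPI ratio to the previous year (1.05 = +5%)
plt.subplot(2, 1, 2)
plt.plot(result3['시점'], result3['상승률'], label='소비자물가')
plt.xlabel('연도')
plt.ylabel('소비자물가 상승률')
plt.legend()

plt.subplots_adjust(left=0.125, bottom=0.1, right=0.9, top=0.9, wspace=0.2, hspace=0.35)
plt.savefig('한미 GDP 상승률.png', facecolor='#eeeeee')
plt.show()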
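The same two-panel chart can also be built with Matplotlib's object-oriented API. `plt.subplots` returns the figure and both axes at once, `sharex=True` keeps the year axis aligned, and `fig.suptitle` titles the whole figure. Below is a sketch on the first five rows of result3. It assumes the Malgun Gothic font is installed (Matplotlib falls back to a default font with a warning if not); saving and showing are left out.

```python
import matplotlib.pyplot as plt
import pandas as pd

plt.rc('font', family='Malgun Gothic')  # Korean font, assumed installed
plt.rc('axes', unicode_minus=False)     # ASCII minus for negative ticks

# First five rows of result3 above
demo = pd.DataFrame({
    '시점': [1996, 1997, 1998, 1999, 2000],
    '대한민국': [7.9, 6.2, -5.1, 11.5, 9.1],
    '상승률': [1.049270, 1.045217, 1.074875, 1.007740, 1.023041],
})

fig, (ax_gdp, ax_cpi) = plt.subplots(2, 1, figsize=(10, 5), sharex=True)
fig.suptitle('한국 GDP 와 물가상승률 비교')

# Top: GDP growth rate (%)
ax_gdp.plot(demo['시점'], demo['대한민국'], label='GDP')
ax_gdp.set_ylabel('GDP 성장률 (%)')
ax_gdp.legend()

# Bottom: CPI ratio to the previous year
ax_cpi.plot(demo['시점'], demo['상승률'], label='소비자물가')
ax_cpi.set_ylabel('전년 대비 비율')
ax_cpi.set_xlabel('연도')
ax_cpi.legend()

fig.tight_layout()
```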

In [ ]:

# Unexecuted leftover from the 20s employment section; df20 is not defined in this part
plt.subplot(3,1,2)
plt.plot(df20['시점'], df20['남자'], label = '남자 취업인구')
plt.plot(df20['시점'], df20['여자'], label = '여자 취업인구')
plt.xlabel('연도별 성별 20대 취업자수')
plt.legend()

In [400]:

# One unnamed column (label 0); renamed to 상승률 in In [409]
sobi = pd.DataFrame(kor_price_growth)
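Passing a plain list to `pd.DataFrame` produces a single column labelled `0`, which is why In [409] renames it afterwards. Passing a dict names the column in one step. Here is a minimal sketch using the first three values of `kor_price_growth` printed in the output below:

```python
import pandas as pd

# First three values of kor_price_growth from the output below
kor_price_growth = [1.0492700729927007, 1.0452173913043479, 1.0748752079866888]

unnamed = pd.DataFrame(kor_price_growth)
print(unnamed.columns.tolist())   # [0]

named = pd.DataFrame({'상승률': kor_price_growth})
print(named.columns.tolist())     # ['상승률']
```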

In [392]:

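# Year-over-year consumer price index (CPI) growth for Korea, China, and Japan
import csv

max_date = []  # x-axis years, one per ratio (1996 onward)

kor = []
cha = []
jpn = []

kor_price_growth = []
cha_price_growth = []
jpn_price_growth = []

name1 = '대한민국'
name2 = '중국'
name3 = '일본'
name4 = '국가별'  # expected first cell of the header row

with open('sss.csv', newline='') as f:
    data = csv.reader(f)
    for row in data:
        if not row:  # skip blank lines
            continue
        if name1 in row[0]:
            for i in row[1:27]:      # 26 yearly values, 1995-2020
                kor.append(float(i))
        elif name2 in row[0]:
            for i in row[1:27]:
                cha.append(float(i))
        elif name3 in row[0]:
            for i in row[1:27]:
                jpn.append(float(i))
        elif name4 in row[0]:
            for i in row[2:27]:      # start at 1996: ratio k pairs with year k + 1
                max_date.append(int(i))

# Ratio of each year's CPI to the previous year's (25 ratios from 26 values)
for i in range(0, 25):
    kor_price_growth.append(kor[i+1]/kor[i])
    cha_price_growth.append(cha[i+1]/cha[i])
    jpn_price_growth.append(jpn[i+1]/jpn[i])
print(kor_price_growth)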

[1.0492700729927007, 1.0452173913043479, 1.0748752079866888, 1.0077399380804954, 1.023041474654378, 1.0405405405405406, 1.0274170274170276, 1.0351123595505618, 1.0352781546811396, 1.0275229357798166, 1.0229591836734693, 1.0249376558603491, 1.0474452554744524, 1.0278745644599303, 1.0293785310734462, 1.039517014270033, 1.0221752903907075, 1.012396694214876, 1.013265306122449, 1.0070493454179255, 1.01, 1.0188118811881188, 1.0155490767735664, 1.0038277511961724, 1.0047664442326025]

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [392], in <cell line: 42>()
     40 import matplotlib.pyplot as plt
     41 plt.figure(figsize = (10, 3), dpi = 300)
---> 42 plt.plot(max_date, kor_price_growth, label = 'kor_price_growth')
     43 plt.plot(max_date, cha_price_growth, label = 'cha_price_growth')
     44 plt.plot(max_date, jpn_price_growth, label = 'jpn_price_growth')

File ~\anaconda3\lib\site-packages\matplotlib\pyplot.py:2757, in plot(scalex, scaley, data, *args, **kwargs)
   2755 @_copy_docstring_and_deprecators(Axes.plot)
   2756 def plot(*args, scalex=True, scaley=True, data=None, **kwargs):
-> 2757     return gca().plot(
   2758         *args, scalex=scalex, scaley=scaley,
   2759         **({"data": data} if data is not None else {}), **kwargs)

File ~\anaconda3\lib\site-packages\matplotlib\axes\_axes.py:1632, in Axes.plot(self, scalex, scaley, data, *args, **kwargs)
   1390 """
   1391 Plot y versus x as lines and/or markers.
   1392 
   (...)
   1629 (``'green'``) or hex strings (``'#008000'``).
   1630 """
   1631 kwargs = cbook.normalize_kwargs(kwargs, mlines.Line2D)
-> 1632 lines = [*self._get_lines(*args, data=data, **kwargs)]
   1633 for line in lines:
   1634     self.add_line(line)

File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:312, in _process_plot_var_args.__call__(self, data, *args, **kwargs)
    310     this += args[0],
    311     args = args[1:]
--> 312 yield from self._plot_args(this, kwargs)

File ~\anaconda3\lib\site-packages\matplotlib\axes\_base.py:498, in _process_plot_var_args._plot_args(self, tup, kwargs, return_kwargs)
    495     self.axes.yaxis.update_units(y)
    497 if x.shape[0] != y.shape[0]:
--> 498     raise ValueError(f"x and y must have same first dimension, but "
    499                      f"have shapes {x.shape} and {y.shape}")
    500 if x.ndim > 2 or y.ndim > 2:
    501     raise ValueError(f"x and y can be no greater than 2D, but have "
    502                      f"shapes {x.shape} and {y.shape}")

ValueError: x and y must have same first dimension, but have shapes (0,) and (25,)
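The error says `x` has shape `(0,)`. In other words, `max_date` was still empty when plotting, so the `'국가별'` branch never ran, even though the three country lists were filled. One possible cause, not confirmed here, is that the header row's first cell is labelled differently in `sss.csv`. The sketch below sidesteps the label match by reading the header by position and checks lengths before plotting. It runs on a tiny in-memory CSV whose layout (header row, then one row per country) is an assumption; the CPI values are copied from df_sobi above. For the real file, `encoding='cp949'` is likewise an assumption, carried over from how `work.csv` was read.

```python
import csv
import io

# Tiny stand-in for sss.csv (assumed layout). For the real file, something like:
# open('sss.csv', encoding='cp949', newline='')  <- encoding is an assumption
sample = io.StringIO(
    "국가별,1995,1996,1997,1998\n"
    "아시아,,,,\n"
    "대한민국,54.8,57.5,60.1,64.6\n"
    "일본,101.1,101.3,103.0,103.7\n"
)

wanted = {'대한민국', '일본'}

reader = csv.reader(sample)
header = next(reader)                 # header taken by position, not by label
years = [int(y) for y in header[1:]]

series = {}
for row in reader:
    if row and row[0].strip() in wanted:  # skips region rows and blank lines
        series[row[0].strip()] = [float(v) for v in row[1:]]

# Year-over-year ratio; ratio k belongs to years[k + 1]
growth = {name: [cur / prev for prev, cur in zip(vals, vals[1:])]
          for name, vals in series.items()}
growth_years = years[1:]

# Fail early with a readable message instead of deep inside plt.plot
for name, vals in growth.items():
    assert len(vals) == len(growth_years), (name, len(vals), len(growth_years))

print(growth_years)                    # [1996, 1997, 1998]
print(round(growth['대한민국'][0], 6))  # 1.04927
```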


with_open_형준

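Simple data preprocessing, selecting specific rows, and more

Employment rate of young people in their 20s

Preprocessing the employed population in their 20s: total, male, female

Keep only data from 2010 onward

Visualizing the employed population in their 20s

Preprocessing the Korean-national population in their 20s

Drop the district column

Output only people in their 20s

Sum the 20_24 and 25-29 rows and output the result

Employed population data

Drop unnecessary age-group columns

df2015 is in thousands, so multiply by 1000 to match units

Visualizing population and employed population

GDP of major countries

Select and save data for major countries only

Visualizing major countries

Visualizing GDP differences between Korea, China, and Japan

Korea-US GDP graph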

Category: 파이썬 기본 > 정규화(문자열) (Python basics > normalization (strings))
