Time Series Data Analysis¶
ⅰ. Importing Modules & Data Skimming¶
In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
In [2]:
pd.options.display.max_columns=30
df = pd.read_excel("Data/Superstore.xls", index_col=0)
df.head()
Out[2]:
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Row ID | ||||||||||||||||||||
1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
3 | CA-2016-138688 | 2016-06-12 | 2016-06-16 | Second Class | DV-13045 | Darrin Van Huff | Corporate | United States | Los Angeles | California | 90036 | West | OFF-LA-10000240 | Office Supplies | Labels | Self-Adhesive Address Labels for Typewriters b... | 14.6200 | 2 | 0.00 | 6.8714 |
4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
5 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | OFF-ST-10000760 | Office Supplies | Storage | Eldon Fold 'N Roll Cart System | 22.3680 | 2 | 0.20 | 2.5164 |
In [3]:
df.describe()
Out[3]:
Postal Code | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|
count | 9994.000000 | 9994.000000 | 9994.000000 | 9994.000000 | 9994.000000 |
mean | 55190.379428 | 229.858001 | 3.789574 | 0.156203 | 28.656896 |
std | 32063.693350 | 623.245101 | 2.225110 | 0.206452 | 234.260108 |
min | 1040.000000 | 0.444000 | 1.000000 | 0.000000 | -6599.978000 |
25% | 23223.000000 | 17.280000 | 2.000000 | 0.000000 | 1.728750 |
50% | 56430.500000 | 54.490000 | 3.000000 | 0.200000 | 8.666500 |
75% | 90008.000000 | 209.940000 | 5.000000 | 0.200000 | 29.364000 |
max | 99301.000000 | 22638.480000 | 14.000000 | 0.800000 | 8399.976000 |
ⅱ. Data Filtering¶
In [4]:
df['Category'].value_counts()
Out[4]:
Office Supplies 6026 Furniture 2121 Technology 1847 Name: Category, dtype: int64
In [5]:
df_furniture =df[df['Category'] == 'Furniture']
df_furniture.head()
Out[5]:
Order ID | Order Date | Ship Date | Ship Mode | Customer ID | Customer Name | Segment | Country | City | State | Postal Code | Region | Product ID | Category | Sub-Category | Product Name | Sales | Quantity | Discount | Profit | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Row ID | ||||||||||||||||||||
1 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-BO-10001798 | Furniture | Bookcases | Bush Somerset Collection Bookcase | 261.9600 | 2 | 0.00 | 41.9136 |
2 | CA-2016-152156 | 2016-11-08 | 2016-11-11 | Second Class | CG-12520 | Claire Gute | Consumer | United States | Henderson | Kentucky | 42420 | South | FUR-CH-10000454 | Furniture | Chairs | Hon Deluxe Fabric Upholstered Stacking Chairs,... | 731.9400 | 3 | 0.00 | 219.5820 |
4 | US-2015-108966 | 2015-10-11 | 2015-10-18 | Standard Class | SO-20335 | Sean O'Donnell | Consumer | United States | Fort Lauderdale | Florida | 33311 | South | FUR-TA-10000577 | Furniture | Tables | Bretford CR4500 Series Slim Rectangular Table | 957.5775 | 5 | 0.45 | -383.0310 |
6 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-FU-10001487 | Furniture | Furnishings | Eldon Expressions Wood and Plastic Desk Access... | 48.8600 | 7 | 0.00 | 14.1694 |
11 | CA-2014-115812 | 2014-06-09 | 2014-06-14 | Standard Class | BH-11710 | Brosina Hoffman | Consumer | United States | Los Angeles | California | 90032 | West | FUR-TA-10001539 | Furniture | Tables | Chromcraft Rectangular Conference Tables | 1706.1840 | 9 | 0.20 | 85.3092 |
In [6]:
df_furniture = df_furniture[['Order Date', 'Sales']].sort_values('Order Date')
df_furniture
Out[6]:
Order Date | Sales | |
---|---|---|
Row ID | ||
7475 | 2014-01-06 | 2573.820 |
7661 | 2014-01-07 | 76.728 |
867 | 2014-01-10 | 51.940 |
717 | 2014-01-11 | 9.940 |
2979 | 2014-01-13 | 545.940 |
... | ... | ... |
7637 | 2017-12-29 | 258.750 |
7633 | 2017-12-29 | 1207.840 |
5132 | 2017-12-29 | 101.120 |
7636 | 2017-12-29 | 300.980 |
907 | 2017-12-30 | 323.136 |
2121 rows × 2 columns
In [7]:
df_furniture = df_furniture.groupby('Order Date').sum()
df_furniture
Out[7]:
Sales | |
---|---|
Order Date | |
2014-01-06 | 2573.8200 |
2014-01-07 | 76.7280 |
2014-01-10 | 51.9400 |
2014-01-11 | 9.9400 |
2014-01-13 | 879.9390 |
... | ... |
2017-12-24 | 1393.4940 |
2017-12-25 | 832.4540 |
2017-12-28 | 551.2568 |
2017-12-29 | 2330.7180 |
2017-12-30 | 323.1360 |
889 rows × 1 columns
In [8]:
plt.figure(figsize=(20, 10))
sns.lineplot(x=df_furniture.index, y=df_furniture['Sales'], marker='o')
Out[8]:
<AxesSubplot:xlabel='Order Date', ylabel='Sales'>
일 단위로는 매출이 너무 들쭉날쭉해서 의미있는 시계열 데이터 분석이 어렵다 → 월 단위로 데이터 묶어본 다음 다시 시각화
In [9]:
# 월별 평균 매출액
df_furniture_month = df_furniture.resample('MS').mean()
df_furniture_month.head()
Out[9]:
Sales | |
---|---|
Order Date | |
2014-01-01 | 480.194231 |
2014-02-01 | 367.931600 |
2014-03-01 | 857.291529 |
2014-04-01 | 567.488357 |
2014-05-01 | 432.049188 |
In [10]:
df_furniture_month.plot(figsize=(20, 10))
Out[10]:
<AxesSubplot:xlabel='Order Date'>
ⅲ. Modeling by FBProphet Lib¶
In [11]:
from fbprophet import Prophet
# prophet을 사용하기 위한 DataFrame 사전작업
# 시계열변수 col명을 ds 로 설정
# 종속변수 col명을 y 로 설정
df = df_furniture_month.reset_index()
df.columns = ['ds', 'y']
df.head()
Out[11]:
ds | y | |
---|---|---|
0 | 2014-01-01 | 480.194231 |
1 | 2014-02-01 | 367.931600 |
2 | 2014-03-01 | 857.291529 |
3 | 2014-04-01 | 567.488357 |
4 | 2014-05-01 | 432.049188 |
In [12]:
model = Prophet()
model.fit(df)
INFO:numexpr.utils:NumExpr defaulting to 8 threads. INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[12]:
<fbprophet.forecaster.Prophet at 0x25a22439df0>
In [13]:
# 시계열 데이터를 예측할 범위 데이터 프레임
future = model.make_future_dataframe(periods = 12, freq = 'MS')
pred = model.predict(future)
pred.head()
Out[13]:
ds | trend | yhat_lower | yhat_upper | trend_lower | trend_upper | additive_terms | additive_terms_lower | additive_terms_upper | yearly | yearly_lower | yearly_upper | multiplicative_terms | multiplicative_terms_lower | multiplicative_terms_upper | yhat | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2014-01-01 | 726.057713 | 378.598521 | 690.510121 | 726.057713 | 726.057713 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | -190.685662 | 0.0 | 0.0 | 0.0 | 535.372051 |
1 | 2014-02-01 | 727.494023 | 290.790180 | 599.278408 | 727.494023 | 727.494023 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | -276.377703 | 0.0 | 0.0 | 0.0 | 451.116320 |
2 | 2014-03-01 | 728.791335 | 544.490813 | 854.041798 | 728.791335 | 728.791335 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | -22.389755 | 0.0 | 0.0 | 0.0 | 706.401580 |
3 | 2014-04-01 | 730.227645 | 474.123664 | 786.956816 | 730.227645 | 730.227645 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | -100.141158 | 0.0 | 0.0 | 0.0 | 630.086487 |
4 | 2014-05-01 | 731.617622 | 405.451940 | 719.629716 | 731.617622 | 731.617622 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | -160.815662 | 0.0 | 0.0 | 0.0 | 570.801960 |
In [14]:
pred = pred[['ds', 'yhat', 'yhat_lower', 'yhat_upper']]
pred.head()
Out[14]:
ds | yhat | yhat_lower | yhat_upper | |
---|---|---|---|---|
0 | 2014-01-01 | 535.372051 | 378.598521 | 690.510121 |
1 | 2014-02-01 | 451.116320 | 290.790180 | 599.278408 |
2 | 2014-03-01 | 706.401580 | 544.490813 | 854.041798 |
3 | 2014-04-01 | 630.086487 | 474.123664 | 786.956816 |
4 | 2014-05-01 | 570.801960 | 405.451940 | 719.629716 |
In [15]:
model.plot(pred)
plt.show()
ⅳ. Validating the Predicted Values from Model¶
In [16]:
df_new = df_furniture_month[:'2016']
df_new.tail()
Out[16]:
Sales | |
---|---|
Order Date | |
2016-08-01 | 734.307782 |
2016-09-01 | 1135.953371 |
2016-10-01 | 624.872474 |
2016-11-01 | 1271.345152 |
2016-12-01 | 1410.719808 |
In [17]:
df_new = df_new.reset_index()
df_new.columns = ['ds', 'y']
df_new.head()
Out[17]:
ds | y | |
---|---|---|
0 | 2014-01-01 | 480.194231 |
1 | 2014-02-01 | 367.931600 |
2 | 2014-03-01 | 857.291529 |
3 | 2014-04-01 | 567.488357 |
4 | 2014-05-01 | 432.049188 |
In [18]:
model = Prophet()
model.fit(df_new)
INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
Out[18]:
<fbprophet.forecaster.Prophet at 0x25a2462da60>
In [19]:
future = model.make_future_dataframe(periods = 12, freq = 'MS')
pred = model.predict(future)
pred = pred[['ds', 'yhat']].set_index('ds')
pred = pred['2017':]
pred
Out[19]:
yhat | |
---|---|
ds | |
2017-01-01 | 804.654438 |
2017-02-01 | 601.206518 |
2017-03-01 | 782.905069 |
2017-04-01 | 754.585297 |
2017-05-01 | 685.352803 |
2017-06-01 | 805.507654 |
2017-07-01 | 792.293714 |
2017-08-01 | 714.448420 |
2017-09-01 | 1208.615981 |
2017-10-01 | 747.521868 |
2017-11-01 | 1271.011086 |
2017-12-01 | 1473.348777 |
In [20]:
ax = df_furniture_month.plot()
pred.plot(ax=ax, figsize=(20, 10))
Out[20]:
<AxesSubplot:xlabel='ds'>
Prophet 모델을 통해 얻은 결과값과 실제값을 비교한 plot
In [21]:
model = Prophet()
model.fit(df_new)
future = model.make_future_dataframe(periods = 12, freq = 'MS')
pred = model.predict(future)
INFO:fbprophet:Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. INFO:fbprophet:Disabling daily seasonality. Run prophet with daily_seasonality=True to override this.
In [22]:
model.plot_components(pred)
plt.show()
'데이터 분석 > Proj. E-Commerce' 카테고리의 다른 글
E-Commerce Part Ⅷ: 자연어분석(NLP) 응용 (0) | 2021.01.18 |
---|---|
E-Commerce Part Ⅵ: K Means Clustering 응용 (0) | 2021.01.16 |
E-Commerce Part Ⅴ: Random Forest 모델 응용 (0) | 2021.01.16 |
E-Commerce Part Ⅳ: Decision Tree 모델 응용 (0) | 2021.01.16 |
E-Commerce Part Ⅲ: KNN 모델 응용 (0) | 2021.01.15 |
E-Commerce Part Ⅱ: 로지스틱회귀분석 응용 (0) | 2021.01.15 |
E-Commerce Part Ⅰ: 선형회귀분석 응용 (0) | 2021.01.12 |