05d My First ML Model

Next, I need to create features that help the machine learning algorithm. Some standard features for machine learning on financial time series are described online, so I followed that advice.

DataCamp has some good courses on machine learning, machine learning for time series, and machine learning for finance.

First comes the percent change. I created four new features: the 5-day and 10-day percent change of the lastTradedVolume and openPrice.bid columns.

def calc_pct_change(df, features, resolution):
    # Adds 5- and 10-period percent-change columns for every feature.
    # If resolution is 'Day', these are the 5- and 10-day percent changes.
    periods = [5, 10]
    for n in periods:
        for feature in features:
            new_col_name = feature + '_' + str(n) + '_' + resolution + '_pct_change'
            df[new_col_name] = df[feature].pct_change(n)
    return df

resolution = 'Day'
features = ['lastTradedVolume', 'openPrice.bid']
prices_df = calc_pct_change(prices_df, features, resolution)
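
To make the n-period percent change concrete, here is a small sketch on a made-up series. The `toy` values are hypothetical and not taken from the real price data. `pct_change(n)` compares each row with the row n positions earlier, so the first n rows have no value.

```python
import pandas as pd

# Hypothetical toy prices, used only for illustration.
toy = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0, 120.0])

# 5-row percent change: (value now - value 5 rows ago) / value 5 rows ago
pct_5 = toy.pct_change(5)

# The same quantity written out by hand with shift()
manual_5 = (toy - toy.shift(5)) / toy.shift(5)

print(pct_5.tolist())
print(manual_5.tolist())
```

Only the last row has a value here, because it is the only one with a row five positions before it. On the real data this means the first 5 (or 10) rows of each new column are NaN.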

Second comes the Relative Strength Index (RSI), a standard calculation in finance. Like the percent change, it is calculated over various periods, the most common being 14, 30, 50, and 200. I apply it only to the openPrice.bid column.

import talib  # TA-Lib Python wrapper, provides RSI

def create_n_resolution_RSI_change(df, features, resolution):
    # Adds 14-, 30-, 50- and 200-period RSI columns for every column in
    # `features`, so pass only the columns you want an RSI for.
    periods = [14, 30, 50, 200]
    for n in periods:
        for feature in features:
            new_col_name = feature + '_' + str(n) + '_' + resolution + '_RSI'
            df[new_col_name] = talib.RSI(df[feature].values, timeperiod=n)
    return df

prices_df = create_n_resolution_RSI_change(prices_df, ['openPrice.bid'], 'Day')
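
For readers who want to see what an RSI measures, here is a rough pandas-only sketch. It is not the code used in this post, and `rsi_sketch` is a hypothetical helper: it uses Wilder-style exponential smoothing, and its values are not guaranteed to match `talib.RSI` exactly because the start of the series may be seeded differently.

```python
import pandas as pd

def rsi_sketch(prices, n):
    """Wilder-style RSI; a sketch, not a drop-in replacement for talib.RSI."""
    delta = prices.diff()
    gains = delta.clip(lower=0)        # upward moves, 0 otherwise
    losses = (-delta).clip(lower=0)    # downward moves as positive numbers
    # Wilder smoothing is an exponential average with alpha = 1/n.
    avg_gain = gains.ewm(alpha=1 / n, min_periods=n, adjust=False).mean()
    avg_loss = losses.ewm(alpha=1 / n, min_periods=n, adjust=False).mean()
    rs = avg_gain / avg_loss           # relative strength
    return 100 - 100 / (1 + rs)

# Hypothetical toy series: one only rises, one only falls.
rising = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
falling = rising[::-1].reset_index(drop=True)

print(rsi_sketch(rising, 3).iloc[-1])
print(rsi_sketch(falling, 3).iloc[-1])
```

A series that only rises sits at the top of the 0 to 100 scale and one that only falls sits at the bottom, which is why the RSI is read as a momentum indicator.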

Third comes the percent change of the latest value relative to the mean of the preceding values in a 20-row rolling window (this differs from the n-day percent change calculated above).

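import numpy as np

def create_pct_change_2(window):
    # `window` is a NumPy array holding one rolling window (raw=True below).
    previous_values = window[:-1]
    last_value = window[-1]
    previous_mean = np.mean(previous_values)
    return (last_value - previous_mean) / previous_mean

features = ['lastTradedVolume', 'openPrice.bid']
for feature in features:
    new_name = feature + '_pct_change_2'
    # raw=True passes plain arrays, so window[-1] is a positional lookup
    # rather than a lookup of the index label -1.
    prices_df[new_name] = prices_df[feature].rolling(window=20).apply(create_pct_change_2, raw=True)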
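
The rolling calculation is easier to follow with a tiny window. The sketch below uses a window of 3 instead of 20 and a made-up series. The `toy` data and the helper name `pct_change_vs_window_mean` are hypothetical.

```python
import numpy as np
import pandas as pd

def pct_change_vs_window_mean(window):
    # Compare the last value in the window with the mean of the others.
    previous_values = window[:-1]
    last_value = window[-1]
    baseline = np.mean(previous_values)
    return (last_value - baseline) / baseline

# Hypothetical toy data, used only for illustration.
toy = pd.Series([10.0, 20.0, 30.0, 60.0])

# raw=True hands each window to the function as a NumPy array.
result = toy.rolling(window=3).apply(pct_change_vs_window_mean, raw=True)
print(result.tolist())
```

For the last row, the previous two values average 25 and the latest value is 60, so the feature is (60 - 25) / 25 = 1.4. The first two rows are NaN because the window is not yet full.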

Fourth, replace outliers: any value more than three standard deviations from the mean is replaced with the median.

def replace_outliers(series):
    # Work on a copy so the original column in prices_df is left untouched.
    series = series.copy()
    absolute_differences_from_mean = np.abs(series - np.mean(series))
    # An outlier is more than 3 standard deviations away from the mean.
    this_mask = absolute_differences_from_mean > (np.std(series) * 3)
    series[this_mask] = np.nanmedian(series)
    return series

features = ['lastTradedVolume', 'openPrice.bid',
       'lastTradedVolume_5_Day_pct_change', 'openPrice.bid_5_Day_pct_change',
       'lastTradedVolume_10_Day_pct_change', 'openPrice.bid_10_Day_pct_change',
       'openPrice.bid_14_Day_RSI', 'openPrice.bid_30_Day_RSI',
       'openPrice.bid_50_Day_RSI', 'openPrice.bid_200_Day_RSI',
       'lastTradedVolume_pct_change_2', 'openPrice.bid_pct_change_2']
for feature in features:
    new_name = feature + 'no_outliers'
    prices_df[new_name] = replace_outliers(prices_df[feature])
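
Here is a small sketch of the three-standard-deviation rule on made-up data. The `toy` series and the helper name `replace_outliers_sketch` are hypothetical. It uses `Series.mask`, which returns a new series, so the input is never modified.

```python
import pandas as pd

def replace_outliers_sketch(series):
    # Flag values more than 3 standard deviations from the mean ...
    distance = (series - series.mean()).abs()
    is_outlier = distance > 3 * series.std(ddof=0)
    # ... and replace them with the median, without touching the input.
    return series.mask(is_outlier, series.median())

# Hypothetical toy data: nineteen ordinary values and one extreme spike.
toy = pd.Series([10.0] * 19 + [1000.0])
cleaned = replace_outliers_sketch(toy)

print(toy.iloc[-1], cleaned.iloc[-1])
```

The spike is replaced by the median while the original series keeps its value. Note that with very short series a single spike inflates the standard deviation so much that it may not cross the threshold at all.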

Fifth, create a rolling standard deviation and a rolling maximum.

def calc_rolling_std_max(df):
    # 20-row rolling standard deviation and maximum for every column.
    new_df_std = df.rolling(20).aggregate([np.std])
    new_df_max = df.rolling(20).aggregate([np.max])
    # Flatten the (column, statistic) MultiIndex so the frames can be joined.
    new_df_std.columns = new_df_std.columns.to_flat_index()
    new_df_max.columns = new_df_max.columns.to_flat_index()
    new_df_std = new_df_std.join(new_df_max)
    return new_df_std

prices_df = prices_df.join(calc_rolling_std_max(prices_df[['lastTradedVolume', 'openPrice.bid']]))
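
As an illustration, the sketch below computes the same two rolling statistics on made-up data with a window of 3, and flattens the resulting column names into plain strings rather than tuples. The `toy` frame and the naming scheme are assumptions for this example only.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
toy = pd.DataFrame({
    "volume": [1.0, 2.0, 3.0, 4.0, 5.0],
    "price": [10.0, 12.0, 11.0, 15.0, 14.0],
})

# One call computes both statistics; columns become (column, statistic) pairs.
rolled = toy.rolling(3).agg(["std", "max"])

# Flatten the pairs into readable string names such as 'volume_rolling3_std'.
rolled.columns = [f"{col}_rolling3_{stat}" for col, stat in rolled.columns]

print(rolled)
```

String column names are easier to select later than the tuples produced by `to_flat_index()`, at the cost of choosing a naming scheme up front.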

Sixth, create time-shifted data to account for lag effects.

import pandas as pd

def make_time_shift_data(series, feature):
    # One column per lag; lag 0 is a copy of the original series.
    shifts = [0, 1, 2, 3, 4, 5, 6, 7]
    many_shifts = {feature + '_lag{}'.format(ii): series.shift(ii) for ii in shifts}
    many_shifts = pd.DataFrame(many_shifts)
    return many_shifts

features = ['lastTradedVolume', 'openPrice.bid']
for feature in features:
    prices_df = prices_df.join(make_time_shift_data(prices_df[feature], feature))
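
A quick sketch on made-up data shows what the lagged columns look like. The `toy` series is hypothetical, and only lags 0 to 2 are used to keep the output short.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
toy = pd.Series([1.0, 2.0, 3.0, 4.0], name="price")

# shift(k) moves the values k rows down, so row t holds the value from t - k.
lags = pd.DataFrame({f"price_lag{k}": toy.shift(k) for k in range(3)})

# Rows near the start have no history yet and contain NaN.
complete_rows = lags.dropna()

print(lags)
print(len(complete_rows))
```

Each extra lag costs one more row at the start of the data, so rows with NaN need to be dropped or filled before the data goes to a model that cannot handle missing values.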

That's it. The feature engineering is done.
