doubt.datasets.stocks

Stocks data set.

This data set is from the UCI data set archive, with the description being the original description verbatim. Some feature names may have been altered, based on the description.

  1"""Stocks data set.
  2
  3This data set is from the UCI data set archive, with the description being the original
  4description verbatim. Some feature names may have been altered, based on the
  5description.
  6"""
  7
  8import io
  9
 10import pandas as pd
 11
 12from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset
 13
 14
 15class Stocks(BaseDataset):
 16    __doc__ = f"""
 17    There are three disadvantages of weighted scoring stock selection models. First,
 18    they cannot identify the relations between weights of stock-picking concepts and
 19    performances of portfolios. Second, they cannot systematically discover the optimal
 20    combination for weights of concepts to optimize the performances. Third, they are
 21    unable to meet various investors' preferences.
 22
 23    This study aims to more efficiently construct weighted scoring stock selection
 24    models to overcome these disadvantages. Since the weights of stock-picking concepts
 25    in a weighted scoring stock selection model can be regarded as components in a
 26    mixture, we used the simplex centroid mixture design to obtain the experimental
 27    sets of weights. These sets of weights are simulated with US stock market
 28    historical data to obtain their performances. Performance prediction models were
 29    built with the simulated performance data set and artificial neural networks.
 30
 31    Furthermore, the optimization models to reflect investors' preferences were built
 32    up, and the performance prediction models were employed as the kernel of the
 33    optimization models so that the optimal solutions can now be solved with
 34    optimization techniques. The empirical values of the performances of the optimal
 35    weighting combinations generated by the optimization models showed that they can
 36    meet various investors' preferences and outperform those of S&P's 500 not only
 37    during the training period but also during the testing period.
 38
 39    {BASE_DATASET_DESCRIPTION}
 40
 41    Features:
 42        bp (float):
 43            Large B/P
 44        roe (float):
 45            Large ROE
 46        sp (float):
 47            Large S/P
 48        return_rate (float):
 49            Large return rate in the last quarter
 50        market_value (float):
 51            Large market value
 52        small_risk (float):
 53            Small systematic risk
 54        orig_annual_return (float):
 55            Annual return
 56        orig_excess_return (float):
 57            Excess return
 58        orig_risk (float):
 59            Systematic risk
 60        orig_total_risk (float):
 61            Total risk
 62        orig_abs_win_rate (float):
 63            Absolute win rate
 64        orig_rel_win_rate (float):
 65            Relative win rate
 66
 67    Targets:
 68        annual_return (float):
 69            Annual return
 70        excess_return (float):
 71            Excess return
 72        risk (float):
 73            Systematic risk
 74        total_risk (float):
 75            Total risk
 76        abs_win_rate (float):
 77            Absolute win rate
 78        rel_win_rate (float):
 79            Relative win rate
 80
 81    Source:
 82        https://archive.ics.uci.edu/ml/datasets/Stock+portfolio+performance
 83
 84    Examples:
 85        Load in the data set::
 86
 87            >>> dataset = Stocks()
 88            >>> dataset.shape
 89            (252, 19)
 90
 91        Split the data set into features and targets, as NumPy arrays::
 92
 93            >>> X, y = dataset.split()
 94            >>> X.shape, y.shape
 95            ((252, 12), (252, 6))
 96
 97        Perform a train/test split, also outputting NumPy arrays::
 98
 99            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
100            >>> X_train, X_test, y_train, y_test = train_test_split
101            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
102            ((197, 12), (197, 6), (55, 12), (55, 6))
103
104        Output the underlying Pandas DataFrame::
105
106            >>> df = dataset.to_pandas()
107            >>> type(df)
108            <class 'pandas.core.frame.DataFrame'>
109    """
110
111    _url = (
112        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
113        "00390/stock%20portfolio%20performance%20data%20set.xlsx"
114    )
115
116    _features = range(12)
117    _targets = range(12, 18)
118
119    def _prep_data(self, data: bytes) -> pd.DataFrame:
120        """Prepare the data set.
121
122        Args:
123            data (bytes): The raw data
124
125        Returns:
126            Pandas dataframe: The prepared data
127        """
128        # Convert the bytes into a file-like object
129        xlsx_file = io.BytesIO(data)
130
131        # Load in the dataframes
132        cols = [
133            "id",
134            "bp",
135            "roe",
136            "sp",
137            "return_rate",
138            "market_value",
139            "small_risk",
140            "orig_annual_return",
141            "orig_excess_return",
142            "orig_risk",
143            "orig_total_risk",
144            "orig_abs_win_rate",
145            "orig_rel_win_rate",
146            "annual_return",
147            "excess_return",
148            "risk",
149            "total_risk",
150            "abs_win_rate",
151            "rel_win_rate",
152        ]
153        sheets = ["1st period", "2nd period", "3rd period", "4th period"]
154        dfs = pd.read_excel(
155            xlsx_file, sheet_name=sheets, names=cols, skiprows=[0, 1], header=None
156        )
157
158        # Concatenate the dataframes
159        df = pd.concat([dfs[sheet] for sheet in sheets], ignore_index=True)
160
161        return df
class Stocks(doubt.datasets.dataset.BaseDataset):

There are three disadvantages of weighted scoring stock selection models. First, they cannot identify the relations between weights of stock-picking concepts and performances of portfolios. Second, they cannot systematically discover the optimal combination for weights of concepts to optimize the performances. Third, they are unable to meet various investors' preferences.

This study aims to more efficiently construct weighted scoring stock selection models to overcome these disadvantages. Since the weights of stock-picking concepts in a weighted scoring stock selection model can be regarded as components in a mixture, we used the simplex centroid mixture design to obtain the experimental sets of weights. These sets of weights are simulated with US stock market historical data to obtain their performances. Performance prediction models were built with the simulated performance data set and artificial neural networks.
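
The simplex centroid design mentioned above has a simple combinatorial structure: for every non-empty subset of the six stock-picking concepts it contains the weight vector that spreads the total weight equally over that subset, giving 2^6 - 1 = 63 weighting combinations, which lines up with the 252 rows reported in the examples below (63 combinations for each of the four periods). As a minimal, illustrative sketch (not code from the doubt library), such a design can be enumerated in Python::

from itertools import combinations

def simplex_centroid(n_components=6):
    """Enumerate the points of a simplex centroid design.

    For every non-empty subset of the components, the design contains the
    weight vector that spreads the total weight equally over that subset,
    giving 2 ** n_components - 1 points in total.
    """
    points = []
    for size in range(1, n_components + 1):
        for subset in combinations(range(n_components), size):
            weights = [0.0] * n_components
            for idx in subset:
                weights[idx] = 1.0 / size
            points.append(tuple(weights))
    return points

weight_sets = simplex_centroid(6)
print(len(weight_sets))  # 63 weighting combinations per period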

Furthermore, the optimization models to reflect investors' preferences were built up, and the performance prediction models were employed as the kernel of the optimization models so that the optimal solutions can now be solved with optimization techniques. The empirical values of the performances of the optimal weighting combinations generated by the optimization models showed that they can meet various investors' preferences and outperform those of S&P's 500 not only during the training period but also during the testing period.

Arguments:
  • cache (str or None, optional): The name of the cache file. It will be saved under that name in the current working directory. If None, no cache will be saved. Defaults to '.dataset_cache'.
Attributes:
  • cache (str or None): The name of the cache.
  • shape (tuple of integers): Dimensions of the data set
  • columns (list of strings): List of column names in the data set
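
As an illustrative sketch of the constructor argument and the attributes listed above (assuming nothing beyond the cache keyword and the shape and columns attributes documented here), caching can be switched off like so::

>>> dataset = Stocks(cache=None)  # no cache file is written
>>> dataset.shape
(252, 19)
>>> len(dataset.columns)
19
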
Features:
  • bp (float): Large B/P
  • roe (float): Large ROE
  • sp (float): Large S/P
  • return_rate (float): Large return rate in the last quarter
  • market_value (float): Large market value
  • small_risk (float): Small systematic risk
  • orig_annual_return (float): Annual return
  • orig_excess_return (float): Excess return
  • orig_risk (float): Systematic risk
  • orig_total_risk (float): Total risk
  • orig_abs_win_rate (float): Absolute win rate
  • orig_rel_win_rate (float): Relative win rate

Targets:
  • annual_return (float): Annual return
  • excess_return (float): Excess return
  • risk (float): Systematic risk
  • total_risk (float): Total risk
  • abs_win_rate (float): Absolute win rate
  • rel_win_rate (float): Relative win rate

Source:

https://archive.ics.uci.edu/ml/datasets/Stock+portfolio+performance

Examples:

Load in the data set::

>>> dataset = Stocks()
>>> dataset.shape
(252, 19)

Split the data set into features and targets, as NumPy arrays::

>>> X, y = dataset.split()
>>> X.shape, y.shape
((252, 12), (252, 6))

Perform a train/test split, also outputting NumPy arrays::

>>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
>>> X_train, X_test, y_train, y_test = train_test_split
>>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
((197, 12), (197, 6), (55, 12), (55, 6))

Output the underlying Pandas DataFrame::

>>> df = dataset.to_pandas()
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
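
Assuming the DataFrame returned by to_pandas uses the column names listed under Features and Targets above (a sketch, not verified output from the library), the target columns can also be selected directly::

>>> target_cols = [
...     "annual_return", "excess_return", "risk",
...     "total_risk", "abs_win_rate", "rel_win_rate",
... ]
>>> df[target_cols].shape
(252, 6)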