doubt.datasets.yacht

Yacht data set.

This data set is from the UCI data set archive, with the description being the original description verbatim. Some feature names may have been altered, based on the description.

"""Yacht data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class Yacht(BaseDataset):
    __doc__ = f"""
    Prediction of residuary resistance of sailing yachts at the initial design stage is
    of a great value for evaluating the ship's performance and for estimating the
    required propulsive power. Essential inputs include the basic hull dimensions and
    the boat velocity.

    The Delft data set comprises 251 full-scale experiments, which were performed at
    the Delft Ship Hydromechanics Laboratory for that purpose.

    These experiments include 22 different hull forms, derived from a parent form
    closely related to the "Standfast 43" designed by Frans Maas.

    {BASE_DATASET_DESCRIPTION}

    Features:
        pos (float):
            Longitudinal position of the center of buoyancy, adimensional
        prismatic (float):
            Prismatic coefficient, adimensional
        displacement (float):
            Length-displacement ratio, adimensional
        beam_draught (float):
            Beam-draught ratio, adimensional
        length_beam (float):
            Length-beam ratio, adimensional
        froude_no (float):
            Froude number, adimensional

    Targets:
        resistance (float):
            Residuary resistance per unit weight of displacement, adimensional

    Source:
        https://archive.ics.uci.edu/ml/datasets/Yacht+Hydrodynamics

    Examples:
        Load in the data set::

            >>> dataset = Yacht()
            >>> dataset.shape
            (251, 7)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((251, 6), (251,))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((196, 6), (196,), (55, 6), (55,))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "00243/yacht_hydrodynamics.data"
    )

    _features = range(6)
    _targets = [6]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Convert the bytes into a file-like object
        txt_file = io.BytesIO(data)

        # Load it into dataframe
        cols = [
            "pos",
            "prismatic",
            "displacement",
            "beam_draught",
            "length_beam",
            "froude_no",
            "resistance",
        ]
        df = pd.read_csv(
            txt_file,
            header=None,
            sep=" ",
            names=cols,
            on_bad_lines="skip",
        )
        return df
class Yacht(doubt.datasets.dataset.BaseDataset):

Prediction of residuary resistance of sailing yachts at the initial design stage is of a great value for evaluating the ship's performance and for estimating the required propulsive power. Essential inputs include the basic hull dimensions and the boat velocity.

The Delft data set comprises 251 full-scale experiments, which were performed at the Delft Ship Hydromechanics Laboratory for that purpose.

These experiments include 22 different hull forms, derived from a parent form closely related to the "Standfast 43" designed by Frans Maas.

Arguments:
  • cache (str or None, optional): The name of the cache, which is saved under that name in the current working directory. If None, no cache is saved. Defaults to '.dataset_cache'.
Attributes:
  • cache (str or None): The name of the cache.
  • shape (tuple of integers): Dimensions of the data set
  • columns (list of strings): List of column names in the data set
Features:
  • pos (float): Longitudinal position of the center of buoyancy, adimensional
  • prismatic (float): Prismatic coefficient, adimensional
  • displacement (float): Length-displacement ratio, adimensional
  • beam_draught (float): Beam-draught ratio, adimensional
  • length_beam (float): Length-beam ratio, adimensional
  • froude_no (float): Froude number, adimensional
Targets:
  • resistance (float): Residuary resistance per unit weight of displacement, adimensional

Source:

https://archive.ics.uci.edu/ml/datasets/Yacht+Hydrodynamics

Examples:

Load in the data set::

>>> dataset = Yacht()
>>> dataset.shape
(251, 7)

Split the data set into features and targets, as NumPy arrays::

>>> X, y = dataset.split()
>>> X.shape, y.shape
((251, 6), (251,))
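
The shapes above follow from the class attributes `_features = range(6)` and `_targets = [6]`, which are positional column indices. How `BaseDataset.split` uses them is not shown on this page, so the following is only a sketch of one way such indices could select columns, using a small hand-made frame (the values and the names `feature_idx` and `target_idx` are illustrative assumptions):

```python
import pandas as pd

cols = [
    "pos", "prismatic", "displacement",
    "beam_draught", "length_beam", "froude_no", "resistance",
]

# Hypothetical rows in the same seven-column layout as the yacht data.
df = pd.DataFrame(
    [
        [-2.3, 0.568, 4.78, 3.99, 3.17, 0.125, 0.11],
        [-2.3, 0.568, 4.78, 3.99, 3.17, 0.150, 0.27],
        [-2.3, 0.568, 4.78, 3.99, 3.17, 0.175, 0.47],
    ],
    columns=cols,
)

feature_idx = range(6)  # mirrors Yacht._features
target_idx = [6]        # mirrors Yacht._targets

# Select columns by position and convert to NumPy arrays.
X = df.iloc[:, list(feature_idx)].to_numpy()
# A single target column is flattened from shape (n, 1) to (n,).
y = df.iloc[:, target_idx].to_numpy().squeeze(axis=1)

print(X.shape, y.shape)
```

With one target column, flattening `y` gives the one-dimensional shape seen in the example above.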

Perform a train/test split, also outputting NumPy arrays::

>>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
>>> X_train, X_test, y_train, y_test = train_test_split
>>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
((196, 6), (196,), (55, 6), (55,))

Output the underlying Pandas DataFrame::

>>> df = dataset.to_pandas()
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
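
The parsing step in `_prep_data` can be tried without downloading anything. The following is a minimal sketch, assuming a made-up three-row sample in the same space-separated layout (the `sample` bytes are illustrative, not taken from the real file) and pandas 1.3 or later for `on_bad_lines`. It passes the same `pd.read_csv` arguments as the module source. With `sep=" "`, a doubled space produces an extra empty field, so that row has more fields than expected and is skipped rather than raising an error.

```python
import io

import pandas as pd

# Hypothetical sample; the third row contains a doubled space.
sample = (
    b"-2.3 0.568 4.78 3.99 3.17 0.125 0.11\n"
    b"-2.3 0.568 4.78 3.99 3.17 0.150 0.27\n"
    b"-2.3 0.568 4.78 3.99 3.17  0.175 0.47\n"
)

cols = [
    "pos", "prismatic", "displacement",
    "beam_draught", "length_beam", "froude_no", "resistance",
]

# Same arguments as in Yacht._prep_data.
df = pd.read_csv(
    io.BytesIO(sample),
    header=None,
    sep=" ",
    names=cols,
    on_bad_lines="skip",
)

print(df.shape)
print(df.columns.tolist())
```

If dropping such rows is not wanted, `sep=r"\s+"` would treat any run of whitespace as a single delimiter. That is a possible alternative, not what the module does.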