doubt.datasets.yacht
Yacht data set.
This data set is from the UCI data set archive, with the description being the original description verbatim. Some feature names may have been altered, based on the description.
"""Yacht data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class Yacht(BaseDataset):
    __doc__ = f"""
    Prediction of residuary resistance of sailing yachts at the initial design stage is
    of a great value for evaluating the ship's performance and for estimating the
    required propulsive power. Essential inputs include the basic hull dimensions and
    the boat velocity.

    The Delft data set comprises 251 full-scale experiments, which were performed at
    the Delft Ship Hydromechanics Laboratory for that purpose.

    These experiments include 22 different hull forms, derived from a parent form
    closely related to the "Standfast 43" designed by Frans Maas.

    {BASE_DATASET_DESCRIPTION}

    Features:
        pos (float):
            Longitudinal position of the center of buoyancy, adimensional
        prismatic (float):
            Prismatic coefficient, adimensional
        displacement (float):
            Length-displacement ratio, adimensional
        beam_draught (float):
            Beam-draught ratio, adimensional
        length_beam (float):
            Length-beam ratio, adimensional
        froude_no (float):
            Froude number, adimensional

    Targets:
        resistance (float):
            Residuary resistance per unit weight of displacement, adimensional

    Source:
        https://archive.ics.uci.edu/ml/datasets/Yacht+Hydrodynamics

    Examples:
        Load in the data set::

            >>> dataset = Yacht()
            >>> dataset.shape
            (251, 7)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((251, 6), (251,))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((196, 6), (196,), (55, 6), (55,))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "00243/yacht_hydrodynamics.data"
    )

    _features = range(6)
    _targets = [6]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Convert the bytes into a file-like object
        txt_file = io.BytesIO(data)

        # Load it into dataframe
        cols = [
            "pos",
            "prismatic",
            "displacement",
            "beam_draught",
            "length_beam",
            "froude_no",
            "resistance",
        ]
        df = pd.read_csv(
            txt_file,
            header=None,
            sep=" ",
            names=cols,
            on_bad_lines="skip",
        )
        return df
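The _prep_data method in the source reads the raw bytes as a headerless table separated by single spaces and skips malformed lines. The sketch below is a rough standard-library analogue of that idea, not the library's implementation: the helper name parse_rows and the sample bytes are hypothetical, and pandas differs in detail (for example, it fills short rows with NaN rather than None).

```python
import csv
import io

# Column names as listed in the source of _prep_data.
COLS = [
    "pos",
    "prismatic",
    "displacement",
    "beam_draught",
    "length_beam",
    "froude_no",
    "resistance",
]


def parse_rows(data: bytes, cols=COLS):
    """Parse single-space-separated bytes into a list of dicts (hypothetical helper).

    Rows with more fields than column names are skipped, loosely mirroring
    on_bad_lines="skip"; shorter rows are padded with None.
    """
    text = io.StringIO(data.decode("utf-8"))
    rows = []
    for fields in csv.reader(text, delimiter=" "):
        if not fields:
            # Blank line: nothing to parse.
            continue
        if len(fields) > len(cols):
            # Too many fields, e.g. because of a doubled space: skip the row.
            continue
        padded = fields + [None] * (len(cols) - len(fields))
        rows.append(
            {
                name: (float(value) if value not in (None, "") else None)
                for name, value in zip(cols, padded)
            }
        )
    return rows


# Illustrative sample only. The second line contains a doubled space,
# which produces an extra empty field, so that row is skipped.
sample = (
    b"-2.3 0.568 4.78 3.99 3.17 0.125 0.11\n"
    b"-2.3 0.568 4.78 3.99 3.17 0.150  0.27\n"
    b"-2.3 0.568 4.78 3.99 3.17 0.175 0.47\n"
)
rows = parse_rows(sample)
print(len(rows))
print(rows[0]["froude_no"], rows[0]["resistance"])
```

Skipping only rows with too many fields is a deliberate choice here, since that is what pandas documents as a "bad line"; whether this accounts for the row count reported above is not confirmed by the source.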
Prediction of residuary resistance of sailing yachts at the initial design stage is of a great value for evaluating the ship's performance and for estimating the required propulsive power. Essential inputs include the basic hull dimensions and the boat velocity.
The Delft data set comprises 251 full-scale experiments, which were performed at the Delft Ship Hydromechanics Laboratory for that purpose.
These experiments include 22 different hull forms, derived from a parent form closely related to the "Standfast 43" designed by Frans Maas.
Arguments:
- cache (str or None, optional): The name of the cache. It will be saved to cache in the current working directory. If None then no cache will be saved. Defaults to '.dataset_cache'.
Attributes:
- cache (str or None): The name of the cache.
- shape (tuple of integers): Dimensions of the data set
- columns (list of strings): List of column names in the data set
Features:
pos (float): Longitudinal position of the center of buoyancy, adimensional
prismatic (float): Prismatic coefficient, adimensional
displacement (float): Length-displacement ratio, adimensional
beam_draught (float): Beam-draught ratio, adimensional
length_beam (float): Length-beam ratio, adimensional
froude_no (float): Froude number, adimensional
Targets:
resistance (float): Residuary resistance per unit weight of displacement, adimensional
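Most of the features are dimensionless ratios of hull geometry and speed. As an illustration only, the sketch below computes five of them from hypothetical hull measurements, using the textbook definitions: Froude number v / sqrt(g L), prismatic coefficient V / (A L) with A the maximum section area, and length-displacement ratio L / V^(1/3). The function name, argument names, and sample numbers are assumptions and do not come from the data set; pos is omitted because the source does not state its reference point.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def hull_features(length, beam, draught, volume, max_section_area, speed):
    """Compute dimensionless hull descriptors (hypothetical helper).

    Lengths are in metres, volume in cubic metres, area in square metres,
    and speed in metres per second.
    """
    return {
        # Displaced volume relative to a prism of the maximum section area.
        "prismatic": volume / (max_section_area * length),
        # Waterline length over the cube root of the displaced volume.
        "displacement": length / volume ** (1 / 3),
        "beam_draught": beam / draught,
        "length_beam": length / beam,
        # Speed made dimensionless with gravity and waterline length.
        "froude_no": speed / math.sqrt(G * length),
    }


# Made-up hull dimensions, chosen only to give plausible magnitudes.
features = hull_features(
    length=10.0,
    beam=3.0,
    draught=0.75,
    volume=8.0,
    max_section_area=1.4,
    speed=3.0,
)
for name, value in features.items():
    print(f"{name}: {value:.3f}")
```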
Source:
https://archive.ics.uci.edu/ml/datasets/Yacht+Hydrodynamics
Examples:
Load in the data set::
    >>> dataset = Yacht()
    >>> dataset.shape
    (251, 7)
Split the data set into features and targets, as NumPy arrays::
    >>> X, y = dataset.split()
    >>> X.shape, y.shape
    ((251, 6), (251,))
Perform a train/test split, also outputting NumPy arrays::
    >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
    >>> X_train, X_test, y_train, y_test = train_test_split
    >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
    ((196, 6), (196,), (55, 6), (55,))
Output the underlying Pandas DataFrame::
    >>> df = dataset.to_pandas()
    >>> type(df)
    <class 'pandas.core.frame.DataFrame'>