doubt.datasets.airfoil
Airfoil data set.
This data set is from the UCI data set archive, and the description is the original description, verbatim. Some feature names may have been altered based on the description.
```python
"""Airfoil data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class Airfoil(BaseDataset):
    __doc__ = f"""
    The NASA data set comprises different size NACA 0012 airfoils at various wind
    tunnel speeds and angles of attack. The span of the airfoil and the observer
    position were the same in all of the experiments.

    {BASE_DATASET_DESCRIPTION}

    Features:
        int:
            Frequency, in hertz
        float:
            Angle of attack, in degrees
        float:
            Chord length, in meters
        float:
            Free-stream velocity, in meters per second
        float:
            Suction side displacement thickness, in meters

    Targets:
        float:
            Scaled sound pressure level, in decibels

    Source:
        https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise

    Examples:
        Load in the data set::

            >>> dataset = Airfoil()
            >>> dataset.shape
            (1503, 6)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((1503, 5), (1503,))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((1181, 5), (1181,), (322, 5), (322,))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "00291/airfoil_self_noise.dat"
    )

    _features = range(5)
    _targets = [5]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Convert the bytes into a file-like object
        csv_file = io.BytesIO(data)

        # Read the file-like object into a data frame
        df = pd.read_csv(csv_file, sep="\t", header=None)
        return df
```
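The `_prep_data` method only wraps the raw bytes in a file-like object and hands them to `pd.read_csv` with a tab separator and no header row. The sketch below runs the same two calls on a small byte string in the same layout (six tab-separated numeric columns per line). The sample values are made up for illustration and are not rows from the real file.

```python
import io

import pandas as pd

# Made-up sample in the same layout as airfoil_self_noise.dat:
# five feature columns followed by one target column, tab-separated, no header.
raw = b"500\t2.0\t0.1\t40.0\t0.002\t120.5\n630\t4.0\t0.1\t40.0\t0.003\t118.0\n"

# Same steps as Airfoil._prep_data: wrap the bytes, then parse them.
csv_file = io.BytesIO(raw)
df = pd.read_csv(csv_file, sep="\t", header=None)

print(df.shape)          # (2, 6)
print(list(df.columns))  # [0, 1, 2, 3, 4, 5]
```

Because `header=None` is passed, pandas labels the columns with integers, which is what lets the class refer to features and targets by position (`_features = range(5)`, `_targets = [5]`).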
The NASA data set comprises different size NACA 0012 airfoils at various wind tunnel speeds and angles of attack. The span of the airfoil and the observer position were the same in all of the experiments.
Arguments:
- cache (str or None, optional): The name of the cache. It will be saved to `cache` in the current working directory. If None, no cache will be saved. Defaults to '.dataset_cache'.
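The `cache` argument describes a download-once behaviour: data are stored under a name in the current working directory, and passing None disables storage. The following is a minimal sketch of that behaviour only, not `doubt`'s actual implementation. The function `load_raw`, its `fetch` parameter, and the choice to treat `cache` as a directory holding one file per data set (named after the last URL segment) are all hypothetical. A fake fetcher stands in for the network so the sketch runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable, Optional


def load_raw(
    url: str,
    fetch: Callable[[str], bytes],
    cache: Optional[str] = ".dataset_cache",
) -> bytes:
    """Hypothetical download-once cache; not doubt's actual implementation."""
    if cache is None:
        # No cache: always fetch fresh and store nothing on disk.
        return fetch(url)

    # Assumption: one file per data set, named after the last URL segment.
    path = Path(cache) / url.rsplit("/", 1)[-1]
    if path.exists():
        # Cache hit: reuse the bytes saved by an earlier call.
        return path.read_bytes()

    data = fetch(url)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data


calls = []


def fake_fetch(url: str) -> bytes:
    # Stand-in for a network download; records each call.
    calls.append(url)
    return b"500\t2.0\t0.1\t40.0\t0.002\t120.5\n"


url = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "00291/airfoil_self_noise.dat"
)
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = str(Path(tmp) / ".dataset_cache")
    first = load_raw(url, fake_fetch, cache_dir)
    second = load_raw(url, fake_fetch, cache_dir)  # served from disk
    cached_names = sorted(p.name for p in Path(cache_dir).iterdir())

print(len(calls))  # 1
```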
Attributes:
- cache (str or None): The name of the cache.
- shape (tuple of integers): Dimensions of the data set
- columns (list of strings): List of column names in the data set
Features:
int: Frequency, in hertz
float: Angle of attack, in degrees
float: Chord length, in meters
float: Free-stream velocity, in meters per second
float: Suction side displacement thickness, in meters
Targets:
float: Scaled sound pressure level, in decibels
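Since `_prep_data` reads the file with `header=None`, the frame's columns are labelled 0 to 5. The Features and Targets lists give the meaning of each position in order, so descriptive labels can be attached after loading. The sketch below does this with `DataFrame.rename`. The label strings are hypothetical names chosen for illustration, and the two rows are made-up values.

```python
import pandas as pd

# Hypothetical descriptive names, in the column order given by the
# Features and Targets lists (five features, then the target).
COLUMN_NAMES = [
    "frequency_hz",
    "angle_of_attack_deg",
    "chord_length_m",
    "free_stream_velocity_m_per_s",
    "suction_side_displacement_thickness_m",
    "scaled_sound_pressure_level_db",
]

# Two made-up rows with integer column labels 0-5, as read_csv(header=None) yields.
df = pd.DataFrame(
    [[500, 2.0, 0.1, 40.0, 0.002, 120.5], [630, 4.0, 0.1, 40.0, 0.003, 118.0]]
)

# Map each positional label to its descriptive name; df itself is unchanged.
named = df.rename(columns=dict(enumerate(COLUMN_NAMES)))

print(named.columns[-1])  # scaled_sound_pressure_level_db
```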
Source:
https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise
Examples:
Load in the data set::
    >>> dataset = Airfoil()
    >>> dataset.shape
    (1503, 6)
Split the data set into features and targets, as NumPy arrays::
    >>> X, y = dataset.split()
    >>> X.shape, y.shape
    ((1503, 5), (1503,))
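The class attributes `_features = range(5)` and `_targets = [5]` are column positions. Selecting them from a six-column frame gives the `(n, 5)` feature matrix and `(n,)` target vector shown above. The following is a hedged sketch of that mapping on a made-up three-row frame; it is not necessarily how `BaseDataset.split` is implemented.

```python
import numpy as np
import pandas as pd

# Column positions as declared on the Airfoil class.
_features = range(5)
_targets = [5]

# Made-up frame with the same six-column layout (three rows).
df = pd.DataFrame(np.arange(18, dtype=float).reshape(3, 6))

# Feature columns as a 2-D array of shape (n, 5).
X = df.iloc[:, list(_features)].to_numpy()
# The single target column comes back as (n, 1); squeeze it to (n,).
y = df.iloc[:, _targets].to_numpy().squeeze(axis=1)

print(X.shape, y.shape)  # (3, 5) (3,)
```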
Perform a train/test split, also outputting NumPy arrays::
    >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
    >>> X_train, X_test, y_train, y_test = train_test_split
    >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
    ((1181, 5), (1181,), (322, 5), (322,))
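In this example the test set holds 322 of 1503 rows (about 21.4%) rather than exactly 20%. One scheme that produces such approximate sizes is to send each row to the test set independently with probability `test_size`. The sketch below assumes that scheme. It uses NumPy's `default_rng`, so it will not reproduce the 1181/322 counts, and it is not claimed to be `doubt`'s implementation. The function name `random_mask_split` is hypothetical, and the arrays are zero-filled placeholders with the Airfoil shapes.

```python
import numpy as np


def random_mask_split(X, y, test_size=0.2, random_seed=None):
    """Hypothetical split: each row goes to the test set with probability test_size.

    The test-set size is therefore only approximately test_size * len(X).
    """
    rng = np.random.default_rng(random_seed)
    # One uniform draw per row; rows drawing below test_size become test rows.
    test_mask = rng.random(len(X)) < test_size
    return X[~test_mask], X[test_mask], y[~test_mask], y[test_mask]


# Placeholder arrays with the Airfoil shapes: 1503 rows, five features.
X = np.zeros((1503, 5))
y = np.zeros(1503)

X_train, X_test, y_train, y_test = random_mask_split(
    X, y, test_size=0.2, random_seed=42
)
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

A split that needs an exact test size would instead shuffle the row indices and take the first `round(test_size * n)` of them.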
Output the underlying Pandas DataFrame::
    >>> df = dataset.to_pandas()
    >>> type(df)
    <class 'pandas.core.frame.DataFrame'>