doubt.datasets.airfoil

Airfoil data set.

This data set is from the UCI data set archive, with the description being the original description verbatim. Some feature names may have been altered, based on the description.

"""Airfoil data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class Airfoil(BaseDataset):
    __doc__ = f"""
    The NASA data set comprises different size NACA 0012 airfoils at various wind
    tunnel speeds and angles of attack. The span of the airfoil and the observer
    position were the same in all of the experiments.

    {BASE_DATASET_DESCRIPTION}

    Features:
        int:
            Frequency, in Hertz
        float:
            Angle of attack, in degrees
        float:
            Chord length, in meters
        float:
            Free-stream velocity, in meters per second
        float:
            Suction side displacement thickness, in meters

    Targets:
        float:
            Scaled sound pressure level, in decibels

    Source:
        https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise

    Examples:
        Load in the data set::

            >>> dataset = Airfoil()
            >>> dataset.shape
            (1503, 6)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((1503, 5), (1503,))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((1181, 5), (1181,), (322, 5), (322,))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "00291/airfoil_self_noise.dat"
    )

    _features = range(5)
    _targets = [5]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Convert the bytes into a file-like object
        csv_file = io.BytesIO(data)

        # Read the tab-separated, headerless file-like object into a data frame
        df = pd.read_csv(csv_file, sep="\t", header=None)
        return df
class Airfoil(doubt.datasets.dataset.BaseDataset):

The NASA data set comprises different size NACA 0012 airfoils at various wind tunnel speeds and angles of attack. The span of the airfoil and the observer position were the same in all of the experiments.

Arguments:
  • cache (str or None, optional): The name of the cache, which is saved under that name in the current working directory. If None, no cache is saved. Defaults to '.dataset_cache'.

Attributes:
  • cache (str or None): The name of the cache.
  • shape (tuple of integers): Dimensions of the data set.
  • columns (list of strings): List of column names in the data set.

Features:
  • int: Frequency, in Hertz
  • float: Angle of attack, in degrees
  • float: Chord length, in meters
  • float: Free-stream velocity, in meters per second
  • float: Suction side displacement thickness, in meters

Targets:
  • float: Scaled sound pressure level, in decibels

Source:

https://archive.ics.uci.edu/ml/datasets/Airfoil+Self-Noise

Examples:

Load in the data set:

>>> dataset = Airfoil()
>>> dataset.shape
(1503, 6)

Split the data set into features and targets, as NumPy arrays:

>>> X, y = dataset.split()
>>> X.shape, y.shape
((1503, 5), (1503,))
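
The shapes above follow from the class attributes `_features = range(5)` and `_targets = [5]`: the first five columns are features and the sixth is the target. How `split()` selects them internally is not shown on this page, so the following is only a minimal sketch of positional column selection with NumPy. The rows, and the names `data`, `feature_idxs` and `target_idxs`, are illustrative assumptions.

```python
import numpy as np

# Illustrative rows in the six-column layout (five features, then one target).
# The values are examples only, not a guaranteed excerpt of the data set.
data = np.array(
    [
        [800.0, 0.0, 0.3048, 71.3, 0.00266337, 126.201],
        [1000.0, 0.0, 0.3048, 71.3, 0.00266337, 125.201],
        [1250.0, 0.0, 0.3048, 71.3, 0.00266337, 125.951],
    ]
)

feature_idxs = list(range(5))  # mirrors `_features = range(5)`
target_idxs = [5]  # mirrors `_targets = [5]`

# Select the feature columns by position
X = data[:, feature_idxs]

# Indexing with a one-element list keeps a trailing axis of length 1,
# so squeeze that axis to obtain a one-dimensional target array
y = data[:, target_idxs].squeeze(axis=1)

print(X.shape, y.shape)
```

With a single target column, squeezing the last axis is what turns an `(n, 1)` selection into the `(n,)` shape seen in the example.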

Perform a train/test split, also outputting NumPy arrays:

>>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
>>> X_train, X_test, y_train, y_test = train_test_split
>>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
((1181, 5), (1181,), (322, 5), (322,))
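
Note that the test set above has 322 rows, not the 301 that an exact 20% cut of 1503 rows would give. One way such a count can arise is if each row is assigned to the test set independently with probability `test_size`. That is an assumption about the implementation, which this page does not show. The sketch below illustrates the idea with the standard library only; `random_split_mask` is a hypothetical helper, not part of the library.

```python
import random


def random_split_mask(n_rows, test_size, random_seed):
    """Return a list of booleans, True where a row goes to the test set.

    Hypothetical helper: each row is drawn independently, so the test set
    size is only approximately `test_size * n_rows`.
    """
    rng = random.Random(random_seed)
    return [rng.random() < test_size for _ in range(n_rows)]


mask = random_split_mask(n_rows=1503, test_size=0.2, random_seed=42)
n_test = sum(mask)
n_train = len(mask) - n_test

# The exact counts depend on the random number generator, so they will
# generally differ from the 1181 / 322 shown in the example above.
print(n_train, n_test)
```

Fixing `random_seed` makes the mask reproducible, which is why the example passes a seed when reporting concrete shapes.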

Output the underlying Pandas DataFrame:

>>> df = dataset.to_pandas()
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
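
The `_prep_data` method in the source listing wraps the downloaded bytes in `io.BytesIO` and reads them with `pd.read_csv(..., sep="\t", header=None)`. As a small sketch of what that call produces, the snippet below parses two illustrative tab-separated rows. The byte string `raw` is an assumption made for demonstration and is not downloaded from the source URL.

```python
import io

import pandas as pd

# Illustrative tab-separated bytes: two rows, six columns, no header row
raw = (
    b"800\t0\t0.3048\t71.3\t0.00266337\t126.201\n"
    b"1000\t0\t0.3048\t71.3\t0.00266337\t125.201\n"
)

# Same call pattern as `_prep_data`: bytes -> file-like object -> data frame
df = pd.read_csv(io.BytesIO(raw), sep="\t", header=None)

# With header=None, pandas labels the columns with integers starting at 0,
# which is why `_features` and `_targets` can refer to columns by position
print(df.shape)
print(list(df.columns))
```

Because the file has no header row, passing `header=None` keeps the first record as data rather than consuming it as column names.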