doubt.datasets.bike_sharing_hourly

Hourly bike sharing data set.

This data set is from the UCI data set archive, with the description being the original description verbatim. Some feature names may have been altered, based on the description.

"""Hourly bike sharing data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io
import zipfile

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class BikeSharingHourly(BaseDataset):
    __doc__ = f"""
    Bike sharing systems are new generation of traditional bike rentals where whole
    process from membership, rental and return back has become automatic. Through these
    systems, user is able to easily rent a bike from a particular position and return
    back at another position. Currently, there are about over 500 bike-sharing programs
    around the world which is composed of over 500 thousands bicycles. Today, there
    exists great interest in these systems due to their important role in traffic,
    environmental and health issues.

    Apart from interesting real world applications of bike sharing systems, the
    characteristics of data being generated by these systems make them attractive for
    the research. Opposed to other transport services such as bus or subway, the
    duration of travel, departure and arrival position is explicitly recorded in these
    systems. This feature turns bike sharing system into a virtual sensor network that
    can be used for sensing mobility in the city. Hence, it is expected that most of
    important events in the city could be detected via monitoring these data.

    {BASE_DATASET_DESCRIPTION}

    Features:
        instant (int):
            Record index
        season (int):
            The season, with 1 = winter, 2 = spring, 3 = summer and 4 = autumn
        yr (int):
            The year, with 0 = 2011 and 1 = 2012
        mnth (int):
            The month, from 1 to 12 inclusive
        hr (int):
            The hour of the day, from 0 to 23 inclusive
        holiday (int):
            Whether day is a holiday or not, binary valued
        weekday (int):
            The day of the week, from 0 to 6 inclusive
        workingday (int):
            Working day, 1 if day is neither weekend nor holiday, otherwise 0
        weathersit (int):
            Weather, encoded as

            1. Clear, few clouds, partly cloudy
            2. Mist and cloudy, mist and broken clouds, mist and few clouds
            3. Light snow, light rain and thunderstorm and scattered clouds, light rain
            and scattered clouds
            4. Heavy rain and ice pallets and thunderstorm and mist, or snow and fog

        temp (float):
            Max-min normalised temperature in Celsius, from -8 to +39
        atemp (float):
            Max-min normalised feeling temperature in Celsius, from -16 to +50
        hum (float):
            Scaled max-min normalised humidity, from 0 to 1
        windspeed (float):
            Scaled max-min normalised wind speed, from 0 to 1

    Targets:
        casual (int):
            Count of casual users
        registered (int):
            Count of registered users
        cnt (int):
            Sum of casual and registered users

    Source:
        https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

    Examples:
        Load in the data set::

            >>> dataset = BikeSharingHourly()
            >>> dataset.shape
            (17379, 16)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((17379, 13), (17379, 3))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((13873, 13), (13873, 3), (3506, 13), (3506, 3))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "00275/Bike-Sharing-Dataset.zip"
    )

    _features = range(13)
    _targets = [13, 14, 15]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Convert the bytes into a file-like object
        buffer = io.BytesIO(data)

        # Unzip the file and pull out hour.csv as a string
        with zipfile.ZipFile(buffer, "r") as zip_file:
            csv = zip_file.read("hour.csv").decode("utf-8")

        # Convert the string into a file-like object
        csv_file = io.StringIO(csv)

        # Read the file-like object into a dataframe
        cols = [0] + list(range(2, 17))
        df = pd.read_csv(csv_file, usecols=cols)
        return df
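
The unzip-and-parse steps in `_prep_data` can be tried without downloading anything. The sketch below is not part of the library: the helper name `read_csv_from_zip` is hypothetical, the archive is a tiny stand-in built in memory with made-up rows, and it streams the zip member straight into pandas instead of decoding it to a string first.

```python
import io
import zipfile

import pandas as pd


def read_csv_from_zip(data: bytes, member: str, usecols=None) -> pd.DataFrame:
    """Read one CSV member of an in-memory zip archive into a DataFrame.

    Hypothetical helper for illustration; not part of the doubt package.
    """
    with zipfile.ZipFile(io.BytesIO(data), "r") as zip_file:
        # ZipFile.open returns a binary file-like object that pandas can parse
        with zip_file.open(member) as csv_file:
            return pd.read_csv(csv_file, usecols=usecols)


# Build a tiny stand-in archive (made-up rows, NOT the real UCI data)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zip_file:
    zip_file.writestr(
        "hour.csv",
        "instant,dteday,season\n1,2011-01-01,1\n2,2011-01-01,1\n",
    )

# Keep column 0 and skip column 1 (the date string), as _prep_data does
df = read_csv_from_zip(buffer.getvalue(), "hour.csv", usecols=[0, 2])
print(df.columns.tolist())
```

Selecting columns by position with `usecols` mirrors the `cols = [0] + list(range(2, 17))` line above, which drops the second column of `hour.csv` and keeps the remaining 16.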
class BikeSharingHourly(doubt.datasets.dataset.BaseDataset):

Bike sharing systems are new generation of traditional bike rentals where whole process from membership, rental and return back has become automatic. Through these systems, user is able to easily rent a bike from a particular position and return back at another position. Currently, there are about over 500 bike-sharing programs around the world which is composed of over 500 thousands bicycles. Today, there exists great interest in these systems due to their important role in traffic, environmental and health issues.

Apart from interesting real world applications of bike sharing systems, the characteristics of data being generated by these systems make them attractive for the research. Opposed to other transport services such as bus or subway, the duration of travel, departure and arrival position is explicitly recorded in these systems. This feature turns bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most of important events in the city could be detected via monitoring these data.

Arguments:
  • cache (str or None, optional): The name of the cache. It will be saved to cache in the current working directory. If None then no cache will be saved. Defaults to '.dataset_cache'.
Attributes:
  • cache (str or None): The name of the cache.
  • shape (tuple of integers): Dimensions of the data set
  • columns (list of strings): List of column names in the data set
Features:

  • instant (int): Record index
  • season (int): The season, with 1 = winter, 2 = spring, 3 = summer and 4 = autumn
  • yr (int): The year, with 0 = 2011 and 1 = 2012
  • mnth (int): The month, from 1 to 12 inclusive
  • hr (int): The hour of the day, from 0 to 23 inclusive
  • holiday (int): Whether day is a holiday or not, binary valued
  • weekday (int): The day of the week, from 0 to 6 inclusive
  • workingday (int): Working day, 1 if day is neither weekend nor holiday, otherwise 0
  • weathersit (int): Weather, encoded as
      1. Clear, few clouds, partly cloudy
      2. Mist and cloudy, mist and broken clouds, mist and few clouds
      3. Light snow, light rain and thunderstorm and scattered clouds, light rain and scattered clouds
      4. Heavy rain and ice pallets and thunderstorm and mist, or snow and fog
  • temp (float): Max-min normalised temperature in Celsius, from -8 to +39
  • atemp (float): Max-min normalised feeling temperature in Celsius, from -16 to +50
  • hum (float): Scaled max-min normalised humidity, from 0 to 1
  • windspeed (float): Scaled max-min normalised wind speed, from 0 to 1
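
Since `temp` and `atemp` are stored max-min normalised, recovering degrees Celsius means inverting that scaling. The following is a minimal sketch that assumes the normalisation is `(t - t_min) / (t_max - t_min)` with the ranges listed above. The `denormalise` helper and the sample values are made up for illustration and are not part of the library or the data set.

```python
import pandas as pd

# (min, max) ranges in degrees Celsius, taken from the feature descriptions
TEMP_RANGE = (-8.0, 39.0)
ATEMP_RANGE = (-16.0, 50.0)


def denormalise(values: pd.Series, lower: float, upper: float) -> pd.Series:
    """Invert max-min normalisation: x = x_norm * (upper - lower) + lower.

    Hypothetical helper; assumes x_norm = (x - lower) / (upper - lower).
    """
    return values * (upper - lower) + lower


# Made-up normalised values for illustration, not rows from the data set
df = pd.DataFrame({"temp": [0.0, 0.5, 1.0], "atemp": [0.0, 0.5, 1.0]})
df["temp_celsius"] = denormalise(df["temp"], *TEMP_RANGE)
df["atemp_celsius"] = denormalise(df["atemp"], *ATEMP_RANGE)
```

Under that assumption a normalised `temp` of 0.5 maps to 15.5 degrees Celsius and a normalised `atemp` of 0.5 maps to 17 degrees Celsius.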

Targets:

  • casual (int): Count of casual users
  • registered (int): Count of registered users
  • cnt (int): Sum of casual and registered users
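
Because `cnt` is the sum of the other two targets, a model often only needs one of the three columns. The sketch below assumes the target array follows the order listed above (casual, registered, cnt), which matches `_targets = [13, 14, 15]` in the source. The array values are made up for illustration and are not rows from the data set.

```python
import numpy as np

# Made-up target rows in the order (casual, registered, cnt); not real data
y = np.array([[3, 13, 16], [8, 32, 40], [0, 1, 1]])

casual, registered, cnt = y[:, 0], y[:, 1], y[:, 2]

# cnt is documented as the sum of the other two targets
is_consistent = bool(np.array_equal(casual + registered, cnt))

# Keep only the total count as a single regression target
y_cnt = y[:, 2]
```

Predicting all three columns at once means predicting redundant information, so choosing either `cnt` alone or the pair `casual` and `registered` is usually the simpler setup.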

Source:

https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

Examples:

Load in the data set::

>>> dataset = BikeSharingHourly()
>>> dataset.shape
(17379, 16)

Split the data set into features and targets, as NumPy arrays::

>>> X, y = dataset.split()
>>> X.shape, y.shape
((17379, 13), (17379, 3))

Perform a train/test split, also outputting NumPy arrays::

>>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
>>> X_train, X_test, y_train, y_test = train_test_split
>>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
((13873, 13), (13873, 3), (3506, 13), (3506, 3))

Output the underlying Pandas DataFrame::

>>> df = dataset.to_pandas()
>>> type(df)
<class 'pandas.core.frame.DataFrame'>