doubt.datasets.bike_sharing_daily

Daily bike sharing data set.

This data set is from the UCI data set archive, and its description is the original UCI description, quoted verbatim. Some feature names may have been altered, based on that description.

"""Daily bike sharing data set.

This data set is from the UCI data set archive, with the description being the original
description verbatim. Some feature names may have been altered, based on the
description.
"""

import io
import zipfile

import pandas as pd

from .dataset import BASE_DATASET_DESCRIPTION, BaseDataset


class BikeSharingDaily(BaseDataset):
    __doc__ = f"""
    Bike sharing systems are new generation of traditional bike rentals where whole
    process from membership, rental and return back has become automatic. Through these
    systems, user is able to easily rent a bike from a particular position and return
    back at another position. Currently, there are about over 500 bike-sharing programs
    around the world which is composed of over 500 thousands bicycles. Today, there
    exists great interest in these systems due to their important role in traffic,
    environmental and health issues.

    Apart from interesting real world applications of bike sharing systems, the
    characteristics of data being generated by these systems make them attractive for
    the research. Opposed to other transport services such as bus or subway, the
    duration of travel, departure and arrival position is explicitly recorded in these
    systems. This feature turns bike sharing system into a virtual sensor network that
    can be used for sensing mobility in the city. Hence, it is expected that most of
    important events in the city could be detected via monitoring these data.

    {BASE_DATASET_DESCRIPTION}

    Features:
        instant (int):
            Record index
        season (int):
            The season, with 1 = winter, 2 = spring, 3 = summer and 4 = autumn
        yr (int):
            The year, with 0 = 2011 and 1 = 2012
        mnth (int):
            The month, from 1 to 12 inclusive
        holiday (int):
            Whether the day is a holiday, binary valued
        weekday (int):
            The day of the week, from 0 to 6 inclusive
        workingday (int):
            Working day, 1 if the day is neither a weekend nor a holiday, otherwise 0
        weathersit (int):
            Weather, encoded as

            1. Clear, few clouds, partly cloudy
            2. Mist and cloudy, mist and broken clouds, mist and few clouds
            3. Light snow, light rain and thunderstorm and scattered clouds, light rain
               and scattered clouds
            4. Heavy rain and ice pellets and thunderstorm and mist, or snow and fog
        temp (float):
            Max-min normalised temperature in Celsius, from -8 to +39
        atemp (float):
            Max-min normalised feeling temperature in Celsius, from -16 to +50
        hum (float):
            Scaled max-min normalised humidity, from 0 to 1
        windspeed (float):
            Scaled max-min normalised wind speed, from 0 to 1

    Targets:
        casual (int):
            Count of casual users
        registered (int):
            Count of registered users
        cnt (int):
            Sum of casual and registered users

    Source:
        https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

    Examples:
        Load in the data set::

            >>> dataset = BikeSharingDaily()
            >>> dataset.shape
            (731, 15)

        Split the data set into features and targets, as NumPy arrays::

            >>> X, y = dataset.split()
            >>> X.shape, y.shape
            ((731, 12), (731, 3))

        Perform a train/test split, also outputting NumPy arrays::

            >>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
            >>> X_train, X_test, y_train, y_test = train_test_split
            >>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
            ((574, 12), (574, 3), (157, 12), (157, 3))

        Output the underlying Pandas DataFrame::

            >>> df = dataset.to_pandas()
            >>> type(df)
            <class 'pandas.core.frame.DataFrame'>
    """

    _url = (
        "https://archive.ics.uci.edu/ml/machine-learning-databases/"
        "00275/Bike-Sharing-Dataset.zip"
    )

    _features = range(12)
    _targets = [12, 13, 14]

    def _prep_data(self, data: bytes) -> pd.DataFrame:
        """Prepare the data set.

        Args:
            data (bytes): The raw data

        Returns:
            Pandas dataframe: The prepared data
        """
        # Convert the bytes into a file-like object
        buffer = io.BytesIO(data)

        # Unzip the file and pull out day.csv as a string
        with zipfile.ZipFile(buffer, "r") as zip_file:
            csv = zip_file.read("day.csv").decode("utf-8")

        # Convert the string into a file-like object
        csv_file = io.StringIO(csv)

        # Read the file-like object into a dataframe
        cols = [0] + list(range(2, 16))
        df = pd.read_csv(csv_file, usecols=cols)
        return df
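The `_prep_data` method unzips the downloaded archive, reads `day.csv`, and keeps every column except index 1 through `usecols`. The sketch below repeats those steps with only the standard library (the `csv` module in place of pandas) so the column selection can be checked in isolation. The function name `prep_day_csv` is hypothetical, and it assumes `day.csv` has the 16 columns of the UCI file, with the date column `dteday` at index 1.

```python
import csv
import io
import zipfile
from typing import List


def prep_day_csv(data: bytes) -> List[List[str]]:
    """Hypothetical stdlib sketch of the _prep_data steps.

    Mirrors `cols = [0] + list(range(2, 16))` from the source above: it keeps
    every column of day.csv except index 1 (assumed to be the `dteday` column).
    The header row is returned as the first row.
    """
    # Treat the raw bytes as an in-memory zip archive and pull out day.csv
    with zipfile.ZipFile(io.BytesIO(data), "r") as zip_file:
        text = zip_file.read("day.csv").decode("utf-8")

    # Same column selection as the pandas `usecols` argument
    cols = [0] + list(range(2, 16))
    reader = csv.reader(io.StringIO(text))
    return [[row[i] for i in cols] for row in reader]
```

Run on an archive whose `day.csv` has the 16 UCI columns, every returned row has 15 fields, which matches the second dimension of `dataset.shape` in the examples.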
class BikeSharingDaily(doubt.datasets.dataset.BaseDataset):

Bike sharing systems are new generation of traditional bike rentals where whole process from membership, rental and return back has become automatic. Through these systems, user is able to easily rent a bike from a particular position and return back at another position. Currently, there are about over 500 bike-sharing programs around the world which is composed of over 500 thousands bicycles. Today, there exists great interest in these systems due to their important role in traffic, environmental and health issues.

Apart from interesting real world applications of bike sharing systems, the characteristics of data being generated by these systems make them attractive for the research. Opposed to other transport services such as bus or subway, the duration of travel, departure and arrival position is explicitly recorded in these systems. This feature turns bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most of important events in the city could be detected via monitoring these data.

Arguments:
  • cache (str or None, optional): The name of the cache, which is saved in the current working directory. If None, no cache is saved. Defaults to '.dataset_cache'.
Attributes:
  • cache (str or None): The name of the cache.
  • shape (tuple of integers): Dimensions of the data set
  • columns (list of strings): List of column names in the data set
Features:

  • instant (int): Record index
  • season (int): The season, with 1 = winter, 2 = spring, 3 = summer and 4 = autumn
  • yr (int): The year, with 0 = 2011 and 1 = 2012
  • mnth (int): The month, from 1 to 12 inclusive
  • holiday (int): Whether the day is a holiday, binary valued
  • weekday (int): The day of the week, from 0 to 6 inclusive
  • workingday (int): Working day, 1 if the day is neither a weekend nor a holiday, otherwise 0
  • weathersit (int): Weather, encoded as
      1. Clear, few clouds, partly cloudy
      2. Mist and cloudy, mist and broken clouds, mist and few clouds
      3. Light snow, light rain and thunderstorm and scattered clouds, light rain and scattered clouds
      4. Heavy rain and ice pellets and thunderstorm and mist, or snow and fog
  • temp (float): Max-min normalised temperature in Celsius, from -8 to +39
  • atemp (float): Max-min normalised feeling temperature in Celsius, from -16 to +50
  • hum (float): Scaled max-min normalised humidity, from 0 to 1
  • windspeed (float): Scaled max-min normalised wind speed, from 0 to 1
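The integer codes in the feature list can be mapped back to readable values. This is a minimal sketch, not part of doubt: `decode_day`, `denormalise` and the lookup tables are hypothetical names, records are assumed to be plain dicts keyed by column name, and the temperature conversion assumes that -8/+39 and -16/+50 are the t_min/t_max bounds of the max-min normalisation, i.e. v = (t - t_min) / (t_max - t_min).

```python
from typing import Any, Dict

# Codes as documented in the feature list
SEASONS = {1: "winter", 2: "spring", 3: "summer", 4: "autumn"}
WEATHER = {
    1: "Clear, few clouds, partly cloudy",
    2: "Mist and cloudy, mist and broken clouds, mist and few clouds",
    3: "Light snow, light rain and thunderstorm and scattered clouds, "
    "light rain and scattered clouds",
    4: "Heavy rain and ice pellets and thunderstorm and mist, or snow and fog",
}

# Assumed normalisation bounds in Celsius, read off the temp and atemp entries
TEMP_RANGE = (-8.0, 39.0)
ATEMP_RANGE = (-16.0, 50.0)


def denormalise(value: float, t_min: float, t_max: float) -> float:
    """Invert v = (t - t_min) / (t_max - t_min), returning t."""
    return t_min + value * (t_max - t_min)


def decode_day(row: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: turn one record's codes into readable values."""
    return {
        "season": SEASONS[row["season"]],
        "year": 2011 + row["yr"],  # 0 = 2011, 1 = 2012
        "month": row["mnth"],
        "weather": WEATHER[row["weathersit"]],
        "temp_celsius": denormalise(row["temp"], *TEMP_RANGE),
        "atemp_celsius": denormalise(row["atemp"], *ATEMP_RANGE),
    }
```

Under these assumptions a normalised `temp` of 0.5 corresponds to 15.5 degrees Celsius, the midpoint of -8 and +39.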

Targets:

  • casual (int): Count of casual users
  • registered (int): Count of registered users
  • cnt (int): Sum of casual and registered users
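Because `cnt` is by definition the sum of `casual` and `registered`, the three targets are linearly dependent: predictions for the two user counts already determine the total. The hypothetical helpers below, which assume records are dicts keyed by column name, check that identity and compute the casual share of a day's rentals.

```python
from typing import Iterable, List, Mapping


def inconsistent_targets(rows: Iterable[Mapping[str, int]]) -> List[int]:
    """Return the positions of records whose cnt is not casual + registered."""
    return [
        i
        for i, row in enumerate(rows)
        if row["cnt"] != row["casual"] + row["registered"]
    ]


def casual_share(row: Mapping[str, int]) -> float:
    """Fraction of the day's total rentals made by casual users (0.0 if none)."""
    return row["casual"] / row["cnt"] if row["cnt"] else 0.0
```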

Source:

https://archive.ics.uci.edu/ml/datasets/Bike+Sharing+Dataset

Examples:

Load in the data set::

>>> dataset = BikeSharingDaily()
>>> dataset.shape
(731, 15)

Split the data set into features and targets, as NumPy arrays::

>>> X, y = dataset.split()
>>> X.shape, y.shape
((731, 12), (731, 3))

Perform a train/test split, also outputting NumPy arrays::

>>> train_test_split = dataset.split(test_size=0.2, random_seed=42)
>>> X_train, X_test, y_train, y_test = train_test_split
>>> X_train.shape, y_train.shape, X_test.shape, y_test.shape
((574, 12), (574, 3), (157, 12), (157, 3))

Output the underlying Pandas DataFrame::

>>> df = dataset.to_pandas()
>>> type(df)
<class 'pandas.core.frame.DataFrame'>
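The `_features = range(12)` and `_targets = [12, 13, 14]` attributes in the source explain the shapes in these examples: of the 15 columns kept by `_prep_data`, the first 12 are features and the last 3 are targets. The sketch below applies the same index partition to the column names. `split_row` is a hypothetical helper, not how `split()` is implemented, and the column order assumes `day.csv` follows the UCI layout with `dteday` dropped.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Assumed column order after _prep_data: day.csv without dteday (15 columns)
COLUMNS = [
    "instant", "season", "yr", "mnth", "holiday", "weekday", "workingday",
    "weathersit", "temp", "atemp", "hum", "windspeed",
    "casual", "registered", "cnt",
]

FEATURES = range(12)    # same value as BikeSharingDaily._features
TARGETS = [12, 13, 14]  # same value as BikeSharingDaily._targets


def split_row(row: Sequence[T]) -> Tuple[List[T], List[T]]:
    """Split one 15-field record into (features, targets) by column index."""
    return [row[i] for i in FEATURES], [row[i] for i in TARGETS]
```

Applied to the column names themselves, `split_row(COLUMNS)` puts `instant` through `windspeed` in the feature part and `casual`, `registered` and `cnt` in the target part, consistent with the (731, 12) and (731, 3) shapes above.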