How To Download Mushroom Data Set From UCI
UCI download/process software repository
Open source Python repository for downloading, processing, folding and describing supervised machine learning datasets from UCI and other raw repositories
This GitHub repository is a set of scripts for downloading supervised machine learning datasets from the UCI Machine Learning Repository and processing them into a common format. It was originally a fork of the Julia repository JackDunnNZ/uci-data, from which the configuration files were extracted. The UCI ML repository is a useful source of datasets for testing and benchmarking, but the format of the datasets is not consistent. This means that effort is required to make use of a new dataset, since each one needs to be read differently.
The main goal of this repository is to process the datasets into a format that can be read by PyRidge, where each row of the final data is as follows:
attribute_1 attribute_2 ... attribute_n class
This makes it easy to swap datasets in ML problems, which is great when automating things.
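As a minimal sketch of what "converting to the common format" means for a single file, the snippet below assumes a raw comma-separated file whose class label sits in a known column (the UCI mushroom data, for example, puts the edible/poisonous label first). It moves the label to the end and joins fields with spaces. The function names are hypothetical and this is not the repository's own processing code, which may also encode categorical attributes or handle other layouts.

```python
import csv
import io


def to_common_format(raw_text, class_index=0):
    """Rewrite comma-separated rows as 'attribute_1 ... attribute_n class'.

    Hypothetical helper: assumes the class label is in column `class_index`.
    """
    lines = []
    for row in csv.reader(io.StringIO(raw_text)):
        if not row:
            continue  # skip blank lines in the raw file
        label = row[class_index]
        attributes = row[:class_index] + row[class_index + 1:]
        lines.append(" ".join(attributes + [label]))
    return "\n".join(lines)


def read_common_format(text):
    """Split each common-format line into an (attributes, class) pair."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if fields:
            samples.append((fields[:-1], fields[-1]))
    return samples


raw = "p,x,s,n\ne,x,s,y\n"  # toy rows with the class in the first column
common = to_common_format(raw, class_index=0)
print(common)
print(read_common_format(common))
```

Because every converted file ends with the class column, the same reader works for any dataset, which is what makes swapping datasets cheap.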
Converting to a common format
The datasets are not checked in to git, in order to minimise the size of the repository and to avoid rehosting the data. Instead, the script downloads any missing datasets directly from UCI as it runs.
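The "download only what is missing" behaviour can be sketched with the standard library alone. This is an illustrative assumption about the approach, not the repository's actual download code; `ensure_downloaded` is a hypothetical name, and the URL in the usage comment is deliberately left elided.

```python
import os
import urllib.request


def ensure_downloaded(url, path):
    """Fetch `url` into `path` only if `path` does not exist yet.

    Hypothetical helper. Returns True if a download happened and False if
    the file was already present locally.
    """
    if os.path.exists(path):
        return False  # already cached on disk, so no network access is needed
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)  # create e.g. raw_data/ on first use
    urllib.request.urlretrieve(url, path)
    return True


# Usage (URL elided on purpose):
# ensure_downloaded("https://archive.ics.uci.edu/…", "raw_data/mushroom.data")
```

Checking for the local file first means repeated runs skip the network entirely, and a deleted file is simply fetched again.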
Running the code
There are two ways of running the code. The easy mode is to first run the install_requirements.sh script with bash:
bash install_requirements.sh
which installs the Python 3 requirements from requirements.txt. Packages necessary for this library:
- numpy
- pandas
- sklearn
- rarfile
- PyLaTeX
After that, the main script
However, it is recommended to use a virtual environment for Python 3, which can easily be set up following an explanation here. The requirements listed above must be installed in this virtual environment. Then you only have to run the scripts in the main directory:
python download_data.py
python process_data.py
python fold_data.py
python describe_data.py
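If you prefer a single entry point, the four scripts can be chained from Python. This is a minimal sketch that assumes the scripts sit in the current directory; `run_pipeline` is a hypothetical helper and not part of the repository.

```python
import subprocess
import sys

# The four scripts, in the order given above.
PIPELINE = ["download_data.py", "process_data.py", "fold_data.py", "describe_data.py"]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit,
        # so a failed download is never followed by processing stale data.
        runner([sys.executable, script], check=True)


# Usage, from the repository's main directory:
# run_pipeline()
```

Using `sys.executable` keeps every step inside the same virtual environment that launched the runner.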
The data will be downloaded, processed, k-folded and described, in that order. Customizable parameters, such as the folders to process and the number of folds, are set in parameter_config.ini:
[DOWNLOAD]
config_folders = datafiles/regression,datafiles/classification
raw_folder = raw_data
remove_older = True

[PROCESS]
config_folders = datafiles/regression,datafiles/classification
processed_folder = processed_data
remove_older = True

[FOLD]
processed_folders = processed_data/regression,processed_data/classification
data_folder = data
remove_older = True
n_fold = 10

[DESCRIBE]
data_folders = data/regression,data/classification
description_folder = description
remove_older = True
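The file is standard INI, so Python's built-in configparser can read it. The sketch below parses the [FOLD] section into typed values and shows how n_fold could drive a plain, unshuffled k-fold split of row indices. Both helpers are hypothetical; whether fold_data.py shuffles or stratifies is not stated here, so treat the splitting logic as an illustration of k-folding in general.

```python
import configparser

SAMPLE = """
[FOLD]
processed_folders = processed_data/regression,processed_data/classification
data_folder = data
remove_older = True
n_fold = 10
"""


def read_fold_settings(ini_text):
    """Parse the [FOLD] section of a parameter_config.ini-style string."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    fold = config["FOLD"]
    return {
        "processed_folders": fold["processed_folders"].split(","),  # comma-separated list
        "data_folder": fold["data_folder"],
        "remove_older": fold.getboolean("remove_older"),  # "True" -> True
        "n_fold": fold.getint("n_fold"),
    }


def k_fold_indices(n_samples, n_fold):
    """Yield (train, test) index lists; every index appears in exactly one test fold."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // n_fold + (1 if i < n_samples % n_fold else 0) for i in range(n_fold)]
    indices = list(range(n_samples))
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


settings = read_fold_settings(SAMPLE)
for train, test in k_fold_indices(25, settings["n_fold"]):
    print(len(train), len(test))
```

To read the real file instead of the inline sample, `config.read("parameter_config.ini")` can replace `read_string`.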
Citation policy
Perales-González, Carlos (2020). UCI download-process, v1.3, GitHub repository, https://github.com/cperales/uci-download-process
@misc{UCI-download-process,
  author = {Carlos, Perales-González},
  title = {UCI download/process},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cperales/uci-download-process}},
  tag = {1.3}
}
Source: https://github.com/cperales/uci-download-process
Posted by: monarrezyousses.blogspot.com