ULTIMATE RESOURCES TO FIND DATA PART-4

Subhash Achutha
Sep 30, 2021

WEBSITE

Awesome list of datasets in 100+ categories https://www.kdnuggets.com/2021/05/awesome-list-datasets.html

https://sebastianraschka.com/blog/2021/ml-dl-datasets.html https://enoumen.com/2021/04/23/data-sciences-datasets-data-visualization-data-analytics-big-data-data-lakes/

https://serokell.io/blog/best-machine-learning-datasets https://medium.com/@ODSC/25-excellent-machine-learning-open-datasets-940ca2124dfc

1)Kaggle - https://www.kaggle.com/datasets , pip install kaggledatasets

Downloading Kaggle datasets directly into Google Colab -https://towardsdatascience.com/downloading-kaggle-datasets-directly-into-google-colab-c8f0f407d73a

How to Download Kaggle Datasets using Jupyter Notebook https://www.analyticsvidhya.com/blog/2021/04/how-to-download-kaggle-datasets-using-jupyter-notebook/
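One common way to pull a Kaggle dataset from a notebook is the official Kaggle command-line tool. This is an assumption, not a summary of the guides linked above. The sketch below assumes the `kaggle` package is installed (`pip install kaggle`) and that an API token is saved at `~/.kaggle/kaggle.json`. The helper names `build_download_command` and `download_dataset`, the `data` folder, and the slug shown in the comment are placeholders.

```python
import subprocess
from pathlib import Path
from typing import List


def build_download_command(slug: str, dest: str = "data") -> List[str]:
    """Return the Kaggle CLI call that downloads and unzips one dataset."""
    # -d takes the "owner/dataset-name" slug, -p the target folder.
    return ["kaggle", "datasets", "download", "-d", slug, "-p", dest, "--unzip"]


def download_dataset(slug: str, dest: str = "data") -> Path:
    """Run the CLI; needs the `kaggle` package and an API token."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_download_command(slug, dest), check=True)
    return Path(dest)


# Example (placeholder slug, replace with a real "owner/dataset-name"):
# download_dataset("owner/dataset-name")
```

The slug is the part of the dataset page URL that follows kaggle.com/datasets/.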

2)https://sebastianraschka.com/blog/2021/ml-dl-datasets.html

movielens-https://grouplens.org/datasets/movielens/latest/

3)data.gov-https://data.gov.in/

4)uci-https://archive.ics.uci.edu/ml/datasets.php https://github.com/tirthajyoti/UCI-ML-API

5)Group Lens dataset https://grouplens.org/

Wikipedia ML Datasets https://en.wikipedia.org/wiki/List_of_datasets_for_machine-learning_research

6)data.world https://data.world/ , World Bank

7)Google Cloud BigQuery public datasets

Google Public Datasets-cloud.google.com/bigquery/public-data/

Google Cloud Data Catalog https://cloud.google.com/data-catalog

Academic Torrents-https://academictorrents.com/check.htm?returnto=%2Fbrowse.php

8)Online hackathons

Datasets https://www.paperswithcode.com/datasets

9)image data from google_images_download

https://www.visualdata.io/discovery

http://xviewdataset.org/#dataset

https://ai.googleblog.com/2016/09/introducing-open-images-dataset.html

10)image data from Bing_Search

image data from simple_image_download https://github.com/RiddlerQ/simple_image_download

11)https://www.columnfivemedia.com/100-best-free-data-sources-infographic

12)Reddit:https://lnkd.in/dv5UCD4 https://www.reddit.com/r/datasets/

13)https://datasets.bifrost.ai/?ref=producthunt

14)data.world:https://lnkd.in/gEK897K

15)https://data.world/datasets/open-data

https://tinyletter.com/data-is-plural

16)FiveThirtyEight :- https://lnkd.in/gyh-HDj , https://data.fivethirtyeight.com/

17)BuzzFeed :- https://lnkd.in/gzPWyHj

Buzzfeed News -github.com/BuzzFeedNews

Socrata - https://opendata.socrata.com/

18)Google public datasets :- https://lnkd.in/g5dH8qE

Statistics Canada https://www.statcan.gc.ca/eng/start https://towardsdatascience.com/how-to-collect-data-from-statistics-canada-using-python-db8a81ce6475

Deep Image Search AI-based image search engine https://github.com/TechyNilesh/DeepImageSearch

https://www.datasciencecentral.com/profiles/blogs/big-data-sets-available-for-free

19)Quandl :- https://www.quandl.com stock data

statista : https://www.statista.com/ stock data

20)Socrata Open Data :- https://lnkd.in/gea7JMz

21)Academic Torrents :- https://lnkd.in/g-Ur9Xy



23)tensorflow_datasets as tfds https://www.tensorflow.org/datasets (import tensorflow_datasets as tfds)
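As a minimal sketch of what the import above is typically used for, the helper below loads one split of a catalog dataset. It assumes `pip install tensorflow tensorflow-datasets`. The helper name `load_tfds_split` is hypothetical, and `mnist` is only an example dataset name.

```python
def load_tfds_split(name: str = "mnist", split: str = "train"):
    """Return one split as a tf.data.Dataset of (features, label) pairs."""
    # Imported inside the function so defining it does not require
    # TensorFlow to be installed; calling it does.
    import tensorflow_datasets as tfds

    # The first call downloads the dataset; later calls reuse the local copy.
    return tfds.load(name, split=split, as_supervised=True)


# Example: look at the first image/label pair.
# for image, label in load_tfds_split().take(1):
#     print(image.shape, label.numpy())
```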

25)https://ourworldindata.org/

26)https://data.worldbank.org/

27)google open images:https://storage.googleapis.com/openimages/web/download.html

30 Largest TensorFlow Datasets for Machine Learning https://lionbridge.ai/datasets/tensorflow-datasets-machine-learning/

https://cloud.google.com/bigquery/public-data/ https://towardsdatascience.com/bigquery-public-datasets-936e1c50e6bc

https://christopherzita.medium.com/how-to-download-google-images-using-python-2021-82e69c637d59

29)imagenet dataset-http://www.image-net.org/

30)https://parulpandey.com/2020/08/09/getting-datasets-for-data-analysis-tasks%e2%80%8a-%e2%80%8aadvanced-google-search/

31)https://storage.googleapis.com/openimages/web/index.html ,

https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=segmentation&r=false&c=%2Fm%2F09qck

https://console.cloud.google.com/marketplace/browse?filter=solution-type:dataset&_ga=2.35328417.1459465882.1589693499-869920574.1589693499

https://catalog.data.gov/dataset?groups=education2168#topic=education_navigation

https://vincentarelbundock.github.io/Rdatasets/datasets.html

32)coco dataset https://cocodataset.org/#explore

33)huggingface datasets-https://github.com/huggingface/datasets https://huggingface.co/datasets https://huggingface.co/languages

pip install datasets
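After that install, loading a dataset from the Hugging Face Hub usually takes a single call. The sketch below is one way to wrap it. The helper name `load_hub_split` is hypothetical, and `imdb` in the comment is only an example dataset identifier.

```python
def load_hub_split(name: str, split: str = "train"):
    """Download (on first use) and return one split of a Hub dataset."""
    # Imported inside the function so defining it does not require
    # the `datasets` package; calling it does.
    from datasets import load_dataset

    return load_dataset(name, split=split)


# Example: print the first training record of an example dataset.
# ds = load_hub_split("imdb")
# print(ds[0])
```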

34)Big Bad NLP Database-https://datasets.quantumstat.com/

fast.ai Datasets https://course.fast.ai/datasets

https://github.com/niderhoff/nlp-datasets

600 NLP Datasets and Glory https://pub.towardsai.net/600-nlp-datasets-and-glory-4b0080bf5ab

nlp-datasets https://github.com/karthikncode/nlp-datasets

https://analyticsindiamag.com/15-most-important-nlp-datasets/ https://medium.com/ai-in-plain-english/25-free-datasets-for-natural-language-processing-57e407402c60

35)https://www.edureka.co/blog/25-best-free-datasets-machine-learning/

36)BigQuery public datasets, Google Public Data Explorer

https://cloud.google.com/public-datasets https://guides.library.cmu.edu/machine-learning/datasets

37)Built-in library datasets, e.g. the iris dataset, the MNIST dataset, etc.

pandas-datareader https://github.com/pydata/pandas-datareader

tf.data.Dataset objects, as returned by TensorFlow Datasets
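For the built-in datasets mentioned in this item, scikit-learn is one library that ships the iris data with the package, so no download is needed. The sketch assumes `pip install scikit-learn`, and the helper name `load_iris_arrays` is hypothetical.

```python
def load_iris_arrays():
    """Return the bundled iris data as (features, labels) NumPy arrays."""
    # Imported inside the function so defining it does not require
    # scikit-learn; calling it does.
    from sklearn.datasets import load_iris

    # return_X_y=True skips the Bunch wrapper and returns plain arrays.
    return load_iris(return_X_y=True)


# Example:
# X, y = load_iris_arrays()
# print(X.shape, y.shape)
```

The features array has 150 rows and 4 columns, and the labels take the values 0, 1 and 2 for the three species.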

38)https://data.gov.sg/ https://data.gov.au/ https://data.europa.eu/euodp/en/data https://data.govt.nz/

data.gov.be ,data.egov.bg/ ,data.gov.cz/english ,portal.opendata.dk,govdata.de,opendata.riik.ee,data.gov.ie,data.gov.gr,datos.gob.es,data.gouv.fr,data.gov.hr

dati.gov.it,data.gov.cy,opendata.gov.lt,data.gov.lv,data.public.lu,data.gov.mt,data.overheid.nl,data.gv.at,danepubliczne.gov.pl,dados.gov.pt,data.gov.ro,podatki.gov.si

data.gov.sk,avoindata.fi,oppnadata.se,https://data.adb.org/ ,https://data.iadb.org/ ,https://www.weforum.org/agenda/2018/03/latin-america-smart-cities-big-data/

https://data.fivethirtyeight.com/ , https://wiki.dbpedia.org/ ,https://www.europeandataportal.eu/en ,https://data.europa.eu/ ,https://www.census.gov/,

https://www.who.int/data/gho ,https://data.unicef.org/open-data/ ,http://data.un.org/ ,https://data.oecd.org/ ,https://data.worldbank.org/

39.Awesome Public Datasets - https://github.com/awesomedata/awesome-public-datasets

Get OpenML’s Dataset in One Line of Code https://mathdatasimplified.com/2021/04/23/fetch_openml-get-openmls-dataset-in-one-line-of-code/

https://github.com/the-pudding/data

datasets https://github.com/benedekrozemberczki/datasets

kdnuggets https://www.kdnuggets.com/datasets/index.html

Hub https://github.com/activeloopai/Hub
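The OpenML article listed under this item is about scikit-learn's `fetch_openml`. A rough sketch of that one-line fetch follows. It assumes `pip install scikit-learn pandas` and network access. The helper name `fetch_openml_frame` is hypothetical, and the dataset name `iris` with version 1 is only an example, not necessarily the one used in the article.

```python
def fetch_openml_frame(name: str = "iris", version: int = 1):
    """Fetch a dataset from OpenML and return it as a pandas DataFrame."""
    # Imported inside the function so defining it does not require
    # scikit-learn; calling it does.
    from sklearn.datasets import fetch_openml

    # as_frame=True asks for pandas objects; .frame holds features and target.
    bunch = fetch_openml(name=name, version=version, as_frame=True)
    return bunch.frame


# Example:
# df = fetch_openml_frame()
# print(df.head())
```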

40.Datasets for Machine Learning on Graphs-https://ogb.stanford.edu/

41.https://www.johnsnowlabs.com/data/

If you would like to explore more datasets, visit my GitHub repository: https://github.com/achuthasubhash/Complete-Life-Cycle-of-a-Data-Science-Project
