Create a Dataset by Scraping Websites

Sat, Mar 4, 2023


Modern web scraping techniques using Python, with newer libraries that save you a lot of work.

First, we’ll create the project’s directory structure with cookiecutter.

I’ve found this template to be a good fit, since it uses Poetry to manage the virtual environment and dependencies, which is really useful.

Start by making sure cookiecutter is installed:

pip install -U cookiecutter

Then run this command to generate the directory structure from the template:

cookiecutter https://github.com/albertorios/cookiecutter-poetry-pypackage.git
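If you’d rather script this step, cookiecutter also exposes a Python API through `cookiecutter.main.cookiecutter`. The snippet below is a minimal sketch, assuming cookiecutter is installed as above; `generate_project` is a hypothetical helper name, not something provided by the template or the library.

```python
# Programmatic equivalent of running the `cookiecutter <template-url>` command.
# Assumes cookiecutter is installed (pip install -U cookiecutter).
from cookiecutter.main import cookiecutter

TEMPLATE_URL = "https://github.com/albertorios/cookiecutter-poetry-pypackage.git"


def generate_project(template: str = TEMPLATE_URL,
                     output_dir: str = ".",
                     no_input: bool = False) -> str:
    """Hypothetical helper: render `template` into `output_dir`.

    Returns the path of the generated project directory.
    With no_input=False, cookiecutter prompts for each variable like the CLI;
    with no_input=True, it uses the defaults from the template's cookiecutter.json.
    """
    return cookiecutter(template, no_input=no_input, output_dir=output_dir)


if __name__ == "__main__":
    # Clones the template and prompts for its variables, as the CLI does.
    print(generate_project())
```

Passing `no_input=True` skips the prompts and accepts the template’s default values, which is handy when the project setup runs unattended.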