simple web scraping: extract texts and links as CSV, and save images of multiple websites
Mahdi Dibaiee 632a840572 fix: memory improvement 6 years ago


A simple script that scrapes websites, extracting their text into a CSV file with the format below and saving their images.

Page       Tag                               Text          Link               Image
page path  element tag (h{1,6}, a, p, etc.)  text content  link URL (if any)  image address (if any)
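
To make the column layout concrete, here is a minimal sketch of how rows in this format could be produced from a single HTML page. It uses only the Python standard library (html.parser and csv), which is an assumption made for illustration and not necessarily how the script itself works. The names TEXT_TAGS, RowCollector, extract_rows and rows_to_csv are hypothetical, and so is the choice of which tags are recorded.

```python
import csv
import io
from html.parser import HTMLParser

# Tags whose text is recorded. Assumption: the real script's list may differ.
TEXT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "a"}


class RowCollector(HTMLParser):
    """Collect [page, tag, text, link, image] rows from one HTML page."""

    def __init__(self, page):
        super().__init__()
        self.page = page
        self.rows = []
        self._stack = []  # currently open text tags: [tag, link, text_parts]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            # Images get a row of their own with only the Image column filled.
            self.rows.append([self.page, "img", "", "", attrs.get("src") or ""])
        elif tag in TEXT_TAGS:
            link = (attrs.get("href") or "") if tag == "a" else ""
            self._stack.append([tag, link, []])

    def handle_data(self, data):
        # Text is attributed to the innermost open text tag only.
        if self._stack:
            self._stack[-1][2].append(data)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            tag, link, parts = self._stack.pop()
            text = " ".join("".join(parts).split())  # collapse whitespace
            if text:
                self.rows.append([self.page, tag, text, link, ""])


def extract_rows(page, html):
    """Parse one page's HTML and return its rows."""
    parser = RowCollector(page)
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows as CSV text with the header shown above."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["Page", "Tag", "Text", "Link", "Image"])
    writer.writerows(rows)
    return buf.getvalue()


html = '<h1>Title</h1><p>Hello   world</p><a href="/about">About</a><img src="/logo.png">'
rows = extract_rows("/", html)
print(rows_to_csv(rows))
```

Each element becomes one row: headings and paragraphs fill only the Text column, anchors also fill Link, and images fill only Image.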


First, install dependencies (python3):

pip install -r requirements

Then create a file containing the URLs of the websites you want to scrape, one per line (this file is called test_websites below).
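
As an illustration of the one-URL-per-line format, the sketch below parses such a file's contents into a list of URLs. The URLs are placeholders, parse_url_list is a hypothetical helper, and skipping blank lines is an assumption; the script itself may treat blank lines differently.

```python
def parse_url_list(text):
    """Return the non-empty lines of a URL list, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Placeholder contents for a file such as test_websites.
sample = "https://example.com\n\nhttps://example.org/blog\n"
urls = parse_url_list(sample)
print(urls)
```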

Now you are ready to execute the script:

python test_websites
                # ^ path to your file

After the script finishes, you can find the results in the results/<website_hostname> folder.
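
To locate a site's output programmatically, the hostname can be derived from its URL. The sketch below assumes the folder name equals the hostname as returned by urllib.parse.urlparse (lowercased, without port); the script may name folders differently. results_dir is a hypothetical helper.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def results_dir(url, base="results"):
    """Return the assumed output folder, results/<website_hostname>, for a URL."""
    hostname = urlparse(url).hostname
    if hostname is None:
        # URLs without a scheme (e.g. "example.com") have no parsed hostname.
        raise ValueError(f"cannot determine hostname of {url!r}")
    return PurePosixPath(base) / hostname


print(results_dir("https://example.com/blog"))
```

Note that a URL must include its scheme (such as https://) for the hostname to be parsed.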

To see available options, try python -h.