A simple web scraper that collects articles and their metadata from bioRxiv.

Install

git clone https://github.com/JohnGiorgi/biorxiv_scraper.git
cd biorxiv_scraper
pip install -e .

How to use

Everything happens via the bioRxivScraper class. Begin by creating an instance:

from biorxiv_scraper.core import bioRxivScraper

scraper = bioRxivScraper()

You can then call its methods to scrape bioRxiv. For example, to scrape all articles uploaded in 2019 under the subject area "Animal Behavior and Cognition":

scraped_content = scraper.by_year(2019, subject_areas="Animal Behavior and Cognition")
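To cover more than one year, one option is to call by_year in a loop and merge the results. This is a minimal sketch that relies only on the by_year(year, subject_areas=...) call shown above and on the result being a dictionary keyed by DOI. scrape_years is a hypothetical helper, not part of biorxiv_scraper, and it simply omits subject_areas when none is given rather than guessing what the library's default is.

```python
# Hypothetical helper (not part of biorxiv_scraper): scrape several years and merge by DOI.
from typing import Any, Dict, Iterable, Optional


def scrape_years(
    scraper: Any,
    years: Iterable[int],
    subject_areas: Optional[str] = None,
) -> Dict[str, Any]:
    """Call scraper.by_year once per year and merge the DOI-keyed results."""
    merged: Dict[str, Any] = {}
    # Only pass subject_areas if the caller supplied it; otherwise use by_year's own default.
    kwargs = {} if subject_areas is None else {"subject_areas": subject_areas}
    for year in years:
        results = scraper.by_year(year, **kwargs)
        # If the same DOI appears in more than one year, the later year's entry wins.
        merged.update(results)
    return merged
```

For example, scrape_years(scraper, range(2017, 2020), subject_areas="Animal Behavior and Cognition") would make three by_year calls and return a single dictionary covering 2017 through 2019.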

scraped_content is a dictionary keyed by DOI that contains the scraped data and metadata for each article.
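Because scraped_content is a plain dictionary keyed by DOI, one way to keep the results between runs is to write it to JSON. The sketch below uses only the standard library; save_scraped_content and load_scraped_content are hypothetical helper names, and since the exact types inside each article's entry are not documented here, default=str is used so that any value json cannot encode natively is stored as a string.

```python
# Hypothetical helpers (not part of biorxiv_scraper): persist DOI-keyed results as JSON.
import json
from pathlib import Path
from typing import Any, Dict, Union


def save_scraped_content(scraped_content: Dict[str, Any], path: Union[str, Path]) -> None:
    """Write the DOI-keyed dictionary to a UTF-8 JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        # default=str turns values json cannot encode (e.g. dates) into strings.
        json.dump(scraped_content, f, indent=2, ensure_ascii=False, default=str)


def load_scraped_content(path: Union[str, Path]) -> Dict[str, Any]:
    """Read a DOI-keyed dictionary back from a JSON file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Values converted by default=str come back as strings when loaded, so the round trip is exact only for data that was already JSON-compatible.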