A simple web scraper for collecting articles and their metadata from bioRxiv.
Install
git clone https://github.com/JohnGiorgi/biorxiv_scraper.git
cd biorxiv_scraper
pip install -e .
How to use
Everything happens via the bioRxivScraper class. Begin by creating an instance:
from biorxiv_scraper.core import bioRxivScraper
scraper = bioRxivScraper()
You can then call its methods to scrape bioRxiv. For example, to scrape all articles uploaded in 2019 under the subject area "Animal Behavior and Cognition":
scraped_content = scraper.by_year(2019, subject_areas="Animal Behavior and Cognition")
scraped_content is a dictionary keyed by DOI that contains the scraped data and metadata for each article.
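As a rough sketch of how a DOI-keyed dictionary like this might be consumed, the snippet below iterates over the entries and round-trips them through JSON. It does not call the scraper: the scraped_content literal is a hypothetical stand-in, and the DOIs, the per-article dictionary layout, and the field names ("title", "abstract") are assumptions for illustration, not the library's documented output.

```python
import json

# Hypothetical stand-in for the dictionary returned by scraper.by_year(...).
# The DOIs and field names ("title", "abstract") are illustrative assumptions;
# inspect the real output to see which keys each article actually has.
scraped_content = {
    "10.1101/000001": {"title": "Example article A", "abstract": "Placeholder text."},
    "10.1101/000002": {"title": "Example article B", "abstract": "Placeholder text."},
}


def summarize(content):
    """Return (doi, title) pairs; a missing title becomes None."""
    return [(doi, record.get("title")) for doi, record in content.items()]


for doi, title in summarize(scraped_content):
    print(f"{doi}: {title}")

# Serialize to JSON text and parse it back, e.g. to persist results between runs.
serialized = json.dumps(scraped_content, indent=2)
restored = json.loads(serialized)
```

If the real values are not plain dictionaries of JSON-compatible types, the json.dumps step would need adjusting.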