Google-Patents-Scraper
Google Patents Scraper
(1) Automatically download all PDF files of the search results & their patent families.
(2) Generate an overview report of the search results.
Application Demo
Google Patents Scraper – Demo (YouTube)
Introduction
This application scrapes Google Patents in two steps:
- Set Proxy (Optional)
- Search & Download Patents
Set Proxy (Optional)
- Set a proxy to keep your current IP from being blocked by Google Patents
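As a rough illustration of what setting a proxy can involve, the sketch below builds the two settings a scraper based on requests and Selenium would typically need. The helper name `build_proxy_settings` and the example address are hypothetical and not taken from this repo; the dictionary follows the shape accepted by the `proxies=` argument of requests, and `--proxy-server` is a Chrome command-line switch.

```python
from typing import Dict


def build_proxy_settings(host: str, port: int) -> Dict[str, object]:
    """Return proxy settings for requests and for Chrome (hypothetical helper)."""
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    address = f"{host}:{port}"
    return {
        # Mapping in the shape accepted by the `proxies=` argument of requests.
        "requests": {"http": f"http://{address}", "https": f"http://{address}"},
        # Chrome command-line switch; with Selenium it can be passed through
        # ChromeOptions.add_argument().
        "chrome_argument": f"--proxy-server=http://{address}",
    }


# 203.0.113.10 is a documentation-only address, used here as a placeholder.
settings = build_proxy_settings("203.0.113.10", 8080)
print(settings["requests"])
print(settings["chrome_argument"])
```

Whether a given proxy actually works still has to be checked by sending a request through it, which this sketch does not do.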
Search & Download Patents
- Select an output directory to store downloaded/generated files
- Search for whatever you like (search terms use the same format as Google Patents)
- Download PDF files of the search results & their patent families
PDF files and the auto-generated overview.md will then be stored in the selected directory
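The storing step can be sketched as follows, assuming the PDF bytes have already been downloaded (for example with requests). `save_pdf` is a hypothetical helper and not the function this application uses; it only shows one way to name a file after its patent number and to reject responses that are not PDFs.

```python
from pathlib import Path
import tempfile


def save_pdf(content: bytes, patent_number: str, pdf_dir: Path) -> Path:
    """Write one downloaded PDF as <patent_number>.pdf (hypothetical helper)."""
    # A well-formed PDF starts with the bytes "%PDF"; anything else is most
    # likely an error page returned in place of the document.
    if not content.startswith(b"%PDF"):
        raise ValueError(f"{patent_number}: response is not a PDF")
    pdf_dir.mkdir(parents=True, exist_ok=True)
    target = pdf_dir / f"{patent_number}.pdf"
    target.write_bytes(content)
    return target


# Placeholder bytes stand in for a real download in this example.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_pdf(b"%PDF-1.4 placeholder bytes", "CN104321947A", Path(tmp) / "PDFs")
    saved_name = saved.name
    saved_ok = saved.exists()
    print(saved_name)
```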
File Structure of Output Directory
├── PDFs
│   ├── CN104321947A.pdf
│   ├── ...
│   └── readme.txt
├── Family_PDFs
│   ├── CN104321947A's\ Family
│   │   ├── EP2850716B1.pdf
│   │   ├── ...
│   │   └── readme.txt
│   ├── ...
│   └── ...
└── overview.md
- The demo's output directory is located at Demo_outdir
- overview.md summarizes the completed search
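To make the layout concrete, here is a minimal sketch that creates the same folders with `pathlib`. The function `create_layout` is hypothetical, and it assumes that the `\ ` in the family folder name is a shell-escaped space, so the folder on disk is named `CN104321947A's Family`.

```python
from pathlib import Path
import tempfile


def create_layout(out_dir: Path, patent_numbers: list) -> dict:
    """Create the folders from the tree and return each patent's family folder."""
    (out_dir / "PDFs").mkdir(parents=True, exist_ok=True)
    family_dirs = {}
    for number in patent_numbers:
        # Assumption: the "\ " in the tree is a shell-escaped space.
        family = out_dir / "Family_PDFs" / f"{number}'s Family"
        family.mkdir(parents=True, exist_ok=True)
        family_dirs[number] = family
    return family_dirs


with tempfile.TemporaryDirectory() as tmp:
    dirs = create_layout(Path(tmp), ["CN104321947A"])
    # List everything that was created, relative to the output directory.
    created = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*"))
    print(created)
```

The sketch leaves out readme.txt and overview.md, since their contents are generated by the application itself.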
Built With
Modules besides Python built-ins
- Web Scraping - Selenium / Beautiful Soup / requests
- GUI framework - PyQt5
- Others - fake-useragent / tqdm
Getting Started
Prerequisites
- Download the ChromeDriver that corresponds to your Chrome version
- Replace the one in src/resources with it
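A quick way to check that the two versions correspond is to compare their major version numbers. The sketch below assumes version strings in the form printed by `chromedriver --version` and by Chrome's own `--version` flag; the sample strings are illustrative, and the helper names are hypothetical.

```python
import re


def major_version(version_text: str) -> int:
    """Extract the major version from text such as 'ChromeDriver 114.0.5735.90'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in: {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_text: str, driver_text: str) -> bool:
    """Return True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_text) == major_version(driver_text)


# Illustrative version strings, not output captured from a real machine.
same = versions_match("Google Chrome 114.0.5735.198", "ChromeDriver 114.0.5735.90")
different = versions_match("Google Chrome 115.0.5790.102", "ChromeDriver 114.0.5735.90")
print(same, different)
```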
Installation
- Clone the repo
git clone https://github.com/wenyalintw/Google-Patents-Scraper.git
- Install required modules listed in requirements.txt
pip install -r /path/to/requirements.txt
- Ready to go
cd src
python main.py
Acknowledgments
- The proxy-checking process is modified from ApsOps's repo
- search.png licensed under "CC BY 3.0" downloaded from ICONFINDER