Google-Patents-Scraper

Wen-Ya Lin

Aug 11, 2019


Google Patents Scraper

(1) Automatically download all PDF files of search results & their patent families.

(2) Generate an overview report of search results.

Application Demo
Introduction
Built With
Getting Started
Acknowledgments

Application Demo

Google Patents Scraper – Demo (YouTube)

Introduction

This application scrapes Google Patents in two steps:

Set Proxy (Optional)
Search & Download Patents

Set Proxy (Optional)

Set a proxy to keep your current IP address from being blocked by Google Patents.
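As a rough illustration of what checking a proxy can involve, the sketch below sends a test request through a candidate proxy and treats any connection error or non-200 response as a failure. It uses only Python's standard library (`urllib.request.ProxyHandler`) instead of the requests module the project lists, and the names `build_opener`, `proxy_works`, `first_working_proxy` and the test URL are hypothetical. It is not the application's actual proxy code.

```python
import http.client
import urllib.error
import urllib.request

# Hypothetical test URL: any page you expect to reach through the proxy will do.
TEST_URL = "https://patents.google.com/"


def build_opener(proxy):
    """Return an opener that sends HTTP and HTTPS requests through `proxy` ("host:port")."""
    handler = urllib.request.ProxyHandler({
        "http": f"http://{proxy}",
        "https": f"http://{proxy}",
    })
    return urllib.request.build_opener(handler)


def proxy_works(proxy, url=TEST_URL, timeout=5.0):
    """Return True if `url` answers with HTTP 200 when fetched through `proxy`."""
    try:
        with build_opener(proxy).open(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException, ValueError):
        # URLError/HTTPError (both OSError subclasses), refused connections,
        # timeouts, malformed responses and malformed proxy strings all count as failures.
        return False


def first_working_proxy(candidates, url=TEST_URL):
    """Return the first proxy in `candidates` that passes the check, or None."""
    for proxy in candidates:
        if proxy_works(proxy, url):
            return proxy
    return None
```

Checking candidates one at a time keeps the sketch short; with a long proxy list, running the checks concurrently would likely be much faster.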


Search & Download Patents

Select an output directory to store the downloaded and generated files
Search for whatever you like (search terms use the same format as on Google Patents)
Download PDF files of the search results & their patent families

The PDF files and an auto-generated overview.md will then be stored in the selected directory.
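The search and download steps can be sketched roughly as follows. The sketch assumes that Google Patents accepts search terms in a `q` query parameter. It also assumes that each patent page at `https://patents.google.com/patent/<number>/en` links to its PDF with an ordinary `<a href="...pdf">`. The result list itself is rendered by JavaScript, which is presumably why the project uses Selenium, so this sketch covers only building URLs and parsing a page you have already fetched. It uses the standard library's `html.parser` in place of Beautiful Soup, and all function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urlencode


def search_url(terms):
    """Build a Google Patents search URL (the `q` parameter is an assumption)."""
    return "https://patents.google.com/?" + urlencode({"q": terms})


def patent_page_url(patent_number):
    """URL of a single patent's page, e.g. for CN104321947A."""
    return f"https://patents.google.com/patent/{quote(patent_number)}/en"


class PdfLinkParser(HTMLParser):
    """Collect the href of every <a> tag whose target ends in .pdf."""

    def __init__(self):
        super().__init__()
        self.pdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(".pdf"):
            self.pdf_links.append(href)


def find_pdf_links(html):
    """Return the PDF links found in an already-downloaded patent page."""
    parser = PdfLinkParser()
    parser.feed(html)
    parser.close()
    return parser.pdf_links
```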


File Structure of Output Directory

├── PDFs
│   ├── CN104321947A.pdf
│   ├── ...
│   └── readme.txt
├── Family_PDFs
│   ├── CN104321947A's\ Family
│   │   ├── EP2850716B1.pdf
│   │   ├── ...
│   │   └── readme.txt
│   ├── ...
│   └── ...
└── overview.md
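A minimal sketch of how this layout could be created with `pathlib`, assuming one family folder per search result. The helper names are hypothetical. The readme.txt files are left out because their contents are not described here. The backslash in `CN104321947A's\ Family` is read as a shell-escaped space.

```python
import tempfile
from pathlib import Path


def family_dir(outdir, patent_number):
    """Folder for one result's family PDFs (the backslash in the tree only escapes a space)."""
    return Path(outdir) / "Family_PDFs" / f"{patent_number}'s Family"


def pdf_path(outdir, patent_number):
    """Where the PDF of a search result is stored."""
    return Path(outdir) / "PDFs" / f"{patent_number}.pdf"


def prepare_output_dir(outdir, patent_numbers):
    """Create PDFs/ and one Family_PDFs/<number>'s Family/ per result; return overview.md's path."""
    outdir = Path(outdir)
    (outdir / "PDFs").mkdir(parents=True, exist_ok=True)
    for number in patent_numbers:
        family_dir(outdir, number).mkdir(parents=True, exist_ok=True)
    return outdir / "overview.md"


if __name__ == "__main__":
    # Demo in a throwaway directory: print the folders that were created.
    with tempfile.TemporaryDirectory() as tmp:
        overview = prepare_output_dir(tmp, ["CN104321947A"])
        for path in sorted(Path(tmp).rglob("*")):
            print(path.relative_to(tmp))
        print(overview.name)
```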

The output directory of the demo is located at Demo_outdir
overview.md summarizes the completed search
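To show what producing a report like overview.md can involve, here is a minimal sketch that renders search results as a Markdown table. The columns (patent number, title, family members) and the function names are hypothetical. For the real file's layout, see the overview.md in Demo_outdir.

```python
from pathlib import Path


def md_cell(text):
    """Escape pipes so a value cannot break the Markdown table."""
    return str(text).replace("|", "\\|")


def render_overview(query, results):
    """Render a Markdown summary.

    `results` is a list of (patent_number, title, family_numbers) tuples;
    these fields are hypothetical, not the real overview.md format.
    """
    lines = [
        "# Overview",
        "",
        f"Search terms: {query}",
        f"Results: {len(results)}",
        "",
        "| Patent | Title | Family members |",
        "| --- | --- | --- |",
    ]
    for number, title, family in results:
        members = ", ".join(family) or "-"
        lines.append(f"| {md_cell(number)} | {md_cell(title)} | {md_cell(members)} |")
    return "\n".join(lines) + "\n"


def write_overview(path, query, results):
    """Write the rendered summary to `path` as UTF-8."""
    Path(path).write_text(render_overview(query, results), encoding="utf-8")
```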

Built With

Modules besides Python built-ins

Web Scraping - Selenium / Beautiful Soup / requests
GUI framework - PyQt5
Others - fake-useragent / tqdm

Getting Started

Prerequisites

Download the ChromeDriver that corresponds to your Chrome version
Replace the one in src/resources with it
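Current ChromeDriver releases share their major version number with the Chrome release they support, so one quick check is to compare the two `--version` strings. The sketch below does that. The executable names (`google-chrome`, `src/resources/chromedriver`) are assumptions that vary by operating system, and the helpers are hypothetical rather than part of the project.

```python
import re
import subprocess

VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def major_version(version_output):
    """Major version from text like 'Google Chrome 76.0.3809.100' or 'ChromeDriver 76.0.3809.68 (...)'."""
    match = VERSION_RE.search(version_output)
    if match is None:
        raise ValueError(f"no version number in {version_output!r}")
    return int(match.group(1))


def version_output(executable):
    """Run `<executable> --version` and return its stdout."""
    result = subprocess.run([executable, "--version"], capture_output=True, text=True, check=True)
    return result.stdout


def driver_matches_browser(chrome_output, driver_output):
    """True when Chrome and ChromeDriver report the same major version."""
    return major_version(chrome_output) == major_version(driver_output)


# Example (executable names and the driver path are assumptions; adjust for your OS):
# driver_matches_browser(version_output("google-chrome"),
#                        version_output("src/resources/chromedriver"))
```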

Installation

Clone the repo

git clone https://github.com/wenyalintw/Google-Patents-Scraper.git

Install the required modules listed in requirements.txt

pip install -r /path/to/requirements.txt

Ready to go

cd src
python main.py

Acknowledgments

The proxy-checking process is modified from ApsOps's repo
search.png, licensed under "CC BY 3.0", downloaded from ICONFINDER

Wen-Ya Lin

M.S. Student in Mechanical Engineering

My research interests include Image Processing, Artificial Intelligence and Internet of Things.
