Google-Patents-Scraper

(1) Automatically download the PDF files of all search results and their patent families.
(2) Generate an overview report of the search results.

Application Demo

Google Patents Scraper – Demo (YouTube)

Introduction

This application scrapes Google Patents in two steps:

  • Set Proxy (Optional)
  • Search & Download Patents

Set Proxy (Optional)

  • Set a proxy to keep your current IP address from being blocked by Google Patents
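
As a rough illustration of what routing requests through a proxy involves, the sketch below uses only Python's standard library. The helper name `make_proxy_opener` and the proxy address are placeholders invented for this example; the application's own proxy handling may well differ.

```python
import urllib.request


def make_proxy_opener(host: str, port: int) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through one proxy."""
    proxy_url = f"http://{host}:{port}"
    # ProxyHandler maps each URL scheme to the proxy that should carry it.
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


# Placeholder address (assumption); replace it with a proxy you may use.
opener = make_proxy_opener("127.0.0.1", 8080)
# A call such as opener.open("https://patents.google.com/") would now be
# routed through the proxy instead of going out from your own IP address.
```

Building a dedicated opener, rather than installing a global one, keeps the proxy setting local to the scraping code.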

Search & Download Patents

  • Select an output directory to store downloaded/generated files
  • Enter any search terms (same format as on Google Patents)
  • Download the PDF files of the search results and their patent families

The PDF files and an auto-generated overview.md are then stored in the selected directory.
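
To give a feel for how an overview report could be produced, here is a minimal sketch that writes a Markdown table to `overview.md`. The report layout, the function names, and the `number`/`title` fields are assumptions made for this example, and the sample title is a placeholder; the report the application actually generates may look different.

```python
import tempfile
from pathlib import Path


def render_overview(query: str, results: list[dict]) -> str:
    """Render search results as a Markdown report (hypothetical layout)."""
    lines = [
        f"# Overview: {query}",
        "",
        "| No. | Patent | Title |",
        "| --- | --- | --- |",
    ]
    for index, item in enumerate(results, start=1):
        # Escape pipes so a title cannot break the table row.
        title = item["title"].replace("|", "\\|")
        lines.append(f"| {index} | {item['number']} | {title} |")
    return "\n".join(lines) + "\n"


def write_overview(output_dir: Path, query: str, results: list[dict]) -> Path:
    """Write overview.md into the selected output directory."""
    output_dir.mkdir(parents=True, exist_ok=True)
    target = output_dir / "overview.md"
    target.write_text(render_overview(query, results), encoding="utf-8")
    return target


# Usage with placeholder data, written to a temporary directory.
sample = [{"number": "CN104321947A", "title": "Placeholder title"}]
with tempfile.TemporaryDirectory() as tmp:
    report = write_overview(Path(tmp), "example query", sample)
    text = report.read_text(encoding="utf-8")
print(text)
```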

File Structure of Output Directory

├── PDFs
│   ├── CN104321947A.pdf
│   ├── ...
│   └── readme.txt
├── Family_PDFs
│   ├── CN104321947A's\ Family
│   │   ├── EP2850716B1.pdf
│   │   ├── ...
│   │   └── readme.txt
│   ├── ...
│   └── ...
└── overview.md
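
The layout above can be expressed as two small path helpers. This is only a sketch: the function names are hypothetical, and it assumes the family folder is literally named `<patent number>'s Family`, reading the backslash in the tree as a shell escape for the space.

```python
from pathlib import Path


def patent_pdf_path(output_dir: Path, number: str) -> Path:
    """Location of a search result's PDF, e.g. PDFs/CN104321947A.pdf."""
    return output_dir / "PDFs" / f"{number}.pdf"


def family_pdf_path(output_dir: Path, number: str, member: str) -> Path:
    """Location of a family member's PDF under the parent patent's folder."""
    return output_dir / "Family_PDFs" / f"{number}'s Family" / f"{member}.pdf"


# Usage with the patent numbers shown in the tree above.
out = Path("output")
main_pdf = patent_pdf_path(out, "CN104321947A")
family_pdf = family_pdf_path(out, "CN104321947A", "EP2850716B1")
print(main_pdf)
print(family_pdf)
```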

Built With

Modules besides Python built-ins

Getting Started

Prerequisites

Installation

  • Clone the repo
git clone https://github.com/wenyalintw/Google-Patents-Scraper.git
  • Install the required packages
pip install -r /path/to/requirements.txt
  • Ready to go
cd src
python main.py

Acknowledgments

Wen-Ya Lin
M.S. Student in Mechanical Engineering

My research interests include Image Processing, Artificial Intelligence and Internet of Things.