patent_design

Published: Oct 11, 2023 License: Apache-2.0 Imports: 12 Imported by: 0

USPTO Design Patent Search Engine

This project implements a search engine for USPTO design patents based on various criteria. Users can search for design patents by patent title, patent number, inventor(s) name, assignee (owner) name, application date, issue date, and design class (if available).

Introduction

The United States Patent and Trademark Office (USPTO) provides a dataset of design patents, including information about various design patents granted by the USPTO. This project aims to create a search engine that enables users to search for design patents based on specific criteria.

Search Engine Architecture

Flow 1 -> JSON extraction and bulk inserts into PostgreSQL and Elasticsearch
Flow 2 -> Optimised wildcard search from Elasticsearch with all hits and pagination
Flow 3 -> Optimised query to get all the patent metadata from PostgreSQL

Features

  • Search design patents by patent title, patent number, inventor(s) name, assignee (owner) name, application date, issue date, and design class.
  • Efficiently parse and store USPTO design patent data.
  • Optimize search engine performance for large datasets.

Data Extraction

type Patent struct {
	PatentNumber    string         `json:"PatentNumber" gorm:"primaryKey"`
	PatentTitle     string         `json:"PatentTitle"`
	Authors         pq.StringArray `json:"Authors" gorm:"type:text[]"`
	Assignee        string         `json:"Assignee"`
	ApplicationDate string         `json:"ApplicationDate"`
	IssueDate       string         `json:"IssueDate"`
	DesignClass     string         `json:"DesignClass"`
	ReferencesCited pq.StringArray `json:"ReferencesCited" gorm:"type:text[]"`
	Description     pq.StringArray `json:"Description" gorm:"type:text[]"`
}

Authors, ReferencesCited, and Description hold lists of values and are stored as PostgreSQL text[] columns (pq.StringArray). The extraction follows the DTD published on the USPTO page for design patents.

For extraction, I wrote a script that first unzips all the data and extracts the XML files to a folder called all_xml. The second step uses encoding/xml and encoding/json to derive all the extracted fields by specifying model structs.

type Inventor struct {
	LastName  string `xml:"addressbook>last-name"`
	FirstName string `xml:"addressbook>first-name"`
}

type UsPatentGrant struct {
	PatentTitle     string      `xml:"us-bibliographic-data-grant>invention-title"`
	PatentNumber    string      `xml:"us-bibliographic-data-grant>publication-reference>document-id>doc-number"`
	Authors         []Inventor  `xml:"us-bibliographic-data-grant>us-parties>inventors>inventor"`
	Assignee        string      `xml:"us-bibliographic-data-grant>us-parties>us-applicants>us-applicant>addressbook>orgname"`
	ApplicationDate CustomTime  `xml:"us-bibliographic-data-grant>application-reference>document-id>date"`
	IssueDate       CustomTime  `xml:"us-bibliographic-data-grant>publication-reference>document-id>date"`
	DesignClass     string      `xml:"us-bibliographic-data-grant>classification-national>main-classification"`
	ReferencesCited []Reference `xml:"us-bibliographic-data-grant>us-references-cited>us-citation,omitempty"`
	Description     Description `xml:"description"`
}

type Reference struct {
	Name string `xml:"patcit>document-id>name"`
}

type Description struct {
	DescriptionDrawings []string `xml:"description-of-drawings>p"`
}

type CustomTime struct {
	Time string `xml:",chardata"`
}

The structs above carry the XML path mappings (struct tags) used to extract the data from the XML and map it to the respective JSON attribute. Each extracted record was written with the NewEncoder method and appended to a combined JSON file.

Please refer to json_generator.go and xml_file_extractor.go.
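
As a rough illustration of the decode-then-encode step, the sketch below parses one grant document with encoding/xml and writes it as a single JSON line with json.NewEncoder. It is not the code from json_generator.go: the structs are trimmed copies of the ones above, and PatentRecord, convertGrant, convertString, and the sample document are hypothetical names and data chosen for this example.

```go
package main

import (
	"bytes"
	"encoding/json"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// Trimmed copies of the mapping structs shown above.
type Inventor struct {
	LastName  string `xml:"addressbook>last-name"`
	FirstName string `xml:"addressbook>first-name"`
}

type UsPatentGrant struct {
	PatentTitle  string     `xml:"us-bibliographic-data-grant>invention-title"`
	PatentNumber string     `xml:"us-bibliographic-data-grant>publication-reference>document-id>doc-number"`
	Authors      []Inventor `xml:"us-bibliographic-data-grant>us-parties>inventors>inventor"`
}

// PatentRecord is a hypothetical flattened shape for the combined JSON file.
type PatentRecord struct {
	PatentNumber string   `json:"PatentNumber"`
	PatentTitle  string   `json:"PatentTitle"`
	Authors      []string `json:"Authors"`
}

// convertGrant decodes one grant document from r and appends it to w
// as a single newline-terminated JSON object.
func convertGrant(r io.Reader, w io.Writer) error {
	var grant UsPatentGrant
	if err := xml.NewDecoder(r).Decode(&grant); err != nil {
		return fmt.Errorf("decode xml: %w", err)
	}
	rec := PatentRecord{PatentNumber: grant.PatentNumber, PatentTitle: grant.PatentTitle}
	for _, inv := range grant.Authors {
		rec.Authors = append(rec.Authors, strings.TrimSpace(inv.FirstName+" "+inv.LastName))
	}
	return json.NewEncoder(w).Encode(rec)
}

// convertString is a small wrapper so the sketch can be run on a string.
func convertString(doc string) (string, error) {
	var buf bytes.Buffer
	err := convertGrant(strings.NewReader(doc), &buf)
	return buf.String(), err
}

// sampleXML is made-up data that follows the element paths in the struct tags.
const sampleXML = `<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference><document-id><doc-number>D0000001</doc-number></document-id></publication-reference>
    <invention-title>Chair</invention-title>
    <us-parties><inventors>
      <inventor><addressbook><last-name>Doe</last-name><first-name>Jane</first-name></addressbook></inventor>
    </inventors></us-parties>
  </us-bibliographic-data-grant>
</us-patent-grant>`

func main() {
	line, err := convertString(sampleXML)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(line)
}
```

Writing one JSON object per line keeps the combined file appendable and lets the later bulk-insert steps stream it record by record.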

Bulk Insertion

Bulk insertion was done in two places, both reading the combined_json file generated by the extraction step with all the metadata:

  • db_bulk_insertion - handles the bulk insert into Postgres. The code is modular and inserts data according to the schema defined in models.
  • es_bulk_insertion - handles chunking of the JSON data and efficiently inserting it into ES_INDEX = design_patents.
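
The chunking idea on the Elasticsearch side can be sketched as follows. This is an assumption-laden illustration, not the contents of es_bulk_insertion: SearchDoc, chunk, and bulkBody are hypothetical names, and the sketch only builds the newline-delimited payload that the Elasticsearch _bulk endpoint expects (one action line followed by one document line), leaving the HTTP call out.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// esIndex is the index name mentioned in the text above.
const esIndex = "design_patents"

// SearchDoc is a hypothetical subset of Patent holding only searchable fields.
type SearchDoc struct {
	PatentNumber string `json:"PatentNumber"`
	PatentTitle  string `json:"PatentTitle"`
}

// chunk splits docs into consecutive slices of at most size elements.
func chunk(docs []SearchDoc, size int) [][]SearchDoc {
	var out [][]SearchDoc
	for size > 0 && len(docs) > 0 {
		n := size
		if len(docs) < n {
			n = len(docs)
		}
		out = append(out, docs[:n])
		docs = docs[n:]
	}
	return out
}

// bulkBody renders one chunk as an NDJSON payload for POST /_bulk.
// Each document contributes an "index" action line and a source line.
func bulkBody(docs []SearchDoc) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf) // Encode appends the required newline
	for _, d := range docs {
		action := map[string]map[string]string{
			"index": {"_index": esIndex, "_id": d.PatentNumber},
		}
		if err := enc.Encode(action); err != nil {
			return "", err
		}
		if err := enc.Encode(d); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	docs := []SearchDoc{
		{"D1", "Chair"}, {"D2", "Lamp"}, {"D3", "Bottle"},
	}
	for i, part := range chunk(docs, 2) {
		body, err := bulkBody(part)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("chunk %d:\n%s", i, body)
	}
}
```

Using the patent number as the document _id means re-running the insert overwrites existing documents instead of duplicating them, and it gives the search step a key that maps directly onto the Postgres primary key.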

Performance Optimization

This repository demonstrates a performance-optimized search functionality using Elasticsearch (ES) for Postgres data. The optimization involves a two-step process:

Elasticsearch Indexing

  • Elasticsearch indexes the searchable fields, which keeps searches fast.
  • Only the searchable fields are stored in Elasticsearch, which keeps the index small.

Search and Retrieval

  1. Search Process: Elasticsearch is queried for results matching the search query.
  2. Data Retrieval: once search results are obtained, a second query is made to the original data source (e.g., Postgres) using the retrieved unique identifiers (e.g., Patent Number).

This two-step approach minimizes the load on the original data source: Postgres only serves lookups by Patent Number for the current page of results. Pagination is applied within the Elasticsearch query and only the necessary fields are indexed, so each search runs against a small index and returns a bounded number of hits.
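
A minimal sketch of the two steps, under stated assumptions: searchRequest, patentNumbers, and metadataQuery are hypothetical names, the table and column names (patents, patent_number) assume GORM's default naming for the Patent model, and the HTTP and database calls themselves are left out. The first function builds a paginated request body that asks Elasticsearch for the PatentNumber only; the second pulls the identifiers out of a search response so they can be passed to Postgres.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchRequest builds a paginated search body (page is 1-based) that
// returns only the PatentNumber field of each hit.
func searchRequest(query map[string]any, page, pageSize int) (string, error) {
	if page < 1 {
		page = 1
	}
	body := map[string]any{
		"from":    (page - 1) * pageSize,
		"size":    pageSize,
		"_source": []string{"PatentNumber"},
		"query":   query,
	}
	b, err := json.Marshal(body)
	return string(b), err
}

// esResponse mirrors only the part of the search response used here.
type esResponse struct {
	Hits struct {
		Hits []struct {
			Source struct {
				PatentNumber string `json:"PatentNumber"`
			} `json:"_source"`
		} `json:"hits"`
	} `json:"hits"`
}

// patentNumbers extracts the identifiers from a raw search response.
func patentNumbers(raw string) ([]string, error) {
	var resp esResponse
	if err := json.Unmarshal([]byte(raw), &resp); err != nil {
		return nil, err
	}
	ids := make([]string, 0, len(resp.Hits.Hits))
	for _, h := range resp.Hits.Hits {
		ids = append(ids, h.Source.PatentNumber)
	}
	return ids, nil
}

// metadataQuery is the second step. Table and column names are assumptions;
// with lib/pq the ids slice would be passed as pq.Array(ids).
const metadataQuery = `SELECT * FROM patents WHERE patent_number = ANY($1)`

func main() {
	body, err := searchRequest(map[string]any{"match_all": map[string]any{}}, 3, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)

	// A made-up response fragment standing in for the Elasticsearch reply.
	ids, err := patentNumbers(`{"hits":{"hits":[{"_source":{"PatentNumber":"D1"}},{"_source":{"PatentNumber":"D2"}}]}}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(metadataQuery, ids)
}
```

Because the second query is keyed on the primary key and limited to one page of identifiers, its cost stays roughly constant regardless of how many patents match the search.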

Search Functionality

The search engine uses fuzzy matching in Elasticsearch (indexed from the Postgres data). Users can search for design patents by patent title, patent number, inventor(s) name, assignee (owner) name, application date, issue date, and design class (if available).
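
One way such a fuzzy search could be expressed is an Elasticsearch multi_match clause with fuzziness set to "AUTO", which tolerates small typos in the search text. The sketch below only builds the query JSON; fuzzyQuery and fuzzyQueryJSON are hypothetical helpers, the field list is an assumption based on the Patent struct, and the project's actual query (described elsewhere as a wildcard search) may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fuzzyQuery returns a multi_match clause that searches text across the
// given fields and allows an edit distance chosen by Elasticsearch ("AUTO").
func fuzzyQuery(text string, fields []string) map[string]any {
	return map[string]any{
		"multi_match": map[string]any{
			"query":     text,
			"fields":    fields,
			"fuzziness": "AUTO",
		},
	}
}

// fuzzyQueryJSON wraps the clause in a top-level "query" object and
// serializes it, ready to be sent as a search request body.
func fuzzyQueryJSON(text string, fields []string) (string, error) {
	b, err := json.Marshal(map[string]any{"query": fuzzyQuery(text, fields)})
	return string(b), err
}

func main() {
	// "chiar" is a deliberate misspelling of "chair".
	body, err := fuzzyQueryJSON("chiar", []string{"PatentTitle", "Assignee"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

Fuzzy matching suits free-text fields such as titles and names; exact identifiers such as the patent number are usually better served by a term or wildcard query.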

Getting Started

Prerequisites

To run this project, you need the following prerequisites:

  • GoLang (v1.20)
  • PostgreSQL (v12+)
  • Elasticsearch (v7.17)

Installation

  1. Clone the repository:
    git clone https://github.com/yourusername/patent_designs.git
    cd patent_designs

Usage

go mod download
go run main.go

Postman Documentation
