safebrowsing

Version: v0.0.0-...-8237a13
Published: Feb 2, 2016 License: MPL-2.0 Imports: 24 Imported by: 0

README

Google Safe Browsing API

This library provides client functionality for version 3 of the Google Safe Browsing API, as described at https://developers.google.com/safe-browsing/developers_guide_v3

Installation

This should do the trick:

go get github.com/golang/protobuf/proto
go get github.com/rjohnsondev/go-safe-browsing-api

Usage

The library requires at least your Safe Browsing API key and a writable directory to store the list data.

It is recommended that you also set the Client, AppVersion and ProtocolVersion globals to something appropriate:

safebrowsing.Client = "api"
safebrowsing.AppVersion = "1.5.2"
safebrowsing.ProtocolVersion = "3.0"

Calling NewSafeBrowsing immediately attempts to contact Google's servers and perform an update or initial download. If this succeeds, it returns a SafeBrowsing instance after spawning a goroutine that keeps the lists updated at the interval requested by Google.

package main

import (
	"fmt"
	"os"

	safebrowsing "github.com/rjohnsondev/go-safe-browsing-api"
)

func main() {
	key := "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA_BBBBBBBBB"
	dataDir := "./data"
	// Performs the initial download and starts the background updates.
	sb, err := safebrowsing.NewSafeBrowsing(key, dataDir)
	if err != nil {
		fmt.Println(err)
		os.Exit(1)
	}
	_ = sb // query URLs with sb.IsListed or sb.MightBeListed
}

Looking up a URL

There are two methods for looking up URLs: IsListed and MightBeListed. Both return an empty string for an unlisted URL, or the name of the list on which the URL appears. If there was an error requesting confirmation from Google for a listed URL, or if the last update was more than 45 minutes ago, an error is returned along with an empty string.

IsListed(string) is the recommended method when displaying a message to a user. However, for pages with partial hash matches it may make a blocking request to Google's servers to perform a full hash match (if it has not already done so for that URL), which can be slow.

response, err := sb.IsListed(url)
if err != nil {
    fmt.Println("Error querying URL:", err)
}
if response == "" {
    fmt.Println("not listed")
} else {
    fmt.Println("URL listed on:", response)
}

If a quick return time is required, it may be worth using the MightBeListed(string) method. It does not contact Google for confirmation, so its result can only be used to display a message to the user if the fullHashMatch return value is true AND the last successful update from Google was within the last 45 minutes:

response, fullHashMatch, err := sb.MightBeListed(url)
if err != nil {
    fmt.Println("Error querying URL:", err)
}
if response == "" {
    fmt.Println("not listed")
} else {
    if fullHashMatch && sb.IsUpToDate() {
        fmt.Println("URL listed on:", response)
    } else {
        fmt.Println("URL may be listed on:", response)
    }
}
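
The decision rule above can be isolated into a small helper. The following is only a sketch: the checker interface copies the MightBeListed and IsUpToDate signatures documented for this package, while classify, verdict and fakeChecker are hypothetical names introduced here so the example runs without network access or an API key.

```go
package main

import "fmt"

// checker mirrors the two SafeBrowsing methods used for a local lookup.
type checker interface {
	MightBeListed(url string) (list string, fullHashMatch bool, err error)
	IsUpToDate() bool
}

// verdict describes how far a local lookup result can be trusted.
type verdict int

const (
	notListed   verdict = iota // no prefix match at all
	maybeListed                // prefix match only, or stale list data
	listed                     // full hash match on fresh list data
)

// classify applies the rule from the README: a result may only be shown
// to a user when it is a full hash match AND the lists are up to date.
func classify(c checker, url string) (verdict, string, error) {
	list, full, err := c.MightBeListed(url)
	if err != nil {
		return notListed, "", err
	}
	switch {
	case list == "":
		return notListed, "", nil
	case full && c.IsUpToDate():
		return listed, list, nil
	default:
		return maybeListed, list, nil
	}
}

// fakeChecker is a hypothetical stand-in for *safebrowsing.SafeBrowsing.
type fakeChecker struct {
	list  string
	full  bool
	fresh bool
}

func (f fakeChecker) MightBeListed(string) (string, bool, error) {
	return f.list, f.full, nil
}

func (f fakeChecker) IsUpToDate() bool { return f.fresh }

func main() {
	stale := fakeChecker{list: "goog-malware-shavar", full: true, fresh: false}
	v, list, err := classify(stale, "http://example.com/")
	fmt.Println(v == maybeListed, list, err)
}
```

Because *safebrowsing.SafeBrowsing exposes both methods with these signatures, it should satisfy the checker interface and could be passed to classify directly.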

It is recommended that you combine the two calls when a non-blocking response is required, so that a full hash can be requested and used for future queries about the same URL:

response, fullHashMatch, err := sb.MightBeListed(url)
if err != nil {
    fmt.Println("Error querying URL:", err)
}
if response != "" {
    if fullHashMatch && sb.IsUpToDate() {
        fmt.Println("URL listed on:", response)
    } else {
        fmt.Println("URL may be listed on:", response)
        // Requesting full hash in background...
        go sb.IsListed(url)
    }
}

Offline Mode

The library can work in "offline" mode, where it does not attempt to contact Google's servers and works purely from local files. This can be activated by setting the OfflineMode global variable:

package main

import (
	safebrowsing "github.com/rjohnsondev/go-safe-browsing-api"
)

func main() {
	key := "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA_BBBBBBBBB"
	dataDir := "./data"

	// Only work from local files.
	safebrowsing.OfflineMode = true

	sb, err := safebrowsing.NewSafeBrowsing(key, dataDir)
	...
}

In this mode, IsListed will always return an error complaining that the list has not been updated within the last 45 minutes, so no warnings may be shown to users.

Example Webserver

The package also includes a small JSON endpoint for the bulk querying of URLs. It has an additional config dependency, so it can be installed with something like:

go get github.com/rjohnsondev/go-safe-browsing-api
go get github.com/BurntSushi/toml
go install github.com/rjohnsondev/go-safe-browsing-api/webserver

The server takes a config file as a parameter. An example is provided with the source, but here are the contents for convenience:

# example config file for safe browsing server
address = "0.0.0.0:8080"
googleApiKey = ""
dataDir = "/tmp/safe-browsing-data"
# enable example usage page at /form
enableFormPage = true

At a minimum, the config requires your Google API key to be added (otherwise you'll get a not-so-friendly Go panic). Once up and running, the server provides a helpful example page at http://localhost:8080/form

Other Notes

Memory Usage

The current implementation stores hashes in a reasonably efficient hat-trie data structure (bundled from https://github.com/dcjones/hat-trie). This results in a memory footprint of approximately 35MB.

File Format

The files stored by the library are gob streams of Chunks. They should be portable between identical versions of the library.
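
To illustrate what "a gob stream of records" means in practice, here is a minimal sketch using encoding/gob from the standard library. The chunk struct below is hypothetical (the library's own on-disk type is not shown in this document), as are the writeChunks, readChunks and roundTrip helpers.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"io"
)

// chunk is a hypothetical record type used only for this illustration.
type chunk struct {
	Number int32
	Hashes []byte
}

// writeChunks encodes each chunk as one value in a single gob stream.
func writeChunks(w io.Writer, chunks []chunk) error {
	enc := gob.NewEncoder(w)
	for _, c := range chunks {
		if err := enc.Encode(c); err != nil {
			return err
		}
	}
	return nil
}

// readChunks decodes values until the stream reports io.EOF.
func readChunks(r io.Reader) ([]chunk, error) {
	dec := gob.NewDecoder(r)
	var out []chunk
	for {
		var c chunk
		err := dec.Decode(&c)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, c)
	}
}

// roundTrip writes the chunks to memory and reads them back.
func roundTrip(chunks []chunk) ([]chunk, error) {
	var buf bytes.Buffer
	if err := writeChunks(&buf, chunks); err != nil {
		return nil, err
	}
	return readChunks(&buf)
}

func main() {
	in := []chunk{
		{Number: 1, Hashes: []byte{0xde, 0xad, 0xbe, 0xef}},
		{Number: 2, Hashes: []byte{0x01, 0x02, 0x03, 0x04}},
	}
	out, err := roundTrip(in)
	fmt.Println(len(out), err)
}
```

A gob stream carries type information derived from the Go struct definition, which is one reason files written by one version of a library may not be readable by a version whose types have changed.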

Documentation

Overview

Package safebrowsing is a generated protocol buffer package.

It is generated from these files:

chunkdata.proto

It has these top-level messages:

ChunkData

Constants

const CHUNK_TYPE_ADD = ChunkData_ChunkType(0)
const CHUNK_TYPE_SUB = ChunkData_ChunkType(1)
const PREFIX_32B = ChunkData_PrefixType(1)
const PREFIX_32B_SZ = 32
const PREFIX_4B = ChunkData_PrefixType(0)
const PREFIX_4B_SZ = 4

Variables

var AppVersion string = "1.5.2"
var ChunkData_ChunkType_name = map[int32]string{
	0: "ADD",
	1: "SUB",
}
var ChunkData_ChunkType_value = map[string]int32{
	"ADD": 0,
	"SUB": 1,
}
var ChunkData_PrefixType_name = map[int32]string{
	0: "PREFIX_4B",
	1: "FULL_32B",
}
var ChunkData_PrefixType_value = map[string]int32{
	"PREFIX_4B": 0,
	"FULL_32B":  1,
}
var Client string = "api"
var ErrOutOfDateHashes = errors.New("Unable to check listing, list hasn't been updated for 45 mins")
var Logger logger = new(DefaultLogger)
var OfflineMode bool = false
var ProtocolVersion string = "3.0"
var SupportedLists map[string]bool = map[string]bool{
	"goog-malware-shavar":  true,
	"googpub-phish-shavar": true,
}
var Transport *http.Transport = &http.Transport{}

Functions

func Canonicalize

func Canonicalize(fullurl string) (canonicalized string)

Canonicalize a URL as needed for safe browsing lookups. This is required before obtaining the host key or generating url lookup iterations.
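
For readers unfamiliar with Safe Browsing canonicalization, the sketch below shows a few of the simpler steps: stripping tab, CR and LF characters, dropping the fragment, adding a default scheme, lowercasing the host, trimming stray dots and resolving "." and ".." path segments. It is deliberately incomplete (no repeated percent-unescaping, no IP address normalization) and is not this package's implementation; canonicalizeSketch is a hypothetical name.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// canonicalizeSketch applies a simplified subset of the canonicalization
// steps. It does not handle percent-escapes or numeric IP forms.
func canonicalizeSketch(raw string) (string, error) {
	// Remove tab, CR and LF characters, then surrounding spaces.
	s := strings.NewReplacer("\t", "", "\r", "", "\n", "").Replace(raw)
	s = strings.TrimSpace(s)
	// Drop the fragment.
	if i := strings.IndexByte(s, '#'); i >= 0 {
		s = s[:i]
	}
	// Assume http when no scheme is present (naive check).
	if !strings.Contains(s, "://") {
		s = "http://" + s
	}
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	// Lowercase the host, trim leading/trailing dots, collapse dot runs.
	host := strings.Trim(strings.ToLower(u.Host), ".")
	for strings.Contains(host, "..") {
		host = strings.ReplaceAll(host, "..", ".")
	}
	// Resolve "." and ".." segments but keep a trailing slash.
	p := u.EscapedPath()
	trailing := strings.HasSuffix(p, "/")
	p = path.Clean("/" + p)
	if trailing && p != "/" {
		p += "/"
	}
	out := u.Scheme + "://" + host + p
	if u.RawQuery != "" {
		out += "?" + u.RawQuery
	}
	return out, nil
}

func main() {
	for _, in := range []string{
		"http://www.GOOgle.com/",
		"www.google.com",
		"http://www.google.com/blah/..",
	} {
		out, err := canonicalizeSketch(in)
		fmt.Println(out, err)
	}
}
```

In real code, call Canonicalize from this package instead; the sketch only conveys why canonicalization has to happen before host keys or lookup candidates are derived.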

func ExtractHostKey

func ExtractHostKey(fullUrl string) (url string)

Extract the host from a URL in a format suitable for hashing to generate a Host Key. NOTE: We assume that the URL has already been canonicalized.

func GenerateTestCandidates

func GenerateTestCandidates(url string) (urls []string)

Generate all required iterations of the URL for checking against the lookup table. NOTE: We assume that the URL has already been canonicalized.
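
As a rough illustration of what such iterations look like, the sketch below follows the host-suffix and path-prefix scheme from the Safe Browsing v3 developer guide: the exact host plus up to four suffixes taken from its last five components, combined with the path with query, the path alone, and up to four directory prefixes. It is not this package's code; candidatesSketch, hostSuffixes and pathPrefixes are hypothetical names, and the output order is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

// hostSuffixes returns the exact host plus up to four suffixes built from
// its last five components. IP addresses are returned unchanged.
func hostSuffixes(host string) []string {
	out := []string{host}
	if net.ParseIP(host) != nil {
		return out
	}
	parts := strings.Split(host, ".")
	start := 0
	if len(parts) > 5 {
		start = len(parts) - 5
	}
	// Stop before the last component so a bare TLD is never tried.
	for i := start; i < len(parts)-1; i++ {
		if s := strings.Join(parts[i:], "."); s != host {
			out = append(out, s)
		}
	}
	return out
}

// pathPrefixes returns the path with its query, the path alone, and up to
// four directory prefixes starting at the root.
func pathPrefixes(p, query string) []string {
	var out []string
	if query != "" {
		out = append(out, p+"?"+query)
	}
	out = append(out, p)
	segs := strings.Split(strings.Trim(p, "/"), "/")
	prefix := "/"
	dirs := []string{prefix}
	for _, seg := range segs[:len(segs)-1] {
		prefix += seg + "/"
		dirs = append(dirs, prefix)
	}
	if len(dirs) > 4 {
		dirs = dirs[:4]
	}
	for _, d := range dirs {
		if d != p { // the exact path is already in the list
			out = append(out, d)
		}
	}
	return out
}

// candidatesSketch combines every host suffix with every path prefix.
func candidatesSketch(canonical string) ([]string, error) {
	u, err := url.Parse(canonical)
	if err != nil {
		return nil, err
	}
	p := u.EscapedPath()
	if p == "" {
		p = "/"
	}
	var out []string
	for _, h := range hostSuffixes(u.Hostname()) {
		for _, pp := range pathPrefixes(p, u.RawQuery) {
			out = append(out, h+pp)
		}
	}
	return out, nil
}

func main() {
	urls, err := candidatesSketch("http://a.b.c/1/2.html?param=1")
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```

Each candidate expression is what gets hashed and compared against the local prefix table, which is why a single URL can require several lookups.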

Types

type ChunkData

type ChunkData struct {
	ChunkNumber *int32                `protobuf:"varint,1,req,name=chunk_number" json:"chunk_number,omitempty"`
	ChunkType   *ChunkData_ChunkType  `protobuf:"varint,2,opt,name=chunk_type,enum=safebrowsing.ChunkData_ChunkType,def=0" json:"chunk_type,omitempty"`
	PrefixType  *ChunkData_PrefixType `protobuf:"varint,3,opt,name=prefix_type,enum=safebrowsing.ChunkData_PrefixType,def=0" json:"prefix_type,omitempty"`
	// Stores all SHA256 add or sub prefixes or full-length hashes. The number
	// of hashes can be inferred from the length of the hashes string and the
	// prefix type above.
	Hashes []byte `protobuf:"bytes,4,opt,name=hashes" json:"hashes,omitempty"`
	// Sub chunks also encode one add chunk number for every hash stored above.
	AddNumbers       []int32 `protobuf:"varint,5,rep,packed,name=add_numbers" json:"add_numbers,omitempty"`
	XXX_unrecognized []byte  `json:"-"`
}

Chunk data encoding format for the shavar-proto list format.
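
The comment on the Hashes field notes that the number of hashes can be inferred from the byte length and the prefix type. A minimal sketch of that inference, assuming entries are simply concatenated with no separators; splitHashes is a hypothetical helper, not part of this package:

```go
package main

import (
	"errors"
	"fmt"
)

// splitHashes cuts a concatenated hash buffer into entries of the given
// size (4 for PREFIX_4B, 32 for FULL_32B).
func splitHashes(hashes []byte, size int) ([][]byte, error) {
	if size <= 0 {
		return nil, errors.New("entry size must be positive")
	}
	if len(hashes)%size != 0 {
		return nil, fmt.Errorf("buffer length %d is not a multiple of %d", len(hashes), size)
	}
	out := make([][]byte, 0, len(hashes)/size)
	for i := 0; i < len(hashes); i += size {
		out = append(out, hashes[i:i+size])
	}
	return out, nil
}

func main() {
	buf := []byte{0xaa, 0xbb, 0xcc, 0xdd, 0x11, 0x22, 0x33, 0x44}
	entries, err := splitHashes(buf, 4)
	fmt.Println(len(entries), err)
}
```

With the package's constants, the size argument would be PREFIX_4B_SZ or PREFIX_32B_SZ depending on GetPrefixType().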

func ReadChunk

func ReadChunk(data []byte, length uint32) (chunk *ChunkData, new_len uint32, err error)

func (*ChunkData) GetAddNumbers

func (m *ChunkData) GetAddNumbers() []int32

func (*ChunkData) GetChunkNumber

func (m *ChunkData) GetChunkNumber() int32

func (*ChunkData) GetChunkType

func (m *ChunkData) GetChunkType() ChunkData_ChunkType

func (*ChunkData) GetHashes

func (m *ChunkData) GetHashes() []byte

func (*ChunkData) GetPrefixType

func (m *ChunkData) GetPrefixType() ChunkData_PrefixType

func (*ChunkData) ProtoMessage

func (*ChunkData) ProtoMessage()

func (*ChunkData) Reset

func (m *ChunkData) Reset()

func (*ChunkData) String

func (m *ChunkData) String() string

type ChunkData_ChunkType

type ChunkData_ChunkType int32

The chunk type is either an add or sub chunk.

const (
	ChunkData_ADD ChunkData_ChunkType = 0
	ChunkData_SUB ChunkData_ChunkType = 1
)
const Default_ChunkData_ChunkType ChunkData_ChunkType = ChunkData_ADD

func (ChunkData_ChunkType) Enum

func (x ChunkData_ChunkType) Enum() *ChunkData_ChunkType

func (ChunkData_ChunkType) String

func (x ChunkData_ChunkType) String() string

func (*ChunkData_ChunkType) UnmarshalJSON

func (x *ChunkData_ChunkType) UnmarshalJSON(data []byte) error

type ChunkData_PrefixType

type ChunkData_PrefixType int32

Prefix type which currently is either 4B or 32B. The default is set to the prefix length, so it doesn't have to be set at all for most chunks.

const (
	ChunkData_PREFIX_4B ChunkData_PrefixType = 0
	ChunkData_FULL_32B  ChunkData_PrefixType = 1
)
const Default_ChunkData_PrefixType ChunkData_PrefixType = ChunkData_PREFIX_4B
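
To make the two prefix types concrete: a full-length entry is the 32-byte SHA-256 digest of a lookup expression, and a 4-byte prefix is the first four bytes of that digest. The sketch below uses only the standard library; fullHash, prefix4 and prefixHex are hypothetical helper names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// fullHash returns the 32-byte SHA-256 digest of a lookup expression.
func fullHash(expr string) [32]byte {
	return sha256.Sum256([]byte(expr))
}

// prefix4 returns the first four bytes of the digest.
func prefix4(expr string) []byte {
	h := fullHash(expr)
	return h[:4]
}

// prefixHex renders the 4-byte prefix as lowercase hex for display.
func prefixHex(expr string) string {
	return hex.EncodeToString(prefix4(expr))
}

func main() {
	expr := "a.b.c/1/2.html"
	full := fullHash(expr)
	fmt.Println(prefixHex(expr), hex.EncodeToString(full[:]))
}
```

A 4-byte prefix match only says the URL might be listed; confirming it requires comparing the full 32-byte hash.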

func (ChunkData_PrefixType) Enum

func (x ChunkData_PrefixType) Enum() *ChunkData_PrefixType

func (ChunkData_PrefixType) String

func (x ChunkData_PrefixType) String() string

func (*ChunkData_PrefixType) UnmarshalJSON

func (x *ChunkData_PrefixType) UnmarshalJSON(data []byte) error

type ChunkNum

type ChunkNum int32

type DefaultLogger

type DefaultLogger struct{}

DefaultLogger provides a simple console output implementation of the logger interface. The interface is intended for logger dependency injection, for example with log4go.

func (*DefaultLogger) Critical

func (dl *DefaultLogger) Critical(arg0 interface{}, args ...interface{}) error

func (*DefaultLogger) Debug

func (dl *DefaultLogger) Debug(arg0 interface{}, args ...interface{})

func (*DefaultLogger) Error

func (dl *DefaultLogger) Error(arg0 interface{}, args ...interface{}) error

func (*DefaultLogger) Fine

func (dl *DefaultLogger) Fine(arg0 interface{}, args ...interface{})

func (*DefaultLogger) Finest

func (dl *DefaultLogger) Finest(arg0 interface{}, args ...interface{})

func (*DefaultLogger) Info

func (dl *DefaultLogger) Info(arg0 interface{}, args ...interface{})

func (*DefaultLogger) Trace

func (dl *DefaultLogger) Trace(arg0 interface{}, args ...interface{})

func (*DefaultLogger) Warn

func (dl *DefaultLogger) Warn(arg0 interface{}, args ...interface{}) error
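
If you want to plug in your own logger, a type with the same method set as DefaultLogger is a reasonable starting point. The sketch below wraps the standard library's log.Logger. It assumes the unexported logger interface requires exactly the methods listed above, and it assumes a log4go-like convention where a string arg0 is treated as a format string; stdLogger, format and newBufferLogger are hypothetical names.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"log"
)

// stdLogger adapts a *log.Logger to the DefaultLogger method set.
type stdLogger struct{ l *log.Logger }

// format treats a string arg0 as a format string (an assumed convention).
func format(arg0 interface{}, args []interface{}) string {
	if f, ok := arg0.(string); ok {
		return fmt.Sprintf(f, args...)
	}
	return fmt.Sprint(append([]interface{}{arg0}, args...)...)
}

// emit writes one line tagged with the level and returns the message.
func (s *stdLogger) emit(level string, arg0 interface{}, args []interface{}) string {
	msg := format(arg0, args)
	s.l.Printf("[%s] %s", level, msg)
	return msg
}

func (s *stdLogger) Finest(arg0 interface{}, args ...interface{}) { s.emit("FNST", arg0, args) }
func (s *stdLogger) Fine(arg0 interface{}, args ...interface{})   { s.emit("FINE", arg0, args) }
func (s *stdLogger) Debug(arg0 interface{}, args ...interface{})  { s.emit("DEBG", arg0, args) }
func (s *stdLogger) Trace(arg0 interface{}, args ...interface{})  { s.emit("TRAC", arg0, args) }
func (s *stdLogger) Info(arg0 interface{}, args ...interface{})   { s.emit("INFO", arg0, args) }

// Warn, Error and Critical also return the message as an error value.
func (s *stdLogger) Warn(arg0 interface{}, args ...interface{}) error {
	return errors.New(s.emit("WARN", arg0, args))
}

func (s *stdLogger) Error(arg0 interface{}, args ...interface{}) error {
	return errors.New(s.emit("EROR", arg0, args))
}

func (s *stdLogger) Critical(arg0 interface{}, args ...interface{}) error {
	return errors.New(s.emit("CRIT", arg0, args))
}

// newBufferLogger returns a logger that writes to an in-memory buffer.
func newBufferLogger() (*stdLogger, *bytes.Buffer) {
	buf := new(bytes.Buffer)
	return &stdLogger{l: log.New(buf, "", 0)}, buf
}

func main() {
	l, buf := newBufferLogger()
	l.Info("updated %d lists", 2)
	_ = l.Warn("list data is stale")
	fmt.Print(buf.String())
}
```

If the assumption about the interface holds, such a value could be assigned to the package-level Logger variable before calling NewSafeBrowsing.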

type FullHashCache

type FullHashCache struct {
	CreationDate  time.Time
	CacheLifeTime int
}

type HatTrie

type HatTrie struct {
	// contains filtered or unexported fields
}

func NewTrie

func NewTrie() *HatTrie

func (*HatTrie) Delete

func (h *HatTrie) Delete(key string)

func (*HatTrie) Get

func (h *HatTrie) Get(key string) bool

func (*HatTrie) Iterator

func (h *HatTrie) Iterator() *HatTrieIterator

func (*HatTrie) Set

func (h *HatTrie) Set(key string)

type HatTrieIterator

type HatTrieIterator struct {
	// contains filtered or unexported fields
}

func (*HatTrieIterator) Next

func (i *HatTrieIterator) Next() string

type HostHash

type HostHash string

type LookupHash

type LookupHash string

type SafeBrowsing

type SafeBrowsing struct {
	DataDir string

	Key             string
	Client          string
	AppVersion      string
	ProtocolVersion string

	UpdateDelay int
	LastUpdated time.Time

	Lists map[string]*SafeBrowsingList
	Cache map[HostHash]*FullHashCache

	Logger logger
	// contains filtered or unexported fields
}

func NewSafeBrowsing

func NewSafeBrowsing(apiKey string, dataDirectory string) (sb *SafeBrowsing, err error)

func (*SafeBrowsing) IsListed

func (sb *SafeBrowsing) IsListed(url string) (list string, err error)

Check to see if a URL is marked as unsafe by Google. Returns what list the URL is on, or an empty string if the URL is unlisted. Note that this query may perform a blocking HTTP request; if speed is important it may be preferable to use MightBeListed which will return quickly. If showing a warning to the user however, this call must be used.

func (*SafeBrowsing) IsUpToDate

func (sb *SafeBrowsing) IsUpToDate() bool

Checks to ensure we have had a successful update in the last 45 minutes.
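
A freshness check of this kind boils down to comparing the time since the last update against a 45-minute limit. The sketch below is not the library's code: isFresh, freshAfterMinutes and maxListAge are hypothetical names, and treating exactly 45 minutes as still fresh is an assumption.

```go
package main

import (
	"fmt"
	"time"
)

// maxListAge is the longest time list data may go without an update.
const maxListAge = 45 * time.Minute

// isFresh reports whether lastUpdated is no more than maxListAge before now.
func isFresh(lastUpdated, now time.Time) bool {
	return now.Sub(lastUpdated) <= maxListAge
}

// freshAfterMinutes evaluates isFresh against a fixed reference time, so
// the result does not depend on the wall clock.
func freshAfterMinutes(minutes int) bool {
	base := time.Date(2016, time.February, 2, 12, 0, 0, 0, time.UTC)
	return isFresh(base, base.Add(time.Duration(minutes)*time.Minute))
}

func main() {
	fmt.Println(freshAfterMinutes(10), freshAfterMinutes(60))
}
```

Passing the current time in as a parameter, rather than calling time.Now inside the check, keeps the rule easy to test.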

func (*SafeBrowsing) MightBeListed

func (sb *SafeBrowsing) MightBeListed(url string) (list string, fullHashMatch bool, err error)

Check to see if a URL is likely marked as unsafe by Google. Returns what list the URL may be listed on, or an empty string if the URL is not listed. Note that this query does not perform a "request for full hashes" and MUST NOT be used to show a warning to the user.

func (*SafeBrowsing) UpdateProcess

func (sb *SafeBrowsing) UpdateProcess() (err error)

type SafeBrowsingList

type SafeBrowsingList struct {
	Name     string
	FileName string

	DataRedirects []string
	DeleteChunks  map[ChunkData_ChunkType]map[ChunkNum]bool
	ChunkRanges   map[ChunkData_ChunkType]string

	// the lookup map contains only prefix hashes
	Lookup            *HatTrie
	FullHashRequested *HatTrie
	FullHashes        *HatTrie

	Logger logger
	// contains filtered or unexported fields
}
