grawler

package module
v0.2.0
Published: Aug 14, 2023 License: MIT Imports: 6 Imported by: 0

README


Grawler

Simple and performant web crawler in Go

How it works

The crawler uses two types of workers:

  • Page loaders
  • Page analyzers

Page loaders consume the remaining-URL channel: they download pages from the internet, put them into a cache, and push each downloaded page's URL onto the downloaded-URL channel.

Page analyzers consume the downloaded-URL channel: they read each page's content from the cache, analyze it, and extract additional URLs as well as the wanted model (if possible). Newly extracted URLs go back into the remaining-URL channel; a found model goes into the result channel.

The whole process is started by putting the starting URL into the remaining-URL channel.

The numbers of page loaders and page analyzers are configurable.

Your possibilities are endless: you can implement your own cache, page loader, and analyzer; the mocks and interfaces in the source will help you.

For guidance, please have a look at crawler_test.go.

The gopher was made with the Gopher Konstructor

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CrawlerConfig

type CrawlerConfig struct {
	PageLoaders         int
	PageAnalyzers       int
	RemainingUrlChSize  int
	DownloadedUrlChSize int
	ResultChSize        int
}

type ExapleModel

type ExapleModel struct {
	Title   string
	Content string
}

type IAnalyzer

type IAnalyzer[T any] interface {
	GetUrls() []string
	GetModel() *T
}

type ICrawler

type ICrawler[T any] interface {
	Crawl(startingUrl string) chan *T
	Stop()
	WaitStopped()
}

func NewCrawler

func NewCrawler[T any](
	cache cache.ICache,
	createAnalyzer NewAnalyzer[T],
	pageLoader page_loader.IPageLoader,
	logger *gotils.Logger,
	baseUrl string,
	config CrawlerConfig,
) ICrawler[T]

type MockAnalyzer

type MockAnalyzer struct {
	GetUrls_  func() []string
	GetModel_ func() *ExapleModel
}

func (*MockAnalyzer) GetModel

func (m *MockAnalyzer) GetModel() *ExapleModel

func (*MockAnalyzer) GetUrls

func (m *MockAnalyzer) GetUrls() []string

type MockCrawler

type MockCrawler[T any] struct {
	Crawl_       func(startingUrl string) chan *T
	Stop_        func()
	WaitStopped_ func()
}

func (MockCrawler[T]) Crawl

func (c MockCrawler[T]) Crawl(startingUrl string) chan *T

func (MockCrawler[T]) Stop

func (c MockCrawler[T]) Stop()

func (MockCrawler[T]) WaitStopped

func (c MockCrawler[T]) WaitStopped()

type NewAnalyzer

type NewAnalyzer[T any] func(html, source *string) (IAnalyzer[T], error)

