krangio

package module
v0.2.12
Published: Feb 16, 2023 License: BSD-3-Clause Imports: 5 Imported by: 0

README

Krang IO

This module contains the Go bindings, proto file definitions, and types for Krang and its microservices, to be used by gRPC servers.


Installation

go get -u github.com/krang-backlink/io

Types

All tasks should accept and return a krangio.Page; see types.go for more detailed insight into its field data. Below is an example of the Page struct as stored in Mongo.

type (
	// Page represents an individual task scrape including
	// metadata from the Task.
	Page struct {
		ID             primitive.ObjectID  `bson:"_id,omitempty" json:"id"`
		ScrapeID       *primitive.ObjectID `bson:"scrape_id" json:"scrape_id"`
		URL            string              `bson:"url" json:"url"`
		GroupSlug      string              `bson:"group_slug,omitempty" json:"group_slug"`
		TaskID         int64               `bson:"task_id,omitempty" json:"task_id"`
		SearchTerm     string              `json:"search_term" bson:"search_term"`
		RelevancyScore uint                `json:"relevancy_score" bson:"relevancy_score"`
		SiteScore      uint                `json:"site_score" bson:"site_score"`
		Scrape         Scrape              `bson:"scrape" json:"scrape"`
		UpdatedAt      time.Time           `bson:"updated_at" json:"updated_at"`
		CreatedAt      time.Time           `bson:"created_at" json:"created_at"`
	}
)
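
A hedged sketch of persisting a Page with the official Go Mongo driver; the database and collection names ("krang", "pages") are assumptions, not defined by this module.

func StorePage(ctx context.Context, client *mongo.Client, page krangio.Page) error {
	// "krang" and "pages" are illustrative names; use whatever your deployment defines.
	_, err := client.Database("krang").Collection("pages").InsertOne(ctx, page)
	return err
}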

Errors

Any error returned from a Lambda function should be of type krangio.Error for error reporting. See below for how to create and consume one.

Types

type (
	// Error represents an error that occurred during the
	// processing of a Krang Lambda function.
	Error struct {
		Err     *errors.Error `json:"error"`
		Service string        `json:"service"` // Currently running function, for example "scrape"
		Meta    Meta          `json:"meta"`
	}
	// Meta represents the attributes of a failed task.
	Meta struct {
		GroupSlug  string         `json:"group_slug"`
		TaskID     int64          `json:"task_id"`
		ScrapeID   string         `json:"scrape_id"`
		URL        string         `json:"url"`
		SearchTerm string         `json:"search_term"`
		// Any additional data
		Data       map[string]any `json:"data"`
	}
)

Create a new Lambda error

meta := krangio.Meta{
	GroupSlug:  in.GroupSlug,
	TaskID:     in.TaskID,
	URL:        in.URL,
	SearchTerm: in.SearchTerm,
}

status, err := myThing()
if err != nil {
	return in, krangio.NewError(err, ServiceName, meta)
}
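
Consume one by checking whether a returned error is a *krangio.Error, for example when reporting a failed task. A minimal sketch: process is a stand-in for the task's own logic, and the log call is illustrative only.

if err := process(in); err != nil {
	var kerr *krangio.Error
	if errors.As(err, &kerr) {
		// Structured fields for the error reporter; ToMap is documented below.
		log.Printf("%s failed: %v", kerr.Service, kerr.ToMap())
	}
	return in, err
}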

Proto

Each service contains its own proto definition along with its implementation for Server/Client.

message CompleteRequest {
	string id = 1;
	string scrape_id = 2;
	string url = 3;
	string group_slug = 4;
	int64 task_id = 5;
	string search_term = 6;
}

message Response {
	bool error = 2;
	string message = 3;
}

service TasksService {
	rpc CompleteTask(CompleteRequest) returns(Response) {}
}

Usage

func Send() error {
	conn, err := grpc.Dial(":9000", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		return err
	}
	defer conn.Close()

	s := proto.NewTasksServiceClient(conn)

	response, err := s.CompleteTask(context.Background(), &proto.CompleteRequest{
		Id:         "",
		ScrapeId:   "",
		Url:        "",
		GroupSlug:  "",
		TaskId:     0,
		SearchTerm: "",
	})
	if err != nil {
		return err
	}

	fmt.Println(response)

	return nil
}
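
For the server side, a minimal sketch assuming the generated code follows the standard protoc-gen-go-grpc layout (UnimplementedTasksServiceServer, RegisterTasksServiceServer); the exact symbol names depend on how make generate is configured.

type tasksServer struct {
	proto.UnimplementedTasksServiceServer
}

func (s *tasksServer) CompleteTask(ctx context.Context, in *proto.CompleteRequest) (*proto.Response, error) {
	// Mark the task complete here, then report the outcome back to the caller.
	return &proto.Response{Error: false, Message: "completed " + in.Url}, nil
}

func Serve() error {
	lis, err := net.Listen("tcp", ":9000")
	if err != nil {
		return err
	}

	srv := grpc.NewServer()
	proto.RegisterTasksServiceServer(srv, &tasksServer{})

	return srv.Serve(lis)
}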

Development

To set up this repository, run:

make setup

To generate the proto files run:

make generate

Documentation

Index

Constants

const (
	// LogDatabase defines the database name for log entries
	// via Mongo.
	LogDatabase = "logs"
	// LogService defines the collection name for log entries
	// via Mongo.
	LogService = "api"
)
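
These constants resolve the Mongo namespace for log entries. A hedged sketch, assuming an already connected *mongo.Client and context; the payload fields are illustrative.

coll := client.Database(krangio.LogDatabase).Collection(krangio.LogService)

_, err := coll.InsertOne(ctx, bson.M{
	"service":    "scrape",
	"message":    "page processed",
	"created_at": time.Now(),
})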

Variables

This section is empty.

Functions

func GetObjectID added in v0.2.0

func GetObjectID(hex string) *primitive.ObjectID

GetObjectID returns the primitive.ObjectID if there is one set, otherwise it returns nil.
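
A small sketch of using GetObjectID to populate an optional field such as Page.ScrapeID; the hex literal is illustrative, and per the doc comment a nil result means no ObjectID was set.

scrapeID := krangio.GetObjectID("5f3c7b2e1c9d440000a1b2c3")
if scrapeID == nil {
	return fmt.Errorf("invalid scrape id")
}

page.ScrapeID = scrapeID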

Types

type BackLinkCheck added in v0.0.15

type BackLinkCheck struct {
	GroupSlug string `json:"group_slug" bson:"group_slug"`
	LinkID    int64  `json:"link_id" bson:"link_id"`
	URL       string `json:"url" bson:"url"`
	Link      string `json:"link" bson:"link"`

} //@name BackLinkCheck

BackLinkCheck represents the data sent to the Lambda function for checking if a backlink appears on the page.

type Error

type Error struct {
	Err     *errors.Error `json:"error" bson:"error"`
	Service string        `json:"service" bson:"service"` // Currently running function, for example "scrape"
}

Error represents an error that occurred during the processing of a Krang Lambda function.

func NewError

func NewError(err error, service string) *Error

NewError returns a new Lambda error.

func (*Error) Error

func (e *Error) Error() string

Error returns the JSON representation of the error message by implementing the error interface.

func (*Error) ToMap added in v0.1.3

func (e *Error) ToMap() map[string]any

ToMap returns a map of the error if there is one.

type Page

type Page struct {
	ID             primitive.ObjectID  `json:"id" bson:"_id,omitempty"`
	ScrapeID       *primitive.ObjectID `json:"scrape_id" bson:"scrape_id"`
	UUID           string              `json:"uuid,omitempty" bson:"-"` // Used for SQS dedupe.
	URL            string              `json:"url" bson:"url"`
	GroupSlug      string              `json:"group_slug" bson:"group_slug"`
	ProjectID      int64               `json:"project_id" bson:"project_id"`
	TaskID         int64               `json:"task_id" bson:"task_id"`
	SearchTerm     string              `json:"search_term" bson:"search_term"`
	RelevancyScore int                 `json:"relevancy_score" bson:"relevancy_score"`
	SiteScore      int                 `json:"site_score" bson:"site_score"`
	Scrape         Scrape              `json:"scrape" bson:"scrape,omitempty"`
	Status         ScrapeStatus        `json:"status" bson:"status"`
	Usage          PageUsage           `json:"usage" bson:"usage"`
	UpdatedAt      time.Time           `json:"updated_at" bson:"updated_at"`
	CreatedAt      time.Time           `json:"created_at" bson:"created_at"`

} //@name Page

Page represents an individual task scrape including metadata from the Task.

func (*Page) HasScrape added in v0.2.0

func (p *Page) HasScrape() bool

HasScrape determines if a page has a Scrape ID attached to it.

func (*Page) LogMessage added in v0.2.0

func (p *Page) LogMessage(service string) string

LogMessage returns a formatted message for processing Lambda functions.

func (*Page) LoggerFields added in v0.2.0

func (p *Page) LoggerFields(service string) map[string]any

LoggerFields returns logrus Fields to log the Page meta data.
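
LogMessage and LoggerFields are designed to be used together; a minimal sketch with logrus, where "scrape" is an illustrative service name.

// p is a *krangio.Page; "scrape" identifies the currently running service.
logrus.WithFields(p.LoggerFields("scrape")).Info(p.LogMessage("scrape"))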

type PageUsage added in v0.2.7

type PageUsage struct {
	Ahrefs PageUsageAhrefs `json:"ahrefs" bson:"ahrefs"`

} //@name PageUsage

PageUsage represents any costs that have been associated with the page.

type PageUsageAhrefs added in v0.2.7

type PageUsageAhrefs struct {
	Rows         int  `json:"rows_used" bson:"rows"`
	UnitCostRows int  `json:"unit_cost_rows" bson:"unit_cost_rows"`
	Cached       bool `json:"cached" bson:"cached"`
	Called       bool `json:"called" bson:"called"`

} //@name PageUsageAhrefs

PageUsageAhrefs represents the total cost of a single call to Ahrefs.

type Scrape

type Scrape struct {
	ID         primitive.ObjectID `json:"id" bson:"_id,omitempty"`
	URL        string             `json:"-" bson:"url" swaggerignore:"true"`
	HTTPStatus int                `json:"http_status" bson:"http_status"`
	Content    ScrapeContent      `json:"content" bson:"content"`
	Metrics    ScrapeMetrics      `json:"metrics" bson:"metrics"`
	Message    string             `json:"message" bson:"message"`
	Status     ScrapeStatus       `json:"status" bson:"status"`
	Error      any                `json:"error" bson:"error"`
	Service    string             `json:"service" bson:"service"` // Currently running function, for example "scrape"
	UpdatedAt  time.Time          `json:"updated_at" bson:"updated_at"`
	CreatedAt  time.Time          `json:"created_at" bson:"created_at"`

} //@name Scrape

Scrape represents an individual scrape of a page and its various metrics.

type ScrapeAhrefs added in v0.2.7

type ScrapeAhrefs struct {
	DR   float64  `json:"dr" bson:"dr"`     // Domain Ranking
	Rank *float64 `json:"rank" bson:"rank"` // Ahrefs Rank

} //@name ScrapeAhrefs

ScrapeAhrefs represents the metrics retrieved from the Ahrefs API, including cost, rows, and whether the response was cached.

type ScrapeContent

type ScrapeContent struct {
	H1            string          `json:"h1" bson:"h1"`
	H2            string          `json:"h2" bson:"h2"`
	Title         string          `json:"title" bson:"title"`
	ExternalLinks int             `json:"external_links" bson:"external_links"`
	Keywords      []ScrapeKeyword `json:"keywords" bson:"keywords"`

} //@name ScrapeContent

ScrapeContent represents the HTML markup of a page including any <body> content that's relevant for scoring.

type ScrapeKeyword added in v0.0.13

type ScrapeKeyword struct {
	Term     string  `json:"term" bson:"term"`
	Salience float64 `json:"salience" bson:"salience"`

} //@name ScrapeKeyword

ScrapeKeyword represents a singular entity extracted from a given piece of text.

type ScrapeMetrics

type ScrapeMetrics struct {
	Ahrefs ScrapeAhrefs `json:"ahrefs" bson:"ahrefs"`

} //@name ScrapeMetrics

ScrapeMetrics represents the scores and metrics retrieved from Ahrefs, Moz and Majestic.

type ScrapeStatus added in v0.1.3

type ScrapeStatus string

ScrapeStatus represents the status of a page task.

const (
	// ScrapeStatusProcessing is the status that defines
	// a processing page.
	ScrapeStatusProcessing ScrapeStatus = "processing"
	// ScrapeStatusFailed is the status that defines
	// a failed page task.
	ScrapeStatusFailed ScrapeStatus = "failed"
	// ScrapeStatusTimedOut is the status that defines
	// a timed out page task.
	ScrapeStatusTimedOut ScrapeStatus = "timed-out"
	// ScrapeStatusSuccess is the status that defines
	// a successful page task.
	ScrapeStatusSuccess ScrapeStatus = "success"
)
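
A small sketch of branching on a Page's status after processing; what each branch does is up to the caller.

switch page.Status {
case krangio.ScrapeStatusSuccess:
	// Task finished, nothing more to do.
case krangio.ScrapeStatusFailed, krangio.ScrapeStatusTimedOut:
	// Requeue or report the failure.
default:
	// ScrapeStatusProcessing: still in flight.
}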

