google-cloud-storage

Published: Apr 12, 2024 License: Apache-2.0


Collect and store RouteViews data from a Cloud Service

A collaboration between RouteViews and Google to expand RouteViews data offerings into Google Cloud products and to offload some of RouteViews' data offerings to a Google Cloud backup/mirror.

Overview

RouteViews (RV) provides data collected from various route collectors. The data comes from BGP peers of the route collectors, and historical data is available from at least 2004 onward. Review the RouteViews website for specifics on data formats and timing.

This project's goal is to provide the RV data through a standard cloud storage mechanism, uploading/archiving the data to, at a minimum, Google Cloud Storage and BigQuery.

Examples

For examples of accessing RouteViews data in Google Cloud, see the examples provided in "examples/".

Implementation Notes

Archiving to cloud storage may be accomplished through a simple signaling method: upon local archive completion at RV, the RV archive server(s) send the file path/name and file content to a service running in Google Cloud.

Storing the data in BigQuery should be enabled at the time of the cloud storage write as well, after converting the RIB or UPDATE data from MRT to JSON matching the BigQuery data model. (NOTE: AVRO is possibly better suited than JSON for this; AVRO Go code is available from LinkedIn on GitHub.)
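As a sketch of that conversion step, the program below marshals one MRT-derived BGP update into a newline-delimited JSON row of the kind a BigQuery load job accepts. The `updateRow` field names are illustrative assumptions, not the project's actual data model.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// updateRow is a hypothetical row shape for one BGP UPDATE after MRT
// conversion; the real schema is defined by the project's BigQuery data model.
type updateRow struct {
	Collector string   `json:"collector"`
	Peer      string   `json:"peer"`
	Prefix    string   `json:"prefix"`
	ASPath    []uint32 `json:"as_path"`
	Timestamp int64    `json:"timestamp"`
}

// toJSON converts one parsed update into a single JSON line, suitable
// for a newline-delimited JSON BigQuery load.
func toJSON(r updateRow) (string, error) {
	b, err := json.Marshal(r)
	return string(b), err
}

func main() {
	line, err := toJSON(updateRow{
		Collector: "route-views.amsix",
		Peer:      "192.0.2.1",
		Prefix:    "198.51.100.0/24",
		ASPath:    []uint32{64496, 64511},
		Timestamp: 1712880000,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```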

Requirements

A metadata service must be built to track the state of each file in process.

Initially, the RV upload part of the solution is a CLI client that can be run with simple command-line options such as:

$ routeviews-google-upload --file <filepath> --dest https://grpc.routeviews.org

This CLI tool is implemented! See the routeviews-google-upload PyPI package for more details.

The CLI tool should package the file content, path, and an MD5 checksum of the content in a Google protocol buffer message, and send that data over a gRPC connection to a cloud service. An upload event should be idempotent, meaning uploading the same file multiple times should not negatively impact the archive.
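A minimal sketch of that client-side packaging, assuming a hypothetical FileRequest shape (the real message is defined by the project's protobuf, and the real client is the Python CLI):

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// FileRequest mirrors the kind of message the CLI would send over gRPC.
// Field names here are illustrative, not the actual protobuf definition.
type FileRequest struct {
	Filename string // path as archived at RouteViews
	Content  []byte // raw MRT file bytes
	Md5Sum   string // hex-encoded MD5 of Content, for integrity checking
}

// md5Hex returns the hex-encoded MD5 digest of b.
func md5Hex(b []byte) string {
	sum := md5.Sum(b)
	return hex.EncodeToString(sum[:])
}

// newFileRequest packages a file path and its content with a checksum,
// so the server can verify the upload and recognize repeated sends.
func newFileRequest(path string, content []byte) FileRequest {
	return FileRequest{Filename: path, Content: content, Md5Sum: md5Hex(content)}
}

func main() {
	req := newFileRequest("route-views.amsix/RIBS/rib.20240412.0000.bz2", []byte("example MRT bytes"))
	fmt.Println(req.Filename, req.Md5Sum)
}
```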

The cloud portion of the gRPC service should be served behind a load balancer to provide a resilient, scalable service. The load-balanced service will accept the gRPC request, upload the raw file content to cloud storage, parse the file to JSON (or AVRO?), and store the result in a cloud storage location adjacent to the raw file. Once stored, the data should be loaded into the BigQuery instance and a reply sent to the CLI caller.
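One way the server side can keep uploads idempotent is to compare checksums before writing. The in-memory store below stands in for the cloud storage bucket purely for illustration; its names and shape are assumptions, not the actual server implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// archiveStore is an in-memory stand-in for the cloud storage bucket,
// used here only to illustrate idempotent handling of repeated uploads.
type archiveStore struct {
	mu    sync.Mutex
	files map[string]string // filename -> checksum of stored content
}

// Store writes the file record unless an identical copy (same name and
// checksum) is already present. It reports whether a write happened, so
// re-sending the same file is a harmless no-op.
func (s *archiveStore) Store(name, checksum string) (wrote bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if prev, ok := s.files[name]; ok && prev == checksum {
		return false // duplicate upload: already archived, nothing to do
	}
	s.files[name] = checksum
	return true
}

func main() {
	s := &archiveStore{files: map[string]string{}}
	fmt.Println(s.Store("rib.20240412.0000.bz2", "abc123")) // first upload writes
	fmt.Println(s.Store("rib.20240412.0000.bz2", "abc123")) // idempotent repeat, no-op
}
```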

The server must provide either affirmation that the file was handled properly, or an error with appropriate status information about the fate of the file, its conversion, and the BigQuery upload.

Metadata

The following metadata should be retrievable during and after the gRPC request. Many of these attributes may come from the standard Cloud Storage metadata API (https://cloud.google.com/storage/docs/metadata). Some of these attributes overlap with the FileRequest spec. It is unclear where this metadata will sit in relation to the actual data, or whether the metadata will be searchable/filterable.

  1. name: status
    type: string
    description: The current status of the file transfer, e.g. "None", "In Progress", "Done".
  2. name: filename
    type: string
    description: The filepath used in the FileRequest gRPC call.
  3. name: content-type
    type: string
    description: The IANA media type of the file, e.g. application/octet-stream, or application/MRT?
  4. name: content-encoding
    type: string
    description: The encoding of the file, e.g. bzip2.
  5. name: updated
    type: timestamp
    description: The last date and time the file or metadata was updated.
  6. name: project
    type: string
    description: The project used in the FileRequest gRPC call.
  7. name: collector ID
    type: string
    description: An identifier that can be associated with a collector, e.g. route-views.amsix.
  8. name: MRT_type
    type: enum
    description: The type of the MRT file, e.g. RIB or UPDATE.
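The attribute list above could map to a record such as the following. The struct and constant names are hypothetical, since where this metadata will live relative to the data is still undecided.

```go
package main

import "fmt"

// transferStatus enumerates the file-transfer states listed above.
type transferStatus string

const (
	statusNone       transferStatus = "None"
	statusInProgress transferStatus = "In Progress"
	statusDone       transferStatus = "Done"
)

// fileMetadata is a hypothetical record shape for the attributes above;
// field names follow the list, but the real storage layout is undecided.
type fileMetadata struct {
	Status          transferStatus
	Filename        string // filepath used in the FileRequest gRPC call
	ContentType     string // IANA media type, e.g. application/octet-stream
	ContentEncoding string // e.g. bzip2
	Updated         int64  // Unix timestamp of the last file/metadata update
	Project         string // project used in the FileRequest gRPC call
	CollectorID     string // e.g. route-views.amsix
	MRTType         string // "RIB" or "UPDATE"
}

func main() {
	m := fileMetadata{
		Status:          statusInProgress,
		Filename:        "route-views.amsix/RIBS/rib.20240412.0000.bz2",
		ContentType:     "application/octet-stream",
		ContentEncoding: "bzip2",
		CollectorID:     "route-views.amsix",
		MRTType:         "RIB",
	}
	fmt.Println(m.Status, m.CollectorID)
}
```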

For Developers

With the gRPC interface defined, we need a server and a client that implement this interface.

For simplicity of maintaining RouteViews infrastructure, the client is implemented in Python (similar to other RouteViews automation solutions). The client is implemented and owned by the University of Oregon Network and Telecom Services team, who are the primary maintainers of RouteViews.

See more details in the routeviews_google_upload project docs.

The server is implemented in Go and maintained by Chris Morrow at Google.

Directories

Path Synopsis
cmd
archive_upload_client
Package main implements a golang client to upload RV archives to a Cloud Run RV service.
archive_upload_mass_upload
Package mass_upload should upload content from a given ftp site to cloud storage.
archive_upload_server
Package main is the gRPC server which accepts file storage requests.
pkg
bq_transfer
Package bqtransfer provides utilities to transfer large amounts of converted routing archives to BigQuery using the Data Transfer Service.
synchronizer
Package synchronizer uploads missing MRT archives to the gRPC archive server.
