sequtils

v0.18.0
Warning

This package is not in the latest version of its module.

Published: Nov 23, 2021 | License: MIT | Imports: 8 | Imported by: 1

Documentation

Overview

Set of utilities to parse FASTA and FASTQ files.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func CreateFile added in v0.17.0

func CreateFile(o string) (*bufio.Writer, *os.File, error)

func CreateFileGzip added in v0.17.0

func CreateFileGzip(o string) (*gzip.Writer, *os.File, error)

func ExtractFna

func ExtractFna(file, prefix, taxid string, heads map[string]bool) error

Could be merged or removed: the same result can be achieved by combining the simpler functions and methods Load, Index, NewFasta, AddSequence, and Write.

func GetFileType

func GetFileType(inf *os.File) (string, error)

Reads the start of the file to determine whether it is gzip compressed. Returns an error if there is a problem with the open file.
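A common way to make this kind of check is to look at the first two bytes for the gzip magic number (0x1f 0x8b). The sketch below illustrates that idea only; it is not taken from the package. `sniffType` and `sniffBytes` are hypothetical helpers, and the "gzip" / "plain" labels are assumptions, since the page does not say which strings GetFileType returns.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// sniffType peeks at the first two bytes of r without consuming them and
// reports "gzip" when they match the gzip magic number, "plain" otherwise.
// The label strings are illustrative assumptions.
func sniffType(r *bufio.Reader) (string, error) {
	magic, err := r.Peek(2)
	if err != nil && err != io.EOF {
		return "", err
	}
	if len(magic) == 2 && magic[0] == 0x1f && magic[1] == 0x8b {
		return "gzip", nil
	}
	return "plain", nil
}

// sniffBytes is a small wrapper so the check can be tried on in-memory data.
func sniffBytes(b []byte) (string, error) {
	return sniffType(bufio.NewReader(bytes.NewReader(b)))
}

func main() {
	kind, err := sniffBytes([]byte{0x1f, 0x8b, 0x08, 0x00})
	fmt.Println(kind, err)

	kind, err = sniffBytes([]byte("@read1\nACGT\n+\nIIII\n"))
	fmt.Println(kind, err)
}
```

Because the bytes are only peeked, the same reader can afterwards be handed to a gzip or plain-text parser without rewinding the file.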

func MatchFastaId added in v0.14.0

func MatchFastaId(header, id, substr string) bool

Utility to determine if an id+substr is found in a sequence header

func MatchFastqId added in v0.14.0

func MatchFastqId(header, id, substr string) bool

Utility to determine if an id+substr is found in a sequence header

func OpenFastq added in v0.17.0

func OpenFastq(file string) (*bufio.Reader, *os.File, error)

Simple routine to open a fastq file and create a reader for it. Determines whether the file is gzipped.

Types

type Association added in v0.9.0

type Association struct {
	Old string
	New string
}

Type used to keep track of old and new header names. For use with ReplaceHeader, AddHeaderPrefix and AddHeaderSuffix.

type Converter added in v0.9.0

type Converter []*Association

func NewConverter added in v0.9.0

func NewConverter(nb int) Converter

Convenience function to create a new Converter with enough space allocated for `nb` Associations.

func (Converter) AddAssociation added in v0.9.0

func (c Converter) AddAssociation(idx int, o, n string) error

Adds an `Association` to the converter, replacing any existing one at position `idx`. Returns an error if `idx >= len(c)`.
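The bounds rule described here can be pictured with a minimal sketch. It redeclares the documented `Association` and `Converter` types locally so it compiles on its own; `newConverter` and `addAssociation` are hypothetical stand-ins, not the package's code. Allocating `nb` nil slots up front and rejecting negative indexes are assumptions of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the types documented above, so the sketch compiles alone.
type Association struct {
	Old string
	New string
}

type Converter []*Association

// newConverter allocates nb slots; each slot is nil until it is filled.
func newConverter(nb int) Converter {
	return make(Converter, nb)
}

// addAssociation stores the pair at idx, replacing whatever was there.
// Rejecting negative indexes is an extra assumption of this sketch.
func (c Converter) addAssociation(idx int, o, n string) error {
	if idx < 0 || idx >= len(c) {
		return errors.New("index out of range")
	}
	c[idx] = &Association{Old: o, New: n}
	return nil
}

func main() {
	c := newConverter(2)
	fmt.Println(c.addAssociation(0, "seq1", "sample_1"))
	fmt.Println(c.addAssociation(5, "seq2", "sample_2"))
	fmt.Println(c[0].Old, "->", c[0].New)
}
```

A value receiver is enough here because the slice header is copied but the backing array is shared, so writes to existing slots are visible to the caller.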

func (Converter) ToCsv added in v0.9.0

func (c Converter) ToCsv(o string) error

type Fasta

type Fasta struct {
	Header   string
	Sequence string
}

func NewFasta added in v0.8.0

func NewFasta(h, s string) *Fasta

Creates a Fasta struct from two strings. Takes two arguments:
1. h : The header to use (the starting '>' is added automatically).
2. s : The sequence itself.

func (*Fasta) Rename added in v0.8.2

func (f *Fasta) Rename(h string)

Changes the header to `h`

func (*Fasta) Subsequence added in v0.6.0

func (f *Fasta) Subsequence(s, e int) (*Fasta, error)

type Fastq

type Fastq struct {
	Fasta
	Quality string
}

func NewFastq added in v0.10.0

func NewFastq(h, s, q string) *Fastq

Creates a Fastq struct from three strings. Takes three arguments:
1. h : The header to use (the starting '@' is added automatically).
2. s : The sequence itself.
3. q : The quality of the sequence.

func (*Fastq) GetQuality added in v0.10.0

func (f *Fastq) GetQuality(phred64 bool) float64

func (*Fastq) Rename added in v0.10.0

func (f *Fastq) Rename(h string)

Changes the header to `h`

func (*Fastq) Subsequence added in v0.10.0

func (f *Fastq) Subsequence(s, e int) (*Fastq, error)

type Fna

type Fna []*Fasta

func LoadFasta

func LoadFasta(file string) (Fna, error)

func NewFna added in v0.8.1

func NewFna() Fna

Convenience function to create a Fna.

func (*Fna) AddFasta added in v0.8.1

func (f *Fna) AddFasta(seq *Fasta)

Convenience method to add a Fasta struct to the Fna.

func (*Fna) AddHeaderPrefix added in v0.9.0

func (f *Fna) AddHeaderPrefix(h, d string) (Converter, error)

Adds prefix `h` to the sequences' headers, separating the two with delimiter `d`. Returns the associated Converter and an error.

func (*Fna) AddHeaderSuffix added in v0.9.0

func (f *Fna) AddHeaderSuffix(h, d string) (Converter, error)

Adds suffix `h` to the sequences' headers, separating the two with delimiter `d`. Returns the associated Converter and an error.
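As a rough illustration of the two header operations above, the sketch below works on plain header strings and records each old/new pair. `addPrefix` and `addSuffix` are hypothetical helpers, not the package's methods. Leaving out any leading '>' and returning a plain slice of pairs instead of a Converter are assumptions made to keep the example short.

```go
package main

import "fmt"

// Association mirrors the documented type: an old header and its new name.
type Association struct {
	Old string
	New string
}

// addPrefix builds prefix+delim+header for every header and records the pairs.
func addPrefix(headers []string, prefix, delim string) ([]string, []Association) {
	out := make([]string, len(headers))
	conv := make([]Association, len(headers))
	for i, h := range headers {
		out[i] = prefix + delim + h
		conv[i] = Association{Old: h, New: out[i]}
	}
	return out, conv
}

// addSuffix builds header+delim+suffix for every header and records the pairs.
func addSuffix(headers []string, suffix, delim string) ([]string, []Association) {
	out := make([]string, len(headers))
	conv := make([]Association, len(headers))
	for i, h := range headers {
		out[i] = h + delim + suffix
		conv[i] = Association{Old: h, New: out[i]}
	}
	return out, conv
}

func main() {
	headers := []string{"contig1", "contig2"}
	renamed, conv := addPrefix(headers, "sampleA", "|")
	fmt.Println(renamed)
	for _, a := range conv {
		fmt.Println(a.Old, "->", a.New)
	}
	renamed, _ = addSuffix(headers, "v2", "_")
	fmt.Println(renamed)
}
```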

func (*Fna) FilterLength

func (f *Fna) FilterLength(min, max int)

Simple utility to filter fasta sequences by length. Iterates over a slice of sequences (Fna) and filters out the sequences whose lengths are lower than the specified minimum or greater than the specified maximum. If maximum <= 0, max is set to the length of the sequence. The method has no return value, so the filtered result is left in the receiver.
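A length filter of this kind can be sketched as follows. The types are local copies of the documented ones and `filterLength` is a hypothetical stand-in, not the package's method. Treating both bounds as inclusive and reusing the backing array are assumptions; the effect of `max <= 0` is modelled as "no upper bound", which is equivalent to setting max to each sequence's own length.

```go
package main

import "fmt"

// Local copies of the documented types.
type Fasta struct {
	Header   string
	Sequence string
}

type Fna []*Fasta

// filterLength keeps sequences with min <= length <= max, in place.
// A max of zero or less disables the upper bound.
func (f *Fna) filterLength(min, max int) {
	kept := (*f)[:0] // reuse the backing array
	for _, s := range *f {
		n := len(s.Sequence)
		if n < min {
			continue
		}
		if max > 0 && n > max {
			continue
		}
		kept = append(kept, s)
	}
	*f = kept
}

func main() {
	f := Fna{
		{Header: ">short", Sequence: "AC"},
		{Header: ">mid", Sequence: "ACGT"},
		{Header: ">long", Sequence: "ACGTACGT"},
	}
	f.filterLength(3, 5)
	for _, s := range f {
		fmt.Println(s.Header, len(s.Sequence))
	}
}
```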

func (*Fna) FilterSequences added in v0.11.3

func (f *Fna) FilterSequences(ids []string, idx SeqIndex, substr string, exact, exclude, warn bool) error

Utility to filter fasta sequences given a list of ids. Iterates over a slice of sequences (Fna) and keeps only the sequences whose headers have been provided.

There are two modes: exact or substring match. If exact is true, the index is used to search for the headers. This requires the provided headers to be an exact match to those found in the fasta file. If exact is false, the method iterates over the index keys and looks for a partial match in the headers. This may return multiple results per provided id.

When the headers aren't exact, it is possible to provide a pattern to limit the number of results reported by the filter. For instance, if the header is composed of two space-separated strings, you could provide `\s+` to the regexp parser using the `substr` argument.

It is also possible to reverse the result, removing the provided sequences instead of keeping only them. Set exclude to true if that is the desired output.

By default, an error is returned if any of the provided headers is not found. This behavior can be changed by setting warn to true, which reports the missing id to the user instead.
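The interplay of the exact, exclude, and warn flags can be sketched on plain header strings. `selectHeaders` is a hypothetical helper written for this illustration, not the package's method. It assumes the default keeps the listed sequences, it builds its own header-to-position map instead of taking a SeqIndex, and it leaves out the `substr` pattern, whose exact semantics this page does not spell out.

```go
package main

import (
	"fmt"
	"strings"
)

// selectHeaders returns the headers picked by ids.
//   exact:   an id must equal a header (looked up through a map).
//   !exact:  an id matches every header that contains it.
//   exclude: invert the selection.
//   warn:    print missing ids instead of returning an error.
func selectHeaders(headers, ids []string, exact, exclude, warn bool) ([]string, error) {
	idx := make(map[string]int, len(headers))
	for i, h := range headers {
		idx[h] = i
	}
	hit := make([]bool, len(headers))
	for _, id := range ids {
		found := false
		if exact {
			if i, ok := idx[id]; ok {
				hit[i], found = true, true
			}
		} else {
			for i, h := range headers {
				if strings.Contains(h, id) {
					hit[i], found = true, true
				}
			}
		}
		if !found {
			if !warn {
				return nil, fmt.Errorf("id %q not found", id)
			}
			fmt.Printf("warning: id %q not found\n", id)
		}
	}
	var out []string
	for i, h := range headers {
		if hit[i] != exclude { // keep hits, or keep misses when excluding
			out = append(out, h)
		}
	}
	return out, nil
}

func main() {
	headers := []string{"seq1 sampleA", "seq2 sampleA", "other"}
	fmt.Println(selectHeaders(headers, []string{"other"}, true, false, false))
	fmt.Println(selectHeaders(headers, []string{"seq"}, false, false, false))
	fmt.Println(selectHeaders(headers, []string{"seq"}, false, true, false))
}
```

The exact mode costs one map lookup per id, while the substring mode scans every header for every id, which is why a partial match can return several sequences for a single id.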

func (*Fna) Get added in v0.9.15

func (f *Fna) Get(r string, fIdx SeqIndex) (*Fasta, error)

Method to get a sequence from a Fna using a SeqIndex. Requires that the fasta be indexed first. Returns the desired sequence and an error. The error, if any, should be NotFound.

func (*Fna) HeaderFromConverter added in v0.15.0

func (f *Fna) HeaderFromConverter(c MapConverter) error

TODO

func (*Fna) Index added in v0.7.0

func (f *Fna) Index() (SeqIndex, error)

Indexes the loaded fasta file using the fasta headers. Assumes that all headers are unique and returns an error if they are not.
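One plausible reading of the index is a map from header to position in the slice, which is what the `SeqIndex` type (`map[string]int`) suggests. The sketch below builds such a map, rejects duplicate headers, and uses it for lookups. `buildIndex`, `get`, and `errNotFound` are hypothetical names, and the header-to-position meaning of the map values is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Local copies of the documented types.
type Fasta struct {
	Header   string
	Sequence string
}

type Fna []*Fasta

type SeqIndex map[string]int

var errNotFound = errors.New("sequence not found")

// buildIndex maps each header to its position and fails on duplicates.
func buildIndex(f Fna) (SeqIndex, error) {
	idx := make(SeqIndex, len(f))
	for i, s := range f {
		if _, dup := idx[s.Header]; dup {
			return nil, fmt.Errorf("duplicate header %q", s.Header)
		}
		idx[s.Header] = i
	}
	return idx, nil
}

// get resolves a header through the index in constant time.
func get(f Fna, header string, idx SeqIndex) (*Fasta, error) {
	i, ok := idx[header]
	if !ok || i < 0 || i >= len(f) {
		return nil, errNotFound
	}
	return f[i], nil
}

func main() {
	f := Fna{
		{Header: ">a", Sequence: "ACGT"},
		{Header: ">b", Sequence: "GGCC"},
	}
	idx, err := buildIndex(f)
	if err != nil {
		fmt.Println("index error:", err)
		return
	}
	seq, err := get(f, ">b", idx)
	fmt.Println(seq.Sequence, err)
	_, err = get(f, ">missing", idx)
	fmt.Println(err)
}
```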

func (*Fna) ReplaceHeader added in v0.9.0

func (f *Fna) ReplaceHeader(h string) (Converter, error)

Replaces all existing headers with `h` followed by a sequential number. Returns the associated Converter and an error.
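A minimal sketch of sequential renaming, under stated assumptions: numbering starts at 1, the number is appended directly to `h` with no delimiter, and headers are plain strings without the leading '>'. None of these details are confirmed by this page. `replaceHeaders` is a hypothetical helper, not the package's method.

```go
package main

import "fmt"

// Association mirrors the documented type: an old header and its new name.
type Association struct {
	Old string
	New string
}

// replaceHeaders renames every header to h followed by its 1-based position
// and returns the new headers together with the old/new pairs.
func replaceHeaders(headers []string, h string) ([]string, []Association) {
	out := make([]string, len(headers))
	conv := make([]Association, len(headers))
	for i, old := range headers {
		out[i] = fmt.Sprintf("%s%d", h, i+1)
		conv[i] = Association{Old: old, New: out[i]}
	}
	return out, conv
}

func main() {
	renamed, conv := replaceHeaders([]string{"NODE_17_length_900", "NODE_4_length_120"}, "contig_")
	fmt.Println(renamed)
	for _, a := range conv {
		fmt.Println(a.Old, "->", a.New)
	}
}
```

Keeping the old/new pairs is what makes the renaming reversible later, for example by writing them to a CSV file.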

func (*Fna) Swap added in v0.11.3

func (f *Fna) Swap(i, j int, sIdx SeqIndex)

Swap method to swap two items in an indexed fasta.
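Swapping two entries of an indexed collection means the index has to be updated along with the slice, otherwise later lookups point at the wrong sequence. The sketch below shows that bookkeeping; `swap` is a hypothetical function, and the assumption is again that the index maps a header to its position.

```go
package main

import "fmt"

// Local copies of the documented types.
type Fasta struct {
	Header   string
	Sequence string
}

type Fna []*Fasta

type SeqIndex map[string]int

// swap exchanges positions i and j and keeps the index in sync.
func swap(f Fna, i, j int, idx SeqIndex) {
	f[i], f[j] = f[j], f[i]
	idx[f[i].Header] = i
	idx[f[j].Header] = j
}

func main() {
	f := Fna{
		{Header: ">a", Sequence: "ACGT"},
		{Header: ">b", Sequence: "GGCC"},
	}
	idx := SeqIndex{">a": 0, ">b": 1}
	swap(f, 0, 1, idx)
	fmt.Println(f[0].Header, f[1].Header)
	fmt.Println(idx[">a"], idx[">b"])
}
```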

func (Fna) Write

func (f Fna) Write(o string) error

type Fsq

type Fsq []*Fastq

func LoadFastq added in v0.10.0

func LoadFastq(file string) (Fsq, error)

Reads in a fastq file, compressed or not, and loads it into a Fsq ([]*Fastq). Returns an error if the file can't be opened or if there is an error while reading.

func LoadNFastq added in v0.17.0

func LoadNFastq(n int, r *bufio.Reader) (Fsq, bool, error)

Reads in a fastq file N sequences at a time, compressed or not, and loads them into a Fsq ([]*Fastq). Returns an error if the file can't be opened or if there is an error while reading.
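Reading in batches keeps memory bounded for large fastq files. The sketch below reads up to `n` four-line records per call from a `*bufio.Reader`. It is an illustration only: `record`, `readN`, and `batchSizes` are hypothetical, the boolean is assumed to mean "input exhausted" (this page does not document what LoadNFastq's boolean reports), and multi-line sequences are not handled.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

type record struct{ Header, Sequence, Quality string }

// readN reads up to n four-line records from r.
// done reports that the input is exhausted.
func readN(n int, r *bufio.Reader) ([]record, bool, error) {
	var out []record
	for len(out) < n {
		var lines [4]string
		for i := range lines {
			line, err := r.ReadString('\n')
			if err == io.EOF && line == "" {
				if i == 0 {
					return out, true, nil // clean end of input
				}
				return out, true, fmt.Errorf("truncated record after %d lines", i)
			}
			if err != nil && err != io.EOF {
				return out, false, err
			}
			lines[i] = strings.TrimRight(line, "\r\n")
		}
		if !strings.HasPrefix(lines[0], "@") || !strings.HasPrefix(lines[2], "+") {
			return out, false, fmt.Errorf("malformed record %q", lines[0])
		}
		out = append(out, record{lines[0], lines[1], lines[3]})
	}
	return out, false, nil
}

// batchSizes drains data in batches of n and returns the size of each batch.
func batchSizes(data string, n int) ([]int, error) {
	r := bufio.NewReader(strings.NewReader(data))
	var sizes []int
	for {
		recs, done, err := readN(n, r)
		if err != nil {
			return sizes, err
		}
		if len(recs) > 0 {
			sizes = append(sizes, len(recs))
		}
		if done {
			return sizes, nil
		}
	}
}

func main() {
	data := "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n"
	fmt.Println(batchSizes(data, 2))
}
```

The caller loops until the boolean signals the end of input, processing and discarding each batch before asking for the next one.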

func NewFsq added in v0.10.0

func NewFsq() Fsq

Convenience function to create a Fsq.

func (*Fsq) AddFastq added in v0.10.0

func (f *Fsq) AddFastq(seq *Fastq)

Convenience method to add a Fastq struct to the Fsq.

func (*Fsq) FilterLength added in v0.10.0

func (f *Fsq) FilterLength(min, max int)

Simple utility to filter fastq sequences by length. Iterates over a slice of sequences (Fsq) and filters out the sequences whose lengths are lower than the specified minimum or greater than the specified maximum. If maximum <= 0, max is set to the length of the sequence. The method has no return value, so the filtered result is left in the receiver.

func (*Fsq) FilterSequences added in v0.12.0

func (f *Fsq) FilterSequences(ids []string, idx SeqIndex, substr string, exact, exclude, warn bool) error

Utility to filter fastq sequences given a list of ids. Iterates over a slice of sequences (Fsq) and keeps only the sequences whose headers have been provided.

There are two modes: exact or substring match. If exact is true, the index is used to search for the headers. This requires the provided headers to be an exact match to those found in the fastq file. If exact is false, the method iterates over the index keys and looks for a partial match in the headers. This may return multiple results per provided id.

When the headers aren't exact, it is possible to provide a pattern to limit the number of results reported by the filter. For instance, if the header is composed of two space-separated strings, you could provide `\s+` to the regexp parser using the `substr` argument.

It is also possible to reverse the result, removing the provided sequences instead of keeping only them. Set exclude to true if that is the desired output.

By default, an error is returned if any of the provided headers is not found. This behavior can be changed by setting warn to true, which reports the missing id to the user instead.

func (*Fsq) Get added in v0.10.0

func (f *Fsq) Get(r string, fIdx SeqIndex) (*Fastq, error)

Method to get a sequence from a Fsq using a SeqIndex. Requires that the fastq be indexed first. Returns the desired sequence and an error. The error, if any, should be NotFound.

func (*Fsq) GetAvgQuality added in v0.10.0

func (f *Fsq) GetAvgQuality(phred bool) float64

func (*Fsq) Index added in v0.10.0

func (f *Fsq) Index() (SeqIndex, error)

Indexes the loaded fastq file using the fastq headers. Assumes that all headers are unique and returns an error if they are not.

func (*Fsq) SearchIndex added in v0.16.0

func (f *Fsq) SearchIndex(i string, b, e, m int) error

func (*Fsq) Swap added in v0.12.0

func (f *Fsq) Swap(i, j int, sIdx SeqIndex)

Swap method to swap two items in an indexed fastq.

func (Fsq) Write added in v0.10.0

func (f Fsq) Write(o string) error

func (Fsq) WriteAppend added in v0.17.0

func (f Fsq) WriteAppend(w *bufio.Writer) error

func (Fsq) WriteAppendGzip added in v0.17.0

func (f Fsq) WriteAppendGzip(w *gzip.Writer) error

func (Fsq) WriteGzip added in v0.11.0

func (f Fsq) WriteGzip(o string) error

type MapConverter added in v0.15.0

type MapConverter map[string]string

New type that will replace Converter (TODO). Keys are old header names and values are new ones: map[old]new

func LoadConverter added in v0.15.0

func LoadConverter(f string, flip bool) (MapConverter, error)

func NewMapConverter added in v0.15.0

func NewMapConverter() MapConverter

func (MapConverter) AddAssociation added in v0.15.0

func (c MapConverter) AddAssociation(o, n string) error

type SeqIndex added in v0.10.0

type SeqIndex map[string]int
