Class Cassiopee::Crawler
In: lib/cassiopee.rb
Parent: Object

Base class to index and search through a string

Methods

Constants

METHOD_DIRECT = 0
METHOD_SUFFIX = 1
FILE_SUFFIX_EXT = ".sfx"
FILE_SUFFIX_POS = ".sfp"
SUFFIXLEN = 'suffix_length'

Attributes

ambiguous  [RW]  Ambiguity map (Hash)
comments  [RW]  Array of comment characters to skip lines in input sequence file
file_suffix  [RW]  Suffix files name/path
maxthread  [RW]  Maximum number of threads to use (not yet used)
method  [RW]  Search method: FORCE or SUFFIX
  • SUFFIX loads all suffixes first and searches through them afterwards. This is useful for multiple searches, because the suffixes are reused.
  • FORCE checks matches while crossing the suffixes. It does not keep parsed data for later searches. The FORCE method does not yet support optimal filters.
useAmbiguity  [RW]  Use alphabet ambiguity (dna/rna) in search, automatically set with loadAmbiguityFile
useCache  [RW]  Manage basic cache to store previous match
use_store  [RW]  Whether to use a persistent suffix file
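
The trade-off between the two search methods described for the method attribute can be pictured with a minimal sketch. It is not the library's implementation: direct_search, build_index and indexed_search are hypothetical helpers, and the index here is a plain in-memory Hash keyed by substring rather than the suffix files the class uses.

```ruby
# Direct scan (FORCE-like): test every start position and keep nothing
# for later searches.
def direct_search(text, pattern)
  (0..text.length - pattern.length).select do |i|
    text[i, pattern.length] == pattern
  end
end

# Index-based (SUFFIX-like): record every substring of a given length
# with its start positions once, then reuse the index for many lookups.
def build_index(text, length)
  index = Hash.new { |hash, key| hash[key] = [] }
  (0..text.length - length).each { |i| index[text[i, length]] << i }
  index
end

# Look up a pattern without adding new keys to the index.
def indexed_search(index, pattern)
  index.fetch(pattern, [])
end
```

Building the index costs more up front, but each later lookup of a pattern of the indexed length is a single Hash access. The direct scan has no setup cost and re-reads the text on every search.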

Public Class methods

Public Instance methods

Clear suffixes in memory. If using use_store, clear the store too.

Extract a suffix from the suffix file based on an md5 match

Filter the array of positions with defined position filter

Filter matches to be between min and max start position. If not using use_store, search speed is improved but existing indexes are cleared. If max=0, then max is the string length. Must be called after index creation or load.

Index an input file. Existing indexes are cleared.

Index an input string. Existing indexes are cleared.
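
The min/max rule above, including the max=0 convention, can be sketched as follows. This is an illustration only: filter_positions is a hypothetical helper, and treating both bounds as inclusive is an assumption that the description does not confirm.

```ruby
# Keep only the start positions that fall between min and max.
# A max of 0 means "up to the string length", as described above.
# Assumption: both bounds are inclusive.
def filter_positions(positions, min, max, string_length)
  upper = max.zero? ? string_length : max
  positions.select { |pos| pos >= min && pos <= upper }
end
```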

Load ambiguity rules from a file. File format should be:

  • A=B,C D=E,F …
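
As a rough illustration of this format, the sketch below turns such rules into a Hash like the one held by the ambiguous attribute. parse_ambiguity_rules is a hypothetical helper, not the library's loader, and it assumes that entries are separated by whitespace (spaces or newlines).

```ruby
# Parse "KEY=V1,V2" entries into a Hash mapping each key to the list
# of symbols it may stand for.
# Assumption: entries are separated by spaces or newlines.
def parse_ambiguity_rules(text)
  rules = {}
  text.split.each do |entry|
    key, values = entry.split('=', 2)
    next if key.nil? || key.empty? || values.nil?
    rules[key] = values.split(',')
  end
  rules
end
```

With this sketch, the input "A=B,C D=E,F" gives a Hash where "A" maps to ["B", "C"] and "D" maps to ["E", "F"].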

Load sequence from a previous index command

Iterates over matches

Search an approximate string

  • Supports insertion, deletion and substitution
  • If edit > 0, use Hamming
  • Else use Levenshtein
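
For reference, the two distances named above differ in which edits they count: Hamming counts substitutions only, while Levenshtein also counts insertions and deletions. The sketch below shows the textbook definitions. The hamming and levenshtein functions are illustrative helpers, not the search code of this class, which works on indexed suffixes.

```ruby
# Hamming distance: number of positions that differ.
# Only substitutions are counted, so both strings must have the same length.
def hamming(a, b)
  raise ArgumentError, 'strings must have equal length' unless a.length == b.length
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# Levenshtein distance: minimum number of insertions, deletions and
# substitutions needed to turn a into b (row-by-row dynamic programming).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    curr = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end
```

For example, "acgt" and "aggt" are at Hamming distance 1 (one substitution), while "acgt" and "agt" can only be compared with Levenshtein, where the distance is 1 (one deletion).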

Search exact match

Set Logger level
