SIS
Symmetric Index Structures
|
Template class DocumentIndexingAutomaton for the family of index structures of type AutomatonType (i.e. CDAWGs, SCDAWGs, etc.) More...
#include <DocumentIndexingAutomaton.hpp>
Public Member Functions | |
DocumentIndexingAutomaton () | |
Constructor. | |
DocumentIndexingAutomaton (UINT symbolsize) | |
Constructor. | |
virtual | ~DocumentIndexingAutomaton () |
Desctructor. | |
virtual void | AddDocuments (const std::vector< DocumentName > &documents) |
Add documents to the index structure DocumentIndexingAutomaton. | |
virtual void | Index () |
The indexing step. | |
virtual void | Close () |
The closing step. | |
virtual DocumentIndexingAutomatonFindResults && | findall (const std::string &w) |
start a search / find all occurences of a string | |
virtual void | Shrink ()=0 |
virtual SinkState | Add (const std::string &)=0 |
virtual SinkState | Add (const S8 *string)=0 |
virtual SinkState | Add (const VoidSequence *)=0 |
virtual UINT | number_of_states () const =0 |
virtual UINT | number_of_transitions () const =0 |
virtual CompressedAutomaton * | automaton () const =0 |
Protected Member Functions | |
virtual DocumentIndexingAutomatonFindResults && | find (const PairState state, UINT length) |
protected method needed by findall(), re-adjusts to explicit state / canonical representative | |
virtual DocumentIndexingAutomatonFindResults && | findRec (const UINT s, const UINT length) |
protected method needed by find(), collects concrete positions in documents | |
virtual UINT | suffixLink (const UINT &state) const |
Follow a suffixlink. | |
virtual void | SortTransitions ()=0 |
Protected Attributes | |
ManagedAutomatonStage | ManagedAutomatonStage_ = EMPTY |
DocumentIndexingAutomatonMemberDescription | |
DocumentIndexingAutomaton's bookkeeping Generally, the information that is needed to carry out lookups is
To this end, all states need to have the document information attached in an indexing step, see Index() for more information. See below for a specific description of each member
| |
std::map< DocumentName, State > | _document_sinkstate |
association document -> sinkstate | |
std::map< DocumentName, PositionsTuple > | _document_length |
association document -> document's length information (including current, absolute and actual positions in data) | |
std::vector< DocumentName > | _document_names |
a list of all documents that have been added to the automaton (template AutomatonType) | |
std::map< State, std::vector < DocumentName > > | _states_documents |
Template class DocumentIndexingAutomaton for the family of index structures of type AutomatonType (i.e. CDAWGs, SCDAWGs, etc.)
AutomatonType | DocumentIndexingAutomaton is a template class capable of adding document-indexing capbailities to the whole family of index structures (i.e. CDAWGs, SCDAWGs, etc.). |
One way to achieve giving document-indexing capbalities to these classes would be inheritance for each one of the structures or – what is done in this case – to use generative programming to have these inheritances be created for you.
Definition at line 42 of file DocumentIndexingAutomaton.hpp.
lmu::cis::sis::DocumentIndexingAutomaton< AutomatonType >::DocumentIndexingAutomaton | ( | ) | [inline] |
Constructor.
This constructor defaults and delegates to DocumentIndexingAutomaton(UINT symbolsize = 1)
Definition at line 98 of file DocumentIndexingAutomaton.hpp.
lmu::cis::sis::DocumentIndexingAutomaton< AutomatonType >::DocumentIndexingAutomaton | ( | UINT | symbolsize | ) | [inline] |
Constructor.
symbolsize |
This constructor builds the templated AutomatonType (so, for instance, an SCDAWGAdapter) with DocumentIndexingAutomaton - capabilities
Definition at line 112 of file DocumentIndexingAutomaton.hpp.
virtual lmu::cis::sis::DocumentIndexingAutomaton< AutomatonType >::~DocumentIndexingAutomaton | ( | ) | [inline, virtual] |
Desctructor.
Definition at line 121 of file DocumentIndexingAutomaton.hpp.
virtual SinkState lmu::cis::sis::CompressedAutomatonAdapterInterface::Add | ( | const std::string & | ) | [pure virtual, inherited] |
Implemented in lmu::cis::sis::CompressedAutomatonAdapter.
virtual SinkState lmu::cis::sis::CompressedAutomatonAdapterInterface::Add | ( | const S8 * | string | ) | [pure virtual, inherited] |
Implemented in lmu::cis::sis::CompressedAutomatonAdapter.
virtual SinkState lmu::cis::sis::CompressedAutomatonAdapterInterface::Add | ( | const VoidSequence * | ) | [pure virtual, inherited] |
void lmu::cis::sis::DocumentIndexingAutomaton< T >::AddDocuments | ( | const std::vector< DocumentName > & | documents | ) | [virtual] |
Add documents to the index structure DocumentIndexingAutomaton.
When adding documents, two important pieces of information are saved:
Implements lmu::cis::sis::DocumentIndexingAutomatonInterface.
Definition at line 19 of file DocumentIndexingAutomaton.cpp.
Referenced by main().
virtual CompressedAutomaton* lmu::cis::sis::CompressedAutomatonAdapterInterface::automaton | ( | ) | const [inline, pure virtual, inherited] |
void lmu::cis::sis::DocumentIndexingAutomaton< T >::Close | ( | ) | [virtual] |
The closing step.
The closing step is a non-recoverable step, see DocumentIndexingAutomaton: Stages
Implements lmu::cis::sis::DocumentIndexingAutomatonInterface.
Definition at line 64 of file DocumentIndexingAutomaton.cpp.
Referenced by main().
DocumentIndexingAutomatonFindResults && lmu::cis::sis::DocumentIndexingAutomaton< T >::find | ( | const PairState | state, |
UINT | length | ||
) | [protected, virtual] |
protected method needed by findall(), re-adjusts to explicit state / canonical representative
When searching for all occurences of a pattern p, the chain goes like this: findall() calls find() calls findRec().
In this chain, findall() will pass a state to find(), regardless whether it is an implicit or an explicit state.
find(), as the second in the row, will re-position to the next explicit state (which is the canonical representative of findall()s search pattern) and adjust the current pattern's length accordingly in order to find the correct position in the document(s).
Lastly, find() will dispatch to findRec() to collect all concrete positions.
Implements lmu::cis::sis::DocumentIndexingAutomatonInterface.
Definition at line 161 of file DocumentIndexingAutomaton.cpp.
DocumentIndexingAutomatonFindResults && lmu::cis::sis::DocumentIndexingAutomaton< T >::findall | ( | const std::string & | w | ) | [virtual] |
start a search / find all occurences of a string
ManagedAutomatonStage_ must be = INDEXED, if not the automaton will be sorted and indexed first (if needed)
Implements lmu::cis::sis::DocumentIndexingAutomatonInterface.
Definition at line 120 of file DocumentIndexingAutomaton.cpp.
Referenced by main().
DocumentIndexingAutomatonFindResults && lmu::cis::sis::DocumentIndexingAutomaton< T >::findRec | ( | const UINT | s, |
const UINT | length | ||
) | [protected, virtual] |
protected method needed by find(), collects concrete positions in documents
In the chain of findall() -> find() -> findRec(), findRec()'s part is to collect the concrete positions of pattern p in the documents.
Implements lmu::cis::sis::DocumentIndexingAutomatonInterface.
Definition at line 183 of file DocumentIndexingAutomaton.cpp.
void lmu::cis::sis::DocumentIndexingAutomaton< T >::Index | ( | ) | [virtual] |
The indexing step.
The indexing step (following Blumer&Blumer) is crucial for all follwing findall() steps. Basically, it attaches the document's names to the states whose equivalence class can be found in those files (one or more times).
Definition at line 73 of file DocumentIndexingAutomaton.cpp.
Referenced by main().
virtual UINT lmu::cis::sis::CompressedAutomatonAdapterInterface::number_of_states | ( | ) | const [pure virtual, inherited] |
Implemented in lmu::cis::sis::CompressedAutomatonAdapter.
virtual UINT lmu::cis::sis::CompressedAutomatonAdapterInterface::number_of_transitions | ( | ) | const [pure virtual, inherited] |
Implemented in lmu::cis::sis::CompressedAutomatonAdapter.
virtual void lmu::cis::sis::CompressedAutomatonAdapterInterface::Shrink | ( | ) | [pure virtual, inherited] |
Implemented in lmu::cis::sis::SCDAWGAdapter, and lmu::cis::sis::CompressedAutomatonAdapter.
virtual void lmu::cis::sis::CompressedAutomatonAdapterInterface::SortTransitions | ( | ) | [protected, pure virtual, inherited] |
Implemented in lmu::cis::sis::SCDAWGAdapter, and lmu::cis::sis::InenagaCDAWGAdapter.
Referenced by main().
UINT lmu::cis::sis::DocumentIndexingAutomaton< T >::suffixLink | ( | const UINT & | state | ) | const [protected, virtual] |
Follow a suffixlink.
Suffixlinks exist in the automaton's respective buildhelp structure, i.e. SCDAWGBuildHelp for AutomatonType SCDAWG, CDAWGBuildHelp for AutomatonType CompressedAutomaton / inenagaCDAWG.
suffixlink()s are needed by Index() in order to attach document information to the states recursively from the sinkstates.
Implements lmu::cis::sis::CompressedAutomatonAdapterInterface.
Definition at line 223 of file DocumentIndexingAutomaton.cpp.
std::map<DocumentName, PositionsTuple> lmu::cis::sis::DocumentIndexingAutomaton< AutomatonType >::_document_length [protected] |
association document -> document's length information (including current, absolute and actual positions in data)
Definition at line 85 of file DocumentIndexingAutomaton.hpp.
std::vector<DocumentName> lmu::cis::sis::DocumentIndexingAutomaton< AutomatonType >::_document_names [protected] |
a list of all documents that have been added to the automaton (template AutomatonType)
Definition at line 86 of file DocumentIndexingAutomaton.hpp.
std::map<DocumentName, State> lmu::cis::sis::DocumentIndexingAutomaton< AutomatonType >::_document_sinkstate [protected] |
association document -> sinkstate
Definition at line 84 of file DocumentIndexingAutomaton.hpp.
std::map<State, std::vector<DocumentName> > lmu::cis::sis::DocumentIndexingAutomaton< AutomatonType >::_states_documents [protected] |
association state -> list of documents that state is attached to
Definition at line 87 of file DocumentIndexingAutomaton.hpp.
ManagedAutomatonStage lmu::cis::sis::ManagedStageAutomatonInterface::ManagedAutomatonStage_ = EMPTY [protected, inherited] |
Definition at line 10 of file ManagedStageAutomatonInterface.hpp.
Referenced by lmu::cis::sis::InenagaCDAWGAdapter::Add(), lmu::cis::sis::CompressedAutomatonAdapter::Add(), lmu::cis::sis::SCDAWGAdapter::Add(), lmu::cis::sis::CompressedAutomatonAdapter::AddDictionary(), lmu::cis::sis::CompressedAutomatonAdapter::AddState(), lmu::cis::sis::CompressedAutomatonAdapter::AddTarjanTable(), lmu::cis::sis::SCDAWGAdapter::AddTarjanTable(), lmu::cis::sis::CompressedAutomatonAdapter::AddTransition(), lmu::cis::sis::InenagaCDAWGAdapter::buildhelp(), lmu::cis::sis::SCDAWGAdapter::buildhelp(), lmu::cis::sis::InenagaCDAWGAdapter::Close(), lmu::cis::sis::SCDAWGAdapter::Close(), lmu::cis::sis::CompressedAutomatonAdapter::Delta(), lmu::cis::sis::SCDAWGAdapter::Delta(), lmu::cis::sis::InenagaCDAWGAdapter::DumpGV(), lmu::cis::sis::CompressedAutomatonAdapter::DumpGV(), lmu::cis::sis::SCDAWGAdapter::DumpGV(), lmu::cis::sis::SCDAWGAdapter::DumpGVRight(), lmu::cis::sis::CompressedAutomatonAdapter::DumpStat(), lmu::cis::sis::InenagaCDAWGAdapter::DumpStat(), lmu::cis::sis::SCDAWGAdapter::DumpStat(), lmu::cis::sis::CompressedAutomatonAdapter::GenerateLanguage(), lmu::cis::sis::SCDAWGAdapter::GenerateLanguage(), lmu::cis::sis::CompressedAutomatonAdapter::GetInitialPairState(), lmu::cis::sis::SCDAWGAdapter::GetInitialPairState(), lmu::cis::sis::CompressedAutomatonAdapter::LeftAutomatonGenerateLanguage(), lmu::cis::sis::CompressedAutomatonAdapter::number_of_states(), lmu::cis::sis::CompressedAutomatonAdapter::number_of_transitions(), lmu::cis::sis::InenagaCDAWGAdapter::Read(), lmu::cis::sis::SCDAWGAdapter::Read(), lmu::cis::sis::CompressedAutomatonAdapter::Shrink(), lmu::cis::sis::SCDAWGAdapter::Shrink(), lmu::cis::sis::InenagaCDAWGAdapter::SortTransitions(), lmu::cis::sis::SCDAWGAdapter::SortTransitions(), lmu::cis::sis::InenagaCDAWGAdapter::suffixLink(), lmu::cis::sis::CompressedAutomatonAdapter::Write(), and lmu::cis::sis::SCDAWGAdapter::Write().