pl.edu.agh.cast.importer.base.tokenizer.fixedwidth
Class FixedWidthTokenizer

java.lang.Object
  extended by pl.edu.agh.cast.importer.base.tokenizer.AbstractImportTokenizer
      extended by pl.edu.agh.cast.importer.base.tokenizer.fixedwidth.FixedWidthTokenizer
All Implemented Interfaces:
IImportTokenizer

public class FixedWidthTokenizer
extends AbstractImportTokenizer

Tokenizer for text files whose fields are aligned in fixed-width columns, with spaces padding the gaps between fields.

Author:
AGH CAST Team

Field Summary
static String COMMENT_CHAR_OPTION_NAME
          Name of the comment character option; a line marked with this character is treated as a comment and is not imported.
static String CUT_POINTS_OPTION_NAME
          Name of the cut points option; its value holds the column cut indices, separated by whitespace characters.
 
Fields inherited from class pl.edu.agh.cast.importer.base.tokenizer.AbstractImportTokenizer
options
 
Constructor Summary
FixedWidthTokenizer()
           
 
Method Summary
 List<String> getInputFileLines()
           
 List<Integer> getLineCutPoints()
          Returns the cut point indices.
static String pointListToString(List<Integer> indices)
          Converts a list of cut point indices to a string.
 List<RawTabularData> tokenize(InputStream is, long rowsLimit, org.eclipse.core.runtime.IProgressMonitor monitor)
          Splits a given input stream into tokens, using the specified tokenizer options.
 
Methods inherited from class pl.edu.agh.cast.importer.base.tokenizer.AbstractImportTokenizer
equals, getEncoding, getOptionValue, getTokenizerOptions, hashCode, removeEmptyCellsFromRowEnd, removeQualifier, setEncoding, setTokenizerOptions
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

COMMENT_CHAR_OPTION_NAME

public static final String COMMENT_CHAR_OPTION_NAME
Name of the comment character option; a line marked with this character is treated as a comment and is not imported.

See Also:
Constant Field Values

CUT_POINTS_OPTION_NAME

public static final String CUT_POINTS_OPTION_NAME
Name of the cut points option; its value holds the column cut indices, separated by whitespace characters.

See Also:
Constant Field Values
Constructor Detail

FixedWidthTokenizer

public FixedWidthTokenizer()
Method Detail

tokenize

public List<RawTabularData> tokenize(InputStream is,
                                     long rowsLimit,
                                     org.eclipse.core.runtime.IProgressMonitor monitor)
                              throws IOException
Splits a given input stream into tokens, using the specified tokenizer options.

Parameters:
is - the data input stream to tokenize
rowsLimit - the maximum number of rows to be imported
monitor - the progress monitor for the tokenization operation
Returns:
the tokenized data in an unanalyzed tabular form
Throws:
IOException
See Also:
IImportTokenizer.tokenize(java.io.InputStream, long, org.eclipse.core.runtime.IProgressMonitor)
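
The idea behind fixed-width tokenization can be illustrated with a minimal, self-contained sketch. It is not the CAST implementation: the class FixedWidthSplitSketch and its split method are hypothetical names, and the handling of short lines (clamping cut points to the line length) and the trimming of padding are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the CAST API.
final class FixedWidthSplitSketch {

    private FixedWidthSplitSketch() {
    }

    /**
     * Splits one line at the given ascending cut point indices and trims
     * the padding spaces from each field.
     * Assumption: cut points past the end of the line are clamped, so a
     * short line yields empty trailing fields instead of an exception.
     */
    static List<String> split(String line, List<Integer> cutPoints) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int cut : cutPoints) {
            // Keep the end index inside [start, line.length()].
            int end = Math.min(Math.max(cut, start), line.length());
            fields.add(line.substring(start, end).trim());
            start = end;
        }
        // Everything after the last cut point is the final field.
        fields.add(line.substring(start).trim());
        return fields;
    }
}
```

With cut points 10 and 14, the line "Alice     30  Krakow" is split into the fields "Alice", "30" and "Krakow". A list of n cut points always produces n + 1 fields in this sketch.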

pointListToString

public static String pointListToString(List<Integer> indices)
Converts a list of cut point indices to a string. The resulting string can be used as a tokenizer option value.

Parameters:
indices - the cut point indices
Returns:
the indices in string format
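
The exact string format is not specified on this page. Since the cut points option is described as indices separated by whitespace characters, a conversion of this kind might look like the sketch below. The class CutPointFormatSketch and its join and parse methods are hypothetical stand-ins, and the single-space separator is an assumption, not the documented behaviour of pointListToString.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the CAST API.
final class CutPointFormatSketch {

    private CutPointFormatSketch() {
    }

    /** Joins the indices with a single space (assumed separator). */
    static String join(List<Integer> indices) {
        StringBuilder sb = new StringBuilder();
        for (Integer index : indices) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(index);
        }
        return sb.toString();
    }

    /** Parses a whitespace-separated option value back into indices. */
    static List<Integer> parse(String value) {
        List<Integer> indices = new ArrayList<>();
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return indices;
        }
        for (String part : trimmed.split("\\s+")) {
            indices.add(Integer.parseInt(part));
        }
        return indices;
    }
}
```

In this sketch, joining the indices 5, 12 and 20 gives "5 12 20", and parsing that string returns the original list, so a value can be round-tripped through a tokenizer option.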

getInputFileLines

public List<String> getInputFileLines()

getLineCutPoints

public List<Integer> getLineCutPoints()
Returns the cut point indices.

Returns:
the cut point indices


Copyright © 2007-2009 IISG AGH-UST Krakow, Poland. All Rights Reserved.