pl.edu.agh.cast.importer.base.util
Class AbstractEncodingRecognizer

java.lang.Object
  extended by pl.edu.agh.cast.importer.base.util.AbstractEncodingRecognizer

public abstract class AbstractEncodingRecognizer
extends Object

Recognizes the encoding, and hence the character set, used in the specified files.

Author:
AGH CAST Team

Field Summary
static Set<String> avalibleCharsets
          The available charsets.
static Charset CP1250
          Set of characters for CP1250 encoding.
static Charset ISO88592
          Set of characters for ISO-8859-2 encoding.
static Charset UTF8
          Set of characters for UTF-8 encoding.
 
Constructor Summary
AbstractEncodingRecognizer()
           
 
Method Summary
static Charset getFileCharset(String filePath, IImportTokenizer tokenizer, org.eclipse.core.runtime.IProgressMonitor monitor)
          Imports the given file into a list of raw tabular data and determines its encoding, that is, the set of characters used within it.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

avalibleCharsets

public static Set<String> avalibleCharsets
The available charsets.


CP1250

public static final Charset CP1250
Set of characters for CP1250 encoding.


ISO88592

public static final Charset ISO88592
Set of characters for ISO-8859-2 encoding.


UTF8

public static final Charset UTF8
Set of characters for UTF-8 encoding.
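
The three constants are typed as java.nio.charset.Charset. As a minimal sketch of why telling these encodings apart matters, the example below decodes the same byte under ISO-8859-2 and CP1250 and gets two different characters. The class name CharsetDecodeDemo, the helper decode, and the charset names passed to Charset.forName are assumptions made for illustration; they are not taken from the CAST sources.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of pl.edu.agh.cast.importer.base.util.
final class CharsetDecodeDemo {
    // Assumed equivalents of the documented constants. ISO-8859-2 and
    // windows-1250 ship with standard JDKs but are not guaranteed by the
    // Java specification on every platform.
    static final Charset UTF8 = StandardCharsets.UTF_8;
    static final Charset ISO88592 = Charset.forName("ISO-8859-2");
    static final Charset CP1250 = Charset.forName("windows-1250");

    // Decodes raw bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] single = {(byte) 0xB1};
        // 0xB1 is U+0105 (a with ogonek) in ISO-8859-2 ...
        System.out.printf("U+%04X%n", (int) decode(single, ISO88592).charAt(0));
        // ... but U+00B1 (plus-minus sign) in CP1250.
        System.out.printf("U+%04X%n", (int) decode(single, CP1250).charAt(0));
        // In UTF-8 the same letter U+0105 takes two bytes: 0xC4 0x85.
        byte[] utf8 = {(byte) 0xC4, (byte) 0x85};
        System.out.printf("U+%04X%n", (int) decode(utf8, UTF8).charAt(0));
    }
}
```

Decoding with the wrong charset raises no error for the single-byte encodings; it silently yields different text, which is why the encoding has to be recognized before import.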

Constructor Detail

AbstractEncodingRecognizer

public AbstractEncodingRecognizer()

Method Detail

getFileCharset

public static Charset getFileCharset(String filePath,
                                     IImportTokenizer tokenizer,
                                     org.eclipse.core.runtime.IProgressMonitor monitor)
                              throws IOException
Imports the given file into a list of raw tabular data and determines its encoding, that is, the set of characters used within it.

Parameters:
filePath - the path of the file to be tokenized
tokenizer - the import tokenizer
monitor - the progress monitor for the tokenization operation
Returns:
the charset recognized for the given file
Throws:
IOException
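
This page does not describe how the encoding is recognized. Purely as an illustration, one common heuristic for choosing among UTF-8, CP1250 and ISO-8859-2 can be sketched as follows: accept UTF-8 if the bytes decode strictly, otherwise pick CP1250 when any byte falls in the 0x80-0x9F range (control codes in ISO-8859-2, mostly printable characters in CP1250), and fall back to ISO-8859-2. The class CharsetGuessSketch and its methods are hypothetical and are not the implementation behind getFileCharset; the tokenizer and progress monitor are left out entirely.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; NOT the algorithm used by AbstractEncodingRecognizer.
final class CharsetGuessSketch {
    // Assumed equivalents of the documented constants.
    static final Charset UTF8 = StandardCharsets.UTF_8;
    static final Charset ISO88592 = Charset.forName("ISO-8859-2");
    static final Charset CP1250 = Charset.forName("windows-1250");

    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] data) {
        try {
            UTF8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Heuristic guess among the three charsets; it can be wrong for
    // texts that use none of the distinguishing bytes.
    static Charset guessCharset(byte[] data) {
        if (isValidUtf8(data)) {
            return UTF8; // pure ASCII also lands here
        }
        for (byte b : data) {
            int unsigned = b & 0xFF;
            if (unsigned >= 0x80 && unsigned <= 0x9F) {
                return CP1250; // this range holds control codes in ISO-8859-2
            }
        }
        return ISO88592;
    }

    public static void main(String[] args) {
        byte[] cp1250Text = "\u015bwit".getBytes(CP1250);     // s-acute is 0x9C in CP1250
        byte[] isoText = "\u015bwit".getBytes(ISO88592);      // s-acute is 0xB6 in ISO-8859-2
        byte[] utf8Text = "\u015bwit".getBytes(UTF8);
        System.out.println(guessCharset(cp1250Text).name());
        System.out.println(guessCharset(isoText).name());
        System.out.println(guessCharset(utf8Text).name());
    }
}
```

To apply such a guess to a file, the bytes could be obtained with java.nio.file.Files.readAllBytes, which throws IOException when the file cannot be read.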


Copyright © 2007-2009 IISG AGH-UST Krakow, Poland. All Rights Reserved.