Class RemoveMinorityScriptsTextFilter

java.lang.Object
com.optimaize.langdetect.text.RemoveMinorityScriptsTextFilter
All Implemented Interfaces:
TextFilter

public class RemoveMinorityScriptsTextFilter extends Object implements TextFilter
Removes text written in scripts other than the dominant script of the text. TODO: there is no special handling for Japanese (3 scripts) and Korean (2 scripts) yet; their scripts should be counted together and kept.
Author:
Fabian Kessler
  • Method Details

    • forThreshold

      public static RemoveMinorityScriptsTextFilter forThreshold(double threshold)
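      If a script makes up less than this fraction of the content relative to the most used script, its text is removed. Example: Latin 10%, Cyrillic 80%, Common 10% (punctuation and the like). Latin's 10% is compared with Cyrillic's 80%, giving a ratio of 10/80 = 0.125; with the suggested threshold of 0.3, the Latin text is removed.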
      Parameters:
      threshold - a value between 0 and 1; the suggested value is 0.3. A script whose ratio is smaller than the threshold is removed; a ratio equal to the threshold is kept.
    • filter

      public String filter(CharSequence text)
      Specified by:
      filter in interface TextFilter
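
The threshold rule described for forThreshold can be illustrated with a small standalone sketch. This is not the library's implementation: the class name MinorityScriptSketch and its methods are hypothetical, and the choice to ignore the COMMON and INHERITED scripts when counting (so that punctuation and spaces are always kept) is an assumption made for this example. Like the class documented above, the sketch does not group the Japanese or Korean scripts together.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the documented rule; not the library's code.
final class MinorityScriptSketch {

    // Counts code points per Unicode script.
    // Assumption: COMMON and INHERITED are not counted, so they are never removed.
    static Map<UnicodeScript, Integer> countByScript(CharSequence text) {
        Map<UnicodeScript, Integer> counts = new EnumMap<>(UnicodeScript.class);
        text.codePoints().forEach(cp -> {
            UnicodeScript script = UnicodeScript.of(cp);
            if (script != UnicodeScript.COMMON && script != UnicodeScript.INHERITED) {
                counts.merge(script, 1, Integer::sum);
            }
        });
        return counts;
    }

    // Removes every script whose count, relative to the most used script,
    // is strictly smaller than the threshold. An equal ratio is kept.
    static String filter(CharSequence text, double threshold) {
        Map<UnicodeScript, Integer> counts = countByScript(text);
        int most = 0;
        for (int count : counts.values()) {
            most = Math.max(most, count);
        }
        Set<UnicodeScript> toRemove = EnumSet.noneOf(UnicodeScript.class);
        for (Map.Entry<UnicodeScript, Integer> entry : counts.entrySet()) {
            double ratio = (double) entry.getValue() / most;
            if (ratio < threshold) {
                toRemove.add(entry.getKey());
            }
        }
        StringBuilder kept = new StringBuilder(text.length());
        text.codePoints()
            .filter(cp -> !toRemove.contains(UnicodeScript.of(cp)))
            .forEach(kept::appendCodePoint);
        return kept.toString();
    }
}
```

Under these assumptions, MinorityScriptSketch.filter("Привет мир, hi", 0.3) drops "hi": Latin has 2 letters against 9 Cyrillic ones, and 2/9 is below 0.3. The comma and spaces survive because they belong to the Common script.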