RegexSegmenter (DKPro Core 1.9.0 API)

java.lang.Object
- org.apache.uima.analysis_component.AnalysisComponent_ImplBase
  - org.apache.uima.analysis_component.Annotator_ImplBase
    - org.apache.uima.analysis_component.JCasAnnotator_ImplBase
      - org.apache.uima.fit.component.JCasAnnotator_ImplBase
        - de.tudarmstadt.ukp.dkpro.core.api.segmentation.SegmenterBase
          - de.tudarmstadt.ukp.dkpro.core.tokit.RegexSegmenter

All Implemented Interfaces:

org.apache.uima.analysis_component.AnalysisComponent
```
public class RegexSegmenter
extends SegmenterBase
```
This segmenter splits sentences and tokens based on regular expressions that define the sentence and token boundaries.
The default behaviour is to split sentences by a line break and tokens by whitespace.
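
The default behaviour can be approximated with plain `java.util.regex`, which may help when checking what a pattern will do before configuring the component. This is only a sketch under stated assumptions: the class `DefaultSplitSketch` and the method `segment` are hypothetical names, the two patterns are the documented defaults, and the code does not call `RegexSegmenter` or create UIMA annotations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class DefaultSplitSketch {
    // Documented defaults: one sentence per line, tokens separated by whitespace.
    static final Pattern SENTENCE_BOUNDARY = Pattern.compile("\n");
    static final Pattern TOKEN_BOUNDARY = Pattern.compile("[\\s\n]+");

    // Hypothetical helper: returns one token list per non-empty sentence.
    static List<List<String>> segment(String text) {
        List<List<String>> sentences = new ArrayList<>();
        for (String line : SENTENCE_BOUNDARY.split(text)) {
            List<String> tokens = new ArrayList<>();
            for (String token : TOKEN_BOUNDARY.split(line)) {
                // split() can yield an empty leading element; skip it.
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
            if (!tokens.isEmpty()) {
                sentences.add(tokens);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        // prints [[The, cat, sat.], [It, purred, loudly.]]
        System.out.println(segment("The cat sat.\nIt purred loudly.\n"));
    }
}
```

Note that punctuation stays attached to the preceding word, because only whitespace counts as a token boundary by default.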

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`PARAM_SENTENCE_BOUNDARY_REGEX` Defines the sentence boundary.
`static String`	`PARAM_TOKEN_BOUNDARY_REGEX` Defines the pattern that is used as token end boundary.

Fields inherited from class de.tudarmstadt.ukp.dkpro.core.api.segmentation.SegmenterBase
PARAM_LANGUAGE, PARAM_STRICT_ZONING, PARAM_WRITE_FORM, PARAM_WRITE_SENTENCE, PARAM_WRITE_TOKEN, PARAM_ZONE_TYPES

Constructor Summary

Constructors
Constructor and Description

`RegexSegmenter()`

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`void`	`initialize(org.apache.uima.UimaContext context)`
`protected void`	`process(org.apache.uima.jcas.JCas aJCas, String text, int zoneBegin)`

Methods inherited from class de.tudarmstadt.ukp.dkpro.core.api.segmentation.SegmenterBase
createSentence, createToken, createToken, createToken, getLanguage, getLocale, getZoneTypes, isEmpty, isStrictZoning, isWriteSentence, isWriteToken, limit, process, trim, trimChar

Methods inherited from class org.apache.uima.fit.component.JCasAnnotator_ImplBase
getLogger

Methods inherited from class org.apache.uima.analysis_component.JCasAnnotator_ImplBase
getRequiredCasInterface, process

Methods inherited from class org.apache.uima.analysis_component.Annotator_ImplBase
getCasInstancesRequired, hasNext, next

Methods inherited from class org.apache.uima.analysis_component.AnalysisComponent_ImplBase
batchProcessComplete, collectionProcessComplete, destroy, getContext, getResultSpecification, reconfigure, setResultSpecification

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - PARAM_TOKEN_BOUNDARY_REGEX
```
public static final String PARAM_TOKEN_BOUNDARY_REGEX
```
    Defines the pattern that is used as token end boundary. Default: `[\s\n]+` (matching whitespace and linebreaks).
    When setting a custom pattern, take into account that the final token is often terminated by a linebreak rather than by the boundary character. The newline therefore typically has to be added to the group of matching characters; for example, "tokenized-text" is correctly tokenized with the pattern `[-\n]`.
    
    See Also:
    
    Constant Field Values
  - PARAM_SENTENCE_BOUNDARY_REGEX
```
public static final String PARAM_SENTENCE_BOUNDARY_REGEX
```
    Defines the sentence boundary. Default: `\n` (assume one sentence per line).
    
    See Also:
    
    Constant Field Values
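
The note on custom token patterns can be illustrated with a small standalone example. It is a sketch that assumes a token is emitted only when a boundary match closes it, which is how the note on `PARAM_TOKEN_BOUNDARY_REGEX` reads; the actual implementation may differ in detail. The class `TokenBoundarySketch` and the method `tokens` are hypothetical and not part of DKPro Core.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenBoundarySketch {
    // Hypothetical helper: emits the text before each boundary match as a token.
    static List<String> tokens(String text, String boundaryRegex) {
        List<String> result = new ArrayList<>();
        Matcher boundary = Pattern.compile(boundaryRegex).matcher(text);
        int start = 0;
        while (boundary.find()) {
            // Skip empty spans produced by adjacent boundary matches.
            if (boundary.start() > start) {
                result.add(text.substring(start, boundary.start()));
            }
            start = boundary.end();
        }
        // Assumption: text after the last boundary match is not emitted.
        return result;
    }

    public static void main(String[] args) {
        String text = "tokenized-text\n";
        // Hyphen only: the final token has no closing boundary and is lost.
        System.out.println(tokens(text, "-"));     // prints [tokenized]
        // Hyphen or newline: the trailing linebreak closes the final token.
        System.out.println(tokens(text, "[-\n]")); // prints [tokenized, text]
    }
}
```

Under this assumption, any custom boundary pattern that omits the linebreak risks dropping the last token of each line.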
- Constructor Detail
  - RegexSegmenter
```
public RegexSegmenter()
```
- Method Detail
  - initialize
```
public void initialize(org.apache.uima.UimaContext context)
                throws org.apache.uima.resource.ResourceInitializationException
```
    Specified by:
    
    initialize in interface org.apache.uima.analysis_component.AnalysisComponent
    
    Overrides:
    
    initialize in class org.apache.uima.fit.component.JCasAnnotator_ImplBase
    
    Throws:
    
    org.apache.uima.resource.ResourceInitializationException
  - process
```
protected void process(org.apache.uima.jcas.JCas aJCas,
                       String text,
                       int zoneBegin)
                throws org.apache.uima.analysis_engine.AnalysisEngineProcessException
```
    Specified by:
    
    process in class SegmenterBase
    
    Throws:
    
    org.apache.uima.analysis_engine.AnalysisEngineProcessException
