DKPro JWPL - Documentation


Lately, Wikipedia has been recognized as a promising lexical semantic resource. If Wikipedia is to be used for large-scale NLP tasks, efficient programmatic access to the knowledge therein is required.
JWPL (Java Wikipedia Library) provides this access: its high-performance Wikipedia API gives structured access to information such as redirects, categories, articles, and link structure.
JWPL contains a MediaWiki markup parser that can be used to further analyze the contents of a Wikipedia page. The parser can also be used stand-alone on other texts written in MediaWiki markup.
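
To make concrete what kind of structure can be recovered from MediaWiki markup, here is a minimal, self-contained sketch that pulls internal link targets out of a markup string. It is not the JWPL parser and does not use the JWPL API; the class name WikiLinkSketch and the method extractLinkTargets are hypothetical names chosen for this illustration, and the regular expression only handles the simple forms [[Target]] and [[Target|label]].

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustration only: a regex-based stand-in, NOT the JWPL MediaWiki parser.
 * All names in this class are hypothetical and exist only for this sketch.
 */
final class WikiLinkSketch {

    // Matches [[Target]] and [[Target|label]]; group 1 captures the target.
    // Nested brackets (e.g. image captions containing links) are not handled.
    private static final Pattern INTERNAL_LINK =
            Pattern.compile("\\[\\[([^\\]|]+)(?:\\|[^\\]]*)?\\]\\]");

    /** Returns the targets of all simple internal links, in order of appearance. */
    static List<String> extractLinkTargets(String markup) {
        List<String> targets = new ArrayList<>();
        Matcher matcher = INTERNAL_LINK.matcher(markup);
        while (matcher.find()) {
            targets.add(matcher.group(1).trim());
        }
        return targets;
    }

    public static void main(String[] args) {
        String markup = "'''Darmstadt''' is a city in [[Hesse]], "
                + "[[Germany|Federal Republic of Germany]].";
        // Prints the link targets found in the sample markup.
        System.out.println(extractLinkTargets(markup));
    }
}
```

A real parser has to cope with templates, nested links, tables, and many other constructs that a single regular expression cannot handle, which is the reason to use a dedicated parser for anything beyond a quick experiment.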
Further, JWPL contains the tool JWPLDataMachine that can be used to create JWPL dumps from the publicly available dumps at

In addition, JWPL now contains the Wikipedia Revision Toolkit, which consists of two tools: the TimeMachine and the RevisionMachine. The TimeMachine can be used to reconstruct a snapshot of Wikipedia from a specific date, or to create multiple snapshots from a time span. The RevisionMachine offers efficient access to the edit history of Wikipedia articles and stores the revisions in a dedicated storage format that reduces the required storage space by 98%.

Is JWPL for you?

JWPL is for you:
  • If you need structured access to Wikipedia in Java.
JWPL is not for you:
  • If you need to query live data. JWPL works on an optimized database built from a static Wikipedia dump, so queries never reach the live site. This gives much better performance and lightens the load on the Wikipedia servers.



If you have any technical questions, please write to the JWPL Mailing List.


Are you using UIMA?
Then you might be interested in the JWPL integration provided by DKPro Core.

Overview Poster

For a first overview of the JWPL components, have a look at the ACL 2011 poster. Its main focus is the Wikipedia Revision Toolkit, but it also contains some information about JWPL Core.