Description
Lately, Wikipedia has been recognized as a promising lexical semantic resource. If Wikipedia is to be used for large-scale NLP tasks, efficient programmatic access to the knowledge therein is required.
The high-performance Wikipedia API provides structured access to information nuggets like redirects, categories, articles and link structure.
JWPL contains a MediaWiki markup parser that can be used to further analyze the contents of a Wikipedia page. The parser can also be used stand-alone on other texts written in MediaWiki markup.
Further, JWPL contains the tool JWPLDataMachine that can be used to create JWPL dumps from the publicly available dumps at http://download.wikimedia.org.
In addition, JWPL now contains the Wikipedia Revision Toolkit, which consists of two tools, the TimeMachine and the RevisionMachine. The TimeMachine can be used to reconstruct a snapshot of Wikipedia from a specific date, or to create multiple snapshots from a time span. The RevisionMachine offers efficient access to the edit history of Wikipedia articles while storing the revisions in a dedicated storage format that reduces the required storage space by 98%.
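To give an intuition for why a dedicated revision format can save so much space, here is a minimal sketch of delta storage: the first revision is kept in full, and each later revision records only the text that differs from its predecessor. This is not the RevisionMachine's actual format or API. The class `DeltaStoreSketch`, its `Delta` type, and the prefix/suffix encoding are hypothetical and chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: NOT the RevisionMachine storage format.
public class DeltaStoreSketch {

    /** A revision stored as the changed middle part relative to its predecessor. */
    static final class Delta {
        final int prefixLen;   // number of leading chars shared with the predecessor
        final int suffixLen;   // number of trailing chars shared with the predecessor
        final String middle;   // replacement text between shared prefix and suffix

        Delta(int prefixLen, int suffixLen, String middle) {
            this.prefixLen = prefixLen;
            this.suffixLen = suffixLen;
            this.middle = middle;
        }
    }

    /** Encodes next as a delta against prev. */
    static Delta encode(String prev, String next) {
        int max = Math.min(prev.length(), next.length());
        int p = 0;
        while (p < max && prev.charAt(p) == next.charAt(p)) {
            p++;
        }
        int s = 0;
        // The suffix must not overlap the prefix, hence the bound max - p.
        while (s < max - p
                && prev.charAt(prev.length() - 1 - s) == next.charAt(next.length() - 1 - s)) {
            s++;
        }
        return new Delta(p, s, next.substring(p, next.length() - s));
    }

    /** Rebuilds a revision from its predecessor and a delta. */
    static String decode(String prev, Delta d) {
        return prev.substring(0, d.prefixLen)
                + d.middle
                + prev.substring(prev.length() - d.suffixLen);
    }

    /** Replays the first `index` deltas on top of the first revision. */
    static String reconstruct(String first, List<Delta> deltas, int index) {
        String text = first;
        for (int i = 0; i < index; i++) {
            text = decode(text, deltas.get(i));
        }
        return text;
    }

    public static void main(String[] args) {
        String[] revisions = {
            "JWPL is a Java library for Wikipedia.",
            "JWPL is a free Java library for Wikipedia.",
            "JWPL is a free Java library for accessing Wikipedia."
        };
        List<Delta> deltas = new ArrayList<>();
        int fullSize = revisions[0].length();
        int deltaSize = revisions[0].length();
        for (int i = 1; i < revisions.length; i++) {
            Delta d = encode(revisions[i - 1], revisions[i]);
            deltas.add(d);
            fullSize += revisions[i].length();
            deltaSize += d.middle.length();
        }
        System.out.println("chars stored as full texts: " + fullSize);
        System.out.println("chars stored as deltas:     " + deltaSize);
        System.out.println(reconstruct(revisions[0], deltas, 2));
    }
}
```

Because consecutive revisions of an article usually differ by only a few words, the stored deltas stay small even when the article itself is long. The trade-off in this sketch is that reading a late revision requires replaying all earlier deltas.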
Is JWPL for you?
JWPL is for you:
Support
If you have any technical questions, please write to the JWPL Mailing List.
JWPL and UIMA
Overview Poster
For a first overview of the JWPL components, have a look at the ACL 2011 poster. Its main focus is the Wikipedia Revision Toolkit, but it also contains some information about JWPL Core.