Description
Lately, Wikipedia has been recognized as a promising lexical semantic resource. If Wikipedia is to be used for large-scale NLP tasks, efficient programmatic access to the knowledge therein is required.
The high-performance Wikipedia API provides structured access to information such as redirects, categories, articles, and link structure. JWPL contains a MediaWiki markup parser that can be used to further analyze the contents of a Wikipedia page. The parser can also be used stand-alone on other texts that use MediaWiki markup. Further, JWPL contains the tool JWPLDataMachine, which creates JWPL dumps from the publicly available dumps at http://download.wikimedia.org.
In addition, JWPL now contains the Wikipedia Revision Toolkit, which consists of two tools, the TimeMachine and the RevisionMachine. The TimeMachine can reconstruct a snapshot of Wikipedia from a specific date, or create multiple snapshots from a time span. The RevisionMachine offers efficient access to the edit history of Wikipedia articles, storing the revisions in a dedicated storage format that reduces the required storage space by 98%.
Is JWPL for you?
JWPL is for you:
Documentation
Support
If you have any technical questions, please write to the JWPL Mailing List.
JWPL and UIMA
Overview Poster
For a first overview of the JWPL components, have a look at the ACL 2011 poster. Its main focus is the Wikipedia Revision Toolkit, but it also contains some information about JWPL Core.