| AGTK |
|
A suite of software components for building tools that annotate linguistic signals, i.e. time-series data documenting any kind of linguistic behavior (e.g. audio, video). The internal data structures are based on annotation graphs.
|
| Arithmetic Coding |
|
A Java package for arithmetic coding and PPM (adaptive variable-length n-gram language models for compression).
|
| ComLinToo |
|
A set of Perl tools for computational linguistics (esp. corpus
handling and (permutation) statistics).
|
| Attribute-Logic Engine (ALE) |
|
A freeware logic programming and grammar parsing and generation system
|
| EDG |
|
A Lisp system for developing and displaying HPSG grammars.
|
| Ellogon |
|
An LGPL-licensed, component-based natural language engineering platform written
in C, C++, Java, Tcl, Perl, and Python.
|
| Emdros |
|
A text database engine for analyzed or annotated text.
|
| FreeLing |
|
An open source suite of language analyzers.
|
| GuiTAR |
|
A General Tool for Anaphora Resolution.
|
| Heart of Gold |
|
Middleware for combining shallow and deep NLP components.
|
| Leo |
|
A project to provide an architecture for defining XML specifications of grammars for different natural language parsing systems, along with tools for converting grammars automatically between those systems.
|
| LKB |
|
The LKB system is a grammar and lexicon development environment for use with
constraint-based linguistic formalisms.
|
| Mallet |
|
A Machine Learning for Language Toolkit written in Java
|
| MinorThird |
|
A collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text.
|
| Ngram Statistics Package |
|
Counts Ngrams in text and computes statistical measures over them.
|
| NLTK |
|
A Python package intended to simplify the task of programming natural language systems.
|
| nlpFarm |
|
A collection of NLP libraries, tools and demo applications.
Current focus is mainly on parsing and dialogue systems.
|
| SenseRelate |
|
Implements a word sense disambiguation algorithm using WordNet::Similarity
|
| Tiger API |
|
A library that allows Java programmers to easily access the structure of any corpus given as a TIGER-XML file.
|
| Web as Corpus Toolkit |
|
A collection of programs that can be used to create a (large) text corpus from a list of URLs.
|
| Weka |
|
A collection of machine learning algorithms for data mining tasks.
|
| Weta |
|
The Waikato Environment for Text Analysis
|
| WordNet::Similarity |
|
Provides measures of semantic relatedness using WordNet.
|
If you are working on open source natural language software or wish
to start a project and are interested in joining OpenNLP, read this page.