AGTK |
|
A suite of software components for building tools for annotating linguistic signals, i.e. time-series data that document any kind of linguistic behavior (e.g. audio, video). The internal data structures are based on annotation graphs.
|
Arithmetic Coding |
|
A Java package for arithmetic coding and PPM (adaptive variable-length n-gram language models for compression).
|
ComLinToo |
|
A set of Perl tools for computational linguistics (esp. corpus
handling and (permutation) statistics).
|
Attribute-Logic Engine (ALE) |
|
A freeware logic programming and grammar parsing and generation system
|
EDG |
|
A Lisp system for developing and displaying HPSG grammars.
|
Ellogon |
|
An LGPL component-based natural language engineering platform written
in C, C++, Java, Tcl, Perl, and Python
|
Emdros |
|
A text database engine for analyzed or annotated text.
|
FreeLing |
|
An open source suite of language analyzers.
|
GuiTAR |
|
A General Tool for Anaphora Resolution.
|
Heart of Gold |
|
Middleware for combining shallow and deep NLP components.
|
Leo |
|
A project to provide an architecture for defining XML specifications of grammars for different natural language parsing systems, together with tools for converting grammars automatically between those systems.
|
LKB |
|
The LKB system is a grammar and lexicon development environment for use with
constraint-based linguistic formalisms.
|
Mallet |
|
A Machine Learning for Language Toolkit written in Java
|
MinorThird |
|
A collection of Java classes for storing text, annotating text, and learning to extract entities and categorize text.
|
Ngram Statistics Package |
|
Allows for the counting and measuring of Ngrams in text.
|
NLTK |
|
A Python package intended to simplify the task of programming natural language systems.
|
nlpFarm |
|
A collection of NLP libraries, tools and demo applications.
Current focus is mainly on parsing and dialogue systems.
|
SenseRelate |
|
Implements a word sense disambiguation algorithm using WordNet::Similarity
|
Tiger API |
|
A library that gives Java programmers easy access to the structure of any corpus supplied as a TIGER-XML file.
|
Web as Corpus Toolkit |
|
A collection of programs that can be used to create a (large) text corpus from a list of URLs.
|
Weka |
|
A collection of machine learning algorithms for data mining tasks.
|
Weta |
|
The Waikato Environment for Text Analysis
|
WordNet::Similarity |
|
Provides measures of semantic relatedness using WordNet.
|
If you are working on open source natural language software or wish
to start a project and are interested in joining OpenNLP, read this page.