Web-as-corpus tools in Java.
* Simple crawler (with Nutch and Heritrix integration)
* HTML cleaner to remove boilerplate
* Language recognition
* Corpus builder
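
To give a rough idea of what the HTML-cleaning step in the list above involves, here is a minimal sketch in plain Java. It is not JavaWAC's actual API: the class name `BoilerplateSketch` and the method `stripMarkup` are hypothetical, and the approach (regex-based removal of scripts, styles, and tags) is only a simplified stand-in. Real boilerplate removal would also need to drop navigation, ads, and similar page furniture.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of JavaWAC.
public class BoilerplateSketch {
    // Matches <script> and <style> elements together with their contents.
    private static final Pattern SCRIPT_OR_STYLE =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Matches any remaining tag.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]+>");
    // Matches runs of whitespace so they can be collapsed to one space.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns the visible text of an HTML string with markup removed.
    public static String stripMarkup(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><style>p{color:red}</style></head>"
            + "<body><p>Hello <b>corpus</b></p>"
            + "<script>var x = 1;</script></body></html>";
        System.out.println(stripMarkup(html)); // prints "Hello corpus"
    }
}
```

Regular expressions are used here only to keep the sketch dependency-free; a production cleaner would normally work on a parsed DOM.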

License

Apache License V2.0

Additional Project Details

Intended Audience

Science/Research

User Interface

Web-based, Non-interactive (Daemon)

Programming Language

Java

Related Categories

Java Search Engines, Java Frameworks, Java Intelligent Agents, Java Information Analysis Software

Registered

2008-04-11