The Cl@rity Program (Diavgeia in Greek) is a successful initiative by the Greek state to publish every Greek government decision on the web. Since October 1st, 2010, all Ministries have been obliged to upload their decisions to the Internet through the «Cl@rity» program.
The problem with Cl@rity is that its search facilities are simply not good enough: the latest version of the web app relies on Google Custom Search Engine technology for its search interface.
The aim of the SuperCl@rity project (http://yperdiavgeia.gr) is to provide a full-text search engine for all documents published through Cl@rity. To achieve this goal, SuperCl@rity retrieves, analyses and indexes all published PDF documents through the excellent Cl@rity APIs. The result is a modern search engine that provides fast and accurate results, enabling people to find what they need.
SuperCl@rity is implemented using state-of-the-art tools and technologies such as:
- OCR of image-only PDF documents using tesseract-ocr,
- a full-text search engine built on inverted indexes, based on Sphinx Search,
- nginx,
- PHP 5,
- jQuery + jQuery UI,
- Compass style,
- Debian Linux.
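To give a feel for what an inverted index does, here is a minimal sketch in Python. It is not how Sphinx or SuperCl@rity is implemented; the function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for illustration, and a real engine adds ranking, stemming and on-disk storage on top of this idea.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Start from the first token's postings and intersect with the rest.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


# Hypothetical sample decisions; real input would be text extracted from PDFs.
documents = {
    "doc-1": "Ministry of Finance budget approval decision",
    "doc-2": "Ministry of Education budget for school repairs",
    "doc-3": "Appointment of a new hospital director",
}
index = build_index(documents)
print(search(index, "ministry budget"))  # prints ['doc-1', 'doc-2']
```

The index is built once, so a query only has to look up its own words and intersect the matching document sets instead of scanning every document, which is what makes this kind of search fast.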
SuperCl@rity began in August 2011 as a personal research project by Vangelis Banos and has been under continuous development ever since.