What is Rebel Archive?
Rebel Archive is a non-profit project with two goals: to serve as a platform for disseminating "leftist" historical documents, and to provide a search tool for researchers.
The digital archive contains documents from very diverse currents of thought, as well as unaffiliated documents that come from no specific group. The terms "leftist" and "left" are used broadly here to keep the archive from becoming an apologist for any specific tendency of the left, even though the project's sole contributor (for the moment) does have a well-defined political position. That is why the archive houses documents from Stalinist groups, Trotskyist groups, the communist left, and anarchists, to name a few.
Some of the documents in the archive would not strictly be classified as political documents; however, they were produced in periods that might interest historians of political or social processes.
This variety of historical documents from leftist tendencies, together with other more general documents, is meant to support critical studies of problems that could interest historians and the general public. To that end, the archive provides the means to search the documents by content and, eventually, by tendency, label, and other forms of grouping.
How do the searches by content work?
The effectiveness of content searches depends on the quality of the images and the state of preservation of the physical documents. A sizable number of the documents currently available in the archive were obtained from external sites, so their preservation is the responsibility of a third party. The only documents digitized directly by Rebel Archive are "El Trabajador" and "Posición Revolucionaria". This was made possible by a homemade newspaper digitizer, assembled out of necessity to complete a historical research project that was ongoing at the time. Some images of the homemade digitizing device can be found at the end of this section.
The optimal conditions for an adequate search process are an appropriate digitization (with good resolution) and an acceptable state of the physical documents. With these two conditions met, the search engine can retrieve satisfactory results.
The general process is divided into three stages:
- Applying automatic filters to enhance the image
- Applying Optical Character Recognition (OCR) software
Once these steps are done, the resulting text is indexed and linked to specific pages as searchable documents. All the software used is open source, so the upkeep costs are concentrated in the data storage infrastructure and the running of the web server.
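As a rough illustration of the indexing step described above, the following is a minimal sketch of how OCR text could be linked to pages and searched by content. It is not the archive's actual implementation, which is not described here: the function names (tokenize, build_index, search), the page identifiers, and the sample text are all hypothetical, and the OCR output is assumed to be already available as plain text.

```python
import re
from collections import defaultdict

# Hypothetical page identifier: (document title, page number).
PageId = tuple[str, int]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(ocr_pages: dict[PageId, str]) -> dict[str, set[PageId]]:
    """Build an inverted index mapping each word to the pages containing it."""
    index: dict[str, set[PageId]] = defaultdict(set)
    for page_id, text in ocr_pages.items():
        for word in tokenize(text):
            index[word].add(page_id)
    return dict(index)


def search(index: dict[str, set[PageId]], query: str) -> set[PageId]:
    """Return the pages that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the pages matching the first word, then intersect.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Invented sample OCR output; titles and text are placeholders,
# not taken from the archive.
ocr_pages = {
    ("Sample Newspaper A", 1): "La huelga general comenzó en la fábrica",
    ("Sample Newspaper A", 2): "Reunión del sindicato sobre la huelga",
    ("Sample Newspaper B", 1): "Declaración sobre la situación nacional",
}

index = build_index(ocr_pages)
hits = search(index, "huelga sindicato")
print(sorted(hits))
```

A sketch like this also shows why OCR quality matters: a word that the OCR step misreads never enters the index under its correct spelling, so a search for it cannot find that page.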
It should be noted that not all documents have content that can be indexed correctly, because of the factors mentioned previously (poor digitization quality or poor physical condition of the documents). To mitigate these deficiencies, two new modules are currently in development: (1) a transcription module that lets users contribute their own transcriptions to the archive online, and (2) a labeling module that lets users add labels to the documents and thus improve the search results for other users of the system.
The main obstacle to implementing these new modules is the lack of financial and human resources, which pushes their eventual release into the medium or long term.
Some pictures of the digitizing device follow: