The Procedure of Digitizing Archives

The Institute of History and Philology of the Academia Sinica (IHP) has collected about 310,000 archives of the Qing Dynasty, including edicts, titles, transfers, congratulatory forms, three law division files, verbatim manuscripts, various yellow pages, pamphlets, etc.

Since acquiring these archives in 1929, the IHP has organized them only intermittently because of changing circumstances over the years. Since 2001, the IHP has participated in the Academia Sinica's National Archives Digitization Project and has made the filing of the Grand Secretariat Archive one of its key projects. The primary purpose was to secure more human and material resources, speed up the filing of the archives, reduce damage to them, promote their opening, enhance their value, and at the same time fulfill the mission and responsibility of preserving and maintaining national property.

Organizing archives involves a series of rigorous procedures that integrate scholarship, technology, and experience. Moving from the physical to the digital while fully presenting the value of the archives and the meanings they have acquired across space and time, and while meeting the needs of different users and researchers, is a difficult task. Here, we take the work of the IHP's Ming-Qing Archive Studio on the Grand Secretariat Archive documents as an example and describe the digitization process as it has developed over recent years, in step with scientific and technological developments, as a reference for related archive institutions.

The staff of the Grand Secretariat Archives are divided into an "organization group" and an "abstract group." The organization group is mainly responsible for organizing and repairing documents, while the abstract group writes the catalog abstracts and descriptions.

First Stage: Physical Organization Stage


I.Inspect the original
Examining the original's physical condition is the first step in the collation work, and the main purpose is to determine subsequent processing procedures.

II.Clean the original
Use a soft brush to remove silverfish, insect eggs, mildew, and dust from each original, one item at a time.

III.Match and patch
If the original is relatively complete, it will go through the process of hole-filling and water spraying.

If the original is damaged or incomplete, it needs to be repaired and mounted. After many years and numerous rounds of moving and sorting, an original may be out of order or broken, so it needs to be patched and pieced together to restore its completeness. In patching, large pieces of cotton paper are applied to the back of the small fragments of the original; pages that are folded or have fallen apart are matched and patched. The tools needed for the whole match and patch process include crepe paper, self-made paste, etc.

IV.Mount
Choose the method of mounting according to the physical condition and purpose of the original. Simple mounting is generally the main method.

Mounting is divided into wet-mounting and dry-mounting. Wet-mounting is used when the ink of the original does not smudge: the paste is applied directly to the back of the original. Dry-mounting is used when the original is in fragments or the ink would smudge: the paste is brushed onto the supporting paper, and the original is then pasted onto it. Currently, the Grand Secretariat Archive mainly adopts the dry-mounting method.
The pasting is done traditionally, using a hand-made paste for mounting. If an archive is severely corroded, the old mounting is removed with methylcellulose before the archive is mounted again.
(simmering the paste)

The mounting is usually done by spraying water on the original archive first, and then smearing paste on the supporting paper to hold it under the original. Framed archives must be dried (in the shade) on a particular wooden wall.
(air-drying)

After the archive has dried in the shade, peel it off the special wooden wall, crop it, and fold it according to its original format. Then trim the top and bottom, leaving margins of a fixed proportion.

The tools needed for the whole mounting process are: mulberry paper, cotton paper, rice paper, silk damask, homemade paste, cutter, large wall, bamboo opener, tweezers, ruler, etc.
(mounting tools)

V.Verify data entry of the original archive
Check that the archives are in the correct order and complete, stamp them with the Grand Secretariat Archive seal, and enter the number code. The ink pad used for the seal is specially made to suit the fragility of the archives and is relatively expensive.
(Sealing and numbering)

VI.Put on shelf
After the originals are registered, they are wrapped one by one in acid-free paper. When a sufficient number has accumulated, they are placed in a collection box, sealed with a collection label, put on the shelf, and stored in a warehouse under constant temperature and humidity. The temperature and humidity control conforms to international standards, with the humidity between 50 and 60 percent and the temperature between 18 and 20 degrees Celsius.

(constant temperature and humidity warehouse)
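
The storage ranges stated above can be written as a simple check. The following is only a minimal sketch: the function name and the idea of passing individual sensor readings are assumptions made for illustration, and the humidity figure is read as relative humidity in percent.

```python
# Ranges taken from the text; names and reading format are hypothetical.
TEMPERATURE_RANGE_C = (18.0, 20.0)   # degrees Celsius
HUMIDITY_RANGE = (50.0, 60.0)        # read here as relative humidity, percent


def storage_in_range(temperature_c: float, humidity: float) -> bool:
    """Return True when both readings fall inside the stated ranges (inclusive)."""
    temperature_ok = TEMPERATURE_RANGE_C[0] <= temperature_c <= TEMPERATURE_RANGE_C[1]
    humidity_ok = HUMIDITY_RANGE[0] <= humidity <= HUMIDITY_RANGE[1]
    return temperature_ok and humidity_ok
```

For example, a reading of 19 degrees and 55 percent passes, while 21 degrees or 65 percent does not.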

Ancient thread-bound books are stored lying flat rather than standing upright, and the same is true of folded documents, which are better suited to lying flat. Each document is therefore wrapped in acid-free paper and stored in a carton in the order of the document number. Depending on the thickness of the files, they are packed 50 or 30 pieces to a carton. The serial number is printed on the outside of the carton for easy access. Packing also avoids the rearrangement problems caused by earthquakes.
(folded documents laying flat in acid-free cartons)
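
The packing rule above (cartons of 50 or 30 pieces, in document-number order) might be sketched as follows. The text does not say how thickness is judged, so the `thick` flag is a hypothetical input supplied by the person packing, and the function name is likewise an assumption.

```python
def pack_into_cartons(document_numbers: list[int], thick: bool) -> list[list[int]]:
    """Group documents, in number order, into cartons of 30 (thick) or 50 (thin)."""
    capacity = 30 if thick else 50          # pack sizes stated in the text
    ordered = sorted(document_numbers)      # cartons follow document-number order
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]
```

With 120 thin documents this gives three cartons (50, 50, and 20 pieces); with 120 thick documents it gives four cartons of 30.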

Second Stage: Digitization Stage (divided into three aspects that proceed simultaneously)

First aspect: Image processing

I.Original archive scanning, digital photography
First, formulate the specifications of the digital image according to the purpose of digitizing the archive. Then choose the image-making method, either scanning or digital photography, according to the physical characteristics of the original. Scanned image files are made at 300dpi for archives at the "Grand Secretariat Archives level." For documents at the "public information level," the system directly downgrades and exports them to 150dpi or 72dpi. Generally, folded documents are digitized by scanning, and digital photography is used for large-size archives or scrolls.

(original archive scanning, digital photography)
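
To give a sense of what downgrading from 300dpi to 150dpi or 72dpi means in pixels, here is a minimal arithmetic sketch. It is not the system's actual export routine; the function name is an assumption, and the 2480 x 3508 pixel example is simply an approximately A4-sized scan at 300dpi.

```python
def downgraded_size(width_px: int, height_px: int,
                    source_dpi: int = 300, target_dpi: int = 150) -> tuple[int, int]:
    """Scale pixel dimensions so the same physical size is shown at target_dpi."""
    new_width = round(width_px * target_dpi / source_dpi)
    new_height = round(height_px * target_dpi / source_dpi)
    return new_width, new_height


# An approximately A4 scan at 300dpi (example figures, not from the archive):
print(downgraded_size(2480, 3508, 300, 150))  # (1240, 1754)
print(downgraded_size(2480, 3508, 300, 72))   # (595, 842)
```

The physical size of the page is unchanged; only the number of pixels used to represent it is reduced.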

During scanning, the color format is either grayscale or full color, depending on which gives a clear, readable image. The rule is as follows: scan in full color if the original has mildew, dark spots or mildew spots, or water stains, has been soaked in water, or has a seal across the seam of a fold that affects the text content; otherwise, scan in grayscale. If proofreading shows that the scanned images are not good enough, the standard is raised.
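
The color-format rule can be expressed as a small decision function. This sketch reads each listed condition as an independent trigger for full color, which is an interpretation of the text; the class, field, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OriginalCondition:
    """Observed condition of an original; all field names are illustrative."""
    has_mildew: bool = False
    has_dark_or_mildew_spots: bool = False
    has_water_stains: bool = False
    was_soaked_in_water: bool = False
    seam_seal_affects_text: bool = False


def choose_color_mode(condition: OriginalCondition) -> str:
    """Return "full_color" if any listed condition holds, else "grayscale"."""
    needs_color = any([
        condition.has_mildew,
        condition.has_dark_or_mildew_spots,
        condition.has_water_stains,
        condition.was_soaked_in_water,
        condition.seam_seal_affects_text,
    ])
    return "full_color" if needs_color else "grayscale"
```

An original with no listed damage is scanned in grayscale; one with water stains, for instance, is scanned in full color.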

Scanning is counted by the page, with every two unfolded pages counted as one page. Because folded pieces differ in thickness and in number of pages, the piece is not a suitable unit of calculation. Moreover, the scanning platform is usually A4 in size, which makes the page the more suitable unit. After the system combines the documents, they can be viewed page by page.
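
The counting rule (every two unfolded pages count as one page) amounts to a simple calculation. The text does not say how an odd leftover page is handled; this sketch assumes it is counted as one more page, and the function name is hypothetical.

```python
def scanned_page_count(unfolded_pages: int) -> int:
    """Count scanning units: two unfolded pages per page, rounding up (assumed)."""
    if unfolded_pages < 0:
        raise ValueError("unfolded_pages must not be negative")
    return (unfolded_pages + 1) // 2
```

A folded piece with 10 unfolded pages is counted as 5 pages; under the rounding-up assumption, one with 7 is counted as 4.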

Digital photographs are taken with a Kodak DCS420 camera. After shooting, the camera is connected directly to the computer for image editing and processing. The digital image files are set at 300dpi, real pixel, to reproduce the original appearance, so that the photographed image is as close as possible to the original. Images pieced together from several shots should join as seamlessly as possible, and the handwriting and watermarks on the archive should likewise stay consistent with the original. In addition, to make the digital images sharper and clearer, a magnet is attached to the document being photographed, and the part covered by the magnet is then cropped out with image editing software. For scanned documents, the system combines the files automatically, with no need to connect the images manually.

II.Digital image proofreading and revision
Check the correctness, completeness, and clarity of the images, mainly for missing pages, duplicate pages, and missing characters, and revise them if necessary.

(image proofing)
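
Part of this check, finding missing and duplicate pages, lends itself to automation. The sketch below assumes that each image carries a page number and that the expected number of pages is known; both assumptions, and the function name, are illustrative and not a description of the Archive's system. Clarity and missing characters still need to be checked by eye.

```python
from collections import Counter


def check_page_sequence(page_numbers: list[int],
                        expected_count: int) -> tuple[list[int], list[int]]:
    """Return (missing pages, duplicated pages) for pages numbered 1..expected_count."""
    counts = Counter(page_numbers)
    missing = [n for n in range(1, expected_count + 1) if n not in counts]
    duplicates = sorted(n for n, seen in counts.items() if seen > 1)
    return missing, duplicates


print(check_page_sequence([1, 2, 2, 4, 5], expected_count=5))  # ([3], [2])
```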


III.Post-production of digital images
Post-production includes image connection (automatic document combination for the scanning method and manual image connection for the digital photography method), file combination, backup storage, downgrading conversion of the scanned archive, watermark embedding, and other technical steps that turn the digital image file into an image that can actually be provided to users.

Second aspect: Establishing directory

I.Recording archive directory
Documents are assigned and recorded piece by piece according to the “Ming and Qing Archives Recording Rules (Draft),” and an abstract is written for each. The directory must record:

1.“Information of the archive content,” such as the person who wrote the archive, the official title, and time.

2.“Information on the status of the archive carrier,” such as whether the archive is complete or incomplete, decorated or mounted or not, preservation status, etc.

3.“Information on archive management,” such as the author of the abstract and the future use of the archive.

In addition, a 60-word abstract is required as part of each archive's record. The original Grand Secretariat Archives documents already carry an abstract recorded at the time, called a “posting,” which is imported in full. In practice, because abstracts vary in complexity and take time to interpret, the collation team may first write the abstract by hand and then enter it in detail in the manufacturer's documentation system as needed.
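
The three groups of directory information, together with the abstract and the posting, could be modeled as a record like the one below. This is a hypothetical sketch for illustration only: the class and field names are assumptions, it covers only the examples the text mentions, and it does not reproduce the fields defined in the Recording Rules.

```python
from dataclasses import dataclass


@dataclass
class ContentInfo:
    """Information of the archive content."""
    author: str
    official_title: str
    date: str


@dataclass
class CarrierInfo:
    """Information on the status of the archive carrier."""
    is_complete: bool
    is_mounted: bool
    preservation_status: str


@dataclass
class ManagementInfo:
    """Information on archive management."""
    abstract_author: str
    intended_use: str


@dataclass
class CatalogRecord:
    document_number: str
    content: ContentInfo
    carrier: CarrierInfo
    management: ManagementInfo
    abstract: str
    posting: str = ""  # original abstract, imported in full when present
```

Grouping the fields this way keeps the description of the text, the description of the physical object, and the management data separate, which mirrors the three categories listed above.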

(catalog proofing)


II.Catalog proofing
After the abstract is recorded online, a proof report is printed out for proofreading and correction. Since all the colleagues in the Grand Secretariat Archive are highly experienced in related work and rigorous in their work, we adopt the method of self-proofing.


Third aspect: Value-adding analysis (this part will be implemented by the abstract team)

I.Authority File Recording
1.Decide the nature of the authority file: names of people and places.
2.Extract the names of people and places from the archives.
3.Select textual research and reference materials (provided by researchers), follow the textual sequence, and find and record the source and location of the data.
4.Verify, compare, and record the data according to “The Name and Authority File Recording of the Grand Secretariat Archives Description Rules (draft)” and “The Location Name and Authority File Recording of the Grand Secretariat Archive Description Rules (preliminary draft)” (the rules were developed by researchers in related fields at our institute).

II.Authority File Proofing
Proofing is divided into self-correction and mutual correction. Corrections are made either on a proofreading report printed by the system or through online proofreading.


Third Stage: Application Stage

I.System Connection
Link image files, catalog files, and authority files to perform system function operations, provide retrieval, and conduct authority control during the process. The catalog can now be searched and read online; access to the images requires an application and is paid.

II.Open Access
Develop rules for reading the files, and allow users to search, browse catalogs, and read full-text images under system-controlled access.


Text editing: The Grand Secretariat Archive, Institute of History and Philology, Academia Sinica
Text compilation: The Grand Secretariat Archive, Institute of History and Philology, Academia Sinica
National Science and Technology Project - Content Development Sub-project
Picture provided by: The Grand Secretariat Archive, Institute of History and Philology, Academia Sinica
National Science and Technology Project - Content Development Sub-project