diff interface/insert_new_columns_into_books/readme.txt @ 0:b12c99b7c3f0

commit for previous development
author Zoe Hong <zhong@mpiwg-berlin.mpg.de>
date Mon, 19 Jan 2015 17:13:49 +0100
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/interface/insert_new_columns_into_books/readme.txt	Mon Jan 19 17:13:49 2015 +0100
@@ -0,0 +1,16 @@
+insert_new_columns_into_books.php	parses the book information from localmonographs.xml and inserts the result into the database
+insert_176_rows_into_books.php 		inserts and updates the information for 176 books in the database
+
+localmonographs.xml			additional book information
+localmonographs.txt			plain-text version of the additional book information
+local_monographs_176.txt		the list of 176 books and their information
+
+get_data_from_sinica.php		parses the book information from the Sinica website and writes it to files stored under data_from_sinica/
+parse_data_from_sinica.php		groups the duplicate books of source 1 and writes the results to data_from_sinica/merged_books.csv
+analyze_data_from_sinica.php		counts the number of books of each source and concatenates all the CSV files from 01-71.csv
+data_from_sinica/			CSV files storing book information, encoded in UTF-8 and Big5 (the original encoding)
+					*column_name.csv contains the mapping between column names and sources
+					all_data.csv contains all the data concatenated from 01-71.csv
+					merged_books.csv contains the grouped list of duplicate books
+					list_of_local_monographs_from_sinica.xlsx is the Excel version of merged_books.csv: it groups the duplicate books that are assumed to be the same book
+
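Note on the concatenation step: the readme says analyze_data_from_sinica.php counts the books of each source and concatenates 01-71.csv into all_data.csv, and that the originals are Big5-encoded. The PHP source is not part of this diff, so the following is only a minimal Python sketch of that idea, not the author's implementation. It assumes the files are named 01.csv through 71.csv, that each file corresponds to one source, and that each row is one book; the function name `concatenate_csvs` and its parameters are hypothetical.

```python
import csv
from pathlib import Path


def concatenate_csvs(src_dir, out_path, first=1, last=71, src_encoding="big5"):
    """Copy the rows of NN.csv (first..last) into one UTF-8 CSV file.

    Returns a dict mapping each file name that was read to its row count.
    Assumption: files are named 01.csv, 02.csv, ... and are Big5-encoded.
    """
    counts = {}
    with open(out_path, "w", encoding="utf-8", newline="") as out:
        writer = csv.writer(out)
        for n in range(first, last + 1):
            path = Path(src_dir) / f"{n:02d}.csv"
            if not path.exists():
                continue  # tolerate gaps in the numbering
            # Decode from the source encoding; rows are re-encoded as UTF-8 on write.
            with open(path, encoding=src_encoding, newline="") as f:
                rows = list(csv.reader(f))
            writer.writerows(rows)
            counts[path.name] = len(rows)
    return counts
```

A call such as `concatenate_csvs("data_from_sinica", "data_from_sinica/all_data.csv")` would then produce the combined file and the per-file counts. Header rows, if the real files have any, are copied as ordinary rows here; whether they should be dropped depends on the actual data.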