view interface/insert_new_columns_into_books/readme.txt @ 0:b12c99b7c3f0
commit for previous development
author: Zoe Hong <zhong@mpiwg-berlin.mpg.de>
date: Mon, 19 Jan 2015 17:13:49 +0100
insert_new_columns_into_books.php
    Parses the book information from localmonographs.xml and inserts the result into the database.

insert_176_rows_into_books.php
    Inserts and updates the information for 176 books in the database.

localmonographs.xml
    Additional book information.

localmonographs.txt
    The txt version of the additional book information.

local_monographs_176.txt
    The list of 176 books and their information.

get_data_from_sinica.php
    Parses the book information from the Sinica website and writes it to files stored under data_from_sinica/.

parse_data_from_sinica.php
    Groups the duplicated books of source 1 and writes the results to data_from_sinica/merged_books.csv.

analyze_data_from_sinica.php
    Counts the number of books of each source and concatenates all the csv files from 01-71.csv.

data_from_sinica/
    csv files storing book information, encoded in utf8 and in big5 (the original encoding).
    *column_name.csv
        Contains the mapping between column name and source.
    all_data.csv
        Contains all the data concatenated from 01-71.csv.
    merged_books.csv
        Contains the grouped list of duplicated books.
    list_of_local_monographs_from_sinica.xlsx
        Contains the grouping of the duplicated books that are assumed to be the same book; the Excel version of merged_books.csv.
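
The following is a minimal sketch of the kind of processing described for analyze_data_from_sinica.php and parse_data_from_sinica.php: decoding big5 csv data, concatenating the per-source files, counting books per source, and grouping duplicates. It is written in Python for illustration only and is not the code of the PHP scripts. The column name "title", the added "source" column, and the rule that two books are duplicates when their titles match exactly are all assumptions; the readme does not state the actual columns or the matching rule.

```python
import csv
import io
from collections import Counter


def decode_big5_csv(raw):
    """Decode big5-encoded csv bytes into a list of row dicts."""
    text = raw.decode("big5")
    return list(csv.DictReader(io.StringIO(text)))


def concatenate(sources):
    """Concatenate the rows of every source, in source-id order.

    `sources` maps a source id such as "01" to the raw bytes of its csv.
    Each row is tagged with a hypothetical "source" column.
    """
    rows = []
    for source_id in sorted(sources):
        for row in decode_big5_csv(sources[source_id]):
            row["source"] = source_id
            rows.append(row)
    return rows


def count_per_source(rows):
    """Count the number of books of each source."""
    return Counter(row["source"] for row in rows)


def group_duplicates(rows, key="title"):
    """Group rows sharing the same value of `key` (assumed: exact title match).

    Only groups with more than one row are returned.
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[key].strip(), []).append(row)
    return {k: v for k, v in groups.items() if len(v) > 1}


if __name__ == "__main__":
    # Made-up sample data standing in for 01.csv and 02.csv.
    sample = {
        "01": "title\n臺灣府志\n臺灣府志\n".encode("big5"),
        "02": "title\n通志\n".encode("big5"),
    }
    all_rows = concatenate(sample)
    print(count_per_source(all_rows))
    print(list(group_duplicates(all_rows)))
```

To write the concatenated rows back out in utf8, csv.DictWriter can be used on a file opened with encoding="utf-8" and newline="".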