annotate README.md @ 2:6d8b6a689b2b default tip

changed to bs4
author dwinter
date Mon, 15 Oct 2012 15:09:35 +0200
parents 57e2aa489383
rev 0: 57e2aa489383 initial (dwinter)

Setup
=====
- Open settings.py and adjust the database settings.
- DATABASE_ENGINE can be either "mysql" or "sqlite".
- For sqlite, only DATABASE_HOST is used, and its value should begin with a '/'.
- For mysql, all other DATABASE_* settings are required as well.
- DEBUG mode makes the crawler output stats generated as it runs, along with other debug messages.
- LOGGING is a dictConfig dictionary that logs output to the console and to a rotating file. It works out of the box but can be modified.
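As a rough illustration of the settings above, a sqlite configuration might look like the sketch below. Only DATABASE_ENGINE, DATABASE_HOST and DEBUG are names taken from this README; the file path and the `check_database_settings` helper are hypothetical and are not part of the crawler.

```python
# Hypothetical values for settings.py; the path is a placeholder.
DATABASE_ENGINE = "sqlite"         # either "mysql" or "sqlite"
DATABASE_HOST = "/tmp/crawler.db"  # sqlite: the only DATABASE_* setting used
DEBUG = True                       # output crawl stats and debug messages

# The remaining DATABASE_* settings (names not listed in this README)
# would only need to be filled in for mysql.


def check_database_settings(engine, host):
    """Return a list of problems with the two settings described above.

    Illustrative helper only; the crawler may validate settings differently.
    """
    problems = []
    if engine not in ("mysql", "sqlite"):
        problems.append('DATABASE_ENGINE must be "mysql" or "sqlite"')
    if engine == "sqlite" and not host.startswith("/"):
        problems.append("for sqlite, DATABASE_HOST should begin with '/'")
    return problems
```

Calling `check_database_settings(DATABASE_ENGINE, DATABASE_HOST)` before starting a crawl would catch a relative sqlite path early.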

Current State
=============
- The mysql engine is untested.
- In some situations the database is locked and queries cannot execute. This is presumably an issue only with sqlite's file-based approach.
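One common mitigation for sqlite lock errors, which this README does not say the crawler uses, is to combine a connection timeout with a short retry loop. The sketch below assumes Python's standard `sqlite3` module; `execute_with_retry`, the `links` table and the retry parameters are hypothetical.

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), retries=5, delay=0.1):
    """Run one statement, retrying while sqlite reports the database as locked."""
    for attempt in range(retries):
        try:
            with conn:  # commits on success, rolls back on error
                return conn.execute(sql, params).fetchall()
        except sqlite3.OperationalError as exc:
            # Re-raise anything that is not a lock error, or the final failure.
            if "locked" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(delay * (attempt + 1))  # linear backoff between tries


# timeout makes sqlite itself wait up to 30 s for a lock before raising.
conn = sqlite3.connect(":memory:", timeout=30)
execute_with_retry(conn, "CREATE TABLE links (url TEXT)")
execute_with_retry(conn, "INSERT INTO links VALUES (?)", ("http://example.com/",))
rows = execute_with_retry(conn, "SELECT url FROM links")
```

Whether this would resolve the issue noted above is untested; it only shows the general shape of a workaround.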

Logging
=======
- DEBUG+ level messages are logged to the console, and INFO+ level messages are logged to a file.
- By default, the log file uses a TimedRotatingFileHandler that rolls over at midnight.
- Setting DEBUG in the settings toggles whether DEBUG level messages are output at all.
- Setting USE_COLORS in the settings toggles whether console messages are colored according to their level.
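A dictConfig dictionary matching the behaviour described above could look roughly like this. It is a sketch, not the LOGGING value shipped in settings.py: the formatter, the log file location and the `crawler` logger name are assumptions, and USE_COLORS handling is left out.

```python
import logging
import logging.config
import os
import tempfile

DEBUG = True  # mirrors the DEBUG setting described above

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        # Console shows DEBUG and above, unless DEBUG is switched off.
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG" if DEBUG else "INFO",
            "formatter": "plain",
        },
        # File receives INFO and above and rolls over at midnight.
        "file": {
            "class": "logging.handlers.TimedRotatingFileHandler",
            "level": "INFO",
            "formatter": "plain",
            "filename": os.path.join(tempfile.gettempdir(), "crawler.log"),
            "when": "midnight",
            "delay": True,  # do not create the file until the first record
        },
    },
    "root": {"level": "DEBUG", "handlers": ["console", "file"]},
}

logging.config.dictConfig(LOGGING)
log = logging.getLogger("crawler")  # hypothetical logger name
log.debug("goes to the console only")
log.info("goes to the console and the file")
```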

Misc
====
- Designed to run on multiple machines that work together to collect info in a central DB.
- Queues links to be crawled in the database, so any machine running the crawler against the central DB can grab from the same queue. This reduces crawling redundancy.
- Thread pool approach to analyzing keywords in text.