Setup
=====

- Open settings.py and adjust the database settings
- DATABASE_ENGINE can be either "mysql" or "sqlite"
- For sqlite, only DATABASE_HOST is used, and it should begin with a '/'
- All other DATABASE_* settings are required for mysql
- DEBUG mode causes the crawler to output statistics as it runs, along with other debug messages
- LOGGING is a dictConfig dictionary that logs output to the console and to a rotating file; it works out of the box but can be modified
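A settings.py configured for sqlite might look roughly like this. Only DATABASE_ENGINE, DATABASE_HOST, DEBUG, USE_COLORS, and LOGGING are named in this README; the remaining DATABASE_* names are illustrative assumptions, not the project's actual setting names:

```python
# Hypothetical settings.py sketch. Only DATABASE_ENGINE, DATABASE_HOST,
# DEBUG, USE_COLORS, and LOGGING appear in this README; the other
# DATABASE_* names below are assumptions for illustration.

DATABASE_ENGINE = "sqlite"  # "mysql" or "sqlite"

# For sqlite, DATABASE_HOST is the path to the database file and must
# begin with a '/'.
DATABASE_HOST = "/tmp/crawler.db"

# Required when DATABASE_ENGINE is "mysql" (names assumed).
DATABASE_NAME = "crawler"
DATABASE_USER = "crawler"
DATABASE_PASSWORD = "secret"

DEBUG = True        # emit running stats and other debug messages
USE_COLORS = True   # colorize console log output by level
```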


Current State
=============

- The mysql engine is untested
- In some situations the database becomes locked and queries cannot execute; this is presumably an issue only with sqlite's file-based approach
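The lock issue above is typical of concurrent writers against a single sqlite file. A common mitigation (a generic sketch, not this project's code) is to set a busy timeout and enable WAL journaling so queries wait on a lock instead of failing immediately:

```python
import sqlite3

def connect(path):
    """Open a sqlite connection that tolerates concurrent writers.

    A busy timeout makes queries wait instead of raising
    "database is locked", and WAL mode lets readers proceed while a
    writer holds the lock. Generic mitigation sketch only.
    """
    conn = sqlite3.connect(path, timeout=30.0)  # wait up to 30s on a lock
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=30000")   # milliseconds
    return conn
```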


Logging
=======

- DEBUG+ level messages are logged to the console, and INFO+ level messages are logged to a file
- By default, the log file uses a TimedRotatingFileHandler that rolls over at midnight
- Setting DEBUG in the settings toggles whether or not DEBUG-level messages are output at all
- Setting USE_COLORS in the settings toggles whether or not console messages are colored according to their level
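A dictConfig in the shape described above could look like the following. The handler and formatter names and the log filename are illustrative assumptions; only the console/file split, the levels, and the midnight rollover come from this README:

```python
# Minimal dictConfig sketch: DEBUG+ to the console, INFO+ to a file
# that rolls over at midnight. Names and filename are assumptions.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "plain": {"format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "level": "DEBUG",
            "formatter": "plain",
        },
        "file": {
            "class": "logging.handlers.TimedRotatingFileHandler",
            "level": "INFO",
            "formatter": "plain",
            "filename": "crawler.log",  # assumed filename
            "when": "midnight",
        },
    },
    "root": {"level": "DEBUG", "handlers": ["console", "file"]},
}
```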


Misc
====

- Designed to run on multiple machines that work together to collect info in a central DB
- Links are queued in the database to be crawled, so any machine running the crawler against the central DB can pull from the same queue, reducing crawling redundancy
- Thread-pool approach to analyzing keywords in text
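The thread-pool keyword analysis might be sketched with the standard library as follows. The keyword set and the helper names are hypothetical; the README only says keywords in text are analyzed by a pool of threads:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keyword set; the project's actual keywords and counting
# logic are not described in this README.
KEYWORDS = {"python", "crawler", "database"}

def count_keywords(text):
    """Count occurrences of each keyword in one page's text."""
    words = text.lower().split()
    return Counter(w for w in words if w in KEYWORDS)

def analyze_pages(pages, workers=4):
    """Fan page texts out to a thread pool and merge the counts."""
    totals = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for counts in pool.map(count_keywords, pages):
            totals.update(counts)
    return totals
```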