Channel: feed2exec:3287a04dfe6d641d5addfee12dc973a0fd0b770a commits

refactor storage classes to force explicit path

Our hacks to forcibly set a class-level path and then reuse it in instances were not working. It seems cleaner to explicitly force callers to provide the path to the file we are trying to manipulate in the object, and then that object only handles that path, explicitly. No more messing around with load-time guessing that doesn't respect the environment, which should make testing easier.

We also use composition rather than inheritance in the FeedManager now. The previous approach was ambiguous: the manager takes care of configuration, cache and other data points, so when we "add" an entry, what do we add? Making the storage objects members instead of parents makes that explicit, at the cost of being a little more verbose, but that should be Pythonic.

This led to all sorts of cleanup: the Feed object doesn't need to be aware of locking or force, and doesn't handle plugins anymore: it only parses. That behavior is moved to the FeedManager dispatch command, which is more logical. This also opens the door to making that parser more pluggable as well, but it makes the standalone "parse" command a little less clean: it doesn't work with an empty config anymore, and indeed, this refactoring makes it impossible to have a FeedManager that isn't backed by a configuration and a database (although you *could* pass a memory-only database to sqlite and /dev/null as a config path...).

The problem this peculiar change may raise is abusive coupling between dispatch and Feed: those two are quite tangled now, because they have been bounced back and forth between the two data structures.

We have *one* convenience shortcut between the Manager and the config storage: the "pattern", because it's actually passed through the constructor, so it seems to me that it makes sense to have it accessible as a property. Unfortunately, from the outside it's still unclear what that pattern applies to in the current API; reading through the code shows it applies only to the config, which seems rather arbitrary.

We also add __repr__ functions here and there to ease debugging, especially during tests. This also allows us to log objects easily.

Finally, note that plugin *execution* is now done serially when running in parallel. This is a result of splitting plugin execution out of the parse function, and it may lead to performance degradation. Then again, it may *eventually* improve performance, since we could execute plugins in parallel with parsing, something we keep for later for now. There could be issues with execution ordering between parsing and plugin execution: hopefully I am reading the code right and order will be retained when inspecting that `results` array in fetch, but I could be mistaken.

The main goal of all those changes is to simplify and clean up the test suite. Now the "db" and "conf" paths are coupled together: one cannot go without the other, and we directly use the FeedManager object in the fixture. The fixture is also created per-test, which might slow things down and hide some bugs, but without this the tests would just fail, and I want to go green before trying to diagnose or clean things up any further.

Strangely, there is a performance impact on single-process runs (about 600ms slower per run in wall time), but a performance *improvement* in multiprocessing (about 300ms faster per run) - I was expecting the opposite. This could be considered within the margin of error, however.
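To make the direction of the refactoring concrete, here is a rough sketch of what "explicit path" and "composition over inheritance" mean in practice. The storage class names and constructor signatures below are illustrative only, not the exact feed2exec API; only FeedManager and the pattern property are named after the objects discussed above::

    # Illustrative sketch only: class names and signatures are simplified,
    # this is not the real feed2exec code.
    import sqlite3


    class ConfStorage(object):
        """Configuration storage bound to one explicit path."""

        def __init__(self, path):
            # no class-level default and no load-time guessing:
            # the caller decides which file this object manipulates
            self.path = path

        def __repr__(self):
            return 'ConfStorage(%r)' % self.path


    class CacheStorage(object):
        """Cache/database storage bound to one explicit path."""

        def __init__(self, path):
            self.path = path
            self.conn = sqlite3.connect(path)

        def __repr__(self):
            return 'CacheStorage(%r)' % self.path


    class FeedManager(object):
        """Composition instead of inheritance: storages are members, not parents."""

        def __init__(self, conf_path, db_path, pattern=None):
            self.conf_storage = ConfStorage(conf_path)
            self.db_storage = CacheStorage(db_path)
            self._pattern = pattern

        @property
        def pattern(self):
            # convenience shortcut; in practice it only applies to the config
            return self._pattern

        def __repr__(self):
            return 'FeedManager(%r, %r)' % (self.conf_storage, self.db_storage)


    # the degenerate case mentioned above: /dev/null config, in-memory database
    manager = FeedManager('/dev/null', ':memory:')
    print(manager)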
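On the ordering question: as long as fetch collects the asynchronous results in submission order (for example by keeping the AsyncResult handles in a list and calling .get() on them in that same order, or by using Pool.map), the `results` array comes back in feed order no matter which worker finishes first. A minimal sketch of that pattern follows; it is not the actual fetch implementation::

    # Sketch only: demonstrates that submission order is preserved,
    # not the real feed2exec fetch() code.
    import multiprocessing


    def parse(feed):
        # stand-in for the Feed parsing step
        return 'entries for %s' % feed


    def fetch(feeds, parallel=False):
        if parallel:
            with multiprocessing.Pool() as pool:
                # workers may finish out of order, but the handles are kept
                # in submission order and .get() in that same order, so
                # `results` lines up with `feeds`
                handles = [pool.apply_async(parse, (feed,)) for feed in feeds]
                results = [handle.get() for handle in handles]
        else:
            results = [parse(feed) for feed in feeds]
        # plugin execution happens here, serially, over the ordered results
        for feed, entries in zip(feeds, results):
            print(feed, '->', entries)


    if __name__ == '__main__':
        fetch(['https://example.com/a.xml', 'https://example.com/b.xml'],
              parallel=True)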
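The test-suite side of this roughly amounts to a per-test fixture that builds a fresh FeedManager from coupled conf and db paths in a temporary directory. A hedged pytest sketch, with a stand-in FeedManager since the real fixture and constructor may differ::

    # Hypothetical pytest fixture sketch; the actual test suite may differ.
    import pytest


    class FeedManager(object):
        # minimal stand-in so this sketch is self-contained
        def __init__(self, conf_path, db_path):
            self.conf_path = conf_path
            self.db_path = db_path

        def __repr__(self):
            return 'FeedManager(%r, %r)' % (self.conf_path, self.db_path)


    @pytest.fixture()
    def feed_manager(tmp_path):
        # conf and db paths are coupled and live in a per-test temporary
        # directory, so every test gets a fresh, isolated FeedManager
        return FeedManager(str(tmp_path / 'feed2exec.ini'),
                           str(tmp_path / 'feed2exec.db'))


    def test_repr(feed_manager):
        # the __repr__ helpers make objects easy to log during tests
        assert 'feed2exec.ini' in repr(feed_manager)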
Single process
==============

Before
------

In [14]: run -t -N10 -m feed2exec -- fetch -n
IPython CPU timings (estimated):
Total runs performed: 10
  Times  :      Total      Per run
  User   :      36.67 s,    3.67 s.
  System :       0.83 s,    0.08 s.
Wall time:     173.24 s.

After
-----

IPython CPU timings (estimated):
Total runs performed: 10
  Times  :      Total      Per run
  User   :      37.30 s,    3.73 s.
  System :       0.90 s,    0.09 s.
Wall time:     179.08 s.

Multi-process
=============

Before
------

In [15]: run -t -N10 -m feed2exec -- fetch -n --parallel
IPython CPU timings (estimated):
Total runs performed: 10
  Times  :      Total      Per run
  User   :       7.58 s,    0.76 s.
  System :       0.80 s,    0.08 s.
Wall time:     153.56 s.

After
-----

IPython CPU timings (estimated):
Total runs performed: 10
  Times  :      Total      Per run
  User   :      12.03 s,    1.20 s.
  System :       0.92 s,    0.09 s.
Wall time:     150.33 s.
