Previously, I discussed how to gain performance by moving to higher-level abstractions rather than dropping to a lower level, as is commonly assumed to be necessary.
Xpressing the non-obvious
One thing that bothered me about the original post was that, while the optimized version using the tokenizer was probably fast enough, it took quite a bit more code to get a 35% increase in performance.
Since then, I found another high-level method that, while not as fast as the split method, offers a reasonable speedup with little or no additional code. That method is Boost.Xpressive.
Xpressive is an interesting library in that it implements a domain-specific language within C++, which allows it to generate highly optimized regular expressions.
As an experiment, I implemented a version of the original parser using Xpressive. That code looks like this:
What's nice about this is that it's short, clear, and reasonably fast. Here's the timing:
This runs about 18% faster than the optimized tokenizer version with approximately the same amount of code. The 'split' version, shown below as a refresher, is still about 25% faster than this one, but the Xpressive version is considerably less code, which might make it the better trade-off.
As a further experiment, I implemented an identical solution using std::regex:
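The std::regex listing is likewise missing. A minimal sketch of a `std::regex` parser, again assuming the input is a line of comma-separated fields (the function name `parse_std_regex` and the input format are illustrative assumptions), might look like this. The main difference from a static Xpressive pattern is that here the pattern is a string compiled at run time:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical parser: splits a line into comma-separated fields.
// Empty fields are skipped because the pattern requires at least one character.
std::vector<std::string> parse_std_regex(const std::string& line)
{
    // Compiled once from a string, on first call.
    static const std::regex field("[^,]+");

    std::vector<std::string> tokens;
    std::sregex_token_iterator it(line.begin(), line.end(), field), end;
    for (; it != end; ++it)
        tokens.push_back(it->str());   // each match is one field
    return tokens;
}
```

Calling `parse_std_regex("a,b,c")` should yield the three fields `a`, `b`, and `c`.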
This is virtually identical to the Xpressive version, but slower; its performance is closer to that of the original tokenizer version. That's good news for Xpressive, and it means I'll probably turn to that library if I need to do any regular expression work in the future.
Are we there yet?
I've updated the token test file with the added parsers available here. At this point, you're probably thinking that this can't be optimized any further without going really raw (like raw pointer raw!). But that isn't the case. Next up is another technique in which high-level libraries free us to change things in ways that allow for higher performance.