file systems. For file systems such as ZFS and WAFL (write anywhere file layout), if that data were not written in a temporally sequential manner, it would not necessarily be sequential on disk. Do you find that you run into those kinds of problems, or does the data tend to be written temporally sequentially as well?

AW It’s always written temporally sequentially.

BC So that doesn’t become an issue?

AW I don’t think so. I probably would have heard about it.

BC Yes, that’s probably a safe bet because the performance would be terrible.

AW But it’s funny; I think all databases are like this. We’re basically keeping every transaction, so that’s all sequential. What happens at the end of the day, because of the way people query it, is that we actually sort the entire day by instrument and then write it out sequentially to disk. That operation happens in memory, and then it goes to disk, so it’s actually sorted by security and then time. During the day, however, it’s sorted by time.

BC That’s a large sort. How long does it take?

AW You could be sorting a billion rows. That takes a couple of minutes.

BC The single CPU pipes are approaching their limits. In terms of that sort taking a couple of minutes, that’s 100 percent compute time. Do you use single or multiple cores when you do it?

AW Single core. The data volumes are getting much bigger, and, of course, the core speed is not improving, so our customers have to split the symbol groups.

BC Then you’ve got to segment your data flow somehow to reflect the fact that single-core performance is not improving.

AW Yes, and we’re right at that limit now, because with a single core we can do about a million updates a second.

BC What about making K or Q implicitly parallel, where you’re parallelizing under the hood? Is that a possibility?

AW Maybe. I’ve done parallel programming since ’75, and K is a parallel language. How ironic: this must be the most parallel language there is. The most prominent operator is each, which is parallel. There are no control structures. The primitives themselves are parallel.

BC Is that something you’re thinking about doing? Will that parallel each actually consume multiple cores?

AW Yes, but that doesn’t solve the sorting problem, and it really doesn’t solve the realtime problem, because in realtime if I get an IBM quote, it’s one record. I might want to check it against everything else. Certainly, if I’ve got one-eighth of the symbols operating entirely on their own, then that’s very easy to parallelize; but if your strategy involves all of the symbols all the time, that would be very difficult to run in parallel.
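As a rough illustration of two ideas from this exchange, the end-of-day re-sort from time order to instrument-then-time order and the splitting of symbols into independent groups, here is a minimal Python sketch. It is not how kdb+ does either step; the record layout `(time, symbol, price)`, the function names, and the hash-based grouping are all assumptions made up for the example.

```python
from collections import defaultdict
from operator import itemgetter
from zlib import crc32

# Hypothetical tick records (time, symbol, price), appended in arrival order,
# so the intraday log is already sorted by time.
ticks = [
    (1, "IBM", 100.0),
    (2, "MSFT", 20.0),
    (3, "IBM", 100.5),
    (4, "AAPL", 90.0),
    (5, "MSFT", 20.1),
]


def end_of_day_sort(rows):
    """Reorder time-ordered rows by symbol, then time.

    Python's sort is stable, so sorting on the symbol alone keeps each
    symbol's rows in their original (time) order.
    """
    return sorted(rows, key=itemgetter(1))


def split_by_symbol_group(rows, n_groups):
    """Assign every symbol to exactly one of n_groups shards.

    A deterministic hash (crc32) is an assumed grouping rule; each shard
    could then be handled by its own single-core process.
    """
    groups = defaultdict(list)
    for row in rows:
        shard = crc32(row[1].encode()) % n_groups
        groups[shard].append(row)
    return dict(groups)


by_symbol = end_of_day_sort(ticks)
shards = split_by_symbol_group(ticks, 2)
```

The grouping only helps when a strategy touches one group's symbols at a time; a calculation that compares each incoming quote against every other symbol would still have to cross shard boundaries, which is the difficulty AW describes.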

BC What’s the solution?

AW I think we just won’t be able to do those kinds of algorithms.

 

BC You have this four-year itch to write a new programming language, so you’re coming due. Are the constraints on the problem any different? What’s the new language going to look like?

AW It will probably be 95 percent the same. It’s the same semantics: noun, verb, adverb; same data types, same functions. But I like to try different things under the covers. For example, I like to try different memory allocation schemes. It’s all call by value but reference count, which is kind of amazing when you think about it, so there’s no garbage collect. Everything is reference counted; when it’s free, you know immediately so you get good reuse. Under the covers, I play with different things. For example, if you’re doing a vector operation and the reference count is one, well, then reuse the vector. I also always try to make the code smaller.

BC Are you actually redoing the implementation, or are there going to be semantic differences as well?

AW The implementation is 100 percent new. I write everything from scratch, so the C code is entirely different but the semantics are about 95 percent the same.

BC You start over in terms of your C code? You take all that and throw it out?

AW Yes, completely.

BC What does it feel like to part with all that code that’s so lovingly created?

AW I love starting from scratch, and it’s stupid because doing the parser, tokenizer, and printer takes me months.

BC Do you find that you can come up with a better solution?

AW I think they’re getting a little bit better, but I think I’m converging.

BC Is that advice you would give to practitioners: to throw out more?

AW Yes, but in business it’s hard to do that.

BC Especially when it’s working!

AW But I love throwing it all out.
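The memory trick AW mentions in this exchange, reusing a vector's storage when its reference count is one, can be sketched in a few lines. This is a toy model under stated assumptions, not K's C implementation: the `Vec` class, the hand-maintained `refs` counter, and the `retain`, `release`, and `add_scalar` helpers are all hypothetical names invented for the example.

```python
class Vec:
    """Toy reference-counted vector; `refs` is tracked by hand for illustration."""

    def __init__(self, data):
        self.data = list(data)
        self.refs = 1


def retain(v):
    """Record one more holder of v."""
    v.refs += 1
    return v


def release(v):
    """Drop one holder; at zero the storage is known to be free immediately."""
    v.refs -= 1
    if v.refs == 0:
        v.data = None


def add_scalar(v, k):
    """Return v + k, consuming the caller's reference to v.

    If that was the only reference, nobody else can observe the buffer,
    so it is overwritten in place and handed back. Otherwise a fresh
    vector is allocated and the shared one is left untouched, which
    preserves call-by-value semantics.
    """
    if v.refs == 1:
        for i in range(len(v.data)):
            v.data[i] += k
        return v
    out = Vec(x + k for x in v.data)
    release(v)
    return out


# Sole owner: the result is the same object, no allocation.
a = Vec([1, 2, 3])
b = add_scalar(a, 10)

# Shared: the alias still sees the original values.
c = Vec([1, 2, 3])
alias = retain(c)
d = add_scalar(c, 10)
```

In the first call the input buffer is recycled as the result; in the second the alias forces a copy, so the value semantics hold without a garbage collector.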

 

LOVE IT, HATE IT? LET US KNOW

feedback@queue.acm.org

© 2009 ACM 1542-7730/09/0200 $5.00
