UNIX System Programmer - Back-end datastore optimization - San Francisco
Our client builds an engine that sits on Web servers and monitors
visitor activity, then feeds this information to dynamic content
servers to display personalized pages in real time. Performance is
paramount.
The back-end datastore (built on ObjectStore) has to be extremely
fast in order to track over 10 million hits per day.
This involves writing to ObjectStore's native C++ interfaces. The
datastore is written in both C (for performance) and C++ (to
interface with ObjectStore). Performance is the key issue because they
are optimizing for high-end Web sites. The more performance they can
squeeze out of the datastore, the more hits per day they can support.
The datastore currently supports 10 million hits per day, but they
want to increase this dramatically.
Performance issues include UNIX threading. The datastore has 3
processes, 2 of which are multithreaded: one accepts connections
and does all the communications, and the other does the stores and
updates on the ObjectStore objects. The third watches over the
first two to make sure they stay up.
Experience writing compiler back-ends (code generation, code
optimization, peephole optimization) and doing compiler
optimization (pipelining, instruction scheduling, code analysis,
common subexpression elimination, branch folding, vectorization,
parallelization, global optimization, induction variable analysis,
flow analysis, data flow equations) indicates that a candidate has
faced the same issues that need to be addressed in the datastore's
optimization.
The datastore does a lot of things at the system level for
performance reasons, including making system calls in particular
ways purely for speed. It's not enough for the datastore just to be
functional; it has to be fast.
Experience writing operating systems would be helpful too.
Other preferred experience includes: C++ and/or Java, API design,
UNIX (Solaris preferred), and multi-platform development.
Future products will be CORBA II compliant; the current architecture
lends itself to this. CORBA II compliance will make it possible to
open up the product for use by third parties, so there will also be
opportunities for API design and development.
Later, they will build a distributed datastore that spreads the load
of the recording engine across multiple machines. This includes
inter-machine communication, keeping the machines in sync, and all
the usual problems of distributed databases.