Anthony D. Tribelli <a...@netcom.com> wrote:
> : The Linux render farm was, by definition, 100% Linux.
> :
> : I'm surprised that someone hasn't already pointed this out to you.
> The rendering for Titanic was primarily done on 187 Carrera Alpha's
> mostly running Linux and some running Windows NT.
> For Linux ...
> On Digital Unix ...
> For the Windows NT portion ....
You might want to read the following reply, written by the Digital Domain
employee who installed the render farm. [The article was approved by
Digital Domain management.] After reading it you might find that those NT
boxes were mostly sitting around almost unused, serving files, while the
Linux boxes did the rendering work. 100% of the rendering work (floating
point intensive stuff) was done on Linux boxes. They consider NT an important
(future) platform, but it looks like the Titanic rendering round went, well,
to Linux. (The article is a bit long, but worth reading IMO.)
[Note the posting date, and consider how much FUD is _still_ being
injected ...]
Also note that the whole Linux installation and support effort was apparently
done by a single person; I wonder how many NT people that would have taken. Not
to mention that he also found a bug in the Linux/Alpha kernel and
fixed it on the spot. With NT I fear they would have missed this year's
Oscar nominations ;)
-----Forwarded message from Daryll Strauss <daryll>-----
Message-ID: <19980107185209.60060@jolt>
Date: Wed, 7 Jan 1998 18:52:09 -0800
From: Daryll Strauss <daryll>
To: alph...@listserv.mke.ra.rockwell.com
Subject: Digital Domains use of Linux on Titanic
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.85
Organization: Digital Domain
I felt like I needed to address some of the comments Grant has made
about our Linux Alpha cluster. I'm trying to avoid this becoming a flame
war and instead just concentrate on the facts of the case.
- |Daryll
From: Grant Boucher <grantbouc...@earthlink.net>
Sent: Tuesday, January 06, 1998 5:57 AM
Subject: Re: ALPHANT Digest V1 #431
GB> uh, as the person who recommended, supervised, and implemented DEC Alpha
GB> at Digital Domain, I would like to clear up a few matters....
Grant was a digital artist at Digital Domain. The official decisions
about the purchase of the systems were made by our director of
technology. I did the installation of the cluster and implemented the
Linux portion of the cluster.
GB> first, half of the 160 Alpha render farm was Windows NT 4.0. Only half
GB> was linux.
Originally, half the machines were Linux, until they (the Titanic crew)
found that the NT boxes really weren't as useful. The 105 machines I
quoted in my article were the configuration roughly one third of the way
into the project. 40 machines were converted from NT to Linux.
GB> Unlike the Linux machines, the NT machines and the Digital Unix servers
GB> NEVER crashed, routed IP packets automatically (just hit the check box
GB> under Network config for NT) and basically rang rings around the Linux
GB> machines for ease of use, installation, and reliability. It took
GB> days of kernel recompiles just to get the linux boxes to even
GB> barely work and they NEVER properly routed packets (an NT machine was
GB> configured in 15 minutes when they finally gave up on Linux).
First, the NT boxes did crash. I'm sure the systems administrator for the
NT boxes would attest to that. Unfortunately, they don't report their uptime,
and they were silently rebooted. So there really isn't a measure of how
reliable the NT boxes were. I do think they stayed up longer than the
Linux boxes, for reasons I explain later.
Second, we run a slightly unusual network. I did have trouble with the
FDDI card under Linux. We opted not to use it because of the problems,
but also because we could spare the NT boxes (they weren't being heavily
used) and it was a solution that minimized downtime. We were very busy
and it was the expedient solution. The other problem is that the NT box
did route packets, but not very quickly. The overall performance was not
very good for the speed of the link.
Third, I did describe in my article the troubles we had with that
version of the Linux kernel. They weren't minor, but we did manage to
resolve them relatively quickly. As I mentioned, I believe most of them
would not affect current users.
GB> The Linux farm was unreliable and problematic for weeks when compared
GB> with the NT farm, and this was the SAME hardware, network etc. I am
GB> sorry to disappoint all the Linux fans out there, but in a production
GB> environment, Linux was found to be seriously wanting when compared to
GB> NT. NT was the ONLY operating system during Titanic that did not crash
GB> the servers at all...EVER. Irix on the SGIs and Linux on the Alphas
GB> both crashed DAILY...sometimes more than a few times a day.
I'm not sure where Grant got his numbers about downtime. Perhaps he is
extrapolating from the initial setup. Once the machines were up and
configured they worked very reliably. The machines are still in heavy
use and have an average uptime of around 60 days.
The most common cause of crashes was environmental
conditions. Unfortunately, we under-equipped the air conditioning in the
room, and the air temperature approached 110 degrees in some
places. A few of the processors being used in those areas died
(quite understandably). In one of those places a couple of the Linux
boxes died while the NT boxes stayed alive. That was because
the Linux boxes were being heavily used while the NT boxes sat idle.
The other crash, which was more serious for Linux, was caused by bugs in
the NFS implementation. If the SGI server went down while a Linux box was
actively using it, the NFS implementation on Linux would hang. This was a
serious problem for us that sometimes required resetting the machines.
It was also a fairly infrequent occurrence; I'd estimate once every
couple of weeks. Again, I believe current versions would not have these
problems.
GB> since these were simple Command line renderers, with simple parameters
GB> passed to them, your comment makes no sense whatsoever...again, ONLY the
GB> linux and irix boxes crashed during the production of Titanic...the NT
GB> boxes were the most reliable on the production...period.
The problem with the NT boxes is that they never got a reasonable NFS
implementation. The NFS on the NT Alphas was extremely slow. The lack
of support for symbolic links made using our disk space effectively very
difficult. The limitation of 26 mounted drives was insufficient. We
avoided this problem in the most expedient way possible. We dedicated NT
file servers and moved all the NT data to those file servers, that way
they didn't have to interconnect with the rest of the NFS
environment. They could remain their own isolated NT solution.
> The openness of especially Linux makes everybody can see
> what could be made better, everybody can help with the
> debugging of applications.
GB> huh? you are really reaching here...Linux is a shareware OS and the
GB> decision to risk the biggest film of all time on it was a terrible mistake
GB> in my opinion.
Linux is, of course, a freely available operating system. Having the source
allowed us to fix problems we encountered, which we could not have done
with a standard commercial OS. Of course, we would hope not to have
problems to fix, but frankly that never happens. There are bugs in every
OS, and our environment stresses operating systems.
GB> big mistake...Windows NT is a totally different animal than Windows 95
GB> and Titanic would not have delivered without it. I suggest you take a
GB> closer look at it. Linux is a shareware version of an antiquated OS
GB> from the 1970s...nothing more, nothing less. :}
Well, this is obvious bait, so I won't address it much. I agree that Windows 95
and Windows NT are entirely different animals. Linux is a very modern
operating system, and many of its technologies are very current in
operating system design.
GB> LightWave was the ONLY software running on the NT farm and NT
GB> workstations. The choice of linux for the other farm was merely a
GB> convenience for two programmers (the ones who wrote the article), who
GB> could have easily ported command-line code to NT as well as Linux.
GB> This, and other similar decisions, cost the facility (and actually Fox)
GB> a fortune in time and lost productivity as every time the linux machines
GB> bombed out, dozens of compositors were left in the lurch (every one of
GB> them being paid very high rates per hour mind you). The only problem
GB> exhibited by the LightWave/NT machines came from the render control
GB> software, which we just replaced when it became clear that the control
GB> software was "found wanting". This problem was not the least bit OS
GB> related.
LightWave was used on the NT systems.
The choice of Linux was made for a number of reasons. The primary one
was integration into the rest of our facility. The ease of porting our
applications did come into play. Our distributed rendering system and
compositing system were much easier to get running under Linux than
NT. Since then we have ported those applications to NT, as that makes the
NT systems more productive.
Not having an effective means of distributed rendering on the NT boxes
was a serious problem. That was not the case for the Linux boxes.
GB> In fact, one of my favorite Linux moments was one of the authors of the
GB> article asked the NT sysadmin "how many OS related crashes do you get a
GB> day?" The answer was, of course, "None" because neither of us would
GB> have recommended NT machines on a production like Titanic if they
GB> weren't 100% reliable. Perhaps he was trying to see if the hardware was
GB> to blame. The author, puzzled, decided not to tell us how many times
GB> per day the Linux OS was crashing. :}
As I said before, we definitely had
...