:[He bumped into the malloc virtual allocation nonsense again.
: I still get mad just thinking about it.]
DITTO! And the excuses we get about it look just like that, EXCUSES!
:Some arguments to the effect that vapour-memory was a good thing were:
: -Lets you use gigantic sparse arrays.
: -Lets vendors ship Fortran binaries with static arrays dimensioned
: to maximum size, and yet have them run on small machines for small
: problems that use only part of the arrays.
:I'm skeptical. Sparse arrays at 4kB/page? As for the Fortran bit, it
:only makes sense on machines dedicated to a single application. That
:sure isn't the way we use ours.
Dedicating a WS to a single (set of) applications IS the typical way that
cad.lab customers would use their machines, and we are heavy Fortran
users, and the malloc()-but-not-really idea STILL stinks. We are
selling INDUSTRIAL-STRENGTH applications that will be used for CRUCIAL
PRODUCTION WORK; it's COMPLETELY UNACCEPTABLE for our customers to lose
data because the application dumps abruptly!!! So our apps are full of
careful checks on every memory allocation.
In particular, we do NOT place data that will grow for large problems
inside static Fortran arrays; they reside, instead, in areas which are
dynamically allocated by an underlying library written in C, and
accessed via functions or subroutines by the Fortran portions. On
machines where malloc() semantics make sense, the C routine will return
an error indicator to the Fortran portion if it's unable to get the
memory requested; in this case, the application communicates to the
interactive user that the requested operation cannot be completed due to
running out of virtual memory, but the app is still alive and the user
can save hir work so far, and restart from there, presumably after
having freed up some memory.
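The C-side allocator described above can be sketched roughly as below. This is a hypothetical illustration, not cad.lab's actual code: the name xalloc_ and the status convention are made up, and the trailing underscore / pass-by-reference calling convention is the one most Fortran 77 compilers of the era used for calling C.

```c
/* Hypothetical sketch of a checked allocator callable from Fortran.
   F77 compilers typically append an underscore to external names and
   pass every argument by reference, hence this signature. */
#include <stdlib.h>

void xalloc_(long *nbytes, void **handle, int *status)
{
    void *p = malloc((size_t)*nbytes);
    if (p == NULL) {
        *handle = NULL;
        *status = -1;   /* tell the Fortran caller we are out of memory */
    } else {
        *handle = p;
        *status = 0;    /* success: Fortran side may proceed */
    }
}
```

The Fortran side then tests the status argument after every call and backs out of the requested operation cleanly when it is nonzero -- which is precisely the discipline an overcommitting malloc() defeats, since malloc() returns success and the process dies later on first touch.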
We've been particularly careful that nothing in the save-to-disk
subsystem NEEDS to allocate extra memory, so that the saving will work
even in crucial memory-low situations; we even had to recode the
output-to-file portions as C subroutines running over low-level
system calls, as we found to our surprise that Fortran I/O, and C stdio,
on some platforms, may need a malloc() to succeed and will die if it
fails (and, yes, our applications ARE and WILL REMAIN extremely
portable).
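A minimal sketch of the kind of raw-syscall output path described above (the name raw_save is invented for illustration): by going straight to open()/write()/close(), the save path never touches stdio's internal buffering, which is what may call malloc() behind your back.

```c
/* Write a buffer to a file using only low-level system calls, so that
   saving still works when malloc() can no longer be trusted.
   Returns 0 on success, -1 on any error; handles short writes. */
#include <fcntl.h>
#include <unistd.h>

int raw_save(const char *path, const char *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = write(fd, buf, len);   /* may write fewer than len */
        if (n < 0) {
            close(fd);
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return close(fd);   /* close() can fail too; report that as well */
}
```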
All this care, of course, is for naught on the IBM R/6000 (thankfully we
don't presently run on DG Aviion, where malloc() is reportedly similarly
broken). And no, we can't just set "limit datasize" appropriately,
because it depends on what the user is doing exactly: sometimes the 3D
modeler will be running alone, other times it will be scheduled together
with the 2D drafter and/or the surface renderer and/or the relational
database and/or the tool which builds programs for numerically
controlled tools and/or... each of these applications is written to be
able to run alone OR communicate with its brethren.
We've tried the tricks IBM suggested to stop our application from dying
in unexpected places, but what happens then is that OTHER processes
die -- and the first to go is typically the X server (a memory hog, I
guess!), so the user cannot communicate with the apps to ask to save...
and NO, we CANNOT just do the saving from the SIGDANGER handler as a
safety net; the handler can be entered from basically anywhere in the
application, including "critical sections" where the data structures
are in transition and inconsistent (and NO, we CANNOT protect the
critical sections by turning off signals there, or we'll die for
lack of SIGDANGER handling).
Yes, I know that a thousand clever tricks spring to mind to work around
one or the other of these problems, but believe me: we must have tried
at least 900 of them and they don't work. We've spent more time and
effort on battling this malloc() idiocy than on any other single porting
problem EVER (and with the huge list of platforms we've supported over
the years we've had quite SOME such problems, believe you me!)!!! Most
porting problems come from bugs in the target system, some from bugs in
our code, but here we're fighting against something BROKEN AS DESIGNED
-- ***HORRIBLY*** BROKEN. I would say it's been half the cost of the
IBM R/6000 port, if it weren't for the fact that the monstrously slow
linker (thankfully remedied in 3.2, but this port was started right at
system announcement...) and the bugs in the early X have driven that
cost way up. Anyway, at the end, we've given up and just document to
our customers how AND WHY their work may go up in smoke on IBM R/6000
and not on DEC, Olivetti, Sun, HP, Sony or other platforms.
If IBM ever gives us a malloc() WHICH WORKS, we'll be glad to use it.
And I hope that periodically rekindled flames about it will do some
good -- if we could get together with everybody who's suffered for
this and blackmail IBM into it the world would become a better place
in at least this small way...
CAD.LAB s.p.a., v. Ronzani 7/29, Casalecchio, Italia Fax: ++39 (51) 6130294