Hash clash in polymorphic variants

Post by Jon Harrop
ISTR advice that constructors sharing the first few characters should be
avoided in order to reduce the likelihood of clashing hash values for
polymorphic variants. Is that right?

I don't think it's worth worrying about.

I wrote a program a while ago to look into this. I never saw any
"human-sensible" collisions (between two identifiers that a person
might have chosen). And if you're producing gensyms in a program, you
can just check ahead of time.

To find a collision with a given identifier, consider each bignum N
that differs by a multiple of 2^31 from the identifier's hash value.
Compute the radix-223 representation of N. If that forms a legal
OCaml identifier, then you've found a collision.

For example, Eric_Cooper collides with azdwbie, c7diagq, hlChrkt,
NSaServ, and SaupDOF, to pick just a few.
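
A minimal sketch of the search described above, in OCaml. It assumes the tag hash is the radix-223 polynomial of the character codes reduced to 31 bits, which agrees with the value 3505894 reported for `Foo later in this thread. The names hash31, hash_variant, decode and collisions are illustrative, and native ints stand in for bignums, so on a 64-bit system only candidates of up to 7 characters are reachable.

```ocaml
(* Radix-223 polynomial over the character codes, kept to 31 bits.
   Assumption: this mirrors the compiler's tag hash. Masking at every
   step avoids relying on native-int overflow (64-bit ints assumed). *)
let mask = (1 lsl 31) - 1

let hash31 s =
  let h = ref 0 in
  String.iter (fun c -> h := (223 * !h + Char.code c) land mask) s;
  !h

(* The tag's integer value: the 31-bit hash read as a signed number. *)
let hash_variant s =
  let h = hash31 s in
  if h > 0x3FFFFFFF then h - (1 lsl 31) else h

let is_letter c = match c with 'a'..'z' | 'A'..'Z' -> true | _ -> false

let is_ident_char c =
  match c with
  | 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '\'' -> true
  | _ -> false

(* Write n in radix 223 and read the digits as characters. Return the
   string only if it looks like a tag name (conservatively: a letter
   followed by identifier characters). *)
let decode n =
  let rec digits n acc =
    if n = 0 then acc else digits (n / 223) (Char.chr (n mod 223) :: acc)
  in
  match digits n [] with
  | first :: _ as cs when is_letter first && List.for_all is_ident_char cs ->
      Some (String.concat "" (List.map (String.make 1) cs))
  | _ -> None

(* Candidates are h, h + 2^31, h + 2 * 2^31, ...; keep the legal ones. *)
let collisions ~limit name =
  let h = hash31 name in
  let rec go k acc =
    if k > limit then List.rev acc
    else
      match decode (h + k * (1 lsl 31)) with
      | Some s when s <> name -> go (k + 1) (s :: acc)
      | _ -> go (k + 1) acc
  in
  go 0 []

let () =
  (* The two hashes printed here should be equal if the names collide. *)
  Printf.printf "%d %d\n" (hash_variant "Eric_Cooper") (hash_variant "azdwbie");
  List.iter print_endline (collisions ~limit:100_000 "Eric_Cooper")
```

Raising the limit extends the search to longer names; reaching the 7-character collisions listed above would need a few million candidates.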

--
Eric (call me SaupDOF) Cooper e c c @ c m u . e d u

_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs

Jon Harrop

2008-01-10 21:24:26 UTC

Post by Eric Cooper

I'm interested in automatically translating the GL_* enum from OpenGL into
polymorphic variants. So although it is generated code I have little control
over it, e.g. I cannot change the translation as OpenGL gets extended because
code will already be using the existing names.

Still, maybe I'm over-reacting. ;-)
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


David Allsopp

2008-01-10 21:40:38 UTC

Post by Eric Cooper

Post by Jon Harrop
ISTR advice that constructors sharing the first few characters should
be avoided in order to reduce the likelihood of clashing hash values
for polymorphic variants. Is that right?

I presume you're worried about the bindings clashing internally rather than
someone who uses the library happening to use a variant that clashes?

You can do something about it - when you're generating your bindings, you
can use the hash_variant() C function to detect the collisions yourself. If
you detect one, you can either issue *your own* warning while generating the
bindings allowing you to specify specific renaming for the program
generating your bindings or you could append digits to the names until the
collisions disappear (which is likely, though not guaranteed, to happen
quickly).
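
The renaming approach could be sketched in OCaml roughly as follows, for a binding generator that is itself written in OCaml. The hash is re-implemented here under the assumption that it is the radix-223 polynomial reduced to 31 bits (a real generator could call the runtime's C function instead); resolve is a hypothetical helper name, and the input names are assumed to be distinct.

```ocaml
(* Assumed re-implementation of the tag hash (64-bit ints assumed). *)
let hash_variant s =
  let mask = (1 lsl 31) - 1 in
  let h = ref 0 in
  String.iter (fun c -> h := (223 * !h + Char.code c) land mask) s;
  if !h > 0x3FFFFFFF then !h - (1 lsl 31) else !h

(* Returns the names in their original order, with a numeric suffix
   appended to any name whose hash is already taken, together with the
   list of (original, replacement) pairs so that a warning can be issued. *)
let resolve names =
  let taken = Hashtbl.create 64 in
  let step (out, renamed) name =
    let rec pick candidate i =
      if Hashtbl.mem taken (hash_variant candidate)
      then pick (name ^ string_of_int i) (i + 1)
      else candidate
    in
    let chosen = pick name 0 in
    Hashtbl.add taken (hash_variant chosen) chosen;
    (chosen :: out,
     if chosen = name then renamed else (name, chosen) :: renamed)
  in
  let out, renamed = List.fold_left step ([], []) names in
  (List.rev out, List.rev renamed)

let () =
  let _, renamed = resolve ["Eric_Cooper"; "azdwbie"; "Foo"] in
  List.iter
    (fun (old_name, new_name) ->
      Printf.eprintf "warning: %s renamed to %s (hash clash)\n" old_name new_name)
    renamed
```

Because earlier names win, the result depends on the order in which names are fed in; names added later are the ones that get renamed.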

It's slightly ugly, but then the possibility of collisions in the first
place is IMHO ugly too!

David


Kuba Ober

2008-01-11 13:30:29 UTC

Post by David Allsopp

Post by Eric Cooper

Post by Jon Harrop
ISTR advice that constructors sharing the first few characters should
be avoided in order to reduce the likelihood of clashing hash values
for polymorphic variants. Is that right?

I don't think it's worth worrying about.

I'm interested in automatically translating the GL_* enum from OpenGL into
polymorphic variants. So although it is generated code I have little

Are those collisions of any real importance? I mean, do they break anything?
If all they do is imply linearly searching a list of a few elements, for the
colliding entry, then it's a non-issue?

Cheers, Kuba


Jon Harrop

2008-01-11 13:48:15 UTC

Post by Kuba Ober
Are those collisions of any real importance? I mean, do they break
anything? If all they do is imply linearly searching a list of a few
elements, for the colliding entry, then it's a non-issue?

It would prevent code from compiling so it would be a complete show-stopper.

In this case, there is a chance that a hash clash in names that I have no
control over would break my OpenGL bindings at some point in the future.

A theoretical solution would be to grow the bindings and avoid clashes in
identifiers included in later versions of OpenGL by adding random suffixes.
Although this works in theory, in practice it places the burden of a linear
search on the programmer who must then sift through the bindings to find out
if the identifier they want to use happens to have had an internal clash in
my bindings and, therefore, would require them to use a different identifier.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


Kuba Ober

2008-01-11 16:14:03 UTC

It would prevent code from compiling so it would be a complete
show-stopper.

So what you're saying is that the implementation uses the hash with bucket
size of 1? That's kinda poor decision, methinks.

Maybe perfect hashes should be used, computed at link time (and at runtime
whenever a module is linked in). The perfect hashing function could probably
implement some sort of a table, so that no real code would need to be
generated, just recomputing of decision tree table. Gperf code could be
adapted for that. The benefit is that there would be no collisions, the hashed
data structure would be very compact, and the cost to regenerate the hash is
amortized. Ideally, one would generate the actual perfect hashing function,
but this is currently only possible in bytecode, right? I mean, toplevel won't
run in native code? Or am I mistaken?

Kuba


David Allsopp

2008-01-11 18:40:53 UTC

It would prevent code from compiling so it would be a complete
show-stopper.

So what you're saying is that the implementation uses the hash with bucket
size of 1? That's kinda poor decision, methinks.

I think you're missing the context - there's no hash table. See 18.3.6 in
the manual - the hashed values (and resulting collisions) are to do with the
internal representation of polymorphic variants.

The compiler cannot process code that uses two polymorphic variants whose
tag names will have the same internal representation (and therefore be
incorrectly viewed as having the same value). The test is probably performed
somewhere in the type checker...

An alternative implementation might have been to look up the tags (in a
perfect hash table) using a system similar to caml_named_value but I imagine
that the present method was preferred because it's simpler (and quite
possibly faster) and collisions are rare (as Eric pointed out) - although in
Jon's case the lack of a guarantee is unfortunate.

Incidentally, and off-the-subject here, using a hash table with a bucket
size of 1 is very important if you need performance guarantees on your hash
table and have some other way of coping with collisions.

David


Kuba Ober

2008-01-14 12:20:01 UTC

Post by David Allsopp

It would prevent code from compiling so it would be a complete
show-stopper.

So what you're saying is that the implementation uses the hash with bucket
size of 1? That's kinda poor decision, methinks.

Yeah, I sort of put the wagon ahead of the horse. Of course the hashing
function doesn't imply a hash table.

What I meant was simply that instead of using some fixed hash function, one
could use a perfect hashing function which is optimal for its known set of
inputs, and won't ever generate a collision.

The tables that such a function uses to hash its input have to be generated at
link-time, which means run-time too.

Cheers, Kuba


Stefan Monnier

2008-01-14 14:44:58 UTC

Post by Kuba Ober
What I meant was simply that instead of using some fixed hash function, one
could use a perfect hashing function which is optimal for its known set of
inputs, and won't ever generate a collision.

The problem is that the set of inputs is not known at compile time, only
at link time.

Stefan


Kuba Ober

2008-01-14 14:56:25 UTC

Post by Kuba Ober
What I meant was simply that instead of using some fixed hash function,
one could use a perfect hashing function which is optimal for its known
set of inputs, and won't ever generate a collision.

The problem is that the set of inputs is not known at compile time, only
at link time.

As I've said in the cited post, the perfect hash generator would have to be
invoked at link time, which shouldn't be a big deal.

Cheers, Kuba


David Allsopp

2008-01-14 15:37:50 UTC

Post by Kuba Ober
What I meant was simply that instead of using some fixed hash
function, one could use a perfect hashing function which is optimal
for its known set of inputs, and won't ever generate a collision.

The problem is that the set of inputs is not known at compile time, only
at link time.

As I've said in the cited post, the perfect hash generator would have to
be invoked at link time, which shouldn't be a big deal.

Assuming you're talking hypothetically and designing a new runtime then,
yes, it's not a big deal.

However, this scheme could not just be dropped into the present system - it
would not work with dynamic linking because once you've hashed a polymorphic
variant tag-name you drop the name so you can't re-hash when you update your
perfect hashing function... unless you can devise a perfect hashing scheme
that hashes all the old keys to their old values and new ones to
non-clashing new values ;o)

Internally, `Foo is indistinguishable from the int 3505894* - so if
caml_hash_variant("Foo") suddenly changes value mid-program then any
previous instances of `Foo in memory cease to be equal to it!

David

* Try:
# (Obj.magic `Foo : int);;
- : int = 3505894
# (Obj.magic 3505894) = `Foo;;
- : bool = true

I don't know whether caml_hash_variant varies between versions or even
platforms, so the actual number may be different on other systems.


Kuba Ober

2008-01-14 15:44:05 UTC

Post by David Allsopp

Post by Kuba Ober
What I meant was simply that instead of using some fixed hash
function, one could use a perfect hashing function which is optimal
for its known set of inputs, and won't ever generate a collision.

The problem is that the set of inputs is not known at compile time, only
at link time.

As I've said in the cited post, the perfect hash generator would have to
be invoked at link time, which shouldn't be a big deal.

A trivial solution to that is to keep both, as obviously each time an
equivalent of dlopen() is made, everything has to be rehashed. gperf
is "slightly" memory-hungry, so surely it'd need to be something using a
different algorithm. I'm talking hypothetically, but I also think it's a
weird design decision to use those possibly-colliding hashes. String
sorting/comparison isn't exactly a CPU killer, so couldn't the original names
have been used instead? I admit not to knowing too many details of the
current implementation of course ;(

Cheers, Kuba


David Allsopp

2008-01-14 16:03:35 UTC

Post by Kuba Ober
A trivial solution to that is to keep both, as obviously each time an
equivalent of dlopen() is made, everything has to be rehashed. gperf
is "slightly" memory-hungry, so surely it'd need to be something using a
different algorithm. I'm talking hypothetically, but I also think it's a
weird design decision to use those possibly-colliding hashes.

I agree that it's a bit weird - but the clashes are very rare (and the
function was designed to keep them rare for "normal" usage).

Post by Kuba Ober
String sorting/comparison isn't exactly a CPU killer, so couldn't the
original names have been used instead?

String comparison is much slower than integer comparison... we're talking
about one CPU instruction compared to a for loop! Jon would never use them
again :o) Not to mention the storage overhead of keeping the tag names in
memory - not great if you've got long lists of `YetAnotherTag.

David


Stefan Monnier

2008-01-14 15:45:18 UTC

Post by Kuba Ober
What I meant was simply that instead of using some fixed hash function,
one could use a perfect hashing function which is optimal for its known
set of inputs, and won't ever generate a collision.

The problem is that the set of inputs is not known at compile time, only
at link time.

As I've said in the cited post, the perfect hash generator would have to be
invoked at link time, which shouldn't be a big deal.

That would require postponing the execution of the hash-function to
link-time or run-time. Run-time is clearly undesirable, and link-time
adds yet-more complexity to the linker.

It's not a bad idea, obviously, but AFAICT the linker currently is kept
very simple.

Stefan


Jacques Garrigue

2008-01-15 03:36:21 UTC

Post by Kuba Ober
What I meant was simply that instead of using some fixed hash function,
one could use a perfect hashing function which is optimal for its known
set of inputs, and won't ever generate a collision.

The problem is that the set of inputs is not known at compile time, only
at link time.

As I've said in the cited post, the perfect hash generator would have to be
invoked at link time, which shouldn't be a big deal.

Unfortunately, this would make marshalling between different programs
much more complicated...

Another advantage of knowing the hash function at compile time is
that you can generate efficient code for pattern matching. Since you
already know the ordering of tags, it is easy to generate a decision
tree. I didn't check very recently about efficiency for polymorphic
variants, but the depth of the decision tree is logarithmic in the
number of tags involved in the pattern matching, and if you can keep
it below 3 or 4 (about 10 tags) you can be actually faster than a
jump table.
Another comparison is with the old implementation for method calls.
Originally ocaml used your idea for methods: method hashes were
generated at initialization time. The scheme for dispatch was a two
level array, compressed by reusing buckets so that you don't use too
much memory. This meant actually 3 array accesses for a method call.
The current scheme reuses variant hashes, and implements a simple
dichotomic search, together with an index cache for each call site.
This doesn't look very efficient, but on small method tables, the
search is almost as fast as the old approach, and if the cache hits
this is much faster...
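
To make the dispatch scheme concrete, here is a rough OCaml sketch of a dichotomic (binary) search over a table sorted by tag hash, with a per-call-site index cache. This is not the runtime's actual code; the table layout and the names call_site, find_index and lookup are assumptions made for illustration.

```ocaml
(* Hypothetical method table: (tag hash, implementation) pairs,
   sorted by increasing hash. *)
let find_index (table : (int * 'a) array) (key : int) : int option =
  let rec go lo hi =
    if lo > hi then None
    else
      let mid = lo + (hi - lo) / 2 in
      let k = fst table.(mid) in
      if k = key then Some mid
      else if k < key then go (mid + 1) hi
      else go lo (mid - 1)
  in
  go 0 (Array.length table - 1)

(* Each call site remembers the index of its last successful lookup. *)
type call_site = { mutable cached : int }

let lookup site table key =
  let i = site.cached in
  if i >= 0 && i < Array.length table && fst table.(i) = key then
    (* Cache hit: one comparison, no search. *)
    Some (snd table.(i))
  else
    match find_index table key with
    | Some j ->
        site.cached <- j;
        Some (snd table.(j))
    | None -> None
```

On a cache hit the lookup costs a single comparison; on a miss the search takes a number of steps logarithmic in the table size.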

Now concerning the risks of name conflicts. The main point of
polymorphic variants is that there is only a conflict if the two tags
appear in the same type. And logically the type should stay small.
If you want to put all GLenum's inside the same type, then you may
well end up with conflicts. But what LablGL shows is that in practice
only a small number of tags are used together. So if you can partition
your set of tags so that each type has at most 64 tags, then you get
a conflict probability of less than 1 per million for each type. This
seems safe enough. But if you have one type with 2000 tags, then the
probability is 1 per thousand. Not that much, but it can happen.
(p(n) is n*n / 2**32)
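
The figures above can be checked numerically. The OCaml sketch below compares the estimate n*n / 2**32 with the birthday probability for n tags, under the assumption that hashes are spread uniformly over 2**31 values; approx and exact are illustrative names.

```ocaml
(* The estimate from the text: about n^2/2 pairs over 2^31 values. *)
let approx n = float_of_int (n * n) /. (2.0 ** 32.0)

(* Birthday probability for n tags drawn uniformly from 2^31 values:
   1 - prod_{i<n} (1 - i/2^31), computed via logarithms for accuracy. *)
let exact n =
  let m = 2.0 ** 31.0 in
  let s = ref 0.0 in
  for i = 1 to n - 1 do
    s := !s +. log1p (-. float_of_int i /. m)
  done;
  -. expm1 !s

let () =
  List.iter
    (fun n ->
      Printf.printf "n = %4d  approx = %.3g  exact = %.3g\n"
        n (approx n) (exact n))
    [64; 256; 2000]
```

For the sizes discussed here the two values agree closely, so the simple formula is adequate for sizing a type.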

Jacques Garrigue


Jon Harrop

2008-01-15 04:59:03 UTC

Post by Jacques Garrigue
Unfortunately, this would make marshalling between different programs
much more complicated...

Do people marshal polymorphic variants between different programs?

Post by Jacques Garrigue
Another advantage of knowing the hash function at compile time is
that you can generate efficient code for pattern matching. Since you
already know the ordering of tags, it is easy to generate a decision
tree. I didn't check very recently about efficiency for polymorphic
variants, but the depth of the decision tree is logarithmic in the
number of tags involved in the pattern matching, and if you can keep
it below 3 or 4 (about 10 tags) you can be actually faster than a
jump table.

For 3-16 tags on AMD64, jump tables (ordinary variants) are 2x slower than
decision trees (polymorphic variants) when branches are taken at random.
However, jump tables are consistently up to 2x faster when a single branch is
taken repeatedly. So caching jump tables is more effective at run-time
optimizing pattern matches over ordinary variants than branch prediction is
at optimizing decision trees for pattern matches over polymorphic variants.

So the advantage of a decision tree is probably insignificant on real code
because it will lie between these two extremes.

Post by Jacques Garrigue
Now concerning the risks of name conflicts. The main point of
polymorphic variants is that there is only a conflict if the two tags
appear in the same type. And logically the type should stay small.
If you want to put all GLenum's inside the same type, then you may
well end up with conflicts. But what LablGL shows is that in practice
only a small number of tags are used together.

Can LablGL's design support OpenGL extensions?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


Jacques Garrigue

2008-01-15 09:01:42 UTC

Post by Jacques Garrigue
Unfortunately, this would make marshalling between different programs
much more complicated...

Do people marshal polymorphic variants between different programs?

Do people marshal data between different programs (or different
versions of the same program)?

Post by Jon Harrop
For 3-16 tags on AMD64, jump tables (ordinary variants) are 2x slower than
decision trees (polymorphic variants) when branches are taken at random.
However, jump tables are consistently up to 2x faster when a single branch is
taken repeatedly. So caching jump tables is more effective at run-time
optimizing pattern matches over ordinary variants than branch prediction is
at optimizing decision trees for pattern matches over polymorphic variants.
So the advantage of a decision tree is probably insignificant on real code
because it will lie between these two extremes.

Since the goal was never to be faster than ordinary variants, but just
obtain comparable speed, this seems good :-)

Can LablGL's design support OpenGL extensions?

I'm not sure what this means.
Since LablGL was coded by hand, adding extensions would mean modifying
it.
One might want to add a way to detect whether an extension is
available or not, but making it static does not seem a good idea: one
wouldn't even be able to compile code using an extension that is not
available.
Also, one might want to make code generation automatic, particularly
for C wrappers, to allow adding cases to functions easily. This should
be doable, but there is no infrastructure for that currently
(using CPP macros was simpler to start with...)

Jacques Garrigue


Jon Harrop

2008-01-15 18:17:32 UTC

Post by Jacques Garrigue

Post by Jacques Garrigue
Unfortunately, this would make marshalling between different programs
much more complicated...

Do people marshal polymorphic variants between different programs?

Do people marshal data between different programs (or different
versions of the same program)?

I suspect OCaml's marshalling is used almost entirely between same versions of
the same programs.

In particular, I was advised against marshalling data between different
versions of the same program because this is unsafe (not just type safety but
the format used by Marshal is not ossified).

Post by Jacques Garrigue

Post by Jon Harrop
So the advantage of a decision tree is probably insignificant on real
code because it will lie between these two extremes.

Since the goal was never to be faster than ordinary variants, but just
obtain comparable speed, this seems good :-)

Yes. This would probably also work ok if you used a symbol table to store
exact identifier names rather than just a hash. The symbol's index in the
table would serve the same purpose as the hash.
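
A minimal OCaml sketch of such a symbol table, with hypothetical names (table, intern): each distinct tag name receives the next free index, so two different names can never share a value.

```ocaml
(* Hypothetical interning table from tag name to index. *)
let table : (string, int) Hashtbl.t = Hashtbl.create 64

let intern name =
  match Hashtbl.find_opt table name with
  | Some i -> i                       (* already known: reuse its index *)
  | None ->
      let i = Hashtbl.length table in (* next free index *)
      Hashtbl.add table name i;
      i
```

The index depends on the order in which names are first seen, so separately compiled modules would have to agree on one table, which is the link-time problem raised earlier in the thread.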

Post by Jacques Garrigue

Can LablGL's design support OpenGL extensions?

I'm not sure what this means.

OpenGL has an extension mechanism that can be queried at run-time. If a given
extension is available then you can do things that you could not do before,
such as pass a GLenum to a function that might not have accepted it without
the extension.

Post by Jacques Garrigue
Since LablGL was coded by hand, adding extensions would mean modifying
it.

Exactly, that is a limitation of LablGL's design and, therefore, I think it
was quite wrong of you to claim "what LablGL shows is that in practice only a
small number of tags are used together" when LablGL's use of small, closed
sum types is actually a design limitation that would not be there if it
supported all of OpenGL, i.e. the extension mechanism.

Incidentally, Xavier made a statement based upon what appears to me to be a
similar logical error in the CUFP notes from last year that I read recently:

"On the other hand, certain features seem somewhat unsurprisingly to be
unimportant to industrial users. GUI toolkits are not an issue, because GUIs
tend to be built using more mainstream tools; it seems that different
competencies are involved in Caml and GUI development and companies "don't
want to squander their precious Caml expertise aligning pixels". Rich
libraries don't seem to matter in general; presumably companies are happy to
develop these in-house. And no-one wants yet another IDE; the applications of
interest are usually built using a variety of languages and tools anyway, so
consistency of development environment is a lost cause."
- http://cufp.galois.com/CUFP-2007-Report.pdf (page 3)

Xavier appears to have taken the biased sample of industrialists who already
use OCaml despite its limitations and has drawn the conclusion that these
limitations are not important to industrialists. I was really horrified to
see this because, in my experience, companies are turning away from OCaml in
droves because of exactly the limitations Xavier enumerated and I for one
would dearly love to see them fixed.

OCaml will continue to go from strength to strength regardless but its uptake
would be vastly faster if these problems are addressed. To take them point by
point:

. GUIs are incredibly important (LablGTK is the world's favorite OCaml
library!) and tens of thousands of OCaml programmers are crying out for
proper LablGTK documentation as a first priority, many of whom are in
industry.

. Rich libraries are incredibly important and OCaml has the potential to
become a hugely successful commercial platform where people can buy and sell
cross-platform libraries but OCaml needs support for shared run-time DLLs (or
something equivalent) before this can happen.

. An easy-to-use IDE would be an excellent way to kick-start people learning
OCaml even if an industrial-strength IDE is intractable.

Post by Jacques Garrigue
Also, one might want to make code generation automatic, particularly
for C wrappers, to allow adding cases to functions easily. This should
be doable, but there is no infrastructure for that currently
(using CPP macros was simpler to start with...)

Yes. A better FFI could also be enormously beneficial. Improving upon OCaml's
FFI is one of the most alluring aspects of a reimplementation on LLVM, IMHO.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


Gerd Stolpmann

2008-01-15 19:20:09 UTC

An interesting thesis, right? Although I wouldn't go that far, there is
some truth in it. The point, IMHO, is that OCaml will never replace
other languages in the sense that a company who uses language X for
years in product Y rewrites the code in OCaml. For what reason? The
company would run into big educational problems (learning a new
environment), would have high initial costs, and it is questionable
whether the result is better. Of course, for rewriting existing software
the company would profit from GUIs, from rich libraries etc. But I think
this does not happen.

What I see, however, is that OCaml is used where new software is
developed, in ambitious projects that start from scratch. It is simply a
fact that GUIs are not crucial in these areas (at least for the
companies I know). GUIs are seen as standard tools where nothing new
happens where OCaml could shine. If you need one, you develop it in one
of the mainstream languages.

IDEs aren't interesting right now because OCaml is mainly used by
(computer & related) scientists (and I include scientists working for
companies outside academia). IDEs are nice for beginners and for people
who do not want to know what's happening inside. They are not
interesting for companies that invent completely new types of products,
because they've hired experts that can live without (and want to live
without).

Post by Jon Harrop
Xavier appears to have taken the biased sample of industrialists who already
use OCaml despite its limitations and has drawn the conclusion that these
limitations are not important to industrialists. I was really horrified to
see this because, in my experience, companies are turning away from OCaml in
droves because of exactly the limitations Xavier enumerated and I for one
would dearly love to see them fixed.

Which companies?

I fully understand that OCaml is not well-suited for the average
company. But it is not because of missing GUIs and IDEs, but because the
language itself is too ambitious. Sorry to say that, but this is not the
mainstream and it will never be.

(I have a good friend who works for an average company, so I know what
I'm talking about. They program business apps for a commercial platform
from CA. A horrible language, but they can manage it. They are experts
for the models they use, and simply take a platform from industry.)

Post by Jon Harrop
OCaml will continue to go from strength to strength regardless but its uptake
would be vastly faster if these problems are addressed. To take them point by
. GUIs are incredibly important (LablGTK is the world's favorite OCaml
library!) and tens of thousands of OCaml programmers are crying out for
proper LablGTK documentation as a first priority, many of whom are in
industry.

See this as an opportunity for your next book :-)

GTK is already poorly documented, so this is not only the problem of the
LablGTK creators. Nevertheless, GTK is widely used. I don't think it's a
real problem.

Post by Jon Harrop
. Rich libraries are incredibly important and OCaml has the potential to
become a hugely successful commercial platform where people can buy and sell
cross-platform libraries but OCaml needs support for shared run-time DLLs (or
something equivalent) before this can happen.

Do you dream or what?

I don't think that selling libraries in binary form is that important...
It is difficult anyway to do that, and why do you expect you could be
successful in a niche language? As a customer I would demand to get the
source code - to lower the risks of the investment into a small
platform.

Post by Jon Harrop
. An easy-to-use IDE would be an excellent way to kick-start people learning
OCaml even if an industrial-strength IDE is intractable.

Yes. A better FFI could also be enormously beneficial. Improving upon OCaml's
FFI is one of the most alluring aspects of a reimplementation on LLVM, IMHO.

A general question to you: When you are complaining about so many
aspects of OCaml, why don't you invest time & money to fix them? We
would all be very thankful.

Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
***@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------


Jon Harrop

2008-01-15 22:04:45 UTC

Post by Jon Harrop
Incidentally, Xavier made a statement based upon what appears to me to be
a similar logical error in the CUFP notes from last year that I read:
"On the other hand, certain features seem somewhat unsurprisingly to be
unimportant to industrial users. GUI toolkits are not an issue, because
GUIs tend to be built using more mainstream tools; it seems that
different competencies are involved in Caml and GUI development and
companies "don't want to squander their precious Caml expertise aligning
pixels". Rich libraries don't seem to matter in general; presumably
companies are happy to develop these in-house. And no-one wants yet
another IDE; the applications of interest are usually built using a
variety of languages and tools anyway, so consistency of development
environment is a lost cause."
- http://cufp.galois.com/CUFP-2007-Report.pdf (page 3)

I believe many more companies would migrate to OCaml if it had well-documented
GUI APIs and rich libraries. Indeed, Microsoft are gambling on people
migrating to F# in exactly the same way.

Post by Gerd Stolpmann
What I see, however, is that OCaml is used where new software is
developed, in ambitious projects that start from scratch. It is simply a
fact that GUIs are not crucial in these areas (at least for the
companies I know).

But the companies you know were already self-selected to be the ones who do
not care about OCaml's limitations, so it is a biased sample?

Post by Gerd Stolpmann
GUIs are seen as standard tools where nothing new happens where OCaml could
shine.

I have no doubt that OCaml would shine in GUIs just as it does elsewhere.

Post by Gerd Stolpmann
If you need one, you develop it in one of the mainstream languages.

Actually I would either choose F# on Windows or give up on any other OS.

Post by Gerd Stolpmann
IDEs aren't interesting right now because OCaml is mainly used by
(computer & related) scientists (and I include scientists working for
companies outside academia).

Many of the world's most sophisticated IDEs are targeted solely at technical
users. Look at Mathematica's notebook interface, for example. I believe that
is a great example to aspire to.

Post by Gerd Stolpmann
IDEs are nice for beginners and for people
who do not want to know what's happening inside. They are not
interesting for companies that invent completely new types of products,
because they've hired experts that can live without (and want to live
without).

I couldn't disagree more. Pharmaceuticals are a trillion dollar industry where
many scientists would benefit enormously from being able to use a tool like
OCaml without knowing anything about how it works in order to create their
next generation products (drugs). The same is true of most industries where
scientists and engineers work; there are many such industries and they
are extremely profitable.

Post by Jon Harrop
Xavier appears to have taken the biased sample of industrialists who
already use OCaml despite its limitations and has drawn the conclusion
that these limitations are not important to industrialists. I was really
horrified to see this because, in my experience, companies are turning
away from OCaml in droves because of exactly the limitations Xavier
enumerated and I for one would dearly love to see them fixed.

Which companies?

General Electric, Microsoft, Wolfram Research and various bioinformatics
institutes for example.

Look at General Electric. They build some of the world's most sophisticated
medical scanners and that large-scale embedded market is ideal for using
languages like OCaml for its high-performance numerics because you have
complete control over the environment. However, they desperately need GUI
toolkits to provide a front-end for users.

I'd like to know what Alex Barretta makes of this, for example. His glass
cutters must have the same characteristics in this respect...

Post by Gerd Stolpmann
I fully understand that OCaml is not well-suited for the average
company. But it is not because of missing GUIs and IDEs, but because the
language itself is too ambitious. Sorry to say that, but this is not the
mainstream and it will never be.

I still think OCaml has the best chance of any FPL to become a mainstream tool
in technical computing.

Indeed, I recently tried to quantify how far OCaml has already come and I
believe it is already as popular as C# among technical users, for example.
That is quite an achievement!

Post by Gerd Stolpmann
(I have a good friend who works for an average company, so I know what
I'm talking of. They program business apps for a commercial platform
from CA. A horrible language, but they can manage it. They are experts
for the models they use, and simply take a platform from industry.)

Yes. I do not believe OCaml will make significant inroads into displacing
COBOL and relatives but there are a lot of other big opportunities out there
for such a language.

Post by Jon Harrop
OCaml will continue to go from strength to strength regardless but its
uptake would be vastly faster if these problems are addressed. To take
. GUIs are incredibly important (LablGTK is the world's favorite OCaml
library!) and tens of thousands of OCaml programmers are crying out for
proper LablGTK documentation as a first priority, many of whom are in
industry.

See this as opportunity for your next book :-)

Indeed. Even after the announcement that Microsoft are productizing F#, OCaml
for Scientists continues to be our biggest earning product. Consequently, I
am very tempted to write a "sequel" that covers many of the important aspects
of the language that I did not cover in the original, including GUI
programming, XML, parallelism and so forth. If anyone has ideas for subjects
they would like to see covered, please e-mail me!

Post by Gerd Stolpmann
GTK is already poorly documented, so this is not only the problem of the
LablGTK creators. Nevertheless, GTK is widely used. I don't think it's a
real problem.

Yes. I'm really not sure what the best course of action would be here. Would
Qt bindings be preferable? Is it worth the hassle? How long would it be
before they reached the maturity of GTK?

I think we would really need more high-profile open source programs with
hundreds of thousands of users testing the bindings (as GTK has had) before
you could really gamble on it.

Post by Jon Harrop
. Rich libraries are incredibly important and OCaml has the potential to
become a hugely successful commercial platform where people can buy and
sell cross-platform libraries but OCaml needs support for shared run-time
DLLs (or something equivalent) before this can happen.

Do you dream or what?

One man's reality is another man's dream. :-)

Post by Gerd Stolpmann
I don't think that selling libraries in binary form is that important...

If it were possible then it would be important to me because I could earn a
living from it. I'm sure the same is true for many other people.

Post by Gerd Stolpmann
It is difficult anyway to do that, and why do you expect you could be
successful in a niche language?

Because I already am. :-)

Post by Gerd Stolpmann
As customer I would demand to get the source code - to lower the risks of
the investment into a small platform.

Nobody ever got fired for buying IBM.

Historically, we've made a lot more money from sales of binaries than from
sales of source code. Consequently, I would be more than willing to gamble on
selling shared run-time DLLs for OCaml users if it were possible.

Post by Jon Harrop
Yes. A better FFI could also be enormously beneficial. Improving upon
OCaml's FFI is one of the most alluring aspects of a reimplementation on
LLVM, IMHO.

A general question to you: When you are complaining about so many
aspects of OCaml, why don't you invest time & money to fix them?

An excellent idea!

So I wrote to Xavier Leroy and asked about contributing to INRIA's OCaml
distribution. Xavier explained that French copyright law makes it
prohibitively difficult for him to include my code contributions so this will
never be possible. The best I could think of was to suggest that they make it
possible for users to pay to get certain bugs fixed or functionality
implemented. I'm not sure that will happen though.

I wrote to Pierre Weis and asked what the likelihood of getting some tweaks
into the language was. He said that it is unlikely I could even get a "try ..
finally" construct put in.
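
For reference, the effect of a "try .. finally" construct can be
approximated in user code with a higher-order function. The following is
only a minimal sketch: the names with_finally and read_first_line are
hypothetical helpers made up for illustration, not part of the OCaml
distribution, and the sketch assumes the cleanup action itself does not
raise.

```ocaml
(* Hypothetical helper: run [f ()], then always run [cleanup ()],
   whether [f] returns normally or raises an exception. *)
let with_finally (f : unit -> 'a) (cleanup : unit -> unit) : 'a =
  let result =
    try f ()
    with e ->
      (* Exceptional path: clean up, then re-raise the original exception. *)
      cleanup ();
      raise e
  in
  (* Normal path: clean up, then return the result. *)
  cleanup ();
  result

(* Usage: the channel is closed on both the normal and exceptional paths. *)
let read_first_line (path : string) : string =
  let ic = open_in path in
  with_finally (fun () -> input_line ic) (fun () -> close_in ic)
```

A library function cannot give the syntactic convenience of a built-in
construct, which is presumably what was being asked for, but it does cover
the common resource-cleanup case.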

So there's no way I can improve INRIA's OCaml distribution. Next, I thought
perhaps a complete fork of OCaml would be a viable alternative. This is
complicated by OCaml's license which requires variants to be distributed with
the core sources intact and everything else as patches to it. This is not an
insurmountable problem, of course: you just distribute the core and a giant
autogenerated patch instead. So I asked Sylvain about getting Debian to adopt
the fork rather than INRIA's upstream. He said this will almost certainly not
happen.

So I can't develop or contribute to INRIA's OCaml implementation and I can't
fork it without starting with zero users. What about reimplementing it?

So I wondered what I could build upon that would make this as painless as
possible. This led me to the Smoke VM, Mono, the JVM and LLVM. I enumerated
each of these in turn and came to the conclusion that LLVM is preferable, not
least because several other people had already drawn the same conclusion and
started work on similar projects themselves.

That's when I wrote my 100LOC test program calling LLVM from OCaml. Since
then, Gordon has been working hard on the OCaml bindings and example
programs, which are now nothing short of incredible. Dozens of people have
e-mailed me expressing their desire to contribute to such an effort.

This will take time, of course, but I believe it is the future of the OCaml
language.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


Kuba Ober

2008-01-16 13:48:14 UTC

Post by Jon Harrop
Incidentally, Xavier made a statement based upon what appears to me to
be a similar logical error in the CUFP notes from last year that I read:
"On the other hand, certain features seem somewhat unsurprisingly to
be unimportant to industrial users. GUI toolkits are not an issue,
because GUIs tend to be built using more mainstream tools; it seems
that different competencies are involved in Caml and GUI development
and companies "don't want to squander their precious Caml expertise
aligning pixels". Rich libraries don't seem to matter in general;
presumably companies are happy to develop these in-house. And no-one
wants yet another IDE; the applications of interest are usually built
using a variety of languages and tools anyway, so consistency of
development environment is a lost cause."
- http://cufp.galois.com/CUFP-2007-Report.pdf (page 3)

I believe many more companies would migrate to OCaml if it had
well-documented GUI APIs and rich libraries. Indeed, Microsoft are gambling
on people migrating to F# in exactly the same way.

But the companies you know were already self-selected to be the ones who do
not care about OCaml's limitations, so it is a biased sample?

Post by Gerd Stolpmann
GUIs are seen as standard tools where nothing new happens where OCaml
could shine.

I have no doubt that OCaml would shine in GUIs just as it does elsewhere.

In fact, after some initial thinking and looking around it seems that the
only "sane" GUI for OCaml, at this time, is Qt, but someone has to write a
machine translator to port it from C++ to OCaml. Qt is reasonably well
designed, and has the richest feature set of all GUI toolkits, even if you
combined all the competition and treated it as one "other" toolkit.

Using Qt with some machine (or not!) generated bindings is just a huge
waste -- it's a nice, clean design, which has recently been tweaked for
performance (some Qt4 apps start in 50% of the time just by having been
ported to Qt4 from Qt3).

Cheers, Kuba


Dario Teixeira

2008-01-16 15:02:54 UTC

Hi,

Post by Kuba Ober
In fact, after some initial thinking and looking around it seems that the
only "sane" GUI for OCaml, at this time, is Qt, but someone has to write a
machine translator to port it from C++ to OCaml. Qt is reasonably well
designed, and has the richest feature set of all GUI toolkits, even if you
combined all the competition and treated it as one "other" toolkit.
Using Qt with some machine (or not!) generated bindings is just a huge
waste -- it's a nice, clean design, which has recently been tweaked for
performance (some Qt4 apps start in 50% of the time just by having been
ported to Qt4 from Qt3).

I'm inclined to agree. I would even go as far as saying that the lack of
Qt bindings is perhaps the biggest open sore as far as Ocaml library support
is concerned.

The guys at Trolltech, however, seem quite keen on having Qt on as many
platforms as possible (Qt-Jambi, which brings Qt to the JVM is one of their
products). Couldn't this whole auto-generation of bindings be made easier
if they got involved? I am sure they already have plenty of tools in
place to facilitate it. Even if they were not to commit actual manpower
to the effort, they might still be able to help.

And incidentally, the aforementioned Qt-Jambi, together with the Ocamljava
project might provide a last-resort solution in the absence of native bindings.
Another possibility might be the Qyoto/Kimono project (which brings Qt/KDE
into .net) together with the OcamlIL project (if it's still alive). You would
then use Mono to run Ocaml programmes.

cheers,
Dario



Jon Harrop

2008-01-16 19:00:25 UTC

Post by Dario Teixeira
I'm inclined to agree. I would even go as far as saying that the lack of
Qt bindings is perhaps the biggest open sore as far as Ocaml library
support is concerned.

As I understand it, OCaml's FFI makes writing Qt bindings an enormous
undertaking which is why we don't have any.

I'm happy with GTK for now and would rather see OpenGL 2 bindings instead.

Post by Dario Teixeira
The guys at Trolltech, however, seem quite keen on having Qt on as many
platforms as possible (Qt-Jambi, which brings Qt to the JVM is one of their
products). Couldn't this whole auto-generation of bindings be made easier
if they got involved? I am sure they already have plenty of tools in
place to facilitate it. Even if they were not to commit actual manpower
to the effort, they might still be able to help.

I found Trolltech's customer support awful as a customer, so I very much doubt
they will go out of their way to help a really obscure virgin corner of the
Qt market. That was a few years ago though.

Post by Dario Teixeira
And incidentally, the aforementioned Qt-Jambi, together with the Ocamljava
project might provide a last-resort solution in the absence of native
bindings. Another possibility might be the Qyoto/Kimono project (which
brings Qt/KDE into .net) together with the OcamlIL project (if it's still
alive). You would then use Mono to run Ocaml programmes.

I evaluated various such options recently and decided that Mono is truly awful
(very poorly written, unreliable and slow) and LLVM is absolutely superb
(extremely well-written C++ with complete native OCaml bindings!). Moreover,
Mono appears to have no future in its current form whereas LLVM has serious
backers and is improving at a tremendous rate.

Even if you don't want to implement a whole new language or backend, using
LLVM's JIT compilation for code generation has great potential for OCaml,
e.g. regexps. I highly recommend giving it a play!
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e


Kuba Ober

2008-01-17 13:09:17 UTC