A couple of days ago I found myself writing a breadth-first search routine in C++ for a particular puzzle (Klotski, to be specific). Since I was using Visual C++, I needed a set for the board database, and decided to use its stdext::hash_set.
See, Visual C++'s Standard Template Library (STL) is based on the implementation from Dinkumware, and has the following prototype:
class Traits = hash_compare<Key, less<Key> >,
class Allocator = allocator<Key> >
Notice two things about this hash set implementation: it takes a less-than predicate, and it doesn't take an equality predicate. What the frick kind of hash table requires a less-than predicate??? And we're not talking about a predicate for the hash code here, but a less-than comparison operator for the whole key. It defeats one of the main advantages of a hash container, since IMO it's often easier to write hash and equality predicates instead of a less-than predicate, as long as you aren't dealing with floats. I've never seen a hashed container that required this -- not in SGI-STL/STLport, not in Java, not in the .NET Framework. As you might expect, this means that VC8's hash_set is a pain in the butt to use and is slow. Thankfully, C++ TR1's unordered_set follows the SGI-STL/STLport path rather than Dinkumware's.
I haven't really had a need for a full blown hash_map or a hash_set in VirtualDub, but I'm thinking that if I do, I'll be better off writing my own....
Well, the find function just calls lower_bound, no surprise it needs the less than comparison. If it had a great than it would call upper_bound surely :) Why does a hash_set (or hash_map) have such a functions is another question.
Gabest - 03 05 08 - 23:52
I'm not sure how it works for a regular hash_set or hash_map given that the hash tables are unordered at the hash bucket level -- they can be ordered in the hash chains depending on the implementation -- but lower_bound(), upper_bound(), and equal_range() are useful for a hash_multimap or hash_multiset. You can use equal_range() to extract a range of identically keyed elements within the container. This does require the overhead of at least partially sorting the hash chain, however. A lot of the time I only need one operation in a hash container, insert(), and so a reduced hash table can be faster than a full-blown STL one.
A less-than predicate suffices to give all six relational operators: (x > y) = (y < x), (x == y) == !(x < y) && !(y < x), (x y), (x >= y) == !(x < y), and (x != y) == !(x == y). STL implementations commonly take advantage of this. Problem is, if your predicate isn't fully consistent, you can get bad behavior from garbled output to even memory corruption and a crash. One of the easiest ways to goof up is to implement (x (x < y) && (y < x). A more subtle way is to use floating point arithmetic such that the compiler generates code differently for two identical expressions, leading to order of evaluation induced errors.
Phaeron - 04 05 08 - 01:56
Correct me if I am wrong, but if you use chaining for collision resolution of the hash table then it makes sense to me to have the less than predicate for sorting the colliding entries (assuming you do not use linked lists for it).
macin - 04 05 08 - 06:16
The Dinkumware hash container maintains the buckets as iterators within a master list, yeah, so any gain from sorting the chains is minimal at best. If the "chains" were maintained as sorted trees, then yeah, you'd have a safeguard against O(N) worst case collision behavior... but the complexity constant and memory overhead would be so much worse that I doubt anyone would want to use it. If you really need guaranteed super-linear behavior then it's best to just use an rbtree based map or set. Generally the idea with a hash container is simply to try to keep the bucket sizes from going above tiny single digits.
There's a peculiar feature in the Dinkumware implementation where it can supposedly do incremental rehashing in order to avoid one big hitch when the hash table overflows. Not worth the ~4x penalty over SGI-STL/STLport that I've seen in benchmarks, though.
Phaeron - 04 05 08 - 15:30
Your father was gifted--at least compared to my understanding of computers, hardware and software.
I am converting 8mm file to DVD. I project the 8mm film onto a screen, recording that visual on a 8mm digital camcorder, and then puting that into my computer to edit with Pinnacle STUDIO 9. I am using MPEG 2. Because the projector projects at a slower frame per second that the camcorder records there is at times a flicker. Upon reviewing the internet I found a reference to using the VirtualDub, the Deflicker filter, in removing or reducing the flicker. Are you familiar with this? Do you recommend it? What are the pro and cons?
Thank you so much for assisting this senior.
John Ogden - 21 05 08 - 18:04
I have started writing an image viewer which uses OpenGL. Problem was not loading textures or something graphics related but keeping a sorted image list up to date if files are added or removed from the watched folder. Suffice to say I still haven't decided what to use for such a list. I need it sorted in "human-alphabet" mode and I need speedy insertions and deletions. I also need to keep track of current image so I can do prev/next navigation. If you have any advice when it comes to those sets, maps and containers it would be very appreciated.
Igor Levicki (link) - 28 05 08 - 12:17
Your best bet is to follow the directions that come with the Deflicker filter -- I'm aware of it but haven't actually used it much. I do know that you do need to do an analysis pass for it to measure the flickering, following by a second pass to mitigate it.
The best way to handle a situation like that is to store your items with multiple keys. I think databases can do this fairly easily, but unfortunately container libraries typically can't, at least not efficiently. STL definitely can't. If you have intrusive containers, where you incorporate the nodes into your data items rather than having the container library aggregate them on for you, then it's possible to insert the same data item into multiple associative containers at the same time with different keys and sort/equality/hash predicates, which makes things easy because then it's O(1) to switch between the different iterators to the same item. Sadly, intrusive_list and intrusive_unordered_map aren't common containers.
Your best bet, I think, would be to abstract it and then see if perf becomes a problem. Container updates driven by user interaction and filesystem activity aren't likely to be much of a performance problem even up to 1K-10K entries, even if you use dumb linear search algorithms. One starting approach would be to keep your data items in a slot array with unique IDs and keep pairs of forward and reverse maps for all the indices you need, such as one keyed by filesystem entry, and another for the UI list. I doubt one or two extra O(log N) lookups when you need to remove an item would be a problem. This way you can also be flexible about at which layers you maintain the various mappings. For example, you don't necessarily want to insert items into the UI in sorted order as soon as they come in, as it can lead to surprises when the list updates right when the user is trying to select an item. Windows Explorer adds new items at the bottom until you manually refresh the pane.
Phaeron - 30 05 08 - 02:12
Avery, this is a bit different because there is no visible list. There is a list user navigates using PGUP/PGDN/HOME/END keys. That list is a list of images from the active folder sorted alphabetically. If you copy new items into that folder (that may happen asynchronously by another process such as download manager) they have to be inserted in the right postition in the list and the current index (the picture shown) has to be preserved. Same goes if images are deleted unless of course the currently shown image gets deleted too when the program should keep displaying it until the user manually advances to another image.
Then there is also a selection list. I have an option to tag images (using spacebar, they turn red) and copy or move them to another folder.
I got all tangled up attempting to solve all that. :-)
Igor Levicki (link) - 07 06 08 - 12:43