Author Topic: Interesting data structures and algorithm you may not have heard about  (Read 5879 times)


Offline harold

  • LV5 Advanced (Next: 300)
  • *****
  • Posts: 226
  • Rating: +41/-3
    • View Profile
I'll start out with some stuff I actually use that may not be widely known ("widely known" being stuff such as red-black trees, array-based heaps, circular buffers, hash tables, linked lists, you name it - first-year CS material):
  • Binary Decision Diagrams solve all sorts of problems that you thought were hard or impossible, such as boolean function equivalence, boolean satisfiability (and in fact counting the number of solutions, in time independent of that count) and Linear Boolean Programming. I've used BDDs with great success in many Project Euler problems (usually via the solution counting) that were designed to require some mathematical insight. Great for "brute force - but smart".
  • Zero-Suppressed BDDs - a variation of BDDs especially suited to sparse functions, such as when solving some Exact Cover (or regular Set Cover) problems (not when choosing a set has very non-local effects, such as with Sudoku). You can get a count of the solutions, pick a uniformly distributed random solution, or find a solution that optimizes the sum of the weights of the chosen sets. Also good for representing all Hamiltonian paths through a graph, which you can use to solve some nontrivial instances of the Traveling Salesperson Problem.
  • Dynamic Programming. Not an algorithm, but a class of algorithms so big that you have already used DP even if you didn't know it. For example, you know how to calculate Fibonacci numbers in linear time? That's DP. You've used Dijkstra's algorithm? DP. Learn about DP, it will come in handy. Maybe this shouldn't be on the list - almost everyone has heard about it.
  • Branch and Bound. The wiki entry is just a stub, so no link. The idea is simple but powerful: you do a depth-first search, but you stop going down the recursion tree whenever the solution cannot possibly become better in this sub-tree than the best solution found so far. Especially good when combined with a fast approximation. This technique can save your ass when it looks like you are going to have to use actual brute force, as long as the cost function is monotonic while going down a sub-tree. Often applicable to solving puzzle games. It may seem niche, but I've had to use it in real-life software that is actually used by people.
  • Suffix Arrays take less space than a suffix tree and are useful for many of the same problems, such as string searching (finding substrings in sub-linear time). Other significant applications include a linear-time Burrows–Wheeler transform (useful for data compression) and linear-time Lempel-Ziv decomposition (probably useful for LZ77/Deflate).
  • Miller-Rabin primality test for when trial division gets too slow. It will test 64-bit numbers for primality in mere milliseconds (a sketch of this and Pollard's Rho follows after this list).
  • Pollard's Rho algorithm for when trial division gets too slow and you actually need a factor, not just a boolean "IsPrime".
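A minimal sketch of both, assuming a compiler that provides unsigned __int128 (e.g. GCC/Clang) for the 64-bit modular multiplication - the witness set below is known to make Miller-Rabin deterministic for all 64-bit inputs:
Code: [Select]
#include <stdint.h>

static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m) {
   return (uint64_t)((unsigned __int128)a * b % m);
}

static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m) {
   uint64_t r = 1;
   b %= m;
   for (; e; e >>= 1) {
      if (e & 1) r = mulmod(r, b, m);
      b = mulmod(b, b, m);
   }
   return r;
}

// Miller-Rabin with the first 12 primes as witnesses (deterministic for 64-bit n).
static int is_prime(uint64_t n) {
   static const uint64_t w[] = {2,3,5,7,11,13,17,19,23,29,31,37};
   if (n < 2) return 0;
   uint64_t d = n - 1;
   int r = 0;
   while (!(d & 1)) { d >>= 1; r++; }
   for (int i = 0; i < 12; i++) {
      uint64_t a = w[i] % n;
      if (a == 0) continue;            // witness is a multiple of a small n
      uint64_t x = powmod(a, d, n);
      if (x == 1 || x == n - 1) continue;
      int composite = 1;
      for (int j = 0; j < r - 1; j++) {
         x = mulmod(x, x, n);
         if (x == n - 1) { composite = 0; break; }
      }
      if (composite) return 0;
   }
   return 1;
}

static uint64_t gcd64(uint64_t a, uint64_t b) {
   while (b) { uint64_t t = a % b; a = b; b = t; }
   return a;
}

// x^2 + c mod n, with a 128-bit intermediate to avoid overflow.
static uint64_t rho_step(uint64_t x, uint64_t n, uint64_t c) {
   return (uint64_t)(((unsigned __int128)x * x + c) % n);
}

// Pollard's Rho (Floyd cycle finding). Returns a nontrivial factor of a
// composite n, or n itself on a rare failure - retry with a different c.
static uint64_t pollard_rho(uint64_t n, uint64_t c) {
   uint64_t x = 2, y = 2, d = 1;
   while (d == 1) {
      x = rho_step(x, n, c);                     // tortoise: one step
      y = rho_step(rho_step(y, n, c), n, c);     // hare: two steps
      d = gcd64(x > y ? x - y : y - x, n);
   }
   return d;
}
Factoring then becomes: strip small factors by trial division, check is_prime, and split what's left recursively with pollard_rho (trying c = 1, 2, 3, ... whenever it returns n).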
Let's get this going, I'm sure we can all learn something new and interesting from this thread if people start adding stuff to it.
« Last Edit: March 05, 2013, 03:33:33 pm by harold »
Blog about bitmath: bitmath.blogspot.nl
Check the haroldbot thread for the supported commands and syntax.
You can use haroldbot from this website.

Offline Scipi

  • Omni Kitten Meow~ =^ω^=
  • LV10 31337 u53r (Next: 2000)
  • **********
  • Posts: 1547
  • Rating: +192/-3
  • Meow :3
    • View Profile
    • ScipiSoftware
A lot of these I've never heard about.

I will have to take a look into understanding these and implementing them in a program or two :D

* (Sadly, HOMER-16 doesn't know any data structures or algorithms off the top of his head)

Imma Cat! =^_^= :3 (It's an emoticon now!)
Spoiler For IMPORTANT NEWS!:
Late last night, Quebec was invaded by a group calling themselves, "Omnimaga". Not much is known about these mysterious people except that they all carried calculators of some kind and they all seemed to converge on one house in particular. Experts estimate that the combined power of their fabled calculators is greater than all the worlds super computers put together. The group seems to be holding out in the home of a certain DJ_O, who the Omnimagians claim to be their founder. Such power has put the world at a standstill with everyone waiting to see what the Omnimagians will do...

Wait... This just in, the Omnimagians have sent the UN a list of demands that must be met or else the world will be "submitted to the wrath of Netham45's Lobster Army". Such demands include >9001 crates of peanuts, sacrificial blue lobsters, and a wide assortment of cherry flavored items. With such computing power stored in the hands of such people, we can only hope these demands are met.

In the wake of these events, we can only ask, Why? Why do these people make these demands, what caused them to gather, and what are their future plans...

Offline harold

  • LV5 Advanced (Next: 300)
  • *****
  • Posts: 226
  • Rating: +41/-3
    • View Profile
  • Unrolled Linked List. Despite theoretical advantages, linked lists almost always turn out to be very disappointing compared to ArrayLists. Unrolled linked lists fix that to a certain degree - sometimes they may even beat ArrayLists.
  • Table Based Huffman Decoding. The most well-known way (frequently given in textbooks, because it's The Obvious Way) is to walk down the Huffman tree bit by bit until you reach a leaf. The table-based method, in its simplest form, decodes a symbol with a single table lookup. With this trick, the performance of Huffman decoding goes from "unacceptable" to "very good". Note that it requires canonical Huffman codes - which most compression formats use anyway, so it's widely applicable.
  • K-d trees and variants, useful for various geometry problems. For example, I used them to pack textures that vary widely in size onto a texture atlas. The result isn't optimal, but it's a lot better than a naive approach and a lot faster than an optimal one (it runs every time the game starts, and does not significantly increase the loading time).
  • Fermat Factorization, another trick for when trial division is too slow. It has different performance characteristics than Pollard's Rho: Rho is very good for small factors, while Fermat Factorization is good for big factors, i.e. when the two factors are both close to √n (a small sketch follows after this list).
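A minimal sketch, assuming an odd n small enough that a*a fits in 64 bits (roughly n < 2^62); for a prime n it eventually grinds all the way down to the trivial factor 1:
Code: [Select]
#include <stdint.h>
#include <math.h>

// Search for a, b with n = a^2 - b^2, so that n = (a - b) * (a + b).
static uint64_t fermat_factor(uint64_t n) {   // n must be odd
   uint64_t a = (uint64_t)sqrt((double)n);
   while (a * a < n) a++;                     // a = ceil(sqrt(n)), fixed up
   for (;;) {
      uint64_t b2 = a * a - n;
      uint64_t b = (uint64_t)sqrt((double)b2);
      while (b * b < b2) b++;                 // fix up floating-point error
      while (b * b > b2) b--;
      if (b * b == b2)
         return a - b;                        // a - b is a factor of n
      a++;
   }
}
The closer the two factors are to each other, the fewer iterations this takes; for a number with one tiny factor it's hopeless, which is why it complements Rho rather than replacing it.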
Come on guys, post some more :)
« Last Edit: March 06, 2013, 04:26:34 am by harold »
Blog about bitmath: bitmath.blogspot.nl
Check the haroldbot thread for the supported commands and syntax.
You can use haroldbot from this website.

Offline calc84maniac

  • eZ80 Guru
  • Coder Of Tomorrow
  • LV11 Super Veteran (Next: 3000)
  • ***********
  • Posts: 2912
  • Rating: +471/-17
    • View Profile
    • TI-Boy CE
Binary Indexed Tree, a data structure representing an array in which an element can be modified in O(log n) time and the sum of any range of elements can be retrieved in O(log n) time. In addition, the storage requirement is no more than that of a plain array of the elements.

The alternatives to this method are simply keeping an array of the elements, which is O(1) for modification and O(n) for range sum, or keeping an array of the cumulative sums, which is O(n) for modification and O(1) for range sum. The binary indexed tree gets the best of both at O(log n) per operation, and it can be implemented in a very small amount of code.

Seriously small:
Code: [Select]
// Note that indices start at 1.
// The tree lives in a plain int array; array[k-1] holds a partial sum
// covering the (k & -k) elements that end at index k.
#define array_size 1024   // size picked here only for illustration
int array[array_size];    // all zeros = the tree for an all-zero array

// Get sum of elements 1 to k
int get_prefix_sum(int k) {
   int sum=0;
   for(; k; k-=(k&-k))      // strip the lowest set bit each step
      sum += array[k-1];
   return sum;
}

// Add n to element k
void add_to_element(int k, int n) {
   for(; k<=array_size; k+=(k&-k))   // walk up through the covering nodes
      array[k-1] += n;
}

Edit: Also, here are some related helper functions.
Spoiler For Spoiler:
Code: [Select]
// Sum from element a to b
int range_sum(int a, int b) {
   return get_prefix_sum(b) - get_prefix_sum(a-1);
}

// Get element k
int get_element(int k) {
   return range_sum(k,k);
}

// Set element k to value n
void set_element(int k, int n) {
   add_to_element(k, n - get_element(k));
}
« Last Edit: March 06, 2013, 01:47:31 pm by calc84maniac »
"Most people ask, 'What does a thing do?' Hackers ask, 'What can I make it do?'" - Pablos Holman

Offline harold

  • LV5 Advanced (Next: 300)
  • *****
  • Posts: 226
  • Rating: +41/-3
    • View Profile
Very good, Binary Indexed Trees are awesome, simple but powerful :)

Here's some more stuff I find interesting:
  • Bloom Filters. I haven't really used them yet, but they look very useful when you want to do a quick but inaccurate test so you can save yourself a large portion of the slower but accurate tests - for example, Google uses them in Chrome to pre-test a list of "dangerous" domains locally, so most queries don't have to go all the way to their servers.
  • The Hungarian Algorithm finds a "best assignment" of "tasks" to "actors". This is an important optimization in real life as well as in many programs, including games (for example, if you have a bunch of units and a bunch of tasks in an RTS, you'd like to not just greedily pick the nearest unit for each task, but to calculate the optimal way to assign them). It looks simple, but sadly it has steps that are difficult to do nicely on a computer.
  • Interpolation Search is an adaptation of Binary Search that works extremely well when the items in the list are roughly uniformly distributed over their keyspace. Often neglected because it's not as "theoretically pretty" as basic Binary Search, but frequently a win in practice (a small sketch follows below).
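A minimal sketch over a sorted int array (the names a, n and key are just for illustration):
Code: [Select]
// Returns the index of key in the sorted array a[0..n-1], or -1 if absent.
static int interpolation_search(const int *a, int n, int key) {
   int lo = 0, hi = n - 1;
   while (lo <= hi && key >= a[lo] && key <= a[hi]) {
      if (a[lo] == a[hi])                    // avoid dividing by zero
         return a[lo] == key ? lo : -1;
      // Guess the position, assuming keys are roughly uniformly spread.
      long long num = ((long long)key - a[lo]) * (hi - lo);
      long long den = (long long)a[hi] - a[lo];
      int mid = lo + (int)(num / den);
      if (a[mid] == key)      return mid;
      else if (a[mid] < key)  lo = mid + 1;
      else                    hi = mid - 1;
   }
   return -1;
}
On uniformly distributed keys this averages O(log log n) probes, but it can degrade to O(n) on skewed data, so it pays to measure before swapping it in.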
Blog about bitmath: bitmath.blogspot.nl
Check the haroldbot thread for the supported commands and syntax.
You can use haroldbot from this website.

Offline AngelFish

  • Is this my custom title?
  • Administrator
  • LV12 Extreme Poster (Next: 5000)
  • ************
  • Posts: 3242
  • Rating: +270/-27
  • I'm a Fishbot
    • View Profile


More of an idea than a specific algorithm, but Secure Multi-party Computation. Quite possibly one of my favorite pieces.

Blum Blum Shub (no free links, sorry). It's a cryptographic random number generator of the form x_(n+1) = x_n² mod M, where M is the product of two large primes. Each iteration can produce a max of two bits (!) of random numbers, which can be done via the following: (parity(x_n)*(x_n & 0x01))*(parity(x_(n+1))*(x_(n+1) & 0x01)). Alternatively, you can just take the straight parity or the least significant bit of the output and get one bit per iteration.
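A toy sketch with deliberately tiny primes (p = 11, q = 23, both ≡ 3 mod 4), just to show the shape - real use needs large secret primes:
Code: [Select]
#include <stdint.h>

static const uint64_t bbs_M = 11u * 23u;   // toy modulus, NOT secure
static uint64_t bbs_x;

// The seed must be coprime to M and not 0 or 1.
static void bbs_seed(uint64_t seed) { bbs_x = seed % bbs_M; }

// One iteration: x_(n+1) = x_n² mod M; output its least significant bit.
static int bbs_next_bit(void) {
   bbs_x = (bbs_x * bbs_x) % bbs_M;
   return (int)(bbs_x & 1);
}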
∂²Ψ    2m(V(x)-E)Ψ
---  = -----------
∂x²         ℏ²

Offline Scipi

  • Omni Kitten Meow~ =^ω^=
  • LV10 31337 u53r (Next: 2000)
  • **********
  • Posts: 1547
  • Rating: +192/-3
  • Meow :3
    • View Profile
    • ScipiSoftware
Well, what would you know, I do have something to add here. :D

This is something I just found today and am trying out.

  • Neural Networks are similar to cellular automata in that they exhibit emergent behavior; the difference is that neural networks learn and evolve with each successive generation. They're useful in AI programming as well as in pattern recognition (a toy sketch follows below).
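To get a feel for the idea, here's a toy sketch of a single artificial neuron (a perceptron) learning the AND function - not a full network with hidden layers, just the weight-adjustment loop:
Code: [Select]
#include <stdio.h>

int main(void) {
   double w[2] = {0.0, 0.0}, bias = 0.0, rate = 0.1;
   int inputs[4][2] = {{0,0},{0,1},{1,0},{1,1}};
   int target[4]    = { 0,    0,    0,    1  };   // AND truth table

   for (int epoch = 0; epoch < 100; epoch++) {
      for (int i = 0; i < 4; i++) {
         double sum = w[0]*inputs[i][0] + w[1]*inputs[i][1] + bias;
         int out = sum > 0 ? 1 : 0;                // step activation
         int err = target[i] - out;
         // Perceptron learning rule: nudge the weights to reduce the error.
         w[0] += rate * err * inputs[i][0];
         w[1] += rate * err * inputs[i][1];
         bias += rate * err;
      }
   }
   for (int i = 0; i < 4; i++) {
      double sum = w[0]*inputs[i][0] + w[1]*inputs[i][1] + bias;
      printf("%d AND %d -> %d\n", inputs[i][0], inputs[i][1], sum > 0 ? 1 : 0);
   }
   return 0;
}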

Edit:

This article might be more relevant http://en.wikipedia.org/wiki/Artificial_neural_network
« Last Edit: March 08, 2013, 03:17:40 pm by HOMER-16 »


Offline harold

  • LV5 Advanced (Next: 300)
  • *****
  • Posts: 226
  • Rating: +41/-3
    • View Profile
  • Simulated Annealing is a nice trick that works very well for how simple it is (in its basic form - it gets complicated when you make it fancy). It combines well with other tricks, since it's really more of a "framework-type algorithm" than a "mathy algorithm", so there are lots of "blanks" that you get to fill in. It's also attractive because it becomes arbitrarily good just by throwing more time at it - which means that on one end you can make it quick if you don't really care how good the answer is, and on the other end you can get a very good answer just by waiting a long time. And it's the same code for both, so you can even make the speed a user input. That's a rare thing in algorithms (a toy sketch follows after this list).
  • A* is very often a good way to speed up an exhaustive search. It's like BFS, but instead of a queue it uses a priority queue, sorted by "distance from root + estimated distance to goal". What makes it particularly useful is that it's guaranteed to find the optimal answer when the estimated distance to the goal is never higher than the actual distance (heuristics satisfying this condition are called "admissible"). The maximum over a bunch of admissible heuristics is still admissible, so you can often make the search faster by making up more heuristics (unless evaluating the heuristics takes too much time, of course). If your heuristic is accidentally not admissible, that's one of the worst bugs to diagnose, so be careful.
    [I thought about putting A* in the list when I created the thread, but figured that too many people would already know it - then again, maybe not everyone does, so here it is]
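A toy sketch of the simulated annealing skeleton, minimizing f(x) = (x - 3)² - in a real problem the state, the neighbor move, the cost function and the cooling schedule are the blanks you fill in:
Code: [Select]
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double cost(double x) { return (x - 3.0) * (x - 3.0); }

static double random_neighbor(double x) {
   return x + ((double)rand() / RAND_MAX - 0.5);   // small random step
}

int main(void) {
   srand(12345);
   double current = 100.0;            // arbitrary starting point
   double temperature = 10.0;
   double cooling = 0.999;            // slower cooling = better answers
   for (int i = 0; i < 20000; i++) {
      double candidate = random_neighbor(current);
      double delta = cost(candidate) - cost(current);
      // Always accept improvements; accept worse moves with a probability
      // that shrinks as the temperature drops.
      if (delta < 0 || (double)rand() / RAND_MAX < exp(-delta / temperature))
         current = candidate;
      temperature *= cooling;
   }
   printf("best x found: %f (cost %f)\n", current, cost(current));
   return 0;
}
The iteration count (together with the cooling rate) is exactly the time/quality knob mentioned above, so it could be made a user input.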
Blog about bitmath: bitmath.blogspot.nl
Check the haroldbot thread for the supported commands and syntax.
You can use haroldbot from this website.