We don't know what type of network, even. Or what type of data it is. For all I know, it's transmitting streaming video and nothing else. In which case, packets would be unnecessary. All I'm saying is: don't make assumptions. Or at least point out that you're making the assumption. Something like, "assuming it's a packet-switched network, and it has a concept of MTUs, have you tried X?"
Then the solution is simple: Open the connection and keep pushing random data down there in real time. Find out what network layers are involved and what protocols are involved so you can better understand how to "saturate" the link. Otherwise don't waste your time trying to figure out a solution to what may not be a problem. Try images (bitmaps especially, or JPEG if you don't want to worry about the colour complexity of the image), but strip out the EXIF data.
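A minimal sketch of that brute-force approach, assuming a plain stream connection; here a local socket pair stands in for the real link, since we know nothing about the actual endpoint:

```python
import os
import socket
import threading

def push_random(sock, total_bytes, chunk_size=65536):
    """Push pseudo-random data down a connected socket; returns bytes sent."""
    sent = 0
    while sent < total_bytes:
        chunk = os.urandom(min(chunk_size, total_bytes - sent))
        sock.sendall(chunk)
        sent += len(chunk)
    return sent

# Local stand-in for the link under test; a real run would connect()
# to the device instead of using a socket pair.
writer, reader = socket.socketpair()
received = bytearray()

def drain():
    while True:
        data = reader.recv(65536)
        if not data:
            break
        received.extend(data)

t = threading.Thread(target=drain)
t.start()
sent = push_random(writer, 1024 * 1024)
writer.close()  # EOF lets the reader loop finish
t.join()
reader.close()
```

Random bytes are deliberately incompressible, so any transparent compression on the link can't hide the true throughput.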
NedFodder said: If the expected error rate is close to zero, any of these will work. Even a simple XOR or additive checksum is probably plenty strong enough for this use case. Accumulating a simple checksum the width of a native machine word over data of that same width should fly like shit off a shovel - if I had a guarantee that the size of the data to be checked was always a multiple of the word size, I'd probably do that instead of even bothering with a library function.
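A sketch of that word-at-a-time additive checksum (Python standing in for what would be a couple of lines of C; the 8-byte word size and the multiple-of-8 length guarantee are the post's own assumptions):

```python
import struct

def additive_checksum64(data: bytes) -> int:
    """Accumulate a checksum one 64-bit word at a time, wrapping like a
    native machine word. Assumes len(data) is a multiple of 8."""
    total = 0
    for (word,) in struct.iter_unpack('<Q', data):
        total = (total + word) & 0xFFFFFFFFFFFFFFFF
    return total

payload = bytes(range(64))
good = additive_checksum64(payload)

# Any single flipped bit changes one word by a power of two, so the sum
# (mod 2**64) changes too and the corruption is caught.
corrupted = bytearray(payload)
corrupted[13] ^= 0x20
```

Note that an additive checksum catches any single-bit flip but is blind to reordered words, which is part of why Fletcher and Adler add a second running sum.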
Not my job. If I can break things through the API, I go to the firmware engineer and say "I broke your network layers and protocols; you fix it."

Not your job as a developer. However, as a tester, you are responsible for identifying probable failure modes and testing those.
Adler and Fletcher are each just 4 lines of code. WTF do you need a library for? We want speed here, so just inline that crap wherever it needs to go.
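For reference, here is roughly what those few lines look like, sketched in Python (Fletcher-16; Adler-32 already ships in `zlib`, so there really is no library to hunt for):

```python
import zlib

def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums mod 255, combined into 16 bits."""
    s1 = s2 = 0
    for byte in data:
        s1 = (s1 + byte) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1

print(hex(fletcher16(b"abcde")))    # 0xc8f0, the standard test vector
print(hex(zlib.adler32(b"abcde")))  # Adler-32 straight from the stdlib
```

A C version is the same loop; a compiler will happily inline it wherever it needs to go.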
The Daily WTF forums: "What's the fastest hash or CRC algorithm?"
That gives me: "Which hashing algorithm is best for uniqueness and speed?" and "Fastest hash for non-cryptographic uses?" There is also a plain C port.
About 32-bit support: all the CityHash functions are tuned for 64-bit processors. That said, they will run (except for the new ones that use SSE4.2) in 32-bit code. They won't be very fast, though; you may want to use Murmur or something else in 32-bit code.

The individual plots only differ slightly in the reading method and can be ignored here, since all files were stored in a tmpfs.
Therefore the benchmark was not IO-bound, if you are wondering. The assumption that cryptographic hash functions are more unique is wrong; in fact, it can be shown to be often backwards in practice. In truth, a non-cryptographic hash function may well have fewer collisions than a cryptographic one for "good" data sets, i.e. the data sets that it was designed for. We can actually demonstrate this with the data in Ian Boyd's answer and a bit of math: the Birthday problem.
The formula for the expected number of colliding pairs, if you pick n values at random from a set of d possible hash values, is (taken from Wikipedia):

    expected pairs = n(n - 1) / (2d)

Plugging in n = 216,553 keys and d = 2^32 possible 32-bit hashes gives about 5.5 expected collisions. Ian's tests mostly show results around that neighborhood, but with one dramatic exception: most of the functions got zero collisions in the consecutive-numbers tests. The probability of choosing 216,553 32-bit numbers at random and getting zero collisions is about 0.43%. And that's just for one function; here we have five distinct hash function families with zero collisions!
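The arithmetic is quick to check (n = 216,553 and d = 2^32 are taken from Ian's test setup as described above):

```python
import math

n = 216_553   # keys in the test data set
d = 2 ** 32   # possible 32-bit hash values

expected_pairs = n * (n - 1) / (2 * d)  # expected colliding pairs
p_zero = math.exp(-expected_pairs)      # Poisson approximation of P(no collisions)

print(f"expected collisions: {expected_pairs:.2f}")  # about 5.46
print(f"P(zero collisions):  {p_zero:.4%}")          # about 0.43%
```

The Poisson approximation is appropriate here because collisions are rare, independent-ish events.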
So what we're seeing here is that the hashes that Ian tested are interacting favorably with the consecutive-numbers dataset, i.e. they scatter minimally different inputs more widely than a truly random function would. Side note: this means that Ian's graphical assessment that FNV-1a and MurmurHash2 "look random" to him in the numbers data set can be refuted from his own data. Zero collisions on a data set of that size, for both hash functions, is strikingly nonrandom! This is not a surprise, because this is desirable behavior for many uses of hash functions. This is a use where collision avoidance on likely inputs wins over random-like behavior.
Another instructive comparison here is the contrast in the design goals between CRC and cryptographic hash functions: CRC is designed to detect the transmission errors that are likely in practice, such as a small number of flipped bits, and is guaranteed to map such minimally different inputs to different checksums. So for CRC it is again good to have fewer collisions than random on minimally different inputs. With crypto hashes, this is a no-no!
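That CRC guarantee is easy to demonstrate with the standard `zlib.crc32`: since the generator polynomial detects every single-bit error, no one-bit flip can ever collide with the original:

```python
import zlib

msg = b"minimally different inputs"
base = zlib.crc32(msg)

# Flip every bit in turn; CRC-32 must change each time, because it is
# designed to detect all single-bit errors.
for i in range(len(msg) * 8):
    corrupted = bytearray(msg)
    corrupted[i // 8] ^= 1 << (i % 8)
    assert zlib.crc32(bytes(corrupted)) != base

print("all", len(msg) * 8, "single-bit corruptions detected")
```

A random 32-bit function would miss such a flip with probability about 2^-32 per input; CRC misses it with probability exactly zero.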
In fact, their speed can be a problem sometimes. In particular, a common technique for storing a password-derived token is to run a standard fast hash algorithm 10,000 times, storing the hash of the hash of the hash of the hash of the password.

Use SipHash. It has many desirable properties: SipHash is a strong PRF (pseudorandom function). This means that it is indistinguishable from a random function unless you know the 128-bit secret key.
No need to worry about your hash table probes becoming linear time due to collisions. With SipHash, you know that you will get average-case performance on average, regardless of inputs.
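SipHash itself isn't exposed in the Python standard library (CPython uses it internally for `hash()`), so here the keyed-hash idea is sketched with HMAC-SHA256 as a stand-in PRF; real SipHash is far faster, but the security property has the same shape:

```python
import hashlib
import hmac
import os

secret_key = os.urandom(16)  # fresh random key per process, as SipHash users do

def keyed_hash(data: bytes) -> int:
    """64-bit keyed hash: without secret_key, an attacker cannot predict
    which bucket an input lands in, so collision-flooding attacks fail.
    (HMAC-SHA256 stands in for SipHash in this sketch.)"""
    digest = hmac.new(secret_key, data, hashlib.sha256).digest()
    return int.from_bytes(digest[:8], 'little')

num_buckets = 1024
bucket = keyed_hash(b"user-controlled key") % num_buckets
```

Because the key is secret and random, an attacker who can choose inputs still cannot craft a set of keys that all land in one bucket.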
If you receive a message and a SipHash tag, and the tag is the same as that from running SipHash with your secret key, then you know that whoever created the hash was also in possession of your secret key, and that neither the message nor the hash have been altered since.

It depends on the data you are hashing. Some hashing works better with specific data, like text. Some hashing algorithms were specifically designed to be good for specific data. Paul Hsieh once made a fast hash.
He lists source code and explanations. But it was already beaten. Java uses this simple multiply-and-add algorithm: start with zero and, for each character, multiply the running hash by 31 and add the character's value. The hash value of the empty string is zero.
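The code snippet itself appears to have been lost in the copy; the algorithm being described is the well-known `String.hashCode` recurrence, rendered here as a Python sketch:

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode: h = 31*h + char, wrapped to signed 32 bits."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - 0x1_0000_0000 if h >= 0x8000_0000 else h

print(java_string_hash(""))     # 0, as the post says
print(java_string_hash("abc"))  # 96354, matching "abc".hashCode() in Java
```

The multiplier 31 was chosen because it is an odd prime and `31 * h` compiles down to a shift and a subtract.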
There are probably much better ones out there but this is fairly widespread and seems to be a good trade-off between speed and uniqueness.
First of all, why do you need to implement your own hashing? For most tasks you should get good results with data structures from a standard library, assuming there's an implementation available (unless you're just doing this for your own education). As far as actual hashing algorithms go, my personal favorite is FNV.
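For reference, FNV-1a is about as small as a hash function gets; a 32-bit sketch in Python:

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = 0x811C9DC5                                  # FNV offset basis
    for byte in data:
        h = ((h ^ byte) * 0x01000193) & 0xFFFFFFFF  # FNV prime, wrapped
    return h

print(hex(fnv1a_32(b"")))   # 0x811c9dc5
print(hex(fnv1a_32(b"a")))  # 0xe40c292c, a published FNV-1a test vector
```

XOR-before-multiply (the "1a" variant) gives better avalanche on the low bits than the original FNV-1 ordering.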
Asked 10 years, 8 months ago. Active 5 months ago.

CRC is more an error-detection method than a serious hash function. It helps identify corrupted files rather than uniquely identify them.
If you don't have strong security needs you can choose MD5, which should be faster. That is, as long as you are not worried about malicious collisions; if you are, Thomas' answer applies. As others have said, CRC doesn't guarantee absence of collisions. However, your problem can be solved simply by giving the files incrementing 64-bit numbers.
This is guaranteed to never collide (unless you want to keep a gazillion files in one directory, which is not a good idea anyway).

Even in software implementations, CRC32 can be useful as a means of detecting random corruption of data from hardware causes, such as a noisy communications line or unreliable flash media.
It is not tamper-resistant, nor is it generally suitable for testing whether two arbitrary files are likely to be the same: if each chunk of data in a file is immediately followed by a CRC32 of that chunk (some data formats do that), each chunk will have the same effect on the overall file's CRC as would a chunk of all-zero bytes, regardless of what data was stored in that chunk.
If one has the means to calculate a CRC32 quickly, it might be helpful in conjunction with other checksum or hash methods, if different files that had identical CRCs would be likely to differ in one of the other hashes and vice versa; but on many machines, other checksum or hash methods are likely to be easier to compute relative to the amount of protection they provide.
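The chunk-plus-CRC effect described above is easy to demonstrate with the standard `zlib.crc32`: a chunk immediately followed by its own little-endian CRC-32 always produces the same combined CRC, no matter what the chunk contained:

```python
import struct
import zlib

def chunk_with_crc(payload: bytes) -> bytes:
    """A data chunk immediately followed by its own little-endian CRC-32."""
    return payload + struct.pack('<I', zlib.crc32(payload))

# Whatever the payload, CRC-32 over chunk+CRC comes out identical, which is
# why such chunks affect a whole file's CRC like a constant filler block.
a = zlib.crc32(chunk_with_crc(b"hello, world"))
b = zlib.crc32(chunk_with_crc(b"a completely different payload!!"))
```

This fixed value is CRC-32's well-known residue constant; it falls out of the linearity of CRC arithmetic.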
That PHP script runs several hashing algorithms and measures the time each one spends calculating the hashes. It shows that MD5 is generally the fastest hashing algorithm around. So, anyway, if you want to do some quick error detection, or look for random changes, I would always advise going with MD5, as it simply does it all.

One man's common is another man's infrequent. Common varies depending on which field you are working in. If you are doing very quick transmissions or working out hash codes for small items, then CRCs are better, since they are a lot faster and the chances of getting the same 16- or 32-bit CRC for wrong data are slim.
If it is megabytes of data, for instance a Linux ISO, then you could lose a few megabytes and still end up with the same CRC. Not so likely with MD5. For that reason MD5 is normally used for huge transfers. It is slower but more reliable. So basically, if you are going to do one huge transmission and check at the end whether you have the correct result, use MD5. If you are going to transmit in small chunks, then use CRC.
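A rough Python analogue of that PHP benchmark; treat the printed times as illustrative only, since results vary with hardware and interpreter build:

```python
import hashlib
import timeit
import zlib

data = b"\xAB" * (1 << 20)  # 1 MiB of test data

md5_time = timeit.timeit(lambda: hashlib.md5(data).digest(), number=20)
crc_time = timeit.timeit(lambda: zlib.crc32(data), number=20)

print(f"MD5 over 20 MiB:    {md5_time:.4f}s")
print(f"CRC-32 over 20 MiB: {crc_time:.4f}s")
```

Swapping in `hashlib.sha256` or other algorithms makes the same harness reproduce the script's full comparison.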
Asked 8 years, 6 months ago. Active 1 year, 5 months ago. Viewed 64k times.

They serve different purposes completely.
For one, CRCs don't avalanche, making them terrible hash functions: home.