Java 的 UUID.randomUUID 有多好?

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/2513573/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-13 08:41:40  来源:igfitidea点击:

How good is Java's UUID.randomUUID?

javauuid

提问by Alvin

I know that randomized UUIDshave a very, very, very low probability for collision in theory, but I am wondering, in practice, how good Java's randomUUID()is in terms of not having collision? Does anybody have any experience to share?

我知道随机UUID理论上发生碰撞的可能性非常非常非常低,但我想知道在实践中,JavarandomUUID()在没有碰撞方面有多好?有人有经验可以分享吗?

回答by sfussenegger

I'm not an expert, but I'd assume that enough smart people looked at Java's random number generator over the years. Hence, I'd also assume that random UUIDs are good. So you should really have the theoretical collision probability (which is about 1 : 3 × 10^38for all possible UUIDs. Does anybody know how this changes for random UUIDs only? Is it 1/(16*4)of the above?)

我不是专家,但我认为这些年来有足够聪明的人研究过 Java 的随机数生成器。因此,我还假设随机 UUID 是好的。所以你真的应该有理论碰撞概率(对于所有可能的 UUID ,它大约是 1 : 3 × 10^38。有谁知道这仅对随机 UUID 是如何变化的?是1/(16*4)上面的吗?)

From my practical experience, I've never seen any collisions so far. I'll probably have grown an astonishingly long beard the day I get my first one ;)

根据我的实际经验,到目前为止我从未见过任何碰撞。在我得到第一个胡须的那天,我可能会长出惊人的长胡须;)

回答by Stephen C

Does anybody have any experience to share?

有人有经验可以分享吗?

There are 2^122possible values for a type-4 UUID. (The spec says that you lose 2 bits for the type, and a further 4 bits for a version number.)

2^122类型 4 UUID有可能的值。(规范说类型丢失了 2 位,版本号又丢失了 4 位。)

Assuming that you were to generate 1 million random UUIDs a second, the chances of a duplicate occurring in your lifetime would be vanishingly small. And to detect the duplicate, you'd have to solve the problem of comparing 1 million new UUIDs per second against all of the UUIDs you have previously generated1!

假设您要每秒生成 100 万个随机 UUID,那么在您的一生中发生重复的可能性将非常小。为了检测重复,您必须解决每秒将 100 万个新 UUID 与您之前生成的所有UUID 进行比较的问题1

The chances that anyone has experienced (i.e. actually noticed) a duplicate in real life are even smaller than vanishingly small ... because of the practical difficulty of looking for collisions.

任何人在现实生活中经历(即实际注意到)重复的机会甚至比消失的小……因为寻找碰撞的实际困难。

Now of course, you will typically be using a pseudo-random number generator, not a source of truly random numbers. But I think we can be confident that if you are using a creditable provider for your cryptographic strength random numbers, then it willbe cryptographic strength, and the probability of repeats will be the same as for an ideal (non-biased) random number generator.

当然,现在您通常会使用伪随机数生成器,而不是真正随机数的来源。但我认为,我们可以相信,如果你正在使用你的加密强度随机数的可信供应商,那么它就会被加密强度,并重复的概率是相同的理想(不带偏见的)随机数发生器.

However, if you were to use a JVM with a "broken" crypto- random number generator, all bets are off. (And that might include some of the workarounds for "shortage of entropy" problems on some systems. Or the possibility that someone has tinkered with your JRE, either on your system or upstream.)

但是,如果您要使用带有“损坏的”加密随机数生成器的 JVM,那么所有赌注都将落空。(这可能包括针对某些系统上“熵不足”问题的一些解决方法。或者有人在您的系统或上游修改了您的 JRE 的可能性。)



1 - Assuming that you used "some kind of binary btree" as proposed by an anonymous commenter, each UUID is going to need O(NlogN)bits of RAM memory to represent Ndistinct UUIDs assuming low density and random distribution of the bits. Now multiply that by 1,000,000 and the number of seconds that you are going to run the experiment for. I don't think that is practical for the length of time needed to test for collisions of a high quality RNG. Not even with (hypothetical) clever representations.

1 - 假设您使用匿名评论者提出的“某种二进制 btree”,每个 UUID 将需要O(NlogN)RAM 内存位来表示N不同的 UUID,假设这些位的密度和随机分布。现在乘以 1,000,000 和您要运行实验的秒数。我认为这对于测试高质量 RNG 的碰撞所需的时间是不切实际的。甚至没有(假设的)巧妙的表示。

回答by Michael Borgwardt

UUID uses java.security.SecureRandom, which is supposed to be "cryptographically strong". While the actual implementation is not specified and can vary between JVMs (meaning that any concrete statements made are valid only for one specific JVM), it does mandate that the output must pass a statistical random number generator test.

UUID 使用java.security.SecureRandom,它应该是“加密强的”。虽然未指定实际实现并且可能因 JVM 而异(意味着所做的任何具体语句仅对一个特定的 JVM 有效),但它确实要求输出必须通过统计随机数生成器测试。

It's always possible for an implementation to contain subtle bugs that ruin all this (see OpenSSH key generation bug) but I don't think there's any concrete reason to worry about Java UUIDs's randomness.

实现总是可能包含破坏所有这些的微妙错误(请参阅 OpenSSH 密钥生成错误),但我认为没有任何具体理由担心 Java UUID 的随机性。

回答by sheki

Wikipedia has a very good answer http://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions

维基百科有一个很好的答案 http://en.wikipedia.org/wiki/Universally_unique_identifier#Collisions

the number of random version 4 UUIDs which need to be generated in order to have a 50% probability of at least one collision is 2.71 quintillion, computed as follows:

...

This number is equivalent to generating 1 billion UUIDs per second for about 85 years, and a file containing this many UUIDs, at 16 bytes per UUID, would be about 45 exabytes, many times larger than the largest databases currently in existence, which are on the order of hundreds of petabytes.

...

Thus, for there to be a one in a billion chance of duplication, 103 trillion version 4 UUIDs must be generated.

为了使至少发生一次碰撞的概率为 50%,需要生成的随机版本 4 UUID 的数量为 2.71 quintillion,计算如下:

...

这个数字相当于在大约 85 年内每秒生成 10 亿个 UUID,一个包含这么多 UUID 的文件,每个 UUID 16 个字节,大约为 45 艾字节,比目前存在的最大数据库大很多倍,这些数据库位于数百 PB 的数量级。

...

因此,要产生十亿分之一的重复机会,必须生成 103 万亿个第 4 版 UUID。

回答by Afsar

We have been using the Java's random UUID in our application for more than one year and that to very extensively. But we never come across of having collision.

我们已经在我们的应用程序中使用 Java 的随机 UUID 一年多了,而且非常广泛。但是我们从来没有遇到过碰撞。

回答by Giher

I play at lottery last year, and I've never won .... but it seems that there lottery has winners ...

我去年玩彩票,我从来没有赢过......但似乎彩票有中奖者......

doc : http://tools.ietf.org/html/rfc4122

文档:http: //tools.ietf.org/html/rfc4122

Type 1 : not implemented. collision are possible if the uuid is generated at the same moment. impl can be artificially a-synchronize in order to bypass this problem.

类型 1:未实施。如果 uuid 在同一时刻生成,则可能发生碰撞。impl 可以人为地 a-synchronize 以绕过这个问题。

Type 2 : never see a implementation.

类型 2:永远看不到实现。

Type 3 : md5 hash : collision possible (128 bits-2 technical bytes)

类型 3:md5 哈希:可能发生冲突(128 位 - 2 技术字节)

Type 4 : random : collision possible (as lottery). note that the jdk6 impl dont use a "true" secure random because the PRNG algorithm is not choose by developer and you can force system to use a "poor" PRNG algo. So your UUID is predictable.

类型 4:随机:可能发生碰撞(作为彩票)。请注意,jdk6 impl 不使用“真正的”安全随机数,因为 PRNG 算法不是由开发人员选择的,您可以强制系统使用“差”的 PRNG 算法。所以你的 UUID 是可预测的。

Type 5 : sha1 hash : not implemented : collision possible (160 bit-2 technical bytes)

类型 5:sha1 哈希:未实现:可能发生冲突(160 位 2 技术字节)

回答by Alex2Ustas

The original generation scheme for UUIDs was to concatenate the UUID version with the MAC address of the computer that is generating the UUID, and with the number of 100-nanosecond intervals since the adoption of the Gregorian calendar in the West. By representing a single point in space (the computer) and time (the number of intervals), the chance of a collision in values is effectively nil.

UUID 的原始生成方案是将 UUID 版本与生成 UUID 的计算机的 MAC 地址以及自西方采用公历以来的 100 纳秒间隔数连接起来。通过表示空间(计算机)和时间(间隔数)中的单个点,值发生冲突的可能性实际上为零。

回答by André Pinheiro

Since most answers focused on the theory I think I can add something to the discussion by giving a practical test I did. In my database I have around 4.5 million UUIDs generated using Java 8 UUID.randomUUID(). The following ones are just some I found out:

由于大多数答案都集中在理论上,我认为我可以通过进行实际测试来为讨论添加一些内容。在我的数据库中,我使用 Java 8 UUID.randomUUID() 生成了大约 450 万个 UUID。以下只是我发现的一些:

c0f55f62-b990-47bc-8caa-f42313669948

c0f55f62-b990-47bc-8caa-f42313669948

c0f55f62-e81e-4253-8299-00b4322829d5

c0f55f62-e81e-4253-8299-00b4322829d5

c0f55f62-4979-4e87-8cd9-1c556894e2bb

c0f55f62-4979-4e87-8cd9-1c556894e2bb


b9ea2498-fb32-40ef-91ef-0ba00060fe64


b9ea2498-fb32-40ef-91ef-0ba 00060fe64

be87a209-2114-45b3-9d5a-86d00060fe64

be87a209-2114-45b3-9d5a-86d 00060fe64


4a8a74a6-e972-4069-b480-bdea1177b21f


4a8a74a6-e972-4069-b480-b dea1177b21f

12fb4958-bee2-4c89-8cf8-edea1177b21f

12fb4958-bee2-4c89-8cf8-e dea1177b21f

If it was truly random, the probability of having these kind of similar UUIDs would be considerably low (see edit), since we're considering only 4.5 million entries. So, although this function is good, in terms of not having collisions, for me it doesn't seem thatgood as it would be in theory.

如果它真的是随机的,那么拥有这些类似 UUID 的可能性会相当低(请参阅编辑),因为我们只考虑了 450 万个条目。所以,虽然这个功能很好,但就没有碰撞而言,对我来说它似乎没有理论上的那么好。

Edit:

编辑

A lot of people seem to not understand this answer so I'll clarify my point: I know that the similarities are "small" and far from a full collision. However, I just wanted to compare the Java's UUID.randomUUID() with a true random number generator, which is the actual question.

很多人似乎不理解这个答案,所以我将澄清我的观点:我知道相似之处“很小”,远非完全冲突。但是,我只是想将 Java 的 UUID.randomUUID() 与真正的随机数生成器进行比较,这才是真正的问题。

In a true random number generator, the probability of the last case happening would be around = 0.007%. Therefore, I think my conclusion stands.

在真正的随机数生成器中,最后一种情况发生的概率约为= 0.007%。因此,我认为我的结论成立。

Formula is explained in this wiki article en.wikipedia.org/wiki/Birthday_problem

这篇 wiki 文章 en.wikipedia.org/wiki/Birthday_problem 中解释了公式

回答by rgoers

At a former employer we had a unique column that contained a random uuid. We got a collision the first week after it was deployed. Sure, the odds are low but they aren't zero. That is why Log4j 2 contains UuidUtil.getTimeBasedUuid. It will generate a UUID that is unique for 8,925 years so long as you don't generate more than 10,000 UUIDs/millisecond on a single server.

在一个前雇主,我们有一个包含随机 uuid 的独特列。我们在部署后的第一周发生了碰撞。当然,几率很低,但不是零。这就是 Log4j 2 包含 UuidUtil.getTimeBasedUuid 的原因。只要您在单个服务器上每秒生成的 UUID 不超过 10,000 个,它就会生成一个在 8,925 年内唯一的 UUID。

回答by erickson

Many of the answers discuss how many UUIDs would have to be generated to reach a 50% chance of a collision. But a 50%, 25%, or even 1% chance of collision is worthless for an application where collision must be (virtually) impossible.

许多答案讨论了必须生成多少 UUID 才能达到 50% 的碰撞几率。但是,对于必须(几乎)不可能发生碰撞的应用程序,50%、25% 甚至 1% 的碰撞几率是毫无价值的。

Do programmers routinely dismiss as "impossible" other events that can and do occur?

程序员是否经常将其他可能并且确实发生的事件视为“不可能”发生?

When we write data to a disk or memory and read it back again, we take for granted that the data are correct. We rely on the device's error correction to detect any corruption. But the chance of undetected errors is actually around 2-50.

当我们将数据写入磁盘或内存并再次读取时,我们理所当然地认为数据是正确的。我们依靠设备的纠错来检测任何损坏。但未检测到错误的几率实际上约为 2 -50

Wouldn't it make sense to apply a similar standard to random UUIDs? If you do, you will find that an "impossible" collision is possible in a collection of around 100 billion random UUIDs (236.5).

将类似的标准应用于随机 UUID 是否有意义?如果这样做,您会发现在大约 1000 亿个随机 UUID (2 36.5)的集合中可能发生“不可能”的碰撞。

This is an astronomical number, but applications like itemized billing in a national healthcare system, or logging high frequency sensor data on a large array of devices could definitely bump into these limits. If you are writing the next Hitchhiker's Guide to the Galaxy,don't try to assign UUIDs to each article!

这是一个天文数字,但诸如国家医疗保健系统中的逐项计费或在大量设备上记录高频传感器数据等应用程序肯定会遇到这些限制。如果您正在编写下一个银河系漫游指南,请不要尝试为每篇文章分配 UUID!