Making groups unique - fixing a duplicate key problem in SQL
I ran into a problem where updating a key column caused duplicate key errors. The table in question had several columns that formed the Primary Key. One of those columns was userd for a serial number making groups of keys unique.
Example
|
Group |
GroupSerial |
|
Apple |
1 |
|
Apple |
1 |
|
Pear |
1 |
|
Pear |
1 |
|
|
|
Here we see that each row is not unique.
So how can we fix this?
First we lay up the data like this, having a unique row number:
|
Row# |
Group |
GrpSerial |
GrpCnt |
GrpMax |
|
1 |
Apple |
1 |
|
|
|
2 |
Apple |
1 |
|
|
|
3 |
Pear |
1 |
|
|
|
4 |
Pear |
1 |
|
|
|
5 |
Pear |
1 |
|
|
Then we count the items in each group and set that value to each member of the group.
|
Row# |
Group |
GrpSerial |
GrpCnt |
GrpMax |
|
1 |
Apple |
1 |
2 |
|
|
2 |
Apple |
1 |
2 |
|
|
3 |
Pear |
1 |
3 |
|
|
4 |
Pear |
1 |
3 |
|
|
5 |
Pear |
1 |
3 |
|
Then we the the max row# for each group
|
Row# |
Group |
GrpSerial |
GrpCnt |
GrpMax |
|
1 |
Apple |
1 |
2 |
2 |
|
2 |
Apple |
1 |
2 |
2 |
|
3 |
Pear |
1 |
3 |
5 |
|
4 |
Pear |
1 |
3 |
5 |
|
5 |
Pear |
1 |
3 |
5 |
Now we can calculate the group serial for each row.
grp_serial = row_serial - grp_max + grp_cnt
|
Row# |
Group |
GrpSerial |
GrpCnt |
GrpMax |
|
1 |
Apple |
1 |
2 |
2 |
|
2 |
Apple |
2 |
2 |
2 |
|
3 |
Pear |
1 |
3 |
5 |
|
4 |
Pear |
2 |
3 |
5 |
|
5 |
Pear |
3 |
3 |
5 |
Simple as (apple) pie.
And here is an example in SQL:
1 /* 2 How to update a group_serial fixing uniqueness 3 */ 4 5 6 -- Create a table with a series of groups 7 drop TABLE matstest 8 go 9 10 CREATE TABLE matstest 11 ( 12 id numeric identity not null 13 ,row_serial int null 14 -- Value columns 15 ,grp int Null 16 ,grp_serial int null 17 -- helper columns 18 ,grp_cnt int null 19 ,grp_max int null 20 ) 21 go 22 23 -- having a unique index on grp,grp_serial would cause an error 24 truncate table matstest 25 insert into matstest (grp,grp_serial) values (1,1) 26 insert into matstest (grp,grp_serial) values (22,1) 27 insert into matstest (grp,grp_serial) values (22,1) 28 insert into matstest (grp,grp_serial) values (33,1) 29 insert into matstest (grp,grp_serial) values (33,1) 30 insert into matstest (grp,grp_serial) values (33,1) 31 32 -- Fake a high row_serial just to prove that low numbers 33 update matstest set row_serial = id +4311 34 35 -- Show the starting point 36 select * from matstest 37 38 -- 39 -- It is important to include all the keys that make up the group! In this example there is just one column (grp) 40 -- Find count per group (grp_count) 41 -- 42 select grp, count(*) as grp_cnt 43 into #tmp_grp_cnt 44 from matstest 45 group by grp 46 having count(*) > 1 -- Skip this to have count also on the groups with one row 47 48 update matstest 49 set grp_cnt = ga.grp_cnt 50 from matstest m 51 join #tmp_grp_cnt ga on ga.grp = m.grp 52 53 -- 54 -- Find max row row_serial number per group (grp_max) 55 -- 56 select grp, max(row_serial) as grp_max 57 into #tmp_grp_max 58 from matstest 59 group by grp 60 having count(*) > 1 61 62 update matstest 63 set grp_max = ga.grp_max 64 from matstest m 65 join #tmp_grp_max ga on ga.grp = m.grp 66 67 -- 68 -- Now we can calculate the serial number per group 69 -- Serial number per group = row_serial - (grp_max) + (grp_cnt) 70 -- 71 update matstest set grp_serial = row_serial - (grp_max) + (grp_cnt) where grp_cnt is not null 72 73 74 -- show the final result 75 select * from matstest 76 77 /* 78 79 -- Starting point 80 id row grp grps cnt max 81 1 1 4312 1 1 [NULL] [NULL] 82 2 2 4313 22 1 [NULL] [NULL] 83 3 3 4314 22 1 [NULL] [NULL] 84 4 4 4315 33 1 [NULL] [NULL] 85 5 5 4316 33 1 [NULL] [NULL] 86 6 6 4317 33 1 [NULL] [NULL] 87 88 89 -- Final result 90 id row grp grps cnt max 91 1 1 4312 1 1 [NULL] [NULL] 92 2 2 4313 22 1 2 4314 93 3 3 4314 22 2 2 4314 94 4 4 4315 33 1 3 4317 95 5 5 4316 33 2 3 4317 96 6 6 4317 33 3 3 4317 97 */