I ran into a problem where updating a key column caused duplicate key errors. The table in question had several columns that formed the Primary Key. One of those columns was userd for a serial number making groups of keys unique.
Example
|
Group
|
GroupSerial
|
|
Apple
|
1
|
|
Apple
|
1
|
|
Pear
|
1
|
|
Pear
|
1
|
|
|
|
Here we see that each row is not unique.
So how can we fix this?
First we lay up the data like this, having a unique row number:
|
Row#
|
Group
|
GrpSerial
|
GrpCnt
|
GrpMax
|
|
1
|
Apple
|
1
|
|
|
|
2
|
Apple
|
1
|
|
|
|
3
|
Pear
|
1
|
|
|
|
4
|
Pear
|
1
|
|
|
|
5
|
Pear
|
1
|
|
|
Then we count the items in each group and set that value to each member of the group.
|
Row#
|
Group
|
GrpSerial
|
GrpCnt
|
GrpMax
|
|
1
|
Apple
|
1
|
2
|
|
|
2
|
Apple
|
1
|
2
|
|
|
3
|
Pear
|
1
|
3
|
|
|
4
|
Pear
|
1
|
3
|
|
|
5
|
Pear
|
1
|
3
|
|
Then we the the max row# for each group
|
Row#
|
Group
|
GrpSerial
|
GrpCnt
|
GrpMax
|
|
1
|
Apple
|
1
|
2
|
2
|
|
2
|
Apple
|
1
|
2
|
2
|
|
3
|
Pear
|
1
|
3
|
5
|
|
4
|
Pear
|
1
|
3
|
5
|
|
5
|
Pear
|
1
|
3
|
5
|
Now we can calculate the group serial for each row.
grp_serial = row_serial - grp_max + grp_cnt
|
Row#
|
Group
|
GrpSerial
|
GrpCnt
|
GrpMax
|
|
1
|
Apple
|
1
|
2
|
2
|
|
2
|
Apple
|
2
|
2
|
2
|
|
3
|
Pear
|
1
|
3
|
5
|
|
4
|
Pear
|
2
|
3
|
5
|
|
5
|
Pear
|
3
|
3
|
5
|
Simple as (apple) pie.
And here is an example in SQL:
1 /*
2 How to update a group_serial fixing uniqueness
3 */
4
5
6 -- Create a table with a series of groups
7 drop TABLE matstest
8 go
9
10 CREATE TABLE matstest
11 (
12 id numeric identity not null
13 ,row_serial int null
14 -- Value columns
15 ,grp int Null
16 ,grp_serial int null
17 -- helper columns
18 ,grp_cnt int null
19 ,grp_max int null
20 )
21 go
22
23 -- having a unique index on grp,grp_serial would cause an error
24 truncate table matstest
25 insert into matstest (grp,grp_serial) values (1,1)
26 insert into matstest (grp,grp_serial) values (22,1)
27 insert into matstest (grp,grp_serial) values (22,1)
28 insert into matstest (grp,grp_serial) values (33,1)
29 insert into matstest (grp,grp_serial) values (33,1)
30 insert into matstest (grp,grp_serial) values (33,1)
31
32 -- Fake a high row_serial just to prove that low numbers
33 update matstest set row_serial = id +4311
34
35 -- Show the starting point
36 select * from matstest
37
38 --
39 -- It is important to include all the keys that make up the group! In this example there is just one column (grp)
40 -- Find count per group (grp_count)
41 --
42 select grp, count(*) as grp_cnt
43 into #tmp_grp_cnt
44 from matstest
45 group by grp
46 having count(*) > 1 -- Skip this to have count also on the groups with one row
47
48 update matstest
49 set grp_cnt = ga.grp_cnt
50 from matstest m
51 join #tmp_grp_cnt ga on ga.grp = m.grp
52
53 --
54 -- Find max row row_serial number per group (grp_max)
55 --
56 select grp, max(row_serial) as grp_max
57 into #tmp_grp_max
58 from matstest
59 group by grp
60 having count(*) > 1
61
62 update matstest
63 set grp_max = ga.grp_max
64 from matstest m
65 join #tmp_grp_max ga on ga.grp = m.grp
66
67 --
68 -- Now we can calculate the serial number per group
69 -- Serial number per group = row_serial - (grp_max) + (grp_cnt)
70 --
71 update matstest set grp_serial = row_serial - (grp_max) + (grp_cnt) where grp_cnt is not null
72
73
74 -- show the final result
75 select * from matstest
76
77 /*
78
79 -- Starting point
80 id row grp grps cnt max
81 1 1 4312 1 1 [NULL] [NULL]
82 2 2 4313 22 1 [NULL] [NULL]
83 3 3 4314 22 1 [NULL] [NULL]
84 4 4 4315 33 1 [NULL] [NULL]
85 5 5 4316 33 1 [NULL] [NULL]
86 6 6 4317 33 1 [NULL] [NULL]
87
88
89 -- Final result
90 id row grp grps cnt max
91 1 1 4312 1 1 [NULL] [NULL]
92 2 2 4313 22 1 2 4314
93 3 3 4314 22 2 2 4314
94 4 4 4315 33 1 3 4317
95 5 5 4316 33 2 3 4317
96 6 6 4317 33 3 3 4317
97 */
May 7th, 2009 | Tags: SQL, sqlcode | Category: Programming, SQL | Leave a comment