您好,欢迎来到三六零分类信息网!老站,搜索引擎当天收录,欢迎发信息

SQL 中删除重复记录

2024/4/13 1:45:46发布18次查看
这是一篇在mssql server 2008中的一种删除重复记录的sql语句,有需要的朋友可以参考一下哦。
在database中可能由于某种原因如用户输入,导入数据失败等 导致了重复记录. 如果你没有用主键,约束,或来其它机制实现数据完整性,那最后总是重复记录在你的中.现在让我们来看在sql server 2008中如何删除这些记录, 首先,可以模拟造一些简单重复记录:
 代码如下 复制代码
create table dbo.employee
([id] int primary key ,
[name] varchar(50),
[age] int,
[sex] bit default 1)
insert into employee ([id] , [name] , [age] , [sex] ) values(1,'james',25,default)
insert into employee ([id] , [name] , [age] , [sex] ) values(2,'james',25,default)
insert into employee ([id] , [name] , [age] , [sex] ) values(3,'james',25,default)
insert into employee ([id] , [name] , [age] , [sex] ) values(4,'lisa',24,0)
insert into employee ([id] , [name] , [age] , [sex] ) values(5,'lisa',24,0)
insert into employee ([id] , [name] , [age] , [sex] ) values(6,'lisa',24,0)
insert into employee ([id] , [name] , [age] , [sex] ) values(7,'mirsa',23,0)
insert into employee ([id] , [name] , [age] , [sex] ) values(8,'mirsa',23,0)
insert into employee ([id] , [name] , [age] , [sex] ) values(9,'mirsa',23,0)
insert into employee ([id] , [name] , [age] , [sex] ) values(10,'john',26,default)
insert into employee ([id] , [name] , [age] , [sex] ) values(11,'abraham',28,default)
insert into employee ([id] , [name] , [age] , [sex] ) values(12,'lincoln',30,default)
select * from dbo.employee
首先我们使用最常见的方法:
 代码如下 复制代码
delete from employee where name in (
name
from employee group by name having count(name)>1);
接着使用rownumber():
 代码如下 复制代码
delete t from(
   select row_number() over(partition by [name] order by (select 0)) as rownumber,* from employee) t
where t.rownumber > 1;
with du as
(
  select row_number() over(partition by [name] order by (select 0)) as rn
  from employee
)
delete from dups
where rn>1;
with dups as
(
  select [id],[name],[age],[sex]
    , row_number() over(partition by [name] order by (select 0)) as rn
    ,rank() over(partition by [name] order by (select 0)) as rnk
  from employee
)
delete from dups
where rnrnk;
下面是这四个t-sql查询的执行计划:
你可以看到没有用cte的方法开销最大, 主要是在table spool, 这里开销了44%, table spool 是一个物理运算符。
table spool 运算符扫描输入,并将各行的一个副本放入隐藏的假脱机表中,此表存储在 tempdb 数据库中并且仅在查询的生存期内存在。如果重绕该运算符(例如通过 nested loops 运算符重绕),但不需要任何重新绑定,则将使用假脱机数据,而不用重新扫描输入。
注意上面的方法只是在重复记录比较少的情况下, 如果重复记录多. delete将会非常慢, 最好的方法是复制目标数据到另一个新表,删除原来的表,重命名新表为原来的表. 或用临时表, 这样还可以减少数据库事务日志. 看下面的t-sql:
 代码如下 复制代码
with dups as
(
  select [id],[name],[age],[sex]
    , row_number() over(partition by [id] order by (select 0)) as rn
  from employee
)
select [id],[name],[age],[sex]
into dbo.employeedupstmp
from dups
where rn=1
drop table dbo.employee;
exec sp_rename 'dbo.employeedupstmp','employee'
该用户其它信息

VIP推荐

免费发布信息,免费发布B2B信息网站平台 - 三六零分类信息网 沪ICP备09012988号-2
企业名录 Product