I have worked as a SQL Server DBA for more than 8 years, administering servers and tuning their performance. In my free time I wanted to do something useful for the universe and for my colleagues. This is how we ended up with a free index maintenance tool for SQL Server and Azure.
Idea
Every now and then, when dealing with their priorities, people can resemble an AA battery: one charge of motivation lasts only a flash, and then everything fades away. Until recently, I was no exception to this observation about life. I was regularly visited by ideas to create something of my own, but priorities shifted from one thing to another and nothing got finished.
Devart, which develops software for building and administering SQL Server, MySQL, and Oracle databases, had a great influence on my motivation and professional growth.
Before joining them, I knew little about the specifics of creating my own product, but along the way I gained a great deal of knowledge about the internals of SQL Server. For more than a year I worked on optimizing the queries in their product line, and gradually began to understand which features are more in demand on the market than others.
At a certain stage the idea of making a new niche product came to me, but due to circumstances it did not work out. At the time, I simply could not find enough resources for a new project inside the company without hurting the core business.
Working at a brand-new place while trying to build a project on my own forced me into constant compromises. The initial idea of making a big product with all the bells and whistles quickly stalled and gradually shifted in a different direction: breaking the planned functionality into separate mini-tools and implementing them independently of one another.
As a result, SQL Index Manager was born, a free index maintenance tool for SQL Server and Azure. The main idea was to take the commercial alternatives from Redgate and Devart as a baseline and try to improve on their functionality in my own project.
Implementation
On paper everything sounds simple... just watch a couple of motivational videos, switch on "Rocky Balboa" mode, and start making a cool product. But let's face the music: things are not nearly so rosy, because there are plenty of pitfalls when working with the system table-valued function sys.dm_db_index_physical_stats, which at the same time is the only place you can get up-to-date information about index fragmentation.
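For context (a minimal sketch rather than the tool's own code), a bare-bones call to this function looks roughly like the following; dbo.SomeTable is a placeholder name, and the 'limited' mode keeps the scan cheap by reading only the pages above the leaf level:

-- sketch: fragmentation of every index of one table in the current database
-- dbo.SomeTable is a placeholder, not a name from the article
select index_id
     , partition_number
     , avg_fragmentation_in_percent
     , page_count
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.SomeTable'), null, null, 'limited')
where index_level = 0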
From the very first days of development there was every opportunity to take the dull, standard route: copy the already-debugged logic of the competing applications and add a few touches of my own. But after analyzing their metadata queries, I wanted to do something more optimal, something that, thanks to the bureaucracy of large companies, would never appear in their products.
When analyzing Redgate SQL Index Manager (v1.1.9.1378, $155 per user), you can see that the application uses a very simple approach: a first query gets the list of user tables and views, and a second one returns the list of all indexes in the selected database.
select objects.name as tableorviewname
     , objects.object_id as tableorviewid
     , schemas.name as schemaname
     , cast(isnull(lobs.numlobs, 0) as bit) as containslobs
     , o.is_memory_optimized
from sys.objects as objects
join sys.schemas as schemas on schemas.schema_id = objects.schema_id
left join (
    select object_id
         , count(*) as numlobs
    from sys.columns with (nolock)
    where system_type_id in (34, 35, 99)
        or max_length = -1
    group by object_id
) as lobs on objects.object_id = lobs.object_id
left join sys.tables as o on o.object_id = objects.object_id
where objects.type = 'u'
    or objects.type = 'v'

select i.object_id as tableorviewid
     , i.name as indexname
     , i.index_id as indexid
     , i.allow_page_locks as allowpagelocks
     , p.partition_number as partitionnumber
     , cast((c.numpartitions - 1) as bit) as belongstopartitionedindex
from sys.indexes as i
join sys.partitions as p on p.index_id = i.index_id
    and p.object_id = i.object_id
join (
    select count(*) as numpartitions
         , object_id
         , index_id
    from sys.partitions
    group by object_id, index_id
) as c on c.index_id = i.index_id
    and c.object_id = i.object_id
where i.index_id > 0 -- ignore heaps
    and i.is_disabled = 0
    and i.is_hypothetical = 0
Next, in a while loop over every index partition, a request is sent to determine its size and fragmentation level. When the scan finishes, the client displays the indexes whose weight is below the entry threshold.
exec sp_executesql n'
select index_id, avg_fragmentation_in_percent, page_count
from sys.dm_db_index_physical_stats(@databaseid, @objectid, @indexid, @partitionnr, null)'
    , n'@databaseid int,@objectid int,@indexid int,@partitionnr int'
    , @databaseid = 7, @objectid = 2133582639, @indexid = 1, @partitionnr = 1

exec sp_executesql n'
select index_id, avg_fragmentation_in_percent, page_count
from sys.dm_db_index_physical_stats(@databaseid, @objectid, @indexid, @partitionnr, null)'
    , n'@databaseid int,@objectid int,@indexid int,@partitionnr int'
    , @databaseid = 7, @objectid = 2133582639, @indexid = 2, @partitionnr = 1

exec sp_executesql n'
select index_id, avg_fragmentation_in_percent, page_count
from sys.dm_db_index_physical_stats(@databaseid, @objectid, @indexid, @partitionnr, null)'
    , n'@databaseid int,@objectid int,@indexid int,@partitionnr int'
    , @databaseid = 7, @objectid = 2133582639, @indexid = 3, @partitionnr = 1
When analyzing the logic of this application, you can spot various shortcomings. For example, before a request is sent there is no check whether the current partition contains any rows, so empty partitions are never excluded from the scan.
But the problem shows up far more sharply in another respect: the number of requests sent to the server is roughly equal to the total number of rows in sys.partitions. Given that real databases can contain tens of thousands of partitions, this nuance can produce an enormous number of near-identical requests. When the database sits on a remote server, the scan takes even longer because of the network latency added to the execution of every request, even the simplest one.
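As a hedged illustration of the first shortcoming (this is not code from either product), a pre-filter along these lines would let a tool skip empty partitions before issuing any per-partition request:

-- sketch: list only partitions that actually contain rows,
-- so per-partition fragmentation requests can skip the empty ones
select [object_id]
     , [index_id]
     , [partition_number]
from sys.partitions with(nolock)
where [rows] > 0
    and [index_id] > 0 -- ignore heaps, as the tool does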
Unlike Redgate, the analogous product developed by Devart, dbForge Index Manager for SQL Server (v1.10.38, $99 per user), retrieves the information in one big query and then displays everything on the client:
select schema_name(o.[schema_id]) as [schema_name]
     , o.name as parent_name
     , o.[type] as parent_type
     , i.name
     , i.type_desc
     , s.avg_fragmentation_in_percent
     , s.page_count
     , p.partition_number
     , p.[rows]
     , isnull(lob.is_lob_legacy, 0) as is_lob_legacy
     , isnull(lob.is_lob, 0) as is_lob
     , case when ds.[type] = 'ps' then 1 else 0 end as is_partitioned
from sys.dm_db_index_physical_stats(db_id(), null, null, null, null) s
join sys.partitions p on s.[object_id] = p.[object_id]
    and s.index_id = p.index_id
    and s.partition_number = p.partition_number
join sys.indexes i on i.[object_id] = s.[object_id]
    and i.index_id = s.index_id
left join (
    select c.[object_id]
         , index_id = isnull(i.index_id, 1)
         , is_lob_legacy = max(case when c.system_type_id in (34, 35, 99) then 1 end)
         , is_lob = max(case when c.max_length = -1 then 1 end)
    from sys.columns c
    left join sys.index_columns i on c.[object_id] = i.[object_id]
        and c.column_id = i.column_id
        and i.index_id > 0
    where c.system_type_id in (34, 35, 99)
        or c.max_length = -1
    group by c.[object_id], i.index_id
) lob on lob.[object_id] = i.[object_id]
    and lob.index_id = i.index_id
join sys.objects o on o.[object_id] = i.[object_id]
join sys.data_spaces ds on i.data_space_id = ds.data_space_id
where i.[type] in (1, 2)
    and i.is_disabled = 0
    and i.is_hypothetical = 0
    and s.index_level = 0
    and s.alloc_unit_type_desc = 'in_row_data'
    and o.[type] in ('u', 'v')
This gets rid of the main problem of the competing product, the flood of near-identical requests, but the drawback of this implementation is that no additional parameters are passed to sys.dm_db_index_physical_stats that could restrict the scan to the indexes that actually matter. In practice, this means retrieving information about all indexes in the system and putting unnecessary load on the disks during the scan phase.
It is worth mentioning that the data obtained from sys.dm_db_index_physical_stats is not permanently cached in the buffer pool, so minimizing physical reads while retrieving index fragmentation information was one of the priority tasks during the development of my application.
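To make the difference concrete, here is a rough sketch of the two calling patterns discussed above; dbo.SomeTable and the index id are placeholders, not values from either product:

-- dbForge-style pattern: one sweep over every index in the current database
select [object_id], index_id, partition_number, avg_fragmentation_in_percent, page_count
from sys.dm_db_index_physical_stats(db_id(), null, null, null, null)

-- targeted pattern: restrict the scan to a single index of a single table
-- (dbo.SomeTable and index_id = 1 are placeholders)
select avg_fragmentation_in_percent, page_count
from sys.dm_db_index_physical_stats(db_id(), object_id('dbo.SomeTable'), 1, null, 'limited')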
After a number of experiments, I managed to combine both approaches by splitting the scan into two parts. First, one big request determines the size of the partitions in advance, filtering out those that fall outside the configured size range:
insert into #allocationunits (containerid, reservedpages, usedpages)
select [container_id]
     , sum([total_pages])
     , sum([used_pages])
from sys.allocation_units with(nolock)
group by [container_id]
having sum([total_pages]) between @minindexsize and @maxindexsize
Next, we take only the partitions that contain data, to avoid unnecessary reads from empty indexes:
select [object_id]
     , [index_id]
     , [partition_id]
     , [partition_number]
     , [rows]
     , [data_compression]
into #partitions
from sys.partitions with(nolock)
where [object_id] > 255
    and [rows] > 0
    and [object_id] not in (select * from #excludelist)
Depending on the settings, only the index types the user wants to analyze are retrieved (heaps, clustered/nonclustered indexes, and columnstore are supported):
insert into #indexes
select objectid = i.[object_id]
     , indexid = i.index_id
     , indexname = i.[name]
     , pagescount = a.reservedpages
     , unusedpagescount = a.reservedpages - a.usedpages
     , partitionnumber = p.[partition_number]
     , rowscount = isnull(p.[rows], 0)
     , indextype = i.[type]
     , isallowpagelocks = i.[allow_page_locks]
     , dataspaceid = i.[data_space_id]
     , datacompression = p.[data_compression]
     , isunique = i.[is_unique]
     , ispk = i.[is_primary_key]
     , fillfactorvalue = i.[fill_factor]
     , isfiltered = i.[has_filter]
from #allocationunits a
join #partitions p on a.containerid = p.[partition_id]
join sys.indexes i with(nolock) on i.[object_id] = p.[object_id]
    and p.[index_id] = i.[index_id]
where i.[type] in (0, 1, 2, 5, 6)
    and i.[object_id] > 255
After that, we add a bit of magic, and... for all the small indexes we determine the fragmentation level by repeatedly calling sys.dm_db_index_physical_stats with every parameter fully specified:
insert into #fragmentation (objectid, indexid, partitionnumber, fragmentation)
select i.objectid
     , i.indexid
     , i.partitionnumber
     , r.[avg_fragmentation_in_percent]
from #indexes i
cross apply sys.dm_db_index_physical_stats(@dbid, i.objectid, i.indexid, i.partitionnumber, 'limited') r
where i.pagescount <= @predescribesize
    and r.[index_level] = 0
    and r.[alloc_unit_type_desc] = 'in_row_data'
    and i.indextype in (0, 1, 2)
Next, we return all the information we can to the client, filtering out the redundant data:
select i.objectid
     , i.indexid
     , i.indexname
     , objectname = o.[name]
     , schemaname = s.[name]
     , i.pagescount
     , i.unusedpagescount
     , i.partitionnumber
     , i.rowscount
     , i.indextype
     , i.isallowpagelocks
     , u.totalwrites
     , u.totalreads
     , u.totalseeks
     , u.totalscans
     , u.totallookups
     , u.lastusage
     , i.datacompression
     , f.fragmentation
     , indexstats = stats_date(i.objectid, i.indexid)
     , isloblegacy = isnull(lob.isloblegacy, 0)
     , islob = isnull(lob.islob, 0)
     , issparse = cast(case when p.objectid is null then 0 else 1 end as bit)
     , ispartitioned = cast(case when dds.[data_space_id] is not null then 1 else 0 end as bit)
     , filegroupname = fg.[name]
     , i.isunique
     , i.ispk
     , i.fillfactorvalue
     , i.isfiltered
     , a.indexcolumns
     , a.includedcolumns
from #indexes i
join sys.objects o with(nolock) on o.[object_id] = i.objectid
join sys.schemas s with(nolock) on s.[schema_id] = o.[schema_id]
left join #aggcolumns a on a.objectid = i.objectid
    and a.indexid = i.indexid
left join #sparse p on p.objectid = i.objectid
left join #fragmentation f on f.objectid = i.objectid
    and f.indexid = i.indexid
    and f.partitionnumber = i.partitionnumber
left join (
    select objectid = [object_id]
         , indexid = [index_id]
         , totalwrites = nullif([user_updates], 0)
         , totalreads = nullif([user_seeks] + [user_scans] + [user_lookups], 0)
         , totalseeks = nullif([user_seeks], 0)
         , totalscans = nullif([user_scans], 0)
         , totallookups = nullif([user_lookups], 0)
         , lastusage = (
               select max(dt)
               from (
                   values ([last_user_seek])
                        , ([last_user_scan])
                        , ([last_user_lookup])
                        , ([last_user_update])
               ) t(dt)
           )
    from sys.dm_db_index_usage_stats with(nolock)
    where [database_id] = @dbid
) u on i.objectid = u.objectid
    and i.indexid = u.indexid
left join #lob lob on lob.objectid = i.objectid
    and lob.indexid = i.indexid
left join sys.destination_data_spaces dds with(nolock) on i.dataspaceid = dds.[partition_scheme_id]
    and i.partitionnumber = dds.[destination_id]
join sys.filegroups fg with(nolock) on isnull(dds.[data_space_id], i.dataspaceid) = fg.[data_space_id]
where o.[type] in ('v', 'u')
    and (
            f.fragmentation >= @fragmentation
        or i.pagescount > @predescribesize
        or i.indextype in (5, 6)
    )
After that, point requests determine the fragmentation level of the large indexes:
exec sp_executesql n'
declare @dbid int = db_id()
select [avg_fragmentation_in_percent]
from sys.dm_db_index_physical_stats(@dbid, @objectid, @indexid, @partitionnumber, ''limited'')
where [index_level] = 0
    and [alloc_unit_type_desc] = ''in_row_data'''
    , n'@objectid int,@indexid int,@partitionnumber int'
    , @objectid = 1044198770, @indexid = 1, @partitionnumber = 1

exec sp_executesql n'
declare @dbid int = db_id()
select [avg_fragmentation_in_percent]
from sys.dm_db_index_physical_stats(@dbid, @objectid, @indexid, @partitionnumber, ''limited'')
where [index_level] = 0
    and [alloc_unit_type_desc] = ''in_row_data'''
    , n'@objectid int,@indexid int,@partitionnumber int'
    , @objectid = 1552724584, @indexid = 0, @partitionnumber = 1
Thanks to this approach to generating the requests, I managed to solve the scan performance problems found in the competitors' applications. This could have been the end of it, but in the course of development all kinds of new ideas gradually appeared, which made it possible to broaden the scope of my product.
Initially, support for wait_at_low_priority was implemented, and later it became possible to rebuild indexes with data_compression and fill_factor.
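As a hedged illustration of how these options combine (the table and index names are placeholders, not output generated by the tool), a rebuild command might look like this:

-- sketch: online rebuild that yields to blockers after 5 minutes,
-- with page compression and a custom fill factor
alter index IX_SomeIndex on dbo.SomeTable
rebuild with (
    online = on (wait_at_low_priority (max_duration = 5 minutes, abort_after_wait = self)),
    data_compression = page,
    fillfactor = 80
)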
The application gradually got "sprinkled" with features that had not been planned originally, such as maintaining columnstore indexes:
select *
from (
    select indexid = [index_id]
         , partitionnumber = [partition_number]
         , pagescount = sum([size_in_bytes]) / 8192
         , unusedpagescount = isnull(sum(case when [state] = 1 then [size_in_bytes] end), 0) / 8192
         , fragmentation = cast(isnull(sum(case when [state] = 1 then [size_in_bytes] end), 0) * 100. / sum([size_in_bytes]) as float)
    from sys.fn_column_store_row_groups(@objectid)
    group by [index_id]
           , [partition_number]
) t
where fragmentation >= @fragmentation
    and pagescount between @minindexsize and @maxindexsize
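For the row groups that exceed the threshold, the actual maintenance typically comes down to a command of roughly this shape (a sketch with placeholder names, not the tool's generated script):

-- sketch: compress open and closed delta rowgroups of a columnstore index
alter index IX_SomeColumnstore on dbo.SomeTable
reorganize with (compress_all_row_groups = on)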
Another addition is the ability to create nonclustered indexes based on information from the dm_db_missing_index DMVs:
select objectid = d.[object_id]
     , userimpact = gs.[avg_user_impact]
     , totalreads = gs.[user_seeks] + gs.[user_scans]
     , totalseeks = gs.[user_seeks]
     , totalscans = gs.[user_scans]
     , lastusage = isnull(gs.[last_user_scan], gs.[last_user_seek])
     , indexcolumns =
           case
               when d.[equality_columns] is not null and d.[inequality_columns] is not null
                   then d.[equality_columns] + ', ' + d.[inequality_columns]
               when d.[equality_columns] is not null and d.[inequality_columns] is null
                   then d.[equality_columns]
               else d.[inequality_columns]
           end
     , includedcolumns = d.[included_columns]
from sys.dm_db_missing_index_groups g with(nolock)
join sys.dm_db_missing_index_group_stats gs with(nolock) on gs.[group_handle] = g.[index_group_handle]
join sys.dm_db_missing_index_details d with(nolock) on g.[index_handle] = d.[index_handle]
where d.[database_id] = db_id()
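From that output, a statement built from the suggested key and included columns could look roughly like this (the column and table names below are illustrative only):

-- sketch: nonclustered index assembled from the suggested equality/inequality and included columns
-- (dbo.SomeTable, ColumnA/B/C are placeholders)
create nonclustered index IX_SomeTable_Suggested
on dbo.SomeTable ([ColumnA], [ColumnB])
include ([ColumnC])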
Results and Plans
Crucially, the development plans do not end here, because I am eager to take this application further. The next step is to add the ability to find duplicate (already done) and unused indexes, as well as to implement full support for maintaining statistics in SQL Server.
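As a rough idea of what the unused-index search could be built on (a sketch only, not the tool's actual implementation), sys.dm_db_index_usage_stats can surface indexes that are written to but never read since the last restart:

-- sketch: nonclustered indexes with no seeks/scans/lookups recorded since the instance started
-- (usage stats are reset when the instance restarts, so interpret with care)
select o.[name] as tablename
     , i.[name] as indexname
     , isnull(u.[user_updates], 0) as userupdates
from sys.indexes i with(nolock)
join sys.objects o with(nolock) on o.[object_id] = i.[object_id]
left join sys.dm_db_index_usage_stats u on u.[object_id] = i.[object_id]
    and u.[index_id] = i.[index_id]
    and u.[database_id] = db_id()
where o.[type] = 'u'
    and i.[index_id] > 1 -- skip heaps and the clustered index
    and i.[is_primary_key] = 0
    and i.[is_unique_constraint] = 0
    and isnull(u.[user_seeks] + u.[user_scans] + u.[user_lookups], 0) = 0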
There are plenty of paid solutions on the market right now. I would like to believe that thanks to its free positioning, more optimized queries, and a variety of useful gizmos, this product will definitely prove useful in everyday tasks.