For some reason my MDF file is 154gigs, however, I only loaded 7 gigs worth of data from flat files. Why is the MDF file so much larger than the actual source data?
More info:
Only a few tables with ~25 million rows. No large varchar fields (biggest is 300, most are less than varchar(50). Not very wide tables < 20 columns. Also, none of the large tables are indexed yet. Tables with indexes have less than 1 million rows. I don't use char, only varchar for strings. Datatype is not the issue.
Turned out it was the log file, not the mdf file. The MDF file is actually 24gigs which seems more reasonable, however still big IMHO.
UPDATE:
I fixed the problem with the LDF (log) file by changing the recovery model from FULL to simple. This is okay because this server is only used for internal development and ETL processing. In addition, before changing to SIMPLE I had to shrink the LOG file. Shrinking is not recommended in most cases, however, this was one of those cases where the log file should have never grown so big and so fast. For further reading see this
Shrink usually to be avoided UNLESS the file became that size for some unexpected / unusual activity. Otherwise the file will just grow back to the same size, and repeated Grow / Shrink will fragment the file and upset the performance.
Also, if the reason that the MDF file is large is because of index rebuild (to optimise them) the act of SHRINK will fragment the indexes, and destroy the optimisation
If the file is big because of one-off activity then OK to Shrink, but only Shrink to the original intended size, not to minimum possible size. Or shrink to minimum size and check what it grows to after, say, one week and note that the growth is "normal requirements" and do not shrink smaller in future to avoid fragmentation issues.
Just in case this is useful for someone out there, found this query in dba.stackexchange, it uses the sys.dm_db_database_page_allocations which counts the number of pages per object, this includes internal storage and gives you a real overview of the spaced used by your database.
SELECT sch.[name], obj.[name], ISNULL(obj.[type_desc], N'TOTAL:') AS [type_desc],
COUNT(*) AS [ReservedPages],
(COUNT(*) * 8) AS [ReservedKB],
(COUNT(*) * 8) / 1024.0 AS [ReservedMB],
(COUNT(*) * 8) / 1024.0 / 1024.0 AS [ReservedGB]
FROM sys.dm_db_database_page_allocations(DB_ID(), NULL, NULL, NULL, DEFAULT) pa
INNER JOIN sys.all_objects obj
ON obj.[object_id] = pa.[object_id]
INNER JOIN sys.schemas sch
ON sch.[schema_id] = obj.[schema_id]
GROUP BY GROUPING SETS ((sch.[name], obj.[name], obj.[type_desc]), ())
ORDER BY [ReservedPages] DESC;
Thanks to Solomon Rutzky:
https://dba.stackexchange.com/questions/175649/sum-of-table-sizes-dont-match-with-mdf-size
sp_spaceused
? Did you check sys.allocation_units
? Did you guess?