Wednesday, March 10, 2010
SSIS Email Chain
11:19 |
Posted by
Sean McCown |
Edit Post
Today's blog is actually an email chain between me and a user. It's only a single question and reply, but I think it's good info.
Q:
Hey, would you agree that a monthly load process is better served as an SSIS – even if you have to push/pull from text files for now – than as a series of SPs or DLLs?
A:
if you're staying on the same box for the load then SPs can be an attractive offer because they're very fast and the memory stays in sql and can be managed quite well... if you move that process to ssis, and ssis is on the same box, then you have to allocate memory away from sql to run the pkg and house the data while in the buffer...
if ssis is on another box, but the data is still being moved to different dbs on the same box... so if the data is being moved from server1.db1 to server1.db2 and ssis is on server2, then you don't have to fight sql for memory, but now you incur the network cost of moving the data from the box, and then back to it...
if you're moving between boxes, then y, ssis is a better choice because in SPs you have to manage linked servers or openrowset to make that happen and that's not cricket...
however, what makes ssis attractive in the single box scenario is that it handles errors easier and alerting is much richer and easier to come by... you can also more easily fix problems in the data itself and it's easier to extend... so if your requirements change and you need to switch it to another box, or if you need to send a copy somewhere else, etc then that's much easier in ssis... ssis also gives you parallelism that you cant do in sps... you can load several tables at once in ssis where they have to be serialized in sps...
a good compromise in the single box scenario is to keep things moving like they are if possible, and where not stupid... so if they've already got an sp that inserts data into the 2nd db, then just call that sp from ssis... this way you get the insert and select process in sql where it belongs, and the workflow and error handling of ssis, and everything else that goes with it... if a single sp inserts several tables serially though, i'd create several sps and call them individually in ssis tasks... even if you have to keep them seialized (like for PK/FK issues)... because now you can even add things like checkpoints and individual error flows to your pkg steps and have a richer experience...
these things are meant to allow your process to grow and change much easier than is possible with an sp... so if one of the DBs gets too big and has to move to another box, it's a simple matter in ssis... but in an sp, that means you have to create and manage a linked server, which is just one more thing and they come with their own problems...
as for the DLLs... these processes NEVER belong in a DLL... not only do you have to steal memory from sql, but it's not optimized for data flow tasks... it's just a plain DB connection... you can have parallelism though by calling the windows threading model, but they have to manage those threads manually and mistakes can be made... and that takes extra memory, etc... not only that, but the code isn't available for admins to fix problems, and making changes to the compiled code can be dangerous, right... so no, that's not a good idea...
does this answer... i realize you prob wanted more of a yes, do this, but it's not that simple...
Q:
Hey, would you agree that a monthly load process is better served as an SSIS – even if you have to push/pull from text files for now – than as a series of SPs or DLLs?
A:
if you're staying on the same box for the load then SPs can be an attractive offer because they're very fast and the memory stays in sql and can be managed quite well... if you move that process to ssis, and ssis is on the same box, then you have to allocate memory away from sql to run the pkg and house the data while in the buffer...
if ssis is on another box, but the data is still being moved to different dbs on the same box... so if the data is being moved from server1.db1 to server1.db2 and ssis is on server2, then you don't have to fight sql for memory, but now you incur the network cost of moving the data from the box, and then back to it...
if you're moving between boxes, then y, ssis is a better choice because in SPs you have to manage linked servers or openrowset to make that happen and that's not cricket...
however, what makes ssis attractive in the single box scenario is that it handles errors easier and alerting is much richer and easier to come by... you can also more easily fix problems in the data itself and it's easier to extend... so if your requirements change and you need to switch it to another box, or if you need to send a copy somewhere else, etc then that's much easier in ssis... ssis also gives you parallelism that you cant do in sps... you can load several tables at once in ssis where they have to be serialized in sps...
a good compromise in the single box scenario is to keep things moving like they are if possible, and where not stupid... so if they've already got an sp that inserts data into the 2nd db, then just call that sp from ssis... this way you get the insert and select process in sql where it belongs, and the workflow and error handling of ssis, and everything else that goes with it... if a single sp inserts several tables serially though, i'd create several sps and call them individually in ssis tasks... even if you have to keep them seialized (like for PK/FK issues)... because now you can even add things like checkpoints and individual error flows to your pkg steps and have a richer experience...
these things are meant to allow your process to grow and change much easier than is possible with an sp... so if one of the DBs gets too big and has to move to another box, it's a simple matter in ssis... but in an sp, that means you have to create and manage a linked server, which is just one more thing and they come with their own problems...
as for the DLLs... these processes NEVER belong in a DLL... not only do you have to steal memory from sql, but it's not optimized for data flow tasks... it's just a plain DB connection... you can have parallelism though by calling the windows threading model, but they have to manage those threads manually and mistakes can be made... and that takes extra memory, etc... not only that, but the code isn't available for admins to fix problems, and making changes to the compiled code can be dangerous, right... so no, that's not a good idea...
does this answer... i realize you prob wanted more of a yes, do this, but it's not that simple...
Subscribe to:
Post Comments (Atom)
About Me
- Sean McCown
- I am a Contributing Editor for InfoWorld Magazine, and a frequent contributor to SQLServerCentral.com as well as SSWUG.org. I live with my wife and 3 kids, and have practiced and taught Kenpo for 22yrs now.
Labels
Blogumulus by Roy Tanck and Amanda Fazani
0 comments:
Post a Comment