Teradata Vantage - MAPS: 1. Increasing the System Availability
This is Carrie Ballinger. I’m a Senior Technologist with the Technology & Innovation Office here at Teradata. This is going to be the first in a sequence of three TechBytes, and we’ll be talking about one of my favorite features in Teradata – the new Teradata MAPS feature. So let’s get started.

As you probably know, Teradata is a parallel database. One of the things that is very important in the Teradata Database design is ensuring an even distribution of the data, so that each parallel unit is working on about the same amount of data at the same time. One of the ways this happens is by means of a hash map. The hash map comes into play whenever a row is inserted into a table in a database. The row’s primary index is sent to a hashing algorithm, and one of the outputs of the hashing algorithm is what we call a hash bucket. That hash bucket goes through the hash map, and what comes out is the AMP that’s going to own the row. So the hash map is the means of associating a particular row with an AMP in the configuration.

When you have a Teradata system – we are talking before you’ve expanded it – you have one hash map that essentially owns all user tables. We are calling it in this example TD_Map1. When you expand your system, you get a second hash map that also covers your primary table data, but it includes more AMPs – all the AMPs in the newly configured system. So now you have two of these maps that exist at the same time for your primary data. You have the choice, when you expand your system, whether to move the individual tables into the new hash map or to leave them in the old hash map. The difference is – if you look at the slide, you can probably see it – TD_Map1, your original hash map, covers the old AMPs: AMPs 0 through 87 in this case. After the reconfig you have more AMPs, and TD_Map2 covers all the AMPs – the old ones and the new ones. We call these types of maps “contiguous maps” because they cover a contiguous range of AMPs, always starting with AMP 0. This option of leaving the tables in the previous hash map after you’ve expanded the system is an important way of reducing the downtime during system expansion.

Here you have an example of those two hash maps in a little more detail. Hash map 1 is the hash mapping used for the original TD_Map1 contiguous map. When you move a table from one hash map to another, the rows are actually placed on different AMPs, because the second hash map – the one you are moving the table to – uses a different number of AMPs, so the same primary index value, hashed through the new map, will point to a different AMP. So you can expect your rows to reside on different AMPs when you move a table from one contiguous hash map to another.

When you first move to 16.10, before you’ve even expanded your system, you are going to see that you have several different maps available to you. TD_Map1 – the purple map shown spanning across the AMPs in this graphic – is where your user tables live (and where they stay after an expansion if you choose not to move them during the reconfig). You are also going to see a dictionary map, TD_DataDictionaryMap, which is the map that holds all of your DBC tables, including things like the ResUsage and DBQL tables. Then there’s a third map, used for internal purposes, called TD_GlobalMap. They are all contiguous maps that cover the same number of AMPs.
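If you’d like to see that row-to-AMP path for yourself, here is a small illustrative query using Teradata’s built-in hash functions. The Sales.Customer table and cust_id column are made-up names, and I’m assuming DBC.MapsV is the dictionary view that lists the maps on 16.10 and later – treat this as a sketch, not a recipe.

    /* Illustrative only: the table and column names are hypothetical. */
    SELECT cust_id
         , HASHROW(cust_id)                      AS row_hash     /* output of the hashing algorithm       */
         , HASHBUCKET(HASHROW(cust_id))          AS hash_bucket  /* bucket that indexes into the hash map */
         , HASHAMP(HASHBUCKET(HASHROW(cust_id))) AS owning_amp   /* AMP the current hash map assigns      */
    FROM Sales.Customer
    SAMPLE 5;

    /* List the hash maps defined on the system (view name assumed to be DBC.MapsV). */
    SELECT * FROM DBC.MapsV;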
When you expand a system, you get an extra map, TD_Map2, in there. I want to go back to this point about not having to wait for your tables to be moved across all the AMPs when you expand your hardware. The benefit here, of course, is reduced system downtime, and your administrator can then determine when to start moving those tables. We suggest you set a goal of getting all of your tables moved into the larger contiguous map, because you then benefit from the wider parallelism. But you can do this over several weekends or over several months, at times that are convenient for you.

So how do you move your tables, then, from TD_Map1 into the new, larger map, TD_Map2? This is done by means of a new version of the ALTER TABLE statement. Under the covers, this ALTER TABLE is actually an INSERT-SELECT, which is really great for moving tables in this context because you can read the data in the source table while the target table is being loaded, so the availability of your data increases with this technique. When it’s complete, the ALTER TABLE drops the source table and updates the dictionary, so the system knows this new instance is the table and that it now lives in TD_Map2. (I’ll show a short sketch of what this looks like at the end of this segment.)

There is a “Mover” stored procedure we have created for you that we recommend you use – I recommend you use – because it makes things a little easier and does a few extra things for you. You don’t have to have your privileges set up exactly the way you would if you issued the ALTER TABLE directly, so it’s easier with the privileges. And the Mover stored procedure collects summary statistics for you when the move is complete, so your statistics will be up to date.

I just want to emphasize that there are two major benefits when you choose the option of moving the tables after the expansion is complete. The first is that during the expansion, instead of taking hours and hours, or possibly days, for the reconfig utility to move the data into the new map, you can now bring the system up sometimes in a matter of minutes or hours, and the data is available to you for update, insert, or delete – all kinds of processing you might require. The second is that when you are moving the tables yourself using the new version of the ALTER TABLE, you are doing an INSERT-SELECT, so you have, as I mentioned earlier, the ability to read that data while it’s being moved. Both of those are important in terms of accessing the data.

We mentioned the ALTER TABLE, and we mentioned that there’s a stored procedure you can use to take care of moving the tables for you. But really, think of that stored procedure as something you might want to use occasionally, as a one-off, for special table moves. For most of the tables, you are going to want to use the Viewpoint MAPS Manager portlet. This portlet not only makes the moving simple for you, it also gives recommendations: recommendations for how to group tables when you move them, and which tables you should move together. When you start using the MAPS Manager portlet, there is first an Analyze Job, and my suggestion is to scope your data. You probably have a lot of tables you are thinking about moving, but consider one database at a time; you can tell the Analyze Job which scope of tables you’re interested in moving.
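Before going further with the Analyze Job, here is the rough sketch I promised of the direct SQL path. The database and table names are made up, and you should check the documentation for the exact privileges the direct ALTER TABLE requires.

    /* Move one table from its current map into the larger contiguous map.
       Sales.Orders is a hypothetical table name. */
    ALTER TABLE Sales.Orders, MAP = TD_Map2;

    /* Refresh summary statistics afterwards - when you use the Mover
       stored procedure instead, it does this step for you. */
    COLLECT SUMMARY STATISTICS ON Sales.Orders;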
Coming back to the Analyze Job: it makes recommendations about which maps those tables are appropriate to move to. In my next section I am going to talk about sparse maps – that’s one of the reasons I mentioned there may be more than one map the Analyze Job considers when it looks at the tables you want to move. It comes out with a list of recommended table moves, and you can review that list and make sure you want to move all of those tables. Then you simply start the Mover Job, which is a second job within the Viewpoint portlet that does the actual moving.

One of the really interesting things about the Mover Job is that you can set up parallel moves – tables that are moved in parallel. You do this by selecting the number of movers, or workers, that you want to be active for that Mover Job. If you select three workers, then you are going to have three tables moved in parallel, and you can increase or decrease the number of workers in that Mover Job at any time. This is all automated for you: each of the workers you have selected issues the same stored procedure, but for a different table, so you are maximizing the parallelism of the move process if you want to get it done faster. (A simple query for checking which tables are still in the original map is sketched after this segment.)

So let’s just summarize the benefits of the MAPS feature as they relate to the availability of the data. The downtime when you expand your configuration – when you add nodes – is significantly reduced if you choose the option to postpone the move of the tables. There is greater availability of the data for your end users: instead of waiting for hours or days, it will be a matter of minutes or maybe a few hours before they can access that data again. The repositioning of the tables across the larger number of AMPs can then take place at your convenience – off-hours, on weekends, whenever it’s a helpful time for you to get that move done. And you can read the table data while it’s being moved, because the ALTER TABLE is, under the covers, an INSERT-SELECT statement, which supports that. In one of my future segments on MAPS, I’ll be talking a little bit about how the added processing power can be taken advantage of by the Optimizer even before you start moving tables. So please join me in parts 2 and 3 to hear more about the MAPS feature.
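One last practical note: if you want a quick way, outside of Viewpoint, to check which tables are still sitting in the original map, a query along these lines can help. I’m assuming here that DBC.TablesV exposes a MapName column (added along with the MAPS feature), and 'Sales' is just an example database.

    /* Sketch: list each table's current map for one database. */
    SELECT DataBaseName, TableName, MapName
    FROM DBC.TablesV
    WHERE DataBaseName = 'Sales'
    ORDER BY MapName, TableName;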