In the last 5-10 years, we have seen the development of what we call the modern data stack today. In today's video, I'm presenting some of the most common data modeling approaches that teams choose, and finally I'll give you some considerations to keep in mind when deciding on an approach for your team.
Today's video is sponsored by Census, one of the leading reverse ETL platforms on the market. Census is a central component of the modern data stack: you can connect Census to your data warehouse and sync your trusted warehouse data into all of your business applications.
Why is data modeling so important? First, companies today use more data sources than ever, and as an engineer, a good data modeling strategy is the best way to make sense of them. Otherwise, you will be bombarded with information and won't know how to deal with it. The sad reality is that engineers are often confronted with so many sources that they just throw them over the fence for someone else to deal with. Without a good strategy for the process, the whole thing can become very overwhelming.
Secondly, there are a lot of data consumers, probably more than ever. People have very different expectations of their data and how they want to use it. Without a good organizational model, it is harder to answer all these different questions, or at least to be certain that answering them is possible.
The next point is optimization, and by that I mean both speed and cost. The considerations for conventional row-based databases and modern cloud-based columnar databases are somewhat different. Since we are talking about the modern data stack, we mainly mean cloud computing, and there the most expensive component is not storage but data processing. Therefore, it is important to pay attention to what kinds of queries you write and what you use them for.
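To make the cost point a bit more concrete, here is a toy model, assuming a columnar warehouse that reads only the columns a query references (some cloud warehouses, such as BigQuery with on-demand pricing, bill by bytes processed). The table, its column names, and the per-value byte widths are hypothetical; the point is only how much the column selection of a query changes the amount of data processed.

```python
# Toy cost model: assumes a columnar warehouse that scans only referenced columns.
# The table and its average per-value widths (in bytes) are hypothetical.
AVG_BYTES_PER_VALUE = {
    "order_id": 8,
    "customer_id": 8,
    "order_date": 4,
    "amount": 8,
    "shipping_address": 120,
    "notes": 400,
}


def bytes_scanned(num_rows: int, columns: list[str]) -> int:
    """Estimate the bytes a columnar scan reads for the given columns."""
    return num_rows * sum(AVG_BYTES_PER_VALUE[col] for col in columns)


rows = 10_000_000
# SELECT * touches every column, including the wide text columns.
select_star = bytes_scanned(rows, list(AVG_BYTES_PER_VALUE))
# A "revenue per day" query only needs two narrow columns.
revenue_by_day = bytes_scanned(rows, ["order_date", "amount"])

print(f"SELECT *          : {select_star / 1e9:.2f} GB")
print(f"date + amount only: {revenue_by_day / 1e9:.2f} GB")
```

Real warehouses add compression, partition pruning, and caching, so actual numbers will differ; the sketch only shows why query design, not storage, tends to drive cost.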
Of course, cloud warehouses can handle much more than conventional databases, but there is still a limit. With a good data model, you can anticipate these limits and avoid a lot of wasted queries.
The third and, in my opinion, most important point is the mental clarity you get from a defined data model. It offers a clear strategy that everyone follows, and the rules for building tables, writing queries, and so on are clear from the beginning. For the same reason, it is also easier to work as a team, especially if you choose a well-known and popular approach that your team members already understand.
There are a number of approaches, but in this section we introduce four of the most common ones.
First of all, we have so-called normalized modeling. This was popularized by Bill Inmon and has been around for a long time. In this scenario, you have your sources, followed by a staging layer, and then the data warehouse itself. The warehouse is designed around the actual source systems and acts as the central store for everything. It is normalized, that is, there is no data redundancy. As a consequence, many joins are usually required to produce a final result. Another characteristic is that data is then moved from this warehouse into individual data marts, often one per department, for example one for finance, one for human resources, and one for operations, since each of these departments has its own reporting requirements.
The advantage is that the data warehouse represents a real source of truth, because it represents each of the sources exactly. The disadvantage is that many joins are required, which is not ideal for modern column-based data warehouses. In addition, depending on how they are designed, the data marts can end up in conflict with each other, and you cannot access everything from one place.
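A minimal sketch of the normalized approach, using Python's built-in `sqlite3` as a stand-in for a real warehouse. The shop schema (customers, products, orders, order lines) and the finance mart are hypothetical. Each fact is stored exactly once, and even a simple departmental question already needs several joins.

```python
import sqlite3

# Hypothetical normalized (3NF) core: every attribute lives in exactly one table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers   (customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE products    (product_id  INTEGER PRIMARY KEY, name TEXT, unit_price REAL);
CREATE TABLE orders      (order_id    INTEGER PRIMARY KEY,
                          customer_id INTEGER REFERENCES customers(customer_id),
                          order_date  TEXT);
CREATE TABLE order_lines (order_id    INTEGER REFERENCES orders(order_id),
                          product_id  INTEGER REFERENCES products(product_id),
                          quantity    INTEGER,
                          PRIMARY KEY (order_id, product_id));

INSERT INTO customers   VALUES (1, 'Acme', 'DE'), (2, 'Globex', 'US');
INSERT INTO products    VALUES (10, 'Chair', 120.0), (11, 'Lamp', 45.0);
INSERT INTO orders      VALUES (100, 1, '2023-01-05'), (101, 2, '2023-01-06');
INSERT INTO order_lines VALUES (100, 10, 2), (100, 11, 1), (101, 11, 4);

-- A departmental data mart (here: finance) derived from the normalized core.
CREATE VIEW finance_revenue_by_country AS
SELECT c.country, SUM(ol.quantity * p.unit_price) AS revenue
FROM order_lines ol
JOIN orders    o ON o.order_id    = ol.order_id
JOIN customers c ON c.customer_id = o.customer_id
JOIN products  p ON p.product_id  = ol.product_id
GROUP BY c.country;
""")

# "Revenue per country" already requires three joins across four tables.
for country, revenue in conn.execute(
        "SELECT country, revenue FROM finance_revenue_by_country ORDER BY country"):
    print(country, revenue)
```

Other departments would get their own views or tables built from the same core, which is where conflicting mart definitions can creep in.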
Next, we have so-called denormalized modeling, which you may know, or have heard of, as dimensional modeling. It was made famous by Ralph Kimball. The main difference is that you use a so-called star schema. Instead of individual normalized tables that each mirror a single source table, you build denormalized, column-store-friendly fact and dimension tables. This introduces some data redundancy into your warehouse, but once you reach the analysis layer, fewer joins are needed to get the desired data, and you can query everything from a single source of truth. The other difference is that this method is designed around business processes, in contrast to the Inmon method, which concentrates on the source data. In my opinion, this is the most widespread strategy, although in the modern data stack its relevance has been questioned a little, with people asking whether it is still necessary for all of these use cases.
Another common approach is known as Data Vault. As you can see, this is a somewhat more complex approach. You still have your sources and your staging, but the data is then organized into so-called hubs, links, and satellites. The hubs and links essentially contain the business keys, the relationships between them, and important metadata, while the satellites contain the more detailed values and context. One of the main goals and advantages of Data Vault is that you take your source data and load it into the raw vault, which doesn't involve any business logic. It is simply a representation of your sources, without any transformations: you take all the data and arrange it accordingly. The business vault, on the other hand, takes this data and applies some business-rule transformations, but follows the same hub, link, and satellite structure. This design is not meant for analytical reporting; it serves to store and organize the data.
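A minimal sketch of loading a raw vault, assuming hypothetical staging rows from a single source system and SQLite as the storage engine. Following a common Data Vault 2.0 convention, business keys are hashed into surrogate keys; MD5 and the `||` separator are assumptions here, not requirements. Values are copied as-is, with no business logic.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Hash (normalized) business keys into a surrogate key. MD5 is an assumption."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT, load_ts TEXT, record_source TEXT);
CREATE TABLE hub_order    (order_hk TEXT PRIMARY KEY, order_id TEXT, load_ts TEXT, record_source TEXT);
CREATE TABLE link_customer_order (link_hk TEXT PRIMARY KEY, customer_hk TEXT, order_hk TEXT,
                                  load_ts TEXT, record_source TEXT);
CREATE TABLE sat_customer (customer_hk TEXT, load_ts TEXT, name TEXT, country TEXT,
                           PRIMARY KEY (customer_hk, load_ts));
""")

# Hypothetical staging rows from one source system (e.g. a CRM export).
staging = [
    {"customer_id": "C1", "name": "Acme", "country": "DE", "order_id": "O100"},
    {"customer_id": "C1", "name": "Acme", "country": "DE", "order_id": "O101"},
    {"customer_id": "C2", "name": "Globex", "country": "US", "order_id": "O102"},
]
load_ts = datetime.now(timezone.utc).isoformat()
source = "crm"

for row in staging:
    c_hk = hash_key(row["customer_id"])
    o_hk = hash_key(row["order_id"])
    # Hubs: one row per business key; duplicates are ignored.
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
                 (c_hk, row["customer_id"], load_ts, source))
    conn.execute("INSERT OR IGNORE INTO hub_order VALUES (?, ?, ?, ?)",
                 (o_hk, row["order_id"], load_ts, source))
    # Link: one row per customer/order relationship.
    conn.execute("INSERT OR IGNORE INTO link_customer_order VALUES (?, ?, ?, ?, ?)",
                 (hash_key(row["customer_id"], row["order_id"]), c_hk, o_hk, load_ts, source))
    # Satellite: descriptive context. A real loader would compare a hash of the
    # attributes to insert only changed rows; that step is omitted here.
    conn.execute("INSERT OR IGNORE INTO sat_customer VALUES (?, ?, ?, ?)",
                 (c_hk, load_ts, row["name"], row["country"]))

counts = {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0]
          for t in ("hub_customer", "hub_order", "link_customer_order", "sat_customer")}
print(counts)
```

A business vault would add tables with the same hub, link, and satellite shape that carry derived, rule-based values on top of these raw ones.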
On top of the vault, you often add a presentation layer, which is a kind of dimensional model that can then be used for analysis. The advantage of Data Vault is that it is very clear and lays out a clear strategy. The disadvantage is that it is quite complicated and really designed for environments with many different sources.
The last approach here is One Big Table, or OBT for short. This is one of the newer approaches you will see. Here you go directly to very wide, denormalized models. The two main ideas behind it are, first, that storage is really cheap, so we can afford very wide tables, and second, that the compute performance of modern cloud databases is so good that we can write queries and design things so that intermediate results don't really need to be stored separately. We can go directly from the staging area to the final reporting mart we need. In the middle you can see an intermediate layer, but these are often only ephemeral models that are not really materialized as tables. That's why I've included them here, but they exist only to support the mart that follows, instead of being reused in several places.
The advantage is that it is simpler. You go directly from the staging area to the mart, store the result as a mart, and it is ready to use very quickly. The disadvantage is that compute costs can rise sharply if you don't pay attention, because each of these is written as one extensive query, and every single pipeline, as we call it, contains a lot of business logic. The more complicated these become, the more expensive the computations are. It may still be okay at the beginning, but once you scale and have a really big infrastructure, it can quickly become unmanageable. In addition, there is data redundancy, since everything is built into individual marts.
One last point: some of the other options give you a dedicated modeling layer in the middle, which OBT lacks.
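A minimal sketch of the One Big Table pattern, again with SQLite standing in for a cloud warehouse and hypothetical staging tables. Staging feeds straight into one wide reporting table; the CTE plays the role of an ephemeral intermediate model that is never materialized and exists only to feed this one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical staging tables, loaded 1:1 from the sources.
CREATE TABLE stg_customers (customer_id INTEGER, name TEXT, country TEXT);
CREATE TABLE stg_orders    (order_id INTEGER, customer_id INTEGER, order_date TEXT, amount REAL);
INSERT INTO stg_customers VALUES (1, 'Acme', 'DE'), (2, 'Globex', 'US');
INSERT INTO stg_orders VALUES (100, 1, '2023-01-05', 285.0),
                              (101, 2, '2023-01-06', 180.0),
                              (102, 1, '2023-02-01', 90.0);

-- Staging goes straight into one wide, denormalized reporting table.
-- The CTE is the ephemeral intermediate step: it is not stored anywhere.
CREATE TABLE obt_orders AS
WITH customer_order_counts AS (
    SELECT customer_id, COUNT(*) AS lifetime_orders
    FROM stg_orders
    GROUP BY customer_id
)
SELECT o.order_id, o.order_date, o.amount,
       c.customer_id, c.name AS customer_name, c.country AS customer_country,
       coc.lifetime_orders AS customer_lifetime_orders
FROM stg_orders o
JOIN stg_customers c           ON c.customer_id   = o.customer_id
JOIN customer_order_counts coc ON coc.customer_id = o.customer_id;
""")

# Consumers query the wide table directly, with no joins.
# Note the redundancy: customer_name and customer_country repeat on every order row.
rows = conn.execute("""
    SELECT customer_country, SUM(amount) AS revenue
    FROM obt_orders
    GROUP BY customer_country
    ORDER BY customer_country
""").fetchall()
print(rows)
```

All the business logic sits in the single `CREATE TABLE ... AS` statement, which is why these queries get expensive as they grow and are rebuilt on every run.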
With OBT, you have to be very strategic about modeling the individual layers, because otherwise you will be bombarded with data and end up creating a separate pipeline for everything. So be careful not to get overwhelmed and just take everything and build pipelines, because this can quickly get out of control at large scale.
Now let's talk about some things you should consider when modeling data. Many of the new tools and technologies in the modern stack have taken some of the steam out of this debate and made it seem a little less important. But in my opinion, organization and mental clarity are a huge part of data modeling. A good model also gives you a solid foundation for handling all the different types of data sources that come your way. I think it's very tempting to want to move fast and just create models quickly. In the long run, however, this leads to many problems that you can avoid by slowing down and building your data warehouse strategically.
At the same time, it is important to adopt the One Big Table concept from the newer technologies, which usually results in a hybrid approach. For example, you can use the wide data marts of the One Big Table approach but base them on the modeling and structure of a star schema. I think this is a very widespread result; I've seen it a lot. It's important to keep in mind that this model is not only intended for individual users or reporting tools. The modern stack has brought a completely new concept, namely reverse ETL, and in my opinion that's exactly where the mart layer matters: it should be able to serve all possible scenarios. The disadvantage is that you can double the storage space, so to speak, and this additional layer is somewhat redundant. But at the same time, storage is a less important factor, so that might not be so bad.
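A minimal sketch of the hybrid approach, assuming SQLite as a stand-in for the warehouse: engineers maintain a Kimball-style star schema (one fact table, two dimensions), and a wide, OBT-style mart is built from it rather than from staging. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema core: one fact table plus dimension tables.
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer_id TEXT, name TEXT, country TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, full_date TEXT, year INTEGER, month INTEGER);
CREATE TABLE fact_orders  (order_id TEXT,
                           customer_key INTEGER REFERENCES dim_customer(customer_key),
                           date_key INTEGER REFERENCES dim_date(date_key),
                           amount REAL);

INSERT INTO dim_customer VALUES (1, 'C1', 'Acme', 'DE'), (2, 'C2', 'Globex', 'US');
INSERT INTO dim_date VALUES (20230105, '2023-01-05', 2023, 1), (20230201, '2023-02-01', 2023, 2);
INSERT INTO fact_orders VALUES ('O100', 1, 20230105, 285.0),
                               ('O101', 2, 20230105, 180.0),
                               ('O102', 1, 20230201, 90.0);

-- Wide, OBT-style mart built on top of the star schema (materialized as a table).
CREATE TABLE mart_orders_wide AS
SELECT f.order_id, f.amount,
       c.customer_id, c.name AS customer_name, c.country AS customer_country,
       d.full_date AS order_date, d.year AS order_year, d.month AS order_month
FROM fact_orders f
JOIN dim_customer c ON c.customer_key = f.customer_key
JOIN dim_date     d ON d.date_key     = f.date_key;
""")

# In this sketch, BI tools and reverse ETL syncs read only the wide mart,
# while the star schema underneath stays the engineers' modeling layer.
monthly = conn.execute("""
    SELECT order_year, order_month, SUM(amount)
    FROM mart_orders_wide
    GROUP BY order_year, order_month
    ORDER BY order_year, order_month
""").fetchall()
print(monthly)
```

Materializing the wide mart as a table is what duplicates storage; defining it with `CREATE VIEW` instead would avoid the extra storage but recompute the joins on every query, so the choice is a storage-versus-compute trade-off.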
The advantage is that there is a real modeling approach behind these datasets. For the end user it looks the same, but the engineer has a structure for modeling the data. It also reduces the risk of overly complex queries, because the process is split into a few steps, which makes it more transparent.
It is also important to consider team dynamics and responsibilities in your modeling decisions. Things have evolved, and the boundaries between job titles are slowly disappearing. Take this example, whether or not you use this exact approach: your data engineers could focus on the data modeling, while analytics engineers, analysts, or someone else focus on working with the data and the analysis, and the responsibilities can be split at similar points. It doesn't necessarily have to be this exact approach, but you can draw a line somewhere when you make your modeling decisions, instead of just starting quickly and figuring everything out as you go.
Please let me know in the comments what your experiences have been. Which modeling approach did you choose? What do you recommend? What have you seen? And as always,