Blog


X-ROCKET to the moon


Felix Brunner


These are the voyages of the encoder model X-ROCKET. Its continuing mission: to explore strange, new time series; to seek out new explanations and new interpretations; to boldly seek meaning where no one has sought before. Previously in this series, we completed our training in the basics of time series classification in part one and learned how to operate X-ROCKET in part two. But enough with all the talking, it is time to fire up the X-ROCKET engines and see this model in action. Let's rocket!

Data, prepare for takeoff!

We will use the "AsphaltPavementTypeCoordinates" dataset from Souza (2018) as an example. This dataset consists of 2,111 examples of accelerometer data recorded from cars passing over various types of pavement. Every time series example in the dataset has three channels (corresponding to the X, Y, and Z directions), each of which is measured at 100 Hz. The length of the recordings varies from 66 time observations up to 2,371. The classes are "flexible" (38.6%), "cobblestone" (25.0%), and "dirt road" (36.4%). According to the dataset description, the best model achieved an accuracy of 80.66% on this task, which we will use as a benchmark. So, Houston, we have our problem — a relatively balanced three-way multivariate time series classification problem, to be precise. The aeon module provides a simple way to load this dataset for our machine learning task. We will also use scikit-learn to follow the original authors and divide the full dataset into equally-sized train and test splits:

    from aeon.datasets import load_classification
    from sklearn.model_selection import train_test_split

    X, y, meta = load_classification("AsphaltPavementTypeCoordinates")
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.5, random_state=0
    )

How to build a ROCKET

Next, let's put together a suitable vessel to encode this dataset. Having installed the xrocket module with its dependencies in our environment, we can immediately import the full encoder module. Then, all we have to do is initialize an instance of it with suitable parameters for our problem. Since our dataset has three channels, the choice of in_channels is clear. Next, as the time series length varies widely within our dataset, it makes sense to set max_kernel_span to a value that also suits the shorter examples; let's go with 100 in this case. Finally, we leave combination_order and feature_cap at their default values of one and 10,000 for now:

    from xrocket import XRocket

    encoder = XRocket(
        in_channels=3,
        max_kernel_span=100,
        combination_order=1,
        feature_cap=10_000,
    )

Given these inputs, our encoder is automatically set up with the usual 84 MiniROCKET kernels at 12 distinct dilation values. With three data channels, X-ROCKET chooses three pooling thresholds for each kernel-dilation-channel combination to stay within the feature_cap. Hence, the embedding dimension is 84 × 12 × 3 × 3 = 9,072. To finally prepare this contraption for boarding, all we have to do is find suitable values for the 9,072 pooling thresholds. We do this by fitting our XRocket instance to a data example. As the model operates on PyTorch tensors, where the first dimension is reserved for stacking multiple examples in a batch, all we have to do is transform the data from a 2D numpy array into a 3D tensor and feed it to the encoder:

    from torch import Tensor

    encoder.fit(Tensor(X_train[0]).unsqueeze(0))

Punch it!

Now that our X-ROCKET is calibrated, let's start the countdown.
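As a quick sanity check before we transform the full dataset, we can push a single example through the fitted encoder and confirm that it comes out as a flat feature vector of the expected size. This is just a minimal sketch; the exact output shape is an assumption here, based on the 9,072-dimensional embedding described above:

    from torch import Tensor

    # encode a single training example; the leading dimension is the batch dimension
    single_embedding = encoder(Tensor(X_train[0]).unsqueeze(0))
    print(single_embedding.shape)  # assumed to be (1, 9072), matching the embedding size above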
Again, inputs need to be in the 3D tensor format, so we need to transform the examples to PyTorch tensors before passing them to the model. Due to the varying time series lengths, we cannot easily concatenate multiple examples into a batch. Therefore, it is more convenient to encode the examples one by one and collect the embeddings in two lists, one for the training set and one for the test set. Time to go to full thrust, godspeed!

    embed_train, embed_test = [], []
    for x in X_train:
        embed_train.append(encoder(Tensor(x).unsqueeze(0)))
    for x in X_test:
        embed_test.append(encoder(Tensor(x).unsqueeze(0)))

8.02 seconds on a moderately fast consumer-grade CPU later, the embeddings of both the train and the test set are ready. That is, we now have a representation of the varying-size input data in fixed-dimensional vectors. Hence, the time has come to make this a tabular problem with named features stored in a DataFrame. The encoder provides the attribute feature_names, which readily contains the name of each embedding value as a tuple of (pattern, dilation, channel, threshold). Let's put these tuples in an index and name them accordingly. Then, finally, we create the frames to store the transformed datasets. Who said time series classification had to be rocket science?

    from torch import concat
    import pandas as pd

    feature_names = pd.Index(encoder.feature_names)
    df_train = pd.DataFrame(data=concat(embed_train), columns=feature_names)
    df_test = pd.DataFrame(data=concat(embed_test), columns=feature_names)

Giving X-ROCKET a purpose

As with so many things in the universe, X-ROCKET struggles to find its way without a head. To make sure it can follow its trajectory to the intended destination — time series classification — let's find a suitable prediction head that delivers the payload. As mentioned before, any prediction model that fits the intended purpose is fine in principle. Note that in theory, this also includes deep PyTorch feed-forward neural networks, which would allow running backpropagation end to end, all the way back to the X-ROCKET weights, to improve its embeddings. But don't panic, it is possible to find answers even without Deep Thought! Since we are eventually interested in the explainability of the predictions, let's pick a simple and explainable classification model instead. Scikit-learn's RandomForestClassifier is a solid start on that end; all we have to do is load it and fit it on our training data:

    from sklearn.ensemble import RandomForestClassifier

    clf = RandomForestClassifier(random_state=0)
    clf.fit(df_train, y_train)

Wow, it almost went off like a rocket! Just 3.13 seconds later, we have our classifier. Let's see how it does on the dataset. Since the original work claims to achieve 80.66% accuracy, let's score our model on the hold-out set in the same way as they did:

    from sklearn.metrics import accuracy_score

    pred_test = clf.predict(df_test)
    acc_test = accuracy_score(y_test, pred_test)

And there we have it, our model achieves an accuracy of 90.19% on the test set! Not bad, but is it enough to make a little rocket man proud? To conclusively answer that question, of course, more rigorous comparisons are warranted. Nevertheless, this appears to have been a successful launch!

Where no ROCKET man has gone before

The time has come to take X-ROCKET to the final frontier on its ultimate quest for meaning. Since the model seems to work acceptably well, it is valid to also analyze the explanations it provides about its predictions.
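Before reading too much into those explanations, it can be worth checking that the 90.19% accuracy is not carried by a single class. A minimal sketch of such a check, reusing the pred_test predictions from above:

    from sklearn.metrics import classification_report

    # per-class precision, recall, and F1 for the three pavement types
    print(classification_report(y_test, pred_test))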
Luckily, the random forest classifier we chose provides an attribute called feature_importances_, which ascribes importance scores to all features of the model. Since we have stored the corresponding index in feature_names, we can easily bring both arrays together:

    feature_importances = pd.Series(
        data=clf.feature_importances_,
        index=feature_names,
    )

As it is, analyzing this object is only so useful. For example, we can see that the most important embedding value for our model is the pattern HLHLLLHLL at dilation two in the Y-channel with a pooling threshold of -10.84. An H in the pattern indicates a high value, while an L indicates a low one, so the pattern looks something like |_|___|__ . However, it is now easy to pool importance values to examine the relative importances of, say, the input channels. Summing over each channel, we get the importance scores below. Since X-ROCKET removes the randomness in how the embeddings are put together, the same features are extracted from each channel and each dilation value. Hence, grouping feature importances this way offers a fair comparison.

Relative importances of the input channels for the predictions.

That is, the Y-channel seems to be the clear favorite, followed by the X-channel. Similarly, if we sum over the various dilation values, a clear insight is that higher frequencies are the ones that matter. With entries recorded at 100 Hz, a dilation value of 2 corresponds to a frequency of 50 Hz, for example. As can be seen in the image below, most information is contained in these higher frequencies, that is, the ones with smaller dilation values.

Relative importances of various frequency dilations for the predictions.

What did the doctor say to the ROCKET? "Time to get your booster shot!"

Accordingly, one might wonder how to give this rocket ship an extra performance boost. In machine learning space, of course, the possibilities are endless. For example, one could try alternative model heads such as gradient boosting algorithms, or better optimize the corresponding hyperparameters. On a different route, one could think about how to improve the data quality or augment the existing dataset with artificial examples. However, this is beyond the scope of this simple demonstration. What would be interesting to see, though, is whether the encoder can be further improved to gain additional insight into the drivers of predictiveness by also considering multi-channel features besides the previously seen univariate ones. So let's leave everything unchanged except the encoder, setting combination_order=2 and slightly increasing the number of features with feature_cap=15_000 when initializing X-ROCKET. The resulting embedding is now 12,096-dimensional, with 6 channel combinations instead of only the 3 single channels, and 2 pooling thresholds for each output. Besides a slight increase in test set accuracy to 91.13%, we observe that the Y-channel again seems to be the most important, but combinations of Y with the other channels now carry increased importance:

Relative importance of input channel combinations for the predictions.

Conclusions

In this series of articles, we have seen how an existing time series encoder framework can be restructured to derive new insight into the prediction drivers. Part one has shed light on some of the advances in machine learning for the time series domain.
Then, part two and this third part presented X-ROCKET, an explainable time series encoder, both technically and with a practical usage example. While this construct has completed its mission in the example here, it is important to point out that the explanations provided by X-ROCKET are only as good as the model's prediction capabilities on the respective problem. That is, there is no point in interpreting a model that does not perform well enough in terms of its predictions. Hence, there is no guarantee that the same approach works equally well in different settings, in particular if there is little signal in the input data. Nonetheless, rockets are cool, there is no getting around that!

References

Dempster, A., Schmidt, D. F., & Webb, G. I. (2021, August). MiniRocket: A very fast (almost) deterministic transform for time series classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 248–257).

Souza, V. M. (2018). Asphalt pavement classification using smartphone accelerometer and complexity invariant distance. Engineering Applications of Artificial Intelligence, 74, 198–211.

This article was created within the "AI-gent3D — AI-supported, generative 3D-Printing" project, funded by the German Federal Ministry of Education and Research (BMBF) with the funding reference 02P20A501 under the coordination of PTKA Karlsruhe.

Inside X-ROCKET: Explaining the explainable ROCKET


Felix Brunner


Welcome to the bridge, pilot! In this second part of our three-part journey, we will have a detailed look at the interior of the X-ROCKET implementation. After setting the stage for time series classification and giving a basic introduction to the ROCKET model in part one, this article provides a tour of the mechanisms needed for explainable embeddings, before part three launches X-ROCKET into a bumpy space race on a real dataset.

The blueprint to explainability

Again, our goal is to add explainability to a potent time series encoder, the ROCKET. One way to achieve this is by tracing each element of the embedding vectors back to its origins and thereby attaching meaning to it. Put differently, if we manage to meaningfully name each embedding element, we effectively transform downstream tasks into tabular problems. With the complexities and nonlinearities of neural networks, this is usually easier said than done. In the case of ROCKET, however, the architecture is shallow enough to shed light on its inner workings with a little bit of engineering and trickery. More precisely, the MiniROCKET of Dempster et al. (2021) will serve as a starting point, to which we add transparency by fully backtracking its encoding mechanisms. While convolutions do not necessarily need to be implemented in a deep-learning framework, doing so can help computational speed by leveraging GPUs. Accordingly, there already exist good implementations of various ROCKET variants in Python. For example, the original authors' numpy code is part of the sktime library, and tsai contains a GPU-ready PyTorch version of it. However, although these implementations are already computationally very efficient, our endeavors require a few changes that are more easily achieved after restructuring the model.

Let's dive into the technical details of the X-ROCKET implementation. As mentioned before, ROCKET architectures resemble very simple CNNs, so why not also structure their implementation like a neural network? That is, let's treat the steps of the calculation as layer objects and plug them together in line with the ideas behind ROCKET. More precisely, we define modules for each calculation step such that it is easier to understand the underlying computational graph. The diagram below schematically presents the full architecture of X-ROCKET. An input time series is served to several dilation blocks in parallel, each of which consists of a convolutional module, a channel mixing module, and a threshold pooling module. After processing the data sequentially in its submodules, each dilation block outputs a vector of embeddings. Finally, these embeddings are concatenated to form the full X-ROCKET output embedding, which downstream models can pick up to produce a prediction — in our case a classification. Note that the interpretability of the final prediction depends on how explainable the downstream prediction model is. While explainable AI (XAI) is a very active field of research with a whole literature dedicated to making algorithms explainable, we will follow the original authors' suggestion to use relatively simple prediction heads that are explainable without any additional sophistication.

Full overview of the X-ROCKET architecture.

In what follows, I provide a more detailed look at the various modules that make up X-ROCKET.

ROCKET convolutions

The first step in processing the data is to apply convolutional kernels that scan for fixed patterns in the data.
As we are dealing with time series, 1-dimensional kernels are the appropriate choice. The drawing below illustrates how the convolutions are applied. Given a sequence of input data, convolutional kernels are applied by sliding them over the input and summing the element-wise products in the respective window. Effectively, this scans the input for the prevalence of the respective pattern and results in an output that has the same shape as the input. Note how in the image below the output sequence always has large values when there is a peak in the input. Conversely, the output is negative if there is a dip in the input. This is because in this example, the input is filtered for the pattern [-1, 2, -1], which has the shape of a spike itself. X-ROCKET uses the same 84 filters with a length of nine values as suggested in Dempster et al. (2021), but in contrast to the original authors, we always pad the inputs to obtain identical-length output sequences. To maintain explainability in this step, it is enough to store the kernel corresponding to each output sequence.

Illustration of a 1D convolution.

Channel mixing

When dealing with multivariate time series, that is, time series with multiple channels, one might want to consider correlations of patterns across multiple channels. While the original implementation mainly focuses on the univariate case and suggests naïvely adding random combinations of ROCKET convolutions together, we want to provide a balanced comparison of features. Therefore, X-ROCKET removes the randomness and instead provides the option to expand the feature pool with channel combinations up to a chosen order. As an additional option, channels can be combined multiplicatively instead of additively for closer resemblance to the concept of a correlation. Explainability in this step is ensured by remembering the channels each mixed output is built from.

Illustration of channel combinations.

PPV threshold pooling

The transformations up to this point have done anything but reduce the size of the data. That is, applying multiple convolutional filters to each channel and adding combinations of the input channels on top of the single-channel convolutional outputs results in a far greater number of equal-length output channels than were originally put in. Therefore, it is time to collapse the time dimension through a pooling mechanism. Following the original paper's suggestions, X-ROCKET applies proportion-of-positive-values pooling (PPV). More precisely, the values in each intermediary channel are thresholded at one or more bias values per channel, where the bias values are automatically chosen based on representative examples in an initial fitting step. Then, PPV counts the fraction of values that surpass the respective threshold across the timeline. Finally, the resulting percentages directly serve as feature values in the embedding vector. Hence, for explainability, elements in the embedding can be unambiguously linked to a combination of convolutional kernel, one or more input channels, and a threshold value.

Illustration of proportion-of-positive-values pooling via thresholds.

Dilation blocks

With the considered convolutional kernels only spanning nine observations, the capacity of the model is so far limited to detecting a very narrow set of input characteristics. To change that, multiple dilation values are applied to identical kernels simultaneously to widen their receptive fields.
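To make these ideas concrete before we put the blocks together, here is a small, self-contained sketch (illustrative only, not taken from the X-ROCKET code) that applies the spike-shaped example kernel from above at several dilation values and then collapses each activation sequence with PPV pooling; the threshold of 0.5 is an arbitrary choice for this toy example:

    import torch
    import torch.nn.functional as F

    # toy input: a single-channel series of length 20 with one spike in the middle,
    # shaped (batch, channels, time) as PyTorch's conv1d expects
    x = torch.zeros(1, 1, 20)
    x[0, 0, 10] = 1.0

    # the spike-shaped example kernel from above, shaped (out_channels, in_channels, width)
    kernel = torch.tensor([[[-1.0, 2.0, -1.0]]])

    for dilation in (1, 2, 4):
        # padding=dilation keeps the output the same length as the input for a width-3 kernel
        activation = F.conv1d(x, kernel, padding=dilation, dilation=dilation)
        # PPV pooling: the fraction of activations that exceed the chosen threshold
        ppv = (activation > 0.5).float().mean().item()
        print(f"dilation={dilation}: PPV={ppv:.2f}")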
X-ROCKET achieves this in practice by executing the aforementioned sequence of convolution, channel mixing, and PPV thresholding in multiple dilation blocks in parallel. In principle, dilations are a standard procedure in the context of CNNs, but most architectures only use a single value at each step. Having said that, a similar idea has recently shown promise in drastically improving the contextual capabilities of LLMs by enlarging context windows through dilated attention (see Ding et al. (2023)). To better understand how filter dilation works, consider the drawing below. Applying a dilation value spreads the kernel over a longer period of time, thereby scanning lower frequencies for the respective patterns. For example, the resulting activation with a dilation value of two indicates the occurrence of the pattern at half the data frequency. For explainability, it is therefore important to also store the dilation value corresponding to each embedding element.

Illustration of frequency dilations.

The full model

Coming back to the full model, we can now put the pieces together. To initialize the encoder, we need to choose a few hyperparameters that determine the exact structure of the model. First, the number of input channels in_channels needs to be specified according to the number of channels in the data. Second, to automatically choose the dilation values to consider, the model requires an upper bound on the width of the convolutional receptive fields, called max_kernel_span. Typically, X-ROCKET then picks 20–30 distinct frequencies to consider. Next, the combination_order determines how many channels are combined when looking for correlations. By default, this keyword argument is set to 1 for simplicity. Finally, the feature_cap limits the dimensionality of the output to 10,000 features by default. X-ROCKET then builds the feature pool deterministically, that is, it is careful to include all channel-dilation-kernel combinations. Hence, the resulting number of features needs to be a multiple of the number of possible combinations and is not necessarily close to the specified value. If there is room within the feature cap, multiple thresholds are applied to each channel-dilation-kernel combination in the pooling step to create additional features.

Finally, to turn the embeddings into predictions, the encoder needs to be combined with a prediction model. As we are interested in interpretability, explainable models are the suggested choice here. Since the X-ROCKET encoder effectively gives the problem a tabular structure, many models for tabular data are valid candidates. For example, scikit-learn offers a large selection of insightful algorithms for tabular data. Similarly, gradient boosting algorithms such as XGBoost are high-performance alternatives. Note that standardizing the embedding vectors may be an essential intermediary processing step to ensure the interpretability of some of these prediction algorithms. And with the X-ROCKET code living in the PyTorch framework, it is also easy to combine the encoder with a deep feed-forward neural network. However, anything beyond a single linear layer might again be difficult to interpret in this case.

In the next and final part, I will show a simple usage example of the X-ROCKET implementation that also illustrates what kind of insight one can derive from X-ROCKET besides pure predictive performance.

References

Dempster, A., Schmidt, D. F., & Webb, G. I. (2021, August).
MiniRocket: A very fast (almost) deterministic transform for time series classification. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (pp. 248–257).

Ding, J., Ma, S., Dong, L., Zhang, X., Huang, S., Wang, W., & Wei, F. (2023). LongNet: Scaling Transformers to 1,000,000,000 tokens. arXiv preprint arXiv:2307.02486.

Drawings were created in excalidraw.

This article was created within the "AI-gent3D — AI-supported, generative 3D-Printing" project, funded by the German Federal Ministry of Education and Research (BMBF) with the funding reference 02P20A501 under the coordination of PTKA Karlsruhe.

Computer Vision


Early Classification of Crop Fields through Satellite Image Time Series


Tiago Sanona


In a fast-paced and ever-changing global economy, classifying crop fields via remote sensing only at the end of a growth cycle does not provide the immediate insight that decision makers need. To address this problem, we developed a model that allows continuous classification of crop fields at any point in time and improves its predictions as more data becomes available. In practice, we developed a single model capable of delivering predictions about which crops are growing at any point in time based on satellite data. The data available at the time of inference could be a few images at the beginning of the year or a full time series of images from a complete growing cycle. This exceeds the capabilities of current deep learning solutions, which either only offer predictions at the end of the growing cycle or have to use multiple models specialized to return results at pre-specified points in time. This article details the key changes we made to the model described in a previous blog post, “Classification of Crop fields through Satellite Image Time Series”, which extend its functionality and improve its performance. The results presented in this article are based on a research paper recently published by dida. For more detailed information about this topic and other experiments on this model, please check out the original manuscript: “Early Crop Classification via Multi-Modal Satellite Data Fusion and Temporal Attention”.

Leveraging Machine Learning for Environmental Protection


Edit Szügyi


Machine Learning has been solving complex problems for decades. Just think about how Computer Vision methods can reliably predict life-threatening diseases, how self-driving cars are on their way to revolutionizing traffic safety, or how automatic translation gives us the ability to talk to just about anyone on the planet. The power of Machine Learning has been embraced by many branches of industry and science. There are some areas, however, where the potential of Machine Learning is harder to see and less utilized. One of these is environmental protection. Protecting the natural environment is one of the biggest challenges our generation is facing, with pressing issues such as climate change, plastic pollution or resource depletion. Let us now look at how Machine Learning has been and can be used as a tool in environmental protection.

Introductions


LLM strategies part 1: Possibilities of implementing Large Language Models in your organization


David Berscheid


Large Language Models (LLMs) are a highly discussed topic in current strategy meetings of organizations across all industries. This article is the first part of two, providing some guidelines for organizations to determine their LLM strategy. It will help you identify the strategy with the most benefits while finding ways of solving associated complexities. For more content on LLMs, see our LLM hub .

How ChatGPT is fine-tuned using Reinforcement Learning


Thanh Long Phan


At the end of 2022, OpenAI released ChatGPT (a Transformer-based language model) to the public. Although based on the already widely discussed GPT-3, it launched an unprecedented boom in generative AI. It is capable of generating human-like text and has a wide range of applications, including language translation, language modeling, and generating text for applications such as chatbots. Feel free to also read our introduction to LLMs . ChatGPT seems to be so powerful that many people consider it to be a substantial step towards artificial general intelligence. The main reason for the recent successes of language models such as ChatGPT lies in their size (in terms of trainable parameters). But making language models bigger does not inherently make them better at following a user's intent. A bigger model can also become more toxic and more likely to "hallucinate". To mitigate these issues and to more generally align models to user intentions, one option is to apply Reinforcement Learning. In this blog post, we will present an overview of the training process of ChatGPT, and have a closer look at the use of Reinforcement Learning in language modeling. Also interesting: Our aggregated collection of LLM content .

Natural Language Processing


LLM strategies part 1: Possibilities of implementing Large Language Models in your organization


David Berscheid


Large Language Models (LLMs) are a highly discussed topic in current strategy meetings of organizations across all industries. This article is the first part of two, providing some guidelines for organizations to determine their LLM strategy. It will help you identify the strategy with the most benefits while finding ways of solving associated complexities. For more content on LLMs, see our LLM hub .

Extend the knowledge of your Large Language Model with RAG


Thanh Long Phan, Fabian Dechent


Large Language Models (LLMs) have rapidly gained popularity in Natural Language tasks due to their remarkable human-like ability to understand and generate text. Amidst great advances, there are still challenges to be solved on the way to building perfectly reliable assistants. LLMs are known to make up answers, often producing text that adheres to the expected style but lacks accuracy or factual grounding. Generated words and phrases are chosen because they are likely to follow previous text, where the likelihood is adjusted to fit the training corpus as closely as possible. This gives rise to the possibility that a piece of information is outdated if the corpus is not updated and the model retrained, or that it is simply factually incorrect, while the generated words still sound correct and match the required genre. The core problem here is that the LLM does not know what it does not know. In addition, even if a piece of information is correct, it is hard to track its source in order to enable fact-checking. In this article, we introduce RAG (Retrieval-Augmented Generation) as a method that addresses both problems and thus aims to enhance the reliability and accuracy of information generated by LLMs.


Extracting information from technical drawings


Frank Weilandt (PhD)


Did you ever need to combine data about an object from two different sources, say, images and text? We often face such challenges in our work at dida. Here we present an example from the realm of technical drawings. Such drawings are used in many fields for specialists to share information. They follow very specific guidelines so that every specialist can understand what is depicted on them. Normally, technical drawings are given in formats that allow indexing, such as svg, html, dwg, dwf, etc., but many, especially older ones, only exist in image format (jpeg, png, bmp, etc.), for example from book scans. Such drawings are hard to access automatically, which makes using them difficult and time-consuming. In this regard, automatic detection tools could be used to facilitate the search. In this blogpost, we will demonstrate how both traditional and deep-learning-based computer vision techniques can be applied for information extraction from exploded-view drawings. We assume that such a drawing is given together with some textual information for each object on the drawing. The objects can be identified by numbers connected to them. Here is a rather simple example of such a drawing: An electric drill machine. There are three key components on each drawing: the numbers, the objects, and the auxiliary lines. The auxiliary lines are used to connect the objects to the numbers. The task at hand will be to find all objects of a certain kind / class over a large number of drawings, e.g. the socket with number 653 in the image above appears in several drawings and even in drawings from other manufacturers. This is a typical classification task, but with a caveat: since there is additional information for each object accessible through the numbers, we need to assign each number on the image to the corresponding object first. Next, we describe how this auxiliary task can be solved using traditional computer vision techniques.

21 questions we ask our clients: Starting a successful ML project


Emilius Richter


Automating processes using machine learning (ML) algorithms can increase the efficiency of a system beyond human capacity and is thus becoming more and more popular in many industries. But between an idea and a well-defined project, there are several points that need to be considered in order to properly assess the economic potential and technical complexity of the project. Especially for companies like dida that offer custom workflow automation software, a well-prepared project helps to quickly assess the feasibility and the overall technical complexity of the project goals, which, in turn, makes it possible to deliver software that fulfills the client's requirements. In this article, we discuss which topics should be considered in advance and why the questions we ask are important for starting a successful ML software project.

Remote Sensing


Early Classification of Crop Fields through Satellite Image Time Series


Tiago Sanona


In a fast-paced and ever-changing global economy, classifying crop fields via remote sensing only at the end of a growth cycle does not provide the immediate insight that decision makers need. To address this problem, we developed a model that allows continuous classification of crop fields at any point in time and improves its predictions as more data becomes available. In practice, we developed a single model capable of delivering predictions about which crops are growing at any point in time based on satellite data. The data available at the time of inference could be a few images at the beginning of the year or a full time series of images from a complete growing cycle. This exceeds the capabilities of current deep learning solutions, which either only offer predictions at the end of the growing cycle or have to use multiple models specialized to return results at pre-specified points in time. This article details the key changes we made to the model described in a previous blog post, “Classification of Crop fields through Satellite Image Time Series”, which extend its functionality and improve its performance. The results presented in this article are based on a research paper recently published by dida. For more detailed information about this topic and other experiments on this model, please check out the original manuscript: “Early Crop Classification via Multi-Modal Satellite Data Fusion and Temporal Attention”.

The best (Python) tools for remote sensing


Emilius Richter


An estimated 906 Earth observation satellites are currently in orbit, providing science and industry with many terabytes of data every day. The satellites operate with both radar and optical sensors and cover different spectral ranges with varying spectral, spatial, and temporal resolutions. Due to this broad spectrum of geospatial data, it is possible to find new applications for remote sensing methods in many industrial and governmental institutions. On our website, you can find some projects in which we have successfully used satellite data and possible use cases of remote sensing methods for various industries. Well-known satellite systems and programs include Sentinel-1 (radar) and Sentinel-2 (optical) from ESA, Landsat (optical) from NASA, TerraSAR-X and TanDEM-X (both radar) from DLR, and PlanetScope (optical) from Planet. There are basically two types of geospatial data: raster data and vector data. Raster data are a grid of regularly spaced pixels, where each pixel is associated with a geographic location, and are represented as a matrix. The pixel values depend on the type of information that is stored, e.g., brightness values for digital images or temperature values for thermal images. The size of the pixels also determines the spatial resolution of the raster. Geospatial raster data are thus used to represent satellite imagery. Raster images usually contain several bands or channels, e.g. a red, green, and blue channel. In satellite data, there are also often infrared and/or ultraviolet bands. Vector data represent geographic features on the earth's surface, such as cities, country borders, roads, bodies of water, property rights, etc. Such features are represented by one or more connected vertices, where a vertex defines a position in space by x-, y-, and z-values. A single vertex is a point, multiple connected vertices form a line, and multiple (>3) connected and closed vertices are called polygons. The x-, y-, and z-values always refer to the corresponding coordinate reference system (CRS), which is stored in vector files as meta information. The most common file formats for vector data are GeoJSON, KML, and SHAPEFILE. In order to process and analyze these data, various tools are required. In the following, I will present the tools we at dida have had the best experience with and which are regularly used in our remote sensing projects, grouped into the following sections: requesting satellite data (EOBrowser, Sentinelsat, Sentinelhub), processing raster data (Rasterio, Pyproj, SNAP, pyroSAR, Rioxarray), processing vector data (Shapely, Python-geojson, Geojson.io, Geopandas, Fiona), providing geospatial data (QGIS, GeoServer, Leafmap), and processing meteorological satellite data (Wetterdienst, Wradlib).

Software Development


Managing layered requirements with pip-tools


Augusto Stoffel (PhD)


When building Python applications for production, it's good practice to pin all dependency versions, a process also known as “freezing the requirements”. This makes the deployments reproducible and predictable. (For libraries and user applications, the needs are quite different; in this case, one should support a large range of versions for each dependency, in order to reduce the potential for conflicts.) In this post, we explain how to manage a layered requirements setup without forgoing the improved conflict resolution algorithm introduced recently in pip. We provide a Makefile that you can use right away in any of your projects!

Project proposals - the first step to a successful ML project


Emilius Richter


Many machine learning (ML) projects are doomed to fail. This can be due to various reasons, which often occur in combination. To avoid failure, all involved stakeholders need to understand the technical and organizational requirements of the project. Besides all the preliminary discussions that define the project, it is important to summarize the project-relevant information in a comprehensive proposal. It should cover the technical and organizational requirements, possible problem areas, and technical restrictions. In this article, I will describe the most important modules of machine learning project proposals. For a software provider like dida, the project proposal is the first step towards meeting the needs of the customer.



Theory & Algorithms


Deep Learning vs Machine Learning: What is the difference?


Serdar Palaoglu


In the realm of artificial intelligence, two fundamental concepts, Machine Learning and Deep Learning, have emerged as key components in the advancement of computer-based learning systems. Machine Learning serves as a foundational principle where computers gain the ability to learn from data without explicit programming. Deep Learning, an evolution within the Machine Learning framework, utilizes artificial neural networks inspired by the human brain to achieve complex data analysis. This article delves into a comprehensive exploration of these domains, elucidating their differences, practical applications, and significance in artificial intelligence.

How ChatGPT is fine-tuned using Reinforcement Learning


Thanh Long Phan


At the end of 2022, OpenAI released ChatGPT (a Transformer-based language model) to the public. Although based on the already widely discussed GPT-3, it launched an unprecedented boom in generative AI. It is capable of generating human-like text and has a wide range of applications, including language translation, language modeling, and generating text for applications such as chatbots. Feel free to also read our introduction to LLMs . ChatGPT seems to be so powerful that many people consider it to be a substantial step towards artificial general intelligence. The main reason for the recent successes of language models such as ChatGPT lies in their size (in terms of trainable parameters). But making language models bigger does not inherently make them better at following a user's intent. A bigger model can also become more toxic and more likely to "hallucinate". To mitigate these issues and to more generally align models to user intentions, one option is to apply Reinforcement Learning. In this blog post, we will present an overview of the training process of ChatGPT, and have a closer look at the use of Reinforcement Learning in language modeling. Also interesting: Our aggregated collection of LLM content .


Managing layered requirements with pip-tools


Augusto Stoffel (PhD)


When building Python applications for production, it's good practice to pin all dependency versions, a process also known as “freezing the requirements”. This makes the deployments reproducible and predictable. (For libraries and user applications, the needs are quite different; in this case, one should support a large range of versions for each dependency, in order to reduce the potential for conflicts.) In this post, we explain how to manage a layered requirements setup without forgoing the improved conflict resolution algorithm introduced recently in pip. We provide a Makefile that you can use right away in any of your projects!

The best (Python) tools for remote sensing


Emilius Richter


An estimated 906 Earth observation satellites are currently in orbit, providing science and industry with many terabytes of data every day. The satellites operate with both radar and optical sensors and cover different spectral ranges with varying spectral, spatial, and temporal resolutions. Due to this broad spectrum of geospatial data, it is possible to find new applications for remote sensing methods in many industrial and governmental institutions. On our website, you can find some projects in which we have successfully used satellite data and possible use cases of remote sensing methods for various industries. Well-known satellite systems and programs include Sentinel-1 (radar) and Sentinel-2 (optical) from ESA, Landsat (optical) from NASA, TerraSAR-X and TanDEM-X (both radar) from DLR, and PlanetScope (optical) from Planet. There are basically two types of geospatial data: raster data and vector data. Raster data are a grid of regularly spaced pixels, where each pixel is associated with a geographic location, and are represented as a matrix. The pixel values depend on the type of information that is stored, e.g., brightness values for digital images or temperature values for thermal images. The size of the pixels also determines the spatial resolution of the raster. Geospatial raster data are thus used to represent satellite imagery. Raster images usually contain several bands or channels, e.g. a red, green, and blue channel. In satellite data, there are also often infrared and/or ultraviolet bands. Vector data represent geographic features on the earth's surface, such as cities, country borders, roads, bodies of water, property rights, etc. Such features are represented by one or more connected vertices, where a vertex defines a position in space by x-, y-, and z-values. A single vertex is a point, multiple connected vertices form a line, and multiple (>3) connected and closed vertices are called polygons. The x-, y-, and z-values always refer to the corresponding coordinate reference system (CRS), which is stored in vector files as meta information. The most common file formats for vector data are GeoJSON, KML, and SHAPEFILE. In order to process and analyze these data, various tools are required. In the following, I will present the tools we at dida have had the best experience with and which are regularly used in our remote sensing projects, grouped into the following sections: requesting satellite data (EOBrowser, Sentinelsat, Sentinelhub), processing raster data (Rasterio, Pyproj, SNAP, pyroSAR, Rioxarray), processing vector data (Shapely, Python-geojson, Geojson.io, Geopandas, Fiona), providing geospatial data (QGIS, GeoServer, Leafmap), and processing meteorological satellite data (Wetterdienst, Wradlib).