Kylberg Texture and Noise Dataset

During my Ph.D. studies at Uppsala University, I acquired and published three texture datasets. I have had requests to host these datasets at locations other than the university server. Here is an overview: link

In my PhD thesis (Section 4.4) I used a dataset with two similar textures where I changed the ISO setting on my camera to introduce different levels of noise. The figure below shows examples of each class and ISO level.

Number of texture classes: 2
Number of noise levels: 8
Number of texture samples per class and noise level: 486
Size of texture sample (w x h [pixels]): 192 x 192
Total number of texture samples: 7 776
File format: 8-bit PNG
Size on disk: ~193 MB
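
The total sample count follows directly from the other figures listed above. A minimal sketch of that arithmetic (the constant names are my own, chosen for illustration):

```python
# Dataset dimensions as listed above; the constant names are illustrative.
NUM_CLASSES = 2
NUM_NOISE_LEVELS = 8
SAMPLES_PER_CLASS_AND_LEVEL = 486

# Every class is imaged at every noise level.
samples_per_class = NUM_NOISE_LEVELS * SAMPLES_PER_CLASS_AND_LEVEL
total_samples = NUM_CLASSES * samples_per_class

print(samples_per_class)  # 3888
print(total_samples)      # 7776
```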


How to reference

If you use this dataset in your research or other work, please give a reference to my PhD thesis:

Kylberg, G. Automatic Virus Identification using TEM: Image Segmentation and Texture Analysis. PhD dissertation, Uppsala University, Uppsala, 2014. Available from:

From my thesis

Link to my thesis.

Section 4.4 touches upon the topic of invariant texture features, overfitting and generalization performance in texture recognition.

Using this dataset, I show that the applied texture descriptors can differentiate not only between the two texture classes but also between the noise levels. When the textures are subsampled to 1/4 of the original scale, the descriptors start to see only the two texture classes.
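
To make the subsampling step concrete, here is a minimal sketch using block averaging. It assumes that "1/4 of the original scale" means a factor of 4 along each axis (192 x 192 becomes 48 x 48). The thesis may have used a different resampling method, and the function name `subsample` is hypothetical.

```python
def subsample(image, factor=4):
    """Downscale a 2D list of pixel values by averaging factor x factor blocks.

    Block averaging is an assumption made for this sketch; it is not
    necessarily the resampling method used in the thesis.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be divisible by factor")
    out = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            block = [
                image[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Synthetic 192 x 192 checkerboard standing in for pixel-level variation.
image = [[255 if (x + y) % 2 else 0 for x in range(192)] for y in range(192)]
small = subsample(image, factor=4)
print(len(small), len(small[0]))  # 48 48
```

Averaging over 16 pixels smooths out pixel-level variation, which is one plausible reason the noise levels become harder to tell apart at the coarser scale.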

By considering different decision boundaries in the feature space projections in Fig. 4.9 below, we can easily find situations where a classifier would generalize badly to new noise levels. For example, consider using a horizontal line as a decision boundary, looking only at the ISO 100 data in the top-left plot. That boundary would handle the noisier data very badly.
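
The effect can be illustrated with a toy example. The feature values below are made up for illustration and are not measurements from the dataset. The sketch assumes a single feature whose value shifts upward as noise increases, and a threshold (the 1D analogue of the horizontal line) fitted on the low-noise data only.

```python
# Hypothetical 1D feature values for two texture classes, "A" and "B".
# These numbers are invented for illustration, not taken from the dataset.
low_noise = {"A": [1.0, 1.2, 0.9], "B": [3.0, 3.1, 2.8]}
high_noise = {"A": [3.4, 3.6, 3.3], "B": [5.2, 5.0, 5.3]}  # shifted by noise


def fit_threshold(train):
    """Place the boundary midway between the two class means."""
    mean_a = sum(train["A"]) / len(train["A"])
    mean_b = sum(train["B"]) / len(train["B"])
    return (mean_a + mean_b) / 2


def accuracy(data, threshold):
    """Fraction of samples on the correct side (A below, B at or above)."""
    correct = sum(v < threshold for v in data["A"])
    correct += sum(v >= threshold for v in data["B"])
    return correct / (len(data["A"]) + len(data["B"]))


threshold = fit_threshold(low_noise)
acc_low = accuracy(low_noise, threshold)
acc_high = accuracy(high_noise, threshold)
print(acc_low)   # 1.0
print(acc_high)  # 0.5
```

The boundary separates the low-noise data perfectly, but every noisy sample of class A lands on the wrong side, so accuracy on the noisy data drops to chance.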

Another interesting situation that can occur is that a classifier picks up differences in noise level between classes rather than differences between the textures. Without careful model validation and/or an understanding of the generated feature space, it is very difficult to know what characteristic properties the discrimination is actually based on.
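
One possible validation scheme for this kind of problem is to hold out one noise level at a time, so that the classifier is always tested on a noise level it has not seen. The sketch below is a suggestion and not the procedure used in the thesis. The function name, the sample records, and the `noise_level` and `texture` keys are all hypothetical.

```python
def leave_one_noise_level_out(samples):
    """Yield (held_out_level, train, test) splits, one per noise level.

    Each sample is a dict with at least a "noise_level" key; this record
    layout is an assumption made for the sketch.
    """
    levels = sorted({s["noise_level"] for s in samples})
    for held_out in levels:
        train = [s for s in samples if s["noise_level"] != held_out]
        test = [s for s in samples if s["noise_level"] == held_out]
        yield held_out, train, test


# Tiny stand-in for the dataset: 2 classes x 8 noise levels x 3 samples.
# Noise levels are indexed 0..7 here rather than by ISO value.
samples = [
    {"texture": texture, "noise_level": level, "index": i}
    for texture in ("A", "B")
    for level in range(8)
    for i in range(3)
]

folds = list(leave_one_noise_level_out(samples))
print(len(folds))  # 8
```

If accuracy on the held-out level is much lower than on the training levels, that suggests the classifier relies on noise characteristics rather than on the textures themselves.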

Figure 4.9. 2D LDA feature subspaces for each descriptor and scale. Each coloured dot represents a sample. The hue indicates ISO level, while saturation indicates the texture class. When applying LDA, both texture class and noise level are taken into account.