Hi experts,
We are currently working on a predictive analytics use case where we want to cluster persons based on attributes that are assigned to them. The basic data we have looks schematically like this:
ID | Attribute Value |
---|---|
1 | A |
1 | B |
1 | C |
1 | E |
2 | A |
2 | B |
2 | C |
3 | D |
3 | E |
4 | A |
In order to perform the clustering, we transform this representation into a matrix of boolean entries (0/1):
ID | Attribute A | Attribute B | Attribute C | Attribute D | Attribute E |
---|---|---|---|---|---|
1 | 1 | 1 | 1 | 0 | 1 |
2 | 1 | 1 | 1 | 0 | 0 |
3 | 0 | 0 | 0 | 1 | 1 |
4 | 1 | 0 | 0 | 0 | 0 |
Let's call this table the input table.
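For illustration, the tall-to-wide transformation above can be sketched in pandas (this is only how we derive the input table outside HANA; the column names and the `tall` DataFrame are made up to match the example data):

```python
import pandas as pd

# Tall representation from the example: one row per (person, attribute) pair.
tall = pd.DataFrame({
    "ID":        [1, 1, 1, 1, 2, 2, 2, 3, 3, 4],
    "Attribute": ["A", "B", "C", "E", "A", "B", "C", "D", "E", "A"],
})

# Pivot into the wide 0/1 input table: one row per person,
# one column per attribute, 1 if the person has that attribute.
input_table = (
    pd.crosstab(tall["ID"], tall["Attribute"])
      .clip(upper=1)  # guard against duplicate (ID, attribute) rows
)
print(input_table)
```

With many distinct attributes, each distinct attribute value becomes one column of `input_table`, which is exactly where the column count grows beyond 1000.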
In order to call a PAL algorithm (such as hierarchical clustering), we have to persist this input table as a column table that we hand over to the PAL procedure. And here we think we face a limitation: according to the HANA guides, the number of columns in a table is limited to 1000. If we created the input table across all attributes, we would have more than 1000 attributes, and hence more than 1000 columns, which would not be possible.
Now my question(s):
- Am I completely wrong with this statement concerning the limitation, or did I misunderstand something in the PAL guideline?
- Has anybody faced the same restriction, and if so, how did you overcome it?
Any hints are appreciated.
BR
Christian