azure cluster-analysis text-mining alteryx

Creating a DTM on Alteryx Designer


I am new to Alteryx and am trying to use it to analyse unstructured data. I have a column of descriptions in text form and intend to use the K-Means Clustering tool for topic modelling. For K-means to work on text, I need to convert the text into a Document Term Matrix (DTM) so that the terms appear as continuous variables to the clustering tool. However, I am struggling to find a way to convert my text to a DTM.

Does anyone know a way to do so? I am currently looking at the R tool but am not sure how to get started with it either. Hoping that all of you experts here can help me out!

I have looked through posts on text analysis and realized that most fall back on the Microsoft Azure ML Text Analysis Macro. However, I would like to avoid the macro, since its limited number of runs per month would restrict scalability, and instead use tools that are available in Alteryx.

Thanks to everyone in advance!


Solution

  • With Alteryx being more of a pictorial drag-and-drop workflow, it's not trivial to explain here. However, I've created the following workflow and included the actual workflow itself on the Alteryx forum here. The workflow computes term frequencies from inauguration speeches but should apply to any collection of documents. It just splits the text into words on various non-numeric characters and then does a summary. This is what the workflow looks like:

    [Image: screenshot of the Alteryx workflow]
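
For anyone who would rather see the idea as code than as a workflow picture, here is a minimal Python sketch of the same split-and-count approach. It is not the workflow from the answer. The function names (`tokenize`, `build_dtm`) and the sample sentences are made up for illustration, and it assumes the split is on runs of non-alphanumeric characters, which may differ from the delimiters used in the actual workflow.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_dtm(documents):
    """Return (vocabulary, matrix), where matrix[i][j] is the number of
    times vocabulary[j] occurs in documents[i]."""
    # One term-frequency table per document (the "summary" step).
    counts = [Counter(tokenize(doc)) for doc in documents]
    # The sorted union of all terms gives a stable column order.
    vocabulary = sorted(set().union(*counts))
    # A Counter returns 0 for terms a document does not contain.
    matrix = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, matrix


# Hypothetical sample data, one string per document (row).
docs = [
    "The quick brown fox.",
    "The lazy dog; the quick dog!",
]
vocabulary, matrix = build_dtm(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Each row of `matrix` is one document and each column is one term, so the rows are plain numeric vectors of the kind a clustering tool expects. On real descriptions you would probably also want to drop stop words and very rare terms before clustering, otherwise the matrix gets very wide.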