tensorflow, machine-learning, text-recognition

Deciding how meaningful title / text is


I am trying to tackle the issue of meaningless commit messages and PR descriptions, and TensorFlow came to mind, in combination with a GitHub Action.

However, I am struggling to figure out how to define "meaningless" text, e.g.

Meaningless description: Added new folder to repository

Meaningful description: Added assets folder to house image files

Any pointers in the right direction are appreciated.


Solution

  • This is a text classification problem, and your use case is a fairly classic one. To classify a GitHub commit description as meaningful or meaningless, you need a large amount of labeled training data: description strings, each labeled as meaningful or meaningless. The usual approach to this kind of classification problem with TensorFlow and other deep learning libraries like Keras is to keep the training data in a .csv file with two columns, say:

    1. description (contains a commit's description string)
    2. result (contains a verdict like meaningful / meaningless or 1 / 0)

    You can then train a text classifier on this data and use the trained model to predict whether a given description is good or not.
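To make the data flow concrete, here is a minimal sketch of the load, train, predict loop on a two-column CSV. Note that it swaps in a tiny bag-of-words naive Bayes classifier written in plain Python rather than a TensorFlow model, so it runs without any dependencies. The sample rows, the `SAMPLE_CSV` name and the helper functions are all made up for illustration, and a real dataset would need far more rows.

```python
import csv
import io
import math
import re
from collections import Counter

# Hypothetical training data in the description/result layout (1 = meaningful).
SAMPLE_CSV = """description,result
Added new folder to repository,0
Fixed stuff,0
Update,0
Changes,0
Added assets folder to house image files,1
Fixed null pointer crash when config file is missing,1
Renamed login handler to match the auth module naming,1
Removed unused retry logic from the upload client,1
"""


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def load_rows(fileobj):
    """Read (description, label) pairs from a CSV file object."""
    return [(row["description"], int(row["result"]))
            for row in csv.DictReader(fileobj)]


def train(rows):
    """Count words per label; these counts are the whole 'model'."""
    word_counts = {0: Counter(), 1: Counter()}
    doc_counts = Counter()
    for text, label in rows:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the label with the higher naive Bayes log score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids log(0) for unseen pairs.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(load_rows(io.StringIO(SAMPLE_CSV)))
print(predict(model, "Fixed stuff"))  # prints 0
```

With only eight rows this mostly memorizes the training set. The point is just the shape of the pipeline: the same (description, result) pairs would feed a TensorFlow or Ludwig model instead.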

    I'd recommend you give Ludwig a try. It is an open source deep learning library that originated at Uber and is extremely easy to use for tasks like text classification. It was originally built atop TensorFlow (more recent releases use PyTorch).
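A rough sketch of what this could look like with Ludwig's Python API, assuming the CSV columns are named `description` and `result` as above. The helper names `build_config` and `train_and_predict` are hypothetical, and the exact config keys and return values may differ between Ludwig versions, so check the documentation for the version you install.

```python
def build_config():
    """Declare which CSV column is the input and which is the label."""
    return {
        "input_features": [{"name": "description", "type": "text"}],
        "output_features": [{"name": "result", "type": "category"}],
    }


def train_and_predict(csv_path, descriptions):
    """Train on a labeled CSV, then score new description strings.

    Assumes `ludwig` and `pandas` are installed; they are imported here
    so the config above can be inspected without them.
    """
    import pandas as pd
    from ludwig.api import LudwigModel

    model = LudwigModel(build_config())
    model.train(dataset=csv_path)  # csv_path: your labeled training file

    new_rows = pd.DataFrame({"description": descriptions})
    predictions, _ = model.predict(dataset=new_rows)
    return predictions  # a DataFrame with one row per input description
```

Using the `category` output type should work for both 1 / 0 and meaningful / meaningless labels. In a GitHub Action you would probably train once offline and only load the saved model in the workflow.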

    Hope that answers your query. Thanks!