How can one get all Khan Academy topics for one subject and/or grade only?


Using the Khan Academy API, I would like to retrieve a list of all math topics and sub-topics for a certain grade (along with the related video ids), similar to what you can see here: https://www.khanacademy.org/math/cc-seventh-grade-math

Ideally, I would like to pass the grade (7th) and subject (math) as parameters in an API call. Is this possible?

Looking at the full topic tree, 'domain-slug' appears to be the closest thing to 'subject' in the way that I'm using the word, but it doesn't appear to be set consistently. I also don't see a dedicated field for grade.

How would you go about achieving this? Any advice would be most appreciated. Thanks.


Solution

  • I don't use the topic tree API call, because it returns about 50 MB of data. Instead, I traverse the nodes of the tree individually using the API call "http://www.khanacademy.org/api/v1/topic/%s", where %s is the "node_slug" field, starting with a "node_slug" of "root".

    From there, you use the "children" and "child_data" entries to traverse the sub-nodes. "children" holds the details of each child, while "child_data" basically just gives the order in which they appear.

    For each node, there are two important fields to look at, "kind" and "render_type".

    "kind" can have the values:

    • "Topic"
    • "Video"
    • "Exercise"
    • "Article"
    • "Scratchpad"
    • "Separator"

    "render_type" can have the values:

    • "Root"
    • "Domain"
    • "Subject"
    • "Topic"
    • "Tutorial"
    • "UncuratedTutorial"

    So, from "root" you iterate through the child nodes looking for nodes with "render_type" = "Domain". That will give you stuff like "math", "science", etc. Now you can use the "math" node to iterate through the subjects under it, looking for "render_type" = "Subject". Among those you will find 7th Grade, etc.

    Note: Both domain and subject nodes have "kind" = "Topic", so first check that a node's "kind" is "Topic", and then use "render_type" to tell a domain from a subject.
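
    As a rough sketch of the lookup described above (not a definitive implementation), the following Python fetches one node at a time and filters its children by "kind" and "render_type". It assumes the endpoint responds as described in this answer, that each entry in "children" carries a "node_slug", and that "render_type" has to be read from the fetched child node; the function names fetch_topic and child_topics are made up for illustration.

```python
import json
from urllib.request import urlopen

# URL pattern quoted in the answer; %s is replaced by the node_slug.
TOPIC_URL = "http://www.khanacademy.org/api/v1/topic/%s"


def fetch_topic(node_slug):
    """Download a single topic node and return it as a dict."""
    with urlopen(TOPIC_URL % node_slug) as response:
        return json.load(response)


def child_topics(node, render_type, fetch=fetch_topic):
    """Return the fetched children of `node` that are topics
    with the requested render_type (e.g. "Domain" or "Subject")."""
    matches = []
    for child in node.get("children", []):
        if child.get("kind") != "Topic":
            continue  # skip videos, exercises, separators, etc.
        # Assumption: the child entry has a "node_slug" we can fetch by.
        full_node = fetch(child["node_slug"])
        if full_node.get("render_type") == render_type:
            matches.append(full_node)
    return matches


# Possible usage (performs network requests, so it is left commented out):
# root = fetch_topic("root")
# domains = child_topics(root, "Domain")
# math = next(d for d in domains if d.get("node_slug") == "math")
# subjects = child_topics(math, "Subject")
```

    The fetch parameter is only there so that a cached or fake loader can be swapped in without touching the filtering logic.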

    I also cache the JSON responses so that the application doesn't have to reload them from the website every time. I have an option to refresh them from the website when needed.
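
    One way such a cache could look, as a minimal sketch: each response is stored as a JSON file on disk and re-downloaded only when a refresh is requested. The directory name "ka_cache", the one-file-per-slug layout, and the names download_topic and cached_topic are assumptions for illustration, not something the API prescribes.

```python
import json
import os
from urllib.request import urlopen

# URL pattern quoted in the answer; %s is replaced by the node_slug.
TOPIC_URL = "http://www.khanacademy.org/api/v1/topic/%s"
CACHE_DIR = "ka_cache"  # hypothetical directory name


def download_topic(node_slug):
    """Download a single topic node and return it as a dict."""
    with urlopen(TOPIC_URL % node_slug) as response:
        return json.load(response)


def cached_topic(node_slug, refresh=False, download=download_topic,
                 cache_dir=CACHE_DIR):
    """Return the node from the on-disk cache, downloading it when
    it is not cached yet or when refresh=True."""
    os.makedirs(cache_dir, exist_ok=True)
    # Replace "/" so the slug is always usable as a file name.
    path = os.path.join(cache_dir, node_slug.replace("/", "_") + ".json")
    if not refresh and os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    node = download(node_slug)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(node, f)
    return node
```

    Calling cached_topic("root") twice would then hit the website only once, and cached_topic("root", refresh=True) forces a fresh download.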

    Then you can use the subject node to further iterate through its children for the videos, exercises, articles, etc.
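
    That last step could be sketched as a recursive walk, assuming sub-topics have to be fetched by their "node_slug" while videos can be read directly from the parent's "children" list. Here fetch is a hypothetical callable that maps a "node_slug" to a node dict (for example a cached loader), and collect_videos is a made-up name.

```python
def collect_videos(node, fetch):
    """Walk the tree below `node` depth-first and return every
    child entry whose kind is "Video", in the order encountered."""
    videos = []
    for child in node.get("children", []):
        kind = child.get("kind")
        if kind == "Video":
            videos.append(child)
        elif kind == "Topic":
            # Assumption: sub-topics must be fetched to see their children.
            sub_node = fetch(child["node_slug"])
            videos.extend(collect_videos(sub_node, fetch))
        # Exercises, articles, scratchpads and separators are ignored here.
    return videos
```

    The entries are returned whole, so whichever id or title fields the response actually contains can be read from them afterwards. Swapping the "Video" check for "Exercise" or "Article" would collect those instead.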