Getting the list of ALL topic names from Freebase

According to Freebase, they have 23,407,174 topics. What is the easiest way to get the UI-friendly names of ALL of these topics (essentially the 'text' attribute of the topic JSON; an example of a single topic JSON is here)? I don't need any other metadata.


  • wget -O - | bunzip2 | cut -f 2 > freebase-topic-names.txt

    although you probably want the Freebase IDs as well so that you know what the names refer to:

    wget -O - | bunzip2 | cut -f 1,2

    Two additional bits of postprocessing are needed:

    1. Tabs are escaped as \t
    2. The string \N represents a null (non-existent) name
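
The two postprocessing steps could be sketched as a small filter placed after the `cut -f 1,2` step. This is only a sketch under assumptions: `clean_topic_names` is a hypothetical helper name, the input is assumed to be one `id<TAB>name` pair per line, and the escaped tab is turned into a space rather than a real tab so that the output stays two columns wide.

```shell
# Hypothetical helper (not part of any Freebase tooling).
# Reads "id<TAB>name" lines on stdin, e.g. the output of `cut -f 1,2`.
clean_topic_names() {
  awk -F '\t' -v OFS='\t' '
    # Step 2: a name of \N means the topic has no name, so skip the line.
    $2 == "\\N" { next }
    # Step 1: replace each escaped tab (backslash followed by t) in the name.
    # Assumption: a space is an acceptable stand-in for the original tab.
    { gsub(/\\t/, " ", $2); print }
  '
}
```

If only the names are wanted, the same filter can be followed by `cut -f 2`. The dump may escape other characters in a similar way; that is not covered here.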