
How to fine-tune GitHub Copilot?


We can fine-tune language models like BERT or GPT-3.

Can I fine-tune the GitHub Copilot model?

I have already looked at the examples at https://copilot.github.com/ but can't find the details.

I would really appreciate it if someone who has fine-tuned GitHub Copilot could share how.


Solution

  • There does not seem to be a client-facing feature allowing you to fine-tune Copilot directly.

    Here are two illustrations of why this feature is, for now (Q2 2022), missing.

    The Copilot feature page initially included this:

    How will GitHub Copilot get better over time?

    GitHub Copilot doesn’t actually test the code it suggests, so the code may not even compile or run. GitHub Copilot can only hold a very limited context, so even single source files longer than a few hundred lines are clipped and only the immediately preceding context is used. And GitHub Copilot may suggest old or deprecated uses of libraries and languages. You can use the code anywhere, but you do so at your own risk.
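    The "very limited context" limitation above can be sketched as follows. This is a toy illustration, not Copilot's actual mechanism: the window size is an arbitrary placeholder, and real models clip by tokens rather than lines.

```python
# Toy illustration of context clipping: only the immediately
# preceding window of the file reaches the model. The window
# size is an arbitrary placeholder, not Copilot's real limit.

def clip_context(source_lines, window=200):
    """Keep only the last `window` lines as model context."""
    return source_lines[-window:]

file_lines = [f"line {i}" for i in range(1, 501)]  # a 500-line file
context = clip_context(file_lines, window=200)

print(len(context))   # 200
print(context[0])     # line 301 -- everything before it is dropped
```

    Anything outside that trailing window, such as a helper defined at the top of a long file, simply never reaches the model, which is why suggestions can ignore earlier parts of the same file.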

    As Tomek Korbak explains on Twitter:

    Actually, Copilot's completions will always be optimised for human's liking, not necessarily compiler's liking.

    That's because the language model training objective (predicting the next token in text) is great at capturing short-term dependencies (which explains the human feel of generated snippets).

    But it struggles to capture long-term, global, semantic properties of generated sequences such as compilability. And there's no easy way of including compilability as a signal for their training.

    The standard way -- fine-tuning language models using RL with compilability as a reward -- notoriously leads to catastrophic forgetting: less diverse and less accurate completions.

    Tomek references the paper "Energy-Based Models for Code Generation under Compilability Constraints".


    Our solution (KL-DPG) boosts compilability rate of generated sequences from 55% to 70%.
    RL fine-tuning can do better but at a cost of catastrophic forgetting.

    Overall, energy-based models (EBMs) turn out to be great at expressing weird, sequence-level constraints that would be super hard to express as normalised priors for autoregressive language models.

    EBMs provide a way of injecting our structured, symbolic knowledge into large language models without breaking them down or sacrificing their uncanny abilities.
    The space of further applications in controllable generation is huge.

    So not so easy.
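    To make the "compilability as a signal" point concrete, here is a toy sketch of such a reward, using Python's built-in compile() as a stand-in compiler check (a real setup would invoke the target language's compiler). Note that the reward only exists for a complete sequence, which is one reason it is awkward to use as a per-token training signal.

```python
# Toy sketch of a compilability reward of the kind RL fine-tuning
# would optimise. Python's built-in compile() stands in for a
# real compiler check.

def compiles(snippet: str) -> bool:
    """Return True if the snippet parses as valid Python."""
    try:
        compile(snippet, "<generated>", "exec")
        return True
    except SyntaxError:
        return False

def compilability_reward(snippet: str) -> float:
    """Binary, whole-sequence reward: 1.0 if it compiles, else 0.0."""
    return 1.0 if compiles(snippet) else 0.0

good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a +\n"  # truncated mid-expression

print(compilability_reward(good))  # 1.0
print(compilability_reward(bad))   # 0.0
```

    The reward is only defined once the whole snippet exists, and optimising it directly with RL is exactly the fine-tuning route that, per the thread above, tends to cause catastrophic forgetting.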

    Tanishq Mathew Abraham explains in "Coding with GitHub Copilot":

    I wonder if the GitHub team might also develop a way of perhaps fine-tuning GitHub Copilot to specific use-cases.

    For example, there may be specific GitHub Copilot models for fastai, JAX, etc. They would be fine-tuned on the source code of these libraries and on codebases that use them.

    But making sure that the tool does not provide outdated suggestions would still be a challenge.
    I don’t think it would be possible to provide suggestions for a brand-new library that does not have enough codebases using it to train on.

    Additionally, for situations like fastai where there are older APIs and newer APIs, when fine-tuning a model, the codebases using the older APIs would have to be filtered out.
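    That filtering step could be sketched as below. The deprecated identifiers are hypothetical placeholders, not fastai's actual deprecation list; a real pipeline would use the library's documented deprecations and likely AST-level analysis rather than a plain identifier scan.

```python
# Sketch of filtering a fine-tuning corpus to drop files that use
# an older API. DEPRECATED holds hypothetical old-API names, not
# fastai's real deprecation list.
import re

DEPRECATED = {"old_train_fn", "LegacyLearner"}  # hypothetical names

def uses_old_api(source: str) -> bool:
    """Crude check: does the file mention any deprecated identifier?"""
    identifiers = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source))
    return bool(identifiers & DEPRECATED)

corpus = {
    "a.py": "from lib import LegacyLearner\n",          # old API: drop
    "b.py": "from lib import Learner\nlearn = Learner()\n",  # keep
}
training_files = {name: src for name, src in corpus.items()
                  if not uses_old_api(src)}

print(sorted(training_files))  # ['b.py']
```

    Only files free of the old-API names survive into the fine-tuning set; everything else is excluded before training.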