Internship Projects/Mentors


Title

Use of NLP for Telco Data (logs) Anonymization

Status

NOT ACCEPTING APPLICATIONS - NOW IN EVALUATION STAGE

Difficulty

MEDIUM


Description 

This work explores the feasibility and effectiveness of NLP techniques for anonymizing Telco Data (logs). The goal is to answer the following questions:

  1. Are sufficient, usable datasets available in the public domain to carry out this work?
  2. What techniques are currently used for anonymizing log data?
  3. Do NLP-based techniques offer any efficiency gain compared with existing techniques?
  4. Are the available libraries and tools (e.g., Presidio) sufficient?
  5. What types of log data are suitable for NLP-based techniques?
  6. Does anonymizing log data affect ML techniques applied to it (e.g., predictive power, detection accuracy)?


Apart from answering the above questions, the outcome of this work also includes a tool that takes log data and anonymizes it using an NLP-based approach.
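To make the comparison in questions 2, 3, and 6 concrete, the sketch below shows what a simple rule-based baseline might look like. It is not the NLP-based tool this project aims to build. It is a minimal illustration, using only the Python standard library, of replacing IPv4 addresses and email addresses in a log line with salted, deterministic pseudonyms. The patterns, the function names (`pseudonym`, `anonymize_line`), the salt value, and the sample log line are all assumptions made for illustration. Real Telco logs would contain many more identifier types.

```python
import hashlib
import re

# Hypothetical patterns for two identifier types; a real tool would need more.
# EMAIL is applied before IP so an address like user@10.0.0.1 is handled once.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}


def pseudonym(kind, value, salt="demo-salt"):
    """Return a stable placeholder such as <IP_1a2b3c4d> for a sensitive value."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:8]
    return f"<{kind}_{digest}>"


def anonymize_line(line, salt="demo-salt"):
    """Replace every match of each pattern with its pseudonym."""
    for kind, pattern in PATTERNS.items():
        line = pattern.sub(
            lambda match, k=kind: pseudonym(k, match.group(0), salt), line
        )
    return line


if __name__ == "__main__":
    # Made-up log line, for illustration only.
    print(anonymize_line("login user=alice@example.com src=10.0.0.5"))
```

Because the same input value always maps to the same placeholder (for a fixed salt), relationships between log lines are preserved. That property is one factor that could influence the answer to question 6. An NLP-based approach would aim to catch identifiers that fixed patterns like these miss.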

Additional Information

Because the request is for a part-time position, Thoth is teaming with the general Anuket project so that the intern can work on both projects.

Learning Objectives

Working on this project will help the student to:

1. Understand Telco data and the need for anonymization.

2. Understand different techniques and methodologies of anonymization.

3. Master the use of NLP for anonymization.

Expected Outcome

A tool that takes original data and outputs anonymized data.

Relation to LF Networking 

Anuket, Thoth

Education Level

Undergrad (BE)

Skills

  1. Python
  2. Basics of Data Analytics and ML.
  3. Basics of NLP.

Future plans

This tool will get merged with other anonymization techniques.

Preferred Hours and Length of Internship

3 Months Part-Time or 1.5 Months Full-Time (½ of the LF Mentorship Program duration).

Mentor(s) Names and Contact Info

Mentor: Sridhar Rao  srao@linuxfoundation.org, sridharkn, The Linux Foundation.

To apply, please do the following:

  • Send an email to the following:
  • Include your name, resume, and a statement of why you would be best for this project.
  • Due to the volume of applications, we may not respond until April 23rd.
    • Please be patient. 




3 Comments

  1. Respected Mentors Casey Cain, Sridhar Rao, Gergely Csatari
    Could you please check whether the application link is working? When I press "Click here to apply", the link does not redirect to any page.

    Thanking You,
    Shashank Shekhar Singh

    1. Seems the mailto link macro isn't working with multiple email addresses.

      I've updated the instructions.

      Thanks!

  2. Respected Mentors Sridhar Rao, Casey Cain
    I wanted to ask whether there are any tasks to complete for the selection process.
    I have read all the prerequisites and I think I am a good fit for the project. What other steps could I take to improve my chances of selection?
    Thanks!