Data is chaos and perhaps in this chaos you would find the most wonderful patterns — venali sonone

A guide to optimization problems with Google OR-Tools for Python

“Every human being has a ‘good heart’ and another ‘evil heart’!
Optimizing both is required to excel; it is a non-trivial quest.”

Topics Covered

  1. Optimization problem, what is it?
  2. Guide to Linear Optimization — Solver glop and Simplex algorithm
  3. Guide to Linear Optimization
  4. Guide to Integer Optimization — MIP Solver
  5. Guide to Integer Optimization — Solving a MIP Problem
  6. Guide to Integer Optimization — Using Arrays to Define a Model
  7. Constraint Optimization — CP-SAT Solver
  8. Constraint Optimization — Using a CP-SAT Problem
  9. Constraint Optimization — Solving a CP-SAT Problem
  10. Constraint Optimization — Cryptarithmetic Puzzles
  11. Constraint Optimization — The N-queens…



In the complete guide to NLP with fastai

Follow the link to the entire series by clicking here: The complete guide to NLP with fastai

See also The Annotated Transformer from Harvard NLP.

Attention and the Transformer

Nvidia AI researcher Chip Huyen wrote a great post, Top 8 trends from ICLR 2019, in which one of the trends is that the RNN is losing its luster with researchers.

There’s a good reason for this: RNNs can be a pain. Parallelization can be tricky and they can be difficult to debug. …



Part 4: Bidirectional and Attention RNN

In the complete guide to NLP with fastai

Follow the link to the entire series by clicking here: The complete guide to NLP with fastai

This post will put together everything we have learned up to this point and then introduce attention in translation with an RNN.

This is exciting because the performance over our journey of learning NLP can be summarized as below:

[Image: performance summary]

Excited???
Let’s get started…

Translation with an RNN

In this post, we will be tackling the task of translation. We will be translating from French to English, and to keep our task a manageable size, we will limit ourselves to translating questions.

This task is an example of a sequence-to-sequence (seq2seq) problem. …



Part 3: Teacher Forcing

In the complete guide to NLP with fastai

Follow the link to the entire series by clicking here: The complete guide to NLP with fastai

This will be a long post in which we will build an RNN and explain the code behind it. Next, using our RNN, we will introduce the concept of teacher forcing and why one may use it.

So let’s get started…

Predicting the English word version of numbers using an RNN

We were using RNNs as part of our language model in the previous lesson. Today, we will dive into more details of what RNNs are and how they work. We will do this using the problem of trying to predict the English word version of numbers.

Let’s predict what should come next in this sequence:

eight thousand one , eight thousand two , eight thousand three , eight thousand four , eight thousand five , eight thousand six , eight thousand seven , eight thousand eight , eight thousand nine , eight thousand ten , eight thousand eleven , eight thousand…



Part 1: Review Embeddings

In the complete guide to NLP with fastai

Follow the link to the entire series by clicking here: The complete guide to NLP with fastai

Regex workflow

In [6]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline
import pandas as pd
import re

This review is based on Jeremy Howard’s lecture, which follows a three-part plan:

* regex workflow
* SVD
* transfer learning

Regex is used every day in NLP work, and it is essential for machine learning practitioners to develop a working knowledge of it. Since we've already done deep dives into SVD and into transfer learning, we'll focus on the regex part of this review.

A simple regex exercise

To illustrate the power of regex, let's work through the following problem:
Let's extract all the phone numbers from the Austin Public Health Locations database and create a list of the phone numbers in the standard format (ddd) ddd dddd. …
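
A minimal sketch of this exercise, assuming the phone numbers appear in free text in a few common formats. The sample string and the helper name extract_phone_numbers are hypothetical and are not taken from the Austin Public Health Locations data.

```python
import re

# Three digits, an optional separator, three digits, an optional separator,
# then four digits. The area code may be wrapped in parentheses.
PHONE_RE = re.compile(r"\(?\b(\d{3})\)?[-. ]?(\d{3})[-. ]?(\d{4})\b")

def extract_phone_numbers(text):
    """Return every phone number found in text, formatted as (ddd) ddd dddd."""
    return [f"({area}) {prefix} {line}" for area, prefix, line in PHONE_RE.findall(text)]

# Hypothetical sample text, for illustration only.
sample = "Call 512-972-5000 or (512) 972 4242. Suite 101 is not a phone number."
print(extract_phone_numbers(sample))
```

The capture groups keep only the digits, so whatever separators appear in the input, every number can be rewritten in one standard format.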



Part 2: Bleu metrics

In the complete guide to NLP with fastai

Follow the link to the entire series by clicking here: The complete guide to NLP with fastai

What is the BLEU metric?

The BLEU metric was introduced in this article as a way to evaluate the performance of translation models. It is based on the precision of the n-grams in your prediction compared to your target. Let's walk through an example. Imagine you have the target sentence

the cat is walking in the garden

and your model gives you the following output

the cat is running in the fields

We are going to compute the precision, which is the number of correctly predicted n-grams divided by the number of predicted n-grams for n going from 1 to 4. …
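
As a rough sketch, the precision for each n can be computed on the two sentences above as follows. The helper names ngrams and ngram_precision are made up for this illustration, and the counts are clipped by the target counts (an assumption that follows standard BLEU; it does not change the result for this example).

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_precision(pred, target, n):
    """Return (correct, total): matching predicted n-grams and all predicted n-grams."""
    pred_counts = Counter(ngrams(pred, n))
    target_counts = Counter(ngrams(target, n))
    # Assumption: clip each predicted n-gram's count by its count in the target.
    correct = sum(min(count, target_counts[gram]) for gram, count in pred_counts.items())
    total = sum(pred_counts.values())
    return correct, total

target = "the cat is walking in the garden".split()
pred = "the cat is running in the fields".split()

precisions = [ngram_precision(pred, target, n) for n in range(1, 5)]
for n, (correct, total) in enumerate(precisions, start=1):
    print(f"{n}-gram precision: {correct}/{total}")
```

For these sentences the precisions come out as 5/7, 3/6, 1/5 and 0/4 for n = 1 to 4.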



Part 1: Language Models

In the complete guide to NLP with fastai

Follow the link to the entire series by clicking here: The complete guide to NLP with fastai

Transfer Learning for Natural Language Modeling

Constructing a Language Model and a Sentiment Classifier for IMDB movie reviews

Transfer learning has been widely used with great success in computer vision for several years, but only in the last year or so has it been successfully applied to NLP (beginning with ULMFiT, which we will use here, and which BERT and GPT-2 later built upon).

As Sebastian Ruder wrote in The Gradient last summer, NLP’s ImageNet moment has arrived.

We will first build a language model for IMDB movie reviews. Next, we will build a sentiment classifier, which will predict whether a review is negative or positive, based on its text. For both of these tasks, we will use transfer learning. Starting with the pre-trained weights from the wikitext-103 language model, we will tune these weights to specialize in the language of IMDb movie reviews. …



Part 2: Transfer learning

In the complete guide to NLP with fastai

Follow the link to the entire series by clicking here: The complete guide to NLP with fastai

Transfer Learning

We are going to create an IMDb language model starting with the pre-trained weights from the wikitext-103 language model.

Now let’s grab the full IMDb dataset for what follows.

In [4]:
path = untar_data(URLs.IMDB)
path.ls()
Out[4]:
[WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/data_clas.pkl'),
....
WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/vocab_lm.pkl')]
In [12]:
(path/'train').ls()
Out[12]:
[WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/train/labeledBow.feat'),
...
WindowsPath('C:/Users/cross-entropy/.fastai/data/imdb/train/unsupBow.feat')]

The reviews are split into a training set and a test set, following an ImageNet-style folder structure. The only difference is that there is an unsup folder in train that contains the unlabelled data.

We’re not going to train a model that classifies the reviews from scratch. Like in computer vision, we’ll use a model pre-trained on a bigger dataset (a cleaned subset of Wikipedia called wikitext-103). That model has been trained to guess the next word, its input being all the previous words. It has a recurrent structure and a hidden state that is updated each time it sees a new word. This hidden state thus contains information about the sentence up to that point. …
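
To make the idea of a hidden state concrete, here is a toy sketch of a plain tanh recurrent step in pure Python. Everything in it is an assumption made for illustration: the weights and embeddings are random, the vocabulary is invented, and it is not the architecture or the weights of the pretrained wikitext-103 model.

```python
import math
import random

random.seed(0)
hidden_size = 4
vocab = ["this", "movie", "was", "great"]  # hypothetical toy vocabulary

def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

embeddings = {w: [random.uniform(-0.5, 0.5) for _ in range(hidden_size)] for w in vocab}
W_x = rand_matrix(hidden_size, hidden_size)  # input-to-hidden weights
W_h = rand_matrix(hidden_size, hidden_size)  # hidden-to-hidden weights

def rnn_step(hidden, word):
    """New hidden state = tanh(W_x @ embedding(word) + W_h @ previous hidden)."""
    x = embeddings[word]
    return [
        math.tanh(sum(W_x[i][j] * x[j] + W_h[i][j] * hidden[j] for j in range(hidden_size)))
        for i in range(hidden_size)
    ]

hidden = [0.0] * hidden_size  # empty state before any word is seen
for word in vocab:
    hidden = rnn_step(hidden, word)  # the state is updated once per word
    print(word, [round(h, 3) for h in hidden])
```

Because each new state is computed from the previous state and the current word, the final vector depends on every word seen so far.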



Part 3: Sentiment Classification

In the complete guide to NLP with fastai

Follow the link to the entire series by clicking here: The complete guide to NLP with fastai

Building an IMDb Sentiment Classifier

We’ll now use transfer learning to create a classifier, again starting from the pretrained weights of the wikitext-103 language model. We'll also need the IMDb language model encoder that we saved previously.

A. Load and preprocess the data, and form a databunch

Using fastai’s flexible API, we will now create a different kind of databunch object, one that is suitable for a classifier rather than for a language model (as we did in 2A). This time we'll keep the labels for the IMDb movie reviews data.

Add the try-except wrapper workaround for the bug in the fastai Text…



In the complete guide to NLP with fastai

Follow the link to the entire series by clicking here:
The complete guide to NLP with fastai

Regex

In this blog, we’ll learn about a useful tool in the NLP toolkit: regex.
Let’s consider two motivating examples:

1. The phone number problem

Suppose we are given some data that includes phone numbers:

123-456-7890
123 456 7890
101 Howard

Some of the phone numbers have different formats (hyphens, no hyphens). Also, there are some errors in the data — 101 Howard isn’t a phone number! How can we find all the phone numbers?
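
One possible way to answer this with regex is sketched below. It assumes the separators are plain ASCII hyphens or spaces and that each entry sits on its own line; the name PHONE_LINE is chosen here for illustration.

```python
import re

# Exactly ddd, a separator, ddd, a separator, dddd.
# Assumption: the separator is an ASCII hyphen or a single space.
PHONE_LINE = re.compile(r"\d{3}[- ]\d{3}[- ]\d{4}")

lines = ["123-456-7890", "123 456 7890", "101 Howard"]

# fullmatch requires the whole line to fit the pattern, so "101 Howard" is rejected.
phone_numbers = [line for line in lines if PHONE_LINE.fullmatch(line)]
print(phone_numbers)  # ['123-456-7890', '123 456 7890']
```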

2. Creating our own tokens

In the previous lessons, we used sklearn or fastai to tokenize our text. …

About

Venali Sonone

Data Scientist by profession and just lazy by nature.
