Video Classification using CNN + RNN


This article explores how to perform video classification using CNN+RNN models. The use case is a system that monitors the driver of a car and checks whether the driver is drowsy. The complete system would include a camera device, installed in the car, that hosts the trained model. The hardware deployment part is out of the scope of this article; if you are interested in it, feel free to ask in the comments and I'll share the link to my code there. The focus of this article is building and comparing video classification models.

A video is by nature a sequence of consecutive frames. When it comes to video classification or event recognition, it is often necessary to process multiple frames together to make sense of what is happening. This article uses an approach that combines the image feature extraction of convolutions with the temporal processing of an RNN. There are different ways to do this. Initially, I started with ConvLSTM (Convolutional Long Short-Term Memory). However, I realised that its overall accuracy was low and it overfitted. While trying different methods to optimise that model, I also explored further and tried LRCN (Long-term Recurrent Convolutional Network). LRCN produced better stability, showed much less sign of overfitting, and also trained faster. I'll give an overview of both approaches before sharing the step-by-step code.

Approach 1: ConvLSTM (Convolutional Long Short-Term Memory)

This approach uses an existing Keras layer (ConvLSTM2D layer, n.d.). The structure of a ConvLSTM network is essentially the same as that of a typical LSTM. The difference is that inside each cell, the matrix multiplications between the input and the weights in the input, forget and output gates are replaced with convolutions. This way, the network can take image inputs directly, extract features inside the LSTM cells and feed them to the next time step, where they are combined with the convolution result of the next input frame. A research paper by Medel (2016) gives a good grasp of the concept and the technical details.

The diagram below was drawn based on my understanding of the ConvLSTM model and our implementation in this project.

ConvLSTM model
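For readers who prefer code to diagrams, below is a minimal Keras sketch of such a model. It is not the exact architecture used in this project; the 20-frame, 64x64 input shape, the filter counts and the dropout values are illustrative assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative input: sequences of 20 RGB frames resized to 64x64 (assumed values).
SEQ_LEN, H, W, C = 20, 64, 64, 3

convlstm_model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, C)),
    # The convolutions happen inside the recurrent cell at every time step.
    layers.ConvLSTM2D(filters=16, kernel_size=(3, 3), activation='tanh',
                      recurrent_dropout=0.2, return_sequences=True),
    layers.TimeDistributed(layers.MaxPooling2D(pool_size=(2, 2))),
    layers.ConvLSTM2D(filters=32, kernel_size=(3, 3), activation='tanh',
                      recurrent_dropout=0.2, return_sequences=False),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(2, activation='softmax'),  # two classes: drowsy vs alert
])
convlstm_model.compile(optimizer='adam', loss='categorical_crossentropy',
                       metrics=['accuracy'])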

Approach 2: LRCN (Long-term Recurrent Convolutional Network)

Compared to the ConvLSTM approach, LRCN processes each image with convolutional layers first before feeding it into the LSTM cell. It therefore performs fewer rounds of convolution computation, because what goes into the LSTM is the flattened vector output of the earlier convolutional layers. The diagram below illustrates the architecture of the model.

LRCN Model

One key difference between ConvLSTM and LRCN is the input: the inputs (x0, x1, x2, …, xt) in ConvLSTM are the picture data directly, while the inputs in LRCN are the vectors produced by the CNN layers.
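Here is an equivalent hedged sketch of an LRCN model in Keras. Again, the layer sizes and the 20-frame, 64x64 input are illustrative assumptions rather than the exact configuration used in this project; the point is the structure: a TimeDistributed CNN extracts a feature vector per frame, and an LSTM processes the resulting sequence of vectors.

import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, H, W, C = 20, 64, 64, 3  # assumed input shape

lrcn_model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, H, W, C)),
    # CNN feature extraction applied to every frame independently.
    layers.TimeDistributed(layers.Conv2D(16, (3, 3), padding='same', activation='relu')),
    layers.TimeDistributed(layers.MaxPooling2D((4, 4))),
    layers.TimeDistributed(layers.Conv2D(32, (3, 3), padding='same', activation='relu')),
    layers.TimeDistributed(layers.MaxPooling2D((4, 4))),
    layers.TimeDistributed(layers.Dropout(0.25)),
    # Each frame's feature map is flattened into a vector before the LSTM.
    layers.TimeDistributed(layers.Flatten()),
    layers.LSTM(32),
    layers.Dense(2, activation='softmax'),  # two classes: drowsy vs alert
])
lrcn_model.compile(optimizer='adam', loss='categorical_crossentropy',
                   metrics=['accuracy'])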


Data Acquisition and Preprocessing

To start with, we need data for training. Fortunately, there is a good labelled dataset from National Tsing Hua University (Ching-Hua Weng, 2016), which the authors kindly and generously shared.

Dataset Info

Each video is about 100 seconds long and there is a mixture of drowsy sections and alert sections. For example, one of the videos has the following frames:

Frame-wise Labelling

As you can see, each video contains frames of different classes. For model training, each video fed to the model needs to be of a single class, so the dataset cannot be used as-is. I therefore decided to make video clips in which every frame has the same label (all "0" frames or all "1" frames). The design is to build 10-second clips of a single class out of the given dataset. The assumption (a human judgement) is that a 10-second video is long enough to give good information for determining whether the driver is drowsy, while being short enough to process with limited compute power.

The training and testing data was therefore produced by cutting the source videos into 10-second clips. To do this, I developed a function based on OpenCV that reads 10 seconds of frames (10 s × 30 fps = 300 frames) of the same label and saves them as a video file. Frame by frame, the corresponding label in the txt file is checked before the frame is saved, which makes sure each clip contains only one class (drowsy or non-drowsy).

# Import OpenCV
import cv2
# Import operating system utilities
import os
# Import matplotlib
from matplotlib import pyplot as plt

# Using batch to track videos
batch = 26

# Establish capture object
cap = cv2.VideoCapture(os.path.join('data', 'sleepyCombination.avi'))

# Properties that can be useful later.
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
fps = cap.get(cv2.CAP_PROP_FPS)
total_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)

# The length of each clip in seconds
num_sec = 10
# 1 for drowsy, 0 for alert.
capture_value = str(1)

# Open the text file with the frame-wise labels
with open(os.path.join('data', '009_sleepyCombination_drowsiness.txt')) as f:
    content = f.read()

# Helper counters.
# i tracks how many frames have been saved into clips.
# j tracks where the current frame is.
i = 0
j = 0

# Make sure the number of labels is exactly the same as the number of frames before proceeding.
print(len(content))
print(total_frames)

# Video writer
# If there are still enough frames to be processed, make the 1st video clip.
if j < total_frames - num_sec * fps:
    video_writer = cv2.VideoWriter(
        os.path.join('data', 'class' + capture_value,
                     'class' + capture_value + '_batch_' + str(batch) + '_video' + '1.avi'),
        cv2.VideoWriter_fourcc('P', 'I', 'M', '1'), fps, (width, height))
    # Loop through each frame
    for frame_idx in range(int(cap.get(cv2.CAP_PROP_FRAME_COUNT))):
        # Read frame
        ret, frame = cap.read()
        j += 1
        # Show image
        # cv2.imshow('Video Player', frame)

        if ret == True:
            if content[frame_idx] == capture_value:
                # Write out frame
                video_writer.write(frame)
                i += 1
                if i > num_sec * fps:
                    break
        # Breaking out of the loop
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

    # Release video writer
    video_writer.release()

# Making the 2nd clip
if j < total_frames - num_sec * fps:
    video_writer = cv2.VideoWriter(
        os.path.join('data', 'class' + capture_value,
                     'class' + capture_value + '_batch_' + str(batch) + '_video' + '2.avi'),
        cv2.VideoWriter_fourcc('P', 'I', 'M', '1'), fps, (width, height))
    # Continue from frame j (where the previous clip stopped) so the label
    # index stays aligned with the frame being read.
    for frame_idx in range(int(j), int(cap.get(cv2.CAP_PROP_FRAME_COUNT))):
        ret, frame = cap.read()
        j += 1
        # cv2.imshow('Video Player', frame)
        if ret == True:
            if content[frame_idx] == capture_value:
                video_writer.write(frame)
                i += 1
                if i > 2 * num_sec * fps:
                    break

    # Release video writer
    video_writer.release()

# Making the 3rd clip
if j < total_frames - num_sec * fps:
    video_writer = cv2.VideoWriter(
        os.path.join('data', 'class' + capture_value,
                     'class' + capture_value + '_batch_' + str(batch) + '_video' + '3.avi'),
        cv2.VideoWriter_fourcc('P', 'I', 'M', '1'), fps, (width, height))
    for frame_idx in range(int(j), int(cap.get(cv2.CAP_PROP_FRAME_COUNT))):
        ret, frame = cap.read()
        j += 1
        # cv2.imshow('Video Player', frame)
        if ret == True:
            if content[frame_idx] == capture_value:
                video_writer.write(frame)
                i += 1
                if i > 3 * num_sec * fps:
                    break

    # Release video writer
    video_writer.release()

### Continue making the 4th, 5th... clips in the same way. Based on the data given,
### each video can be made into 8-9 clips.

# Close everything down at the end.
cap.release()
cv2.destroyAllWindows()

After preparation, we have the following set of videos for training and testing:

Video Clips Prepared

One important thing to mention here is frame sampling. On a 30-frames-per-second video, 20 consecutive frames cover less than one second of content, which is not enough information for inference; in a car-driving setting, consecutive frames also look very similar because changes are not drastic. It therefore makes sense to down-sample. What I did was take 20 frames out of every 200, skipping 10 frames at each step; this captures 6-7 seconds of information, which is enough to determine whether the driver is drowsy. Another key consideration is the eventual deployment of the solution: if the model runs on an edge device, how many frames can realistically be processed as one sequence? Based on experiments with sample videos, 20 frames per sequence is a good balance between sufficient information and limited compute power. Think about the nature of the videos we eventually need to process: camera captures of drivers usually do not change drastically across frames, and becoming drowsy is a gradual process, so it is not critical to capture every single frame.
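Below is a minimal sketch of that sampling logic with OpenCV. The 64x64 resize and the scaling of pixel values to [0, 1] are my own illustrative choices here, not necessarily what the project used.

import cv2
import numpy as np

def sample_frames(video_path, seq_len=20, skip=10, size=(64, 64)):
    # Collect every `skip`-th frame until `seq_len` frames have been gathered.
    cap = cv2.VideoCapture(video_path)
    frames = []
    frame_idx = 0
    while len(frames) < seq_len:
        ret, frame = cap.read()
        if not ret:
            break
        if frame_idx % skip == 0:
            frame = cv2.resize(frame, size)
            frames.append(frame / 255.0)  # scale pixel values to [0, 1]
        frame_idx += 1
    cap.release()
    return np.array(frames)

# Example: 20 frames sampled from a 30 fps clip cover roughly 6-7 seconds.
# sequence = sample_frames('data/class1/class1_batch_26_video1.avi')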


Model Building

The detailed model building code and explanation can be found in my other blog: https://medium.com/@junfeng142857/real-time-video-classification-using-cnn-rnn-5a4d3fe2b955

Iterations and findings

In the initial model, I tried feeding the entire 10-second video (300 frames) into the model. This turned out to be too big for the compute power available to us. The biggest machine we used was the one provided by the Google Colab Pro+ subscription, which offers:

  • RAM: 90GB
  • GPU: NVIDIA-SMI 460.32.03; Driver Version: 460.32.03; CUDA Version: 11.2; GPU Memory size: 40GB

Even with the compute power above, the model could not train: it crashed the machine. Therefore, I decided to reduce the number of frames per sequence. This consideration is not just for training; we also need to consider eventual deployment on an edge device, which has much less compute power. To decide on a good number of frames, I took the following factors into consideration:

  1. Deploy-ability. After a quick deployment test on a Raspberry Pi, a TensorFlow model with 20 frames per sequence can run at near full CPU; 40 frames and above risked device failure.
  2. Model performance. I also compared how the number of frames per sequence impacts model stability and accuracy, and noticed that 20 frames yields better accuracy than 40 frames across both approaches!

On top of the above, 20 frames per sequence also trains faster. Therefore, it is good to keep the model at 20 frames per sequence.

In addition, through running iterations of training with both models, I noticed that ConvLSTM has better overall accuracy at 20 frames per sequence, but LRCN has clear advantages:

  1. ConvLSTM has better test accuracy at 20 frames, but it is less stable. In the V1 (40-frame) training, LRCN has higher accuracy than ConvLSTM (ConvLSTM 70% vs LRCN 84%); in the V2 (20-frame) training, LRCN has lower accuracy (ConvLSTM 93% vs LRCN 84%). This shows that LRCN is more stable, while ConvLSTM's accuracy varies with the data. The difference likely comes from the complexity of the ConvLSTM model: convolution is done at the input, forget and output gates, so a longer sequence may give the model too much information to process, which negatively impacts performance.
  2. ConvLSTM also has a higher tendency to overfit. Based on the training history, ConvLSTM overfits more readily, which should also come from its model complexity.
  3. LRCN has a higher training and inference speed. For a batch_size of 4, LRCN trains at 52 ms/step while ConvLSTM is at 884 ms/step.

To optimise the model, the following was also done during the training process.

  1. When preparing data, the classes were balanced: the dataset contains the same number of videos for each class.
  2. Dropout layers were added to the models. This helped reduce overfitting.

References:

Video tutorials and documentation referenced:

  • ConvLSTM2D layer. (n.d.). Keras API documentation.

Dataset:

Ching-Hua Weng, Ying-Hsiu Lai, Shang-Hong Lai, Driver Drowsiness Detection via a Hierarchical Temporal Deep Belief Network, In Asian Conference on Computer Vision Workshop on Driver Drowsiness Detection from Video, Taipei, Taiwan, Nov. 2016

Literature Mentioned:

  • Medel, J. R. (2016). Anomaly Detection Using Predictive Convolutional Long Short Term Memory Units. Rochester Institute of Technology.

IoT LoRaWAN Payload Decoding

IoT devices typically transmit data over long distances with limited power available. The LoRaWAN protocol is designed for this purpose. It also brings other advantages, such as:

  • Scalability: LoRaWAN networks can support millions of devices, making it suitable for wide-scale deployments.
  • Security: LoRaWAN has built-in encryption and security at the network and application layers. This ensures that data transmitted between devices and the network server is secure.

Recently, I've had some fun playing with IoT devices and looking closely into how their payloads are encoded and decoded. This article focuses on payload decoding, but it is worth giving an overview of the setup as background information.

The system setup consists of the following main components.

  1. Weather station set up to collect 8 metrics.
  2. Helium Network setup for collecting raw data
  3. Data Parsing and Machine Learning Model Building on Colab
  4. Model Deployment on www.pythonanywhere.com

Data is collected through an 8-in-1 weather station purchased from Seeed Studio.

It collects the following metrics:

Data is transmitted through the Helium Network and collected into a Google Sheet.

This is an example of a debug payload on the Helium console:

The payload is base64 encoded. For example, the string AQEwQAAAJVYAAAg=, when decoded into hex, is:

01 01 30 40 00 00 25 56 00 00 08
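A quick way to verify this step in Python is with the standard base64 module:

import base64

payload = "AQEwQAAAJVYAAAg="
raw_bytes = base64.b64decode(payload)
print(raw_bytes.hex(' '))  # 01 01 30 40 00 00 25 56 00 00 08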

It carries the actual weather data we are collecting, such as temperature, humidity and wind speed. The encoding logic is defined by the product provider and is shared as JavaScript code: S2120-Helium decoder. In my project, I needed to convert the JavaScript code into Python for machine learning model building, and that process allowed me to understand the logic in depth.

Data is interpreted in the following structure:

The dataId at the beginning determines the type of data the subsequent numbers encode.

If dataId=01, the subsequent numbers encode Temperature, Humidity, Light Intensity, UV Index, and Wind Speed.

If dataId=02, the subsequent numbers encode Wind Direction, Rain Gauge, and Barometric Pressure.

The tables below illustrate how the numbers are decoded layer by layer following the logic.
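As a rough Python sketch of that dispatch logic, the function below only routes on dataId and names the fields each message type carries. The actual byte offsets, widths and scale factors are defined in the vendor's S2120-Helium JavaScript decoder and are not reproduced here.

def decode_payload(raw_bytes):
    # Illustrative sketch only; the authoritative parsing logic lives in the
    # vendor's S2120-Helium JavaScript decoder.
    data_id = raw_bytes[0]
    body = raw_bytes[1:]
    if data_id == 0x01:
        # dataId 01: Temperature, Humidity, Light Intensity, UV Index, Wind Speed
        fields = ["temperature", "humidity", "lightIntensity", "uvIndex", "windSpeed"]
    elif data_id == 0x02:
        # dataId 02: Wind Direction, Rain Gauge, Barometric Pressure
        fields = ["windDirection", "rainGauge", "barometricPressure"]
    else:
        fields = []
    return {"dataId": data_id, "fields": fields, "rawBody": body.hex(' ')}

# Example with the payload shown earlier:
print(decode_payload(bytes.fromhex("0101304000002556000008")))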

General Regression Neural Network (GRNN) Illustrated in Excel

Unlike other popular machine learning algorithms, there is not much beginner-friendly content on the Internet about the General Regression Neural Network (GRNN). Wikipedia: https://lnkd.in/g2zktChp

I’ve built an Excel workbook to illustrate the idea of GRNN for those who are getting started. I hope this is helpful and I look forward to your comments.

What does a General Regression Neural Network (GRNN) do?

It is a function approximation method that calculates the predicted ŷ from existing training data X and Y, letting the output of each training sample contribute a weighted amount to the predicted ŷ. Once all training data are loaded, prediction is done simply by calculating the distance between the input x and all the inputs in the training data X (x1, x2, x3, …, xi). Through an activation function, each distance is turned into a weight that determines how much the corresponding yi contributes to ŷ.
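As a minimal NumPy sketch of that calculation, using the Gaussian kernel GRNN typically uses (the function and variable names here are mine, not from the Excel workbook):

import numpy as np

def grnn_predict(x, X_train, Y_train, sigma=1.0):
    # Squared distance between the input and every training sample.
    d2 = np.sum((X_train - x) ** 2, axis=1)
    # The activation function turns each distance into a weight.
    weights = np.exp(-d2 / (2 * sigma ** 2))
    # Each training output contributes in proportion to its weight.
    return np.sum(weights * Y_train) / np.sum(weights)

# Example: training data where Y = X + 3, as in the Excel illustration.
X_train = np.arange(1, 11, dtype=float).reshape(-1, 1)
Y_train = X_train.ravel() + 3
print(grnn_predict(np.array([4.5]), X_train, Y_train, sigma=0.5))  # close to 7.5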

Some experiments to try with the Excel:

1. Change the training data. You can see that in the training data, Y is a simple calculation from X; this is good for illustration. Feel free to change the relationship between X and Y, for example from Y = X+3 to Y = X+10 or Y = X*5, and see how the model predicts ŷ accordingly.

2. Play with the only hyperparameter in the model, the standard deviation σ (std). You can see that the bigger the value of σ, the more training samples contribute meaningfully to the final result. Note that Excel has limited numerical precision: when the weights get small enough, they become 0. Also note that the farther a training point is from the input, the smaller its weight.

3. Special case: x is equal to a training data point. If your input x is the same as one of the training inputs, you can see that its weight becomes 1 while the other weights are very small.

Embeddings — everything can be a vector

To computers, everything is numeric. Any object can be a vector for computers to process. Here I mean any. An image, a piece of music, a piece of text, anything. Imagine every human individual can also just be represented by a (super long) vector, with all bio- and socio- info encoded in a series of numbers…

This is the same for word tokens in Natural Language Processing. Since earlier models such as Google's Word2Vec, there have been methods to represent word tokens as vectors that carry information about word meanings. The term "word token" is used loosely here: depending on how you train your tokenizer, it can be at the word level, but also at other levels of granularity, such as byte level, subword level, multi-word phrase level, sentence level and even document level.

Every embedding method has an objective. For word tokens, it is to encode similarity and distance between tokens, and this determines how they are trained. Word embeddings are trained from word contexts; for example, Word2Vec [T. Mikolov et al.] uses Continuous Bag of Words (CBOW) and skip-grams. This makes sure that words which appear in similar contexts end up closer to each other.
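As a small illustration of those two training modes with gensim (a toy corpus; the parameter values are arbitrary):

from gensim.models import Word2Vec

# A toy corpus: each sentence is a list of tokens.
sentences = [
    ["the", "driver", "is", "drowsy"],
    ["the", "driver", "is", "alert"],
    ["the", "dog", "chased", "the", "cat"],
]

# sg=0 trains with CBOW (predict a word from its context);
# sg=1 trains with skip-gram (predict the context from a word).
cbow_model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)
skipgram_model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(cbow_model.wv["driver"].shape)  # (50,)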

Instead of more theory, let's examine some real examples. We can use the gensim library to download pretrained embeddings.

import gensim.downloader

# Download the pretrained "word2vec-google-news-300" embeddings.
W2V_vectors = gensim.downloader.load('word2vec-google-news-300')

[===============================] 100.0% 1662.8/1662.8MB downloaded

Then we can query the embedding vector by using a word token as the key.

# Use the downloaded vectors as usual:
dog_vector = W2V_vectors['dog']
print("The embedding vector:\n", dog_vector)
print("The shape of the vector is ", dog_vector.shape)

The embedding vector:

[ 5.12695312e-02 -2.23388672e-02 -1.72851562e-01 1.61132812e-01
-8.44726562e-02 5.73730469e-02 5.85937500e-02 -8.25195312e-02
-1.53808594e-02 -6.34765625e-02 1.79687500e-01 -4.23828125e-01
-2.25830078e-02 -1.66015625e-01 -2.51464844e-02 1.07421875e-01
-1.99218750e-01 1.59179688e-01 -1.87500000e-01 -1.20117188e-01
1.55273438e-01 -9.91210938e-02 1.42578125e-01 -1.64062500e-01
-8.93554688e-02 2.00195312e-01 -1.49414062e-01 3.20312500e-01
3.28125000e-01 2.44140625e-02 -9.71679688e-02 -8.20312500e-02
-3.63769531e-02 -8.59375000e-02 -9.86328125e-02 7.78198242e-03
-1.34277344e-02 5.27343750e-02 1.48437500e-01 3.33984375e-01
1.66015625e-02 -2.12890625e-01 -1.50756836e-02 5.24902344e-02
-1.07421875e-01 -8.88671875e-02 2.49023438e-01 -7.03125000e-02
-1.59912109e-02 7.56835938e-02 -7.03125000e-02 1.19140625e-01
2.29492188e-01 1.41601562e-02 1.15234375e-01 7.50732422e-03
2.75390625e-01 -2.44140625e-01 2.96875000e-01 3.49121094e-02
2.42187500e-01 1.35742188e-01 1.42578125e-01 1.75781250e-02
2.92968750e-02 -1.21582031e-01 2.28271484e-02 -4.76074219e-02
-1.55273438e-01 3.14331055e-03 3.45703125e-01 1.22558594e-01
-1.95312500e-01 8.10546875e-02 -6.83593750e-02 -1.47094727e-02
2.14843750e-01 -1.21093750e-01 1.57226562e-01 -2.07031250e-01
1.36718750e-01 -1.29882812e-01 5.29785156e-02 -2.71484375e-01
-2.98828125e-01 -1.84570312e-01 -2.29492188e-01 1.19140625e-01
1.53198242e-02 -2.61718750e-01 -1.23046875e-01 -1.86767578e-02
-6.49414062e-02 -8.15429688e-02 7.86132812e-02 -3.53515625e-01
5.24902344e-02 -2.45361328e-02 -5.43212891e-03 -2.08984375e-01
-2.10937500e-01 -1.79687500e-01 2.42187500e-01 2.57812500e-01
1.37695312e-01 -2.10937500e-01 -2.17285156e-02 -1.38671875e-01
1.84326172e-02 -1.23901367e-02 -1.59179688e-01 1.61132812e-01
2.08007812e-01 1.03027344e-01 9.81445312e-02 -6.83593750e-02
-8.72802734e-03 -2.89062500e-01 -2.14843750e-01 -1.14257812e-01
-2.21679688e-01 4.12597656e-02 -3.12500000e-01 -5.59082031e-02
-9.76562500e-02 5.81054688e-02 -4.05273438e-02 -1.73828125e-01
1.64062500e-01 -2.53906250e-01 -1.54296875e-01 -2.31933594e-02
-2.38281250e-01 2.07519531e-02 -2.73437500e-01 3.90625000e-03
1.13769531e-01 -1.73828125e-01 2.57812500e-01 2.35351562e-01
5.22460938e-02 6.83593750e-02 -1.75781250e-01 1.60156250e-01
-5.98907471e-04 5.98144531e-02 -2.11914062e-01 -5.54199219e-02
-7.51953125e-02 -3.06640625e-01 4.27734375e-01 5.32226562e-02
-2.08984375e-01 -5.71289062e-02 -2.09960938e-01 3.29589844e-02
1.05468750e-01 -1.50390625e-01 -9.37500000e-02 1.16699219e-01
6.44531250e-02 2.80761719e-02 2.41210938e-01 -1.25976562e-01
-1.00585938e-01 -1.22680664e-02 -3.26156616e-04 1.58691406e-02
1.27929688e-01 -3.32031250e-02 4.07714844e-02 -1.31835938e-01
9.81445312e-02 1.74804688e-01 -2.36328125e-01 5.17578125e-02
1.83593750e-01 2.42919922e-02 -4.31640625e-01 2.46093750e-01
-3.03955078e-02 -2.47802734e-02 -1.17187500e-01 1.61132812e-01
-5.71289062e-02 1.16577148e-02 2.81250000e-01 4.27734375e-01
4.56542969e-02 1.01074219e-01 -3.95507812e-02 1.77001953e-02
-8.98437500e-02 1.35742188e-01 2.08007812e-01 1.88476562e-01
-1.52343750e-01 -2.37304688e-01 -1.90429688e-01 7.12890625e-02
-2.46093750e-01 -2.61718750e-01 -2.34375000e-01 -1.45507812e-01
-1.17187500e-02 -1.50390625e-01 -1.13281250e-01 1.82617188e-01
2.63671875e-01 -1.37695312e-01 -4.58984375e-01 -4.68750000e-02
-1.26953125e-01 -4.22363281e-02 -1.66992188e-01 1.26953125e-01
2.59765625e-01 -2.44140625e-01 -2.19726562e-01 -8.69140625e-02
1.59179688e-01 -3.78417969e-02 8.97216797e-03 -2.77343750e-01
-1.04980469e-01 -1.75781250e-01 2.28515625e-01 -2.70996094e-02
2.85156250e-01 -2.73437500e-01 1.61132812e-02 5.90820312e-02
-2.39257812e-01 1.77734375e-01 -1.34765625e-01 1.38671875e-01
3.53515625e-01 1.22070312e-01 1.43554688e-01 9.22851562e-02
2.29492188e-01 -3.00781250e-01 -4.88281250e-02 -1.79687500e-01
2.96875000e-01 1.75781250e-01 4.80957031e-02 -3.38745117e-03
7.91015625e-02 -2.38281250e-01 -2.31445312e-01 1.66015625e-01
-2.13867188e-01 -7.03125000e-02 -7.56835938e-02 1.96289062e-01
-1.29882812e-01 -1.05957031e-01 -3.53515625e-01 -1.16699219e-01
-5.10253906e-02 3.39355469e-02 -1.43554688e-01 -3.90625000e-03
1.73828125e-01 -9.96093750e-02 -1.66015625e-01 -8.54492188e-02
-3.82812500e-01 5.90820312e-02 -6.22558594e-02 8.83789062e-02
-8.88671875e-02 3.28125000e-01 6.83593750e-02 -1.91406250e-01
-8.35418701e-04 1.04003906e-01 1.52343750e-01 -1.53350830e-03
4.16015625e-01 -3.32031250e-02 1.49414062e-01 2.42187500e-01
-1.76757812e-01 -4.93164062e-02 -1.24511719e-01 1.25976562e-01
1.74804688e-01 2.81250000e-01 -1.80664062e-01 1.03027344e-01
-2.75390625e-01 2.61718750e-01 2.46093750e-01 -4.71191406e-02
6.25000000e-02 4.16015625e-01 -3.55468750e-01 2.22656250e-01]
The shape of the vector is (300,)

Find the top 10 most similar words.

# Find most similar words
W2V_vectors.most_similar('dog')

[('dogs', 0.8680489659309387),
('puppy', 0.8106428384780884),
('pit_bull', 0.780396044254303),
('pooch', 0.7627376914024353),
('cat', 0.7609457969665527),
('golden_retriever', 0.7500901818275452),
('German_shepherd', 0.7465174198150635),
('Rottweiler', 0.7437615394592285),
('beagle', 0.7418621778488159),
('pup', 0.740691065788269)]

Then we can do some interesting arithmetic operations to get new words based on distance.

# Getting the most similar words based on a given distance.
W2V_vectors.most_similar(positive=['woman', 'king'], negative=['man'], topn=1)

[('queen', 0.7118193507194519)]

If the input parameters look a bit confusing, here is an alternative view:

If the following holds:

woman – man = queen – king

then, the following must be true:

woman – man + king = queen

As you can see above, on the left side of the equation, "woman" and "king" are the positive terms and "man" is the negative term, and that is how we get "queen" on the right side.
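You can also do the arithmetic on the raw vectors yourself and then look up the nearest neighbours. Note that, unlike most_similar, similar_by_vector does not exclude the input words, so "king" itself will typically appear near the top:

# Do the arithmetic on the raw vectors directly.
vec = W2V_vectors['woman'] - W2V_vectors['man'] + W2V_vectors['king']

# similar_by_vector does not filter out the input words, so expect 'king'
# itself among the nearest neighbours, with 'queen' close behind.
W2V_vectors.similar_by_vector(vec, topn=3)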

How to hyperlink to a specific location on a long web page?

For those looking for a quick answer, here it is:

Use this pattern in the address bar: <URL>#<HTML Element ID>

For example:

https://technet.microsoft.com/en-SG/library/cc262787.aspx#ContentDB

https://msdn.microsoft.com/en-sg/library/ff877884.aspx#AvailabilityModes

If you would like to read a discussion on this topic, feel free to move on. Otherwise, the answer above is all you need to know. 😉

When you are sharing a webpage with others, it may be frustrating for the reader to find the exact content you are sharing when the webpage is long. What if you can direct the reader to the exact location through the hyperlink you are sharing? For example, on a lengthy TechNet Article about SharePoint limitations, the reader is directed to the Content Database limitations directly when opening the page with this hyperlink: https://technet.microsoft.com/en-SG/library/cc262787.aspx#ContentDB

The trick lies in the suffix "#ContentDB" in the URL. So the question this post tries to answer is: how do you determine what to add to the end of the URL to navigate readers directly to a specific location on the webpage?

We know that in HTML we can assign an ID to a tag, such as <p id="something"></p>, which is the unique identifier of that specific element. You can then use this ID to locate the content you want to share. Not every element on a webpage has an ID attribute, though, so having an ID is the prerequisite for linking to the content directly.

How do you find the ID of the location you want to share, if there is one? There are two ways: the easy way and the hard way.

The easy way exists when there are internal hyperlinks on the webpage, i.e. hyperlinks that point to a location on the same page. In this case you can copy the hyperlink directly and share it with others. For example, TechNet articles constantly contain hyperlinks to locations on the same page.

 Hyperlinks

The hard way comes when there is no internal hyperlink on the webpage. You will need to check the source code of the webpage for any ID that can be used.

One-pic-4-a-thousand-words