Official Endpoint - v1beta

Thinking with Gemini 2.5

POST
/v1beta/models/{model_name}:generateContent

Request

Path Params
model_name (string, required): the Gemini model to call in the URL path, e.g. gemini-2.5-flash

Header Params
x-goog-api-key (string, required): your API key
Content-Type (string, required): application/json

Body Params application/json
contents (array, required): the conversation turns; each entry carries a parts array of text segments
generationConfig.thinkingConfig.thinkingBudget (integer, optional): token budget for the model's internal reasoning

Example
{
    "contents": [
      {
        "parts": [
          {
            "text": "How does AI work?"
          }
        ]
      }
    ],
    "generationConfig": {
      "thinkingConfig": {
        "thinkingBudget": 0
      }
    }
  }
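In the example, thinkingBudget is 0, which disables thinking for gemini-2.5-flash; a positive value lets the model spend up to that many tokens reasoning before it answers. A minimal sketch of both request bodies in Python (the 1024 budget is illustrative, not a value from this page):

Python
# Same prompt with thinking disabled and enabled.
# The 1024 budget is illustrative; check the Gemini docs for model-specific limits.
PROMPT = {"contents": [{"parts": [{"text": "How does AI work?"}]}]}

no_thinking = {
    **PROMPT,
    "generationConfig": {"thinkingConfig": {"thinkingBudget": 0}},
}
with_thinking = {
    **PROMPT,
    "generationConfig": {"thinkingConfig": {"thinkingBudget": 1024}},
}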

Request Example
Shell
# Replace YOUR_API_KEY with your key and prepend your YEScale base URL to the path.
curl --location --request POST '/v1beta/models/gemini-2.5-flash:generateContent' \
--header 'x-goog-api-key: YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--data-raw '{
    "contents": [
      {
        "parts": [
          {
            "text": "How does AI work?"
          }
        ]
      }
    ],
    "generationConfig": {
      "thinkingConfig": {
        "thinkingBudget": 0
      }
    }
  }'
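For comparison, a Python equivalent of the curl call above, as a sketch: the base URL and the YESCALE_API_KEY environment variable are placeholders for your own gateway host and key, not values from this page.

Python
import os

import requests

# Placeholder base URL; substitute your YEScale gateway host.
BASE_URL = "https://api.yescale.io"
MODEL = "gemini-2.5-flash"

resp = requests.post(
    f"{BASE_URL}/v1beta/models/{MODEL}:generateContent",
    headers={"x-goog-api-key": os.environ["YESCALE_API_KEY"]},  # placeholder env var
    json={
        "contents": [{"parts": [{"text": "How does AI work?"}]}],
        "generationConfig": {"thinkingConfig": {"thinkingBudget": 0}},
    },
    timeout=60,
)
resp.raise_for_status()
data = resp.json()
print(data["candidates"][0]["content"]["parts"][0]["text"])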

Responses

🟢 200 OK
application/json
Body

Example
{
    "candidates": [
        {
            "content": {
                "parts": [
                    {
                        "text": "That's a fantastic question, and one that many people are curious about! The truth is, \"AI\" is a very broad term, and there are many different types and approaches. However, I can give you a good overview of the core concepts and how they generally work.\n\nAt its heart, **AI works by training computer programs to recognize patterns and make decisions, often by learning from data, much like humans do (but in a very different way).**\n\nHere's a breakdown of the key components and processes:\n\n---\n\n## The Core Components of AI\n\n1.  **Data:** This is the fuel for AI. It can be anything from images, text, numbers, audio, videos, or sensor readings. The more relevant and diverse the data, the better an AI can learn.\n\n2.  **Algorithms:** These are the step-by-step instructions or mathematical formulas that the AI uses to process data, identify patterns, and make decisions. Think of them as the \"brains\" of the AI.\n\n3.  **Computational Power:** AI, especially modern AI, requires a lot of processing power (CPUs and especially GPUs) to handle large datasets and complex algorithms.\n\n---\n\n## The General Process: Learning and Inference\n\nMost AI systems follow a two-phase process:\n\n### Phase 1: Training (Learning)\n\nThis is where the AI \"learns\" from the data.\n\n1.  **Input Data:** You feed the algorithm a vast amount of data. For example:\n    *   **Image Recognition:** Thousands of images labeled \"cat\" or \"dog.\"\n    *   **Language Translation:** Millions of pairs of sentences in two languages.\n    *   **Recommendation System:** User purchase history and ratings.\n\n2.  **Feature Extraction:** The algorithm identifies relevant characteristics (features) within the data. For images, these might be edges, shapes, colors. For text, it could be word frequencies or grammatical structures.\n\n3.  **Pattern Recognition:** The algorithm looks for correlations and patterns between the input features and the desired output. It tries to build a mathematical model that explains these relationships.\n\n4.  **Model Adjustment (Learning):** The AI makes predictions based on its current model. It then compares its predictions to the actual \"correct\" answers (if available, this is called \"supervised learning\"). If there's a discrepancy, the algorithm adjusts its internal parameters (weights and biases) to reduce the error. This process is repeated thousands or millions of times.\n\n    *   **Analogy:** Imagine a child learning to distinguish between apples and oranges. You show them many examples, tell them \"this is an apple,\" \"this is an orange.\" When they make a mistake, you correct them. Over time, they learn the distinguishing features. The AI does something similar, but with complex mathematical adjustments.\n\n### Phase 2: Inference (Prediction/Action)\n\nOnce the AI is trained, it can be put to work.\n\n1.  **New Input:** You provide the trained AI with new, unseen data.\n2.  **Feature Extraction (again):** The AI extracts features from this new data using the same methods it learned during training.\n3.  
**Prediction/Decision:** Using its learned model, the AI makes a prediction or takes an action based on the new input.\n    *   **Image Recognition:** Given a new image, it predicts \"cat.\"\n    *   **Language Translation:** Given a new sentence, it translates it.\n    *   **Recommendation System:** Suggests a new product to a user.\n\n---\n\n## Key Paradigms and Techniques\n\nWhile the above is a general framework, different AI fields employ specific techniques:\n\n1.  **Machine Learning (ML):** This is the broadest subset of AI. It focuses on building systems that can learn from data without being explicitly programmed for every possible scenario.\n    *   **Supervised Learning:** Learning from labeled data (e.g., \"this is a picture of a dog\").\n    *   **Unsupervised Learning:** Finding patterns in unlabeled data (e.g., grouping similar news articles).\n    *   **Reinforcement Learning:** Learning through trial and error, receiving rewards or penalties for actions (e.g., an AI learning to play a game).\n\n2.  **Deep Learning (DL):** A subfield of ML that uses **Artificial Neural Networks (ANNs)** with many layers (hence \"deep\"). These networks are inspired by the structure of the human brain.\n    *   **How it works:** Each \"neuron\" in a layer receives input, processes it, and passes it to the next layer. Deeper layers learn more complex and abstract features.\n    *   **Why it's powerful:** Deep learning excels at tasks like image recognition, natural language processing, and speech recognition because it can automatically learn incredibly complex and hierarchical features from raw data, reducing the need for human-engineered feature extraction.\n\n3.  **Natural Language Processing (NLP):** Enables computers to understand, interpret, and generate human language (e.g., chatbots, translation, sentiment analysis). Many modern NLP models use deep learning (like Transformers, which power models like ChatGPT).\n\n4.  **Computer Vision (CV):** Enables computers to \"see\" and interpret visual information from images and videos (e.g., facial recognition, self-driving cars, medical image analysis). Also heavily reliant on deep learning.\n\n5.  **Robotics:** Integrates AI with physical machines to perform tasks in the real world.\n\n---\n\n## Analogy with the Human Brain (and why it's a loose one)\n\nPeople often compare AI to the human brain. While inspiration is drawn, it's crucial to understand the differences:\n\n*   **Human Brain:** Billions of neurons, incredible plasticity, energy-efficient, learns from limited data, understands context, common sense, emotions.\n*   **AI (Neural Networks):** Millions/billions of *parameters* (not actual neurons), requires vast amounts of data, computational power, lacks true understanding or common sense. It's a highly sophisticated pattern-matching machine.\n\n---\n\n## In Summary\n\nAI works by **training algorithms on large datasets to identify patterns and make predictions or decisions.** The more sophisticated the AI (especially deep learning), the more complex the patterns it can learn. It's not magic, but rather highly advanced statistics, mathematics, and computer science, building systems that can perform tasks that traditionally required human intelligence."
                    }
                ],
                "role": "model"
            },
            "finishReason": "STOP",
            "index": 0
        }
    ],
    "usageMetadata": {
        "promptTokenCount": 6,
        "candidatesTokenCount": 1353,
        "totalTokenCount": 1359,
        "promptTokensDetails": [
            {
                "modality": "TEXT",
                "tokenCount": 6
            }
        ]
    },
    "modelVersion": "gemini-2.5-flash",
    "responseId": "GOG6aO3iCMuumtkPl8L8wAQ"
}
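The generated text sits at candidates[0].content.parts[0].text and token accounting under usageMetadata, as the example shows. A small helper for pulling both out of the parsed JSON (field names taken from the response above):

Python
def summarize(response: dict) -> None:
    # Concatenate the text parts of the first candidate.
    parts = response["candidates"][0]["content"]["parts"]
    text = "".join(part.get("text", "") for part in parts)
    usage = response["usageMetadata"]
    print(text[:200])
    print("finish reason:", response["candidates"][0]["finishReason"])
    print("prompt tokens:", usage["promptTokenCount"])
    print("total tokens:", usage["totalTokenCount"])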
Modified at 2025-09-05 13:12:00